Title: Hands-On Q-Learning with Python
Author: Nazia Habib
Chapter word count: 150
Updated: 2021-06-24 15:13:16

Decaying epsilon

We've discussed epsilon decay in the context of exploration versus exploitation. The more we get to know our environment, the less random exploration we want to do and the more actions we want to take that we know will give us high rewards. Our goal should always be to take advantage of what we already know.

We do this by reducing the agent's epsilon value by a particular amount as the game progresses. Remember that epsilon is the probability (a value between 0 and 1, often quoted as a percentage) that the agent will take a random action instead of the action with the highest current Q-value for the current state.
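As a reminder of how epsilon is used, here is a minimal sketch of epsilon-greedy action selection. The function name `choose_action` and the plain-list representation of one state's Q-values are assumptions made for illustration, not code from the book.

```python
import random


def choose_action(q_row, epsilon, rng=random):
    """Pick an action index for one state using an epsilon-greedy rule.

    q_row   -- list of Q-values, one per action, for the current state
               (hypothetical representation chosen for this sketch)
    epsilon -- probability in [0, 1] of taking a random action
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_row))
    # Exploit: pick the action with the highest current Q-value.
    return max(range(len(q_row)), key=lambda a: q_row[a])


# With epsilon = 0 the agent always exploits.
greedy_action = choose_action([0.1, 0.9, 0.3], epsilon=0.0)
```

With `epsilon=1.0` the same call would ignore the Q-values entirely and return a uniformly random action index.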

When we reduce epsilon, the likelihood of a random action becomes smaller, and we take more opportunities to benefit from the high-valued actions that we have already discovered.
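One common way to reduce epsilon is to multiply it by a constant factor after every episode and stop at a small floor, so the agent never stops exploring completely. The sketch below assumes that multiplicative schedule; the names `decay_epsilon`, `decay_rate`, and `min_epsilon`, and the specific values used, are illustrative choices rather than the book's. Subtracting a fixed amount each episode is an equally valid alternative.

```python
def decay_epsilon(epsilon, decay_rate=0.99, min_epsilon=0.01):
    """Return the next epsilon: shrink it by decay_rate, but not below min_epsilon."""
    return max(min_epsilon, epsilon * decay_rate)


epsilon = 1.0   # start fully exploratory (assumed starting value)
history = []    # epsilon used in each episode, kept for inspection

for episode in range(500):
    history.append(epsilon)
    # ... run one episode here, choosing actions with the current epsilon ...
    epsilon = decay_epsilon(epsilon)
```

Early episodes use an epsilon close to 1 and are mostly random. Later episodes use an epsilon near the floor and mostly follow the highest-valued actions already discovered.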

For similar reasons, it can be to our benefit to decay alpha and gamma along with epsilon.
