Fine-tuning your model – learning, discount, and exploration rates
Recall our discussion of the three major hyperparameters of a Q-learning model:
- Alpha: The learning rate
- Gamma: The discount rate
- Epsilon: The exploration rate
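To make these roles concrete, here is a minimal sketch of where each hyperparameter enters a tabular Q-learning agent. The `q_table`, state, and action names are placeholder assumptions for illustration, not the taxi code we build later:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so trials are reproducible

alpha = 0.1    # learning rate: how strongly new information overwrites old
gamma = 0.9    # discount rate: how much future reward counts today
epsilon = 0.1  # exploration rate: how often we act randomly

def choose_action(q_table, state, n_actions):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))  # explore: pick a random action
    return int(np.argmax(q_table[state]))    # exploit: pick the best-known action

def update_q(q_table, state, action, reward, next_state):
    """One Q-learning step: move Q(state, action) toward the bootstrapped target."""
    target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (target - q_table[state, action])
```

Notice that alpha and gamma live inside the update rule, while epsilon only affects action selection; this is why the three can be tuned, and decayed, independently of one another.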
What values should we choose for these hyperparameters to optimize the performance of our taxi agent? We will discover them through experimentation once we have constructed our game environment, and we can also draw on existing research on the taxi problem to set them to known optimal values.
A large part of our model-tuning and optimization phase will consist of comparing the agent's performance under different combinations of these three hyperparameters, as in the sketch below.
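One straightforward way to structure these comparisons is a grid search over candidate values. The following is an illustration only: the candidate grids are arbitrary, and `run_trial` is a hypothetical stand-in for a function that trains a fresh agent with the given settings and returns its mean episode reward.

```python
from itertools import product

def run_trial(alpha, gamma, epsilon):
    # Hypothetical placeholder so the sketch runs end to end; in a real
    # experiment this would train an agent and return its mean reward.
    return -(alpha - 0.1) ** 2 - (gamma - 0.9) ** 2 - (epsilon - 0.1) ** 2

alphas = [0.05, 0.1, 0.5]     # candidate learning rates
gammas = [0.8, 0.9, 0.99]     # candidate discount rates
epsilons = [0.01, 0.1, 0.3]   # candidate exploration rates

results = {
    combo: run_trial(*combo)
    for combo in product(alphas, gammas, epsilons)
}
best = max(results, key=results.get)
print(f"Best (alpha, gamma, epsilon): {best} -> {results[best]:.3f}")
```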
One option we have is to decay any, or all, of these hyperparameters; in other words, to reduce their values as we progress through a game loop or conduct repeated trials. In practice, we will almost always decay epsilon, since we want our agent to exploit the knowledge it has gained of its environment and explore less as it becomes better aware of the highest-valued actions to take. It can sometimes also benefit us to decay the other hyperparameters.
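A common decay scheme is multiplicative: shrink epsilon by a fixed factor after every episode, with a floor so the agent never stops exploring entirely. The numbers in this sketch (start value, decay factor, floor, and episode count) are assumptions to be tuned, not prescribed values:

```python
n_episodes = 1000
epsilon = 1.0       # start fully exploratory
epsilon_min = 0.01  # floor: keep a little exploration forever
decay = 0.995       # shrink epsilon slightly after each episode

for episode in range(n_episodes):
    # ... run one episode, choosing actions epsilon-greedily ...
    epsilon = max(epsilon_min, epsilon * decay)

# The same pattern works for alpha (and, more rarely, gamma) if we
# decide it benefits us to decay those hyperparameters as well.
```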