Hands-On Q-Learning with Python
Nazia Habib
Decaying alpha
In a fully deterministic environment, we will want to keep alpha at 1 at all times, since we already know that alpha = 1 causes the agent to learn the optimal policy for that environment. In a stochastic environment, however, which includes most of the environments we will be working in when we build Q-learning models, decaying alpha based on what the agent has already learned can allow the algorithm to converge faster.
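As a rough illustration, one common schedule decays alpha with the number of times a state-action pair has been visited, so that early samples carry more weight than later, noisier ones. The names decayed_alpha, visits, and min_alpha below are illustrative, not taken from the book's code:

    def decayed_alpha(visit_count, min_alpha=0.01):
        """Learning rate that shrinks toward min_alpha as a pair is revisited."""
        return max(min_alpha, 1.0 / (1.0 + visit_count))

    # How alpha falls off over repeated visits to one (state, action) pair:
    for n in (0, 1, 10, 100, 1000):
        print(n, decayed_alpha(n))

    # Inside the standard Q-learning update, it would be used like this:
    #   visits[state, action] += 1
    #   alpha = decayed_alpha(visits[state, action])
    #   q_table[state, action] += alpha * (
    #       reward + gamma * q_table[next_state].max() - q_table[state, action]
    #   )

The min_alpha floor keeps the agent from ceasing to learn entirely, which matters if the environment's dynamics can drift over time.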
In practice, for a problem such as this, we are unlikely to decay alpha over the course of running an environment, as the benefits would be negligible. We will see this in action when we begin choosing values for the hyperparameters.
For the taxi problem, we are likely to start with an alpha such as 0.1 and progressively compare it against higher values. We could also use a programmatic method, such as a cross-validated grid search, to identify the hyperparameter values that allow the algorithm to converge fastest, as sketched below.
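Here is a minimal sketch of such a comparison, assuming the classic gym API in which reset() returns a state and step() returns four values (the environment is Taxi-v3 in recent gym releases, Taxi-v2 in older ones). The grid of alpha values, the episode budget, and the scoring rule are all arbitrary choices for illustration, not the book's settings:

    import gym
    import numpy as np

    def score_alpha(alpha, episodes=2000, gamma=0.9, epsilon=0.1, seed=0):
        """Train tabular Q-learning on Taxi and return the mean return
        over the last 100 episodes as a rough convergence score."""
        env = gym.make("Taxi-v3")
        rng = np.random.default_rng(seed)
        q = np.zeros((env.observation_space.n, env.action_space.n))
        returns = []
        for _ in range(episodes):
            state, total, done = env.reset(), 0, False
            while not done:
                # Epsilon-greedy action selection
                if rng.random() < epsilon:
                    action = env.action_space.sample()
                else:
                    action = int(np.argmax(q[state]))
                next_state, reward, done, _ = env.step(action)
                # Q-learning update with the candidate alpha
                q[state, action] += alpha * (
                    reward + gamma * np.max(q[next_state]) - q[state, action]
                )
                state, total = next_state, total + reward
            returns.append(total)
        return np.mean(returns[-100:])

    # Compare a low starting alpha against progressively higher values
    for alpha in (0.1, 0.3, 0.5, 0.9):
        print(alpha, score_alpha(alpha))

A full cross-validated grid search would repeat each candidate setting across several seeds and average the scores, but the loop above captures the core idea: train once per candidate value and keep the alpha whose returns converge fastest.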