Decaying alpha
In a totally deterministic environment, we want to keep alpha at 1 at all times, since we already know that alpha = 1 causes the agent to learn the optimal policy for that environment. In a stochastic environment, however, which describes most of the environments we will be working in when we build Q-learning models, decaying alpha based on what we have already learned can allow our algorithm to converge faster.
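The following is a minimal sketch of how a decaying alpha can be plugged into the Q-learning update. The exponential schedule, the parameter names (alpha_start, min_alpha, decay_rate), and the specific values shown are illustrative assumptions rather than the values we will settle on for any particular environment:

```python
import numpy as np

def decayed_alpha(alpha_start, min_alpha, decay_rate, episode):
    """Exponentially decay alpha toward a floor as training progresses."""
    return max(min_alpha, alpha_start * (decay_rate ** episode))

def q_update(Q, state, action, reward, next_state, alpha, gamma=0.99):
    """Standard Q-learning update using the current (possibly decayed) alpha."""
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])

# Example: alpha shrinks from 1.0 toward a floor of 0.05 over episodes.
for episode in range(5):
    alpha = decayed_alpha(alpha_start=1.0, min_alpha=0.05,
                          decay_rate=0.9, episode=episode)
    print(f"episode {episode}: alpha = {alpha:.3f}")
```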
In practice, for a problem such as this, we are unlikely to decay alpha over the course of running an environment, as any noticeable benefit will be negligible. We will see this in action when we begin choosing values for the hyperparameters.
For the taxi problem, we are likely to start with an alpha such as 0.1 and progressively compare it against higher values. We could also use a programmatic method, such as a cross-validated grid search, to identify the hyperparameter values that allow the algorithm to converge fastest.
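As a rough illustration of comparing alpha values, the sketch below trains a tabular Q-learning agent on the taxi environment for each candidate alpha and reports the average reward. The environment name ('Taxi-v3'), the classic Gym reset/step signatures, and the episode counts and epsilon value are assumptions that may need adjusting for your installed Gym version; this is not a full grid search, just the shape of the comparison:

```python
import gym
import numpy as np

def train(env, alpha, gamma=0.9, epsilon=0.1, episodes=2000):
    """Train a tabular Q-learning agent and return the average reward per episode."""
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    total_reward = 0.0
    for _ in range(episodes):
        state = env.reset()          # classic Gym API: reset returns the state only
        done = False
        while not done:
            if np.random.rand() < epsilon:
                action = env.action_space.sample()   # explore
            else:
                action = np.argmax(Q[state])         # exploit
            next_state, reward, done, _ = env.step(action)
            Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state])
                                         - Q[state, action])
            state = next_state
            total_reward += reward
    return total_reward / episodes

env = gym.make('Taxi-v3')
for alpha in (0.1, 0.3, 0.5, 0.9):
    print(f"alpha={alpha}: mean reward per episode {train(env, alpha):.2f}")
```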