
Decaying alpha 

In a fully deterministic environment, we want to keep alpha at 1 at all times, since we already know that alpha = 1 causes the agent to learn the optimal policy for that environment. In a stochastic environment, however, which includes most of the environments we will work in when building Q-learning models, decaying alpha as the agent accumulates knowledge can allow the algorithm to converge faster.
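As a concrete illustration, here is a minimal sketch of a tabular Q-learning update that uses a decayed learning rate. The inverse-decay schedule and the values of alpha_0, decay_rate, and gamma are illustrative assumptions, not values prescribed here:

```python
import numpy as np

alpha_0 = 1.0      # initial learning rate (hypothetical choice)
decay_rate = 0.01  # decay speed (hypothetical choice)
gamma = 0.9        # discount factor (hypothetical choice)

def decayed_alpha(episode):
    """Inverse-decay schedule: alpha shrinks as episodes accumulate."""
    return alpha_0 / (1.0 + decay_rate * episode)

def q_update(Q, state, action, reward, next_state, episode):
    """One tabular Q-learning update using the decayed alpha."""
    alpha = decayed_alpha(episode)
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```

Other schedules (exponential decay, or scaling alpha by visit counts per state-action pair) work the same way: early updates move the Q-values aggressively, and later updates make smaller corrections once the estimates have stabilized.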

In practice, for a problem such as this, we are unlikely to decay alpha over the course of a run, as the benefit would be too small to notice. We will see this in action when we begin choosing values for the hyperparameters.

For the taxi problem, we are likely to start with an alpha value such as 0.1 and progressively compare it against higher values. We could also use a programmatic method, such as a cross-validated grid search, to identify the hyperparameter values that allow the algorithm to converge fastest.
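The sketch below shows what a simple grid search over alpha might look like for the taxi problem. It assumes the classic gym API (env.reset() returning a state, env.step() returning four values); the candidate alpha values, episode counts, and the other hyperparameters are illustrative assumptions:

```python
import gym
import numpy as np

def train(env, alpha, gamma=0.9, epsilon=0.1, episodes=2000):
    """Train a tabular Q-learning agent and return its Q-table."""
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = np.argmax(Q[state])
            next_state, reward, done, _ = env.step(action)
            Q[state, action] += alpha * (
                reward + gamma * np.max(Q[next_state]) - Q[state, action])
            state = next_state
    return Q

def evaluate(env, Q, episodes=100):
    """Average total reward of the greedy policy over evaluation episodes."""
    total = 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            state, reward, done, _ = env.step(np.argmax(Q[state]))
            total += reward
    return total / episodes

env = gym.make("Taxi-v3")
for alpha in [0.1, 0.3, 0.5, 0.7, 0.9]:
    Q = train(env, alpha)
    print(f"alpha={alpha}: average reward {evaluate(env, Q):.1f}")
```

Note that a library grid search such as scikit-learn's GridSearchCV expects a supervised fit/predict interface, so for a reinforcement learning agent a hand-rolled loop like this one is the simplest equivalent; extending the loop to sweep gamma or epsilon as well works the same way.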
