
Decaying alpha 

In a fully deterministic environment, we will want to keep alpha at 1 at all times, since we already know that alpha = 1 will cause the agent to learn the optimal policy for that environment. In a stochastic environment, however, which includes most of the environments we will work in when building Q-learning models, decaying alpha as the agent accumulates knowledge can allow the algorithm to converge faster.
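As a minimal sketch of the idea, alpha can be shrunk on a schedule as episodes progress, so that early updates move the Q-values aggressively while later, noisier updates perturb the learned estimates less. The exponential form, decay rate, and floor value below are illustrative choices, not the only option:

```python
import numpy as np

def decayed_alpha(alpha_start, alpha_min, decay_rate, episode):
    """Exponentially shrink the learning rate toward a floor as training progresses."""
    return max(alpha_min, alpha_start * np.exp(-decay_rate * episode))

# Early episodes get a large alpha; later episodes, where the Q-value
# estimates are already well informed, are updated more conservatively.
for episode in (0, 100, 500, 2000):
    print(episode, round(decayed_alpha(1.0, 0.05, 0.002, episode), 3))
```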

In practice, for a problem such as this, we are unlikely to decay alpha over the course of running an environment, as the benefit would be negligible. We will see this in action when we begin choosing values for the hyperparameters.

For the taxi problem, we are likely to start with an alpha such as 0.1 and progressively compare it to higher values, as sketched below. We could also apply a programmatic method, such as a cross-validated grid search, to identify the hyperparameter values that allow the algorithm to converge fastest.
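A minimal sketch of that comparison is shown below, assuming the classic gym API (where `reset()` returns a state and `step()` returns a 4-tuple; newer gymnasium versions differ). The hyperparameter values, episode count, and the use of the last 100 episodes' mean reward as a convergence proxy are all illustrative choices:

```python
import gym
import numpy as np

def run_q_learning(env, alpha, gamma=0.9, epsilon=0.1, episodes=2000):
    """Train a tabular Q-learning agent with a fixed alpha and return the
    mean reward over the last 100 episodes as a rough convergence measure."""
    q = np.zeros((env.observation_space.n, env.action_space.n))
    rewards = []
    for _ in range(episodes):
        state = env.reset()
        total, done = 0, False
        while not done:
            # Epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = np.argmax(q[state])
            next_state, reward, done, _ = env.step(action)
            # Standard Q-learning update using the candidate alpha
            q[state, action] += alpha * (
                reward + gamma * np.max(q[next_state]) - q[state, action]
            )
            state, total = next_state, total + reward
        rewards.append(total)
    return np.mean(rewards[-100:])

env = gym.make("Taxi-v3")
# Start at a low alpha and progressively compare it against higher values
for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"alpha={alpha}: mean reward {run_q_learning(env, alpha):.1f}")
```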
