
Value-based versus policy-based iteration

We'll be using value-based iteration for the projects in this book. The description of the Bellman equation given previously offers a very high-level picture of how value-based iteration works. The main difference between the two approaches is that in value-based iteration the agent learns the expected value (that is, the expected cumulative discounted reward) of each state-action pair, while in policy-based iteration the agent directly learns the function that maps states to actions.
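
As an illustration, here is a minimal sketch of a value-based update in Python. The table size and the hyperparameter values (alpha, gamma) are placeholders chosen only for this example; the point is that the agent stores an expected value for every state-action pair and nudges it toward a Bellman-style target.

import numpy as np

n_states, n_actions = 16, 4          # placeholder sizes for a small example environment
Q = np.zeros((n_states, n_actions))  # expected value of each state-action pair
alpha, gamma = 0.1, 0.99             # example learning rate and discount factor

def q_update(state, action, reward, next_state):
    # Bellman-style target: immediate reward plus discounted value of the best next action
    target = reward + gamma * np.max(Q[next_state])
    # Move the stored estimate a small step toward that target
    Q[state, action] += alpha * (target - Q[state, action])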

One simple way to describe this difference is that a value-based agent, even when it has mastered its environment, never represents its policy explicitly: it cannot hand you a standalone function that maps states to actions, because its action choices are always derived from its learned values. A policy-based agent, on the other hand, can give you exactly that function.
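
To make that distinction concrete, the following sketch contrasts the two. A value-based agent recovers its behaviour at decision time by taking the argmax over its learned values, while a policy-based agent maintains an explicit mapping from states to action probabilities. The softmax over a preference table theta is just one common illustrative parameterization, not a method prescribed by this book.

import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))      # learned state-action values (placeholder)
theta = np.zeros((n_states, n_actions))  # illustrative per-state action preferences

# Value-based: the policy is implicit, recovered greedily from the learned values
def act_value_based(state):
    return int(np.argmax(Q[state]))

# Policy-based: the policy itself is the learned object, here a softmax
# over the preference table theta
def act_policy_based(state):
    prefs = np.exp(theta[state] - np.max(theta[state]))
    probs = prefs / prefs.sum()
    return int(np.random.choice(n_actions, p=probs))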

Note that Q-learning and SARSA are both value-based algorithms. Because we are working with Q-learning in this book, we will not study policy-based iteration in detail here. The main things to bear in mind about policy-based iteration are that it gives us the ability to learn stochastic policies and that it is better suited to continuous action spaces.
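
Since both value-based algorithms come up in this book, the short sketch below contrasts their update targets using the same placeholder table as before: Q-learning bootstraps from the best action available in the next state regardless of what the agent actually does (off-policy), while SARSA bootstraps from the action the agent actually takes next (on-policy).

import numpy as np

Q = np.zeros((16, 4))   # placeholder value table, as in the earlier sketches
gamma = 0.99            # example discount factor

def q_learning_target(reward, next_state):
    # Off-policy: bootstrap from the best action available in the next state
    return reward + gamma * np.max(Q[next_state])

def sarsa_target(reward, next_state, next_action):
    # On-policy: bootstrap from the action the agent will actually take next
    return reward + gamma * Q[next_state, next_action]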
