
Value-based versus policy-based iteration

We'll be using value-based iteration for the projects in this book. The description of the Bellman equation given previously offers a very high-level picture of how value-based iteration works. The main difference between the two approaches is that in value-based iteration the agent learns the expected reward of each state-action pair, while in policy-based iteration the agent directly learns the function that maps states to actions.
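As a minimal sketch of what "learning the value of each state-action pair" looks like in practice, the following code performs a Bellman-style tabular Q-learning update for a single transition. The environment sizes, learning rate, and variable names here are hypothetical placeholders, not taken from the book's projects:

```python
import numpy as np

# Hypothetical sizes for a small discrete environment
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.99             # learning rate and discount factor

# Value-based representation: one learned value per state-action pair
Q = np.zeros((n_states, n_actions))

def q_update(state, action, reward, next_state):
    """Apply one Bellman-style Q-learning update for a single transition."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```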

One simple way to describe this difference is that a value-based agent, even when it has mastered its environment, does not store an explicit policy. It cannot hand you a learned function that maps states to actions; instead, it chooses actions on demand by consulting its learned values, for example by taking the action with the highest expected value in the current state. A policy-based agent, on the other hand, learns that mapping directly and can give you the function itself.
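The contrast can be made concrete with a short sketch. Assuming the hypothetical Q-table from above and a hypothetical table of per-state action probabilities (policy_probs), a value-based agent derives its action from the values at decision time, while a policy-based agent queries the policy it has learned:

```python
import numpy as np

rng = np.random.default_rng(0)

def value_based_action(Q, state):
    """No explicit policy is stored; the action is read off the learned
    Q-values on demand (greedy with respect to Q)."""
    return int(np.argmax(Q[state]))

def policy_based_action(policy_probs, state):
    """The policy itself is the learned object: a mapping from states to
    action probabilities, sampled directly."""
    return int(rng.choice(len(policy_probs[state]), p=policy_probs[state]))
```

The design point is that the value-based agent's policy is implicit (it falls out of the argmax over values), whereas the policy-based agent's policy is the explicit object being learned.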

Note that Q-learning and SARSA are both value-based algorithms. Because we are working with Q-learning in this book, we will not study policy-based iteration in detail here. The main things to bear in mind about policy-based iteration are that it lets the agent learn stochastic policies directly and that it is better suited to continuous action spaces.
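To illustrate those two points, here is a hedged sketch, not a full policy-gradient implementation: a softmax over learned action preferences yields a stochastic policy for discrete actions, and for continuous action spaces the policy can output distribution parameters (here a hypothetical Gaussian mean and standard deviation) from which an action is sampled:

```python
import numpy as np

def softmax_policy(preferences):
    """Stochastic policy over discrete actions from learned preferences."""
    z = np.exp(preferences - np.max(preferences))
    return z / z.sum()

def gaussian_policy_sample(mean, std, rng=np.random.default_rng()):
    """For a continuous action space, sample an action from a learned
    Gaussian parameterized by the policy's outputs."""
    return rng.normal(mean, std)
```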
