
Value-based versus policy-based iteration

We'll be using value-based iteration for the projects in this book. The description of the Bellman equation given previously offers a very high-level understanding of how value-based iteration works. The main difference is that in value-based iteration, the agent learns the expected value (the expected cumulative discounted reward) of each state-action pair, whereas in policy-based iteration, the agent learns the function that maps states to actions directly.
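To make that concrete, here is a minimal sketch of what a value-based agent actually stores and updates: a table of Q-values, one per state-action pair, nudged toward a Bellman-style target after each transition. The environment sizes and hyperparameters are placeholders for illustration, not values from any specific project in this book:

```python
import numpy as np

# Hypothetical sizes for a small discrete environment (placeholders).
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.99          # learning rate and discount factor

# Value-based iteration: the agent stores one value per state-action pair.
Q = np.zeros((n_states, n_actions))

def q_update(state, action, reward, next_state):
    """One Bellman-style update of the Q-table (the Q-learning rule)."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```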

One simple way to describe this difference is that a value-based agent, even once it has mastered its environment, does not explicitly represent its policy: it cannot hand you a function that maps states to actions, only the values from which its actions are derived. A policy-based agent, on the other hand, learns and can give you that function directly.
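The sketch below illustrates that contrast under simplifying assumptions (the Q array and the explicit policy table are hypothetical placeholders). The value-based agent stores no policy at all; it recovers one on demand by acting greedily with respect to its Q-values, while the policy-based agent holds the state-to-action mapping as the learned object itself:

```python
import numpy as np

def value_based_act(Q, state):
    # No explicit policy is stored; the action is derived from the learned values.
    return int(np.argmax(Q[state]))

def policy_based_act(policy_table, state):
    # The policy itself is the learned object: an explicit state -> action mapping.
    return policy_table[state]
```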

Note that Q-learning and SARSA are both value-based algorithms. Because we are working with Q-learning in this book, we will not study policy-based iteration in detail here. The main thing to bear in mind about policy-based iteration is that it lets us learn stochastic policies directly and is better suited to continuous action spaces.
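As a brief illustration of the stochastic-policy point, the sketch below parameterizes a policy as a softmax over per-action preferences, so the agent outputs a probability distribution over actions and samples from it rather than always taking the single highest-valued action. The preference values are made-up placeholders, not part of any algorithm used in this book:

```python
import numpy as np

def softmax_policy(preferences):
    """Turn per-action preferences into action probabilities."""
    exp_prefs = np.exp(preferences - np.max(preferences))   # subtract max for numerical stability
    return exp_prefs / exp_prefs.sum()

# Hypothetical preferences for one state with three available actions.
theta = np.array([1.2, 0.3, -0.5])
probs = softmax_policy(theta)
action = np.random.choice(len(theta), p=probs)   # the policy is sampled, not argmax'd
```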
