
Value-based versus policy-based iteration

We'll be using value-based iteration for the projects in this book. The description of the Bellman equation given previously offers a high-level picture of how value-based iteration works. The key distinction between the two approaches is this: in value-based iteration, the agent learns the expected reward of each state-action pair, whereas in policy-based iteration, the agent learns the function that maps states directly to actions.
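To make the distinction concrete, here is a minimal sketch of what each kind of agent actually stores. The state and action names and the numbers are hypothetical, chosen only for illustration: the value-based agent keeps an estimate for every state-action pair, while the policy-based agent keeps a mapping from states to action probabilities.

```python
# Illustrative only: hypothetical states, actions, and hand-picked numbers.

# A value-based agent stores expected-return estimates for state-action pairs.
q_values = {
    ("state_0", "left"): 0.1,   # estimated return of taking "left" in state_0
    ("state_0", "right"): 0.7,  # estimated return of taking "right" in state_0
}

# A policy-based agent stores the policy itself: a mapping from each state
# to a distribution over actions (here, a simple probability table).
policy = {
    "state_0": {"left": 0.2, "right": 0.8},
}
```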

One simple way to describe this difference is that a value-based agent, even when it has mastered its environment, cannot hand you an explicit policy. It has no standalone function that maps states to actions; instead, it derives its actions from its learned value estimates, typically by choosing the highest-valued action in each state. A policy-based agent, on the other hand, can give you that function directly.
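As a rough sketch, assuming the table-style representations above and using hypothetical helper names, the value-based agent has to derive an action from its value estimates at decision time, whereas the policy-based agent simply evaluates its stored policy:

```python
import random

def act_value_based(q_values, state, actions):
    # No explicit policy exists; the action is derived on the fly by
    # picking the highest-valued action for this state.
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))

def act_policy_based(policy, state):
    # The policy is an explicit function of the state: sample an action
    # from the stored probability distribution.
    actions, probs = zip(*policy[state].items())
    return random.choices(actions, weights=probs)[0]
```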

Note that Q-learning and SARSA are both value-based algorithms. Because we are working with Q-learning in this book, we will not study policy-based iteration in detail here. The main things to bear in mind about policy-based iteration are that it lets us learn stochastic policies and that it is better suited to continuous action spaces.
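For reference, here is a minimal sketch of the tabular Q-learning and SARSA updates, both value-based. The dictionary Q-table and the variable names (alpha for the learning rate, gamma for the discount factor) are assumptions made for illustration:

```python
def q_learning_update(q_values, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    # Q-learning is off-policy: the target uses the best available action
    # in the next state, regardless of what the agent actually does next.
    best_next = max(q_values.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * best_next
    old = q_values.get((state, action), 0.0)
    q_values[(state, action)] = old + alpha * (target - old)

def sarsa_update(q_values, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.99):
    # SARSA is on-policy: the target uses the action the agent actually
    # takes in the next state.
    target = reward + gamma * q_values.get((next_state, next_action), 0.0)
    old = q_values.get((state, action), 0.0)
    q_values[(state, action)] = old + alpha * (target - old)
```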
