Value-based versus policy-based iteration
We'll be using value-based iteration for the projects in this book. The description of the Bellman equation given previously offers a high-level picture of how value-based iteration works. The main difference between the two approaches is that in value-based iteration, the agent learns the expected reward value of each state-action pair, while in policy-based iteration, the agent directly learns the function that maps states to actions.
One simple way to describe this difference is that a value-based agent, even after it has mastered its environment, never represents its policy explicitly: it cannot hand you an actual function that maps states to actions, because its behavior is derived on the fly from the values it has learned (for example, by acting greedily with respect to them). A policy-based agent, on the other hand, can give you that function directly.
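The following is a minimal sketch of that distinction in Python. The environment size, the table names Q and policy, and the two helper functions are purely illustrative assumptions, not code from this book's projects; the point is only that the value-based agent derives its action from learned values, while the policy-based agent stores the state-to-action mapping itself.

```python
import numpy as np

# Illustrative tiny discrete environment: 4 states, 2 actions (assumed sizes).
n_states, n_actions = 4, 2

# Value-based agent: what it learns is a table of Q-values. Its policy is
# never stored explicitly; an action is derived on the fly by taking the
# greedy choice over the learned values.
Q = np.zeros((n_states, n_actions))

def value_based_action(state):
    # The "policy" is implicit: argmax over the Q-values for this state.
    return int(np.argmax(Q[state]))

# Policy-based agent: what it learns is the mapping from states to actions
# itself, here stored as a table of action probabilities (a stochastic policy).
policy = np.full((n_states, n_actions), 1.0 / n_actions)

def policy_based_action(state, rng=np.random.default_rng()):
    # The policy is explicit: sample directly from the learned distribution.
    return int(rng.choice(n_actions, p=policy[state]))
```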
Note that Q-learning and SARSA are both value-based algorithms. Because we are working with Q-learning in this book, we will not study policy-based iteration in detail here. The main thing to bear in mind about policy-based iteration is that it lets us learn stochastic policies and that it is better suited to continuous action spaces.
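As a quick illustration of why both Q-learning and SARSA count as value-based, here is a sketch of their update rules operating on a Q-table. The table shape and the hyperparameters alpha and gamma are assumptions chosen for the example, not values used later in the book; what matters is that both rules update state-action values, and the policy is only ever derived from those values.

```python
import numpy as np

alpha, gamma = 0.1, 0.99      # illustrative learning rate and discount factor
Q = np.zeros((4, 2))          # 4 states, 2 actions, purely illustrative

def q_learning_update(s, a, r, s_next):
    # Off-policy: bootstrap from the best available action in the next state.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: bootstrap from the action actually taken in the next state.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```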