- Hands-On Q-Learning with Python
- Nazia Habib
- 2021-06-24 15:13:13
Questions
- What is the difference between a reward and a value?
- What is a hyperparameter? Give an example of a hyperparameter other than the ones discussed in this chapter.
- Why will a Q-learning agent not always choose the highest Q-valued action for its current state?
- Explain one benefit of a decaying gamma.
- Describe in one or two sentences the difference between the decision-making strategies of SARSA and Q-learning.
- What kind of policy does Q-learning implicitly assume the agent is following?
- Under what circumstances will SARSA and Q-learning produce the same results?
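
As a reference point for the last three questions, the two update targets can be placed side by side. This is a minimal sketch rather than the book's own code: the function names, the toy Q-table, and the values of `gamma` and `alpha` are all assumptions chosen for illustration.

```python
def q_learning_target(Q, reward, next_state, gamma):
    """Off-policy target: bootstrap from the best action in next_state."""
    return reward + gamma * max(Q[next_state])


def sarsa_target(Q, reward, next_state, next_action, gamma):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * Q[next_state][next_action]


def td_update(Q, state, action, target, alpha):
    """Move Q[state][action] a fraction alpha of the way toward the target."""
    Q[state][action] += alpha * (target - Q[state][action])


# Hypothetical Q-table: 2 states x 2 actions, values made up for illustration.
Q = [[0.0, 0.0], [1.0, 5.0]]
gamma, alpha = 0.9, 0.5
reward, next_state = 1.0, 1

greedy_action = 1       # the highest-valued action in next_state
exploratory_action = 0  # a non-greedy action the agent happened to pick

q_target = q_learning_target(Q, reward, next_state, gamma)
s_target_explore = sarsa_target(Q, reward, next_state, exploratory_action, gamma)
s_target_greedy = sarsa_target(Q, reward, next_state, greedy_action, gamma)
```

In this sketch the two targets differ only when the next action is not the greedy one, which is a useful thing to check your answers against.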