- Hands-On Q-Learning with Python
- Nazia Habib
- 2021-06-24 15:13:13
Questions
- What is the difference between a reward and a value?
- What is a hyperparameter? Give an example of a hyperparameter other than the ones discussed in this chapter.
- Why will a Q-learning agent not always choose the highest Q-valued action for its current state?
- Explain one benefit of a decaying gamma.
- Describe in one or two sentences the difference between the decision-making strategies of SARSA and Q-learning.
- What kind of policy does Q-learning implicitly assume the agent is following?
- Under what circumstances will SARSA and Q-learning produce the same results?
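
The questions comparing SARSA and Q-learning turn on how each method forms its update target. The following is a minimal sketch of the standard textbook forms of the two rules, not code taken from this book. The function names, the list-of-lists Q-table layout, and the parameter names (`alpha`, `gamma`, `epsilon`) are illustrative assumptions.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Return a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Greedy choice: index of the largest Q-value (first one wins on ties).
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_learning_target(Q, reward, next_state, gamma):
    """Bootstrap from the highest Q-value available in the next state."""
    return reward + gamma * max(Q[next_state])


def sarsa_target(Q, reward, next_state, next_action, gamma):
    """Bootstrap from the action the agent actually selected for the next state."""
    return reward + gamma * Q[next_state][next_action]


def td_update(Q, state, action, target, alpha):
    """Move Q[state][action] a fraction alpha of the way toward the target."""
    Q[state][action] += alpha * (target - Q[state][action])


# Hypothetical two-state, two-action table used only for illustration.
Q = [[0.0, 0.0],
     [1.0, 3.0]]

q_target = q_learning_target(Q, reward=1.0, next_state=1, gamma=0.5)
s_target = sarsa_target(Q, reward=1.0, next_state=1, next_action=0, gamma=0.5)
```

Comparing `q_target` and `s_target` for different values of `next_action` is one way to explore the last three questions: the two targets differ only in which next-state Q-value they bootstrap from.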