- Hands-On Q-Learning with Python
- Nazia Habib
- 2021-06-24 15:13:17
Decaying gamma
Decaying gamma makes the agent prioritize short-term rewards as it learns what those rewards are, and put less emphasis on long-term rewards.
Remember that a gamma value of 0 causes an agent to disregard future values entirely and focus only on current rewards, while a gamma value of 1 causes it to weight future values the same as current ones. Decaying gamma over time therefore shifts the agent's focus toward current rewards and away from future rewards.
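To make the two extremes concrete, here is a minimal sketch that computes a discounted return for the same reward sequence under different gamma values. The function name `discounted_return` and the reward values are hypothetical, chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum the rewards, weighting the reward k steps ahead by gamma**k."""
    return sum(r * gamma**k for k, r in enumerate(rewards))


# Hypothetical reward sequence: small rewards now, a large reward later.
rewards = [1.0, 1.0, 10.0]

# gamma = 0: only the immediate reward counts.
return_myopic = discounted_return(rewards, 0.0)

# gamma = 1: future rewards count exactly as much as the current one.
return_farsighted = discounted_return(rewards, 1.0)

# An intermediate gamma discounts the later rewards progressively.
return_mid = discounted_return(rewards, 0.5)

print(return_myopic, return_farsighted, return_mid)
```

As gamma moves from 1 toward 0, the large later reward contributes less and less to the total, which is the shift in focus that decaying gamma produces.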
Intuitively, this benefits us because the closer we get to our goal, the more we want to take advantage of short-term rewards instead of holding out for future rewards that won't be available after we complete the task. We can reach our goal faster and more efficiently by changing how we use the resources available to us as their availability changes.
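One way to decay gamma is to shrink it a little after every episode and pass the current value into the Q-learning update. The sketch below assumes an exponential schedule with a lower bound; the schedule itself and the names `decayed_gamma`, `gamma_start`, `decay_rate`, and `gamma_min` are illustrative assumptions, not taken from the text.

```python
def decayed_gamma(episode, gamma_start=0.99, decay_rate=0.999, gamma_min=0.5):
    """Return gamma for a given episode (assumed exponential decay with a floor)."""
    return max(gamma_min, gamma_start * decay_rate**episode)


def q_update(q_sa, reward, max_q_next, alpha, gamma):
    """Standard Q-learning update for a single state-action value."""
    return q_sa + alpha * (reward + gamma * max_q_next - q_sa)


# Gamma starts high (long-term focus) and drifts down toward the floor.
for episode in (0, 100, 1000, 5000):
    print(episode, round(decayed_gamma(episode), 4))

# The same transition updated with a high and a low gamma: the lower the
# gamma, the less the estimated future value (max_q_next) affects the result.
early = q_update(q_sa=0.0, reward=1.0, max_q_next=10.0, alpha=0.5, gamma=1.0)
late = q_update(q_sa=0.0, reward=1.0, max_q_next=10.0, alpha=0.5, gamma=0.0)
```

The floor `gamma_min` is a design choice in this sketch: it keeps the agent from ignoring future values completely late in training.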