- Python Reinforcement Learning
- Sudharsan Ravichandiran Sean Saito Rajalingappaa Shanmugamani Yang Wenzhuo
- 2021-06-24 15:17:31
Discount factor
We have seen that an agent's goal is to maximize the return. For an episodic task, we can define the return as $R_t = r_{t+1} + r_{t+2} + \dots + r_T$, where $T$ is the final time step of the episode, and we try to maximize $R_t$.
A continuous task has no final state, so its return is $R_t = r_{t+1} + r_{t+2} + \dots$, a sum with infinitely many terms that can grow without bound. But how can we maximize the return if it never stops?
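That's why we introduce the notion of a discount factor. We can redefine our return with a discount factor $\gamma$, as follows:

$$R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$$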
The discount factor decides how much importance we give to future rewards relative to immediate rewards. Its value lies between 0 and 1. A discount factor of 0 means that only the immediate reward counts, while a discount factor of 1 means that future rewards count just as much as immediate ones. For values in between, a reward is scaled down by the discount factor once for every additional step it lies in the future.
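As a minimal sketch of this weighting, the following Python function computes a discounted return in which the reward that arrives k steps after the first one is multiplied by the discount factor raised to the power k. The function name `discounted_return` and the sample reward list are illustrative assumptions, not code from the book.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k] for k = 0, 1, 2, ...

    rewards[0] plays the role of r_{t+1}, rewards[1] of r_{t+2}, and so on.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Work backwards through the rewards: G = r + gamma * G
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequence: a reward of 1 at each of four steps
rewards = [1.0, 1.0, 1.0, 1.0]

print(discounted_return(rewards, 0.0))  # 1.0   (only the first reward counts)
print(discounted_return(rewards, 0.5))  # 1.875 (1 + 0.5 + 0.25 + 0.125)
print(discounted_return(rewards, 1.0))  # 4.0   (all rewards count equally)
```

The backward loop avoids computing powers of the discount factor explicitly; summing `gamma**k * rewards[k]` directly would give the same result.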
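With a discount factor of 0, the agent considers only the immediate reward and never learns to plan ahead. With a discount factor of 1, it keeps weighing every future reward at full value, so in a continuous task the return may grow to infinity. The discount factor is therefore usually chosen strictly between these two extremes. The range 0.2 to 0.8 is suggested here as a rule of thumb; the best value depends on the task.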
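To illustrate why a discount factor below 1 keeps the return finite, here is a small sketch that sums a constant reward of 1 over more and more steps. For a discount factor below 1 the partial sums of this geometric series approach 1 / (1 - discount factor), while for a discount factor of 1 they simply grow with the number of steps. The helper name `partial_return` and the constant reward are assumptions made for this example.

```python
def partial_return(gamma, n_steps, reward=1.0):
    """Discounted sum of a constant reward over the first n_steps steps."""
    return sum(reward * gamma ** k for k in range(n_steps))


for gamma in (0.5, 0.9, 1.0):
    sums = [partial_return(gamma, n) for n in (10, 100, 1000)]
    # Closed-form limit of the geometric series; unbounded when gamma == 1
    limit = 1.0 / (1.0 - gamma) if gamma < 1.0 else float("inf")
    print(gamma, [round(s, 3) for s in sums], "limit:", limit)
```

With a discount factor of 0.9 the partial sums level off near 10 no matter how many steps are added, whereas with a discount factor of 1 the sum over 1000 steps is 1000 and keeps growing.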
How much weight we give to immediate versus future rewards depends on the use case. In some cases, future rewards are more desirable than immediate rewards, and in others the reverse holds. In a chess game, the goal is to checkmate the opponent's king. If we give too much importance to immediate rewards, such as those earned when our pawn captures an opponent's piece, the agent will learn to pursue this sub-goal instead of the actual goal. So, in this case, we give importance to future rewards, whereas in other cases we prefer immediate rewards over future rewards. (Say, would you prefer chocolates if I gave them to you today or 13 months later?)
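The chess trade-off can be sketched numerically. The two reward sequences below are made-up numbers, not values from the book: one plan earns a small reward right away for a capture, the other earns nothing for four moves and then a large reward for checkmate. The names `capture_now`, `play_for_mate`, and `preferred_plan` are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k] over the whole sequence."""
    return sum(r * gamma ** k for k, r in enumerate(rewards))


# Hypothetical rewards per move for two plans
capture_now = [1.0, 0.0, 0.0, 0.0, 0.0]      # small reward immediately
play_for_mate = [0.0, 0.0, 0.0, 0.0, 10.0]   # large reward on the fifth move


def preferred_plan(gamma):
    """Name of the plan with the higher discounted return."""
    immediate = discounted_return(capture_now, gamma)
    delayed = discounted_return(play_for_mate, gamma)
    return "capture_now" if immediate >= delayed else "play_for_mate"


for gamma in (0.1, 0.5, 0.9):
    print(gamma, preferred_plan(gamma))
```

With these numbers, a low discount factor favors the immediate capture, and a discount factor of 0.9 favors playing for checkmate, because the delayed reward of 10 is still worth about 6.56 after four steps of discounting.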