
Gamma – current versus future rewards

Let's discuss the concept of current rewards versus future rewards. Your agent's discount rate, gamma, takes a value between zero and one, and its function is to discount future rewards relative to immediate rewards.

Your agent decides what action to take based not only on the reward it expects to receive for taking that action, but also on the future rewards it might be able to collect from the state it will be in after taking that action.

One easy way to illustrate discounting rewards is with the following example of a mouse in a maze collecting cheese as rewards and avoiding cats and traps (that is, electric shocks):

The rewards that are closest to the cats, even though their point values are higher (three versus one), should be discounted if we want to maximize how long the mouse agent lives and how much cheese it can collect. These rewards come with a higher risk of the mouse being killed, so we lower their value accordingly. In other words, the cheese closest to the mouse (and farthest from the cats) should be given a higher priority when the mouse decides which actions to take.

When we discount a future reward, we make it less valuable than an immediate reward (similar to how we take into account the time value of money when making a loan and treat a dollar received today as more valuable than a dollar received a year from now).
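To make this concrete, here is a minimal sketch (not code from this book's project) of how a sequence of rewards can be summed into a discounted return, weighting the reward at step t by gamma raised to the power t. The function name and reward values are illustrative:

```python
def discounted_return(rewards, gamma):
    """Sum a sequence of rewards, weighting the reward at step t by gamma**t."""
    return sum((gamma ** t) * reward for t, reward in enumerate(rewards))

# Like the time value of money, a reward received later is worth less today:
rewards = [1.0, 1.0, 1.0]               # one unit of reward on each of three steps
print(discounted_return(rewards, 0.9))  # 1.0 + 0.9 + 0.81 = 2.71
```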

The value of gamma that we choose varies according to how highly we value future rewards:

  • If we choose a value of zero for gamma, the agent will not care about future rewards at all and will only take current rewards into account
  • Choosing a value of one for gamma will make the agent consider future rewards as highly as current rewards
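We can illustrate these two extremes with the discounted_return helper sketched earlier; the reward sequence here is made up purely for demonstration:

```python
rewards = [1.0, 3.0, 5.0]  # hypothetical rewards over the next three steps

print(discounted_return(rewards, 0.0))  # 1.0  -> only the immediate reward counts
print(discounted_return(rewards, 1.0))  # 9.0  -> future rewards count as much as current ones
print(discounted_return(rewards, 0.9))  # 1.0 + 2.7 + 4.05 = 7.75 -> a middle ground
```

In practice, a value somewhere between these extremes (such as 0.9) lets the agent plan ahead while still preferring rewards it can collect sooner.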