
Gamma – current versus future rewards

Let's discuss the concept of current rewards versus future rewards. Your agent's discount rate, gamma, takes a value between zero and one (inclusive), and it determines how much weight future rewards carry relative to immediate rewards.

Your agent is deciding what action to take based not only on the reward it expects to get for taking that action, but on the future rewards it might be able to get from the state it will be in after taking that action.
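As a minimal sketch of this idea, the value of an action can be estimated as its immediate reward plus gamma times the best value available from the resulting state. The function name `one_step_value` and all numbers below are hypothetical, chosen only for illustration; they are not taken from the text.

```python
def one_step_value(reward, next_state_values, gamma):
    """Immediate reward plus the discounted best value reachable from the next state.

    reward            -- reward received for taking the action (hypothetical number)
    next_state_values -- estimated values of the actions available in the next state
    gamma             -- discount rate, between 0 and 1
    """
    return reward + gamma * max(next_state_values)


# Hypothetical example: the action pays 1 now, and the best follow-up action
# in the next state is estimated to be worth 3.
immediate_only = one_step_value(1.0, [0.0, 3.0], gamma=0.0)
half_discount = one_step_value(1.0, [0.0, 3.0], gamma=0.5)

print(immediate_only)
print(half_discount)
```

With gamma set to zero, only the immediate reward counts; with a larger gamma, the estimated value of the next state contributes more to the decision.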

One easy way to illustrate discounting rewards is with the following example of a mouse in a maze collecting cheese as rewards and avoiding cats and traps (that is, electric shocks):

The cheese closest to the cats has a higher point value (three versus one), but it should be discounted if we want to maximize how long the mouse agent lives and how much cheese it can collect. These rewards come with a higher risk of the mouse being killed, so we lower their value accordingly. In other words, collecting the closest cheese should be given a higher priority when the mouse decides what actions to take.

When we discount a future reward, we make it less valuable than an immediate reward. This is similar to the time value of money: when making a loan, we treat a dollar received today as more valuable than a dollar received a year from now.

The value of gamma that we choose varies according to how highly we value future rewards:

  • If we choose a value of zero for gamma, the agent will not care about future rewards at all and will only take current rewards into account.
  • If we choose a value of one for gamma, the agent will value future rewards as highly as current rewards.
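The effect of these choices can be sketched with a small calculation. Assuming the usual definition of a discounted return, where a reward received k steps in the future is multiplied by gamma raised to the power k, the following hypothetical example compares gamma values of zero, one half, and one. The helper name `discounted_return` and the reward sequence are assumptions made for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum a sequence of rewards, weighting the reward at step k by gamma ** k."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


# Hypothetical reward sequence: 1 point now, nothing next step, 3 points after that.
rewards = [1, 0, 3]

for gamma in (0, 0.5, 1):
    print(gamma, discounted_return(rewards, gamma))
```

With gamma at zero, only the first reward contributes to the total. With gamma at one, every reward counts at full value. Intermediate values shrink later rewards progressively, so the three-point reward two steps away is worth less than its face value.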