
Rewards

In the RL literature, the reward earned at a time instant t is typically denoted as rt. Thus, the total reward earned in an episode is given by R = r1 + r2 + ... + rT, where T is the length of the episode (which can be finite or infinite).

The concept of discounting is used in RL: a parameter called the discount factor, typically represented by γ with 0 ≤ γ ≤ 1, is raised to a power and multiplies the rewards, so that a reward k steps in the future is weighted by γ^k. Setting γ = 0 makes the agent myopic, aiming only for immediate rewards, while γ = 1 makes the agent far-sighted to the point that it procrastinates the accomplishment of the final goal. Thus, a value of γ in the 0-1 range (0 and 1 exclusive) is used to ensure that the agent is neither too myopic nor too far-sighted. γ ensures that the agent prioritizes its actions to maximize the total discounted rewards, Rt, from time instant t, which is given by the following:

R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... = Σ_{k=0}^∞ γ^k r_{t+k+1}

Since 0 < γ < 1, rewards in the distant future are valued much less than rewards the agent can earn in the immediate future. This helps the agent avoid wasting time and prioritize its actions. In practice, a value of γ between 0.9 and 0.99 is typically used in most RL problems.
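To make the effect of discounting concrete, the following minimal Python sketch computes the discounted return from a sequence of per-step rewards; the function name discounted_return and the sample reward list are illustrative assumptions, not part of any particular RL library:

def discounted_return(rewards, gamma=0.99):
    """Discounted sum of a reward sequence:
    rewards[0] + gamma * rewards[1] + gamma**2 * rewards[2] + ..."""
    total = 0.0
    # Iterate backwards so each earlier step applies one more factor
    # of gamma: total = r_k + gamma * (r_{k+1} + gamma * (...)).
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Illustrative episode: no reward until a final reward of 10.
rewards = [0.0, 0.0, 0.0, 10.0]
print(discounted_return(rewards, gamma=0.0))   # 0.0  (myopic: only the immediate reward counts)
print(discounted_return(rewards, gamma=0.9))   # 7.29 (10 * 0.9**3)
print(discounted_return(rewards, gamma=1.0))   # 10.0 (far-sighted: no discounting)

Iterating backwards implements the recursion R_t = r_{t+1} + γ R_{t+1}, so each earlier time step picks up one more factor of γ.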
