
Defining the Bellman equation

The Bellman equation, named after the applied mathematician Richard E. Bellman, is an optimality condition associated with dynamic programming. It is widely used in RL to update an agent's value functions and, through them, its policy.

Let's define the following two quantities: 

The first quantity, P^a_{ss'}, is the probability of transitioning from state s to the new state s' under action a. The second quantity, R^a_{ss'}, is the expected reward the agent receives when it starts in state s, takes action a, and moves to the new state s'. Note that we have assumed the MDP property, that is, the transition to the state at time t+1 depends only on the state and action at time t. Stated in these terms, the Bellman equation is a recursive relationship, given by the following equations for the value function and the action-value function, respectively:
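In standard MDP notation, with \pi denoting the policy and \gamma the discount factor (symbols assumed here rather than taken from this excerpt), the two recursions read as follows:

V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma V^{\pi}(s') \right]

Q^{\pi}(s, a) = \sum_{s'} P^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma \sum_{a'} \pi(a' \mid s') Q^{\pi}(s', a') \right]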

Note that the Bellman equations express the value function V at a state as a function of the value function at the successor states; the same recursive structure holds for the action-value function Q.
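To make this recursion concrete, the following is a minimal sketch of iterative policy evaluation, which repeatedly applies the Bellman equation for V as an update rule. The small MDP (three states, two actions, the transition triples, and the uniform random policy) is hypothetical and chosen purely for illustration:

import numpy as np

n_states, n_actions = 3, 2
gamma = 0.9  # discount factor

# P[s][a] is a list of (transition probability, next state, reward) triples,
# i.e. P^a_{ss'} and R^a_{ss'} flattened into one structure (made-up values).
P = {
    0: {0: [(1.0, 1, 0.0)], 1: [(1.0, 2, 1.0)]},
    1: {0: [(0.5, 0, 0.0), (0.5, 2, 2.0)], 1: [(1.0, 2, 0.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},  # state 2 is absorbing
}

# A uniform random policy: pi[s, a] is the probability of action a in state s.
pi = np.full((n_states, n_actions), 1.0 / n_actions)

V = np.zeros(n_states)
for _ in range(1000):  # repeat Bellman backups until the values stop changing
    V_new = np.zeros(n_states)
    for s in range(n_states):
        for a in range(n_actions):
            for prob, s_next, reward in P[s][a]:
                # V(s) = sum_a pi(a|s) sum_s' P^a_{ss'} [R^a_{ss'} + gamma V(s')]
                V_new[s] += pi[s, a] * prob * (reward + gamma * V[s_next])
    delta = np.max(np.abs(V_new - V))
    V = V_new
    if delta < 1e-8:
        break

print(V)  # approximate state values under the random policy

Each sweep replaces V(s) with the right-hand side of the Bellman equation evaluated under the current estimate; repeating the sweep converges to the value function of the given policy.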
