
Defining the Bellman equation

The Bellman equation, named after the great computer scientist and applied mathematician Richard E. Bellman, is an optimality condition associated with dynamic programming. It is widely used in RL to update the policy of an agent.

Let's define the following two quantities: 

The first quantity, $P^{a}_{s,s'}$, is the probability of transitioning from state s to the new state s' when the agent takes action a. The second quantity, $R^{a}_{s,s'}$, is the expected reward the agent receives when it starts in state s, takes action a, and moves to the new state s'. Note that we have assumed the Markov property of the MDP, that is, the transition to the state at time t+1 depends only on the state and action at time t. Stated in these terms, the Bellman equation is a recursive relationship, given by the following equations for the value function and the action-value function, respectively:
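In standard notation, with π denoting the policy and γ the discount factor (neither symbol is defined elsewhere in this excerpt, so the following is a reconstruction using the quantities defined above rather than the book's exact typesetting):

$$
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P^{a}_{s,s'} \left[ R^{a}_{s,s'} + \gamma \, V^{\pi}(s') \right]
$$

$$
Q^{\pi}(s,a) = \sum_{s'} P^{a}_{s,s'} \left[ R^{a}_{s,s'} + \gamma \sum_{a'} \pi(a' \mid s') \, Q^{\pi}(s',a') \right]
$$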

Note that the Bellman equations express the value function V at a state as a function of the value function at other states; the same holds for the action-value function Q.
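As a concrete illustration, the sketch below (a hypothetical NumPy example, not taken from the book; the toy two-state MDP, its transition and reward arrays, and the uniform policy are all made up) repeatedly applies the Bellman equation for V until the values stabilize, then recovers Q from the converged values:

```python
import numpy as np

# A toy 2-state, 2-action MDP, invented purely for illustration.
n_states, n_actions = 2, 2
gamma = 0.9  # discount factor (assumed; not defined in the excerpt)

# P[a, s, s']: probability of moving from s to s' under action a
P = np.array([[[0.8, 0.2],
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.3, 0.7]]])

# R[a, s, s']: expected reward for the transition s -> s' under action a
R = np.array([[[1.0, 0.0],
               [0.0, 2.0]],
              [[0.5, 0.5],
               [1.0, 0.0]]])

# pi[s, a]: probability of choosing action a in state s (uniform policy)
pi = np.full((n_states, n_actions), 0.5)

# Iterate the Bellman equation for V until the values stop changing.
V = np.zeros(n_states)
for _ in range(1000):
    # target[a, s, s'] = R^a_{s,s'} + gamma * V(s')
    target = R + gamma * V[np.newaxis, np.newaxis, :]
    # V(s) = sum_a pi(a|s) * sum_s' P^a_{s,s'} * target[a, s, s']
    V_new = np.einsum('sa,ast,ast->s', pi, P, target)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

# Q(s, a) uses the same backup, without averaging over the policy's actions.
Q = np.einsum('ast,ast->sa', P, R + gamma * V[np.newaxis, np.newaxis, :])

print("V(s):", V)
print("Q(s, a):\n", Q)
```

The loop is exactly the recursion written out above: each sweep replaces V(s) with an expectation, over actions drawn from the policy and over next states drawn from the transition probabilities, of the immediate reward plus the discounted value of the successor state.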
