
The Bellman equation and optimality

The Bellman equation, named after the American mathematician Richard Bellman, helps us solve MDPs. It is omnipresent in RL. Solving an MDP means finding the optimal policy and the optimal value function. Each policy has its own value function, so there can be many different value functions. The optimal value function $V^*(s)$ is the one that yields the maximum value compared to all the other value functions:

$$V^*(s) = \max_{\pi} V^{\pi}(s)$$

Similarly, the optimal policy is the one which results in an optimal value function.

Since the optimal value function $V^*(s)$ has a higher value than all other value functions (that is, the maximum return), it equals the maximum of the optimal Q function over all actions. So, the optimal value function can be computed by taking the maximum of the Q function as follows:

$$V^*(s) = \max_{a} Q^*(s, a)$$ -- (3)
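As a small illustration of equation (3), the sketch below takes a hypothetical table of Q values and reads off the value of each state as the maximum over actions. The state names, action names, and numbers are made up for this example. Note that this yields the optimal value function only if the table already holds the optimal Q values.

```python
# Hypothetical Q table for a toy problem: Q[state][action] -> value.
# All names and numbers here are illustrative assumptions.
Q = {
    "s0": {"left": 1.0, "right": 2.5},
    "s1": {"left": 0.0, "right": -1.0},
}


def value_from_q(Q, state):
    """Return the maximum of Q(state, a) over all actions a."""
    return max(Q[state].values())


# Value of every state, obtained by maximizing over actions.
V = {state: value_from_q(Q, state) for state in Q}
print(V)
```

If the action that attains the maximum is also needed, `max(Q[state], key=Q[state].get)` returns it.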

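The Bellman equation for the value function can be represented as follows (we will see how this equation is derived in the next topic):

$$V^{\pi}(s) = \sum_{a} \pi(s, a) \sum_{s'} \mathcal{P}_{ss'}^{a} \left[ \mathcal{R}_{ss'}^{a} + \gamma V^{\pi}(s') \right]$$

Here $\pi(s, a)$ is the probability of taking action $a$ in state $s$ under the policy, $\mathcal{P}_{ss'}^{a}$ and $\mathcal{R}_{ss'}^{a}$ are the transition probability and reward for moving from $s$ to $s'$ under action $a$, and $\gamma$ is the discount factor.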

It expresses the value of a state recursively in terms of the values of its successor states, averaged over all possible actions and transitions.
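To make the recursive relation concrete, here is a minimal sketch of a single Bellman backup for one state under a fixed policy. It assumes a tiny two-state MDP invented for this example: `P[s][a]` is a list of `(probability, next_state, reward)` tuples, `pi[s][a]` is the probability of picking action `a` in state `s`, and `V` holds the current value estimates. None of these names or numbers come from the text.

```python
# Hypothetical two-state MDP: P[s][a] -> list of (prob, next_state, reward).
P = {
    "A": {"stay": [(1.0, "A", 0.0)],
          "go": [(0.8, "B", 1.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(1.0, "B", 2.0)],
          "go": [(1.0, "A", 0.0)]},
}

# Assumed policy: pick each action with equal probability.
pi = {s: {a: 0.5 for a in P[s]} for s in P}


def bellman_expectation_backup(s, V, P, pi, gamma):
    """Average reward plus discounted successor value over actions and transitions."""
    return sum(
        pi[s][a] * sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
        for a in P[s]
    )


V = {"A": 0.0, "B": 10.0}  # arbitrary current estimates
new_value_A = bellman_expectation_backup("A", V, P, pi, gamma=0.9)
print(new_value_A)
```

The outer sum averages over the actions the policy might take, and the inner sum averages over the successor states each action might lead to.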

Similarly, the Bellman equation for the Q function can be represented as follows:

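$$Q^{\pi}(s, a) = \sum_{s'} \mathcal{P}_{ss'}^{a} \left[ \mathcal{R}_{ss'}^{a} + \gamma \sum_{a'} \pi(s', a') Q^{\pi}(s', a') \right]$$ --- (4)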

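Substituting equation (4) in (3), and using the fact that under the optimal policy the expected Q value of the successor state equals $V^*(s')$, we get:

$$V^*(s) = \max_{a} \sum_{s'} \mathcal{P}_{ss'}^{a} \left[ \mathcal{R}_{ss'}^{a} + \gamma V^*(s') \right]$$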

The preceding equation is called the Bellman optimality equation. In the upcoming sections, we will see how to find optimal policies by solving this equation.
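As a rough preview of what using the Bellman optimality equation looks like in code, the sketch below applies its right-hand side once to each state of a small made-up MDP. The layout `P[s][a]` as a list of `(probability, next_state, reward)` tuples, the state and action names, and all numbers are assumptions for illustration only.

```python
# Hypothetical two-state MDP: P[s][a] -> list of (prob, next_state, reward).
P = {
    "A": {"stay": [(1.0, "A", 0.0)],
          "go": [(0.8, "B", 1.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(1.0, "B", 2.0)],
          "go": [(1.0, "A", 0.0)]},
}


def bellman_optimality_backup(s, V, P, gamma):
    """Maximum over actions of expected reward plus discounted successor value."""
    return max(
        sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
        for a in P[s]
    )


V = {"A": 0.0, "B": 10.0}  # arbitrary current estimates
# Compute all new values from the old estimates before replacing them.
V_new = {s: bellman_optimality_backup(s, V, P, gamma=0.9) for s in P}
print(V_new)
```

The only difference from a backup under a fixed policy is that the average over actions is replaced by a maximum. A value function that is left unchanged by this backup in every state satisfies the Bellman optimality equation.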
