
The Bellman equation and optimality

The Bellman equation, named after the American mathematician Richard Bellman, helps us solve the MDP. It is omnipresent in RL. Solving the MDP actually means finding the optimal policy and the optimal value function. There can be many different value functions, one for each policy. The optimal value function V^*(s) is the one that yields the maximum value compared to all the other value functions:

V^*(s) = \max_{\pi} V^{\pi}(s)

Similarly, the optimal policy is the one that results in the optimal value function.

Since the optimal value function V^*(s) is the one that has a higher value than every other value function (that is, the maximum return), it is the maximum of the Q function over actions. So, the optimal value function can easily be computed by taking the maximum of the Q function as follows:

V^*(s) = \max_{a} Q^*(s, a)    --- (3)
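To make equation (3) concrete, here is a minimal NumPy sketch (the Q table and the toy sizes are hypothetical, not from the book): the optimal value of each state is just the row-wise maximum of the optimal Q table, and a greedy optimal policy simply picks the maximizing action.

```python
import numpy as np

# Hypothetical optimal Q table for a toy MDP with 3 states and 2 actions:
# Q_star[s, a] = return from taking action a in state s and acting optimally afterwards.
Q_star = np.array([[1.0, 2.5],
                   [0.3, 0.1],
                   [4.0, 3.9]])

V_star = Q_star.max(axis=1)            # equation (3): V*(s) = max_a Q*(s, a)
greedy_policy = Q_star.argmax(axis=1)  # the action that attains the maximum in each state

print(V_star)         # [2.5 0.3 4. ]
print(greedy_policy)  # [1 0 0]
```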

The Bellman equation for the value function can be represented as follows (we will see how this equation is derived in the next topic):

V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P_{ss'}^{a} \left[ R_{ss'}^{a} + \gamma V^{\pi}(s') \right]

It expresses the recursive relationship between the value of a state and the values of its successor states, averaged over all possibilities.

Similarly, the Bellman equation for the Q function, written in terms of the value of the successor state, can be represented as follows:

Q^{\pi}(s, a) = \sum_{s'} P_{ss'}^{a} \left[ R_{ss'}^{a} + \gamma V^{\pi}(s') \right]    --- (4)
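The two Bellman equations above translate almost line-for-line into array operations. The following sketch (a hypothetical two-state, two-action MDP of my own choosing, not an example from the book) first evaluates the right-hand side of equation (4) to get Q^pi from a current estimate of V^pi, and then averages over the policy's action probabilities to recover V^pi:

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions.
# P[s, a, s'] is the transition probability, R[s, a, s'] the reward.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.5, 0.5], [0.0, 0.0]]])
pi = np.array([[0.6, 0.4],   # pi(a | s): action probabilities in each state
               [0.3, 0.7]])
gamma = 0.9
V = np.zeros(2)              # current estimate of V^pi

# Equation (4): Q^pi(s, a) = sum_s' P[s, a, s'] * (R[s, a, s'] + gamma * V[s'])
Q = (P * (R + gamma * V)).sum(axis=2)

# Bellman equation for the value function: V^pi(s) = sum_a pi(a | s) * Q^pi(s, a)
V_new = (pi * Q).sum(axis=1)
```

Repeating this backup until V stops changing is exactly iterative policy evaluation for the fixed policy pi.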

Substituting equation (4) in (3), we get:
V^*(s) = \max_{a} \sum_{s'} P_{ss'}^{a} \left[ R_{ss'}^{a} + \gamma V^*(s') \right]

The preceding equation is called the Bellman optimality equation. In the upcoming sections, we will see how to find optimal policies by solving this equation.
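As a preview of those sections, here is a minimal value iteration sketch that solves the Bellman optimality equation for a small MDP by repeatedly applying it as an update rule (the function name and the toy MDP are hypothetical; this is a standard technique, not the book's exact implementation):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, theta=1e-8):
    """Iterate V(s) <- max_a sum_s' P[s, a, s'] * (R[s, a, s'] + gamma * V(s'))
    until the values stop changing, then read off the greedy policy."""
    V = np.zeros(P.shape[0])
    while True:
        Q = (P * (R + gamma * V)).sum(axis=2)   # Q[s, a] under the current value estimate
        V_new = Q.max(axis=1)                   # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < theta:   # converged to (approximately) V*
            return V_new, Q.argmax(axis=1)      # optimal values and a greedy optimal policy
        V = V_new

# Reusing the hypothetical two-state MDP from the previous sketch:
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.5, 0.5], [0.0, 0.0]]])
V_star, optimal_policy = value_iteration(P, R)
```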
