State value function

A state value function, also called simply a value function, specifies how good it is for an agent to be in a particular state when it follows a policy π. It is often denoted by V(s), or V^π(s) when the dependence on the policy needs to be explicit, and it gives the value of a state under that policy.

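We can define a state value function as follows:

V^\pi(s) = \mathbb{E}_\pi\left[ R_t \mid s_t = s \right]

This specifies the expected return starting from state s according to policy π. We can substitute the value of R_t from (2) into the value function as follows:

V^\pi(s) = \mathbb{E}_\pi\left[ \sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \mid s_t = s \right]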
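
To make the definition concrete, here is a minimal Python sketch that computes the discounted return R_t for one reward sequence and then estimates V^π(s) as the average return over many sampled episodes (a Monte Carlo estimate). The environment is a hypothetical toy, not from the text: from state s the agent picks action "a" (reward 1, episode ends) or action "b" (reward 0, then reward 2, episode ends), and the policy π picks each action with probability 0.5. The names `sample_episode` and `estimate_value` are illustrative only.

```python
import random

def discounted_return(rewards, gamma):
    """R_t = sum_k gamma^k * r_{t+k+1} for a finite reward sequence."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Hypothetical toy environment (an assumption for illustration):
# from state s, action "a" yields reward 1 and ends the episode;
# action "b" yields reward 0, then reward 2, then ends the episode.
REWARDS_AFTER_ACTION = {"a": [1.0], "b": [0.0, 2.0]}

# Hypothetical stochastic policy pi(action | s).
POLICY = {"a": 0.5, "b": 0.5}

def sample_episode(policy, rng):
    """Sample one action from the policy and return the rewards that follow."""
    actions = list(policy)
    weights = [policy[a] for a in actions]
    action = rng.choices(actions, weights=weights, k=1)[0]
    return REWARDS_AFTER_ACTION[action]

def estimate_value(policy, gamma, n_episodes, seed=0):
    """Monte Carlo estimate of V^pi(s): mean discounted return over episodes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        total += discounted_return(sample_episode(policy, rng), gamma)
    return total / n_episodes

if __name__ == "__main__":
    gamma = 0.9
    print(discounted_return([1.0, 0.0, 2.0], gamma))  # 1 + 0.9*0 + 0.81*2
    print(estimate_value(POLICY, gamma, n_episodes=20_000))
```

With γ = 0.9 the exact value in this toy is 0.5 × 1 + 0.5 × (0 + 0.9 × 2) = 1.4, so the estimate should land close to it; more episodes give a tighter estimate.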

Note that the state value function depends on the policy: the same state can have a different value under a different policy.
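
Because the expectation is taken over the actions the policy chooses, changing π changes V^π(s). The sketch below assumes a hypothetical state s with two actions whose outcomes are deterministic (action "a" gives reward 1 and ends; action "b" gives reward 0, then reward 2, then ends), so V^π(s) can be computed exactly as the sum over actions of π(a|s) times the discounted return after a. The three policies and the helper `exact_value` are illustrative assumptions, not from the text.

```python
def discounted_return(rewards, gamma):
    """R_t = sum_k gamma^k * r_{t+k+1} for a finite reward sequence."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Hypothetical deterministic outcomes: the reward sequence that follows
# each action taken in state s.
REWARDS_AFTER_ACTION = {"a": [1.0], "b": [0.0, 2.0]}

def exact_value(policy, gamma):
    """V^pi(s) = sum over actions of pi(a|s) * return obtained after a."""
    return sum(prob * discounted_return(REWARDS_AFTER_ACTION[action], gamma)
               for action, prob in policy.items())

# Three hypothetical policies for the same state s.
POLICIES = {
    "always a": {"a": 1.0, "b": 0.0},
    "always b": {"a": 0.0, "b": 1.0},
    "uniform": {"a": 0.5, "b": 0.5},
}

if __name__ == "__main__":
    for name, policy in POLICIES.items():
        print(f"{name:>8}: V(s) = {exact_value(policy, gamma=0.9):.2f}")
```

With γ = 0.9 this prints V(s) = 1.00, 1.80 and 1.40 for the three policies: one state, three different values.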

We can view a value function as a table. Say we have two states and the agent follows policy π in both. From the value of each state, we can tell how good it is for the agent to be in that state when following π. The greater the value, the better the state:

 

Based on the preceding table, we can tell that it is better to be in state 2, as it has the higher value. We will see how to estimate these values intuitively in the upcoming sections.
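
A value function over a small, discrete set of states can be stored as a plain lookup table. This minimal Python sketch keeps one in a dict and picks the state with the greatest value; the two numbers are hypothetical placeholders, not values taken from the text, and `best_state` is an illustrative helper name.

```python
# Hypothetical state values under some fixed policy pi (placeholders only).
value_table = {"state 1": 1.0, "state 2": 2.5}

def best_state(values):
    """Return the state with the greatest value, i.e. the better state to be in."""
    return max(values, key=values.get)

if __name__ == "__main__":
    print(f"{'State':<8} | Value")
    for state, value in value_table.items():
        print(f"{state:<8} | {value}")
    print("Best state:", best_state(value_table))
```

A dict works here because the states are few and discrete; the same lookup idea is what "viewing the value function as a table" means.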
