書名： Python Reinforcement Learning
作者名： Sudharsan Ravichandiran Sean Saito Rajalingappaa Shanmugamani Yang Wenzhuo
本章字數： 214字
更新時間： 2021-06-24 15:17:32

State value function

A state value function is also called simply a value function. It specifies how good it is for an agent to be in a particular state with a policy π. A value function is often denoted by V(s). It denotes the value of a state following a policy.

We can define a state value function as follows:

This specifies the expected return starting from state s according to policy π. We can substitute the value of R_t in the value function from (2) as follows:

Note that the state value function depends on the policy and it varies depending on the policy we choose.

We can view value functions in a table. Let us say we have two states and both of these states follow the policy π. Based on the value of these two states, we can tell how good it is for our agent to be in that state following a policy. The greater the value, the better the state is:

Based on the preceding table, we can tell that it is good to be in state 2, as it has high value. We will see how to estimate these values intuitively in the upcoming sections.

官术网_书友最值得收藏!

Python Reinforcement Learning

State value function