
State-action value function (Q function)

A state-action value function, also called the Q function, specifies how good it is for an agent to perform a particular action in a state while following a policy π. The Q function is denoted by Q(s, a). It gives the value of taking action a in state s and following policy π afterward.

We can define the Q function as follows:

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[R_t \mid s_t = s, a_t = a\right]$$

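This specifies the expected return starting from state s and taking action a, according to policy π. We can substitute the value of R_t from (2) into the Q function as follows:

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \mid s_t = s, a_t = a\right]$$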

  

The difference between the value function and the Q function is that the value function specifies the goodness of a state, while a Q function specifies the goodness of an action in a state.
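To make the idea of an expected return for a state-action pair concrete, here is a minimal sketch that averages sampled returns. It assumes the return is the discounted sum of rewards with a discount factor gamma. The function names and the reward sequences are hypothetical and chosen only for illustration. Each reward sequence stands for one episode that started in state s with action a and then followed the policy.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of a reward sequence: r1 + gamma*r2 + gamma^2*r3 + ..."""
    total = 0.0
    # Work backwards so each step is: reward + gamma * (return of the rest).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_q(reward_sequences, gamma):
    """Approximate Q(s, a) as the average return over sampled episodes.

    Assumption: every sequence was collected by starting in the same state,
    taking the same first action, and following the same policy afterward.
    """
    returns = [discounted_return(rewards, gamma) for rewards in reward_sequences]
    return sum(returns) / len(returns)


# Hypothetical reward sequences from two sampled episodes.
sampled_rewards = [[1, 1, 1], [0, 0, 4]]
q_estimate = estimate_q(sampled_rewards, gamma=0.5)
print(q_estimate)
```

Averaging the same returns over every action the policy might take in state s, instead of one fixed first action, would estimate the state value rather than the Q value.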

Like state value functions, Q functions can be viewed in a table. It is also called a Q table. Let us say we have two states and two actions; our Q table looks like the following:

 

Thus, the Q table shows the value of every possible state-action pair. By looking at this table, we can conclude that performing action 1 in state 1 and action 2 in state 2 is the better option, because those pairs have the highest values.
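As a rough sketch, a Q table for two states and two actions can be stored as a dictionary keyed by state-action pairs. The Q values below are hypothetical, picked only so that the outcome matches the conclusion described above. The helper name greedy_action is also an assumption made for this example.

```python
# Hypothetical Q values for illustration only: (state, action) -> value.
q_table = {
    ("state 1", "action 1"): 0.9,
    ("state 1", "action 2"): 0.1,
    ("state 2", "action 1"): 0.2,
    ("state 2", "action 2"): 0.8,
}


def greedy_action(q_table, state):
    """Return the action with the highest Q value in the given state."""
    # Collect the Q values of every action available in this state.
    candidates = {a: q for (s, a), q in q_table.items() if s == state}
    return max(candidates, key=candidates.get)


for state in ("state 1", "state 2"):
    print(state, "->", greedy_action(q_table, state))
```

Reading off the highest-valued action in each row this way is what it means to act greedily with respect to the Q table.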

Whenever we say value function V(s) or Q function Q(s, a), we actually mean the value table and the Q table, as shown previously.
