- Python Reinforcement Learning
- Sudharsan Ravichandiran Sean Saito Rajalingappaa Shanmugamani Yang Wenzhuo
- 2021-06-24 15:17:32
State-action value function (Q function)
The state-action value function, also called the Q function, specifies how good it is for an agent to perform a particular action in a state while following a policy π. It is denoted by Q(s, a): the value of taking action a in state s and following policy π thereafter.
We can define the Q function as follows:

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ R_t \mid s_t = s,\ a_t = a \right]$$
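This specifies the expected return starting from state s and taking action a, following policy π. Substituting the value of R_t from (2) into the Q function gives:

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a \right]$$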
The difference between the value function and the Q function is that the value function specifies the goodness of a state, while a Q function specifies the goodness of an action in a state.
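Reading the Q function as an expected return, one rough way to make it concrete is to average the discounted returns of a few sampled reward sequences that all begin with the same state and action. The sketch below assumes the return is the usual discounted sum of rewards with a discount factor `gamma`. The helper names `discounted_return` and `estimate_q` and the reward values are hypothetical, chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k] for one sampled reward sequence."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_q(sampled_reward_sequences, gamma):
    """Approximate Q(s, a) as the average discounted return.

    Every sequence is assumed to start by taking action a in state s
    and to follow the same policy afterwards.
    """
    returns = [discounted_return(r, gamma) for r in sampled_reward_sequences]
    return sum(returns) / len(returns)


# Hypothetical reward sequences observed after taking action a in state s.
episodes = [[1.0, 0.0, 2.0], [0.0, 0.0, 4.0]]
q_estimate = estimate_q(episodes, gamma=0.5)
print(q_estimate)
```

With more sampled sequences the average approaches the expectation in the definition. Two sequences are used here only to keep the arithmetic easy to check by hand.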
Like state value functions, Q functions can be represented as a table, called a Q table. Suppose we have two states and two actions; the Q table then looks like the following:

Thus, the Q table shows the value of every possible state-action pair. Looking at this table, we can conclude that performing action 1 in state 1 and action 2 in state 2 is the better option, because those pairs have the higher values.
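A Q table for two states and two actions can be sketched as a small dictionary keyed by (state, action) pairs. The numbers below are made up for illustration and are not the values from the original table. They are chosen only so that action 1 is better in state 1 and action 2 is better in state 2, matching the conclusion above. The helper `best_action` is likewise a hypothetical name.

```python
# Illustrative Q table: the values are assumptions, not the book's numbers.
q_table = {
    ("state 1", "action 1"): 0.8,
    ("state 1", "action 2"): 0.2,
    ("state 2", "action 1"): 0.1,
    ("state 2", "action 2"): 0.9,
}


def best_action(q_table, state):
    """Return the action with the highest Q value in the given state."""
    candidates = {a: v for (s, a), v in q_table.items() if s == state}
    return max(candidates, key=candidates.get)


for state in ("state 1", "state 2"):
    print(state, "->", best_action(q_table, state))
```

Picking the highest-valued action in each row is how the table is read in the paragraph above: the row for a state is scanned and the action with the larger entry is preferred.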
Whenever we refer to the value function V(s) or the Q function Q(s, a), we mean the value table and the Q table, as shown previously.