
Learning SARSA 

SARSA is another on-policy algorithm, and it was particularly popular in the 1990s. It extends the TD learning we saw previously from state values to state-action values. SARSA maintains an estimate of the state-action value function, Q(st, at), and as new experiences are encountered, this estimate is updated using the Bellman equation of dynamic programming. Extending the preceding TD algorithm to the state-action value function gives the SARSA update:

Q(st, at) ← Q(st, at) + α [rt+1 + γ Q(st+1, at+1) - Q(st, at)]

where α is the learning rate and γ is the discount factor.
Here, from a given state st, we take action at, receive a reward rt+1, transition to a new state st+1, and thereafter take an action at+1, and so on. This quintuple (st, at, rt+1, st+1, at+1) gives the algorithm its name: SARSA. SARSA is on-policy because the policy being improved is the same policy that generates the experience used to estimate Q. For exploration, you can use, say, an ε-greedy policy.
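The following is a minimal tabular sketch of this loop in Python. It assumes a Gymnasium-style environment whose reset() returns (observation, info) and whose step() returns (observation, reward, terminated, truncated, info); the hyperparameter values are illustrative, not prescriptive:

```python
import numpy as np

def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    # With probability epsilon explore uniformly; otherwise act greedily w.r.t. Q.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def sarsa(env, n_states, n_actions, episodes=500,
          alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular SARSA with epsilon-greedy exploration (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state, _ = env.reset()
        action = epsilon_greedy(Q, state, n_actions, epsilon, rng)
        done = False
        while not done:
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Pick a_{t+1} with the same policy being improved (on-policy).
            next_action = epsilon_greedy(Q, next_state, n_actions, epsilon, rng)
            # SARSA update: Q(s,a) += alpha * [r + gamma * Q(s',a') - Q(s,a)]
            target = reward + (0.0 if done else gamma * Q[next_state, next_action])
            Q[state, action] += alpha * (target - Q[state, action])
            state, action = next_state, next_action
    return Q
```

On a discrete environment such as FrozenLake, for example, n_states and n_actions would come from env.observation_space.n and env.action_space.n. Note that the action for the next step is chosen before the update, since the update target uses Q(st+1, at+1) under the current ε-greedy policy rather than the greedy maximum.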
