
Learning SARSA 

SARSA is another on-policy algorithm, particularly popular in the 1990s. It extends the TD-learning approach we saw previously: it maintains an estimate of the state-action value function and, as new experiences are encountered, updates that estimate using the Bellman equation of dynamic programming. Extending the preceding TD update to the state-action value function Q(st, at) gives the SARSA update:

Q(st, at) ← Q(st, at) + α [rt+1 + γ Q(st+1, at+1) − Q(st, at)]

Here, from a given state st, we take action at, receive a reward rt+1, transition to a new state st+1, and then take action at+1, and so on. The quintuple (st, at, rt+1, st+1, at+1) gives the algorithm its name: SARSA. It is on-policy because the policy being updated is the same policy used to generate the experience for estimating Q. For exploration, you can use, say, an ε-greedy policy.
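As a concrete illustration, here is a minimal tabular SARSA sketch in Python with ε-greedy exploration. The Gym-style env interface (reset/step), the state/action counts, and the hyperparameter values are assumptions for illustration, not taken from the text:

```python
import numpy as np

def epsilon_greedy(Q, state, n_actions, epsilon):
    """With probability epsilon pick a random action, else the greedy one."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def sarsa(env, n_states, n_actions, episodes=500,
          alpha=0.1, gamma=0.99, epsilon=0.1):
    # Tabular state-action value estimates Q(s, a), initialized to zero.
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state = env.reset()
        action = epsilon_greedy(Q, state, n_actions, epsilon)
        done = False
        while not done:
            next_state, reward, done, _ = env.step(action)
            # Select a_{t+1} with the same epsilon-greedy policy that is
            # being improved -- this is what makes SARSA on-policy.
            next_action = epsilon_greedy(Q, next_state, n_actions, epsilon)
            # TD target uses the action actually taken next, Q(s', a');
            # the bootstrap term is dropped at terminal states.
            td_target = reward + gamma * Q[next_state, next_action] * (not done)
            Q[state, action] += alpha * (td_target - Q[state, action])
            state, action = next_state, next_action
    return Q
```

Note that the TD target bootstraps from Q(st+1, at+1), the action the behavior policy actually chose, rather than from maxa Q(st+1, a) as Q-learning would; that single line is the entire difference between the two algorithms.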
