
Learning SARSA 

SARSA is an on-policy algorithm that became particularly popular in the 1990s. It extends the TD-learning approach we saw previously from the state value function to the state-action value function, $Q(s_t, a_t)$: SARSA maintains an estimate of this function and, as new experiences are encountered, updates it using the Bellman equation of dynamic programming. The update rule is:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$
Here, from a given state $s_t$, we take action $a_t$, receive a reward $r_{t+1}$, transition to a new state $s_{t+1}$, and then take an action $a_{t+1}$, after which the process repeats. The quintuple $(s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1})$ gives the algorithm its name: SARSA. It is on-policy because the policy being updated is the same policy used to generate the experience from which $Q$ is estimated. For exploration, you can use, say, an ε-greedy policy, as in the sketch below.
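To make the loop concrete, the following is a minimal tabular SARSA sketch in Python. It assumes a classic Gym-style environment interface (reset() returning the initial state, step() returning (state, reward, done, info)); the hyperparameter values and function names here are illustrative, not from the text:

    import numpy as np

    def epsilon_greedy(Q, state, n_actions, epsilon):
        """With probability epsilon pick a random action, else the greedy one."""
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[state]))

    def sarsa(env, n_states, n_actions, episodes=500,
              alpha=0.1, gamma=0.99, epsilon=0.1):
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            state = env.reset()
            action = epsilon_greedy(Q, state, n_actions, epsilon)
            done = False
            while not done:
                next_state, reward, done, _ = env.step(action)
                # Choose a' with the same (behavior) policy -- this is
                # what makes SARSA on-policy.
                next_action = epsilon_greedy(Q, next_state, n_actions, epsilon)
                # SARSA update: Q(s,a) += alpha * [r + gamma*Q(s',a') - Q(s,a)];
                # the bootstrap term is dropped at terminal states.
                td_target = reward + gamma * Q[next_state, next_action] * (not done)
                Q[state, action] += alpha * (td_target - Q[state, action])
                state, action = next_state, next_action
        return Q

Note that the next action $a_{t+1}$ used in the update is drawn from the same ε-greedy policy that acts in the environment; choosing the greedy action there instead would turn this into the off-policy Q-learning update.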
