- TensorFlow Reinforcement Learning Quick Start Guide
- Kaushik Balakrishnan
Learning SARSA
SARSA is another on-policy algorithm that was very popular, particularly in the 1990s. It extends the TD learning we saw previously: SARSA maintains an estimate of the state-action value function, and as new experiences are encountered, this estimate is updated using the Bellman equation of dynamic programming. Extending the preceding TD algorithm to the state-action value function, Q(st, at), gives the SARSA update:

Q(st, at) ← Q(st, at) + α [rt+1 + γ Q(st+1, at+1) − Q(st, at)]

Here, from a given state st, we take action at, receive a reward rt+1, transition to a new state st+1, and then take an action at+1 from st+1, and so on. This quintuple (st, at, rt+1, st+1, at+1) gives the algorithm its name: SARSA. It is on-policy because the policy being updated is the same policy used to select the actions that appear in the update, at and at+1. For exploration, you can use, say, an ε-greedy policy.
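The update above can be sketched in a few lines of tabular code. The following is a minimal illustration, not the book's implementation: the environment is an assumed toy chain MDP (states 0 to 4, actions left/right, reward +1 on reaching the terminal state 4), and the `sarsa` function, hyperparameters, and ε-greedy helper are all hypothetical names chosen for this sketch.

```python
import numpy as np

# Toy deterministic chain MDP (an assumption for illustration):
# states 0..4; action 0 moves left, action 1 moves right;
# reward +1 on reaching state 4, which is terminal.
N_STATES, N_ACTIONS = 5, 2
GOAL = 4

def step(s, a):
    """Environment dynamics: returns (next_state, reward, done)."""
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    done = (s_next == GOAL)
    return s_next, (1.0 if done else 0.0), done

def epsilon_greedy(Q, s, eps, rng):
    """Explore with probability eps, otherwise act greedily w.r.t. Q."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[s]))

def sarsa(episodes=200, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, eps, rng)   # a_t chosen by the same policy
        done = False
        while not done:
            s_next, r, done = step(s, a)
            a_next = epsilon_greedy(Q, s_next, eps, rng)  # a_{t+1}, on-policy
            # SARSA update: target uses the action actually taken next.
            target = r + (0.0 if done else gamma * Q[s_next, a_next])
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s_next, a_next
    return Q

Q = sarsa()
# The learned greedy policy should move right (action 1) in every
# non-terminal state of this chain.
print([int(np.argmax(Q[s])) for s in range(GOAL)])
```

Note that the bootstrap term uses Q(st+1, at+1) for the action the behavior policy actually selects, which is exactly what makes SARSA on-policy; Q-learning would instead use max over actions at st+1.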