
When to choose SARSA over Q-learning

As mentioned earlier, Q-learning and SARSA are very similar algorithms; in fact, Q-learning is sometimes called SARSA-max. When the agent's policy is simply the greedy one (that is, it always chooses the highest-valued action in the next state), Q-learning and SARSA will produce the same results.

In practice, we will not be using a simple greedy strategy; instead, we will choose a strategy such as epsilon-greedy, in which some actions are chosen at random. We will explore this in more depth when we discuss epsilon decay strategies.
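As a rough illustration, an epsilon-greedy selection step might look like the following sketch, assuming the Q-values are stored in a NumPy array indexed by state. The helper name choose_action and its parameters are illustrative assumptions, not code from our implementation.

```python
import numpy as np

def choose_action(Q, state, epsilon, n_actions, rng=np.random.default_rng()):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # random exploratory action
    return int(np.argmax(Q[state]))           # greedy action from the Q-table
```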

We can, therefore, think of SARSA as a more general version of Q-learning. The algorithms are very similar, and in practice, modifying a Q-learning implementation to SARSA involves nothing more than changing the update method for the Q-values. As we've seen, however, the difference in performance can be profound.
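To make that difference concrete, here is a minimal sketch of the two update rules side by side, assuming Q is a NumPy array indexed by [state, action] and that alpha and gamma are the usual learning rate and discount factor; the function names are illustrative.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Off-policy: bootstrap from the best action in the next state,
    # regardless of which action the agent will actually take.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy: bootstrap from the action the agent actually chose
    # (for example, via epsilon-greedy) in the next state.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
```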

In many problems, SARSA will perform better than Q-learning, especially when there is a good chance that the agent will take a random, suboptimal action in the next step, as we explored in the cliff-walking example. In this case, Q-learning's assumption that the agent is following the optimal policy may be far enough from the truth that SARSA will converge faster and with fewer errors.
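One way to see this for yourself is to train both agents on a cliff-walking task and compare their online returns. The following is a rough sketch under the assumption that Gymnasium's CliffWalking-v0 environment is available; it reuses the choose_action, sarsa_update, and q_learning_update helpers sketched above, and the hyperparameters are illustrative defaults, not values from this book.

```python
import numpy as np
import gymnasium as gym

def run(algo="sarsa", episodes=500, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Train one agent on CliffWalking-v0 and return its per-episode returns."""
    env = gym.make("CliffWalking-v0")
    rng = np.random.default_rng(seed)
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    returns = []

    for _ in range(episodes):
        s, _ = env.reset()
        a = choose_action(Q, s, epsilon, env.action_space.n, rng)
        total, done = 0.0, False
        while not done:
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            a_next = choose_action(Q, s_next, epsilon, env.action_space.n, rng)
            if algo == "sarsa":
                sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma)
            else:
                q_learning_update(Q, s, a, r, s_next, alpha, gamma)
            s, a, total = s_next, a_next, total + r
        returns.append(total)
    env.close()
    return returns

# SARSA's online return typically ends up higher (less negative) than
# Q-learning's here, because it learns the safer path away from the cliff edge.
print(np.mean(run("sarsa")[-100:]), np.mean(run("q_learning")[-100:]))
```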
