
When to choose SARSA over Q-learning

As mentioned earlier, Q-learning and SARSA are very similar algorithms, and in fact, Q-learning is sometimes called SARSA-max. When the agent's policy is simply the greedy one (that is, it chooses the highest-valued action from the next state no matter what), Q-learning and SARSA will produce the same results.
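Written side by side as the standard one-step updates (this is the textbook form of each rule, not code from this chapter), the only difference is the value used to bootstrap from the next state:

SARSA:       Q(s, a) ← Q(s, a) + α [ r + γ Q(s', a') − Q(s, a) ]
Q-learning:  Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ]

Under a purely greedy behavior policy, the action a' that SARSA actually takes in s' is exactly the argmax action used by Q-learning, so the two targets, and therefore the two updates, coincide.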

In practice, we will not be using a simple greedy strategy; instead, we will choose something such as epsilon-greedy, where some of the actions are chosen at random. We will explore this in more depth later, when we discuss epsilon decay strategies.
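As a rough sketch of what epsilon-greedy action selection might look like (the q_table array, its state indexing, and the function and parameter names here are illustrative assumptions rather than this book's implementation):

import numpy as np

def epsilon_greedy_action(q_table, state, epsilon):
    """Return a random action with probability epsilon, otherwise the greedy one."""
    n_actions = q_table.shape[1]              # assumes a 2D array: states x actions
    if np.random.random() < epsilon:
        return np.random.randint(n_actions)   # explore: uniform random action
    return int(np.argmax(q_table[state]))     # exploit: current best-valued action

Decaying epsilon over the course of training shifts the agent from exploration toward exploitation, which is the trade-off that epsilon decay strategies control.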

We can, therefore, think of SARSA as a more general version of Q-learning. The algorithms are very similar, and in practice, modifying a Q-learning implementation to SARSA involves nothing more than changing the update method for the Q-values. As we've seen, however, the difference in performance can be profound.
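A minimal sketch of that change, assuming the Q-values are stored in a NumPy array indexed by state and action (the function and parameter names are illustrative, not taken from this chapter's code):

import numpy as np

def q_learning_update(q_table, s, a, r, s_next, alpha, gamma):
    # Bootstrap from the best action in the next state, regardless of
    # which action the behavior policy will actually take there.
    target = r + gamma * np.max(q_table[s_next])
    q_table[s, a] += alpha * (target - q_table[s, a])

def sarsa_update(q_table, s, a, r, s_next, a_next, alpha, gamma):
    # Bootstrap from the action the agent actually selected in the next
    # state (for example, via epsilon-greedy), so a_next must be chosen
    # before the update is applied.
    target = r + gamma * q_table[s_next, a_next]
    q_table[s, a] += alpha * (target - q_table[s, a])

The rest of the training loop can stay the same, apart from selecting a_next before the update so that it can be passed to sarsa_update and then carried forward as the next action taken.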

In many problems, SARSA will perform better than Q-learning, especially when there is a good chance that the agent will take a random, suboptimal action in the next step, as we explored in the cliff-walking example. In such cases, Q-learning's assumption that the agent will follow the optimal policy from the next state onward may be far enough from what actually happens that SARSA will converge faster and with fewer errors.
