
SARSA versus Q-learning – on-policy or off?

Similar to Q-learning, SARSA is a model-free RL method: it learns action values directly from sampled experience rather than from an explicit model of the environment, and it never explicitly learns the agent's policy function; the policy is derived from the Q-values instead.
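For example, the agent can act ε-greedily with respect to its Q-table, so the policy falls out of the learned values. A minimal sketch follows; the Q-table contents and ε value are illustrative assumptions, not the book's code:

```python
import random

# Hypothetical Q-table mapping (state, action) pairs to learned values.
Q = {(0, "left"): 0.1, (0, "right"): 0.7}
ACTIONS = ["left", "right"]

def epsilon_greedy(state, epsilon=0.1):
    """Derive behavior from the Q-table: explore with probability
    epsilon, otherwise pick the highest-valued action."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

print(epsilon_greedy(0))  # usually "right", occasionally "left"
```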

The primary difference between SARSA and Q-learning is that SARSA is an on-policy method while Q-learning is an off-policy method. In practice, the two algorithms diverge only in the step where the Q-table is updated. Let's discuss what that means with some examples:
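Both algorithms apply a temporal-difference update of the form Q(s, a) ← Q(s, a) + α[target − Q(s, a)], and they differ only in the target: SARSA bootstraps from the value of the action a' it will actually take next (on-policy), while Q-learning bootstraps from the best action available in the next state, regardless of what the agent actually does (off-policy). Here is a minimal sketch of the two update rules; the learning rate, discount factor, and dictionary-based Q-table are illustrative assumptions, not the book's code:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99  # assumed learning rate and discount factor
Q = defaultdict(float)    # Q[(state, action)] -> estimated value

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy update: the target uses a_next, the action the agent
    will actually take next under its current (e.g. epsilon-greedy)
    policy."""
    target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def q_learning_update(s, a, r, s_next, actions):
    """Off-policy update: the target uses the greedy (maximum-valued)
    action in s_next, even if the behavior policy then explores
    something else."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```

This small difference matters in practice: because SARSA's target tracks the exploring behavior policy, it tends to learn more conservative behavior near hazards (the classic cliff-walking example), whereas Q-learning learns values for the purely greedy policy.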

Monte Carlo tree search (MCTS) is a type of model-based RL. We won't be discussing it in detail here, but it's useful to explore further as a contrast to model-free RL algorithms. Briefly, in model-based RL, we attempt to explicitly model the environment's dynamics (its transition and reward functions) instead of relying purely on sampling and observation, so that we don't have to rely as much on trial and error in the learning process.
