Temporal Difference, SARSA, and Q-Learning

In the previous chapter, we looked at the basics of RL. In this chapter, we will cover temporal difference (TD) learning, SARSA, and Q-learning, which were very widely used in RL before deep RL became more common. Understanding these older-generation algorithms is essential if you want to master the field, and it lays the foundation for delving into deep RL. We will therefore spend this chapter working through examples that use these algorithms, and we will code some of them in Python. We will not use TensorFlow in this chapter, as none of the problems studied here involve deep neural networks. This chapter is our first deep dive into standard RL algorithms and our first hands-on coding experience with them, combining theory and practice to show how they can be used to train RL agents for a specific task. It also lays the groundwork for the more advanced topics covered in subsequent chapters.

Some of the topics that will be covered in this chapter are as follows:

  • Understanding TD learning
  • Learning SARSA
  • Understanding Q-learning
  • Cliff walking with SARSA and Q-learning
  • Grid world with SARSA