Temporal Difference, SARSA, and Q-Learning

In the previous chapter, we looked at the basics of RL. In this chapter, we will cover temporal difference (TD) learning, SARSA, and Q-learning, which were widely used RL algorithms before deep RL became common. Understanding these older-generation algorithms is essential if you want to master the field, and it lays the foundation for delving into deep RL. We will therefore spend this chapter working through examples that use these algorithms, and we will code some of them in Python. We will not use TensorFlow in this chapter, as the problems studied here do not involve deep neural networks. This chapter is our first deep dive into a standard RL algorithm and our first hands-on coding experience with RL, combining theory and practice to train agents for a specific task. It also lays the groundwork for the more advanced topics covered in subsequent chapters.

Some of the topics that will be covered in this chapter are as follows:

  • Understanding TD learning
  • Learning SARSA
  • Understanding Q-learning
  • Cliff walking with SARSA and Q-learning
  • Grid world with SARSA