
Temporal Difference, SARSA, and Q-Learning

In the previous chapter, we looked at the basics of RL. In this chapter, we will cover temporal difference (TD) learning, SARSA, and Q-learning, algorithms that were widely used in RL before deep RL became prevalent. Understanding these older-generation algorithms is essential if you want to master the field, and it also lays the foundation for delving into deep RL. We will therefore spend this chapter working through examples that use these algorithms, and we will also code several of them in Python. We will not be using TensorFlow in this chapter, as the problems under study do not involve any deep neural networks. However, this chapter lays the groundwork for the more advanced topics covered in subsequent chapters, and it is also our first hands-on experience of RL, covering both theory and practice: a deep dive into standard RL algorithms and how you can use them to train RL agents for a specific task.

Some of the topics that will be covered in this chapter are as follows:

  • Understanding TD learning
  • Learning SARSA
  • Understanding Q-learning
  • Cliff walking with SARSA and Q-learning
  • Grid world with SARSA
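
To give a flavor of the kind of code we will write later in the chapter, the following is a minimal sketch of a single tabular Q-learning update step in Python, using NumPy. The state and action counts, the hyperparameter values, and the function names (epsilon_greedy, q_learning_update) are illustrative placeholders for this preview, not the exact code developed later in the chapter.

    import numpy as np

    # Illustrative dimensions for a small grid-world-style task.
    n_states, n_actions = 48, 4
    alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

    Q = np.zeros((n_states, n_actions))    # tabular action-value estimates

    def epsilon_greedy(state):
        """Pick a random action with probability epsilon, else the greedy action."""
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[state]))

    def q_learning_update(state, action, reward, next_state):
        """One Q-learning step: move Q(s, a) toward the TD target r + gamma * max_a' Q(s', a')."""
        td_target = reward + gamma * np.max(Q[next_state])
        Q[state, action] += alpha * (td_target - Q[state, action])

In the examples that follow, an update of this kind is applied at every step of an episode, and the action-value estimates gradually improve as the agent interacts with the environment.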