- TensorFlow Reinforcement Learning Quick Start Guide
- Kaushik Balakrishnan
- 2021-06-24 15:29:09
Temporal Difference, SARSA, and Q-Learning
In the previous chapter, we looked at the basics of RL. In this chapter, we will cover temporal difference (TD) learning, SARSA, and Q-learning, which were widely used RL algorithms before deep RL became common. Understanding these older-generation algorithms is essential if you want to master the field, and it lays the foundation for deep RL and the more advanced topics covered in subsequent chapters. We will therefore work through examples that use these algorithms and code some of them in Python. We will not use TensorFlow in this chapter, as the problems studied here do not involve deep neural networks. This chapter is also our first hands-on effort at RL, combining theory and practice: it is our first deep dive into a standard RL algorithm and how you can use it to train RL agents for a specific task.
Some of the topics that will be covered in this chapter are as follows:
- Understanding TD learning
- Learning SARSA
- Understanding Q-learning
- Cliff walking with SARSA and Q-learning
- Grid world with SARSA