TensorFlow Reinforcement Learning Quick Start Guide
Kaushik Balakrishnan
Summary
In this chapter, we were introduced to the basic concepts of RL. We understood the relationship between an agent and its environment, and also learned about the MDP setting. We learned the concept of reward functions and the use of discounted rewards, as well as the idea of value and advantage functions. In addition, we saw the Bellman equation and how it is used in RL. We also learned the difference between an on-policy and an off-policy RL algorithm. Furthermore, we examined the distinction between model-free and model-based RL algorithms. All of this lays the groundwork for us to delve deeper into RL algorithms and how we can use them to train agents for a given task.
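To make the discounted-reward idea concrete, here is a minimal sketch (not from the book; the reward values and discount factor are assumed for illustration) that computes the discounted return G_t = r_t + gamma * G_{t+1} for one episode. This recursive form is exactly the structure the Bellman equation builds on.

```python
# A minimal sketch (not from the book): compute the discounted return
# G_t = r_t + gamma * G_{t+1} for one hypothetical episode of rewards.
gamma = 0.99                      # discount factor (assumed value)
rewards = [1.0, 0.0, 0.5, 2.0]    # hypothetical rewards from one episode

returns = []
G = 0.0
for r in reversed(rewards):       # work backwards through the episode
    G = r + gamma * G             # recursive form behind the Bellman equation
    returns.append(G)
returns.reverse()

print(returns)  # discounted return G_t from each time step
```

Working backwards through the episode means each return is computed in constant time from the one after it, which mirrors how value functions are defined recursively.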
In the next chapter, we will investigate our first two RL algorithms: Q-learning and SARSA. Note that in Chapter 2, Temporal Difference, SARSA, and Q-Learning, we will code the agents in plain Python, since these are tabular-learning methods. From Chapter 3, Deep Q-Network, onward, we will use TensorFlow to code deep RL agents, as they require neural networks.
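As a preview of what "tabular learning" means, here is a minimal sketch (assumed for illustration, not the book's code) in which the Q-function is stored as a plain (state, action) table and updated with a single Q-learning step; the environment sizes, learning rate, and transition below are hypothetical.

```python
import numpy as np

# A minimal sketch (not the book's code) of tabular Q-learning:
# Q is a plain (state, action) table, so no neural network is needed.
n_states, n_actions = 5, 2        # hypothetical small environment
Q = np.zeros((n_states, n_actions))

alpha, gamma = 0.1, 0.99          # learning rate and discount factor (assumed)

# One Q-learning update for a hypothetical transition (s, a, r, s'):
s, a, r, s_next = 0, 1, 1.0, 2
Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
print(Q[s, a])  # updated estimate of Q(s, a)
```

Once the state space is too large to enumerate in a table like this, the table is replaced by a neural network, which is where TensorFlow comes in from Chapter 3 onward.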