
Summary

In this chapter, we were introduced to the basic concepts of RL. We examined the relationship between an agent and its environment, and learned about the MDP setting. We covered reward functions and the use of discounted rewards, as well as the idea of value and advantage functions. We also saw the Bellman equation and how it is used in RL, learned the difference between on-policy and off-policy RL algorithms, and examined the distinction between model-free and model-based RL algorithms. All of this lays the groundwork for us to delve deeper into RL algorithms and how we can use them to train agents for a given task.
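As a brief refresher on the Bellman equation mentioned above, the state-value function under a policy π can be written in the following standard form (the exact notation may differ slightly from the one used earlier in the chapter):

$$ V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^{\pi}(s') \right] $$

Here, γ is the discount factor, P(s' | s, a) is the transition probability, and R(s, a, s') is the reward received on that transition.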

In the next chapter, we will investigate our first two RL algorithms: Q-learning and SARSA. Note that in Chapter 2, Temporal Difference, SARSA, and Q-Learning, we will be using Python-based agents, as these are tabular learning algorithms. From Chapter 3, Deep Q-Network, onward, we will be using TensorFlow to code deep RL agents, as we will require neural networks.
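To give a flavor of what tabular means here, the following is a minimal sketch (not the book's code) of a Q-learning update stored in a plain NumPy array; the state and action counts, learning rate, and discount factor are placeholder values for illustration:

```python
import numpy as np

# Hypothetical sizes and hyperparameters for illustration only;
# the actual environments are introduced in Chapter 2.
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.99  # learning rate and discount factor

Q = np.zeros((n_states, n_actions))  # tabular action-value estimates

def q_learning_update(s, a, r, s_next):
    """One off-policy Q-learning step: bootstrap from the greedy next action."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```

Because the values fit in a small table, no function approximation (and hence no TensorFlow) is needed until we move to deep RL agents in Chapter 3.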
