Understanding Q-learning
Q-learning is an off-policy algorithm that was first proposed by Christopher Watkins in 1989, and it remains one of the most widely used RL algorithms. Like SARSA, Q-learning maintains a state-action value function for each state-action pair, and recursively updates it using the Bellman equation of dynamic programming as new experiences are collected. It is an off-policy algorithm because the update uses the state-action value function evaluated at the action that maximizes the value at the next state, rather than the action the behavior policy actually takes. Q-learning is used for problems where the actions are discrete. For example, if the available actions are move north, move south, move east, and move west, and we have to decide the optimum action in a given state, then Q-learning is applicable in such settings.
In the classical Q-learning approach, the update is given as follows, where the max is performed over actions; that is, we choose the action a corresponding to the maximum value of Q at state s_{t+1}:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]

Here, α is the learning rate, a hyper-parameter that the user can specify, and γ is the discount factor.
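The following is a minimal sketch of how this tabular update could look in Python; the grid-world sizes, hyper-parameter values, and function names are illustrative assumptions rather than the book's implementation:

import numpy as np

# Illustrative sizes for a small grid world (assumed for this sketch)
num_states, num_actions = 16, 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount factor, exploration rate

Q = np.zeros((num_states, num_actions))  # tabular Q(s, a)

def epsilon_greedy(state):
    # Behavior policy: explore with probability epsilon, otherwise act greedily
    if np.random.rand() < epsilon:
        return np.random.randint(num_actions)
    return int(np.argmax(Q[state]))

def q_learning_update(state, action, reward, next_state):
    # The target uses the max over actions at the next state, regardless of the
    # action the behavior policy will actually take next; this is what makes
    # Q-learning off-policy (SARSA would instead use the action actually taken).
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])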
Before we code the algorithms in Python, let's find out what kind of problems will be considered.