- TensorFlow Reinforcement Learning Quick Start Guide
- Kaushik Balakrishnan
Learning SARSA
SARSA is another on-policy algorithm, one that was particularly popular in the 1990s. It extends the TD learning we saw previously from the state value function to the state-action value function, Q(st, at). SARSA maintains an estimate of Q and, as new experiences are encountered, updates it using the Bellman equation of dynamic programming:

Q(st, at) ← Q(st, at) + α [rt+1 + γ Q(st+1, at+1) − Q(st, at)]

Here, from a given state st, we take action at, receive a reward rt+1, transition to a new state st+1, and then take an action at+1, after which the process repeats. This quintuple (st, at, rt+1, st+1, at+1) gives the algorithm its name: SARSA. It is on-policy because the same policy being updated is also the one used to estimate Q. For exploration, you can use, say, ε-greedy.
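The update above can be sketched in a few lines of tabular code. The following is a minimal illustration, not the book's implementation: it uses a hypothetical five-state chain environment (states 0 to 4, actions 0 = left and 1 = right, reward +1 for reaching state 4) so that the SARSA quintuple and the ε-greedy policy are easy to see.

```python
import numpy as np

# Hypothetical toy environment: a 5-state chain; reaching state 4 ends
# the episode with reward +1. Not from the book, chosen for illustration.
n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))   # tabular state-action values

def step(s, a):
    """Deterministic transition: action 1 moves right, action 0 moves left."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next == n_states - 1
    return s_next, r, done

def epsilon_greedy(s):
    """ε-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    best = np.flatnonzero(Q[s] == Q[s].max())
    return int(rng.choice(best))

for episode in range(500):
    s = 0
    a = epsilon_greedy(s)
    for t in range(100):                       # cap episode length
        s_next, r, done = step(s, a)
        a_next = epsilon_greedy(s_next)        # action the policy will actually take
        # SARSA update: bootstrap on Q(st+1, at+1), the on-policy action,
        # rather than max_a Q(st+1, a) as Q-learning would.
        Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
        s, a = s_next, a_next
        if done:
            break

print(np.argmax(Q[:-1], axis=1))   # greedy action per non-terminal state
```

Note the single line that distinguishes SARSA from Q-learning: the bootstrap target uses the action at+1 actually selected by the ε-greedy policy, which is why the same policy that generates experience is the one being evaluated and improved.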