- TensorFlow Reinforcement Learning Quick Start Guide
- Kaushik Balakrishnan
- 2021-06-24 15:29:09
Relation between the value functions and state
The value function is an agent's estimate of how good a given state is. For instance, if a robot is near the edge of a cliff and may fall, that state is bad and should have a low value. On the other hand, if the robot/agent is near its final goal, that state is a good state to be in, as the rewards it will soon receive are high, and so that state will have a higher value.
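The value function, V, is updated after the agent leaves state s_t, receives a reward r_{t+1} from the environment, and arrives in state s_{t+1}. The simplest TD learning algorithm is called TD(0) and performs an update using the following equation, where α is the learning rate with 0 ≤ α ≤ 1 and γ is the discount factor:
$$V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]$$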
Note that in some reference papers or books, the preceding formula will have r_t instead of r_{t+1}. This is just a difference in convention and is not an error; r_{t+1} here denotes the reward received on leaving state s_t and transitioning to s_{t+1}.
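The TD(0) update can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the book: the function name `td0_update`, the list-based value table, the `terminal` flag, and the three-state example are all assumptions made for the sake of the example.

```python
def td0_update(V, s, r_next, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one TD(0) update to the value table V in place; return the TD error."""
    # Bootstrapped target: reward plus the discounted value of the next state.
    # A terminal next state contributes no future value.
    target = r_next + (0.0 if terminal else gamma * V[s_next])
    td_error = target - V[s]
    # Move V(s_t) a fraction alpha of the way toward the target.
    V[s] += alpha * td_error
    return td_error


# Hypothetical 3-state problem: every value starts at zero.
V = [0.0, 0.0, 0.0]
# The agent leaves state 0, receives reward 1.0 and lands in state 1.
delta = td0_update(V, s=0, r_next=1.0, s_next=1, alpha=0.5, gamma=0.9)
print(V, delta)
```

Only the entry for the state that was just left changes; all other states keep their previous estimates until they are visited themselves.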
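There is also another TD learning variant called TD(λ) that uses eligibility traces e(s) (written e_t(s) at time step t), which keep a record of visits to each state. More formally, we perform a TD(λ) update for every state s as follows:
$$V(s) \leftarrow V(s) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right] e_t(s)$$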
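The eligibility traces are given by the following equation:
$$e_t(s) = \begin{cases} \gamma \lambda \, e_{t-1}(s) + 1 & \text{if } s = s_t \\ \gamma \lambda \, e_{t-1}(s) & \text{otherwise} \end{cases}$$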
Here, e(s) = 0 for all states at t = 0. At each step the agent takes, the eligibility trace of every state decays by a factor of γλ, and the trace of the state visited in the current time step is incremented by 1. The parameter λ, with 0 ≤ λ ≤ 1, decides how much of the credit from a reward is assigned to distant states; with λ = 0, only the current state is updated and TD(λ) reduces to TD(0). Next, we will look at the theory behind our next two RL algorithms, SARSA and Q-learning, both of which are quite popular in the RL community.
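Before moving on, the TD(λ) update with eligibility traces described above can be sketched as follows. This is a rough illustration under stated assumptions rather than the book's implementation: the function name `td_lambda_update`, the list-based tables `V` and `e`, the `terminal` flag, and the two-step episode are hypothetical.

```python
def td_lambda_update(V, e, s, r_next, s_next,
                     alpha=0.1, gamma=0.9, lam=0.8, terminal=False):
    """One TD(lambda) step with accumulating traces; updates V and e in place."""
    # Decay every trace by gamma * lambda, then add 1 for the state just visited.
    for state in range(len(e)):
        e[state] *= gamma * lam
    e[s] += 1.0
    # TD error for the observed transition (no bootstrap from a terminal state).
    target = r_next + (0.0 if terminal else gamma * V[s_next])
    td_error = target - V[s]
    # Every state is updated in proportion to its eligibility trace.
    for state in range(len(V)):
        V[state] += alpha * td_error * e[state]
    return td_error


# Hypothetical 3-state chain 0 -> 1 -> 2, with reward 1.0 on reaching state 2.
V = [0.0, 0.0, 0.0]
e = [0.0, 0.0, 0.0]  # traces start at zero, as in the text
td_lambda_update(V, e, s=0, r_next=0.0, s_next=1, alpha=0.5)
td_lambda_update(V, e, s=1, r_next=1.0, s_next=2, alpha=0.5, terminal=True)
print(V, e)
```

In this sketch, state 0 also receives part of the credit for the reward collected after leaving state 1, scaled by its decayed trace γλ = 0.72. With TD(0), only state 1 would have been updated on that step.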