- Hands-On Q-Learning with Python
- Nazia Habib
States and actions
When first launched, your agent knows nothing about its environment and takes purely random actions.
As an example, suppose that a hypothetical self-driving car powered by a Q-learning algorithm notices that it's reached a red light, but it doesn't know that it's supposed to stop. It moves one block forward and receives a large penalty.
The car makes note of that penalty in the Q-table. The next time it encounters a red light, it looks at the Q-table when deciding what to do, and because the move-forward action in the state where it is stopped at a red light now has a lower reward value than any other action, it is less likely to decide to run the red light again.
Likewise, when it takes a correct action, such as stopping at a red light or safely moving closer to the destination, it gets a reward. Thus, it remembers that taking that action in that state led to a reward, and it becomes more likely to take that action again next time.
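To make this bookkeeping concrete, the following is a minimal sketch of a Q-table being penalized and rewarded in the red-light scenario. The state labels, action labels, and learning rate are illustrative assumptions for this example only, not the environment we implement later in the book:

    import numpy as np

    # Hypothetical states and actions for the red-light example.
    states = ["at_red_light", "at_green_light"]
    actions = ["move_forward", "stop_and_wait"]

    # One row per state, one column per action; all zeros at first,
    # because the agent initially knows nothing about its environment.
    q_table = np.zeros((len(states), len(actions)))

    def record(state, action, reward, alpha=0.1):
        """Nudge the stored value for (state, action) toward the observed reward."""
        s, a = states.index(state), actions.index(action)
        q_table[s, a] += alpha * (reward - q_table[s, a])

    # Running the red light earns a large penalty; stopping earns a small reward.
    record("at_red_light", "move_forward", reward=-10)
    record("at_red_light", "stop_and_wait", reward=+1)

    # The next time the agent is at a red light, it prefers the higher-valued action.
    best = actions[int(np.argmax(q_table[states.index("at_red_light")]))]
    print(best)  # stop_and_wait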
While a self-driving car in the real world will, of course, not be expected to teach itself what red lights mean, the driving problem is a popular learning simulation (and one that we'll be implementing in this book) because it's straightforward and easy to model as a state-action function (also called a finite state machine). The following is a sample finite state machine:

[Figure: a sample finite state machine]
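To give a rough sense of what such a state machine looks like in code, the sketch below maps hypothetical (state, action) pairs to the states they lead to; the state names and transitions are invented for this illustration and are not the diagram above:

    # Hypothetical transitions: each (state, action) pair maps to a next state.
    transitions = {
        ("stopped_at_red", "wait"): "stopped_at_red",
        ("stopped_at_red", "move_forward"): "in_intersection",  # legal or not, the move has an outcome
        ("stopped_at_green", "move_forward"): "next_block",
        ("next_block", "turn_right"): "side_street",
    }

    def step(state, action):
        """Return the next state, or stay in place if the action is undefined here."""
        return transitions.get((state, action), state)

    print(step("stopped_at_red", "move_forward"))  # in_intersection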
When we model a state-action function for any system, we decide which variables we want to keep track of, and this determines how many states the system can be in.
For example, the state of a vehicle might include which intersection the car is located at, whether the traffic light is red or green, and whether there are other cars around. Because we're keeping track of multiple variables, we might represent the state as a vector.
The possible actions for a self-driving vehicle agent might be to move forward one block, turn left, turn right, or stop and wait, and each of these actions is mapped to the appropriate values of the state variable.
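One way this might look in code is to represent the state as a small tuple of tracked variables and the actions as a fixed list. The field names, grid size, and resulting state count below are hypothetical, chosen only to show how the variables we track determine the size of the state space:

    from collections import namedtuple

    # A hypothetical state vector: which intersection we are at, the light colour,
    # and whether other cars are nearby.
    State = namedtuple("State", ["intersection", "light", "cars_nearby"])

    # The action set available to the agent in every state.
    ACTIONS = ["move_forward", "turn_left", "turn_right", "stop_and_wait"]

    state = State(intersection=(3, 7), light="red", cars_nearby=True)

    # With, say, a 10 x 10 grid of intersections, 2 light colours, and 2 traffic flags,
    # the system can be in 10 * 10 * 2 * 2 = 400 distinct states.
    print(state, len(ACTIONS))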
Recall that an agent's state-action function is called its policy. A policy can be either simple and straightforward or complex and difficult to enumerate, depending on the problem itself and the number of states and actions.
In the model-free version of Q-learning, it's important to note that we do not learn an agent's policy explicitly. We only update the output values that we see as a result of that policy, which we are mapping to the state-action inputs. This is why we refer to model-free Q-learning as a value-based algorithm as opposed to a policy-based algorithm.
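As a reference point, the value update that model-free Q-learning performs can be sketched as follows: the stored value for the observed state-action pair is moved toward the received reward plus the discounted value of the best action available from the next state, without ever representing the policy explicitly. The function name, table layout, and hyperparameter values here are illustrative:

    import numpy as np

    def q_learning_update(q_table, s, a, reward, s_next, alpha=0.1, gamma=0.9):
        """Model-free, value-based update for Q(s, a): move the stored value toward
        the observed reward plus the discounted best value reachable from s_next."""
        best_next = np.max(q_table[s_next])      # best value available from the next state
        td_target = reward + gamma * best_next   # what this experience says Q(s, a) should be
        q_table[s, a] += alpha * (td_target - q_table[s, a])
        return q_table

    # Example: a tiny 3-state, 2-action table updated after one transition.
    q = np.zeros((3, 2))
    q = q_learning_update(q, s=0, a=1, reward=-10, s_next=2)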