
States and actions

When first launched, your agent knows nothing about its environment and takes purely random actions.

As an example, suppose that a hypothetical self-driving car powered by a Q-learning algorithm notices that it's reached a red light, but it doesn't know that it's supposed to stop. It moves one block forward and receives a large penalty. 

The car makes note of that penalty in the Q-table. The next time it encounters a red light, it looks at the Q-table when deciding what to do, and because the move-forward action in the stopped-at-a-red-light state now has a lower value than any other action, the car is less likely to run the red light again.

Likewise, when it takes a correct action, such as stopping at a red light or safely moving closer to the destination, it gets a reward. Thus, it remembers that taking that action in that state led to a reward, and it becomes more likely to take that action again next time. 
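As a minimal sketch of how such a table might be stored and consulted (the state labels, action names, and reward values here are illustrative assumptions, not the book's simulation):

```python
from collections import defaultdict

# Hypothetical Q-table mapping (state, action) pairs to learned values.
# The state labels, action names, and numbers are illustrative assumptions.
ACTIONS = ["forward", "left", "right", "wait"]

# Every unseen (state, action) pair starts at 0.0: the agent knows nothing yet.
q_table = defaultdict(float)

def best_action(state):
    """Pick the action with the highest learned value in this state."""
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

# The car runs the red light and is penalized; the experience is recorded.
q_table[("at_red_light", "forward")] = -10.0  # large penalty for running the light
q_table[("at_red_light", "wait")] = 2.0       # reward for stopping safely

print(best_action("at_red_light"))  # -> "wait"
```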

While a self-driving car in the real world will, of course, not be expected to teach itself what red lights mean, the driving problem is a popular learning simulation (and one that we'll be implementing in this book) because it's straightforward and easy to model as a state-action function (also called a finite state machine). The following is a sample finite state machine:
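The book presents this as a diagram; as a rough sketch of the same idea in code, using hypothetical state names and transitions that are not taken from the book's figure:

```python
# Hypothetical finite state machine for the driving problem: each
# (state, action) pair maps to the next state. The state names and
# transitions are illustrative assumptions, not the book's figure.
transitions = {
    ("at_red_light", "wait"): "at_green_light",
    ("at_red_light", "forward"): "ran_red_light",
    ("at_green_light", "forward"): "next_block",
    ("next_block", "forward"): "at_red_light",
}

def step(state, action):
    """Return the next state; stay in place if the transition is undefined."""
    return transitions.get((state, action), state)

state = "at_red_light"
for action in ["wait", "forward", "forward"]:
    state = step(state, action)
print(state)  # -> "at_red_light"
```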

When we model a state-action function for any system, we decide which variables we want to keep track of, and this lets us determine how many states the system can be in.

For example, a state variable for a vehicle might include information about what intersection the car is located at, whether the traffic light is red or green, and whether there are other cars around. Because we're keeping track of multiple variables, we might represent this as a vector.

The possible actions for a self-driving vehicle agent might be to move forward one block, turn left, turn right, or stop and wait, and these actions are mapped to the appropriate values of the state variable.
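A minimal sketch of how such a state vector and action set might be represented (the specific fields, types, and values are illustrative assumptions, not the book's implementation):

```python
from typing import NamedTuple, Tuple

# Hypothetical state vector for the driving agent; the fields follow the
# variables described above but are illustrative assumptions.
class CarState(NamedTuple):
    intersection: Tuple[int, int]  # grid coordinates of the current intersection
    light_is_green: bool           # whether the traffic light is green
    cars_nearby: bool              # whether other cars are around

# The action set is the same in every state.
ACTIONS = ("forward", "left", "right", "wait")

state = CarState(intersection=(2, 5), light_is_green=False, cars_nearby=True)
print(state)         # CarState(intersection=(2, 5), light_is_green=False, cars_nearby=True)
print(len(ACTIONS))  # 4 possible actions to evaluate in this state
```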

Recall that an agent's state-action function is called its policy. A policy can be either simple and straightforward or complex and difficult to enumerate, depending on the problem itself and the number of states and actions.
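For a small problem, a policy can literally be enumerated as a lookup from state to action. Here is a toy, hand-written example (the state and action names are hypothetical, purely for illustration):

```python
# A toy, fully enumerated policy: one chosen action per state.
# The state and action names are hypothetical, purely for illustration.
policy = {
    "at_red_light": "wait",
    "at_green_light": "forward",
    "blocked_by_car": "wait",
    "open_road": "forward",
}

def act(state):
    """Follow the policy: look up the action prescribed for this state."""
    return policy[state]

print(act("at_red_light"))  # -> "wait"
```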

It's important to note that in the model-free version of Q-learning, we do not learn the agent's policy explicitly. We only update the values that we observe as a result of following that policy, mapping them to their state-action inputs. This is why we refer to model-free Q-learning as a value-based algorithm as opposed to a policy-based algorithm.
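Concretely, value-based Q-learning nudges the stored value of the (state, action) pair it just tried toward the observed reward plus the discounted value of the best action available in the next state. A minimal sketch of that standard update, with illustrative state names and hyperparameter values:

```python
from collections import defaultdict

q_table = defaultdict(float)    # unseen (state, action) pairs default to 0.0
ACTIONS = ("forward", "left", "right", "wait")
alpha, gamma = 0.1, 0.9         # learning rate and discount factor (illustrative values)

def update(state, action, reward, next_state):
    """Standard tabular Q-learning update: no model of the environment is
    needed, only the observed reward and the resulting next state."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += alpha * (reward + gamma * best_next - q_table[(state, action)])

# The car runs a red light, is penalized, and the value of that action drops.
update("at_red_light", "forward", reward=-10.0, next_state="ran_red_light")
print(q_table[("at_red_light", "forward")])  # -> -1.0
```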
