
States

Whatever we need to know about our environment is stored as part of our state, which can be represented as a vector of the variables that we care about:

  • The location (x and y coordinates)
  • The direction of travel
  • The color of the traffic light (red or green)
  • The other cars present (for example, one binary flag for each spot a car might be in)
  • The distance from the destination
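
Putting the preceding list into code, a minimal sketch of such a state vector might look like the following. The variable names, encodings, and values here are assumptions chosen for illustration, not taken from any particular environment:

```python
import numpy as np

# Illustrative state vector for the driving example above.
# All field names, encodings, and values are assumptions made for this sketch.
x, y = 3.0, 7.5               # location coordinates
direction = 90.0              # heading in degrees (0 = east, 90 = north)
light_is_green = 1            # 1 if the light is green, 0 if it is red
cars_present = [0, 1, 0, 0]   # one binary flag per spot another car might occupy
distance_to_destination = 12.4

state = np.array([x, y, direction, light_is_green,
                  *cars_present, distance_to_destination])
print(state)        # one entry per variable we care about
print(state.shape)  # (9,)
```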

The following screenshot is from the game Pac-Man:

Taking Pac-Man as another example, we can use a state vector to track the variables we care about: the locations of the dots left in the maze, where the Pac-Man character currently is and which direction it is moving in, the location and direction of each ghost, and whether the ghosts can currently be eaten.
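A rough sketch of such a state for Pac-Man might look like the following. The field names and shapes are assumptions made for illustration; a real implementation would depend on how the maze is encoded:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PacManState:
    # 2D grid of booleans: True where a dot is still uneaten
    dots_remaining: List[List[bool]]
    pacman_position: Tuple[int, int]
    pacman_direction: str                  # e.g. "up", "down", "left", "right"
    ghost_positions: List[Tuple[int, int]]
    ghost_directions: List[str]
    ghosts_edible: bool                    # True while a power pellet is active
```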

We can include in our state vector any variables that we think are important to our knowledge of the game. At any point in time, the state vector should capture everything we want to know about our environment.

Ideally, we should be able to look at our state vector and have all the information we need to optimally determine what action we need to take. A well-designed state space is key to an effective RL solution.

However, we can quickly see that the number of states in an environment depends on the variables that we choose to keep track of. In other words, it is somewhat arbitrary. Not all algorithm designers will represent the same environment using the same state space. One thing we notice as developers and researchers is that even a small change in how an environment's state space is represented can make a huge difference to the difficulty of a problem.
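To see how quickly the choice of variables affects the size of the state space, consider a rough count for the driving example. The grid size and value ranges below are assumptions chosen purely for the sketch:

```python
# Illustrative count of discrete states for the driving example.
# All ranges below are assumptions chosen for this sketch.
grid_positions = 10 * 10      # x and y each discretized into 10 cells
directions = 4                # north, south, east, west
light_states = 2              # red or green
car_flags = 2 ** 4            # one binary flag for each of 4 nearby spots
distances = 20                # distance to destination, bucketed

num_states = grid_positions * directions * light_states * car_flags * distances
print(num_states)  # 256000 states -- adding one more variable multiplies this again
```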

When we use a standardized packaged environment such as the ones we'll be working with in OpenAI Gym, the state space (also called an observation space) will be determined for us. We'll also have a predetermined action space and reward structure. 
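For example, with OpenAI Gym the spaces come predefined with the environment. The sketch below uses the classic CartPole-v1 environment; the exact printed output and the return value of reset() vary slightly between Gym versions:

```python
import gym

# Create a packaged environment; its observation and action spaces
# are already defined for us.
env = gym.make("CartPole-v1")

print(env.observation_space)  # a 4-dimensional Box: cart position/velocity, pole angle/velocity
print(env.action_space)       # Discrete(2): push the cart left or right

observation = env.reset()     # newer Gym versions return (observation, info) instead
```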

One good reason to use a standardized environment such as those offered by OpenAI Gym is that it lets us compare the performance of our RL algorithms with the work of others. Having a level playing field for the state space allows us to compare RL algorithms meaningfully in a way we otherwise could not.
