
States

Whatever we need to know about our environment is stored as part of our state, which can be represented as a vector of the variables that we care about:

  • The location (x and y coordinates)
  • The direction of travel
  • The color of the traffic light (red or green)
  • The other cars present (for example, one binary flag for each spot a car might be in)
  • The distance from the destination
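
Putting the preceding list into code, a minimal sketch of such a state vector might look like the following. The variable names, encodings, and values here are assumptions chosen for illustration, not taken from any particular environment:

```python
import numpy as np

# Illustrative state vector for the driving example above.
# All field names, encodings, and values are assumptions made for this sketch.
x, y = 3.0, 7.5               # location coordinates
direction = 90.0              # heading in degrees (0 = east, 90 = north)
light_is_green = 1            # 1 if the light is green, 0 if it is red
cars_present = [0, 1, 0, 0]   # one binary flag per spot another car might occupy
distance_to_destination = 12.4

state = np.array([x, y, direction, light_is_green,
                  *cars_present, distance_to_destination])
print(state)        # one entry per variable we care about
print(state.shape)  # (9,)
```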

The following screenshot is from the game Pac-Man:

Taking Pac-Man as another example, we can use a state vector to track the variables we care about: the locations of the dots left in the maze, where the Pac-Man character currently is and which direction it is moving in, the location and direction of each ghost, and whether the ghosts can currently be eaten.
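A rough sketch of such a state for Pac-Man might look like the following. The field names and shapes are assumptions made for illustration; a real implementation would depend on how the maze is encoded:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PacManState:
    # 2D grid of booleans: True where a dot is still uneaten
    dots_remaining: List[List[bool]]
    pacman_position: Tuple[int, int]
    pacman_direction: str                  # e.g. "up", "down", "left", "right"
    ghost_positions: List[Tuple[int, int]]
    ghost_directions: List[str]
    ghosts_edible: bool                    # True while a power pellet is active
```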

We can include in our state vector any variables that we think are important to our knowledge of the game. At any point in time, the state vector should capture everything we want to know about our environment.

Ideally, we should be able to look at our state vector and have all the information we need to optimally determine what action we need to take. A well-designed state space is key to an effective RL solution.

However, we can quickly see that the number of states in an environment depends on the variables that we choose to keep track of. In other words, it is somewhat arbitrary. Not all algorithm designers will represent the same environment using the same state space. One thing we notice as developers and researchers is that even a small change in how an environment's state space is represented can make a huge difference to the difficulty of a problem.
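To see how quickly the choice of variables affects the size of the state space, consider a rough count for the driving example. The grid size and value ranges below are assumptions chosen purely for the sketch:

```python
# Illustrative count of discrete states for the driving example.
# All ranges below are assumptions chosen for this sketch.
grid_positions = 10 * 10      # x and y each discretized into 10 cells
directions = 4                # north, south, east, west
light_states = 2              # red or green
car_flags = 2 ** 4            # one binary flag for each of 4 nearby spots
distances = 20                # distance to destination, bucketed

num_states = grid_positions * directions * light_states * car_flags * distances
print(num_states)  # 256000 states -- adding one more variable multiplies this again
```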

When we use a standardized packaged environment such as the ones we'll be working with in OpenAI Gym, the state space (also called an observation space) will be determined for us. We'll also have a predetermined action space and reward structure. 
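For example, with OpenAI Gym the spaces come predefined with the environment. The sketch below uses the classic CartPole-v1 environment; the exact printed output and the return value of reset() vary slightly between Gym versions:

```python
import gym

# Create a packaged environment; its observation and action spaces
# are already defined for us.
env = gym.make("CartPole-v1")

print(env.observation_space)  # a 4-dimensional Box: cart position/velocity, pole angle/velocity
print(env.action_space)       # Discrete(2): push the cart left or right

observation = env.reset()     # newer Gym versions return (observation, info) instead
```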

One good reason to use a standardized environment such as those offered by OpenAI Gym is that it lets us compare the performance of our RL algorithms with the work of others. Having a level playing field for the state space allows us to compare RL algorithms meaningfully in a way we otherwise could not.
