- Hands-On Q-Learning with Python
- Nazia Habib
States
Whatever we need to know about our environment is stored as part of our state, which can be represented as a vector of the variables that we care about:
- The location (x and y coordinates)
- The direction
- The color of light (red or green)
- The other cars present (for example, one binary flag for each spot a car might be in)
- The distance from the destination
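As a concrete illustration, the variables above could be packed into a simple state structure. This is only a sketch; the class name, field names, and value types below are assumptions made for illustration and do not come from any particular environment:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class CarState:
    """A hypothetical state vector for a driving environment."""
    position: Tuple[int, int]         # (x, y) coordinates
    direction: str                    # e.g. 'N', 'E', 'S', 'W'
    light_is_green: bool              # color of the traffic light
    occupied_spots: Tuple[bool, ...]  # one binary flag per spot a car might occupy
    distance_to_goal: float           # distance from the destination

state = CarState(position=(3, 7), direction='N', light_is_green=True,
                 occupied_spots=(False, True, False), distance_to_goal=12.5)
```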
[Screenshot: the game Pac-Man]
Taking Pac-Man as another example, we can use a state vector to represent the variables that we want to keep track of—such as the location of the dots left in the maze, where the Pac-Man character currently is and what direction it is moving in, the location and direction of each ghost, and whether the ghosts can be eaten or not.
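A Pac-Man state could be sketched the same way. Again, the fields and types here are illustrative assumptions rather than an official representation of the game:

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class PacManState:
    """A hypothetical state vector for a Pac-Man-like environment."""
    dots_remaining: FrozenSet[Tuple[int, int]]    # locations of the dots left in the maze
    pacman_position: Tuple[int, int]              # where Pac-Man currently is
    pacman_direction: str                         # which way Pac-Man is moving
    ghost_positions: Tuple[Tuple[int, int], ...]  # location of each ghost
    ghost_directions: Tuple[str, ...]             # direction of each ghost
    ghosts_edible: bool                           # whether the ghosts can be eaten
```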
We can represent any variables in our state vector that we think are important to our knowledge of the game. At any point in time, our state vector should represent for us the things that we want to know about our environment.
Ideally, we should be able to look at our state vector and have all the information we need to optimally determine what action we need to take. A well-designed state space is key to an effective RL solution.
However, we can quickly see that the number of states in an environment depends on the variables we choose to keep track of; in other words, it is somewhat arbitrary. Not all algorithm designers will represent the same environment using the same state space, and as developers and researchers we find that even a small change in how a state space is represented can make a huge difference to the difficulty of a problem.
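To see how quickly the choice of variables changes the size of the state space, here is a small back-of-the-envelope calculation using made-up cardinalities (a 10 x 10 grid, 4 directions, 2 light colors, and 3 binary car flags); none of these numbers come from a specific environment:

```python
# Each variable we track multiplies the number of distinct states.
grid_positions = 10 * 10   # (x, y) on a 10 x 10 grid
directions = 4             # N, E, S, W
light_colors = 2           # red or green
car_flags = 2 ** 3         # one binary flag for each of 3 spots

num_states = grid_positions * directions * light_colors * car_flags
print(num_states)          # 6400

# Adding just one more binary variable doubles the state space.
print(num_states * 2)      # 12800
```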
When we use a standardized packaged environment such as the ones we'll be working with in OpenAI Gym, the state space (also called an observation space) will be determined for us. We'll also have a predetermined action space and reward structure.
One good reason to use a standardized environment such as those offered by OpenAI Gym is that it allows you to compare the performance of your RL algorithms against the work of others. Having a level playing field for the state space lets us meaningfully compare RL algorithms to each other in a way we otherwise could not.
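As a minimal sketch of what that looks like in practice, assuming the classic OpenAI Gym API and the Taxi-v3 environment as an example (swap in whichever environment and Gym version you are actually using):

```python
import gym

# The packaged environment defines the observation (state) space,
# action space, and reward structure for us.
env = gym.make("Taxi-v3")

print(env.observation_space)   # Discrete(500): 500 possible states
print(env.action_space)        # Discrete(6): 6 possible actions

state = env.reset()            # older Gym versions return just the initial state
next_state, reward, done, info = env.step(env.action_space.sample())
env.close()
```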