
How it works...

In this recipe, we print the state array at every step. But what does each float in the array mean? More information about CartPole is available on Gym's GitHub wiki page: https://github.com/openai/gym/wiki/CartPole-v0. The four floats represent the following:

  • Cart position: This ranges from -2.4 to 2.4, and any position beyond this range will trigger episode termination.
  • Cart velocity.
  • Pole angle: Any value less than -0.209 radians (-12 degrees) or greater than 0.209 radians (12 degrees) will trigger episode termination.
  • Pole velocity at the tip.
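
To make the layout of the state array concrete, here is a minimal sketch that labels the four floats and checks them against the two termination thresholds listed above. The helper names (describe_state, is_out_of_bounds) and the sample values are hypothetical and chosen only for illustration; they are not part of Gym.

```python
# Labels follow the order of the four floats described in the list above.
STATE_LABELS = ("cart_position", "cart_velocity", "pole_angle", "pole_tip_velocity")

# Termination thresholds quoted in the text.
MAX_CART_POSITION = 2.4   # cart position limit on either side
MAX_POLE_ANGLE = 0.209    # radians, about 12 degrees


def describe_state(state):
    """Map a four-float CartPole state to a dict keyed by name (hypothetical helper)."""
    return dict(zip(STATE_LABELS, (float(x) for x in state)))


def is_out_of_bounds(state):
    """Return True if the cart position or pole angle exceeds its threshold."""
    named = describe_state(state)
    return (abs(named["cart_position"]) > MAX_CART_POSITION
            or abs(named["pole_angle"]) > MAX_POLE_ANGLE)


# Made-up state values, not output from a real episode.
example = [0.03, -0.2, 0.21, 0.4]
print(describe_state(example))
print(is_out_of_bounds(example))
```

Because the helpers accept any sequence of four numbers, the state returned by the environment at each step can be passed in directly.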

The action is either 0 or 1, which corresponds to pushing the cart to the left or to the right, respectively.

The reward in this environment is +1 for every timestep before the episode terminates. We can verify this by printing out the reward at every step. The total reward is therefore simply the number of timesteps.
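
The relationship between total reward and episode length can be checked with a short loop. The following is a sketch only: run_episode and random_policy are hypothetical helper names, and the code assumes the older Gym API, in which env.reset() returns the state and env.step(action) returns a (state, reward, done, info) tuple. Newer Gym and Gymnasium releases return different tuples and would need small changes.

```python
import random


def run_episode(env, policy):
    """Run one episode and return (total_reward, n_steps).

    Assumes the older Gym API: env.reset() returns the state, and
    env.step(action) returns (state, reward, done, info).
    """
    state = env.reset()
    total_reward, n_steps, done = 0.0, 0, False
    while not done:
        action = policy(state)                     # 0 = push left, 1 = push right
        state, reward, done, _ = env.step(action)
        total_reward += reward                     # +1 per timestep in CartPole
        n_steps += 1
    return total_reward, n_steps


def random_policy(state):
    """Ignore the state and pick 0 or 1 uniformly at random."""
    return random.randint(0, 1)
```

With Gym installed, calling run_episode(gym.make('CartPole-v0'), random_policy) should return a total reward equal to the number of steps taken, which is the property described above.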
