
How it works...

In this recipe, we print out the state array at every step. But what does each float in the array mean? We can find more information about CartPole on Gym's GitHub wiki page: https://github.com/openai/gym/wiki/CartPole-v0. It turns out that the four floats represent the following:

  • Cart position: The episode terminates once this value falls outside the range -2.4 to 2.4.
  • Cart velocity.
  • Pole angle: This is measured in radians. Any value less than -0.209 (-12 degrees) or greater than 0.209 (12 degrees) will trigger episode termination.
  • Pole velocity at the tip.
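
To make the layout of the state array concrete, here is a minimal sketch that labels the four floats and checks them against the two termination thresholds listed above. The helper names (label_state, is_terminal) and the constants are hypothetical and are not part of Gym. The sketch covers only the position and angle conditions, so it does not account for any other reason an episode might end, such as a step limit.

```python
import math

# Termination thresholds quoted in the list above (assumed, not read from Gym).
X_LIMIT = 2.4                     # cart position limit
ANGLE_LIMIT = math.radians(12)    # 12 degrees, roughly 0.209 radians

STATE_LABELS = (
    "cart position",
    "cart velocity",
    "pole angle",
    "pole velocity at tip",
)


def label_state(state):
    """Pair each of the four floats in a state array with its meaning."""
    return dict(zip(STATE_LABELS, (float(v) for v in state)))


def is_terminal(state):
    """Return True if the cart position or pole angle is out of bounds."""
    x, _, theta, _ = state
    return abs(x) > X_LIMIT or abs(theta) > ANGLE_LIMIT


# A made-up state whose pole angle (0.25 rad) is past the 12-degree limit.
example = [0.03, -0.2, 0.25, 0.1]
print(label_state(example))
print(is_terminal(example))
```

Both helpers accept a plain list or a NumPy array, so the state returned by the environment can be passed in directly.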

The action is either 0 or 1, which correspond to pushing the cart to the left and to the right, respectively.

The reward in this environment is +1 for every timestep before the episode terminates. We can verify this by printing out the reward at every step. The total reward for an episode is therefore simply the number of timesteps it lasted.
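
One way to check the reward claim is to count steps and sum rewards side by side over an episode. The following is a sketch rather than the recipe's own code. It assumes the classic Gym interface, where env.reset() returns the initial state and env.step(action) returns a (state, reward, done, info) tuple. The names run_episode and random_policy are hypothetical.

```python
import random


def run_episode(env, policy):
    """Run one episode and return (total_reward, n_steps).

    Assumes the classic Gym interface: env.reset() returns the initial
    state, and env.step(action) returns (state, reward, done, info).
    """
    state = env.reset()
    total_reward = 0.0
    n_steps = 0
    done = False
    while not done:
        action = policy(state)
        state, reward, done, _ = env.step(action)
        total_reward += reward
        n_steps += 1
    return total_reward, n_steps


def random_policy(state):
    """Ignore the state and pick 0 (push left) or 1 (push right) at random."""
    return random.randint(0, 1)
```

With Gym installed, calling run_episode(gym.make('CartPole-v0'), random_policy) should return two equal numbers if every step is rewarded with +1.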
