
How it works...

In this recipe, we print out the state array at every step. But what does each float in the array mean? We can find more information about CartPole on Gym's GitHub wiki page: https://github.com/openai/gym/wiki/CartPole-v0. It turns out that the four floats represent the following:

  • Cart position: This ranges from -2.4 to 2.4, and any position beyond this range will trigger episode termination.
  • Cart velocity.
  • Pole angle (in radians): Any value less than -0.209 (-12 degrees) or greater than 0.209 (12 degrees) will trigger episode termination.
  • Pole velocity at the tip.
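To make the four fields and the two termination limits concrete, here is a minimal sketch in plain Python that gives names to a raw state array and checks it against the thresholds listed above. The names `CartPoleState`, `decode`, and `is_terminal` are hypothetical helpers of our own, not part of Gym's API.

```python
import math
from typing import NamedTuple, Sequence

# Termination thresholds quoted in the list above.
X_THRESHOLD = 2.4                          # cart position limit
THETA_THRESHOLD = 12 * 2 * math.pi / 360   # 12 degrees in radians, about 0.209


class CartPoleState(NamedTuple):
    """Hypothetical named view of the four floats in a CartPole state array."""
    cart_position: float
    cart_velocity: float
    pole_angle: float          # radians
    pole_tip_velocity: float


def decode(observation: Sequence[float]) -> CartPoleState:
    """Unpack a raw 4-element state array (list or NumPy array) into named fields."""
    return CartPoleState(*(float(v) for v in observation))


def is_terminal(state: CartPoleState) -> bool:
    """Return True if the cart position or pole angle is outside the allowed range."""
    return (abs(state.cart_position) > X_THRESHOLD
            or abs(state.pole_angle) > THETA_THRESHOLD)


s = decode([0.03, -0.2, 0.25, 0.4])
print(s.pole_angle, is_terminal(s))  # prints: 0.25 True
```

Here the pole has tilted about 14 degrees (0.25 radians), which exceeds the 12-degree limit, so the state counts as terminal even though the cart is near the center.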

The action is either 0 or 1, corresponding to pushing the cart to the left or to the right, respectively.

The reward in this environment is +1 for every timestep until the episode terminates, which we can verify by printing out the reward at every step. The total reward is therefore simply the number of timesteps.
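The action and reward conventions can be checked without installing Gym by re-implementing the environment's dynamics from scratch. The following is a minimal sketch, not Gym's own code: the physical constants (gravity 9.8, cart mass 1.0, pole mass 0.1, half pole length 0.5, push force 10.0, timestep 0.02 s), the initial-state range of ±0.05, the 200-step cap, and the helper names `step` and `run_random_episode` are our assumptions, chosen to mirror Gym's classic CartPole implementation as we understand it.

```python
import math
import random
from typing import Tuple

State = Tuple[float, float, float, float]

# Assumed constants, intended to mirror Gym's classic CartPole.
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_POLE_LENGTH = 0.5
POLE_MASS_LENGTH = MASS_POLE * HALF_POLE_LENGTH
FORCE_MAG = 10.0
TAU = 0.02                                 # seconds per timestep
X_THRESHOLD = 2.4
THETA_THRESHOLD = 12 * 2 * math.pi / 360   # about 0.209 radians


def step(state: State, action: int) -> Tuple[State, float, bool]:
    """Advance one timestep and return (next_state, reward, done)."""
    x, x_dot, theta, theta_dot = state
    # Action 0 pushes the cart to the left, action 1 pushes it to the right.
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS))
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    # Explicit Euler integration: positions use the old velocities.
    x += TAU * x_dot
    x_dot += TAU * x_acc
    theta += TAU * theta_dot
    theta_dot += TAU * theta_acc
    done = abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD
    # Every timestep earns +1, including the one that ends the episode.
    return (x, x_dot, theta, theta_dot), 1.0, done


def run_random_episode(seed: int = 0, max_steps: int = 200) -> Tuple[float, int]:
    """Play one episode with random actions; return (total_reward, steps)."""
    rng = random.Random(seed)
    state: State = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = rng.randint(0, 1)          # 0 or 1, chosen uniformly
        state, reward, done = step(state, action)
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


total_reward, steps = run_random_episode(seed=0)
print(total_reward, steps)
```

Because `step` always returns a reward of 1.0, `total_reward` equals `steps` for every seed; adding a `print(reward)` inside the loop shows the stream of 1.0 values described above.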
