- PyTorch 1.x Reinforcement Learning Cookbook
- Yuxi (Hayden) Liu
- 2021-06-24 12:34:39
How it works...
In this recipe, we print out the state array for every step. But what does each float in the array mean? We can find more information about CartPole on Gym's GitHub wiki page: https://github.com/openai/gym/wiki/CartPole-v0. It turns out that those four floats represent the following:
- Cart position: The episode terminates as soon as the position leaves the range -2.4 to 2.4.
- Cart velocity.
- Pole angle: This is measured in radians. Any value less than -0.209 (-12 degrees) or greater than 0.209 (12 degrees) will trigger episode termination.
- Pole velocity at the tip.
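To make the meaning of the four floats concrete, here is a minimal sketch that attaches names to a state array and checks it against the two termination thresholds listed above. The names `CartPoleState`, `label_state`, and `is_out_of_bounds` are hypothetical helpers introduced for illustration; they are not part of Gym.

```python
from typing import NamedTuple, Sequence

# Thresholds quoted in the list above.
POSITION_LIMIT = 2.4   # cart position bound
ANGLE_LIMIT = 0.209    # pole angle bound in radians (about 12 degrees)


class CartPoleState(NamedTuple):
    """Hypothetical container naming the four floats of a CartPole state."""
    cart_position: float
    cart_velocity: float
    pole_angle: float
    pole_tip_velocity: float


def label_state(state: Sequence[float]) -> CartPoleState:
    """Attach names to the four floats of a state array."""
    return CartPoleState(*(float(x) for x in state))


def is_out_of_bounds(state: Sequence[float]) -> bool:
    """Return True if the position or angle is outside the termination range."""
    s = label_state(state)
    return abs(s.cart_position) > POSITION_LIMIT or abs(s.pole_angle) > ANGLE_LIMIT
```

Passing the state printed at each step to `label_state` makes the output easier to read, and `is_out_of_bounds` should turn true on the step where one of the two conditions above ends the episode.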
The action is either 0 or 1, corresponding to pushing the cart to the left or to the right, respectively.
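The action encoding can be written down as a small lookup table. This is only an illustrative sketch: `ACTION_NAMES`, `describe_action`, and `random_action` are hypothetical names, and the uniform random choice is just a placeholder policy.

```python
import random

# Action encoding described above: 0 pushes left, 1 pushes right.
ACTION_NAMES = {0: "push left", 1: "push right"}


def describe_action(action: int) -> str:
    """Return a readable label for a CartPole action."""
    return ACTION_NAMES[action]


def random_action() -> int:
    """Pick one of the two valid actions uniformly at random."""
    return random.choice(list(ACTION_NAMES))
```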
The reward in this environment is +1 for every timestep before the episode terminates. We can verify this by printing out the reward at every step. The total reward of an episode is therefore simply the number of timesteps it lasted.
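The relationship between the total reward and the episode length can be checked with a short loop. The sketch below assumes the classic Gym interface, in which `env.reset()` returns the state and `env.step(action)` returns `(state, reward, done, info)`. Newer Gym and Gymnasium releases return different tuples, so the unpacking would need adjusting there. `run_episode` is a hypothetical helper, and the default random policy is only a placeholder.

```python
import random


def run_episode(env, policy=None):
    """Run one episode and return (total_reward, n_steps).

    Assumes the classic Gym API: reset() -> state,
    step(action) -> (state, reward, done, info).
    """
    if policy is None:
        # Placeholder policy: choose left (0) or right (1) at random.
        policy = lambda state: random.choice([0, 1])

    state = env.reset()
    total_reward, n_steps, done = 0.0, 0, False
    while not done:
        state, reward, done, _ = env.step(policy(state))
        total_reward += reward
        n_steps += 1
    return total_reward, n_steps
```

With a CartPole environment that pays +1 per step, the two returned values should be equal, which is the claim made above.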