- PyTorch 1.x Reinforcement Learning Cookbook
- Yuxi (Hayden) Liu
- 2021-06-24 12:34:39
How it works...
In this recipe, we print out the state array for every step. But what does each float in the array mean? We can find more information about CartPole on Gym's GitHub wiki page: https://github.com/openai/gym/wiki/CartPole-v0. It turns out that those four floats represent the following:
- Cart position: This ranges from -2.4 to 2.4, and any position beyond this range will trigger episode termination.
- Cart velocity.
- Pole angle: Any value less than -0.209 radians (-12 degrees) or greater than 0.209 radians (12 degrees) will trigger episode termination.
- Pole velocity at the tip.
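To make the four floats easier to read when printing them, the state array can be unpacked into named fields. The following is only a minimal sketch: the names `CartPoleState`, `label_state`, and `is_out_of_bounds` are hypothetical helpers introduced here for illustration (they are not part of Gym), and the two limits are simply the values quoted in the list above.

```python
from typing import NamedTuple, Sequence


class CartPoleState(NamedTuple):
    """Hypothetical container giving a name to each float in the state array."""
    cart_position: float
    cart_velocity: float
    pole_angle: float          # in radians
    pole_tip_velocity: float


# Termination limits as quoted in the text above (assumed, not read from Gym).
POSITION_LIMIT = 2.4   # cart position must stay within [-2.4, 2.4]
ANGLE_LIMIT = 0.209    # pole angle must stay within [-0.209, 0.209] radians


def label_state(state: Sequence[float]) -> CartPoleState:
    """Convert a 4-element state array into a named tuple."""
    return CartPoleState(*(float(x) for x in state))


def is_out_of_bounds(state: Sequence[float]) -> bool:
    """Return True if the position or angle is beyond the quoted limits."""
    s = label_state(state)
    return abs(s.cart_position) > POSITION_LIMIT or abs(s.pole_angle) > ANGLE_LIMIT
```

Passing the state printed at each step to `label_state` shows which number is which, and `is_out_of_bounds` gives a quick check of whether a state would end the episode under the two limits listed.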
The action is either 0 or 1, corresponding to pushing the cart to the left or to the right, respectively.
The reward in this environment is +1 for every timestep before the episode terminates, which we can verify by printing out the reward at every step. The total reward for an episode is therefore simply the number of timesteps it lasted.
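The claim that the total reward equals the number of timesteps can be checked with a small episode loop. This is a sketch under stated assumptions: `run_episode` and `ACTION_NAMES` are hypothetical names introduced here, and the loop assumes the classic Gym interface, in which `env.reset()` returns the state and `env.step(action)` returns four values. Newer Gym and Gymnasium releases return different tuples, so the unpacking would need adjusting there.

```python
# Meaning of the two actions, as described in the text above.
ACTION_NAMES = {0: "push left", 1: "push right"}


def run_episode(env, choose_action, verbose=False):
    """Run one episode and return (total_reward, n_steps).

    Assumes the classic Gym API: reset() -> state and
    step(action) -> (state, reward, is_done, info).
    `choose_action` is any callable mapping a state to 0 or 1.
    """
    state = env.reset()
    total_reward = 0.0
    n_steps = 0
    is_done = False
    while not is_done:
        action = choose_action(state)
        state, reward, is_done, _ = env.step(action)
        total_reward += reward
        n_steps += 1
        if verbose:
            print(n_steps, ACTION_NAMES[action], reward)
    return total_reward, n_steps
```

With a CartPole environment created by `gym.make('CartPole-v0')` and a random policy such as `lambda state: env.action_space.sample()`, the two returned values should match, since every step contributes a reward of +1.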