Training a robot to walk
Now let's learn how to train a robot to walk using Gym, along with some fundamentals.
The strategy is that X points are given as a reward when the robot moves forward, and Y points are deducted if the robot fails to move. The robot will thus learn to walk by maximizing its reward.
First, we will import the library; then we will create a simulation instance using the make function. OpenAI Gym provides an environment called BipedalWalker-v2 for training robotic agents on simple terrain:
import gym
env = gym.make('BipedalWalker-v2')
Then, for each episode (an agent-environment interaction between the initial and final states), we will initialize the environment using the reset method:
for episode in range(100):
    observation = env.reset()
Then we will loop and render the environment:
    for i in range(10000):
        env.render()
We sample random actions from the environment's action space. Every environment has an action space which contains all possible valid actions:
        action = env.action_space.sample()
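Every environment also exposes the shape and bounds of its spaces. As a quick sanity check (a minimal sketch using standard Gym attributes; the exact output depends on your Gym version), you can print them:
print(env.action_space)       # for BipedalWalker-v2, a 4-dimensional Box of joint torques
print(env.observation_space)  # a 24-dimensional Box describing the robot's state
print(env.action_space.high)  # upper bound of each action dimension
print(env.action_space.low)   # lower bound of each action dimension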
For each action step, we will record observation, reward, done, and info:
        observation, reward, done, info = env.step(action)
observation is the object representing an observation of the environment, for example, the state of the robot on the terrain.
reward is the reward gained by the previous action, for example, the reward a robot gains on successfully moving forward.
done is a Boolean; when it is true, it indicates that the episode has completed (that is, the robot learned to walk or failed completely). Once the episode has completed, we can initialize the environment for the next episode using env.reset().
info is information that is useful for debugging.
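To make these four values concrete, here is a minimal sketch (assuming the BipedalWalker-v2 environment created above) that prints their types after a single random step:
observation = env.reset()
action = env.action_space.sample()
observation, reward, done, info = env.step(action)
print(type(observation), observation.shape)  # a NumPy array of 24 state variables
print(type(reward), reward)                  # a float scalar
print(type(done), done)                      # a bool flag
print(type(info), info)                      # a dict, often empty for this environment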
When done is true, we print the number of time steps taken in the episode and break out of the loop:
        if done:
            print("{} timesteps taken for the episode".format(i+1))
            break
The complete code is as follows:
import gym

env = gym.make('BipedalWalker-v2')
for i_episode in range(100):
    # Reset the environment at the start of each episode
    observation = env.reset()
    for t in range(10000):
        env.render()
        print(observation)
        # Sample a random action and step the environment
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("{} timesteps taken for the episode".format(t+1))
            break
env.close()
When you run the code, you will see the BipedalWalker-v2 environment rendered on screen, and at the end of each episode the number of time steps taken is printed to the console.
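As a small extension beyond the example above (a hedged sketch using the same Gym API; rendering is omitted so the loop runs quickly), you can accumulate the per-step rewards to see how much total reward the random agent collects in each episode:
import gym

env = gym.make('BipedalWalker-v2')
for i_episode in range(100):
    observation = env.reset()
    total_reward = 0.0
    for t in range(10000):
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        total_reward += reward  # accumulate the reward signal described earlier
        if done:
            print("Episode {}: {} timesteps, total reward {:.2f}".format(
                i_episode + 1, t + 1, total_reward))
            break
env.close()
A purely random agent rarely walks far, so these totals will typically be negative; a learning algorithm's job is to drive them up.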
