
  • Python Reinforcement Learning
  • Sudharsan Ravichandiran Sean Saito Rajalingappaa Shanmugamani Yang Wenzhuo
  • 323 words
  • 2021-06-24 15:17:28

Building a video game bot

Let's learn how to build a video game bot that plays a car racing game. Our objective is for the car to keep moving forward without getting stuck on obstacles or hitting other cars.

First, we import the necessary libraries:

import gym
import universe # register universe environment
import random

Then we simulate our car racing environment using the make function:

env = gym.make('flashgames.NeonRace-v0')
env.configure(remotes=1) #automatically creates a local docker container

Let's create the variables for moving the car:

# Move left
left = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowLeft', True),
('KeyEvent', 'ArrowRight', False)]

#Move right
right = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowLeft', False),
('KeyEvent', 'ArrowRight', True)]

# Move forward
forward = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowRight', False),
('KeyEvent', 'ArrowLeft', False), ('KeyEvent', 'n', True)]
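Each action is simply a list of (event type, key, is-pressed) tuples that Universe forwards to the browser as keyboard events. As a side note, a small helper (my own, not part of the book's code) makes the shared pattern explicit: ArrowUp is always held, and each arrow key is explicitly pressed or released:

```python
def key_action(left_pressed, right_pressed, extra_keys=()):
    """Build a Universe-style key-event action: ArrowUp is always held,
    and ArrowLeft/ArrowRight are explicitly pressed or released."""
    action = [('KeyEvent', 'ArrowUp', True),
              ('KeyEvent', 'ArrowLeft', left_pressed),
              ('KeyEvent', 'ArrowRight', right_pressed)]
    for key in extra_keys:
        action.append(('KeyEvent', key, True))
    return action

left = key_action(left_pressed=True, right_pressed=False)
right = key_action(left_pressed=False, right_pressed=True)
# 'n' is the extra key held in the book's forward action
forward = key_action(left_pressed=False, right_pressed=False, extra_keys=('n',))
```

Releasing the unused arrow key in every action matters: key states persist between steps, so an action that only pressed keys would leave a previous turn stuck on.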

We will initialize some other variables:

# We use turn variable for deciding whether to turn or not
turn = 0

# We store all the rewards in rewards list
rewards = []

# we will use buffer_size as a threshold
buffer_size = 100

# we will initially set the action as forward, which just moves the car
# forward without any turn
action = forward

Now, let's allow our game agent to play in an infinite loop that continuously performs an action based on interaction with the environment:

while True:
    turn -= 1
    # Let us say initially we take no turn and move forward.
    # We will check the value of turn; if it is less than or equal to 0
    # then there is no need to turn and we just move forward.
    if turn <= 0:
        action = forward
        turn = 0

Then, inside the loop, we use env.step() to perform the action (moving forward for now) for one time step:

    action_n = [action for ob in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)

For each time step, we record the results in the observation_n, reward_n, done_n, and info variables:

  • observation_n: State of the car
  • reward_n: Reward gained by the previous action, if the car successfully moves forward without getting stuck on obstacles
  • done_n: A Boolean; it will be set to True if the game is over
  • info: Used for debugging purposes

Obviously, the agent (car) cannot move forward throughout the whole game; it needs to take turns to avoid obstacles and other vehicles. But it has to determine whether it should take a turn and, if yes, in which direction.

First, we will calculate the mean of the rewards obtained so far; if it is 0, it is clear that we got stuck somewhere while moving forward and we need to take a turn. But which direction should we turn? Do you recollect the policy functions we studied in Chapter 1, Introduction to Reinforcement Learning?

Referring to the same concept, we have two policies here: one is turning left and the other is turning right. We will take a random policy here and calculate a reward and improve upon that.

We will generate a random number; if it is less than 0.5, we will turn right, otherwise we will turn left. Later, we will update our rewards and, based on them, learn which direction is best:

    if len(rewards) >= buffer_size:
        mean = sum(rewards)/len(rewards)

        if mean == 0:
            turn = 20
            if random.random() < 0.5:
                action = right
            else:
                action = left

        rewards = []
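The turning rule above can also be factored into a small pure function, which makes it easy to test in isolation (the function name and signature here are my own, not from the book's code):

```python
import random

def decide_turn(rewards, buffer_size, left, right, rng=random.random):
    """Apply the stuck-detection rule: once buffer_size rewards are
    collected, a mean of 0 means the car is stuck, so turn for the next
    20 steps in a random direction. Returns (turn, action, rewards),
    where action is None when the current action should be kept."""
    if len(rewards) < buffer_size:
        return 0, None, rewards          # buffer not full yet; keep going
    mean = sum(rewards) / len(rewards)
    if mean == 0:
        action = right if rng() < 0.5 else left
        return 20, action, []            # stuck: turn for 20 steps
    return 0, None, []                   # moving fine; just clear the buffer
```

Passing the random source in as rng is a deliberate choice: it lets a test pin the "coin flip" to a fixed value instead of relying on seeding global state.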

Then, at the end of each loop iteration, we display the game screen using env.render() (note that env.render() only renders the current frame; it does not restart the game):

    env.render()

The complete code is as follows:

import gym
import universe # register universe environment
import random

env = gym.make('flashgames.NeonRace-v0')
env.configure(remotes=1) # automatically creates a local docker container
observation_n = env.reset()

## Declare actions
# Move left
left = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowLeft', True),
        ('KeyEvent', 'ArrowRight', False)]

# Move right
right = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowLeft', False),
         ('KeyEvent', 'ArrowRight', True)]

# Move forward
forward = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowRight', False),
           ('KeyEvent', 'ArrowLeft', False), ('KeyEvent', 'n', True)]

# Determine whether to turn or not
turn = 0
# Store rewards in a list
rewards = []
# Use buffer_size as a threshold
buffer_size = 100
# Initial action is forward
action = forward

while True:
    turn -= 1
    if turn <= 0:
        action = forward
        turn = 0
    action_n = [action for ob in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)
    rewards += [reward_n[0]]

    if len(rewards) >= buffer_size:
        mean = sum(rewards)/len(rewards)

        if mean == 0:
            turn = 20
            if random.random() < 0.5:
                action = right
            else:
                action = left

        rewards = []

    env.render()

If you run the program, you can watch the car learn to move forward without getting stuck or hitting other vehicles.
