- PyTorch 1.x Reinforcement Learning Cookbook
- Yuxi (Hayden) Liu
There's more...
So far, we've run only one episode. To assess how well the agent performs, we can simulate many episodes and then average the total rewards across all of them. The average total reward will tell us how well an agent that takes random actions performs.
Let's simulate 10,000 episodes:
>>> n_episode = 10000
In each episode, we compute the total reward by accumulating the reward in every step:
>>> total_rewards = []
>>> for episode in range(n_episode):
...     state = env.reset()
...     total_reward = 0
...     is_done = False
...     while not is_done:
...         action = env.action_space.sample()
...         state, reward, is_done, _ = env.step(action)
...         total_reward += reward
...     total_rewards.append(total_reward)
Finally, we calculate the average total reward:
>>> print('Average total reward over {} episodes: {}'.format(
...     n_episode, sum(total_rewards) / n_episode))
Average total reward over 10000 episodes: 22.2473
On average, an episode of random actions scores about 22.25.
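The mean alone doesn't show how noisy this random baseline is. As a quick optional check (a sketch beyond the original recipe, reusing the total_rewards list we just built and only the Python standard library), we can also inspect the spread and the best episode:
>>> import statistics  # standard library, no extra install needed
>>> print('Standard deviation: {:.2f}'.format(statistics.stdev(total_rewards)))
>>> print('Best episode: {}'.format(max(total_rewards)))
If you are following along with the CartPole-v0 environment from this recipe, episodes are capped at 200 steps, so 200 is the best possible total reward; a random agent falls far short of that.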
We all know that taking random actions is not sophisticated enough, and we will implement more advanced policies in upcoming recipes. But before that, in the next recipe, let's take a break and review the basics of PyTorch.