
There's more...

So far, we've run only one episode. To assess how well the agent performs, we can simulate many episodes and average the total reward per episode. This average total reward tells us how well an agent that takes random actions performs.

Let’s set 10,000 episodes:

>>> n_episode = 10000

In each episode, we compute the total reward by accumulating the reward in every step:

>>> total_rewards = []
>>> for episode in range(n_episode):
...     state = env.reset()
...     total_reward = 0
...     is_done = False
...     while not is_done:
...         action = env.action_space.sample()
...         state, reward, is_done, _ = env.step(action)
...         total_reward += reward
...     total_rewards.append(total_reward)

Finally, we calculate the average total reward:

>>> print('Average total reward over {} episodes: {}'.format(
...     n_episode, sum(total_rewards) / n_episode))
Average total reward over 10000 episodes: 22.2473

On average, taking a random action scores 22.25.  
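The episode-averaging pattern above does not depend on CartPole specifically. The following minimal sketch shows the same loop against a toy stand-in environment (the `ToyEnv` class and `average_random_return` function are illustrative names invented here, not part of Gym; the toy environment ends each episode with probability 0.05 per step, so episodes last about 20 steps on average):

```python
import random

class ToyEnv:
    """Hypothetical stand-in for a Gym-style environment, for illustration only.
    Every step returns reward 1 and ends the episode with probability 0.05."""
    def reset(self):
        return 0  # dummy initial state

    def step(self, action):
        is_done = random.random() < 0.05
        # Mimic the (state, reward, is_done, info) tuple used in the recipe
        return 0, 1.0, is_done, {}

def average_random_return(env, n_episode):
    """Run n_episode episodes and return the average total reward."""
    total_rewards = []
    for _ in range(n_episode):
        env.reset()
        total_reward = 0.0
        is_done = False
        while not is_done:
            _, reward, is_done, _ = env.step(0)
            total_reward += reward
        total_rewards.append(total_reward)
    return sum(total_rewards) / n_episode

random.seed(0)
avg = average_random_return(ToyEnv(), 1000)
print('Average total reward: {:.2f}'.format(avg))
```

Since episode lengths here follow a geometric distribution with mean 1/0.05 = 20, the printed average should come out close to 20, just as the CartPole average above converges with enough episodes.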

Taking random actions is obviously not a sophisticated policy, and we will implement more advanced ones in upcoming recipes. But for the next recipe, let's take a break and review the basics of PyTorch.
