
There's more...

So far, we've run only one episode. To assess how well the agent performs, we can simulate many episodes and then average the total rewards across them. This average total reward tells us how an agent that takes random actions performs.

Let's run 10,000 episodes:

>>> n_episode = 10000

In each episode, we compute the total reward by accumulating the reward in every step:

>>> total_rewards = []
>>> for episode in range(n_episode):
...     state = env.reset()
...     total_reward = 0
...     is_done = False
...     while not is_done:
...         action = env.action_space.sample()
...         state, reward, is_done, _ = env.step(action)
...         total_reward += reward
...     total_rewards.append(total_reward)

Finally, we calculate the average total reward:

>>> print('Average total reward over {} episodes: {}'.format(
...     n_episode, sum(total_rewards) / n_episode))
Average total reward over 10000 episodes: 22.2473

On average, the random policy scores about 22.25 per episode.
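The evaluation loop above can be wrapped into a reusable helper. The sketch below is a minimal illustration, not part of the recipe: `evaluate_random_policy` and `StubEnv` are hypothetical names, and `StubEnv` merely mimics the Gym-style `reset`/`step`/`action_space.sample` interface so the example runs without Gym installed. With the real environment from this recipe, the same function applies unchanged.

```python
import random

class StubEnv:
    """Hypothetical stand-in for a Gym-style environment: each episode
    lasts a random number of steps and every step yields a reward of 1.0."""
    class _Space:
        def sample(self):
            # Two discrete actions, chosen uniformly at random
            return random.randint(0, 1)

    def __init__(self):
        self.action_space = self._Space()
        self._steps_left = 0

    def reset(self):
        self._steps_left = random.randint(1, 40)
        return 0  # dummy observation

    def step(self, action):
        self._steps_left -= 1
        is_done = self._steps_left <= 0
        return 0, 1.0, is_done, {}  # observation, reward, done, info

def evaluate_random_policy(env, n_episode):
    """Average total reward of a uniformly random policy over n_episode runs."""
    total_rewards = []
    for _ in range(n_episode):
        env.reset()
        total_reward = 0.0
        is_done = False
        while not is_done:
            _, reward, is_done, _ = env.step(env.action_space.sample())
            total_reward += reward
        total_rewards.append(total_reward)
    return sum(total_rewards) / n_episode

avg = evaluate_random_policy(StubEnv(), 1000)
print(avg)
```

With the real environment, `evaluate_random_policy(env, 10000)` reproduces the computation shown above; the stub here only demonstrates the interface contract the function relies on.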

We all know that taking random actions is not sophisticated enough, and we will implement an advanced policy in upcoming recipes. But for the next recipe, let's take a break and review the basics of PyTorch.
