- PyTorch 1.x Reinforcement Learning Cookbook
- Yuxi (Hayden) Liu
There's more...
So far, we've run only one episode. To assess how well the agent performs, we can simulate many episodes and then average the total rewards across all of them. The average total reward will tell us how well an agent that takes random actions performs.
Let's simulate 10,000 episodes:
>>> n_episode = 10000
In each episode, we compute the total reward by accumulating the reward in every step:
>>> total_rewards = []
>>> for episode in range(n_episode):
...     state = env.reset()
...     total_reward = 0
...     is_done = False
...     while not is_done:
...         action = env.action_space.sample()
...         state, reward, is_done, _ = env.step(action)
...         total_reward += reward
...     total_rewards.append(total_reward)
Finally, we calculate the average total reward:
>>> print('Average total reward over {} episodes: {}'.format(
...     n_episode, sum(total_rewards) / n_episode))
Average total reward over 10000 episodes: 22.2473
On average, an episode of random actions scores about 22.25.
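The mean alone doesn't show how noisy this random baseline is. As a quick optional check (a sketch beyond the original recipe, reusing the total_rewards list we just built and only the Python standard library), we can also inspect the spread and the best episode:
>>> import statistics  # standard library, no extra install needed
>>> print('Standard deviation: {:.2f}'.format(statistics.stdev(total_rewards)))
>>> print('Best episode: {}'.format(max(total_rewards)))
If you are following along with the CartPole-v0 environment from this recipe, episodes are capped at 200 steps, so 200 is the best possible total reward; a random agent falls far short of that.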
We all know that taking random actions is not sophisticated enough, and we will implement more advanced policies in upcoming recipes. But before that, in the next recipe, let's take a break and review the basics of PyTorch.