- PyTorch 1.x Reinforcement Learning Cookbook
- Yuxi (Hayden) Liu
- 111 characters
- 2021-06-24 12:34:43
There's more...
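The reward/episode plot suggests that we can stop training early once the environment is solved, that is, once the average reward over 100 consecutive episodes is no less than 195 (equivalently, the last 100 total rewards sum to at least 19,500). To do so, we add the following lines of code inside the training loop, right after the total reward of the current episode has been recorded: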
>>> if episode >= 99 and sum(total_rewards[-100:]) >= 19500:
...     break
Re-run the training session. You should get something similar to the following, which stops after several hundred episodes:
Episode 1: 10.0
Episode 2: 27.0
Episode 3: 28.0
Episode 4: 15.0
Episode 5: 12.0
……
……
Episode 549: 200.0
Episode 550: 200.0
Episode 551: 200.0
Episode 552: 200.0
Episode 553: 200.0
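To see the stopping rule in isolation, here is a minimal, self-contained sketch that does not depend on the rest of the recipe. The names `train_with_early_stop`, `run_episode`, and `fake_episode` are hypothetical and introduced only for illustration; `fake_episode` is a made-up stand-in that returns a synthetic reward, whereas in the actual recipe the reward would come from running the agent for one episode. The sketch keeps a sliding window of the last 100 rewards instead of slicing `total_rewards`, which is an equivalent way to express the same check.

```python
from collections import deque


def train_with_early_stop(run_episode, n_episode, window=100, threshold=195.0):
    """Run up to n_episode episodes, stopping once the mean reward over
    the last `window` episodes is no less than `threshold`."""
    total_rewards = []
    recent = deque(maxlen=window)  # holds only the most recent `window` rewards
    for episode in range(n_episode):
        reward = run_episode(episode)
        total_rewards.append(reward)
        recent.append(reward)
        # Same test as sum(total_rewards[-100:]) >= 19500 once 100 episodes exist
        if len(recent) == window and sum(recent) >= threshold * window:
            break
    return total_rewards


def fake_episode(episode):
    # Hypothetical stand-in for one training episode:
    # the reward grows by 1 per episode and saturates at 200.
    return float(min(200, 10 + episode))


rewards = train_with_early_stop(fake_episode, n_episode=1000)
print(f"Stopped after {len(rewards)} episodes")
print(f"Average reward over the last 100 episodes: {sum(rewards[-100:]) / 100}")
```

Checking `len(recent) == window` plays the role of `episode >= 99` in the original snippet: it prevents a lucky early streak from triggering the stop before 100 episodes have been observed.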