There's more...

If we examine the reward/episode plot, we can see that training can be stopped early once the environment is solved, that is, once the average reward over 100 consecutive episodes is no less than 195. To do this, we add the following lines of code to the training session:

 >>> if episode >= 99 and sum(total_rewards[-100:]) >= 19500:
 ...     break
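
The stopping rule can also be tried out on its own, without the environment. The following is a minimal sketch under stated assumptions, not the recipe's actual training code: the helper names `is_solved` and `run_until_solved` are hypothetical, and the reward sequence is synthetic data standing in for the per-episode totals that the real training session would produce.

```python
def is_solved(total_rewards, window=100, threshold=195.0):
    """Return True once the mean of the last `window` rewards reaches `threshold`.

    Hypothetical helper; equivalent to checking that the sum of the last
    100 episode rewards is at least 19500.
    """
    if len(total_rewards) < window:
        # Fewer than `window` episodes so far: the criterion cannot apply yet.
        return False
    return sum(total_rewards[-window:]) / window >= threshold


def run_until_solved(episode_rewards, window=100, threshold=195.0):
    """Record rewards one episode at a time and stop as soon as the task is solved.

    `episode_rewards` is an assumption: it stands in for the total reward
    that each real training episode would return.
    """
    total_rewards = []
    for reward in episode_rewards:
        total_rewards.append(reward)
        if is_solved(total_rewards, window, threshold):
            break  # early stop: no further episodes are run
    return total_rewards


# Synthetic data: 50 weak episodes followed by perfect 200-reward episodes.
rewards = [20.0] * 50 + [200.0] * 300
history = run_until_solved(rewards)
print(f"Stopped after {len(history)} episodes")
```

With this synthetic sequence, the loop stops well before all 350 rewards are consumed, because the last 100 recorded episodes reach an average of 195 while a few weak episodes are still inside the window.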

Re-run the training session. You should get something similar to the following, which stops after several hundred episodes:

Episode 1: 10.0
Episode 2: 27.0
Episode 3: 28.0
Episode 4: 15.0
Episode 5: 12.0
……
……
Episode 549: 200.0
Episode 550: 200.0
Episode 551: 200.0
Episode 552: 200.0
Episode 553: 200.0