There's more...

If we examine the reward/episode plot, we can see that training can stop early once the problem is solved, that is, once the average reward over 100 consecutive episodes is no less than 195. Since an average of 195 over 100 episodes corresponds to a sum of 195 × 100 = 19500, we just add the following lines of code to the training session:

 >>> if episode >= 99 and sum(total_rewards[-100:]) >= 19500:
 ...     break
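
To see the stopping rule in isolation, here is a minimal sketch that does not depend on the environment or the training code. It assumes that total_rewards is a plain list that receives one entry per finished episode; the helper is_solved and the synthetic reward sequence are hypothetical and only serve as an illustration, they are not results from an actual training run.

```python
def is_solved(total_rewards, window=100, threshold=195.0):
    """Return True once the mean of the last `window` rewards reaches `threshold`."""
    # Fewer than `window` episodes so far: the criterion cannot be evaluated yet.
    if len(total_rewards) < window:
        return False
    return sum(total_rewards[-window:]) / window >= threshold


# Synthetic rewards for illustration only (an assumption, not real training output):
# 50 poor episodes followed by episodes that reach the maximum reward of 200.
simulated_rewards = [20.0] * 50 + [200.0] * 500

total_rewards = []
for episode, reward in enumerate(simulated_rewards):
    total_rewards.append(reward)
    # Same condition as the two lines above, written as a mean instead of a sum.
    if is_solved(total_rewards):
        break

stopped_after = len(total_rewards)
print(f"Stopped after {stopped_after} episodes")
```

Comparing the mean against 195 is equivalent to comparing the sum against 19500, and checking the list length plays the same role as the episode >= 99 test: both make sure that a full window of 100 episodes is available before the criterion is applied.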

Re-run the training session. You should get something similar to the following, which stops after several hundred episodes:

Episode 1: 10.0
Episode 2: 27.0
Episode 3: 28.0
Episode 4: 15.0
Episode 5: 12.0
……
……
Episode 549: 200.0
Episode 550: 200.0
Episode 551: 200.0
Episode 552: 200.0
Episode 553: 200.0