
Developing the hill-climbing algorithm

As we saw with the random search policy, each episode is independent. In fact, all episodes in random search could be run in parallel, and the weight that achieves the best performance would eventually be selected. We also verified this with the plot of reward versus episode, which shows no upward trend. In this recipe, we will develop a different algorithm, hill climbing, which transfers the knowledge acquired in one episode to the next.

In the hill-climbing algorithm, we also start with a randomly chosen weight. But here, in every episode, we add some noise to the current best weight. If the total reward of the episode improves on the best so far, we replace the weight with the new one; otherwise, we keep the old weight. In this approach, the weight improves gradually as we progress through the episodes, instead of jumping around from one episode to the next.
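The update rule above can be sketched in a few lines. This is a minimal, self-contained illustration rather than the recipe's implementation: `run_episode` is a hypothetical stand-in for playing one episode in an environment (here it simply rewards weights that are close to a fixed target vector), and the Gaussian noise, `noise_scale`, and `n_features` values are assumptions chosen for the example.

```python
import random


def run_episode(weight):
    """Hypothetical stand-in for one episode of an environment.

    Returns a 'total reward' that is higher the closer `weight` is to a
    fixed target vector. A real agent would instead play an episode with
    `weight` and sum the rewards it collects.
    """
    target = [0.5, -1.0, 2.0, 0.3]
    return -sum((w - t) ** 2 for w, t in zip(weight, target))


def hill_climb(n_episode=1000, noise_scale=0.1, n_features=4, seed=0):
    rng = random.Random(seed)
    # Start from a randomly chosen weight, just as in random search.
    best_weight = [rng.random() for _ in range(n_features)]
    best_reward = run_episode(best_weight)
    best_history = []  # best total reward after each episode
    for _ in range(n_episode):
        # Add (assumed Gaussian) noise to the current best weight.
        candidate = [w + noise_scale * rng.gauss(0.0, 1.0) for w in best_weight]
        reward = run_episode(candidate)
        # Keep the new weight only if the total reward improves.
        if reward > best_reward:
            best_weight, best_reward = candidate, reward
        best_history.append(best_reward)
    return best_weight, best_reward, best_history


weight, reward, history = hill_climb()
print("best reward:", round(reward, 4))
```

Because a candidate is accepted only when it beats the current best, `best_history` never decreases, which is the episode-to-episode carry-over of knowledge that random search lacks. `noise_scale` sets the step size: too small and progress is slow, too large and the search behaves more like random search.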
