Developing the hill-climbing algorithm

As we saw with the random search policy, each episode is independent: the weight tried in one episode tells us nothing about which weight to try next. In fact, all episodes in random search could be run in parallel, with the best-performing weight selected at the end. The plot of reward versus episode confirmed this, showing no upward trend. In this recipe, we will develop a different algorithm, hill climbing, which transfers the knowledge acquired in one episode to the next.

In the hill-climbing algorithm, we also start with a randomly chosen weight. But here, in every episode, we add some noise to the current weight and run the episode with the perturbed weight. If the total reward improves, we replace the weight with the new one; otherwise, we keep the old weight. With this approach, the weight improves gradually as we progress through the episodes, instead of jumping around from one episode to the next.
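The update rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the recipe's implementation: the names `hill_climb`, `toy_reward`, and `noise_scale` are hypothetical, and `toy_reward` is a deterministic stand-in for running one episode in an environment and returning its total reward.

```python
import random


def toy_reward(weight):
    # Assumption: stand-in for one episode's total reward.
    # The reward is highest (0) when weight matches a made-up target.
    target = [0.5, -0.3, 0.8, 0.1]
    return -sum((w - t) ** 2 for w, t in zip(weight, target))


def hill_climb(evaluate, n_weights, n_episodes=200, noise_scale=0.1, seed=0):
    rng = random.Random(seed)
    # Start with a randomly chosen weight.
    best_weight = [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
    best_reward = evaluate(best_weight)
    history = [best_reward]
    for _ in range(n_episodes):
        # Add some noise to the current weight.
        candidate = [w + noise_scale * rng.gauss(0.0, 1.0) for w in best_weight]
        reward = evaluate(candidate)
        # Keep the new weight only if the total reward improves.
        if reward > best_reward:
            best_weight, best_reward = candidate, reward
        history.append(best_reward)
    return best_weight, best_reward, history


best_weight, best_reward, history = hill_climb(toy_reward, n_weights=4)
print(f"initial reward: {history[0]:.4f}, final reward: {best_reward:.4f}")
```

Because a candidate is accepted only when it beats the best reward so far, `history` never decreases, which is the upward trend that random search lacks. In a real environment the episode reward is usually noisy, so a single lucky episode can be mistaken for an improvement; averaging over several episodes per candidate is one possible way to reduce that effect.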
