
Developing the hill-climbing algorithm

In the random search policy, each episode is independent of the others. In fact, all episodes in random search can be run in parallel, and the weight that achieves the best performance is selected at the end. The plot of reward versus episode confirms this: it shows no upward trend, because nothing learned in one episode carries over to the next. In this recipe, we will develop a different algorithm, the hill-climbing algorithm, which transfers the knowledge acquired in one episode to the next.

The hill-climbing algorithm also starts with a randomly chosen weight. In every episode, however, we add some noise to the current weight and run the episode with the perturbed weight. If the total reward improves, we replace the current weight with the new one; otherwise, we keep the old weight. With this approach, the weight improves gradually as we progress through the episodes, instead of jumping around from one episode to the next.
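The update rule described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the recipe's implementation: `hill_climb`, `toy_reward`, `noise_scale`, and the target vector are hypothetical names and values chosen for the example, and `toy_reward` is a deterministic stand-in for the total reward of one episode, so no environment is required to run it.

```python
import random


def hill_climb(evaluate, n_weights, n_episodes=200, noise_scale=0.1, seed=0):
    """Hill climbing over a flat weight vector.

    `evaluate` is an assumed callable that returns the total reward
    obtained with a given weight (here it stands in for one episode).
    """
    rng = random.Random(seed)
    # Start from a randomly chosen weight, as in random search.
    best_weight = [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
    best_reward = evaluate(best_weight)
    history = [best_reward]

    for _ in range(n_episodes):
        # Perturb the current best weight with Gaussian noise.
        candidate = [w + noise_scale * rng.gauss(0.0, 1.0) for w in best_weight]
        reward = evaluate(candidate)
        # Keep the candidate only if the total reward improves.
        if reward > best_reward:
            best_weight, best_reward = candidate, reward
        history.append(best_reward)

    return best_weight, best_reward, history


def toy_reward(weight):
    # Hypothetical stand-in for an episode: reward peaks at a fixed target.
    target = [0.5, -0.3, 0.8, 0.1]
    return -sum((w - t) ** 2 for w, t in zip(weight, target))


weight, reward, history = hill_climb(toy_reward, n_weights=4)
print(f"best reward after {len(history) - 1} episodes: {reward:.4f}")
```

Because a candidate is accepted only when it beats the current best, `history` never decreases, which is the gradual improvement that random search lacks. In a real environment the episode reward is usually noisy, so a single lucky episode can be accepted; the sketch ignores that effect.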
