官术网_书友最值得收藏!

How it works...

We are able to achieve much better performance with the hill-climbing algorithm than with random search by simply adding adaptive noise to each episode. We can think of it as a special kind of gradient descent without a target variable. The additional noise is the gradient, albeit in a random way. The noise scale is the learning rate, and it is adaptive to the reward from the previous episode. The target variable in hill climbing becomes achieving the highest reward. In summary, rather than isolating each episode, the agent in the hill-climbing algorithm makes use of the knowledge learned from each episode and performs a more reliable action in the next episode. As the name hill climbing implies, the reward moves upwards through the episodes as the weight gradually moves towards the optimum value.

主站蜘蛛池模板: 涟源市| 高台县| 诸城市| 湘乡市| 崇文区| 柳河县| 广安市| 孝义市| 深水埗区| 天等县| 松潘县| 杂多县| 辛集市| 同江市| 西乌珠穆沁旗| 革吉县| 鄯善县| 盐山县| 扬州市| 庆安县| 湟源县| 日土县| 高唐县| 乌什县| 齐河县| 诏安县| 富锦市| 石嘴山市| 蕲春县| 柳河县| 德昌县| 信丰县| 新竹市| 广德县| 峨山| 江北区| 万年县| 壤塘县| 黎平县| 西盟| 当涂县|