Title: PyTorch 1.x Reinforcement Learning Cookbook
Author: Yuxi (Hayden) Liu
Updated: 2021-06-24 12:34:42

How it works...

By simply adding adaptive noise to the weight in each episode, the hill-climbing algorithm achieves much better performance than random search. We can think of it as a special kind of gradient descent without a target variable, that is, without labels to fit. The added noise plays the role of the gradient, albeit a randomly sampled one. The noise scale plays the role of the learning rate, and it adapts to the reward from the previous episode: it shrinks when the reward improves and grows when it does not. In place of a target variable, the objective in hill climbing is to achieve the highest reward. In summary, rather than treating each episode in isolation, the agent in the hill-climbing algorithm makes use of the knowledge learned from each episode and performs a more reliable action in the next episode. As the name hill climbing implies, the reward moves upward through the episodes as the weight gradually moves toward the optimum value.
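The update rule described above can be sketched in a few lines. This is a minimal illustration, not the recipe's actual code. It assumes a toy stand-in for an episode (the hypothetical `episode_reward`, whose reward peaks when the weight equals a hidden optimum) and plain Python lists instead of PyTorch tensors and a Gym environment. The halving and doubling factors and the bounds on the noise scale are likewise assumed values.

```python
import random


def episode_reward(weight, optimum):
    """Hypothetical stand-in for running one episode.

    The reward is the negative squared distance to a hidden optimum,
    so it peaks at 0 when the weight matches the optimum exactly.
    """
    return -sum((w - o) ** 2 for w, o in zip(weight, optimum))


def hill_climb(optimum, n_episodes=500, noise_scale=0.01, seed=0):
    """Hill climbing with an adaptive noise scale (illustrative sketch)."""
    rng = random.Random(seed)
    # Start from a random weight, as random search would.
    best_weight = [rng.uniform(-1.0, 1.0) for _ in optimum]
    best_reward = episode_reward(best_weight, optimum)
    history = [best_reward]  # best reward seen so far, per episode

    for _ in range(n_episodes):
        # The noise acts as a randomly sampled "gradient" step,
        # scaled by noise_scale (the "learning rate").
        candidate = [w + noise_scale * rng.gauss(0.0, 1.0) for w in best_weight]
        reward = episode_reward(candidate, optimum)

        if reward >= best_reward:
            # Improvement: keep the new weight and search more finely.
            best_weight, best_reward = candidate, reward
            noise_scale = max(noise_scale / 2, 1e-4)  # assumed lower bound
        else:
            # No improvement: stay at the best weight and search more widely.
            noise_scale = min(noise_scale * 2, 2.0)  # assumed upper bound

        history.append(best_reward)

    return best_weight, best_reward, history


if __name__ == "__main__":
    weight, reward, history = hill_climb(optimum=[0.5, -0.3, 0.8])
    print("initial best reward:", history[0])
    print("final best reward:", reward)
```

Because a candidate weight is kept only when its reward is at least as good as the best so far, the tracked best reward never decreases, which is the "climbing" behavior the text describes. In a real environment the reward of a single episode is noisy, so the curve of raw episode rewards would be less smooth than in this toy setting.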
