- PyTorch 1.x Reinforcement Learning Cookbook
- Yuxi (Hayden) Liu
Developing the hill-climbing algorithm
As we saw with the random search policy, each episode is independent: all episodes can be run in parallel, and the weight that achieves the best performance is simply selected at the end. The reward-versus-episode plot confirms this, showing no upward trend. In this recipe, we will develop a different algorithm, hill climbing, which transfers the knowledge acquired in one episode to the next.
In the hill-climbing algorithm, we also start with a randomly chosen weight, but in every episode we add some noise to it. If the total reward improves, we update the weight with the new one; otherwise, we keep the old weight. This way, the weight improves gradually as we progress through the episodes, rather than jumping around from one episode to the next.
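The perturb-and-keep-if-better loop can be sketched in plain Python. Here a hypothetical toy quadratic reward stands in for an episode's total return (in the actual recipe, the reward would come from running the linear policy in the CartPole environment); the function name, noise scale, and episode count are illustrative assumptions, not the book's exact values.

```python
import random

def hill_climb(evaluate, dim, episodes=200, noise_scale=0.1, seed=0):
    """Generic hill climbing: perturb the current best weight each episode
    and keep the perturbed weight only if the total reward improves."""
    rng = random.Random(seed)
    best_weight = [rng.uniform(-1, 1) for _ in range(dim)]
    best_reward = evaluate(best_weight)
    for _ in range(episodes):
        # Add Gaussian noise to the current best weight (illustrative scale).
        candidate = [w + noise_scale * rng.gauss(0, 1) for w in best_weight]
        reward = evaluate(candidate)
        if reward > best_reward:
            # Improvement: carry the new weight into the next episode.
            best_weight, best_reward = candidate, reward
    return best_weight, best_reward

# Toy stand-in for episode return: peaks at w = (0.5, -0.3).
def toy_reward(w):
    return -((w[0] - 0.5) ** 2 + (w[1] + 0.3) ** 2)

weights, reward = hill_climb(toy_reward, dim=2)
```

Because a candidate is only accepted when it scores higher, the best reward is non-decreasing across episodes, which is exactly why the reward curve trends upward here, unlike in random search.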