- PyTorch 1.x Reinforcement Learning Cookbook
- Yuxi (Hayden) Liu
Developing the hill-climbing algorithm
As we saw with the random search policy, each episode is independent. In fact, all episodes in random search can be run in parallel, with the weight that achieves the best performance selected at the end. The plot of reward versus episode confirms this independence: there is no upward trend. In this recipe, we will develop a different algorithm, the hill-climbing algorithm, which transfers the knowledge acquired in one episode to the next.
In the hill-climbing algorithm, we also start with a randomly chosen weight. But here, for every episode, we add some noise to the weight. If the total reward improves, we update the weight with the new one; otherwise, we keep the old weight. In this approach, the weight is gradually improved as we progress through the episodes, instead of jumping around in each episode.
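The update loop described above can be sketched as follows. This is a minimal illustration of the hill-climbing rule, not the book's full CartPole recipe: the environment rollout is replaced by a toy `run_episode` function with a hypothetical hidden target vector, so the example stays self-contained. The weight dimension, noise scale, and episode count are arbitrary choices for this sketch.

```python
import torch

torch.manual_seed(0)

def run_episode(weight):
    # Toy stand-in for an environment rollout: the "total reward" is
    # higher the closer the weight is to a hidden target vector
    # (hypothetical objective, in place of CartPole returns).
    target = torch.tensor([0.5, -0.3, 0.8, 0.1])
    return -torch.norm(weight - target).item()

n_episodes = 500
noise_scale = 0.1

best_weight = torch.rand(4)            # start with a randomly chosen weight
best_reward = run_episode(best_weight)
initial_reward = best_reward

for episode in range(n_episodes):
    # Add some noise to the current best weight
    candidate_weight = best_weight + noise_scale * torch.randn(4)
    reward = run_episode(candidate_weight)
    # Keep the new weight only if the total reward improves;
    # otherwise stay with the old weight
    if reward > best_reward:
        best_weight, best_reward = candidate_weight, reward
```

Because a candidate weight is accepted only when it improves the reward, the weight can never get worse across episodes; it climbs gradually toward better-performing regions rather than jumping around at random.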