
Developing a policy gradient algorithm

The last recipe of the first chapter is about solving the CartPole environment with a policy gradient algorithm. This may be more sophisticated than we need for this simple problem, for which the random search and hill-climbing algorithms suffice. However, it is a great algorithm to learn, and we will use it in more complicated environments later in the book.

In the policy gradient algorithm, the model weight moves in the direction of the gradient at the end of each episode. We will explain the computation of gradients in the next section. Also, at each step, it samples an action from the policy, based on the probabilities computed from the state and the weight. It no longer takes an action with certainty, in contrast to random search and hill climbing, which always pick the action with the higher score. Hence, the policy switches from deterministic to stochastic.
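To make the idea concrete, here is a minimal sketch of such an agent for CartPole, assuming a linear policy whose weight is a 4 x 2 matrix mapping the state to action preferences. The run_episode helper, the learning rate of 0.001, and the scaling of each step's gradient by the reward collected from that step onward are illustrative assumptions, not a prescription; the sketch also assumes the classic gym API, in which step returns four values:

```python
import gym
import torch

env = gym.make('CartPole-v0')
n_state = env.observation_space.shape[0]  # 4 observation dimensions
n_action = env.action_space.n             # 2 actions: push left, push right

def run_episode(env, weight):
    """Play one episode with a stochastic softmax policy and collect
    the gradient of log pi(action | state) at every step."""
    state = env.reset()
    grads = []
    total_reward = 0
    is_done = False
    while not is_done:
        state = torch.from_numpy(state).float()
        z = torch.matmul(state, weight)
        probs = torch.nn.Softmax(dim=0)(z)
        # Sample instead of taking the argmax: the policy is stochastic
        action = int(torch.multinomial(probs, 1).item())
        # Gradient of the log-probability: d log p(a) / d z_j = delta_aj - p_j,
        # lifted to the weight matrix via the outer product with the state
        d_log = -probs
        d_log[action] += 1
        grads.append(state.view(-1, 1) * d_log)
        state, reward, is_done, _ = env.step(action)
        total_reward += reward
    return total_reward, grads

learning_rate = 0.001
weight = torch.rand(n_state, n_action)
for episode in range(300):
    total_reward, grads = run_episode(env, weight)
    # Update the weight at the end of the episode, scaling each step's
    # gradient by the reward obtained from that step onward (each CartPole
    # step yields a reward of 1, so that remaining reward is total_reward - t)
    for t, grad in enumerate(grads):
        weight += learning_rate * grad * (total_reward - t)
    print('Episode {}: total reward {}'.format(episode, total_reward))
```

The sampling step with torch.multinomial is what makes the policy stochastic; replacing it with an argmax over the probabilities would recover a deterministic policy like the one used in random search and hill climbing.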
