
Developing a policy gradient algorithm

The last recipe of the first chapter is about solving the CartPole environment with a policy gradient algorithm. This may be more sophisticated than this simple problem requires, where the random search and hill-climbing algorithms already suffice. However, it is a great algorithm to learn, and we will use it in more complicated environments later in the book.

In the policy gradient algorithm, the model weights move in the direction of the gradient at the end of each episode. We will explain the computation of gradients in the next section. Also, at each step, it samples an action from the policy, based on the probabilities computed using the state and the weights. In contrast with random search and hill climbing, which always take the action achieving the higher score, the action is no longer chosen with certainty. Hence, the policy switches from deterministic to stochastic.
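To make this concrete, here is a minimal sketch of the idea, assuming a linear policy whose action scores are turned into probabilities by a softmax, and the classic Gym API (env.step returning four values); the learning rate and episode count are illustrative choices, not prescribed values:

```python
import gym
import torch

env = gym.make('CartPole-v0')
n_state = env.observation_space.shape[0]   # 4 state dimensions
n_action = env.action_space.n              # 2 actions
learning_rate = 0.001                      # illustrative value

def run_episode(env, weight):
    """Run one episode, sampling each action from the stochastic policy,
    and record the gradient of the log-probability at every step."""
    state = env.reset()
    grads, total_reward, is_done = [], 0, False
    while not is_done:
        state = torch.from_numpy(state).float()
        # Linear scores per action, turned into probabilities by softmax
        probs = torch.nn.Softmax(dim=0)(torch.matmul(state, weight))
        # Sample an action according to its probability (stochastic policy)
        action = int(torch.multinomial(probs, 1).item())
        # For a linear-softmax policy, the gradient of log pi(a|s) w.r.t.
        # the weight is outer(state, onehot(action) - probs)
        one_hot = torch.zeros(n_action)
        one_hot[action] = 1.0
        grads.append(torch.outer(state, one_hot - probs))
        state, reward, is_done, _ = env.step(action)
        total_reward += reward
    return total_reward, grads

weight = torch.rand(n_state, n_action)
for episode in range(1000):
    total_reward, grads = run_episode(env, weight)
    # At the end of the episode, move the weights in the gradient direction,
    # scaling each step's gradient by the reward collected after that step
    # (each step yields reward 1, so the reward-to-go is total_reward - i)
    for i, grad in enumerate(grads):
        weight += learning_rate * grad * (total_reward - i)
    print('Episode {}: {}'.format(episode, total_reward))
```

Note that each step's gradient is scaled by the reward accumulated after that step, so actions taken early in a long-lasting episode are reinforced more strongly than those taken just before the pole falls.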
