Developing a policy gradient algorithm

The last recipe of the first chapter is about solving the CartPole environment with a policy gradient algorithm. This may be more complicated than we need for this simple problem, for which the random search and hill-climbing algorithms suffice. However, it is a great algorithm to learn, and we will use it in more complicated environments later in the book.

In the policy gradient algorithm, the model weights move in the direction of the gradient at the end of each episode. We will explain the computation of gradients in the next section. Also, at each step, the algorithm samples an action from the policy, according to probabilities computed from the current state and the weights. It no longer takes an action with certainty, in contrast with random search and hill climbing, which always take the action with the higher score. Hence, the policy switches from deterministic to stochastic.
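The sampling step described above can be sketched as follows. This is a minimal illustration, not the recipe's actual implementation: it assumes a linear mapping from state to per-action scores, a softmax to turn scores into probabilities, and the state/weight shapes of CartPole (4 state values, 2 actions); the function names are hypothetical.

```python
import math
import random

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(state, weight):
    # One score per action: dot product of the state with each weight column.
    scores = [sum(s * w for s, w in zip(state, col)) for col in zip(*weight)]
    probs = softmax(scores)
    # Sample an action from the probabilities instead of always taking
    # the argmax -- this is what makes the policy stochastic.
    r, cum = random.random(), 0.0
    for action, p in enumerate(probs):
        cum += p
        if r < cum:
            return action, probs
    return len(probs) - 1, probs

# Hypothetical CartPole-shaped example: 4 state values, 2 actions.
state = [0.1, -0.2, 0.05, 0.3]
weight = [[0.5, -0.5], [0.1, 0.2], [0.0, 0.3], [0.4, -0.1]]
action, probs = sample_action(state, weight)
```

With the same state and weights, repeated calls can return different actions, whereas the hill-climbing agent would always pick the action with the higher score.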
