Implementing and evaluating a random search policy

Now that we have had some practice with PyTorch programming, from this recipe onward we will work on policies for the CartPole problem that are more sophisticated than purely random actions. We start with the random search policy.

A simple yet effective approach is to map an observation to a vector of two numbers, one for each action, and pick the action with the higher value. This linear mapping is represented by a weight matrix of size 4 x 2, since the observations are 4-dimensional and there are two possible actions. In each episode, a weight matrix is randomly generated and used to compute the action at every step of that episode, and the total reward for the episode is recorded. This process repeats for many episodes, and at the end the weight that achieved the highest total reward becomes the learned policy. The approach is called random search because the weight is picked at random in each trial, in the hope that the best weight will be found given a large number of trials.
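The procedure above can be sketched as follows. This is a minimal, self-contained illustration rather than the recipe's own code: to keep it runnable without Gym or PyTorch installed, it uses plain Python and a hypothetical `MiniCartPole` class that re-implements the classic cart-pole dynamics, with an assumed episode cap of 200 steps. The names `choose_action`, `run_episode`, and `random_search` are likewise illustrative. In PyTorch, the weight would typically be drawn with `torch.rand(4, 2)`, the action scores computed with `torch.matmul(state, weight)`, and the action picked with `torch.argmax`.

```python
import math
import random


class MiniCartPole:
    """Hypothetical stand-in for Gym's CartPole: classic cart-pole dynamics."""
    GRAVITY, MASS_CART, MASS_POLE = 9.8, 1.0, 0.1
    HALF_LENGTH, FORCE_MAG, TAU = 0.5, 10.0, 0.02
    THETA_LIMIT = 12 * 2 * math.pi / 360  # pole angle limit, about 12 degrees
    X_LIMIT = 2.4                         # cart position limit

    def __init__(self, rng, max_steps=200):  # 200-step cap is an assumption
        self.rng = rng
        self.max_steps = max_steps

    def reset(self):
        # Start near the upright position with small random perturbations
        self.state = [self.rng.uniform(-0.05, 0.05) for _ in range(4)]
        self.steps = 0
        return list(self.state)

    def step(self, action):
        x, x_dot, theta, theta_dot = self.state
        force = self.FORCE_MAG if action == 1 else -self.FORCE_MAG
        total_mass = self.MASS_CART + self.MASS_POLE
        pole_ml = self.MASS_POLE * self.HALF_LENGTH
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        temp = (force + pole_ml * theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (self.GRAVITY * sin_t - cos_t * temp) / (
            self.HALF_LENGTH * (4.0 / 3.0 - self.MASS_POLE * cos_t ** 2 / total_mass))
        x_acc = temp - pole_ml * theta_acc * cos_t / total_mass
        # Explicit Euler integration over one time step
        x += self.TAU * x_dot
        x_dot += self.TAU * x_acc
        theta += self.TAU * theta_dot
        theta_dot += self.TAU * theta_acc
        self.state = [x, x_dot, theta, theta_dot]
        self.steps += 1
        done = (abs(x) > self.X_LIMIT or abs(theta) > self.THETA_LIMIT
                or self.steps >= self.max_steps)
        return list(self.state), 1.0, done  # reward of 1 for every step taken


def choose_action(state, weight):
    # Linear map: 4-dim observation times 4x2 weight gives one score per action
    scores = [sum(state[i] * weight[i][a] for i in range(4)) for a in range(2)]
    return 0 if scores[0] >= scores[1] else 1  # pick the higher-scoring action


def run_episode(env, weight):
    # Play one episode with a fixed weight and return its total reward
    state, total_reward, done = env.reset(), 0.0, False
    while not done:
        state, reward, done = env.step(choose_action(state, weight))
        total_reward += reward
    return total_reward


def random_search(env, n_episodes, rng):
    best_reward, best_weight = -1.0, None
    for _ in range(n_episodes):
        # Fresh random 4x2 weight drawn uniformly from [0, 1), like torch.rand(4, 2)
        weight = [[rng.random() for _ in range(2)] for _ in range(4)]
        reward = run_episode(env, weight)
        if reward > best_reward:  # keep the weight with the highest total reward
            best_reward, best_weight = reward, weight
    return best_weight, best_reward


rng = random.Random(0)
env = MiniCartPole(rng)
best_weight, best_reward = random_search(env, n_episodes=1000, rng=rng)
print(f"Best total reward during search: {best_reward}")

# Evaluate the learned policy (the best weight) on new episodes
eval_rewards = [run_episode(env, best_weight) for _ in range(100)]
print(f"Average total reward over 100 evaluation episodes: "
      f"{sum(eval_rewards) / len(eval_rewards)}")
```

Keeping the best weight found during the search and then re-running it on fresh episodes separates "learning" from "evaluation": a weight that scored well once may have been lucky with its initial state, so the average over new episodes is the more honest measure of the learned policy.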

主站蜘蛛池模板: 方城县| 沾益县| 松滋市| 蕲春县| 南通市| 五华县| 通城县| 乌海市| 洱源县| 益阳市| 永年县| 建始县| 大荔县| 云梦县| 石棉县| 舒城县| 青河县| 达孜县| 萍乡市| 类乌齐县| 治多县| 建昌县| 兴宁市| 汽车| 兰西县| 石门县| 新野县| 延寿县| 怀宁县| 华宁县| 九龙县| 邵阳县| 伊通| 江川县| 留坝县| 商洛市| 湘乡市| 黑山县| 鹤岗市| 深水埗区| 泸西县|