
Chapter 4. The Cross-Entropy Method

In this chapter, we will wrap up part one of the book and get familiar with one of the RL methods: cross-entropy. Although it is much less famous than other tools in the RL practitioner's toolbox, such as deep Q-network (DQN) or Advantage Actor-Critic, this method has its own strengths. The most important are as follows:

  • Simplicity: The cross-entropy method is really simple, which makes it intuitive to follow. For example, its implementation in PyTorch takes fewer than 100 lines of code.
  • Good convergence: In simple environments that have short episodes with frequent rewards and don't require complex, multistep policies to be learned, cross-entropy usually works very well. Of course, many practical problems don't fall into this category, but some do. In such cases, cross-entropy (on its own or as part of a larger system) can be the perfect fit.

In the following sections, we will start from the practical side of cross-entropy and look at how it works in two Gym environments: the familiar CartPole and the "grid world" of FrozenLake. Then, at the end of the chapter, we will take a look at the theoretical background of the method. That final section is optional and requires a bit more knowledge of probability and statistics, but it is worth reading if you want to understand why the method works.
