Chapter 4. The Cross-Entropy Method

In this chapter, we will wrap up Part One of the book and get familiar with one of the RL methods: cross-entropy. Although it is much less famous than other tools in the RL practitioner's toolbox, such as deep Q-network (DQN) or Advantage Actor-Critic, this method has its own strengths. The most important are as follows:

  • Simplicity: The cross-entropy method is very simple, which makes it intuitive to follow. For example, its implementation in PyTorch takes fewer than 100 lines of code.
  • Good convergence: In simple environments that have short episodes with frequent rewards and don't require complex, multistep policies to be learned, cross-entropy usually works very well. Many practical problems don't fall into this category, but some do, and in those cases cross-entropy (on its own or as part of a larger system) can be the perfect fit.

In the following sections, we will start with the practical side of cross-entropy and then look at how it works in two Gym environments (the familiar CartPole and the "grid world" of FrozenLake). At the end of the chapter, we will look at the theoretical background of the method. That final section is optional and requires a bit more knowledge of probability and statistics, but it is worth reading if you want to understand why the method works.
