- Deep Reinforcement Learning Hands-On
- Maxim Lapan
Chapter 4. The Cross-Entropy Method
In this chapter, we will wrap up part one of the book and get familiar with one of the RL methods: the cross-entropy method. Despite being much less famous than other tools in the RL practitioner's toolbox, such as deep Q-network (DQN) or Advantage Actor-Critic, this method has its own strengths. The most important are as follows:
- Simplicity: The cross-entropy method is really simple, which makes it an intuitive method to follow. For example, its implementation in PyTorch is less than 100 lines of code (a minimal sketch of the core training loop follows this list).
- Good convergence: In simple environments that don't require complex, multistep policies to be discovered and learned, and that have short episodes with frequent rewards, cross-entropy usually works very well. Of course, lots of practical problems don't fall into this category, but sometimes they do. In such cases, cross-entropy (on its own or as a part of a larger system) can be the perfect fit.
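To make the "less than 100 lines" claim concrete, here is a minimal sketch of the cross-entropy training loop on CartPole. It is not the book's exact code: it assumes the classic Gym API (gym &lt; 0.26, where `reset()` returns only the observation and `step()` returns four values), and the hyperparameters (batch size 16, 70th percentile, hidden size 128, learning rate 0.01, 50 epochs) are illustrative choices, not the author's values.

```python
# Cross-entropy method sketch for CartPole (assumes the classic gym<0.26 API).
import gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
obs_size = env.observation_space.shape[0]
n_actions = env.action_space.n

# Policy network: observation -> action logits.
net = nn.Sequential(nn.Linear(obs_size, 128), nn.ReLU(), nn.Linear(128, n_actions))
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()


def play_episode():
    """Play one episode with the current policy; return total reward and (obs, action) pairs."""
    obs, steps, total_reward, done = env.reset(), [], 0.0, False
    while not done:
        probs = torch.softmax(net(torch.as_tensor(obs, dtype=torch.float32)), dim=-1)
        action = np.random.choice(n_actions, p=probs.detach().numpy())
        steps.append((obs, action))
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward, steps


for epoch in range(50):
    # 1. Sample a batch of episodes with the current (stochastic) policy.
    batch = [play_episode() for _ in range(16)]
    rewards = [r for r, _ in batch]

    # 2. Keep only the "elite" episodes whose total reward is above a percentile.
    bound = np.percentile(rewards, 70)
    train_obs, train_act = [], []
    for r, steps in batch:
        if r >= bound:
            train_obs.extend(o for o, _ in steps)
            train_act.extend(a for _, a in steps)

    # 3. Train the policy to imitate the actions taken in elite episodes
    #    by minimizing cross-entropy between its logits and those actions.
    optimizer.zero_grad()
    logits = net(torch.as_tensor(np.array(train_obs), dtype=torch.float32))
    loss = loss_fn(logits, torch.as_tensor(train_act, dtype=torch.long))
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: mean reward {np.mean(rewards):.1f}, loss {loss.item():.3f}")
```

The sample-filter-train loop above is the whole method: better episodes shift the policy toward the actions that produced them, which in turn produces better episodes in the next batch.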
In the following sections, we will start with the practical side of cross-entropy and then look at how it works in two Gym environments (the familiar CartPole and the "grid world" of FrozenLake). At the end of the chapter, we will take a look at the theoretical background of the method. This section is optional and requires a bit more knowledge of probability and statistics, but if you want to understand why the method works, you can delve into it.