
Model-free and model-based training

RL algorithms that do not learn a model of how the environment works are called model-free algorithms; if a model of the environment is constructed, the algorithm is called model-based. In general, algorithms that evaluate performance using only value (V) or action-value (Q) functions are model-free, as no explicit model of the environment is involved. On the other hand, if you build a model that predicts how the environment transitions from one state to another, or how much reward the agent will receive for a given transition, the algorithm is model-based.
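To make the model-free case concrete, the snippet below is a minimal sketch of a tabular Q-learning update: the action-value estimate is adjusted directly from an observed transition, without ever representing the environment's dynamics. The function name, the tabular representation, and the hyperparameter values are illustrative assumptions, not a fixed API.

import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Model-free update: move Q(s, a) toward the observed reward
    plus the discounted value of the best next action. No transition
    or reward model of the environment is used."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Note that the only inputs are quantities the agent actually observed (s, a, r, s_next); nothing in the update requires knowing how the environment produced them.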

In model-free algorithms, as mentioned earlier, we do not construct a model of the environment, so the agent has to try an action in a state to find out whether it was a good or a bad choice. In model-based RL, an approximate model of the environment is learned, either jointly with the policy or a priori. This model of the environment is used both to make decisions and to train the policy. We will learn more about both classes of RL algorithms in later chapters.
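By contrast, the sketch below follows a Dyna-Q-style recipe as one example of jointly learning a model alongside the policy: the agent records an approximate model of the environment (which reward and next state followed each state-action pair) and then replays simulated transitions from that model to further train the same Q-function. The deterministic tabular model and all parameter names here are assumptions for illustration only.

import random
import numpy as np

def dyna_q_step(Q, model, s, a, r, s_next,
                alpha=0.1, gamma=0.99, planning_steps=10):
    """Model-based flavor (Dyna-Q style): learn from the real
    transition, store it in an approximate environment model,
    then train on simulated experience drawn from that model."""
    # 1. Direct update from the real transition (as in Q-learning).
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    # 2. Update the learned environment model: here a simple
    #    deterministic table mapping (s, a) -> (r, s_next).
    model[(s, a)] = (r, s_next)
    # 3. Planning: replay transitions sampled from the model,
    #    applying the same update to simulated experience.
    for _ in range(planning_steps):
        (ms, ma), (mr, ms_next) = random.choice(list(model.items()))
        Q[ms, ma] += alpha * (mr + gamma * np.max(Q[ms_next]) - Q[ms, ma])
    return Q, model

The planning loop is what distinguishes this from the purely model-free update: each real interaction with the environment is amplified by several cheap, simulated updates generated from the learned model.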
