
Model-free and model-based training

RL algorithms that do not learn a model of how the environment works are called model-free algorithms. By contrast, algorithms that construct such a model are called model-based. In general, algorithms that evaluate performance using only value (V) or action-value (Q) functions are model-free, as no explicit model of the environment is used. On the other hand, if you build a model of how the environment transitions from one state to another, or of how many rewards the agent will receive for its actions, the algorithm is model-based.
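
To make the model-free side of this distinction concrete, here is a minimal sketch of tabular Q-learning, a representative model-free algorithm. The state and action counts, the step size alpha, and the discount gamma are illustrative values, not ones taken from the text:

```python
import numpy as np

# Illustrative sizes and hyperparameters (not from the text).
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.99
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """Model-free update: the agent never models the environment;
    it only adjusts its action-value estimate from the observed
    transition (s, a, r, s_next)."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```

Note that the update touches only the Q-table and the single observed transition; nothing about the environment's dynamics is ever stored or predicted.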

In model-free algorithms, as mentioned above, we do not construct a model of the environment. The agent therefore has to take an action in a state to find out whether it was a good or a bad choice. In model-based RL, an approximate model of the environment is learned, either jointly with the policy or a priori. This model of the environment is used to make decisions, as well as to train the policy. We will learn more about both classes of RL algorithms in later chapters.
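
For contrast, here is a minimal model-based sketch in the style of Dyna-Q: the agent records observed transitions as an approximate (here, deterministic and tabular) model, then replays simulated experience drawn from that model to train its value estimates without touching the real environment. All names and hyperparameters are illustrative:

```python
import numpy as np

# Learned model: maps (state, action) to the observed (reward, next_state).
model = {}

def record_transition(s, a, r, s_next):
    """Learn an approximate, tabular model of the environment."""
    model[(s, a)] = (r, s_next)

def plan(Q, n_steps, alpha=0.1, gamma=0.99, rng=np.random.default_rng()):
    """Generate simulated transitions from the learned model and apply
    the same Q-update, i.e., train the policy from the model."""
    keys = list(model.keys())
    if not keys:
        return
    for _ in range(n_steps):
        s, a = keys[rng.integers(len(keys))]
        r, s_next = model[(s, a)]
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```

The key difference from the model-free sketch is the intermediate `model` dictionary: once learned, it can stand in for the environment, letting the agent improve its policy from simulated rather than real experience.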
