Markov Decision Process

A Markov Decision Process (MDP) is an extension of the Markov chain. It provides a mathematical framework for modeling decision-making situations. Almost all Reinforcement Learning problems can be modeled as an MDP.

An MDP is represented by five important elements:

  • A set of states ($S$) the agent can actually be in.
  • A set of actions ($A$) that can be performed by an agent to move from one state to another.
  • A transition probability ($P^a_{ss'}$), which is the probability of moving from one state $s$ to another state $s'$ by performing some action $a$.
  • A reward probability ($R^a_{ss'}$), which is the probability of a reward acquired by the agent for moving from one state $s$ to another state $s'$ by performing some action $a$.
  • A discount factor ($\gamma$), which controls the importance of immediate and future rewards. We will discuss this in detail in the upcoming sections.
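
To make the five elements concrete, here is a minimal Python sketch of how they might be stored together. Everything in it is an assumption for illustration: the `MDP` class, the `step` and `discounted_return` helpers, and the two-state toy problem are hypothetical and not taken from the text. For simplicity the reward is stored as a fixed number per transition.

```python
from dataclasses import dataclass
import random


@dataclass
class MDP:
    """Hypothetical container for the five elements of an MDP."""
    states: list        # set of states
    actions: list       # set of actions
    transitions: dict   # (state, action) -> {next_state: probability}
    rewards: dict       # (state, action, next_state) -> reward
    gamma: float        # discount factor

    def step(self, state, action, rng=random):
        """Sample a next state from the transition probabilities and look up its reward."""
        dist = self.transitions[(state, action)]
        next_states = list(dist)
        weights = [dist[s] for s in next_states]
        next_state = rng.choices(next_states, weights=weights)[0]
        # Transitions without an explicit entry are assumed to give zero reward.
        reward = self.rewards.get((state, action, next_state), 0.0)
        return next_state, reward


def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Illustrative two-state problem (assumed values, not from the text).
toy = MDP(
    states=["A", "B"],
    actions=["stay", "move"],
    transitions={
        ("A", "stay"): {"A": 1.0},
        ("A", "move"): {"A": 0.2, "B": 0.8},
        ("B", "stay"): {"B": 1.0},
        ("B", "move"): {"A": 0.8, "B": 0.2},
    },
    rewards={("A", "move", "B"): 1.0},
    gamma=0.9,
)
```

Calling `toy.step("A", "move")` returns a sampled next state together with its reward. In `discounted_return`, a discount factor close to 0 makes the immediate reward dominate, while a value close to 1 gives later rewards nearly equal weight.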