
Markov Decision Process

The Markov Decision Process (MDP) is an extension of the Markov chain. It provides a mathematical framework for modeling decision-making situations. Almost all Reinforcement Learning problems can be modeled as an MDP.

An MDP is represented by five important elements:

  • A set of states ($S$) the agent can actually be in.
  • A set of actions ($A$) that can be performed by an agent for moving from one state to another.
  • A transition probability ($P^a_{ss'}$), which is the probability of moving from one state $s$ to another state $s'$ by performing some action $a$.
  • A reward probability ($R^a_{ss'}$), which is the probability of a reward acquired by the agent for moving from one state $s$ to another state $s'$ by performing some action $a$.
  • A discount factor ($\gamma$), which controls the importance of immediate and future rewards. We will discuss this in detail in the upcoming sections.
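As a minimal sketch, assuming a small finite MDP with string-labelled states, the five elements above can be held in a single Python structure. The class name `MDP`, the dictionary layout keyed by `(s, a, s')`, the helpers `check`, `step` and `discounted_return`, and the two-state `example` are all hypothetical illustrations rather than code from the source. One simplification is also assumed: the reward for each transition is stored as a single fixed value instead of a full reward distribution.

```python
import random
from dataclasses import dataclass


@dataclass
class MDP:
    """A finite MDP held as the five elements (S, A, P, R, gamma)."""
    states: list[str]                     # S
    actions: list[str]                    # A
    P: dict[tuple[str, str, str], float]  # P[(s, a, s2)] = P^a_{ss'}
    R: dict[tuple[str, str, str], float]  # R[(s, a, s2)]: simplified to a fixed reward value
    gamma: float                          # discount factor

    def check(self) -> None:
        """Raise if any (s, a) pair's outgoing probabilities do not sum to 1."""
        for s in self.states:
            for a in self.actions:
                total = sum(self.P.get((s, a, s2), 0.0) for s2 in self.states)
                if abs(total - 1.0) > 1e-9:
                    raise ValueError(f"P(. | {s}, {a}) sums to {total}, not 1")

    def step(self, s: str, a: str, rng: random.Random) -> tuple[str, float]:
        """Sample s' from P(. | s, a) and return it with the transition's reward."""
        weights = [self.P.get((s, a, s2), 0.0) for s2 in self.states]
        s2 = rng.choices(self.states, weights=weights, k=1)[0]
        return s2, self.R.get((s, a, s2), 0.0)


def discounted_return(rewards: list[float], gamma: float) -> float:
    """Sum r_0 + gamma * r_1 + gamma**2 * r_2 + ... for one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical two-state example: "move" succeeds with probability 0.8.
example = MDP(
    states=["A", "B"],
    actions=["stay", "move"],
    P={
        ("A", "stay", "A"): 1.0,
        ("A", "move", "B"): 0.8, ("A", "move", "A"): 0.2,
        ("B", "stay", "B"): 1.0,
        ("B", "move", "A"): 0.8, ("B", "move", "B"): 0.2,
    },
    R={("A", "move", "B"): 1.0},  # only reaching B from A pays off
    gamma=0.9,
)

if __name__ == "__main__":
    example.check()
    rng = random.Random(0)
    print(example.step("A", "move", rng))
    print(discounted_return([1.0, 0.0, 1.0], example.gamma))  # ≈ 1.81
    print(discounted_return([1.0, 0.0, 1.0], 0.0))            # 1.0
```

The two `discounted_return` calls show the role of $\gamma$: with $\gamma = 0$ only the immediate reward counts, while a value close to 1 lets later rewards contribute almost as much as the first one.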