
Markov chains

MDPs are based on Markov chains. A Markov chain is a model of a sequence of events in which the probability of the next state depends only on the current state, not on the states that came before it.

Here's a very simple example of a process that can be modeled as a Markov chain. Suppose that we're playing a game with a binary outcome repeatedly—at the end of every round of the game, we will either win or lose:

The chain has two states and the system moves back and forth between them. The arrows between them are labeled with the probabilities of moving to the corresponding states. If we are in the Win state, there is a 60% chance that we will stay in Win and a 40% chance that we will move to Lose at the next timestep (the time it takes to play another round). Likewise, if we lose this round, there is a 70% chance that we will win the next round and a 30% chance that we will lose again.
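To make the diagram concrete, here is a minimal Python sketch (the names `transition_probs` and `simulate_chain` are illustrative, not from the text) that encodes the Win/Lose chain and simulates a number of rounds:

```python
import random

# Transition probabilities for the two-state Win/Lose chain described above.
# transition_probs[current_state][next_state] = P(next_state | current_state)
transition_probs = {
    "Win":  {"Win": 0.6, "Lose": 0.4},
    "Lose": {"Win": 0.7, "Lose": 0.3},
}

def simulate_chain(start_state, num_steps):
    """Simulate the chain for num_steps rounds and return the visited states."""
    state = start_state
    history = [state]
    for _ in range(num_steps):
        next_states = list(transition_probs[state].keys())
        weights = list(transition_probs[state].values())
        state = random.choices(next_states, weights=weights)[0]
        history.append(state)
    return history

print(simulate_chain("Win", 10))  # e.g. ['Win', 'Win', 'Lose', 'Win', ...]
```

Note that each step only looks up `transition_probs[state]` for the current state; nothing about earlier rounds is needed to pick the next one.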

As another example, suppose that we're playing a simple stochastic game that pays out rewards. Here is a partial state diagram of that game; we start in the state marked $5:

Every time we flip a fair coin, we win $1 if the coin comes up as heads or lose $1 if it comes up as tails. We start with $5 and flip the coin three times. A more general version of this process is called a random walk. The following tree diagram demonstrates a two-step random walk with a coin flip:
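In addition to the tree diagram, here is a rough sketch of how such a walk could be simulated in Python (the `random_walk` helper and its defaults are illustrative; only the starting amount of $5 and the $1 stake come from the example):

```python
import random

def random_walk(start_money=5, num_flips=3):
    """Flip a fair coin num_flips times; heads wins $1, tails loses $1."""
    money = start_money
    trajectory = [money]
    for _ in range(num_flips):
        flip = random.choice(["heads", "tails"])
        money += 1 if flip == "heads" else -1
        trajectory.append(money)
    return trajectory

print(random_walk())  # e.g. [5, 6, 5, 6] for heads, tails, heads
```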

The following are the results when we flip the coin three times:

  • First flip: Heads – $1 win
  • Second flip: Tails – $1 loss
  • Third flip: Heads – $1 win

We now have $6; we can think of this as our current state, which we will call S0. From this state, we can go to one of two states, as follows:

  • S1: The state of having $5 – 50% probability
  • S2: The state of having $7 – 50% probability

The event that will cause us to go into one of these two states is a coin flip; that is, it is a stochastic event that has a probability of 50% for either outcome (because we have defined this as a fair coin). 
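One way to express this, as a small illustrative sketch (the function name is hypothetical), is a transition distribution that takes only the current amount of money as input:

```python
def next_state_distribution(current_money):
    """P(next amount | current amount) for the fair-coin game.

    The distribution depends only on the current amount,
    never on how we arrived at it.
    """
    return {current_money - 1: 0.5, current_money + 1: 0.5}

print(next_state_distribution(6))  # {5: 0.5, 7: 0.5} -> states S1 and S2
```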

We know the current state we are in (the state of having $6) and the two possible outcomes of the coin flip event (either having $5 or having $7). Note that although we have been playing this game continuously, and our current state was reached through the states that came before it, we don't need to know how much money we started with (or how much we won or lost along the way) to know how much we will have after the next flip and what the probability of either outcome is.

In other words, we do not need knowledge of previous states to determine what the next state will be. Whatever our knowledge of previous states, our probability calculation for the next state will always be the same. 

This memorylessness is the defining characteristic of a problem that can be modeled as a Markov chain. If I have $X now, I know for sure that I will have either $X+1 or $X-1 after the next coin flip. It doesn't matter whether I had $X+1 or $X-1 before the most recent coin flip; for the purposes of predicting what I will have next, all that matters is what I have now. Knowing about the past will not change my predictions about the future.
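To see this empirically, the following sketch (illustrative code, not from the text) simulates many games and, whenever the bankroll first reaches $6, records the next amount together with the path that led there; every history ending at $6 predicts the next state with roughly the same 50/50 split between $5 and $7:

```python
import random
from collections import Counter, defaultdict

def flip():
    """A fair coin flip: +$1 for heads, -$1 for tails."""
    return 1 if random.random() < 0.5 else -1

# Simulate many short games. Whenever we first reach $6, record the next
# amount together with the path that led us to $6.
next_amount_by_history = defaultdict(Counter)
for _ in range(100_000):
    money, history = 5, (5,)
    for _ in range(4):
        if money == 6:
            next_amount_by_history[history][money + flip()] += 1
            break
        money += flip()
        history += (money,)

# Every distinct history ending in $6 predicts the next state the same way:
# roughly 50% for $5 and 50% for $7.
for history, counter in sorted(next_amount_by_history.items()):
    total = sum(counter.values())
    print(history, {amount: round(n / total, 2) for amount, n in counter.items()})
```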
