- Hands-On Q-Learning with Python
- Nazia Habib
The Markov property
A Markov chain has the following characteristic, called the Markov property:

P(S_{t+1} | S_t) = P(S_{t+1} | S_1, S_2, ..., S_t)

This states, mathematically, that the probability distribution of the next state depends only on the current state and not on any previous states. Given our knowledge of the current state, S_t, the probability of reaching S_{t+1} is the same as it would be if we also knew all of the states that came before.
To illustrate this further, let's talk about a different stochastic system, one where the Markov property won't apply. Suppose we are working on a job site and will be assigned one of three pieces of equipment at random on each of three days. Equipment is assigned without replacement, so a piece that has been handed out does not go back into the pool. Two of the pieces are functioning and one is non-functioning:
[Figure: the possible assignment sequences over three days, without replacement]
If we're assigned non-functioning equipment on Day 1, we know for sure that on Day 2 we will be assigned functioning equipment, since the only two pieces left in the pool are both functioning.
On the other hand, suppose we come onto the job site starting on Day 2 and are assigned functioning equipment, with no knowledge of what happened on Day 1. Then we have a 50% probability of getting either functioning or non-functioning equipment on Day 3. If we did have knowledge of what happened on Day 1 (that is, whether we received functioning or non-functioning equipment), we would know for sure what we would receive on Day 3.
Because our estimate of the probability of each outcome changes with how much of the system's history we know, this system does not have the Markov property: knowing information about the past changes our prediction of the future.
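To make this concrete, here is a minimal simulation sketch (ours, not code from the book) of the without-replacement system. It estimates the Day 3 probabilities with and without knowledge of Day 1:

```python
import random

def simulate_days():
    """One realization: assign 2 functioning ('F') and 1 non-functioning
    ('NF') piece across three days, without replacement."""
    pool = ['F', 'F', 'NF']
    random.shuffle(pool)
    return pool  # [day1, day2, day3]

trials = [simulate_days() for _ in range(100_000)]

# Knowing only that Day 2's equipment was functioning:
day2_f = [t for t in trials if t[1] == 'F']
p1 = sum(t[2] == 'F' for t in day2_f) / len(day2_f)

# Also knowing that Day 1's equipment was non-functioning:
day1_nf_day2_f = [t for t in trials if t[0] == 'NF' and t[1] == 'F']
p2 = sum(t[2] == 'F' for t in day1_nf_day2_f) / len(day1_nf_day2_f)

print(f"P(Day 3 = F | Day 2 = F)             ~ {p1:.2f}")  # ~0.50
print(f"P(Day 3 = F | Day 1 = NF, Day 2 = F) ~ {p2:.2f}")  # ~1.00
```

Conditioning on Day 1 moves the estimate from roughly 50% to certainty, which is exactly the history-dependence that breaks the Markov property.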
You can think of a system that has the Markov property as memoryless: having more information about the past will not change our prediction of the future. If we change the system we just described so that each piece of equipment is replaced in the pool after it is assigned, the system will have the Markov property. Many assignment sequences are now possible that weren't before:
[Figure: the possible assignment sequences over three days, with replacement]
In this case, if the only information we have is that we were assigned functioning equipment on Day 2, then on Day 3, we know we have a 50% chance of getting functioning equipment or non-functioning equipment.
Note that this probability calculation does not depend on the specific examples that we've chosen for the preceding chart! Think about flipping a fair coin 100 times; even if you get heads every single time, your odds of getting tails the next time are still 50%, if you're really dealing with a fair coin. Similarly, even if we are assigned non-functioning equipment every single day, our probability of getting functioning equipment the next day will still be 50%.
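To check this numerically, the following sketch (ours, not the book's) draws each day independently with the 50/50 probabilities the text assigns to this system, and estimates the chance of functioning equipment right after three non-functioning days in a row:

```python
import random

random.seed(0)
history = []
after_streak = 0    # draws that follow three 'NF' days in a row
after_streak_f = 0  # ...and that came up 'F'

for _ in range(1_000_000):
    draw = random.choice(['F', 'NF'])
    if history[-3:] == ['NF', 'NF', 'NF']:
        after_streak += 1
        if draw == 'F':
            after_streak_f += 1
    history.append(draw)

print(f"P(F | three NF days in a row) ~ {after_streak_f / after_streak:.2f}")  # ~0.50
```

Even conditioned on an unlucky streak, the estimate stays at 50%, just as with the fair coin.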
We can neatly model our new system as follows:
         to F    to NF
from F    0.5     0.5
from NF   0.5     0.5
If we are in state F today, we have a 50% chance of staying in state F or moving to state NF, and vice versa. Notice that this is true no matter how much information we include in our probability calculation. Previous events do not affect the probability of future events.
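As a sketch of how this two-state chain could be written in code (our illustration, using numpy, not code from the book), the transition table above becomes a matrix that we sample from:

```python
import numpy as np

states = ['F', 'NF']
P = np.array([[0.5, 0.5],   # from F:  P(next = F), P(next = NF)
              [0.5, 0.5]])  # from NF: P(next = F), P(next = NF)

rng = np.random.default_rng(0)
state = 0  # start in state F
for day in range(1, 6):
    state = rng.choice(len(states), p=P[state])
    print(f"Day {day}: {states[state]}")
```

Each step consults only the row for the current state, which is the Markov property expressed in code; in this particular chain both rows happen to be identical, so even the current state doesn't change the odds.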