- Python Reinforcement Learning
- Sudharsan Ravichandiran, Sean Saito, Rajalingappaa Shanmugamani, Yang Wenzhuo
- 2021-06-24 15:17:30
The Markov chain and Markov process
Before going into the Markov Decision Process (MDP), let us understand the Markov chain and the Markov process, which form the foundation of MDPs.
The Markov property states that the future depends only on the present and not on the past. A Markov chain is a probabilistic model that predicts the next state using only the current state, not the previous states; that is, given the present, the future is conditionally independent of the past. A Markov chain strictly follows the Markov property.
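Stated formally, with S_t denoting the state at time step t (a common notation assumed here, not taken from the text above), the Markov property says that conditioning on the whole history gives the same next-state distribution as conditioning on the current state alone:

```latex
P(S_{t+1} \mid S_t) = P(S_{t+1} \mid S_1, S_2, \ldots, S_t)
```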
For example, if we know that the current state is cloudy, we can predict that the next state could be rainy. We reached this conclusion by considering only the current state (cloudy), not the past states, which might have been sunny, windy, and so on. However, the Markov property does not hold for every process. Note also that the current state is not always informative: when throwing a die, the next number (the next state) has no dependency on the number that just showed up (the current state).
Moving from one state to another is called a transition, and its probability is called a transition probability. We can arrange the transition probabilities in a table, as shown next; this is called a Markov table. Given the current state, it shows the probability of moving to each possible next state:

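A Markov table maps naturally onto a nested Python dictionary: the outer key is the current state and the inner dictionary holds the probability of each next state. The sketch below assumes three weather states and uses hypothetical probabilities chosen only for illustration; the names `MARKOV_TABLE`, `check_rows`, and `most_likely_next` are likewise made up for this example.

```python
import math

# Hypothetical Markov table for the weather example (illustrative numbers only).
# Outer key: current state. Inner dict: P(next state | current state).
MARKOV_TABLE = {
    "sunny": {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
    "rainy": {"sunny": 0.1, "cloudy": 0.4, "rainy": 0.5},
}


def check_rows(table):
    """Raise ValueError unless every row is a probability distribution."""
    for state, row in table.items():
        total = sum(row.values())
        if any(p < 0 for p in row.values()) or not math.isclose(total, 1.0):
            raise ValueError(f"row {state!r} is not a valid distribution (sum={total})")


def most_likely_next(table, current):
    """Return the most probable next state, looking only at the current state."""
    row = table[current]
    return max(row, key=row.get)


check_rows(MARKOV_TABLE)
print(most_likely_next(MARKOV_TABLE, "cloudy"))  # rainy
```

Only the row for the current state is consulted when predicting, which is the Markov property expressed in code.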
We can also represent the Markov chain in the form of a state diagram that shows the transition probabilities:

The preceding state diagram shows the probability of moving from one state to another. Still don't understand the Markov chain? Okay, let us talk.
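Walking through a state diagram can be simulated by repeatedly sampling the next state from the current state's outgoing probabilities. This is a minimal sketch under stated assumptions: the weather states, their probabilities, and the helper names `next_state` and `simulate` are hypothetical, and a seeded `random.Random` is used only so that runs are reproducible.

```python
import random

# Hypothetical transition probabilities (illustrative numbers only).
TRANSITIONS = {
    "sunny": {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
    "rainy": {"sunny": 0.1, "cloudy": 0.4, "rainy": 0.5},
}


def next_state(current, rng):
    """Sample the next state using only the current state's outgoing edges."""
    row = TRANSITIONS[current]
    return rng.choices(list(row), weights=list(row.values()), k=1)[0]


def simulate(start, n_steps, seed=0):
    """Return a path of n_steps transitions starting from `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(next_state(path[-1], rng))
    return path


print(simulate("cloudy", 5))
```

Each call to `next_state` looks only at `path[-1]`, so the earlier part of the path never influences the next step.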
Me: "What are you doing?"
You: "I'm reading about the Markov chain."
Me: "What is your plan after reading?"
You: "I'm going to sleep."
Me: "Are you sure you're going to sleep?"
You: "Probably. I'll watch TV if I'm not sleepy."
Me: "Cool; this is also a Markov chain."
You: "Eh?"
We can formulate our conversation as a Markov chain and draw a state diagram as follows:

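The conversation chain can be written as a dictionary too, and the probability of a particular sequence of states is the product of the transition probabilities along it, because each step depends only on the state before it. The states follow the dialogue, but the probabilities and the `path_probability` helper are hypothetical: the dialogue only suggests that sleeping is more likely than watching TV.

```python
# Hypothetical probabilities for the conversation (illustrative only).
CONVERSATION = {
    "reading": {"sleeping": 0.7, "watching TV": 0.3},
    "watching TV": {"sleeping": 0.8, "watching TV": 0.2},
    "sleeping": {"sleeping": 1.0},
}


def path_probability(chain, path):
    """Multiply the transition probabilities along consecutive pairs of states."""
    prob = 1.0
    for current, nxt in zip(path, path[1:]):
        # A transition missing from the current state's row has probability 0.
        prob *= chain[current].get(nxt, 0.0)
    return prob


print(path_probability(CONVERSATION, ["reading", "watching TV", "sleeping"]))  # ≈ 0.24
```

With these made-up numbers, "read, then watch TV, then sleep" has probability 0.3 × 0.8 = 0.24.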
The core concept behind the Markov chain is that the future depends only on the present and not on the past. A stochastic process is called a Markov process if it follows the Markov property.
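Because a Markov process depends only on its present state, the probability of being in each state at the next time step can be computed from the current distribution alone: P(S_{t+1} = j) is the sum over i of P(S_t = i) · P(j | i). A minimal sketch, with hypothetical probabilities for the reading, watching TV, and sleeping states; the `CHAIN` dictionary and the `propagate` helper are made up for illustration.

```python
# Hypothetical transition probabilities (illustrative only).
CHAIN = {
    "reading": {"sleeping": 0.7, "watching TV": 0.3},
    "watching TV": {"sleeping": 0.8, "watching TV": 0.2},
    "sleeping": {"sleeping": 1.0},
}


def propagate(dist, chain):
    """One time step: P(S_{t+1}=j) = sum_i P(S_t=i) * P(j | i)."""
    nxt = {}
    for state, p in dist.items():
        for next_s, q in chain[state].items():
            nxt[next_s] = nxt.get(next_s, 0.0) + p * q
    return nxt


dist = {"reading": 1.0}  # start certain that we are reading
for t in range(1, 4):
    dist = propagate(dist, CHAIN)
    print(t, dist)
```

Each new distribution is computed from the previous one only; with these made-up numbers, probability accumulates in "sleeping" because it is the only state that never transitions elsewhere.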