- PyTorch 1.x Reinforcement Learning Cookbook
- Yuxi (Hayden) Liu
Creating a Markov chain
Let's get started by creating a Markov chain, on which the MDP is developed.
A Markov chain describes a sequence of events that comply with the Markov property. It is defined by a set of possible states, S = {s0, s1, ... , sm}, and a transition matrix, T(s, s'), consisting of the probabilities of state s transitioning to state s'. With the Markov property, the future state of the process, given the present state, is conditionally independent of past states. In other words, the state of the process at t+1 is dependent only on the state at t.

Here, we use a process of study and sleep as an example and create a Markov chain based on two states, s0 (study) and s1 (sleep). Let's say we have the following transition matrix:

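A minimal sketch of such a two-state transition matrix as a PyTorch tensor is shown below. The probabilities used here are illustrative assumptions; any values work as long as each row, which holds the outgoing probabilities of one state, sums to 1:

```python
import torch

# Transition matrix T(s, s') for the two states s0 (study) and s1 (sleep).
# Row i holds the probabilities of moving from state i to each state,
# so every row must sum to 1. These values are illustrative.
T = torch.tensor([[0.4, 0.6],   # from study: P(study) = 0.4, P(sleep) = 0.6
                  [0.8, 0.2]])  # from sleep: P(study) = 0.8, P(sleep) = 0.2
```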
In the next section, we will compute the transition matrix after k steps, and the probabilities of being in each state given an initial distribution of states, such as [0.7, 0.3], meaning there is a 70% chance that the process starts with study and a 30% chance that it starts with sleep.
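As a preview, here is a sketch of that computation in PyTorch, reusing the illustrative matrix above; k = 10 is an arbitrary choice:

```python
import torch

T = torch.tensor([[0.4, 0.6],
                  [0.8, 0.2]])

# The transition matrix after k steps is the k-th matrix power, T^k.
k = 10
T_k = torch.matrix_power(T, k)

# Probability of being in each state after k steps, given the
# initial distribution [0.7, 0.3] over (study, sleep).
v = torch.tensor([[0.7, 0.3]])
v_k = torch.mm(v, T_k)

print(T_k)  # k-step transition probabilities
print(v_k)  # state distribution after k steps
```

With an irreducible, aperiodic chain such as this illustrative one, v_k converges to a stationary distribution as k grows, regardless of the initial distribution.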