- Keras Reinforcement Learning Projects
- Giuseppe Ciaburro
Transition matrix
The study of time-homogeneous Markov chains (those whose transition probabilities do not depend on time) becomes particularly simple and effective using a matrix representation. In particular, the formula expressed in the previous proposition becomes much more readable. The structure of a Markov chain is therefore completely represented by the following transition matrix, whose (i, j) entry is the probability of moving from state Si to state Sj in one step:

P = (pij),  where pij = P(Xt+1 = Sj | Xt = Si),  for i, j = 1, …, n
The properties of transition probability matrices derive directly from the nature of their elements. Since the elements of the matrix are probabilities, each must have a value between 0 and 1; so, this is a non-negative matrix in which the elements of each row sum to 1. Indeed, the elements of the i-th row are the probabilities that the chain, being in state Si at instant t, moves to S1, or to S2, ..., or to Sn at the next step, and these transitions are mutually exclusive and exhaust all the possibilities. Such a matrix (non-negative, with unit row sums) is called stochastic, and we will likewise call stochastic any row vector with non-negative elements, as follows:

v = (v1, v2, …, vn),  with vi ≥ 0
In this vector, the sum of the elements is equal to 1, as shown in the following formula:

v1 + v2 + … + vn = 1
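The two conditions above (entries in [0, 1], rows summing to 1) are easy to verify numerically. The following is a minimal sketch in NumPy; the 3-state matrix `P` and the helper `is_stochastic` are illustrative choices, not from the book:

```python
import numpy as np

# Hypothetical 3-state transition matrix (values chosen for illustration).
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
])

def is_stochastic(matrix, tol=1e-9):
    """Return True if every entry lies in [0, 1] and each row sums to 1."""
    entries_ok = np.all((matrix >= 0) & (matrix <= 1))
    rows_ok = np.allclose(matrix.sum(axis=1), 1.0, atol=tol)
    return bool(entries_ok and rows_ok)

print(is_stochastic(P))    # True: each row of P sums to 1
print(is_stochastic(P.T))  # False: the columns of P need not sum to 1
```

Note that the transpose fails the check: being stochastic is a property of the rows, not the columns.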
Now we will see the particular form this matrix assumes in the case of the one-dimensional random walk. As we said previously, in a one-dimensional random walk we study the motion of a point-like particle that is constrained to move along a straight line in only two directions (right and left).
At each step, it moves (randomly) one step to the right with a fixed probability p, or one step to the left with probability q, where p + q = 1. Each step is of equal length and independent of the others, as shown in the following diagram:

Suppose that the random variables Zn, with n = 1, 2, ..., are independent and all have the same distribution. Then the position of the particle after n steps is given by the following formula:

Xn = Z1 + Z2 + … + Zn
Here, X0 = 0 and the state space is S = {0, ±1, ±2, …}. The process Xn is a Markov chain because, to determine the probability that the particle is in a certain position at the next instant, we only need to know where it is at the current instant, even if we also know where it was at every instant before the current one. This can be summarized as follows:

P(Xn+1 = j | X0 = i0, X1 = i1, …, Xn = i) = P(Xn+1 = j | Xn = i) = P(Zn+1 = j − i)
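The recursion Xn = Xn−1 + Zn translates directly into a simulation. The following is a small sketch; the function name `random_walk`, the seed, and the default p = 0.5 are illustrative assumptions:

```python
import random

# Simulate one trajectory of the 1-D random walk: each step Z is +1 with
# probability p and -1 with probability q = 1 - p.
def random_walk(n_steps, p=0.5, seed=42):
    rng = random.Random(seed)  # fixed seed for a reproducible trajectory
    x = 0                      # X0 = 0
    path = [x]
    for _ in range(n_steps):
        z = 1 if rng.random() < p else -1  # independent step Z_n
        x += z                             # X_n = X_{n-1} + Z_n
        path.append(x)
    return path

path = random_walk(10)
print(path)  # positions X_0, X_1, ..., X_10; consecutive entries differ by 1
```

Notice that each new position depends only on the current one plus an independent step, which is exactly the Markov property stated above.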
Here, the Zn variables are independent. Since the state space is infinite, the transition matrix has infinitely many rows and as many columns; it has 0 on the main diagonal, p on the diagonal above the main one, q on the diagonal below it, and 0 elsewhere, as in the following fragment:

	…  0  p  0  0  …
	…  q  0  p  0  …
	…  0  q  0  p  …
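A finite truncation of this matrix can be built explicitly. In the sketch below, the values p = 0.3 and m = 3 are illustrative choices; since the true chain lives on all the integers, only the interior rows of the truncated matrix remain stochastic (the two boundary rows lose the probability mass of the steps that leave the truncated range):

```python
import numpy as np

p = 0.3
q = 1.0 - p
m = 3
n_states = 2 * m + 1  # truncated state space: -3, -2, ..., 3

# Tridiagonal transition matrix: p above the main diagonal (step right),
# q below it (step left), 0 on the diagonal and elsewhere.
P = np.zeros((n_states, n_states))
for i in range(n_states):
    if i + 1 < n_states:
        P[i, i + 1] = p
    if i - 1 >= 0:
        P[i, i - 1] = q

print(np.round(P, 2))
print(P[1:-1].sum(axis=1))  # interior rows sum to 1; boundary rows do not
```

The structure makes the simplification visible: the entire infinite matrix is determined by just the two numbers p and q.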
It is clear that this generalization greatly simplifies the problem.