PyTorch 1.x Reinforcement Learning Cookbook
Yuxi (Hayden) Liu
How it works...
In Step 2, we calculated the transition probability after k steps, which is the kth power of the transition matrix. You will see the following output:
>>> print("Transition probability after 2 steps:\n{}".format(T_2))
Transition probability after 2 steps:
tensor([[0.6400, 0.3600],
[0.4800, 0.5200]])
>>> print("Transition probability after 5 steps:\n{}".format(T_5))
Transition probability after 5 steps:
tensor([[0.5670, 0.4330],
[0.5773, 0.4227]])
>>> print(
"Transition probability after 10 steps:\n{}".format(T_10))
Transition probability after 10 steps:
tensor([[0.5715, 0.4285],
[0.5714, 0.4286]])
>>> print(
"Transition probability after 15 steps:\n{}".format(T_15))
Transition probability after 15 steps:
tensor([[0.5714, 0.4286],
[0.5714, 0.4286]])
>>> print(
"Transition probability after 20 steps:\n{}".format(T_20))
Transition probability after 20 steps:
tensor([[0.5714, 0.4286],
[0.5714, 0.4286]])
We can see that the transition probability matrix converges after 10 to 15 steps. This means that, no matter which state the process starts in, it ends up in s0 with probability 57.14% and in s1 with probability 42.86%.
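If you are not following along with the recipe's code from the earlier steps, here is a minimal sketch that reproduces these numbers. It assumes the two-state transition matrix used in this recipe, T = [[0.4, 0.6], [0.8, 0.2]] (which can be recovered from the outputs above), and uses torch.matrix_power to raise T to an integer power:

import torch

# Transition matrix of the two-state Markov chain:
# row i holds the probabilities of moving from state si to s0 and s1
T = torch.tensor([[0.4, 0.6],
                  [0.8, 0.2]])

# The k-step transition probabilities are the k-th power of T
T_2 = torch.matrix_power(T, 2)
T_5 = torch.matrix_power(T, 5)
T_10 = torch.matrix_power(T, 10)
T_15 = torch.matrix_power(T, 15)
T_20 = torch.matrix_power(T, 20)

print("Transition probability after 2 steps:\n{}".format(T_2))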
In Step 4, we calculated the state distribution after k = 1, 2, 5, 10, 15, and 20 steps by multiplying the initial state distribution by the k-th power of the transition matrix. You can see the results here:
>>> print("Distribution of states after 1 step:\n{}".format(v_1))
Distribution of states after 1 step:
tensor([[0.5200, 0.4800]])
>>> print("Distribution of states after 2 steps:\n{}".format(v_2))
Distribution of states after 2 steps:
tensor([[0.5920, 0.4080]])
>>> print("Distribution of states after 5 steps:\n{}".format(v_5))
Distribution of states after 5 steps:
tensor([[0.5701, 0.4299]])
>>> print(
"Distribution of states after 10 steps:\n{}".format(v_10))
Distribution of states after 10 steps:
tensor([[0.5714, 0.4286]])
>>> print(
"Distribution of states after 15 steps:\n{}".format(v_15))
Distribution of states after 15 steps:
tensor([[0.5714, 0.4286]])
>>> print(
"Distribution of states after 20 steps:\n{}".format(v_20))
Distribution of states after 20 steps:
tensor([[0.5714, 0.4286]])
We can see that, after 10 steps, the state distribution converges. The probability of being in s0 (57.14%) and the probability of being in s1 (42.86%) remain unchanged in the long run.
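A similar sketch covers Step 4: the state distribution after k steps is the initial distribution multiplied by the k-th power of T. The initial distribution [[0.7, 0.3]] is the one discussed in the next paragraph; torch.mm performs the matrix multiplication:

import torch

# Same transition matrix as in the sketch above
T = torch.tensor([[0.4, 0.6],
                  [0.8, 0.2]])

# Initial state distribution: 70% chance of starting in s0, 30% in s1
v = torch.tensor([[0.7, 0.3]])

# The state distribution after k steps is v multiplied by T^k
v_1 = torch.mm(v, T)
v_2 = torch.mm(v, torch.matrix_power(T, 2))
v_10 = torch.mm(v, torch.matrix_power(T, 10))

print("Distribution of states after 1 step:\n{}".format(v_1))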
Starting with [0.7, 0.3], the state distribution after one iteration becomes [0.52, 0.48]. Using the recipe's transition matrix, T = [[0.4, 0.6], [0.8, 0.2]], the calculation is:

[0.7, 0.3] × T = [0.7 × 0.4 + 0.3 × 0.8, 0.7 × 0.6 + 0.3 × 0.2] = [0.52, 0.48]

After another iteration, the state distribution becomes [0.592, 0.408]:

[0.52, 0.48] × T = [0.52 × 0.4 + 0.48 × 0.8, 0.52 × 0.6 + 0.48 × 0.2] = [0.592, 0.408]
As time progresses, the state distribution reaches equilibrium.
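As a quick check that is not part of the recipe, you can verify that this equilibrium is a stationary distribution of the chain: applying T once more leaves it unchanged. The exact fixed point is [4/7, 3/7] ≈ [0.5714, 0.4286]:

import torch

T = torch.tensor([[0.4, 0.6],
                  [0.8, 0.2]])

# The limiting distribution observed above, written exactly
pi = torch.tensor([[4 / 7, 3 / 7]])

# A stationary distribution satisfies pi @ T = pi
print(torch.mm(pi, T))   # tensor([[0.5714, 0.4286]])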