- PyTorch 1.x Reinforcement Learning Cookbook
- Yuxi (Hayden) Liu
There's more...
We decide to experiment with different values for the discount factor. Let's start with 0, which means we only care about the immediate reward:
>>> gamma = 0
>>> V = cal_value_matrix_inversion(gamma, trans_matrix, R)
>>> print("The value function under the optimal policy is:\n{}".format(V))
The value function under the optimal policy is:
tensor([[ 1.],
        [ 0.],
        [-1.]])
This is consistent with the reward function as we only look at the reward received in the next move.
As the discount factor increases toward 1, future rewards are considered. Let's take a look at γ = 0.99:
>>> gamma = 0.99
>>> V = cal_value_matrix_inversion(gamma, trans_matrix, R)
>>> print("The value function under the optimal policy is:\n{}".format(V))
The value function under the optimal policy is:
tensor([[65.8293],
        [64.7194],
        [63.4876]])
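The `cal_value_matrix_inversion` helper is defined in the main recipe; as a reminder, it solves the Bellman equation V = R + γPV in closed form as V = (I − γP)⁻¹R. Here is a minimal self-contained sketch, using a hypothetical 3-state transition matrix (the recipe's actual `trans_matrix` may differ) and the reward vector [1, 0, −1] implied by the γ = 0 output above:

```python
import torch

def cal_value_matrix_inversion(gamma, trans_matrix, R):
    # Solve the Bellman equation V = R + gamma * P @ V in closed form:
    # V = (I - gamma * P)^(-1) @ R
    n = trans_matrix.shape[0]
    inv = torch.inverse(torch.eye(n) - gamma * trans_matrix)
    return torch.mm(inv, R.reshape(-1, 1))

# Hypothetical transition matrix (each row sums to 1) and reward vector
trans_matrix = torch.tensor([[0.4, 0.6, 0.0],
                             [0.3, 0.2, 0.5],
                             [0.1, 0.4, 0.5]])
R = torch.tensor([1.0, 0.0, -1.0])

# With gamma = 0, only the immediate reward counts, so V equals R
print(cal_value_matrix_inversion(0.0, trans_matrix, R))
```

With γ = 0 the inverse term collapses to the identity matrix, which is why the value function reproduces the reward function exactly; larger γ values mix in the values of downstream states through the transition matrix.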