Reinforcement Learning with TensorFlow
Sayon Dutta
The value function for optimality
Agents should be able to reason about both immediate and future rewards. Therefore, a value is assigned to each encountered state that reflects this future information as well. This is called the value function. This is where the concept of delayed rewards comes in: the actions taken in the present determine the rewards the agent can potentially receive in the future.
V(s), that is, the value of state s, is defined as the expected sum of rewards to be received in the future for the actions taken from this state through subsequent states until the agent reaches the goal state. In essence, the value function tells us how good it is to be in a given state: the higher the value, the better the state.
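To make this concrete, here is a minimal sketch (not taken from the book) that estimates V(s) by averaging the returns observed after each visit to a state in a toy five-state chain. The environment, the random policy, and the discount factor GAMMA are illustrative assumptions; discounting itself is only introduced formally in Chapter 3.

```python
# Minimal sketch (assumption, not the book's code): every-visit Monte Carlo
# estimation of V(s) on a toy 5-state chain under a random policy.
import random
from collections import defaultdict

GOAL = 4      # rightmost state ends the episode with reward +1
GAMMA = 0.9   # illustrative discount factor (formally introduced in Chapter 3)

def run_episode(start=0):
    """Random walk over states 0..4; returns a list of (state, reward) pairs."""
    s, trajectory = start, []
    while s != GOAL:
        s_next = max(0, min(GOAL, s + random.choice([-1, 1])))
        reward = 1.0 if s_next == GOAL else 0.0
        trajectory.append((s, reward))
        s = s_next
    return trajectory

returns_sum, visit_count = defaultdict(float), defaultdict(int)
for _ in range(5000):
    g = 0.0
    # Walk the episode backwards so g accumulates the discounted future reward.
    for s, r in reversed(run_episode()):
        g = r + GAMMA * g
        returns_sum[s] += g
        visit_count[s] += 1

V = {s: returns_sum[s] / visit_count[s] for s in sorted(returns_sum)}
print(V)  # states nearer the goal receive higher values, as described above
```

Running this prints estimated values that increase toward the goal state, illustrating that V(s) summarizes future reward rather than the immediate reward alone.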
The reward assigned to each (s, a, s') triple is fixed. This is not the case for the value of a state; it is subject to change with every action in an episode, and across different episodes as well.
One solution comes to mind: instead of a value function, why don't we store the knowledge of every possible state?
The answer is simple: storing every possible state is time-consuming and expensive, and this cost grows exponentially with the size of the state space. Therefore, it's better to store the knowledge of the current state, that is, V(s):
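The formula that this colon points to did not survive extraction; a standard definition consistent with the description above, written in LaTeX and using a discount factor \gamma \in [0, 1] (an assumption here, since discounting is only introduced in Chapter 3), is:

V(s) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0} = s \right]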
More details on the value function will be covered in Chapter 3, The Markov Decision Process and Partially Observable MDP.