
Bellman equations

As we mentioned, the Q-table functions as your agent's brain. Everything it has learned about its environment is stored in this table. The function that powers your agent's decisions is called a Bellman equation. There are many different Bellman equations, and we will be using a version of the following equation:

newQ(s, a) = Q(s, a) + alpha * [R(s, a) + gamma * maxQ(s', a') - Q(s, a)]

Here, newQ(s, a) is the new value that we are computing for the state-action pair to enter into the Q-table; Q(s, a) is the current Q-value stored in the table for that state-action pair; alpha is the learning rate; R(s, a) is the reward for that state-action pair; gamma is the discount rate; and maxQ(s', a') is the maximum expected future reward for the new state (that is, the highest Q-value across all of the actions the agent could take from the new state).

This equation might seem intimidating at first, but it will become much more straightforward once we start translating it into Python code. The maxQ(s', a') term will be implemented with a max function, which we will discuss in detail. This applies to most of the complex math we will encounter here; once you begin coding, it becomes much simpler and clearer to understand.
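To make that concrete, here is a minimal sketch of the update in NumPy. The table dimensions, the `update_q` helper, and the specific alpha, gamma, and reward values are all illustrative assumptions, not part of the original text:

```python
import numpy as np

# Hypothetical Q-table: 5 states x 2 actions, initialized to zero.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))

alpha = 0.1  # learning rate (illustrative value)
gamma = 0.9  # discount rate (illustrative value)

def update_q(Q, state, action, reward, new_state):
    """Apply the Bellman update for one (state, action) pair."""
    # maxQ(s', a'): highest Q-value over all actions from the new state
    max_future_q = np.max(Q[new_state])
    # newQ(s, a) = Q(s, a) + alpha * [R(s, a) + gamma * maxQ(s', a') - Q(s, a)]
    Q[state, action] = Q[state, action] + alpha * (
        reward + gamma * max_future_q - Q[state, action]
    )
    return Q

# One illustrative update: the agent takes action 1 in state 0,
# receives a reward of 1.0, and lands in state 2.
Q = update_q(Q, state=0, action=1, reward=1.0, new_state=2)
print(Q[0, 1])  # 0.1 = 0 + 0.1 * (1.0 + 0.9 * 0 - 0)
```

Notice that with an empty table the future-reward term contributes nothing yet; as the table fills in, gamma controls how strongly those future values pull each update.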
