
Bellman equations

As we mentioned, the Q-table functions as your agent's brain. Everything it has learned about its environment is stored in this table. The function that powers your agent's decisions is called a Bellman equation. There are many different Bellman equations, and we will be using a version of the following equation: 
newQ(s, a) = Q(s, a) + alpha * [R(s, a) + gamma * maxQ(s', a') - Q(s, a)]
Here, newQ(s,a) is the new value that we are computing for the state-action pair to enter into the Q-table; Q(s,a) is the current Q-value for that state-action pair; alpha is the learning rate; R(s,a) is the reward for that state-action pair; gamma is the discount rate; and maxQ(s', a') is the maximum expected future reward from the new state (that is, the highest possible reward for all of the actions the agent could take from the new state).

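To make the equation concrete, here is a minimal sketch of the update in Python, assuming the Q-table is stored as a NumPy array indexed by state and action. The function name bellman_update and its argument names are illustrative, not taken from the book's code:

import numpy as np

def bellman_update(q_table, state, action, reward, new_state, alpha, gamma):
    # maxQ(s', a'): the best value obtainable from the new state
    max_future_q = np.max(q_table[new_state])
    # newQ(s,a) = Q(s,a) + alpha * [R(s,a) + gamma * maxQ(s',a') - Q(s,a)]
    q_table[state, action] += alpha * (reward + gamma * max_future_q
                                       - q_table[state, action])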
This equation might seem intimidating at first, but it will become much more straightforward once we start translating it into Python code. The maxQ(s', a') term will be implemented with a max/argmax function, which we will discuss in detail. The same applies to most of the complex math we will encounter: once you begin coding it, it becomes much simpler and clearer to understand.
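As a quick illustration of the distinction, NumPy's max returns the highest Q-value in a row of the table, while argmax returns the index of the action that achieves it. The toy Q-table row below is made up purely for demonstration:

import numpy as np

# One row of a toy Q-table: 4 possible actions (values are made up)
q_row = np.array([0.1, 0.5, -0.2, 0.3])

best_value = np.max(q_row)      # 0.5 -> the maxQ(s', a') term
best_action = np.argmax(q_row)  # 1   -> the action achieving that value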
