Bellman equations

As we mentioned, the Q-table functions as your agent's brain: everything it has learned about its environment is stored in this table. The function that powers your agent's decisions is called a Bellman equation. There are many different Bellman equations, and we will be using a version of the following update rule:

newQ(s, a) = Q(s, a) + alpha * [R(s, a) + gamma * maxQ(s', a') - Q(s, a)]

Here, newQ(s, a) is the new value that we compute for the state-action pair and enter into the Q-table; Q(s, a) is the value currently stored in the table for that state-action pair; alpha is the learning rate; R(s, a) is the reward for taking action a in state s; gamma is the discount rate; and maxQ(s', a') is the maximum expected future reward from the new state s' (that is, the highest Q-value across all of the actions the agent could take from s').

This equation might seem intimidating at first, but it will become much more straightforward once we start translating it into Python code. The maxQ(s', a') term, for example, is just the maximum over the Q-table row for s', computed with NumPy's max function (with the related argmax function used to pick the best action), as we will discuss in detail. This applies to most of the complex math we will encounter here; once you begin coding it, it becomes much simpler and clearer to understand.
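To preview what that translation looks like, here is a minimal sketch of the update rule as Python code. The sizes, hyperparameter values, and the `update_q` helper name are illustrative assumptions, not code from this book; only the update line itself mirrors the equation above.

```python
import numpy as np

# Hypothetical setup: 4 states, 2 actions, Q-table initialized to zeros.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))

alpha = 0.1   # learning rate
gamma = 0.9   # discount rate

def update_q(Q, s, a, reward, s_next):
    """Apply the Bellman update for one (state, action, reward, new state) step."""
    max_future = np.max(Q[s_next])  # maxQ(s', a'): best value reachable from the new state
    Q[s, a] = Q[s, a] + alpha * (reward + gamma * max_future - Q[s, a])
    return Q

# Example step: taking action 1 in state 0 yields reward 1.0 and lands in state 2.
update_q(Q, s=0, a=1, reward=1.0, s_next=2)
print(Q[0, 1])  # 0.1 on the first update, since the table starts at zero
```

Note that `np.max(Q[s_next])` returns the best value from the new state, while `np.argmax(Q[s])` returns the index of the best action; the first feeds the update, the second drives action selection.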