- Hands-On Q-Learning with Python
- Nazia Habib
Bellman equations
As we mentioned, the Q-table functions as your agent's brain. Everything it has learned about its environment is stored in this table. The function that powers your agent's decisions is called a Bellman equation. There are many different Bellman equations, and we will be using a version of the following equation:

newQ(s, a) = Q(s, a) + α[R(s, a) + γ maxQ(s', a') − Q(s, a)]

Here, newQ(s, a) is the new value that we are computing for the state-action pair to enter into the Q-table; Q(s, a) is the current Q-value for that state-action pair; alpha (α) is the learning rate; R(s, a) is the reward for that state-action pair; gamma (γ) is the discount rate; and maxQ(s', a') is the maximum expected future reward given the new state (that is, the highest possible reward among all of the actions the agent could take from the new state).
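To make the terms concrete, here is a minimal sketch of that update in Python. The table shape, the helper name `update_q`, and the parameter values are illustrative assumptions, not taken from the book:

```python
import numpy as np

# Hypothetical Q-table: 16 states x 4 actions, initialized to zero.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))

alpha = 0.1   # learning rate
gamma = 0.9   # discount rate

def update_q(state, action, reward, new_state):
    """Apply one Bellman update for the (state, action) pair."""
    max_future_q = np.max(Q[new_state])   # maxQ(s', a')
    # newQ(s, a) = Q(s, a) + alpha * (R(s, a) + gamma * maxQ(s', a') - Q(s, a))
    Q[state, action] = Q[state, action] + alpha * (
        reward + gamma * max_future_q - Q[state, action]
    )

# Example: the agent took action 2 in state 0, received reward 1.0,
# and landed in state 1.
update_q(state=0, action=2, reward=1.0, new_state=1)
```

With an all-zero table, this first update stores alpha * 1.0 = 0.1 in the cell for state 0, action 2; later updates blend each new estimate with the old one at a rate controlled by alpha.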

This equation might seem intimidating at first, but it will become much more straightforward once we start translating it into Python code. The maxQ(s', a') term will be implemented with an argmax function, which we will discuss in detail. The same goes for most of the complex math we will encounter here; once you begin coding it, it becomes much simpler and clearer to understand.
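As a preview of that idea, here is a hedged sketch of how NumPy handles this for a single row of the Q-table. The row values below are made up for illustration; `max` returns the highest Q-value (the quantity that appears in the equation), while `argmax` returns the index of the action that achieves it:

```python
import numpy as np

# Hypothetical Q-values for the four actions available from one state.
q_row = np.array([0.0, 0.5, 0.2, 0.1])

best_value = np.max(q_row)      # maxQ(s', a'): the value used in the update
best_action = np.argmax(q_row)  # index of the greedy action (here, action 1)
```

In the update rule we only need `best_value`; `best_action` becomes useful when the agent must actually choose which action to take next.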