
Solving the optimization problem

Every time your agent steps through the environment, it updates the Q-table with the reward it receives. Once the Q-table stops updating and reaches its final state, you know that your agent has found the optimal path to its destination: it has solved the MDP represented by its environment.
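Concretely, each step applies the standard Q-learning update to the entry for the state-action pair just taken:

Q(s, a) ← Q(s, a) + α [ r + γ · max_a' Q(s', a') − Q(s, a) ]

where s is the current state, a the action taken, r the reward received, s' the next state, α the learning rate, and γ the discount factor. The table entry is nudged toward the reward plus the discounted value of the best action available from the next state.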

What this means in practice is that your agent will have found the best actions to take from each state that it has encountered through its exploration of its environment. It will have learned enough about the environment to have found an optimal strategy for navigating a path to the goal. When your Q-table stops updating, we say that it has converged to its final state.
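The following sketch shows what this looks like in code. The five-state "corridor" environment, the hyperparameters, and the convergence threshold are illustrative assumptions rather than anything specified in the text; the point is simply that each step applies the Q-learning update and training stops once an episode no longer changes the table, which is the convergence described above.

import numpy as np

# Tiny deterministic environment for illustration only: states 0..4,
# actions 0 (left) and 1 (right); reaching state 4 ends the episode
# with a reward of 1, every other step gives 0.
n_states, n_actions, goal = 5, 2, 4

def step(state, action):
    next_state = min(state + 1, goal) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == goal else 0.0
    done = next_state == goal
    return next_state, reward, done

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate
Q = np.zeros((n_states, n_actions))

for episode in range(500):
    state = 0
    old_Q = Q.copy()
    done = False
    while not done:
        # Epsilon-greedy action selection: mostly exploit, occasionally explore.
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
    # Treat the table as converged once an episode barely changes it.
    if np.max(np.abs(Q - old_Q)) < 1e-6:
        print(f"Q-table converged after {episode + 1} episodes")
        break

print(Q)

After convergence, taking np.argmax(Q[state]) in every state traces the learned optimal path to the goal.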

We can be sure that when the Q-table converges, the agent has found the optimal solution. Q-learning, as we've discussed, is only one learning algorithm that can find a solution to this problem, and there are others that are sometimes more efficient or faster. The reason we choose Q-learning as our introduction to RL is that it is relatively simple, straightforward to learn, and gives us a good introduction to the types of problems we'll face in this optimization space.
