Hands-On Q-Learning with Python
Nazia Habib
Solving the optimization problem
Every time your agent takes a step in the environment, it updates the Q-table using the reward it receives. Once the Q-table stops updating and settles into its final state, we will know that your agent has found the optimal path to its destination: it will have solved the MDP that its environment represents.
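To make this concrete, here is a minimal sketch of the tabular update that Q-learning performs at each step. The table size, variable names, and hyperparameter values (alpha, gamma) are illustrative assumptions, not code tied to any particular environment:

```python
import numpy as np

# Hypothetical sizes for a small grid-world MDP (illustrative only).
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))  # the Q-table, initialized to zero

alpha = 0.1   # learning rate: how far each update moves the old estimate
gamma = 0.99  # discount factor: how much future rewards are worth now

def q_update(state, action, reward, next_state):
    """One Q-learning step: nudge Q[state, action] toward the observed
    reward plus the discounted value of the best action afterward."""
    best_next = np.max(Q[next_state])        # value of acting greedily from the next state
    td_target = reward + gamma * best_next   # what this experience says Q[state, action] should be
    Q[state, action] += alpha * (td_target - Q[state, action])
```

Each call moves the old estimate a fraction alpha of the way toward the new target, so repeated visits to a state-action pair gradually refine that entry of the table.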
What this means in practice is that your agent will have found the best action to take from each state it has encountered while exploring the environment. It will have learned enough about the environment to form an optimal strategy for navigating a path to the goal. When your Q-table stops updating, we say that it has converged to its final state.
We can be sure that once the Q-table converges, the agent has found the optimal solution (provided its exploration has tried every state-action pair often enough). Q-learning, as we've discussed, is only one learning algorithm that can solve this kind of problem, and others are sometimes faster or more sample-efficient. We choose Q-learning as our introduction to RL because it is relatively simple, straightforward to learn, and representative of the types of problems we'll face in this optimization space.
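The sketch below shows one way "training until the Q-table stops updating" might look end to end. The toy one-dimensional corridor environment, the epsilon-greedy exploration, and the convergence tolerance are all assumptions made for illustration; they are not the book's environment:

```python
import numpy as np

# Toy 1-D corridor: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 pays a reward of 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    done = next_state == GOAL
    return next_state, reward, done

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for episode in range(10_000):
    Q_before = Q.copy()
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the table, occasionally explore.
        if rng.random() < epsilon:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
    # "Stops updating" in practice: the largest change this episode is tiny.
    if np.max(np.abs(Q - Q_before)) < 1e-9:
        print(f"Q-table converged after {episode} episodes")
        break

# The learned policy: the best action to take from each state.
policy = np.argmax(Q, axis=1)
print("greedy policy:", policy)
```

The final argmax over each row of the converged Q-table recovers exactly what the text describes: the best action to take from every state the agent has learned about.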