- Hands-On Q-Learning with Python
- Nazia Habib
- 2021-06-24 15:13:12
Epsilon – exploration versus exploitation
Your agent's exploration rate epsilon also ranges from zero to one. As the agent explores its environment, it learns that some actions are better to take than others, but what about states and actions that it hasn't seen yet? We don't want it to get stuck at a local maximum, taking the same currently highest-valued actions over and over when there might be better actions it hasn't tried yet.
Once you set your epsilon value, your agent takes a random (exploratory) action with probability epsilon, and takes the action with the highest current Q-value for its current state with probability 1-epsilon. As we step through a full Q-table update example in the SARSA and the cliff-walking problem section, we'll see how the value that we choose for epsilon affects the rate at which the Q-table converges and the agent discovers the optimal solution.
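The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the book's own code: the function name `choose_action`, the `q_row` argument (one row of a Q-table, holding the Q-values for the current state), and the optional `rng` parameter are all assumptions made for this example.

```python
import random

def choose_action(q_row, epsilon, rng=random):
    """Pick an action index from one Q-table row using an epsilon-greedy rule.

    q_row   -- hypothetical list of Q-values, one per action, for the current state
    epsilon -- exploration rate between 0 and 1
    rng     -- source of randomness (the random module or a random.Random instance)
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_row))
    # Exploit: pick the action with the highest current Q-value.
    return max(range(len(q_row)), key=lambda a: q_row[a])

# Example call with made-up Q-values for a state with three actions.
action = choose_action([0.1, 0.5, 0.2], epsilon=0.1)
```

With epsilon set to 0 the agent always exploits, and with epsilon set to 1 it always explores. Note that in this sketch the random branch can also land on the greedy action by chance.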
As the agent becomes more familiar with its environment, we want it to start sticking to the high-valued actions it has already discovered and to spend less time exploring. We achieve this by having epsilon decay over time as the agent learns more about its environment and the Q-table converges to its final optimal values.
There are many ways to decay epsilon, for example by using a constant decay factor or by basing the decay factor on some other internal variable. Ideally, we want the epsilon decay function to be directly based on the Q-values that we've already discovered. We'll discuss what this means in the next section.
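As one possible illustration of the constant-decay-factor approach, the sketch below multiplies epsilon by a fixed factor after every episode. The specific values (a starting epsilon of 1.0, a factor of 0.99, 500 episodes) and the `min_epsilon` floor are assumptions chosen for the example, not values taken from the book.

```python
def decay_epsilon(epsilon, decay=0.99, min_epsilon=0.01):
    """Shrink epsilon by a constant factor, never dropping below a floor.

    decay and min_epsilon are illustrative defaults, not prescribed values.
    """
    return max(min_epsilon, epsilon * decay)

epsilon = 1.0   # start fully exploratory (assumed starting value)
history = []    # epsilon used in each episode, kept for inspection
for episode in range(500):
    history.append(epsilon)
    # ... run one training episode with the current epsilon here ...
    epsilon = decay_epsilon(epsilon)
```

The floor keeps a small amount of exploration alive late in training. Whether to keep such a floor, and how fast to decay, depends on the environment.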