- Hands-On Q-Learning with Python
- Nazia Habib
- 2021-06-24 15:13:16
Decaying epsilon
We've discussed epsilon decay in the context of exploration versus exploitation. The better we know our environment, the less random exploration we want to do and the more often we want to take actions that we already know yield high rewards. As learning progresses, the goal shifts toward taking advantage of what we already know.
We do this by reducing the agent's epsilon value by a set amount as the game progresses. Remember that epsilon is the probability (a value between 0 and 1) that the agent takes a random action instead of the action with the highest Q-value for the current state.
When we reduce epsilon, the likelihood of a random action becomes smaller, and we take more opportunities to benefit from the high-valued actions that we have already discovered.
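As a minimal sketch of the idea above, the following shows epsilon-greedy action selection combined with a simple decay schedule. The function names, the multiplicative decay rate of 0.99, and the floor of 0.01 are illustrative assumptions, not values taken from the text; the text only says epsilon is reduced by some amount over time.

```python
import random


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # exploit: index of the highest Q-value for this state
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon, decay_rate=0.99, min_epsilon=0.01):
    """Multiplicative decay with a floor (both defaults are assumptions)."""
    return max(min_epsilon, epsilon * decay_rate)


epsilon = 1.0  # start fully exploratory
history = []
for episode in range(500):
    # ... run one episode here, calling choose_action(...) at each step ...
    history.append(epsilon)
    epsilon = decay_epsilon(epsilon)
```

Keeping a small floor on epsilon is one common design choice: the agent never stops exploring entirely, which can help if early Q-value estimates were misleading.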
For similar reasons, it can be beneficial to decay alpha (the learning rate) and gamma (the discount factor) along with epsilon.
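One way this might look in practice is a shared schedule helper whose output is fed into the standard tabular Q-learning update. This is a sketch under stated assumptions: the `decayed` helper, the exponential schedule, and all numeric values are hypothetical, and the Q-table is a plain dictionary chosen only for illustration.

```python
def decayed(initial, decay_rate, step, minimum=0.0):
    """Exponential schedule: initial * decay_rate**step, floored at minimum."""
    return max(minimum, initial * decay_rate ** step)


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Standard tabular Q-learning update on a dict-of-lists Q-table."""
    best_next = max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Toy Q-table with two states and two actions (illustrative values only)
q = {0: [0.0, 0.0], 1: [1.0, 0.5]}

# At step 0 the schedule returns the initial learning rate unchanged
alpha = decayed(0.5, 0.999, step=0, minimum=0.05)
new_value = q_update(q, 0, 1, reward=1.0, next_state=1, alpha=alpha, gamma=0.9)
```

The same helper could be called with different initial values and rates to produce a gamma schedule; whether and how fast to decay each parameter is a tuning decision that depends on the environment.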