- Hands-On Q-Learning with Python
- Nazia Habib
- 2021-06-24 15:13:17
MABP – a classic exploration versus exploitation problem
Several multi-armed bandit problem (MABP) environments have been created for OpenAI Gym, and they are well worth exploring to get a clearer picture of how the problem works. We will not solve a bandit problem from scratch with the code in this book, but we will examine some solutions in detail and discuss how they relate to epsilon decay strategies.
The main thing to bear in mind when solving any bandit problem is that we are always trying to discover the optimal outcome in a system by balancing two needs: exploring the environment to learn more about it, and exploiting what we already know about it. Effectively, we learn as we go, taking advantage of the knowledge we already have while we gain new knowledge.
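To make the explore/exploit balance concrete, here is a minimal sketch of an epsilon-greedy agent with a decaying epsilon on a simulated bandit. It is an illustration only, not code from this book and not a Gym environment: the function name `run_bandit`, the Bernoulli (win/lose) payout model, and every parameter value are assumptions chosen for the example.

```python
import random


def run_bandit(true_probs, steps=5000, epsilon=1.0, decay=0.999,
               min_epsilon=0.01, seed=0):
    """Epsilon-greedy play on a simulated Bernoulli bandit (hypothetical helper).

    true_probs: payout probability of each arm, hidden from the agent.
    Returns the agent's value estimates, pull counts, and total reward.
    """
    rng = random.Random(seed)       # seeded so runs are repeatable
    n_arms = len(true_probs)
    counts = [0] * n_arms           # how many times each arm was pulled
    values = [0.0] * n_arms         # running mean reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated value.
            arm = max(range(n_arms), key=lambda a: values[a])

        # Simulated payout: 1 with the arm's hidden probability, else 0.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

        # Decay epsilon so the agent explores less as it learns more.
        epsilon = max(min_epsilon, epsilon * decay)

    return values, counts, total_reward
```

Calling `run_bandit([0.2, 0.5, 0.8])` plays three arms whose payout probabilities the agent never sees directly. Early on, epsilon is close to 1 and almost every pull is exploratory; as epsilon decays toward `min_epsilon`, the agent increasingly exploits the arm its estimates favor. The multiplicative decay schedule is only one possible choice among many.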