Title: Hands-On Q-Learning with Python
Author: Nazia Habib
Chapter word count: 133
Last updated: 2021-06-24 15:13:17

MABP – a classic exploration versus exploitation problem

Several MABP environments have been created for OpenAI Gym, and they are well worth exploring for a clearer picture of how the problem works. We will not be solving a bandit problem from scratch with the code in this book, but we will go into some solutions in detail and discuss their relevance to epsilon decay strategies.

The main thing to bear in mind when solving any bandit problem is that we are always trying to find the optimal outcome in a system by balancing two competing needs: exploring the environment to learn more about it, and exploiting what we already know about it. In effect, we are learning as we go, putting the knowledge we already have to use while we gather new knowledge.
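To make the trade-off concrete, here is a minimal sketch of an epsilon-greedy agent with a decaying epsilon playing a small Bernoulli bandit. It is an illustration under stated assumptions, not the book's own code: the function name `run_epsilon_greedy`, its parameters, the multiplicative decay schedule, and the arm payout probabilities are all hypothetical choices, and the bandit is simulated with the standard library rather than one of the OpenAI Gym environments.

```python
import random


def run_epsilon_greedy(true_probs, steps=5000, epsilon=1.0,
                       decay=0.999, min_epsilon=0.01, seed=0):
    """Play a simulated Bernoulli bandit with a decaying epsilon-greedy policy.

    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated reward.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The arm pays 1 with its (hidden) probability, otherwise 0.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

        # Decay epsilon so that exploration gives way to exploitation.
        epsilon = max(min_epsilon, epsilon * decay)

    return estimates, counts, total_reward


# Three arms with assumed payout probabilities of 0.2, 0.5, and 0.8.
estimates, counts, total_reward = run_epsilon_greedy([0.2, 0.5, 0.8])
print("estimated payouts:", [round(e, 2) for e in estimates])
print("pulls per arm:", counts)
print("total reward:", total_reward)
```

Early on, epsilon is close to 1, so the agent mostly explores and builds up an estimate for every arm. As epsilon decays, it increasingly exploits the arm it currently believes is best. Choosing how fast to decay epsilon is exactly the balancing act described above.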
