
MABP – a classic exploration versus exploitation problem

Several MABP environments have been created for OpenAI Gym, and they are well worth exploring for a clearer picture of how the problem works. We will not solve a bandit problem from scratch in this book's code, but we will go through some solutions in detail and discuss their relevance to epsilon decay strategies.

The main thing to bear in mind when solving any bandit problem is that we are always trying to find the optimal outcome in a system by balancing two needs: exploring the environment to gain new knowledge, and exploiting the knowledge we already have. Effectively, we are learning as we go, taking advantage of what we already know while we gather more.
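To make the trade-off concrete, here is a minimal sketch of an epsilon-greedy agent with a decaying epsilon, one common way to balance exploration and exploitation. It is not the book's code and does not use any Gym environment. The function name `run_epsilon_greedy`, the Bernoulli (win or lose) reward model, and the parameter values (`eps_start`, `eps_min`, `decay`) are all assumptions chosen for illustration.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, eps_start=1.0,
                       eps_min=0.01, decay=0.995, seed=0):
    """Play a simulated Bernoulli bandit with a decaying epsilon-greedy policy.

    true_means holds each arm's payout probability (hidden from the agent).
    Returns the agent's value estimates, pull counts, and total reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm has been pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    epsilon = eps_start
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: 1 with probability true_means[arm], else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

        # Decay epsilon so the agent explores less as it learns more.
        epsilon = max(eps_min, epsilon * decay)

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("estimates:", [round(e, 3) for e in estimates])
    print("pull counts:", counts)
    print("total reward:", total)
```

Early on, epsilon is close to 1 and the agent pulls arms almost at random. As epsilon decays toward `eps_min`, it increasingly commits to the arm its estimates favor while still sampling the others occasionally.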
