
MABP – a classic exploration versus exploitation problem

Several MABP environments have been created for OpenAI Gym, and they are well worth exploring for a clearer picture of how the problem works. We will not solve a bandit problem from scratch in this book's code, but we will look at some solutions in detail and discuss how they relate to epsilon decay strategies.

The main thing to bear in mind when solving any bandit problem is that we are always trying to find the best outcome a system can give us by balancing two needs: exploring the environment to learn more about it, and exploiting what we already know about it. In effect, we learn as we go, taking advantage of the knowledge we already have while we gain new knowledge.
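To make the trade-off concrete, here is a minimal sketch of an epsilon-greedy agent with a decaying epsilon on a simulated bandit. It is an illustration only, not a solution used elsewhere in this book, and it does not use any OpenAI Gym environment. The function name `run_bandit`, the Bernoulli (win or lose) payout model, and the parameter values (`decay`, `min_epsilon`, the arm probabilities) are all assumptions chosen for the example.

```python
import random


def run_bandit(true_probs, steps=5000, epsilon=1.0, decay=0.999,
               min_epsilon=0.01, seed=0):
    """Epsilon-greedy with multiplicative epsilon decay (illustrative sketch).

    true_probs holds the hidden win probability of each arm; the agent
    never reads it directly and only sees the rewards it samples.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulated payout: 1 with the arm's hidden probability, else 0.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Incremental update of the mean reward for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

        # Decay epsilon so exploration gives way to exploitation.
        epsilon = max(min_epsilon, epsilon * decay)

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.2, 0.5, 0.8])
    print("estimated win rates:", [round(e, 2) for e in estimates])
    print("pulls per arm:", counts)
    print("total reward:", total)
```

Early on, epsilon is close to 1, so almost every pull is exploratory. As epsilon decays, the agent increasingly relies on its estimates and concentrates its pulls on the arm that looks best, which is the balance between gaining and using knowledge described above.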
