Decaying epsilon

We've discussed epsilon decay in the context of exploration versus exploitation. The better we know our environment, the less random exploration we want to do and the more often we want to take actions that we already know yield high rewards. Over time, the goal is to take increasing advantage of what we have learned.

We do this by reducing the agent's epsilon value by a set amount as the game progresses. Remember that epsilon is the probability (a value between 0 and 1, which can also be read as a percentage) that the agent takes a random action instead of the action with the highest Q-value for the current state.
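As a minimal sketch of that rule, the function below picks an action from a list of Q-values for the current state. The name `choose_action` and its parameters are assumptions made for illustration, not code from this book's project.

```python
import random


def choose_action(q_values, epsilon, rng=random):
    """Pick an action index with an epsilon-greedy rule.

    q_values: list of Q-values for the current state, one per action.
    epsilon:  probability (0.0 to 1.0) of taking a random action.
    rng:      source of randomness (hypothetical parameter, handy for testing).
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest Q-value for this state.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon` set to 0 the agent always exploits, and with `epsilon` set to 1 it always explores; values in between mix the two behaviors.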

When we reduce epsilon, the likelihood of a random action becomes smaller, and we take more opportunities to benefit from the high-valued actions that we have already discovered. 
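One common way to reduce epsilon is to multiply it by a constant factor after every episode and stop at a small floor, so the agent never stops exploring entirely. The sketch below assumes that multiplicative schedule; the names `decay_epsilon`, `decay_rate`, and `min_epsilon`, and the values 0.995 and 0.01, are illustrative choices rather than settings taken from the text.

```python
def decay_epsilon(epsilon, decay_rate=0.995, min_epsilon=0.01):
    """Shrink epsilon multiplicatively, never going below min_epsilon."""
    return max(min_epsilon, epsilon * decay_rate)


epsilon = 1.0   # start fully exploratory
history = []    # epsilon used in each episode, kept for inspection

for episode in range(1000):
    history.append(epsilon)
    # ... run one episode with the current epsilon here ...
    epsilon = decay_epsilon(epsilon)
```

A linear schedule (subtracting a fixed amount per episode) would work in the same place; the multiplicative form simply decays quickly at first and more slowly as epsilon approaches the floor.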

For similar reasons, it can be to our benefit to decay alpha and gamma along with epsilon.
