
Fine-tuning your model – learning, discount, and exploration rates

Recall our discussion of the three major hyperparameters of a Q-learning model: 

  • Alpha: The learning rate
  • Gamma: The discount rate
  • Epsilon: The exploration rate
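To see where each hyperparameter enters the algorithm, here is a minimal sketch of a tabular Q-learning step; the specific values and table sizes are illustrative assumptions (the sizes happen to match the classic taxi environment), not tuned settings:

```python
import random
import numpy as np

# Example values only -- not tuned for the taxi problem
alpha, gamma, epsilon = 0.1, 0.6, 0.1

n_states, n_actions = 500, 6  # sizes matching the classic taxi environment
q_table = np.zeros((n_states, n_actions))

def choose_action(state):
    # Epsilon: with probability epsilon, explore with a random action;
    # otherwise exploit the current best-known action.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return int(np.argmax(q_table[state]))

def update(state, action, reward, next_state):
    # Alpha scales how far we move toward the new estimate;
    # gamma discounts the estimated value of future rewards.
    best_next = np.max(q_table[next_state])
    q_table[state, action] += alpha * (reward + gamma * best_next
                                       - q_table[state, action])
```

A higher alpha overwrites old estimates faster; a gamma close to 1 makes the agent value long-term rewards almost as much as immediate ones.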

What values should we choose for these hyperparameters to optimize the performance of our taxi agent? We will discover them through experimentation once we have constructed our game environment, but we can also draw on existing research on the taxi problem and set them to known good values. 

A large part of our model-tuning and optimization phase will consist of comparing the performance of different combinations of these three hyperparameters. 

One option available to us is to decay any, or all, of these hyperparameters: that is, to reduce their values as we progress through a game loop or conduct repeated trials. In practice, we will almost always decay epsilon, since we want our agent to exploit the knowledge it has gained of its environment and explore less as it becomes more aware of the highest-valued actions to take. But it can sometimes be to our benefit to decay the other hyperparameters as well.
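A common way to decay epsilon is to multiply it by a constant factor after each episode, with a floor so the agent never stops exploring entirely. The starting value, floor, and decay factor below are assumed values for illustration:

```python
epsilon = 1.0        # start fully exploratory
epsilon_min = 0.01   # floor: never stop exploring entirely
decay = 0.995        # assumed per-episode decay factor

history = []
for episode in range(1000):
    # ... run one episode using epsilon-greedy action selection ...
    history.append(epsilon)
    # Multiplicative decay: epsilon shrinks geometrically toward the floor
    epsilon = max(epsilon_min, epsilon * decay)
```

The same pattern can be applied to alpha or gamma if experimentation shows it helps; a linear schedule (subtracting a small constant each episode) is an equally valid choice.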
