
The optimizer and initial learning rate

The Adam optimizer (Adaptive Moment Estimation) is used for training; it implements an advanced version of stochastic gradient descent. The Adam optimizer adapts to the curvature of the cost function and, at the same time, uses momentum to ensure steady progress toward a good local minimum. For the problem at hand, since we are using transfer learning and want to retain as many of the features learned by the pre-trained network as possible, we will use a small initial learning rate of 0.00001. This ensures that the network doesn't lose the useful features learned by the pre-trained network, and fine-tunes less aggressively toward an optimal point, based on the new data for the problem at hand. The Adam optimizer can be defined as follows:

from keras import optimizers
adam = optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
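
Once defined, the optimizer is passed to the model at compile time. The following is a minimal usage sketch: the small Sequential model and the binary_crossentropy loss are assumptions made purely for illustration, standing in for the actual pre-trained network being fine-tuned:

from keras import models, layers

# Hypothetical stand-in model; in practice this would be the pre-trained
# network with its new classification head.
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(128,)))
model.add(layers.Dense(1, activation='sigmoid'))

# The optimizer object defined above is supplied directly, so the custom
# learning rate and moment parameters take effect during training.
model.compile(optimizer=adam, loss='binary_crossentropy', metrics=['accuracy'])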

The beta_1 parameter controls the exponential decay rate of the first moment estimate (the momentum term), and therefore how much weight the current gradient receives relative to the running average of past gradients, whereas the beta_2 parameter controls the decay rate of the second moment estimate (the running average of squared gradients), which normalizes the update and helps to tackle the curvature of the cost function.
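
To make the roles of these parameters concrete, the following is a minimal NumPy sketch of a single Adam update step. It illustrates the update rule only and is not the Keras implementation; the variable names (theta, grad, m, v, t) are hypothetical:

import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08):
    # First moment: exponentially decayed average of gradients (momentum)
    m = beta_1 * m + (1 - beta_1) * grad
    # Second moment: exponentially decayed average of squared gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2
    # Bias correction for the zero-initialized moment estimates
    m_hat = m / (1 - beta_1 ** t)
    v_hat = v / (1 - beta_2 ** t)
    # Update: momentum direction scaled by the curvature-aware denominator
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return theta, m, v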
