
The optimizer and initial learning rate

The Adam optimizer (adaptive moment estimation) is used for training; it implements an advanced version of stochastic gradient descent. The Adam optimizer accounts for the curvature of the cost function and, at the same time, uses momentum to ensure steady progress toward a good local minimum. For the problem at hand, since we are using transfer learning and want to retain as many of the features learned by the pre-trained network as possible, we will use a small initial learning rate of 0.00001. This ensures that the network doesn't lose the useful features learned by the pre-trained network, and fine-tunes less aggressively toward an optimal point on the new data for the problem at hand. The Adam optimizer can be defined as follows:

from keras import optimizers
adam = optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

The beta_1 parameter controls the exponential decay rate of the running average of the gradients (the momentum term), whereas the beta_2 parameter controls the decay rate of the running average of the squared gradients, which is used to normalize the update and helps tackle the curvature in the cost function. The epsilon term is a small constant that prevents division by zero in that normalization.
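To make the roles of these parameters concrete, the standard Adam update for a parameter $\theta$ with gradient $g_t$ at step $t$ is given below; these are the textbook equations for the algorithm rather than anything specific to this project:

$$
m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
$$
$$
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \frac{lr \cdot \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
$$

Here, m_t is the first moment (momentum) estimate governed by beta_1, and v_t is the second moment (squared-gradient) estimate governed by beta_2. Once defined, the optimizer is passed to the model when it is compiled; assuming the fine-tuned network object is called model and the task uses a categorical cross-entropy loss (both of these names depend on the rest of the project code and are only illustrative), the call would look something like the following:

model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])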
