
The optimizer and initial learning rate

The Adam optimizer (Adaptive Moment Estimation) is used for training; it implements an enhanced version of stochastic gradient descent. Adam adapts to the curvature of the cost function and, at the same time, uses momentum to ensure steady progress toward a good local minimum. For the problem at hand, since we are using transfer learning and want to retain as many of the features learned by the pre-trained network as possible, we will use a small initial learning rate of 0.00001. This ensures that the network does not lose the useful features learned by the pre-trained network and fine-tunes toward an optimum less aggressively, based on the new data for the problem at hand. The Adam optimizer can be defined as follows:

from keras import optimizers

adam = optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

The beta_1 parameter controls the contribution of the current gradient to the momentum (first-moment) estimate, whereas the beta_2 parameter controls the contribution of the squared gradient to the gradient normalization (second-moment) estimate, which helps to tackle the curvature in the cost function.
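
To make the roles of these two parameters concrete, the following is a minimal sketch of a single Adam update step for one parameter array; the function and variable names here are purely illustrative and are not part of the Keras implementation:

import numpy as np

def adam_step(param, grad, m, v, t, lr=0.00001,
              beta_1=0.9, beta_2=0.999, epsilon=1e-08):
    # beta_1 weights the running average of gradients (first moment),
    # which provides the momentum
    m = beta_1 * m + (1 - beta_1) * grad
    # beta_2 weights the running average of squared gradients (second moment),
    # which normalizes the step size and accounts for curvature
    v = beta_2 * v + (1 - beta_2) * grad ** 2
    # bias correction for the zero-initialized moment estimates at step t
    m_hat = m / (1 - beta_1 ** t)
    v_hat = v / (1 - beta_2 ** t)
    # parameter update scaled by the small initial learning rate
    param = param - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return param, m, v

In Keras itself, none of this needs to be written by hand: the adam object defined above is simply passed as the optimizer argument to model.compile.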
