
The optimizer and initial learning rate

The Adam optimizer (adaptive moment estimation) implements an advanced variant of stochastic gradient descent and is used for training. Adam accounts for the curvature of the cost function and, at the same time, uses momentum to ensure steady progress toward a good local minimum. For the problem at hand, since we are using transfer learning and want to retain as many of the features learned by the pre-trained network as possible, we will use a small initial learning rate of 0.00001. This ensures that the network doesn't lose the useful features learned by the pre-trained network, and fine-tunes less aggressively toward an optimal point based on the new data for the problem at hand. The Adam optimizer can be defined as follows:

from keras import optimizers
adam = optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
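
Once defined, the optimizer is passed to the model's compile step. The following is a minimal sketch assuming a hypothetical Keras model; the placeholder architecture, loss, and metric are illustrative assumptions and are not taken from the original text:

from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense

# Adam optimizer with the small initial learning rate discussed above
adam = optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

# Placeholder model standing in for the fine-tuned transfer-learning network
model = Sequential([Dense(5, activation='softmax', input_shape=(512,))])

# Compile the model with the Adam optimizer
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])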

The beta_1 parameter controls the exponential decay rate of the first-moment (momentum) estimate, and so determines how much weight the current gradient receives in the momentum computation, whereas the beta_2 parameter controls the decay rate of the second-moment estimate, the running average of the squared gradient used to normalize the updates, which helps to tackle the curvature of the cost function.
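
For reference, the standard Adam update rule makes these roles explicit. The equations below restate the original formulation by Kingma and Ba and are not part of the original text; g_t denotes the gradient at step t and alpha is the learning rate:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}
\theta_t = \theta_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}

A larger beta_1 gives more weight to past gradients in the momentum term m_t, while a larger beta_2 smooths the squared-gradient estimate v_t that scales each parameter's step size.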
