
How to do it...

Now that we understand how the learning rate influences the output values, let's see its impact in action on the MNIST dataset we saw earlier, keeping the same model architecture and changing only the learning rate parameter.

Note that we will be using the same data-preprocessing steps as those of step 1 and step 2 in the Scaling input dataset recipe.
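For reference, the following is a minimal sketch of that preprocessing, assuming the standard keras.datasets.mnist loader and the 784-dimensional flattened inputs expected by the model below (the exact code is in the Scaling input dataset recipe):

from keras.datasets import mnist
from keras.utils import np_utils

# Load MNIST and flatten each 28 x 28 image into a 784-dimensional vector
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(X_train.shape[0], 784).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 784).astype('float32')

# Scale the pixel values to the range [0, 1]
X_train = X_train / 255.
X_test = X_test / 255.

# One-hot encode the labels into ten output classes
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)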

Once we have the dataset preprocessed, we vary the learning rate of the model by specifying the optimizer in the next step:

  1. We change the learning rate as follows:
from keras import optimizers
adam = optimizers.Adam(lr=0.01)

With the preceding code, we have initialized the Adam optimizer with a specified learning rate of 0.01.

  2. We build, compile, and fit the model as follows:
model = Sequential()
model.add(Dense(1000, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])

history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=500, batch_size=1024, verbose=1)

The accuracy of the preceding network is ~90% at the end of 500 epochs. Let's have a look at how the loss and accuracy vary over the epochs (the code to generate the plots in the following diagram is the same as the code we used in step 8 of the Training a vanilla neural network recipe):
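For convenience, here is a minimal sketch of that kind of plotting code. It assumes only the history object returned by model.fit above, and it handles both the 'acc' and 'accuracy' metric names, which differ between Keras versions:

import matplotlib.pyplot as plt

# The accuracy key is 'acc' in older Keras versions and 'accuracy' in newer ones
history_dict = history.history
acc_key = 'acc' if 'acc' in history_dict else 'accuracy'
epochs = range(1, len(history_dict['loss']) + 1)

plt.figure(figsize=(10, 4))

# Training and validation loss, epoch by epoch
plt.subplot(1, 2, 1)
plt.plot(epochs, history_dict['loss'], label='Training loss')
plt.plot(epochs, history_dict['val_loss'], label='Validation loss')
plt.title('Loss over epochs')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

# Training and validation accuracy, epoch by epoch
plt.subplot(1, 2, 2)
plt.plot(epochs, history_dict[acc_key], label='Training accuracy')
plt.plot(epochs, history_dict['val_' + acc_key], label='Validation accuracy')
plt.title('Accuracy over epochs')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()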

Note that when the learning rate is high (0.01 in the current scenario, compared to 0.0001 in the scenario considered in the Scaling input dataset recipe), the loss decreases less smoothly than it does for the low-learning-rate model.

The low-learning-rate model updates the weights slowly, resulting in a smoothly decreasing loss and a high accuracy that is reached gradually over a larger number of epochs.

The step changes in the loss values when the learning rate is higher, on the other hand, are due to the loss getting stuck in a local minimum until the weights move to better values. A lower learning rate gives a better chance of arriving at the optimal weight values, as the weights change slowly but steadily in the right direction.

In a similar manner, let's explore the network accuracy when the learning rate is as high as 0.1:

from keras import optimizers
adam = optimizers.Adam(lr=0.1)

model = Sequential()
model.add(Dense(1000, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=500, batch_size=1024, verbose=1)

Note that the loss values could not decrease much further, as the learning rate was high; that is, the weights potentially got stuck in a local minimum.
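If you want to reproduce this comparison in one place, the following is a minimal sketch that loops over the three learning rates discussed in this recipe. It assumes the preprocessed X_train, y_train, X_test, and y_test arrays from the earlier steps, and it uses fewer epochs than the runs above purely to keep the comparison quick:

from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

val_losses = {}
for lr in [0.0001, 0.01, 0.1]:
    # Rebuild the same architecture from scratch for each learning rate
    model = Sequential()
    model.add(Dense(1000, input_dim=784, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.Adam(lr=lr),
                  metrics=['accuracy'])
    history = model.fit(X_train, y_train,
                        validation_data=(X_test, y_test),
                        epochs=100, batch_size=1024, verbose=0)
    # Keep the validation-loss curve so the smoothness of each run can be compared
    val_losses[lr] = history.history['val_loss']

Plotting the three val_losses curves side by side makes the difference in smoothness between the low and high learning rates immediately visible.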

Thus, in general, it is a good idea to set the learning rate to a low value and let the network learn over a large number of epochs.
