
How to do it...

Now that we understand how the learning rate influences the output values, let's see the impact of the learning rate in action on the MNIST dataset we used earlier, keeping the same model architecture and changing only the learning rate parameter.

Note that we will be using the same data-preprocessing steps as those of step 1 and step 2 in the Scaling input dataset recipe.
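For reference, those two steps amount to loading MNIST, flattening each image into a 784-dimensional vector, scaling the pixel values, and one-hot encoding the labels. The following is only a minimal sketch of that preprocessing, assuming the standard keras.datasets.mnist loader; the exact code is in the Scaling input dataset recipe:

from keras.datasets import mnist
from keras.utils import np_utils

# Load MNIST and flatten each 28 x 28 image into a 784-dimensional vector
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(X_train.shape[0], 784).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 784).astype('float32')

# Scale pixel values from [0, 255] to [0, 1]
X_train = X_train / 255.
X_test = X_test / 255.

# One-hot encode the labels into 10 classes
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)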

Once we have the dataset preprocessed, we vary the learning rate of the model by specifying the optimizer in the next step:

  1. We change the learning rate as follows:
from keras import optimizers
# Initialize the Adam optimizer with a learning rate of 0.01 (the Keras default is 0.001)
adam = optimizers.Adam(lr=0.01)

With the preceding code, we have initialized the Adam optimizer with a specified learning rate of 0.01.

  2. We build, compile, and fit the model as follows:
from keras.models import Sequential
from keras.layers import Dense

# A 1,000-unit hidden layer followed by a 10-way softmax output layer
model = Sequential()
model.add(Dense(1000, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Compile with the Adam optimizer initialized above (learning rate of 0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])

history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=500, batch_size=1024, verbose=1)

The accuracy of the preceding network is ~90% at the end of 500 epochs. Let's have a look at how the loss and accuracy vary over the epochs (the code to generate the plots in the following diagram remains the same as the code we used in step 8 of the Training a vanilla neural network recipe):
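As a reminder, plotting code along the following lines reproduces those curves from the history object returned by model.fit. This is only a sketch; depending on the Keras version, the metric keys may be 'acc'/'val_acc' or 'accuracy'/'val_accuracy':

import matplotlib.pyplot as plt

# Per-epoch metrics recorded by model.fit
epochs = range(1, len(history.history['loss']) + 1)

plt.figure(figsize=(10, 4))

# Training versus validation loss
plt.subplot(1, 2, 1)
plt.plot(epochs, history.history['loss'], label='train loss')
plt.plot(epochs, history.history['val_loss'], label='test loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

# Training versus validation accuracy
plt.subplot(1, 2, 2)
plt.plot(epochs, history.history['acc'], label='train accuracy')
plt.plot(epochs, history.history['val_acc'], label='test accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()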

Note that with the higher learning rate (0.01 here, compared to 0.0001 in the scenario considered in the Scaling input dataset recipe), the loss decreases less smoothly than it does for the low-learning-rate model.

The low-learning-rate model updates the weights slowly, resulting in a smoothly decreasing loss function and a high accuracy that is reached gradually, over a larger number of epochs.

The step changes in the loss values at the higher learning rate, on the other hand, occur because the loss gets stuck in a local minimum until the weight values happen to move toward better values. A lower learning rate gives a better chance of arriving at the optimal weight values, as the weights change slowly, but steadily, in the right direction.
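To make the effect of the step size concrete, here is a small, self-contained illustration (not part of the recipe) of the plain update w = w - lr * gradient on a hypothetical steep quadratic loss, L(w) = 10 * w**2. The smallest learning rate barely moves the weight, the middle one converges smoothly, and the largest one overshoots the minimum on every step:

import numpy as np

def gradient(w):
    # Gradient of the toy loss L(w) = 10 * w**2, whose minimum is at w = 0
    return 20 * w

for lr in (0.0001, 0.01, 0.1):
    w = 5.0  # same starting weight for every learning rate
    trajectory = []
    for _ in range(10):
        w = w - lr * gradient(w)  # plain gradient-descent update
        trajectory.append(w)
    print(lr, np.round(trajectory, 3))

With the largest step size, the weight overshoots the minimum on every update and the toy loss never decreases at all.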

In a similar manner, let's explore the network accuracy when the learning rate is as high as 0.1:

from keras import optimizers
# Re-initialize the Adam optimizer, now with a learning rate of 0.1
adam = optimizers.Adam(lr=0.1)

model = Sequential()
model.add(Dense(1000, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=500, batch_size=1024, verbose=1)

Note that the loss values could not decrease much further, as the learning rate was high; that is, the weights potentially got stuck in a local minimum:

Thus, in general, it is a good idea to set the learning rate to a low value and let the network learn over a large number of epochs.
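If you want to compare several learning rates in one go, a simple sweep along the following lines works. This is only a sketch; the epoch count is reduced from the recipe's 500 just to keep the run short, and the learning rates are the three values discussed above:

from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

histories = {}
for lr in (0.0001, 0.01, 0.1):  # the three learning rates compared in this recipe
    model = Sequential()
    model.add(Dense(1000, input_dim=784, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.Adam(lr=lr),
                  metrics=['accuracy'])
    histories[lr] = model.fit(X_train, y_train,
                              validation_data=(X_test, y_test),
                              epochs=100, batch_size=1024, verbose=0)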
