
  • Python Deep Learning
  • Ivan Vasilev, Daniel Slater, Gianmario Spacagna, Peter Roelants, Valentino Zocca

Using Keras to classify handwritten digits

In this section, we'll use Keras to classify the images of the MNIST dataset. It's comprised of 70,000 examples of handwritten digits by different people. The first 60,000 are typically used for training and the remaining 10,000 for testing:

Sample of digits taken from the MNIST dataset

One of the advantages of Keras is that it can import this dataset for us; we don't need to download it explicitly, as Keras fetches it from the web automatically the first time it's needed:

  1. Our first step will be to download the datasets using Keras:
from keras.datasets import mnist 
  2. Then, we need to import a few classes to use a feed-forward network:
from keras.models import Sequential  
from keras.layers.core import Dense, Activation 
from keras.utils import np_utils 
  3. Next, we'll load the training and testing data. (X_train, Y_train) are the training images and labels, and (X_test, Y_test) are the test images and labels:
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
  4. We need to modify the data before we can use it. X_train contains 60,000 28 x 28 pixel images, and X_test contains 10,000. To feed them to the network as inputs, we want to reshape each sample into a 784-element array, rather than a (28, 28) two-dimensional matrix. We can accomplish this with the following two lines (a quick shape check follows them):
X_train = X_train.reshape(60000, 784)      
X_test = X_test.reshape(10000, 784) 
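As a quick sanity check (this snippet is an illustrative addition, not part of the original example), we can print the new shapes to confirm that each image is now a flat 784-element vector:

print(X_train.shape)  # should print (60000, 784)
print(X_test.shape)   # should print (10000, 784)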
  5. The labels indicate the value of the digit depicted in the images. We want to convert each of them into a 10-entry one-hot encoded vector comprised of zeroes and a single 1 in the entry corresponding to the digit. For example, 4 is mapped to [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]. Correspondingly, our network will have 10 output neurons:
classes = 10 
Y_train = np_utils.to_categorical(Y_train, classes)      
Y_test = np_utils.to_categorical(Y_test, classes) 
  6. Before calling our main function, we need to set the size of the input layer (the size of the MNIST images), the number of hidden neurons, the number of epochs to train the network for, and the mini-batch size:
input_size = 784 
batch_size = 100      
hidden_neurons = 100      
epochs = 100
  7. We are ready to define our network. In this case, we'll use the Sequential model, where each layer serves as an input to the next. In Keras, Dense denotes a fully connected layer. We'll use a network with one hidden layer, sigmoid activation, and softmax output (a summary check follows the code):
model = Sequential([
Dense(hidden_neurons, input_dim=input_size),
Activation('sigmoid'),
Dense(classes),
Activation('softmax')
])
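To inspect the resulting architecture, we can call model.summary(), which prints each layer with its output shape and parameter count (this check is an optional addition to the original steps). With the sizes chosen above, the hidden layer holds 784 * 100 + 100 = 78,500 weights and the output layer 100 * 10 + 10 = 1,010, for a total of 79,510 trainable parameters:

model.summary()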
  8. Keras provides a simple way to specify the cost function (the loss) and its optimization method, in this case, cross-entropy and stochastic gradient descent. We'll use the default values for the learning rate, momentum, and so on (a sketch of how to set them explicitly follows the code):
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='sgd')
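If we wanted to control these hyperparameters ourselves rather than rely on the defaults, we could pass an optimizer object instead of the 'sgd' string. The values below are purely illustrative, not the book's settings, and depending on the Keras version the learning-rate argument is named lr or learning_rate:

from keras.optimizers import SGD

sgd = SGD(lr=0.01, momentum=0.9)  # example values; newer Keras uses learning_rate=
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer=sgd)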

Softmax and cross-entropy

In the Logistic regression section of Chapter 2, Neural Networks, we learned how to apply regression to binary classification (two classes) problems. The softmax function is a generalization of this concept for multiple classes. Let's look at the following formula:
F(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
Here, i, j = 0, 1, 2, ..., n and x_i represents each of n arbitrary real values, corresponding to n mutually exclusive classes. The softmax "squashes" the input values into the (0, 1) interval, similar to the logistic function, but it has the additional property that the sum of all the squashed outputs adds up to 1. We can interpret the softmax outputs as a normalized probability distribution over the classes. It then makes sense to use a loss function that compares the difference between the estimated class probabilities and the actual class distribution (this difference is known as cross-entropy). As we mentioned in step 5 of this section, the actual distribution is usually a one-hot-encoded vector, where the real class has a probability of 1, and all others have a probability of 0. The loss function that does this is called the cross-entropy loss:
H(p, q) = -\sum_{i} p_i(x) \log q_i(x)
Here, q_i(x) is the estimated probability that the output belongs to class i (out of n total classes) and p_i(x) is the actual probability. When we use one-hot-encoded target values for p_i(x), only the target class has a non-zero value (1) and all the others are zeros. In this case, the cross-entropy loss only captures the error on the target class and discards all other errors. For the sake of simplicity, we'll assume that we apply the formula to a single training sample.
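To make this concrete, here is a small NumPy illustration (an addition to this walkthrough, not part of the book's code) that computes the softmax of three arbitrary scores and the cross-entropy loss against a one-hot target:

import numpy

x = numpy.array([2.0, 1.0, 0.1])            # arbitrary scores for three classes
q = numpy.exp(x) / numpy.sum(numpy.exp(x))  # softmax: each value in (0, 1), summing to 1
p = numpy.array([1.0, 0.0, 0.0])            # one-hot target: the true class is class 0
loss = -numpy.sum(p * numpy.log(q))         # cross-entropy: only the target class contributes
print(q, q.sum(), loss)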

  9. We are ready to train the network. In Keras, we can do this in a simple way, with the fit method:
model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, verbose=1)
  10. All that's left to do is to add code to evaluate the network's accuracy on the test data (a quick look at an individual prediction follows the code):
score = model.evaluate(X_test, Y_test, verbose=1)
print('Test accuracy:', score[1])
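If we also want to inspect the prediction for an individual image, we could use model.predict and take the class with the highest probability. This snippet is an illustrative addition rather than part of the original example:

import numpy

probabilities = model.predict(X_test[:1])         # softmax outputs for the first test image
predicted_digit = numpy.argmax(probabilities[0])  # class with the highest probability
print(predicted_digit, numpy.argmax(Y_test[0]))   # predicted digit vs. actual digit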

And that's it. The test accuracy will be about 96%, which is not a great result, but this example runs in less than 30 seconds on a CPU. We can make some simple improvements, such as a larger number of hidden neurons, or a higher number of epochs. We'll leave those experiments to you, to familiarize yourself with the code.
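One possible starting point for such an experiment (the values below are only an illustration) is to increase the network's capacity and training time, and then rebuild, recompile, and refit the model with the same code as before:

hidden_neurons = 1024   # more hidden units than the original 100
epochs = 200            # train for more epochs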

  11. To see what the network has learned, we can visualize the weights of the hidden layer. The following code allows us to obtain them:
weights = model.layers[0].get_weights() 
  12. To visualize them, we'll reshape the weights of each neuron back into a 28 x 28 two-dimensional array:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy

fig = plt.figure()

# Each row of w holds the incoming weights of one hidden neuron
w = weights[0].T
for neuron in range(hidden_neurons):
    ax = fig.add_subplot(10, 10, neuron + 1)
    ax.axis("off")
    ax.imshow(numpy.reshape(w[neuron], (28, 28)), cmap=cm.Greys_r)

plt.savefig("neuron_images.png", dpi=300)
plt.show()
  13. And we can see the result in the following image:
Composite figure of what was learned by all the hidden neurons

For simplicity, we've aggregated the images of all the hidden neurons into a single composite figure. Since the initial images are very small and don't have a lot of detail (they are just digits), the features learned by the hidden neurons are not especially interesting, but we can already see that each neuron is learning a different shape.
