Using Keras to classify handwritten digits
In this section, we'll use Keras to classify the images of the MNIST dataset. It's comprised of 70,000 examples of handwritten digits by different people. The first 60,000 are typically used for training and the remaining 10,000 for testing:

One of the advantages of Keras is that it can import this dataset for us; we don't need to download it from the web explicitly, because Keras fetches it automatically the first time it's used:
- Our first step will be to download the dataset using Keras:
from keras.datasets import mnist
- Then, we need to import a few classes to use a feed-forward network:
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.utils import np_utils
- Next, we'll load the training and testing data. (X_train, Y_train) are the training images and labels, and (X_test, Y_test) are the test images and labels:
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
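If you want to confirm the 60,000/10,000 split mentioned earlier, an optional check of the array shapes (not part of the book's listing) looks like this:
print(X_train.shape, Y_train.shape)  # (60000, 28, 28) (60000,)
print(X_test.shape, Y_test.shape)  # (10000, 28, 28) (10000,)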
- We need to modify the data to be able to use it. X_train contains 60,000 28 x 28 pixel images, and X_test contains 10,000. To feed them to the network as inputs, we want to reshape each sample as a 784-pixel long array, rather than a (28,28) two-dimensional matrix. We can accomplish this with these two lines:
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
- The labels indicate the value of the digit depicted in the images. We want to convert this into a 10-entry one-hot encoded vector comprised of zeros, with a single 1 in the entry corresponding to the digit. For example, 4 is mapped to [0, 0, 0, 0, 1, 0, 0, 0, 0, 0] (we'll verify this with a quick check after the following snippet). Correspondingly, our network will have 10 output neurons:
classes = 10
Y_train = np_utils.to_categorical(Y_train, classes)
Y_test = np_utils.to_categorical(Y_test, classes)
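As an optional check, we can verify that the encoding matches the example above; the digit 4 should map to a vector with a single 1 at index 4:
print(np_utils.to_categorical([4], classes))  # [[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]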
- Before calling our main function, we need to set the size of the input layer (the size of the MNIST images), the number of hidden neurons, the number of epochs to train the network, and the mini batch size:
input_size = 784
batch_size = 100
hidden_neurons = 100
epochs = 100
- We are ready to define our network. In this case, we'll use the Sequential model, where each layer serves as an input to the next. In Keras, Dense represents a fully connected layer. We'll use a network with one hidden layer, sigmoid activation, and softmax output:
model = Sequential([
    Dense(hidden_neurons, input_dim=input_size),
    Activation('sigmoid'),
    Dense(classes),
    Activation('softmax')
])
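If you want to double-check the architecture at this point, Keras models provide a summary() method (an optional step, not part of the book's listing):
model.summary()  # prints both Dense layers and their parameter counts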
- Keras now provides a simple way to specify the cost function (the loss) and its optimization, in this case, cross-entropy and stochastic gradient descent. We'll use the default values for learning rate, momentum, and so on:
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='sgd')
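If you later want to set these hyperparameters yourself, you can pass an optimizer object instead of the 'sgd' string. The values below are purely illustrative, and depending on your Keras version the argument is named lr or learning_rate:
from keras.optimizers import SGD
sgd = SGD(lr=0.1, momentum=0.9)  # illustrative values, not the book's defaults
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer=sgd)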
Softmax and cross-entropy
In the Logistic regression section of Chapter 2, Neural Networks, we learned how to apply logistic regression to binary classification (two-class) problems. The softmax function is a generalization of this concept to multiple classes. Let's look at the following formula:

F(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
Here, i, j = 0, 1, 2, ..., n and x_i represents each of the n arbitrary real values, corresponding to n mutually exclusive classes. The softmax "squashes" the input values into the (0, 1) interval, similar to the logistic function. But it has the additional property that all the squashed outputs add up to 1. We can interpret the softmax outputs as a normalized probability distribution over the classes. Then, it makes sense to use a loss function that compares the difference between the estimated class probabilities and the actual class distribution (this difference is known as cross-entropy). As we mentioned in step 5 of this section, the actual distribution is usually a one-hot-encoded vector, where the real class has a probability of 1, and all others have a probability of 0. The loss function that does this is called cross-entropy loss:

H(p, q) = -\sum_{i} p_i(x) \log q_i(x)
Here, q_i(x) is the estimated probability that the output belongs to class i (out of n total classes) and p_i(x) is the actual probability. When we use one-hot-encoded target values for p_i(x), only the target class has a non-zero value (1) and all others are zeros. In this case, the cross-entropy loss only captures the error on the target class and discards all other errors. For the sake of simplicity, we'll assume that we apply the formula over a single training sample.
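To make the two formulas concrete, here is a small NumPy sketch (not part of the book's script) that computes the softmax of some arbitrary scores and the cross-entropy loss against a one-hot target:
import numpy as np
x = np.array([1.0, 2.0, 0.5, 0.1, 3.0, 0.2, 0.3, 0.4, 0.5, 0.6])  # arbitrary scores for 10 classes
q = np.exp(x) / np.sum(np.exp(x))  # softmax: the entries of q sum to 1
p = np.zeros(10)
p[4] = 1.0  # one-hot target: the true class is 4
loss = -np.sum(p * np.log(q))  # cross-entropy; only the target class contributes
print(q.sum(), loss)  # 1.0 and -log(q[4])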
- We are ready to train the network. In Keras, we can do this in a simple way, with the fit method:
model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, verbose=1)
- All that's left to do is to add code to evaluate the network accuracy on the test data:
score = model.evaluate(X_test, Y_test, verbose=1)
print('Test accuracy:', score[1])
And that's it. The test accuracy will be about 96%, which is not a great result, but this example runs in less than 30 seconds on a CPU. We can make some simple improvements, such as a larger number of hidden neurons, or a higher number of epochs. We'll leave those experiments to you, to familiarize yourself with the code.
- To see what the network has learned, we can visualize the weights of the hidden layer. The following code allows us to obtain them:
weights = model.layers[0].get_weights()
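get_weights() returns a list containing the layer's weight matrix and bias vector. A quick optional check of their shapes shows why we transpose the matrix in the visualization code below:
print(weights[0].shape)  # (784, 100): one column of weights per hidden neuron
print(weights[1].shape)  # (100,): one bias per hidden neuron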
- To visualize them, we'll reshape the weights of each neuron back into a 28 x 28 two-dimensional array:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy
fig = plt.figure()
w = weights[0].T
for neuron in range(hidden_neurons):
    ax = fig.add_subplot(10, 10, neuron + 1)
    ax.axis("off")
    ax.imshow(numpy.reshape(w[neuron], (28, 28)), cmap=cm.Greys_r)
plt.savefig("neuron_images.png", dpi=300)
plt.show()
- And we can see the result in the following image:

For simplicity, we've aggregated the images of all the hidden neurons into a single composite figure. Since the initial images are very small and don't have a lot of detail (they are just digits), the features learned by the hidden neurons aren't particularly interesting. But we can already see that each neuron is learning a different shape.