- Python Deep Learning
- Ivan Vasilev, Daniel Slater, Gianmario Spacagna, Peter Roelants, Valentino Zocca
Code example of a neural network for the XOR function
In this section, we'll create a simple network with one hidden layer, which solves the XOR function. As we mentioned at the end of the previous chapter, the XOR function is a linearly inseparable problem, hence the need for a hidden layer. The source code will allow you to easily modify the number of layers and the number of neurons per layer, so you can try a number of different scenarios. We'll not use any ML libraries; instead, we'll implement the network from scratch, with only the help of numpy. We'll also use matplotlib to visualize the results:
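As a reminder, this is the XOR truth table that the network has to reproduce (the same data we'll feed to it in the main block at the end of the section):

| Input A | Input B | A XOR B |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |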
- With that, let's start by importing these libraries:
import matplotlib.pyplot as plt
import numpy
from matplotlib.colors import ListedColormap
- Next, we define the activation function and its derivative (we use tanh(x) in this example):
def tanh(x):
    return (1.0 - numpy.exp(-2 * x)) / (1.0 + numpy.exp(-2 * x))

# The derivative of tanh, written in terms of the function's output:
# if y = tanh(x), then dy/dx = 1 - y**2 = (1 + y) * (1 - y).
# We express it this way because, during backpropagation, we'll call it
# on the stored layer outputs, which are already tanh values.
def tanh_derivative(y):
    return (1 + y) * (1 - y)
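As a quick sanity check (not part of the original example, and assuming only the two definitions above), we can compare our tanh with numpy.tanh and verify the identity d/dx tanh(x) = 1 - tanh(x)**2 against a finite-difference estimate:

eps = 1e-6
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # our formula should agree with NumPy's built-in tanh
    assert abs(tanh(x) - numpy.tanh(x)) < 1e-9
    # central finite difference of tanh around x
    numeric = (tanh(x + eps) - tanh(x - eps)) / (2 * eps)
    # should match the analytic derivative 1 - tanh(x)**2
    assert abs(numeric - (1 - tanh(x) ** 2)) < 1e-6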
- Then, we start the definition of the NeuralNetwork class:
class NeuralNetwork:
Because of the Python syntax, anything inside the NeuralNetwork class will have to be indented.
- First, we define the __init__ initializer of NeuralNetwork:
- net_arch is a one-dimensional array containing the number of neurons for each layer. For example, [2, 4, 1] means an input layer with two neurons, a hidden layer with four neurons, and an output layer with one neuron. Since we are studying the XOR function, the input layer will have two neurons and the output layer will have only one neuron.
- We also set the activation function to the hyperbolic tangent, and we will then define its derivative.
- Finally, we initialize the network weights with random values in the range (-1, 1), as demonstrated in the following code block:
    # net_arch consists of a list of integers, indicating
    # the number of neurons in each layer
    def __init__(self, net_arch):
        self.activation_func = tanh
        self.activation_derivative = tanh_derivative
        self.layers = len(net_arch)
        self.steps_per_epoch = 1000
        self.net_arch = net_arch

        # initialize the weights with random values in the range (-1, 1)
        self.weights = []
        for layer in range(len(net_arch) - 1):
            w = 2 * numpy.random.rand(net_arch[layer] + 1, net_arch[layer + 1]) - 1
            self.weights.append(w)
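As a quick illustration (a hypothetical check, not part of the book's listing, and assuming the full class below has been assembled), the +1 in the first dimension is the extra row of bias weights; for net_arch = [2, 2, 1] we get two weight matrices of shapes (3, 2) and (3, 1):

nn_check = NeuralNetwork([2, 2, 1])   # hypothetical instance, created only to inspect the weights
print([w.shape for w in nn_check.weights])   # prints [(3, 2), (3, 1)]; the extra row is for the bias unit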
- Next, we need to define the fit function, which will train our network.
- First, we prepend a column of 1s to the input data (the always-on bias unit) and set up the code to print the result at the end of each epoch, so that we can keep track of our progress.
- In the last line, predict is a method of NeuralNetwork that we'll define later; it returns the network output for a single input sample:
    def fit(self, data, labels, learning_rate=0.1, epochs=10):
        """
        :param data: the set of all possible pairs of booleans
                     True or False, indicated by the integers 1 or 0
        :param labels: the result of the logical operation 'xor'
                       on each of those input pairs
        :param learning_rate: the step size of each weight update
        :param epochs: the number of times to iterate over the training steps
        """
        # Add the bias units to the input layer
        ones = numpy.ones((1, data.shape[0]))
        Z = numpy.concatenate((ones.T, data), axis=1)

        training = epochs * self.steps_per_epoch
        for k in range(training):
            if k % self.steps_per_epoch == 0:
                print('epochs: {}'.format(k / self.steps_per_epoch))
                for s in data:
                    print(s, self.predict(s))
- Next, we select a random sample from the training set and propagate it forward through the network so that we can calculate the error between the network output and the target data:
            sample = numpy.random.randint(data.shape[0])
            y = [Z[sample]]
            for i in range(len(self.weights) - 1):
                activation = numpy.dot(y[i], self.weights[i])
                activation_f = self.activation_func(activation)

                # add the bias for the next layer
                activation_f = numpy.concatenate((numpy.ones(1), numpy.array(activation_f)))
                y.append(activation_f)

            # last layer
            activation = numpy.dot(y[-1], self.weights[-1])
            activation_f = self.activation_func(activation)
            y.append(activation_f)
- Now we can compute the error between the network output and the target, and propagate it backward to update the weights. We'll use stochastic gradient descent, that is, we update the weights after every sample (the update formulas are summarized right after the following code block):
            # error for the output layer
            error = labels[sample] - y[-1]
            delta_vec = [error * self.activation_derivative(y[-1])]

            # we need to begin from the back, from the next-to-last layer
            for i in range(self.layers - 2, 0, -1):
                error = delta_vec[-1].dot(self.weights[i][1:].T)
                error = error * self.activation_derivative(y[i][1:])
                delta_vec.append(error)

            # reverse the deltas, so that
            # [level3(output) -> level2(hidden)] becomes [level2(hidden) -> level3(output)]
            delta_vec.reverse()

            # backpropagation
            # 1. Multiply the output delta and the input activation
            #    to get the gradient of the weight.
            # 2. Adjust the weight by a fraction (the learning rate) of the gradient;
            #    because the error is computed as label - output, the update is added.
            for i in range(len(self.weights)):
                layer = y[i].reshape(1, self.net_arch[i] + 1)
                delta = delta_vec[i].reshape(1, self.net_arch[i + 1])
                self.weights[i] += learning_rate * layer.T.dot(delta)
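To relate the preceding loops to the usual backpropagation formulas, here is a sketch of the update being performed, written to match this code's conventions (the output error is target minus output, which is why the weight update is added rather than subtracted). With learning rate $\eta$, bias-augmented layer outputs $y^{(i)}$, weight matrices $W^{(i)}$ connecting layer $i$ to layer $i+1$, and the tanh derivative expressed through the layer output as $f'(y) = 1 - y^2$:

$$
\delta^{(\mathrm{out})} = \big(t - y^{(\mathrm{out})}\big)\, f'\big(y^{(\mathrm{out})}\big), \qquad
\delta^{(i)} = \Big(\delta^{(i+1)} \big(W^{(i)}_{1:}\big)^{\top}\Big) \odot f'\big(y^{(i)}_{1:}\big), \qquad
W^{(i)} \leftarrow W^{(i)} + \eta\, \big(y^{(i)}\big)^{\top} \delta^{(i+1)}
$$

Here, $W^{(i)}_{1:}$ denotes the weight matrix without its bias row, so no error is propagated back to the always-on bias units, and $y^{(i)}_{1:}$ is the layer output without the bias entry.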
- This concludes the training phase of the network. We'll now write a predict function to check the results, which returns the network output:
    def predict(self, x):
        val = numpy.concatenate((numpy.ones(1).T, numpy.array(x)))
        for i in range(0, len(self.weights)):
            val = self.activation_func(numpy.dot(val, self.weights[i]))
            val = numpy.concatenate((numpy.ones(1).T, numpy.array(val)))

        # a bias unit is prepended after every layer, including the last one,
        # so val is [1, output] and the network output is val[1]
        return val[1]
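Since predict returns the raw, continuous network output rather than a class label, a small hypothetical helper (not part of the book's code) can threshold it at 0.5, which is the same cut-off that plot_decision_regions uses below:

def predict_class(network, x, threshold=0.5):
    # hypothetical helper: turn the continuous output into a 0/1 class
    return 1 if network.predict(numpy.array(x)) >= threshold else 0

For example, predict_class(nn, [0, 1]) should return 1 once the network has been trained successfully.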
- Finally, we'll write a method that plots the regions separating the classes, based on the two input variables (we'll see the plots at the end of the section):
    def plot_decision_regions(self, X, y, points=200):
        markers = ('o', '^')
        colors = ('red', 'blue')
        cmap = ListedColormap(colors)

        x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
        x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
        # To produce zoomed-out figures, you can replace the preceding 2 lines with:
        # x1_min, x1_max = -10, 11
        # x2_min, x2_max = -10, 11

        resolution = max(x1_max - x1_min, x2_max - x2_min) / float(points)

        xx1, xx2 = numpy.meshgrid(numpy.arange(x1_min, x1_max, resolution),
                                  numpy.arange(x2_min, x2_max, resolution))

        input = numpy.array([xx1.ravel(), xx2.ravel()]).T
        Z = numpy.empty(0)
        for i in range(input.shape[0]):
            val = self.predict(numpy.array(input[i]))
            if val < 0.5:
                val = 0
            if val >= 0.5:
                val = 1
            Z = numpy.append(Z, val)

        Z = Z.reshape(xx1.shape)

        plt.pcolormesh(xx1, xx2, Z, cmap=cmap)
        plt.xlim(xx1.min(), xx1.max())
        plt.ylim(xx2.min(), xx2.max())

        # plot all samples
        classes = ["False", "True"]
        for idx, cl in enumerate(numpy.unique(y)):
            plt.scatter(x=X[y == cl, 0],
                        y=X[y == cl, 1],
                        alpha=1.0,
                        c=colors[idx],
                        edgecolors='black',
                        marker=markers[idx],
                        s=80,
                        label=classes[idx])

        plt.xlabel('x-axis')
        plt.ylabel('y-axis')
        plt.legend(loc='upper left')
        plt.show()
- In the following code block, we can see the code to run the entire process:
if __name__ == '__main__':
    numpy.random.seed(0)

    # Initialize the NeuralNetwork with 2 input, 2 hidden, and 1 output neurons
    nn = NeuralNetwork([2, 2, 1])

    X = numpy.array([[0, 0],
                     [0, 1],
                     [1, 0],
                     [1, 1]])
    y = numpy.array([0, 1, 1, 0])

    nn.fit(X, y, epochs=10)

    print("Final prediction")
    for s in X:
        print(s, nn.predict(s))

    nn.plot_decision_regions(X, y)
We use numpy.random.seed(0) to ensure that the weight initialization is consistent across runs, so we'll be able to compare results, but it's not necessary for the implementation of the neural net.
In the following diagrams, you can see how the nn.plot_decision_regions method plots the decision boundaries that separate the classes. The circles represent the network output for the (True, True) and (False, False) inputs, while the triangles represent the (True, False) and (False, True) inputs of the XOR function:
The following diagram represents the output:


The two diagrams show the same result: the top one is zoomed out, and the bottom one is zoomed in on the input points. The neural network learns to separate those points by creating a band that contains the two True output values. You can generate the zoomed-out image by modifying the x1_min, x1_max, x2_min, and x2_max variables in the plot_decision_regions method.
Networks with different architectures can produce different separating regions. We can try different combinations of hidden layers when we instantiate the network. When we build the default network, nn = NeuralNetwork([2, 2, 1]), the first and last values (2 and 1) represent the input and output layers and cannot be modified, but we can add different numbers of hidden layers with different numbers of neurons. For example, [2, 4, 3, 1] represents a network with two hidden layers: four neurons in the first hidden layer and three neurons in the second. You'll be able to see that while the network finds the right solution, the curves separating the regions will differ depending on the chosen architecture (a short usage sketch follows the figures below). Now, nn = NeuralNetwork([2, 4, 3, 1]) will produce the following separation:

And here is the separation for nn = NeuralNetwork([2,4,1]):

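Trying one of these alternative architectures only requires changing the list passed to the constructor; here is a minimal sketch that reuses the X and y arrays from the main block above:

nn = NeuralNetwork([2, 4, 3, 1])   # two hidden layers, with four and three neurons
nn.fit(X, y, epochs=10)
nn.plot_decision_regions(X, y)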
The architecture of the neural network defines the way the network goes about solving the problem at hand, and different architectures provide different approaches (though they may all give the same result). We are now ready to start looking more closely at what deep neural nets are and their applications.