- Python Deep Learning
- Ivan Vasilev, Daniel Slater, Gianmario Spacagna, Peter Roelants, Valentino Zocca
Code example of a neural network for the XOR function
In this section, we'll create a simple network with one hidden layer, which solves the XOR function. As we mentioned at the end of the previous chapter, the XOR function is a linearly inseparable problem, hence the need for a hidden layer. The source code will allow you to easily modify the number of layers and the number of neurons per layer, so you can try a number of different scenarios. We'll not use any ML libraries; instead, we'll implement the network from scratch, with only the help of numpy. We'll also use matplotlib to visualize the results:
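As a reminder, this is the XOR truth table that the network has to reproduce (the same data we'll feed to it in the main block at the end of the section):

| Input A | Input B | A XOR B |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |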
- With that, let's start by importing these libraries:
import matplotlib.pyplot as plt
import numpy
from matplotlib.colors import ListedColormap
- Next, we define the activation function and its derivative (we use tanh(x) in this example):
def tanh(x):
    return (1.0 - numpy.exp(-2 * x)) / (1.0 + numpy.exp(-2 * x))

# The derivative of tanh, written in terms of the function's output:
# if y = tanh(x), then dy/dx = 1 - y**2 = (1 + y) * (1 - y).
# We express it this way because, during backpropagation, we'll call it
# on the stored layer outputs, which are already tanh values.
def tanh_derivative(y):
    return (1 + y) * (1 - y)
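As a quick sanity check (not part of the original example, and assuming only the two definitions above), we can compare our tanh with numpy.tanh and verify the identity d/dx tanh(x) = 1 - tanh(x)**2 against a finite-difference estimate:

eps = 1e-6
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # our formula should agree with NumPy's built-in tanh
    assert abs(tanh(x) - numpy.tanh(x)) < 1e-9
    # central finite difference of tanh around x
    numeric = (tanh(x + eps) - tanh(x - eps)) / (2 * eps)
    # should match the analytic derivative 1 - tanh(x)**2
    assert abs(numeric - (1 - tanh(x) ** 2)) < 1e-6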
- Then, we start the definition of the NeuralNetwork class:
class NeuralNetwork:
Because of the Python syntax, anything inside the NeuralNetwork class will have to be indented.
- First, we define the __init__ initializer of NeuralNetwork:
- net_arch is a one-dimensional array containing the number of neurons for each layer. For example, [2, 4, 1] means an input layer with two neurons, a hidden layer with four neurons, and an output layer with one neuron. Since we are studying the XOR function, the input layer will have two neurons and the output layer will have only one neuron.
- We also set the activation function to the hyperbolic tangent, and we will then define its derivative.
- Finally, we initialize the network weights with random values in the range (-1, 1), as demonstrated in the following code block:
    # net_arch consists of a list of integers, indicating
    # the number of neurons in each layer
    def __init__(self, net_arch):
        self.activation_func = tanh
        self.activation_derivative = tanh_derivative
        self.layers = len(net_arch)
        self.steps_per_epoch = 1000
        self.net_arch = net_arch

        # initialize the weights with random values in the range (-1, 1)
        self.weights = []
        for layer in range(len(net_arch) - 1):
            w = 2 * numpy.random.rand(net_arch[layer] + 1, net_arch[layer + 1]) - 1
            self.weights.append(w)
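As a quick illustration (a hypothetical check, not part of the book's listing, and assuming the full class below has been assembled), the +1 in the first dimension is the extra row of bias weights; for net_arch = [2, 2, 1] we get two weight matrices of shapes (3, 2) and (3, 1):

nn_check = NeuralNetwork([2, 2, 1])   # hypothetical instance, created only to inspect the weights
print([w.shape for w in nn_check.weights])   # prints [(3, 2), (3, 1)]; the extra row is for the bias unit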
- Next, we need to define the fit function, which will train our network.
- First, we prepend a column of 1s to the input data (the always-on bias unit) and set up the code to print the result at the end of each epoch, so that we can keep track of our progress.
- In the last line, predict is a method of NeuralNetwork that we'll define later; it returns the network output for a single input sample:
    def fit(self, data, labels, learning_rate=0.1, epochs=10):
        """
        :param data: the set of all possible pairs of booleans
                     True or False, indicated by the integers 1 or 0
        :param labels: the result of the logical operation 'xor'
                       on each of those input pairs
        :param learning_rate: the step size of each weight update
        :param epochs: the number of times to iterate over the training steps
        """
        # Add the bias units to the input layer
        ones = numpy.ones((1, data.shape[0]))
        Z = numpy.concatenate((ones.T, data), axis=1)

        training = epochs * self.steps_per_epoch
        for k in range(training):
            if k % self.steps_per_epoch == 0:
                print('epochs: {}'.format(k / self.steps_per_epoch))
                for s in data:
                    print(s, self.predict(s))
- Next, we select a random sample from the training set and propagate it forward through the network so that we can calculate the error between the network output and the target data:
            sample = numpy.random.randint(data.shape[0])
            y = [Z[sample]]
            for i in range(len(self.weights) - 1):
                activation = numpy.dot(y[i], self.weights[i])
                activation_f = self.activation_func(activation)

                # add the bias for the next layer
                activation_f = numpy.concatenate((numpy.ones(1), numpy.array(activation_f)))
                y.append(activation_f)

            # last layer
            activation = numpy.dot(y[-1], self.weights[-1])
            activation_f = self.activation_func(activation)
            y.append(activation_f)
- Now we can compute the error between the network output and the target, and propagate it backward to update the weights. We'll use stochastic gradient descent, that is, we update the weights after every sample (the update formulas are summarized right after the following code block):
            # error for the output layer
            error = labels[sample] - y[-1]
            delta_vec = [error * self.activation_derivative(y[-1])]

            # we need to begin from the back, from the next-to-last layer
            for i in range(self.layers - 2, 0, -1):
                error = delta_vec[-1].dot(self.weights[i][1:].T)
                error = error * self.activation_derivative(y[i][1:])
                delta_vec.append(error)

            # reverse the deltas, so that
            # [level3(output) -> level2(hidden)] becomes [level2(hidden) -> level3(output)]
            delta_vec.reverse()

            # backpropagation
            # 1. Multiply the output delta and the input activation
            #    to get the gradient of the weight.
            # 2. Adjust the weight by a fraction (the learning rate) of the gradient;
            #    because the error is computed as label - output, the update is added.
            for i in range(len(self.weights)):
                layer = y[i].reshape(1, self.net_arch[i] + 1)
                delta = delta_vec[i].reshape(1, self.net_arch[i + 1])
                self.weights[i] += learning_rate * layer.T.dot(delta)
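To relate the preceding loops to the usual backpropagation formulas, here is a sketch of the update being performed, written to match this code's conventions (the output error is target minus output, which is why the weight update is added rather than subtracted). With learning rate $\eta$, bias-augmented layer outputs $y^{(i)}$, weight matrices $W^{(i)}$ connecting layer $i$ to layer $i+1$, and the tanh derivative expressed through the layer output as $f'(y) = 1 - y^2$:

$$
\delta^{(\mathrm{out})} = \big(t - y^{(\mathrm{out})}\big)\, f'\big(y^{(\mathrm{out})}\big), \qquad
\delta^{(i)} = \Big(\delta^{(i+1)} \big(W^{(i)}_{1:}\big)^{\top}\Big) \odot f'\big(y^{(i)}_{1:}\big), \qquad
W^{(i)} \leftarrow W^{(i)} + \eta\, \big(y^{(i)}\big)^{\top} \delta^{(i+1)}
$$

Here, $W^{(i)}_{1:}$ denotes the weight matrix without its bias row, so no error is propagated back to the always-on bias units, and $y^{(i)}_{1:}$ is the layer output without the bias entry.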
- This concludes the training phase of the network. We'll now write a predict function to check the results, which returns the network output:
    def predict(self, x):
        val = numpy.concatenate((numpy.ones(1).T, numpy.array(x)))
        for i in range(0, len(self.weights)):
            val = self.activation_func(numpy.dot(val, self.weights[i]))
            val = numpy.concatenate((numpy.ones(1).T, numpy.array(val)))

        # a bias unit is prepended after every layer, including the last one,
        # so val is [1, output] and the network output is val[1]
        return val[1]
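Since predict returns the raw, continuous network output rather than a class label, a small hypothetical helper (not part of the book's code) can threshold it at 0.5, which is the same cut-off that plot_decision_regions uses below:

def predict_class(network, x, threshold=0.5):
    # hypothetical helper: turn the continuous output into a 0/1 class
    return 1 if network.predict(numpy.array(x)) >= threshold else 0

For example, predict_class(nn, [0, 1]) should return 1 once the network has been trained successfully.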
- Finally, we'll write a method that plots the regions separating the classes, based on the two input variables (we'll see the plots at the end of the section):
    def plot_decision_regions(self, X, y, points=200):
        markers = ('o', '^')
        colors = ('red', 'blue')
        cmap = ListedColormap(colors)

        x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
        x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
        # To produce zoomed-out figures, you can replace the preceding 2 lines with:
        # x1_min, x1_max = -10, 11
        # x2_min, x2_max = -10, 11

        resolution = max(x1_max - x1_min, x2_max - x2_min) / float(points)

        xx1, xx2 = numpy.meshgrid(numpy.arange(x1_min, x1_max, resolution),
                                  numpy.arange(x2_min, x2_max, resolution))

        input = numpy.array([xx1.ravel(), xx2.ravel()]).T
        Z = numpy.empty(0)
        for i in range(input.shape[0]):
            val = self.predict(numpy.array(input[i]))
            if val < 0.5:
                val = 0
            if val >= 0.5:
                val = 1
            Z = numpy.append(Z, val)

        Z = Z.reshape(xx1.shape)

        plt.pcolormesh(xx1, xx2, Z, cmap=cmap)
        plt.xlim(xx1.min(), xx1.max())
        plt.ylim(xx2.min(), xx2.max())

        # plot all samples
        classes = ["False", "True"]
        for idx, cl in enumerate(numpy.unique(y)):
            plt.scatter(x=X[y == cl, 0],
                        y=X[y == cl, 1],
                        alpha=1.0,
                        c=colors[idx],
                        edgecolors='black',
                        marker=markers[idx],
                        s=80,
                        label=classes[idx])

        plt.xlabel('x-axis')
        plt.ylabel('y-axis')
        plt.legend(loc='upper left')
        plt.show()
- In the following code block, we can see the code to run the entire process:
if __name__ == '__main__':
    numpy.random.seed(0)

    # Initialize the NeuralNetwork with 2 input, 2 hidden, and 1 output neurons
    nn = NeuralNetwork([2, 2, 1])

    X = numpy.array([[0, 0],
                     [0, 1],
                     [1, 0],
                     [1, 1]])
    y = numpy.array([0, 1, 1, 0])

    nn.fit(X, y, epochs=10)

    print("Final prediction")
    for s in X:
        print(s, nn.predict(s))

    nn.plot_decision_regions(X, y)
We use numpy.random.seed(0) to ensure that the weight initialization is consistent across runs, so we'll be able to compare results, but it's not necessary for the implementation of the neural net.
In the following diagrams, you can see how the nn.plot_decision_regions method plots the decision boundaries that separate the classes. The circles represent the network output for the (True, True) and (False, False) inputs, while the triangles represent the (True, False) and (False, True) inputs of the XOR function:
The following diagram represents the output:


The two diagrams show the same result: the top one is zoomed out, and the bottom one is zoomed in on the input points. The neural network learns to separate those points by creating a band that contains the two True output values. You can generate the zoomed-out image by modifying the x1_min, x1_max, x2_min, and x2_max variables in the plot_decision_regions method.
Networks with different architectures can produce different separating regions. We can try different combinations of hidden layers when we instantiate the network. When we build the default network, nn = NeuralNetwork([2, 2, 1]), the first and last values (2 and 1) represent the input and output layers and cannot be modified, but we can add different numbers of hidden layers with different numbers of neurons. For example, [2, 4, 3, 1] represents a network with two hidden layers: four neurons in the first hidden layer and three neurons in the second. You'll be able to see that while the network finds the right solution, the curves separating the regions will differ depending on the chosen architecture (a short usage sketch follows the figures below). Now, nn = NeuralNetwork([2, 4, 3, 1]) will produce the following separation:

And here is the separation for nn = NeuralNetwork([2,4,1]):

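Trying one of these alternative architectures only requires changing the list passed to the constructor; here is a minimal sketch that reuses the X and y arrays from the main block above:

nn = NeuralNetwork([2, 4, 3, 1])   # two hidden layers, with four and three neurons
nn.fit(X, y, epochs=10)
nn.plot_decision_regions(X, y)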
The architecture of the neural network defines the way the network goes about solving the problem at hand, and different architectures provide different approaches (though they may all give the same result). We are now ready to start looking more closely at what deep neural nets are and their applications.