
Putting it all together with an example

As we already mentioned, multi-layer neural networks can classify classes that are not linearly separable. In fact, the Universal Approximation Theorem states that any continuous function on a compact subset of ℝⁿ can be approximated by a neural network with at least one hidden layer. The formal proof of the theorem is too complex to explain here, but we'll attempt to give an intuitive explanation using some basic mathematics. We'll implement a neural network that approximates the boxcar function, shown on the right in the following diagram, which is a simple type of step function. Since a series of step functions can approximate any continuous function on a compact subset of ℝ, this will give us an idea of why the Universal Approximation Theorem holds:

The diagram on the left depicts continuous function approximation with a series of step functions, while the diagram on the right illustrates a single boxcar step function
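To make the left-hand diagram concrete, here is a minimal sketch (not part of the book's example; the target function sin(x), the interval, and the number of steps are illustrative choices) that approximates a continuous function on a compact interval with a series of constant steps:

import matplotlib.pyplot as plt
import numpy

# approximate a continuous function (here sin) on the compact interval [-pi, pi]
# with a series of step functions: one constant piece per sub-interval
inputs = numpy.arange(-numpy.pi, numpy.pi, 0.01)
edges = numpy.linspace(-numpy.pi, numpy.pi, 21)

approximation = numpy.zeros_like(inputs)
for start, end in zip(edges[:-1], edges[1:]):
    # each step equals the function value at the midpoint of its sub-interval
    step_height = numpy.sin((start + end) / 2.0)
    approximation += numpy.where((inputs >= start) & (inputs < end), step_height, 0.0)

plt.plot(inputs, numpy.sin(inputs), lw=1, color='gray')
plt.plot(inputs, approximation, lw=2, color='black')
plt.show()

Each flat piece of this staircase is a boxcar of a certain height, and using more, narrower sub-intervals makes the staircase hug the target function arbitrarily closely. The rest of the section shows how a small network with sigmoid activations produces exactly one such boxcar.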

To do this, we'll use the logistic sigmoid activation function. As we know, the logistic sigmoid is defined as σ(a) = 1 / (1 + e^(-a)), where a = w1x1 + w2x2 + ... + wnxn + b is the weighted sum of the neuron's inputs:

  1. Let's assume that we have only one input neuron, x = x1, so that the weighted input is simply a = wx + b
  2. In the following diagrams, we can see that by making w very large, the sigmoid becomes close to a step function, while b simply translates the function along the x axis by t = -b/w (a short numerical check of this follows the diagram):
On the left, we have a standard sigmoid with a weight of 1 and a bias of 0; in the middle, we have a sigmoid with a weight of 10; and on the right, we have a sigmoid with a weight of 10 and a bias of 50
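As a quick numerical check (a minimal sketch, not part of the book's code; the sigmoid helper and the weight and bias values, taken from the right-hand panel, are illustrative), we can evaluate the sigmoid just to the left of, at, and just to the right of t = -b/w to see how sharp the transition already is for w = 10:

import numpy

def sigmoid(x, w, b):
    # logistic sigmoid of the weighted input a = w * x + b
    return 1.0 / (1.0 + numpy.exp(-(w * x + b)))

w, b = 10.0, 50.0
t = -b / w  # the translation point, here -5

for x in (t - 1.0, t, t + 1.0):
    print("x = {0:5.1f} -> sigmoid = {1:.5f}".format(x, sigmoid(x, w, b)))

The output jumps from roughly 0 to roughly 1 within two units of x. With the much larger weight used in the code that follows, the transition becomes even sharper and is visually indistinguishable from a step.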

With this in mind, let's define the architecture of our network. It will have a single input neuron, one hidden layer with two neurons, and a single output neuron:

Both hidden neurons use the logistic sigmoid activation. The weights and biases of the network are chosen to take advantage of the sigmoid properties we described previously: the top neuron initiates the first transition t1 (from 0 to 1), and, at a larger value of x, the second neuron initiates the opposite transition t2 (from 1 back to 0). The following code implements this example:

# The user can modify the values of the weight w
# as well as bias_value_1 and bias_value_2 to observe
# how this plots to different step functions

import matplotlib.pyplot as plt
import numpy

weight_value = 1000

# modify to change where the step function starts
bias_value_1 = 5000

# modify to change where the step function ends
bias_value_2 = -5000

# set the axis limits of the plot
plt.axis([-10, 10, -1, 10])

print("The step function starts at {0} and ends at {1}"
      .format(-bias_value_1 / weight_value,
              -bias_value_2 / weight_value))

inputs = numpy.arange(-10, 10, 0.01)
outputs = list()

# iterate over a range of inputs
for x in inputs:
    y1 = 1.0 / (1.0 + numpy.exp(-weight_value * x - bias_value_1))
    y2 = 1.0 / (1.0 + numpy.exp(-weight_value * x - bias_value_2))

    # modify to change the height of the step function
    w = 7

    # network output
    y = y1 * w - y2 * w

    outputs.append(y)

plt.plot(inputs, outputs, lw=2, color='black')
plt.show()

We set large values for weight_value, bias_value_1, and bias_value_2. In this way, the expressions numpy.exp(-weight_value * x - bias_value_1) and numpy.exp(-weight_value * x - bias_value_2) switch between 0 and very large values over a very short interval of the input. In turn, y1 and y2 switch between 1 and 0, which gives the logistic sigmoid the step-like (as opposed to gradual) shape we explained previously. Because the numpy.exp expressions overflow to infinity for some inputs, the code will produce an overflow encountered in exp warning, but this is normal.
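If the warning is distracting, one option (a small sketch, not part of the book's example; the input value is illustrative) is to evaluate the sigmoid inside a numpy.errstate context, which silences the overflow warning locally without changing the result:

import numpy

weight_value = 1000
bias_value_1 = 5000
x = -10.0  # a point well to the left of the step

# numpy.exp(5000) overflows to inf and would normally emit a RuntimeWarning;
# errstate suppresses it for this block only, and 1 / (1 + inf) still evaluates to 0.0
with numpy.errstate(over='ignore'):
    y1 = 1.0 / (1.0 + numpy.exp(-weight_value * x - bias_value_1))

print(y1)  # prints 0.0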

The full example, when executed, produces the following result:

 

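Finally, to tie this back to the approximation argument at the start of the section, the following sketch (again, not part of the book's code; the boxcar helper, the target function, the number of boxcars, and the steepness are illustrative choices) sums several such sigmoid pairs, one boxcar per sub-interval, to approximate sin(x), just as a wider hidden layer of such pairs would:

import matplotlib.pyplot as plt
import numpy

def boxcar(x, start, end, height, steepness=1000.0):
    # one hidden "pair": the difference of two steep sigmoids is a boxcar
    # of the given height between start and end
    y1 = 1.0 / (1.0 + numpy.exp(-steepness * (x - start)))
    y2 = 1.0 / (1.0 + numpy.exp(-steepness * (x - end)))
    return height * (y1 - y2)

inputs = numpy.arange(-numpy.pi, numpy.pi, 0.01)
edges = numpy.linspace(-numpy.pi, numpy.pi, 21)

# one boxcar per sub-interval, with height equal to sin at the midpoint
approximation = numpy.zeros_like(inputs)
with numpy.errstate(over='ignore'):
    for start, end in zip(edges[:-1], edges[1:]):
        approximation += boxcar(inputs, start, end, numpy.sin((start + end) / 2.0))

plt.plot(inputs, numpy.sin(inputs), lw=1, color='gray')
plt.plot(inputs, approximation, lw=2, color='black')
plt.show()

Adding more, narrower boxcars reproduces the staircase approximation from the beginning of the section, which is exactly the intuition that the Universal Approximation Theorem formalizes.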