Python Deep Learning
Ivan Vasilev, Daniel Slater, Gianmario Spacagna, Peter Roelants, Valentino Zocca
Different types of activation function
We now know that multi-layer networks can classify linearly inseparable classes. But to do this, they need to satisfy one more condition. If the neurons don't have activation functions, their output would be the weighted sum of the inputs, $\sum_{i} w_i x_i$, which is a linear function. Then the entire neural network, that is, a composition of neurons, becomes a composition of linear functions, which is also a linear function. This means that even if we add hidden layers, the network will still be equivalent to a simple linear regression model, with all its limitations. To turn the network into a non-linear function, we'll use non-linear activation functions for the neurons. Usually, all neurons in the same layer have the same activation function, but different layers may have different activation functions. The most common activation functions are as follows:
- $f(a) = a$: This function lets the activation value go through unchanged and is called the identity function.
- $f(a) = \begin{cases} 1 & \text{if } a \geq \theta \\ 0 & \text{otherwise} \end{cases}$: This function activates the neuron if the activation is above a certain value; it's called the threshold activity function.
- $f(a) = \dfrac{1}{1 + e^{-a}}$: This function is one of the most commonly used, as its output is bounded between 0 and 1, and it can be interpreted stochastically as the probability of the neuron activating. It's commonly called the logistic function, or the logistic sigmoid.
- $f(a) = \dfrac{1 - e^{-a}}{1 + e^{-a}}$: This activation function is called the bipolar sigmoid, and it's simply a logistic sigmoid rescaled and translated to have a range in (-1, 1).
- $f(a) = \dfrac{e^{a} - e^{-a}}{e^{a} + e^{-a}}$: This activation function is called the hyperbolic tangent (or tanh).
- $f(a) = \max(0, a)$: This activation function is probably the closest to its biological counterpart. It's a mix of the identity and the threshold function, and it's called the rectifier, or ReLU, as in Rectified Linear Unit. There are variations on the ReLU, such as Noisy ReLU, Leaky ReLU, and ELU (Exponential Linear Unit).
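To make the list above concrete, here is a minimal NumPy sketch of these activation functions (the function names, the threshold parameter, and the example inputs are our own choices, not from the book):

```python
import numpy as np

def identity(a):
    """Identity: returns the activation unchanged."""
    return a

def threshold(a, theta=0.0):
    """Threshold (step) function: 1 if the activation exceeds theta, else 0."""
    return (a > theta).astype(float)

def logistic(a):
    """Logistic sigmoid, bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def bipolar_sigmoid(a):
    """Logistic sigmoid rescaled and translated to the range (-1, 1)."""
    return (1.0 - np.exp(-a)) / (1.0 + np.exp(-a))

def tanh(a):
    """Hyperbolic tangent, bounded in (-1, 1)."""
    return np.tanh(a)

def relu(a):
    """Rectified Linear Unit: max(0, a)."""
    return np.maximum(0.0, a)

# Evaluate each activation on a few sample inputs
a = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (identity, threshold, logistic, bipolar_sigmoid, tanh, relu):
    print(f"{f.__name__:16s}", f(a))
```

Printing the outputs side by side also makes the different ranges of the bounded functions visible, which is the topic of the comparison that follows.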
The identity activation function, or the threshold function, was widely used at the inception of neural networks with implementations such as the perceptron or the Adaline (adaptive linear neuron), but subsequently lost traction in favor of the logistic sigmoid, the hyperbolic tangent, or the ReLU and its variations. The latter three activation functions differ in the following ways:
- Their range is different.
- Their derivatives behave differently during training.
The range for the logistic function is (0, 1), which is one reason why this is the preferred function for stochastic networks, in other words, networks with neurons that may activate based on a probability function. The hyperbolic tangent is very similar to the logistic function, but its range is (-1, 1). In contrast, the ReLU has a range of [0, ∞).
But let's look at the derivative (or the gradient) of each of the three functions, which is important for training the network. This is similar to how, in the linear regression example that we introduced in Chapter 1, Machine Learning – an Introduction, we were trying to minimize the cost function by following it along the direction opposite to its derivative.
For the logistic sigmoid $f$, the derivative is $f'(a) = f(a)\,(1 - f(a))$, while if $f$ is the hyperbolic tangent, its derivative is $f'(a) = (1 + f(a))\,(1 - f(a))$.
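As a quick numerical sanity check of these identities (our own example, not from the book), we can compare the closed-form derivatives against central finite differences:

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

a = np.linspace(-3, 3, 7)
eps = 1e-6

# Logistic sigmoid: f'(a) = f(a) * (1 - f(a))
f = logistic(a)
numeric = (logistic(a + eps) - logistic(a - eps)) / (2 * eps)
print(np.allclose(f * (1 - f), numeric))        # True

# Hyperbolic tangent: f'(a) = (1 + f(a)) * (1 - f(a))
f = np.tanh(a)
numeric = (np.tanh(a + eps) - np.tanh(a - eps)) / (2 * eps)
print(np.allclose((1 + f) * (1 - f), numeric))  # True
```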


If $f$ is the ReLU, the derivative is much simpler, that is, $f'(a) = \begin{cases} 0 & \text{if } a < 0 \\ 1 & \text{if } a > 0 \end{cases}$. Later in the book, we'll see that deep networks exhibit the vanishing gradients problem, and the advantage of the ReLU is that its derivative is constant and does not tend to zero as $a$ becomes large.
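To illustrate why this matters, the following sketch (our own illustration, not from the book) contrasts the logistic sigmoid's gradient, which shrinks towards zero as the activation grows, with the constant ReLU gradient:

```python
import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

def logistic_grad(a):
    # f'(a) = f(a) * (1 - f(a)); approaches 0 for large |a|
    f = logistic(a)
    return f * (1.0 - f)

def relu_grad(a):
    # Derivative of max(0, a): 0 for a < 0, 1 for a > 0
    return 1.0 if a > 0 else 0.0

for a in (1.0, 5.0, 10.0):
    print(f"a={a:5.1f}  sigmoid gradient={logistic_grad(a):.8f}  ReLU gradient={relu_grad(a):.1f}")
```

For a = 10, the sigmoid gradient is already on the order of 10⁻⁵, while the ReLU gradient stays at 1, which is exactly the behavior the vanishing gradients discussion later in the book builds on.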