
The hyperbolic tangent activation function

The output, y, of the hyperbolic tangent activation function (tanh) as a function of its total input, x, is given as follows:

$$y = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

The tanh activation function outputs values in the range [-1, 1], as you can see in the following graph:

Figure 1.7: Tanh activation function
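As a quick illustration of this bounded, saturating behavior, here is a minimal sketch using only Python's standard library (not code from this book) that evaluates tanh at a few inputs:

```python
import math

# Evaluate tanh over a spread of inputs to show that the output
# stays within (-1, 1) and saturates for large |x|.
for x in [-10.0, -2.0, -1.0, 0.0, 1.0, 2.0, 10.0]:
    y = math.tanh(x)  # equivalent to (e^x - e^-x) / (e^x + e^-x)
    print(f"tanh({x:6.1f}) = {y: .8f}")
```

Note how tanh(±10) is already within about 10⁻⁸ of ±1; those inputs lie deep in the saturation zone discussed next.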

One thing to note is that both the sigmoid and the tanh activation functions are roughly linear within a small range of the input, beyond which the output saturates. In the saturation zone, the gradients of the activation functions (with respect to the input) are very small or close to zero, which makes them prone to the vanishing gradient problem. As you will see later on, neural networks learn via the backpropagation method, where the gradient of a layer depends on the gradients of the activation units in the succeeding layers, up to the final output layer. Therefore, if the activation units are operating in the saturation region, much less of the error is backpropagated to the early layers of the neural network. Neural networks learn the weights and biases (W) by minimizing the prediction error with the help of these gradients, so if the gradients are small or vanish to zero, the network will fail to learn the weights properly.
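To make the vanishing gradient point concrete, the following sketch (plain Python, written for this explanation rather than taken from the book) evaluates the analytic derivative of tanh, which is 1 - tanh²(x), at a few inputs. In the linear zone near x = 0 the gradient is close to 1, while in the saturation zone it collapses toward 0:

```python
import math

def tanh_grad(x: float) -> float:
    """Derivative of tanh: d/dx tanh(x) = 1 - tanh(x)**2."""
    t = math.tanh(x)
    return 1.0 - t * t

# Gradients shrink rapidly as the input moves into the saturation zone.
for x in [0.0, 1.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}  gradient = {tanh_grad(x):.2e}")
```

During backpropagation, this factor multiplies the upstream gradient at every tanh layer it passes through, so a chain of saturated units shrinks the error signal roughly exponentially with depth.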
