
The rectified linear unit function

The rectified linear unit, better known as ReLU, is the most widely used activation function:

f(x) = max(0, x)

The ReLU function has the advantage of being non-linear, so errors can be backpropagated through it easily and multiple hidden layers activated by the ReLU function can be stacked. The function is defined piecewise: for x <= 0, f(x) = 0, and for x > 0, f(x) = x.

The main advantage of the ReLU function over other activation functions is that it does not activate all the neurons at the same time. As the preceding graph of the ReLU function shows, when the input is negative the output is zero and the neuron does not activate. This results in a sparse network and fast, easy computation.
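As a quick illustration, here is a minimal NumPy sketch of the ReLU function (the relu helper and the sample inputs are our own, not from the book); negative inputs map to zero, which is what produces the sparse activations described above:

import numpy as np

def relu(x):
    # ReLU: f(x) = 0 for x <= 0, f(x) = x for x > 0
    return np.maximum(0, x)

x = np.array([-3.0, -1.0, 0.0, 2.0, 5.0])
print(relu(x))  # [0. 0. 0. 2. 5.]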

Derivative graph of ReLU: f'(x) = 0 for x <= 0 and f'(x) = 1 for x > 0
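A minimal sketch of this derivative (the relu_grad helper is our own naming, not from the book) makes the flat negative region explicit:

import numpy as np

def relu_grad(x):
    # Derivative of ReLU: 0 for x <= 0, 1 for x > 0
    return (x > 0).astype(float)

x = np.array([-3.0, -1.0, 0.0, 2.0, 5.0])
print(relu_grad(x))  # [0. 0. 0. 1. 1.]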

Looking at the preceding gradient graph of ReLU, we can see that the negative side of the graph is a constant zero. Activations falling in that region therefore have zero gradients, and the corresponding weights do not get updated. This leads to inactive nodes/neurons, as they stop learning. To overcome this problem, we have Leaky ReLUs, which modify the function as:

f(x) = x for x > 0 and f(x) = αx for x <= 0, where α is a small constant (commonly 0.01)

This prevents the gradient from becoming zero on the negative side, so weight training continues, but slowly, owing to the low value of α.
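A minimal sketch of Leaky ReLU and its gradient, assuming the common default of α = 0.01 (the leaky_relu and leaky_relu_grad helpers are our own naming, not from the book), shows that the negative side now keeps a small but non-zero gradient:

import numpy as np

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: f(x) = x for x > 0, f(x) = alpha * x for x <= 0
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # Gradient is 1 on the positive side and alpha (not zero) on the negative side
    return np.where(x > 0, 1.0, alpha)

x = np.array([-3.0, -1.0, 0.0, 2.0, 5.0])
print(leaky_relu(x))       # -0.03 -0.01  0.  2.  5.
print(leaky_relu_grad(x))  #  0.01  0.01  0.01  1.  1.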
