
Which activation functions to use?

Given that neural networks need to model nonlinearity and greater complexity, the activation function chosen has to be robust enough to satisfy the following:

  • It should be differentiable; we will see why differentiation is needed in backpropagation. It should not cause gradients to vanish (the sketch after this list compares the common choices and their derivatives).
  • It should be simple and fast to compute.
  • It should be zero centered.

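To make these properties concrete, the following is a minimal NumPy sketch (an illustration, not code from this book) of the sigmoid, tanh, and ReLU functions together with their derivatives, which are what backpropagation multiplies into the gradient at each layer:

    import numpy as np

    def sigmoid(x):
        # Logistic sigmoid: squashes inputs into (0, 1); not zero centered.
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_prime(x):
        # Derivative of the sigmoid; its maximum value is 0.25 at x = 0.
        s = sigmoid(x)
        return s * (1.0 - s)

    def tanh_prime(x):
        # Derivative of tanh; tanh is zero centered and its derivative peaks at 1.
        return 1.0 - np.tanh(x) ** 2

    def relu(x):
        # ReLU: identity for positive inputs, zero otherwise; very cheap to compute.
        return np.maximum(0.0, x)

    def relu_prime(x):
        # Derivative of ReLU: 1 for positive inputs, 0 for negative ones
        # (the kink at 0 is handled with a subgradient in practice).
        return (x > 0).astype(float)

    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(sigmoid_prime(x))  # never exceeds 0.25
    print(tanh_prime(x))     # peaks at 1.0 around x = 0
    print(relu_prime(x))     # 0 or 1, no saturation for positive inputs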
The sigmoid is the most widely used activation function, but it suffers from the following setbacks:

  • Since it uses the logistic model, the computations are time consuming and complex
  • It causes gradients to vanish, so that at some point no signal passes through the neurons (demonstrated numerically after this list)
  • It is slow to converge
  • It is not zero centered

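The vanishing-gradient setback can be seen numerically: the derivative of the sigmoid never exceeds 0.25, so when backpropagation multiplies one such derivative per layer, the gradient reaching the early layers shrinks geometrically. The short sketch below uses illustrative values only:

    import numpy as np

    def sigmoid_prime(x):
        s = 1.0 / (1.0 + np.exp(-x))
        return s * (1.0 - s)

    # Pre-activation values seen at 10 stacked layers (illustrative only).
    # Even in the best case, x = 0, the sigmoid derivative is only 0.25.
    pre_activations = np.zeros(10)

    grad = 1.0
    for x in pre_activations:
        grad *= sigmoid_prime(x)   # backprop multiplies one derivative per layer
    print(grad)                    # 0.25 ** 10 ~= 9.5e-07: the gradient has vanished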
These drawbacks are solved by ReLU. ReLU is simple and faster to compute. It does not suffer from the vanishing gradient problem and has shown vast improvements compared to the sigmoid and tanh functions. ReLU is the most preferred activation function for neural networks and DL problems.

ReLU is used for the hidden layers, while the output layer can use a softmax function for classification problems and a linear function for regression problems.
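As a minimal sketch of this arrangement, assuming NumPy and arbitrarily chosen layer sizes, the forward pass below uses ReLU in the hidden layer and a softmax output for a classification problem; a regression network would simply keep the output layer linear:

    import numpy as np

    rng = np.random.default_rng(0)

    def relu(x):
        return np.maximum(0.0, x)

    def softmax(z):
        # Row-wise softmax; subtracting the max keeps the exponentials stable.
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    # Hypothetical sizes: 4 input features, 8 hidden units, 3 output classes.
    W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)
    W2, b2 = rng.standard_normal((8, 3)), np.zeros(3)

    X = rng.standard_normal((5, 4))      # a batch of 5 samples
    hidden = relu(X @ W1 + b1)           # ReLU in the hidden layer
    probs = softmax(hidden @ W2 + b2)    # softmax output for classification
    print(probs.sum(axis=1))             # each row sums to 1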
