
Choosing the right activation function

In most cases, ReLU should be your first choice. Keep in mind, though, that ReLU should only be applied to hidden layers. If your model suffers from dead neurons, consider adjusting the learning rate, or try Leaky ReLU or maxout instead, as sketched in the snippet below.
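A minimal NumPy sketch of ReLU and Leaky ReLU may help make the difference concrete; the 0.01 negative slope used here is a common default, not a value prescribed by the text:

```python
import numpy as np

def relu(x):
    # Outputs x for positive inputs and 0 otherwise; negative inputs
    # receive zero gradient, which is what can lead to dead neurons.
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    # Keeps a small slope (alpha) for negative inputs, so the gradient
    # never becomes exactly zero and a stuck neuron can recover.
    return np.where(x > 0, x, alpha * x)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))        # [0.   0.   0.   1.5]
print(leaky_relu(z))  # [-0.02  -0.005  0.     1.5 ]
```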

Neither sigmoid nor tanh is recommended, as both suffer from the vanishing gradient problem and converge slowly. Take sigmoid as an example: its derivative is at most 0.25 everywhere, so the gradient terms become smaller and smaller as they are multiplied together during backpropagation. ReLU, by contrast, has a derivative of 1 at every point above zero, which keeps the gradient signal stable and the network easier to train.
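The following rough illustration, a sketch rather than a full backpropagation derivation, shows why repeatedly multiplying by the sigmoid derivative (capped at 0.25) shrinks the gradient across layers, while ReLU's derivative of 1 for positive inputs passes it through unchanged:

```python
import numpy as np

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)), which peaks at 0.25 when x = 0.
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25, the largest value the sigmoid derivative can take

n_layers = 10
print(0.25 ** n_layers)    # ~9.5e-07: even the best case shrinks rapidly across 10 layers
print(1.0 ** n_layers)     # 1.0: a ReLU-style derivative of 1 does not shrink the signal
```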

Now that you have gained a basic understanding of the key components of neural networks, let's move on to how networks learn from data.
