
Choosing the right activation function

In most cases, we should consider ReLU first. But keep in mind that ReLU should only be applied to hidden layers. If your model suffers from dead neurons, think about adjusting your learning rate, or try Leaky ReLU or maxout, sketched briefly below.
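As a quick illustration of these alternatives, here is a minimal NumPy sketch of ReLU and Leaky ReLU; the negative-slope value of 0.01 is a common default chosen for illustration, not a prescribed setting:

```python
import numpy as np

def relu(z):
    # Standard ReLU: passes positive values through, zeroes out negatives
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # Leaky ReLU: a small non-zero slope for negative inputs, so neurons
    # with negative pre-activations still receive a gradient
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))        # [0.  0.  0.  1.5]
print(leaky_relu(z))  # [-0.02  -0.005  0.  1.5]
```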

It is not recommended to use either sigmoid or tanh, as they suffer from the vanishing gradient problem and also converge very slowly. Take sigmoid, for example. Its derivative is at most 0.25 everywhere, so the terms multiplied together during backpropagation become smaller and smaller. The derivative of ReLU, by contrast, is one at every point above zero, which keeps the gradients, and hence the network, more stable.
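The contrast between the two derivatives is easy to verify numerically. The short sketch below (plain NumPy, not code from this book) multiplies the per-layer derivative across a stack of 10 layers: with sigmoid the product shrinks toward zero even at its most favorable input, while ReLU's unit derivative leaves it unchanged.

```python
import numpy as np

def sigmoid_grad(z):
    # Derivative of sigmoid: s(z) * (1 - s(z)), which is at most 0.25 (at z = 0)
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return np.where(z > 0, 1.0, 0.0)

layers = 10
print(sigmoid_grad(0.0) ** layers)  # 0.25**10 ≈ 9.5e-07 -> gradient vanishes
print(relu_grad(1.0) ** layers)     # 1.0**10  = 1.0     -> gradient preserved
```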

Now that you have gained a basic knowledge of the key components of neural networks, let's move on to understanding how networks learn from data.
