
Vanishing and exploding gradients

These are two of the most important issues in deep neural networks, and the deeper the architecture, the more likely it is to suffer from them. During the backpropagation stage, each weight is adjusted in proportion to the gradient of the error with respect to that weight, which leads to two problematic scenarios:

  • If the gradients become too small, we have the vanishing gradients problem: learning becomes very slow, or the weights stop updating entirely. For example, with the sigmoid activation function, whose derivative is never larger than 0.25, the error signal is shrunk at every layer during backpropagation, so after a few layers the lower layers hardly receive any useful signal and are not updated properly.
  • If the gradients grow too large, learning can diverge; this is called the exploding gradients problem. It often happens when the activation function is unbounded or the learning rate is too large. Both behaviours are illustrated in the sketch after this list.
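To make this concrete, here is a minimal NumPy sketch (not from the book's code) that backpropagates a unit error signal through a toy fully connected network and reports the norm of the gradient reaching the first layer. The depth, width, and weight scales are arbitrary choices for illustration: with sigmoid activations the signal shrinks at every layer, while with an unbounded ReLU and overly large weights it blows up.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_norms(act, act_grad, weight_scale, n_layers=30, width=64):
    """Norm of the backpropagated error signal at each layer of a toy MLP."""
    x = rng.normal(size=width)
    weights = [rng.normal(scale=weight_scale, size=(width, width))
               for _ in range(n_layers)]

    # Forward pass: keep the pre-activations for the backward pass.
    pre_acts, h = [], x
    for W in weights:
        z = W @ h
        pre_acts.append(z)
        h = act(z)

    # Backward pass, starting from a unit error signal at the output.
    grad, norms = np.ones(width), []
    for W, z in zip(reversed(weights), reversed(pre_acts)):
        grad = W.T @ (grad * act_grad(z))
        norms.append(np.linalg.norm(grad))
    return norms[::-1]          # index 0 = layer nearest the input

# Vanishing: sigmoid'(z) <= 0.25, so the signal shrinks at every layer.
vanishing = backprop_norms(sigmoid,
                           lambda z: sigmoid(z) * (1.0 - sigmoid(z)),
                           weight_scale=1.0 / np.sqrt(64))

# Exploding: an unbounded ReLU combined with overly large weights
# amplifies the signal at every layer.
relu = lambda z: np.maximum(z, 0.0)
exploding = backprop_norms(relu,
                           lambda z: (z > 0).astype(float),
                           weight_scale=4.0 / np.sqrt(64))

print("gradient norm at the first layer (sigmoid):        ", vanishing[0])
print("gradient norm at the first layer (ReLU, large W):   ", exploding[0])
```

Running the sketch, the sigmoid network delivers a gradient norm many orders of magnitude below 1 at the first layer, while the ReLU network with large weights delivers one many orders of magnitude above 1, which is exactly the vanishing and exploding behaviour described above.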