Vanishing and exploding gradients

These are very important issues in many deep neural networks, and the deeper the architecture, the more likely it is to suffer from them. During the backpropagation stage, weights are adjusted in proportion to the gradient value, which leads to two problematic scenarios:

  • If the gradients are too small, this is called the vanishing gradients problem. It makes learning very slow or even stops the weight updates entirely. For example, when sigmoid is used as the activation function, its derivative is at most 0.25, so after a few layers of backpropagation the lower layers hardly receive any useful signal from the errors, and the network is not updated properly.
  • If the gradients get too large, they can cause the learning to diverge; this is called the exploding gradients problem. It often happens when the activation function is not bounded or the learning rate is too large. A numerical sketch of both effects follows this list.
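
To make the effect concrete, the following is a minimal NumPy sketch (not from the book) of how the backpropagated gradient behaves as a product of per-layer factors. The number of layers, the weight values (0.5 and 1.5), and the pre-activation value of 0 are all hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0

num_layers = 20
pre_activation = 0.0  # hypothetical pre-activation at every layer

# Vanishing case: each layer multiplies the gradient by a sigmoid
# derivative (<= 0.25) and an assumed weight of 0.5 on the way back.
grad = 1.0
for _ in range(num_layers):
    grad *= sigmoid_derivative(pre_activation) * 0.5
print(f"gradient after {num_layers} sigmoid layers: {grad:.2e}")    # ~1e-18

# Exploding case: with an unbounded, identity-like activation
# (derivative 1.0) and an assumed weight of 1.5, the same product
# grows instead of shrinking.
grad = 1.0
for _ in range(num_layers):
    grad *= 1.0 * 1.5
print(f"gradient after {num_layers} unbounded layers: {grad:.2e}")  # ~3e+03
```

Under these assumptions, the gradient reaching the lowest layers is on the order of 10^-18 in the first case and over 3,000 in the second, which is why deep networks with such per-layer factors either stop learning or diverge.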