
Vanishing and exploding gradients

Vanishing and exploding gradients are important issues in many deep neural networks: the deeper the architecture, the more likely it is to suffer from them. During the backpropagation stage, weights are adjusted in proportion to the gradient value, which can lead to two different problematic scenarios:

  • If the gradients are too small, this is called the vanishing gradients problem. It makes the learning process very slow or can even stop the updates entirely. For example, with sigmoid as the activation function, whose derivative is at most 0.25, the lower layers hardly receive any useful signal from the error after a few layers of backpropagation, so the network is not updated properly (see the sketch after this list).
  • If the gradients grow too large, learning can diverge; this is called the exploding gradients problem. It often happens when the activation function is not bounded or the learning rate is too big.
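
The following is a minimal NumPy sketch (not code from this book) of why sigmoid activations cause vanishing gradients: by the chain rule, the backpropagated gradient is multiplied at each layer by the local sigmoid derivative, which never exceeds 0.25. The layer count, weight scale, and random pre-activation values are illustrative assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # maximum value is 0.25, reached at x = 0

rng = np.random.default_rng(0)
n_layers = 20          # hypothetical network depth
gradient = 1.0         # gradient of the loss at the output layer

# Walk backwards through the layers, applying the chain rule at each step
for layer in range(n_layers, 0, -1):
    pre_activation = rng.normal()    # hypothetical pre-activation value
    weight = rng.normal(scale=1.0)   # hypothetical weight on this connection
    gradient *= sigmoid_derivative(pre_activation) * weight
    print(f"layer {layer:2d}: |gradient| = {abs(gradient):.3e}")

Running this typically shows the gradient magnitude collapsing toward zero within a dozen layers; increasing the weight scale (for example, scale=10.0) instead produces the growing, divergent behaviour described in the second bullet.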