- Deep Learning Essentials
- Wei Di, Anurag Bhardwaj, Jianing Wei
Vanishing and exploding gradients
These are very important issues in many deep neural networks: the deeper the architecture, the more likely it is to suffer from them. What happens is that during the backpropagation stage, weights are adjusted in proportion to the gradient value, so we may run into two different scenarios (a small numerical sketch follows the list):
- If the gradients are too small, we have the vanishing gradients problem. It makes learning very slow or even stops the weight updates entirely. For example, with sigmoid as the activation function, whose derivative is at most 0.25, after a few layers of backpropagation the lower layers hardly receive any useful error signal, so the network is not updated properly.
- If the gradients get too large, they can cause learning to diverge; this is called the exploding gradients problem. It often happens when the activation function is not bounded or the learning rate is too big.
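To make the multiplicative effect concrete, here is a minimal sketch (a hypothetical one-unit-per-layer example, not code from the book) that backpropagates a unit error signal through 20 sigmoid layers. Because each sigmoid derivative is at most 0.25, the chained product shrinks quickly; with large weights or an unbounded activation, the same product can instead grow without bound.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # maximum value is 0.25, reached at x = 0

np.random.seed(0)
num_layers = 20
grad = 1.0  # assume the error signal at the output layer has magnitude 1
for layer in range(num_layers):
    pre_activation = np.random.randn()   # hypothetical pre-activation value
    weight = np.random.randn() * 0.5     # hypothetical small weight
    # Chain rule through one unit: multiply by weight and activation derivative
    grad *= weight * sigmoid_grad(pre_activation)
    print(f"layer {layer + 1:2d}: gradient magnitude = {abs(grad):.3e}")

# With much larger weights (or an unbounded activation on a poorly scaled
# network), the same repeated multiplication can blow up instead: the
# exploding gradients case.
```

Running this, the printed gradient magnitude collapses toward zero within a handful of layers, which is exactly why lower layers in a deep sigmoid network learn so slowly.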