Deep Learning Essentials
Wei Di, Anurag Bhardwaj, Jianing Wei
Vanishing and exploding gradients
These are two of the most important issues in deep neural networks, and the deeper the architecture, the more likely it is to suffer from them. During the backpropagation stage, each weight is adjusted in proportion to its gradient, so we may run into two different scenarios, both of which are illustrated numerically in the sketch after this list:
- If the gradients become too small, we have the vanishing gradients problem. It makes learning very slow or even stops the updates entirely. For example, with sigmoid as the activation function, whose derivative is never larger than 0.25, the repeated multiplications during backpropagation mean that after a few layers the lower layers hardly receive any useful error signal, so their weights are barely updated.
- If the gradients become too large, learning can diverge; this is called the exploding gradients problem. It often happens when the activation function is not bounded or the learning rate is too large.
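Both regimes come from the same mechanism: at every layer, backpropagation multiplies the incoming gradient by the transposed weight matrix and by the activation's derivative, so the product either decays or grows roughly geometrically with depth. The following is a minimal NumPy sketch, assuming a toy stack of linear-plus-sigmoid layers with randomly drawn weights; the layer count, width, and weight scales are hypothetical values chosen only to make the two regimes visible.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # bounded above by 0.25

rng = np.random.default_rng(0)

def backprop_gradient_norm(weight_scale, n_layers=30, width=50):
    # Push a gradient backwards through n_layers of (linear -> sigmoid),
    # applying the chain rule at each layer, and return its final norm.
    # Weight scale, depth, and width are illustrative, not from the book.
    grad = np.ones(width)            # gradient arriving from the loss
    x = rng.normal(size=width)       # illustrative pre-activations
    for _ in range(n_layers):
        W = rng.normal(scale=weight_scale, size=(width, width))
        grad = (W.T @ grad) * sigmoid_grad(x)  # dL/dx = (W^T dL/dy) * sigma'(x)
    return np.linalg.norm(grad)

print("small weights:", backprop_gradient_norm(0.1))  # shrinks toward zero (vanishing)
print("large weights:", backprop_gradient_norm(5.0))  # grows by many orders of magnitude (exploding)
```

With small weights the gradient norm collapses toward zero after a few dozen layers, while with large weights it grows by many orders of magnitude; this is why careful weight initialization, ReLU-family activations, and gradient clipping are commonly used remedies.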