- Deep Learning with PyTorch
- Vishnu Subramanian
ReLU
ReLU has become more popular in recent years; it or one of its variants appears in almost every modern architecture. It has a simple mathematical formulation:
f(x)=max(0,x)
In simple words, ReLU squashes any input that is negative to zero and leaves positive numbers as they are. We can visualize the ReLU function as follows:
[Figure: the ReLU activation function, zero for negative inputs and linear for positive inputs.]
Image source: http://datareview.info/article/eto-nuzhno-znat-klyuchevyie-rekomendatsii-po-glubokomu-obucheniyu-chast-2/
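The thresholding behavior is easy to verify directly in PyTorch. The following is a minimal sketch; the tensor values are made up purely for illustration:

```python
import torch

# f(x) = max(0, x): negatives become zero, positives pass through unchanged
x = torch.tensor([-2.0, -0.5, 0.0, 1.0, 3.0])

print(torch.relu(x))          # tensor([0., 0., 0., 1., 3.])
print(torch.clamp(x, min=0))  # equivalent thresholding with clamp
print(torch.nn.ReLU()(x))     # the module form used inside network definitions
```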
Some of the pros and cons of using ReLU are as follows:
- It helps the optimizer find the right set of weights sooner; more technically, it makes stochastic gradient descent converge faster.
- It is computationally inexpensive, as we are just thresholding at zero and not computing anything like we did for the sigmoid and tanh functions.
- ReLU has one disadvantage: when a large gradient passes through it during backpropagation, the affected neurons often become non-responsive; these are called dead neurons, and the problem can be controlled by carefully choosing the learning rate (see the sketch after this list). We will discuss how to choose learning rates when we discuss the different ways to adjust the learning rate in Chapter 4, Fundamentals of Machine Learning.
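As a rough illustration of why such neurons stop learning, the sketch below (with arbitrary input values) shows that ReLU's gradient is zero wherever its input is negative, so no update signal flows back to the weights feeding that unit:

```python
import torch

# Arbitrary inputs: one negative, one positive
x = torch.tensor([-1.5, 2.0], requires_grad=True)

out = torch.relu(x)
out.sum().backward()

print(out)     # 0 where the input was negative, unchanged where positive
print(x.grad)  # tensor([0., 1.]) -> zero gradient for the negative input
```

If a unit's pre-activation stays negative for all inputs, its gradient stays at zero and its incoming weights never get updated, which is exactly the dead-neuron situation described above.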