
L1 regularization and Lasso

L1 regularization usually entails some loss of the model's predictive power. 

One of the properties of L1 regularization is that it forces the smallest weights to exactly 0, thereby reducing the number of features the model takes into account. This behavior is desirable when the number of features (n) is large compared to the number of samples (N), which makes L1 better suited for datasets with many features. 
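This sparsity effect can be seen on a small synthetic dataset. The sketch below uses scikit-learn's `Lasso` (an L1-regularized linear model, discussed next) purely as an illustration; the dataset, the `alpha=0.1` setting, and the coefficient values are all arbitrary choices for the example:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.RandomState(0)
# 50 samples, 20 features, but only the first 3 features actually matter
X = rng.randn(50, 20)
y = X[:, 0] * 3.0 + X[:, 1] * -2.0 + X[:, 2] * 1.5 + rng.randn(50) * 0.1

ols = LinearRegression().fit(X, y)       # no regularization
lasso = Lasso(alpha=0.1).fit(X, y)       # L1 regularization

# OLS keeps all 20 weights non-zero; L1 drives most of them to exactly 0
n_ols = int(np.sum(ols.coef_ != 0))
n_lasso = int(np.sum(lasso.coef_ != 0))
print(n_ols, n_lasso)
```

The L1-regularized model retains far fewer non-zero weights than the unregularized fit, effectively selecting the informative features.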

Linear regression with L1 regularization is known as the Least Absolute Shrinkage and Selection Operator (Lasso); it is commonly trained with the Stochastic Gradient Descent (SGD) algorithm.

In both cases, the hyper-parameters of the model are as follows:

  • The learning rate of the SGD algorithm
  • A parameter to tune the amount of regularization added to the model
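Both hyper-parameters appear explicitly when fitting an L1-regularized model with SGD. The sketch below uses scikit-learn's `SGDRegressor` as an illustration; the parameter values (`eta0=0.01`, `alpha=1e-4`) are arbitrary choices for the example, not recommendations:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.randn(200) * 0.1

# eta0 is the SGD learning rate; alpha controls the amount of regularization
model = SGDRegressor(penalty="l1", alpha=1e-4,
                     learning_rate="constant", eta0=0.01,
                     max_iter=1000, random_state=0)
model.fit(X, y)
print(model.coef_.round(2))
```

Increasing `alpha` strengthens the L1 penalty and pushes more weights to 0; the learning rate governs how fast SGD converges.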

A third type of regularization, called ElasticNet, consists of adding both an L2 and an L1 regularization term to the model. This brings the best of both regularization schemes at the expense of an extra hyper-parameter.
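As an illustration of how the extra hyper-parameter blends the two penalties, scikit-learn's `ElasticNet` exposes it as `l1_ratio` (0 gives pure L2, 1 gives pure L1); the dataset and the values `alpha=0.1`, `l1_ratio=0.5` below are arbitrary choices for the sketch:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(1)
X = rng.randn(100, 10)
y = X[:, 0] * 2.0 + X[:, 1] * -1.0 + rng.randn(100) * 0.1

# l1_ratio mixes the penalties: the L1 part still zeroes out weak weights,
# while the L2 part keeps the surviving weights small
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(int(np.sum(enet.coef_ != 0)))
```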

Although experts have differing opinions (https://www.quora.com/What-is-the-difference-between-L1-and-L2-regularization) on which type of regularization is more effective, the consensus in other contexts seems to favor L2 over L1 regularization.

L2 and L1 regularization are both available in Amazon ML, while ElasticNet is not. The amount of regularization is limited to three values: mild (10⁻⁶), medium (10⁻⁴), and aggressive (10⁻²).
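The effect of these three preset amounts can be approximated outside Amazon ML. The sketch below trains scikit-learn's `SGDClassifier` with an L1 penalty at the three strengths; the dataset and model choice are illustrative stand-ins, not a reproduction of Amazon ML's internals:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=4, random_state=0)

# Mimic Amazon ML's three preset regularization amounts
counts = {}
for name, alpha in [("mild", 1e-6), ("medium", 1e-4), ("aggressive", 1e-2)]:
    clf = SGDClassifier(penalty="l1", alpha=alpha,
                        max_iter=1000, random_state=0).fit(X, y)
    # count how many weights survive at this regularization strength
    counts[name] = int(np.sum(clf.coef_ != 0))
print(counts)
```

Stronger regularization zeroes out more weights, so the aggressive setting typically retains the fewest features.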
