
LASSO

LASSO applies an L1-norm penalty, the sum of the absolute values of the feature weights, instead of the L2-norm used in ridge regression; it minimizes RSS + λ(sum |βj|). Unlike the L2 penalty, this shrinkage penalty can force a feature weight exactly to zero. This is a clear advantage over ridge regression, as it may improve model interpretability.
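The feature-selection effect can be seen in a few lines. The following is a minimal sketch using scikit-learn's `Lasso` (the data here is simulated for illustration; `alpha` plays the role of λ above):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
# Only the first two features actually drive the response
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.5)  # alpha is the L1 penalty strength (lambda)
lasso.fit(X, y)
print(lasso.coef_)  # coefficients of the irrelevant features are exactly 0.0
```

Note the contrast with ridge regression, which would shrink the eight irrelevant coefficients toward zero but not set them exactly to zero.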

The mathematics behind the reason that the L1-norm allows the weights/coefficients to become zero is beyond the scope of this book (refer to Tibshirani, 1996 for further details).

If LASSO is so great, then ridge regression must be obsolete in machine learning. Not so fast! Under high collinearity, that is, high pairwise correlations among predictors, LASSO may force a predictive feature's coefficient to zero, and you can lose predictive ability; if both feature A and feature B belong in your model, LASSO may shrink one of their coefficients to zero. The following quote sums up this issue nicely:

"One might expect the lasso to perform better in a setting where a relatively small number of predictors have substantial coefficients, and the remaining predictors have coefficients that are very small or that equal zero. Ridge regression will perform better when the response is a function of many predictors, all with coefficients of roughly equal size."
– James, 2013

There is a possibility of achieving the best of both worlds, and that leads us to the next topic, elastic net.
