
L1 penalty in action

To see how the L1 penalty works, we can use a simulated linear regression problem. The code for the rest of this chapter is in Chapter3/overfitting.R. We simulate the data, using a correlated set of predictors:

library(MASS)     # for mvrnorm()
library(glmnet)   # for cv.glmnet(), used below

set.seed(1234)
## simulate five predictors; the first four are highly correlated with each other
X <- mvrnorm(n = 200, mu = c(0, 0, 0, 0, 0),
             Sigma = matrix(c(
               1, .9999, .99, .99, .10,
               .9999, 1, .99, .99, .10,
               .99, .99, 1, .99, .10,
               .99, .99, .99, 1, .10,
               .10, .10, .10, .10, 1
             ), ncol = 5))
## true model: intercept 3, coefficients 1, 1, 1, 1, 0, residual SD .5
y <- rnorm(200, 3 + X %*% matrix(c(1, 1, 1, 1, 0)), .5)
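As a quick sanity check (this is a small optional sketch, not part of Chapter3/overfitting.R), we could confirm that the simulated predictors really are correlated in the way Sigma specifies:

## Optional check: the sample correlation matrix should roughly match Sigma above,
## with the first four predictors very highly correlated and the fifth nearly independent.
round(cor(X), 2)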

Next, we fit an OLS regression model to the first 100 cases and then fit the lasso, using the glmnet() function from the glmnet package. This function can fit either the L1 penalty or the L2 penalty (discussed in the next section), and which one is used is controlled by the alpha argument: when alpha = 1 it is the L1 penalty (that is, lasso), and when alpha = 0 it is the L2 penalty (that is, ridge regression). Further, because we don't know in advance which value of lambda to pick, we can evaluate a range of values and tune this hyperparameter automatically using cross-validation with the cv.glmnet() function. We can then plot the resulting object to see the cross-validated mean squared error across a range of lambda values, which helps us choose an appropriate level of regularization:

## OLS fit on the first 100 cases (the training set)
m.ols <- lm(y[1:100] ~ X[1:100, ])

## lasso (alpha = 1) with lambda tuned by cross-validation
m.lasso.cv <- cv.glmnet(X[1:100, ], y[1:100], alpha = 1)
plot(m.lasso.cv)

Figure 3.10: Lasso regularization
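To see exactly which penalty values the cross-validation favored, one option (a small sketch assuming the m.lasso.cv object from above) is to inspect the stored lambda values directly:

## Optional sketch: the lambda values selected by cross-validation
m.lasso.cv$lambda.min   # lambda with the lowest cross-validated error
m.lasso.cv$lambda.1se   # largest lambda within one standard error of that minimum

By default, coef() and predict() on a cv.glmnet object use the lambda.1se value.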

One thing that we can see from the graph is that, when the penalty gets too high, the cross-validated mean squared error increases. Indeed, lasso seems to do best with very low lambda values, which perhaps indicates that lasso does not improve out-of-sample performance (generalizability) much for this dataset. For the sake of this example we will continue, but in real use this might give us pause and make us consider whether lasso is really helping. Finally, we can compare the OLS coefficients with those from lasso:

cbind(OLS = coef(m.ols), Lasso = coef(m.lasso.cv)[, 1])

                OLS Lasso
(Intercept)   2.958  2.99
X[1:100, ]1  -0.082  1.41
X[1:100, ]2   2.239  0.71
X[1:100, ]3   0.602  0.51
X[1:100, ]4   1.235  1.17
X[1:100, ]5  -0.041  0.00

Notice that the OLS coefficients are noisier and that, with lasso, predictor 5 is penalized to exactly 0. Recall from the simulated data that the true coefficients are 3, 1, 1, 1, 1, and 0. The OLS estimate for the first predictor is far too low and that for the second far too high, whereas the lasso estimates are closer to the true values. This suggests that lasso regression generalizes better than OLS regression for this dataset.
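Because cases 101 to 200 were never used for fitting, we could also check this claim directly. The following is a minimal sketch (not part of the book's script) that computes the out-of-sample mean squared error for both models, assuming the m.ols and m.lasso.cv objects from above:

## Held-out data: the 100 cases not used in fitting
y.test <- y[101:200]
X.test <- X[101:200, ]

## OLS predictions built manually from the coefficient vector, since the model
## was fit with a matrix term and predict() would need matching variable names
yhat.ols <- coef(m.ols)[1] + X.test %*% coef(m.ols)[-1]

## Lasso predictions at the cross-validation-selected lambda (lambda.1se by default)
yhat.lasso <- predict(m.lasso.cv, newx = X.test)

## Compare out-of-sample mean squared error; lower indicates better generalization
c(OLS = mean((y.test - yhat.ols)^2),
  Lasso = mean((y.test - yhat.lasso)^2))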
