Deep Learning with R for Beginners
Mark Hodnett, Joshua F. Wiley, Yuxi (Hayden) Liu, Pablo Maldonado
L1 penalty in action
To see how the L1 penalty works, we can use a simulated linear regression problem. The code for the rest of this chapter is in Chapter3/overfitting.R. We first load the MASS and glmnet packages (for mvrnorm() and cv.glmnet(), respectively) and simulate the data using a correlated set of predictors:
library(MASS)    # for mvrnorm()
library(glmnet)  # for cv.glmnet(), used below

set.seed(1234)
# draw 200 cases from a multivariate normal with highly correlated predictors
X <- mvrnorm(n = 200, mu = c(0, 0, 0, 0, 0),
             Sigma = matrix(c(
               1, .9999, .99, .99, .10,
               .9999, 1, .99, .99, .10,
               .99, .99, 1, .99, .10,
               .99, .99, .99, 1, .10,
               .10, .10, .10, .10, 1
             ), ncol = 5))
# outcome: intercept 3, true slopes 1, 1, 1, 1, 0, residual SD .5
y <- rnorm(200, 3 + X %*% matrix(c(1, 1, 1, 1, 0)), .5)
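As a quick sanity check (not part of the book's script), we can confirm that the first four simulated predictors are indeed highly correlated with one another, while the fifth is only weakly related to the rest:

# sample correlations: predictors 1-4 sit near 1 with each other, predictor 5 near .10
round(cor(X), 2)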
Next, we can fit an OLS regression model to the first 100 cases, and then use lasso. To use lasso, we use the glmnet() function from the glmnet package. This function can fit either the L1 penalty or the L2 penalty (discussed in the next section), and which one is applied is determined by the alpha argument: when alpha = 1, it is the L1 penalty (that is, lasso), and when alpha = 0, it is the L2 penalty (that is, ridge regression). Further, because we do not know in advance which value of lambda to pick, we can evaluate a range of options and tune this hyperparameter automatically using cross-validation with the cv.glmnet() function. We can then plot the resulting object to see the cross-validated mean squared error across a range of lambda values, which helps us select the appropriate level of regularization:
# OLS fit on the first 100 cases
m.ols <- lm(y[1:100] ~ X[1:100, ])
# lasso (alpha = 1) with lambda tuned by cross-validation
m.lasso.cv <- cv.glmnet(X[1:100, ], y[1:100], alpha = 1)
plot(m.lasso.cv)

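The plot shows the cross-validated MSE against log(lambda). It can also help to read the selected penalty values directly from the fitted object; a minimal sketch, using the m.lasso.cv object created above:

# lambda that minimizes the cross-validated MSE
m.lasso.cv$lambda.min
# largest lambda within one standard error of that minimum (a more conservative choice)
m.lasso.cv$lambda.1se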
One thing that we can see from the graph is that, when the penalty gets too high, the cross-validated error increases. Indeed, lasso seems to do best with very low lambda values, perhaps indicating that lasso does not do much to improve out-of-sample performance (generalizability) for this dataset. For the sake of this example we will continue, but in actual use this might give us pause to consider whether lasso is really helping. Finally, we can compare the OLS coefficients with those from lasso:
cbind(OLS = coef(m.ols), Lasso = coef(m.lasso.cv)[, 1])
               OLS Lasso
(Intercept)  2.958  2.99
X[1:100, ]1 -0.082  1.41
X[1:100, ]2  2.239  0.71
X[1:100, ]3  0.602  0.51
X[1:100, ]4  1.235  1.17
X[1:100, ]5 -0.041  0.00
Notice that the OLS coefficients are noisier and that, in lasso, predictor 5 is penalized to exactly 0. Recall from the simulated data that the true intercept is 3 and the true coefficients are 1, 1, 1, 1, and 0. The OLS estimate is much too low for the first predictor and much too high for the second, whereas the lasso estimates are closer to the true values for each. This suggests that lasso regression generalizes better than OLS regression for this dataset.
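To check generalization more directly, one option is to compare out-of-sample prediction error on the 100 cases that were not used to fit either model; a minimal sketch under that assumption (it is not part of the book's script), treating cases 101 to 200 as a test set:

# held-out cases
X.test <- X[101:200, ]
y.test <- y[101:200]

# OLS: apply the fitted coefficients by hand (the model was fit with a matrix
# on the right-hand side, so building the design matrix directly is simplest)
pred.ols <- cbind(1, X.test) %*% coef(m.ols)

# lasso: predict at the cross-validated lambda.min
pred.lasso <- predict(m.lasso.cv, newx = X.test, s = "lambda.min")

# out-of-sample mean squared error for each model
mean((y.test - pred.ols)^2)
mean((y.test - pred.lasso)^2)

If lasso's test MSE turned out no lower than that of OLS, it would echo the earlier caution that regularization may add little for this particular dataset.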