
LASSO

It's a simple matter to update the code we used for ridge regression to accommodate LASSO. I'm going to change just two things: the random seed, and the alpha value, which I'll set to 1 (recall that, in glmnet, alpha = 0 gives the ridge penalty while alpha = 1 gives the pure L1, or LASSO, penalty):

> set.seed(1876)

> lasso <- glmnet::cv.glmnet(
    x,
    y,
    nfolds = 5,
    type.measure = "auc",
    alpha = 1,
    family = "binomial"
  )
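
Before plotting, note that cv.glmnet stores two candidate penalty values: lambda.min, which minimizes the cross-validated error, and lambda.1se, the largest lambda whose error is within one standard error of that minimum. A minimal sketch to peek at both, using the lasso object we just fit:

> lasso$lambda.min

> lasso$lambda.1se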

The plot of the model is quite interesting:

> plot(lasso)

The output of the preceding code is as follows:

You can now see how the number of non-zero features changes as lambda changes. At the one-standard-error lambda, just eight features remain!
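
If you want to confirm that count programmatically rather than reading it off the plot, a minimal sketch is to tally the non-zero coefficients at lambda.1se, subtracting one for the intercept (cfs here is just a throwaway name):

> cfs <- coef(lasso, s = "lambda.1se")

> sum(cfs != 0) - 1  # subtract the intercept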

Let's have a gander at those coefficients:

> coef(lasso, s = "lambda.1se")
17 x 1 sparse Matrix of class "dgCMatrix"
                      1
(Intercept) -0.30046007
TwoFactor1  -0.53307368
TwoFactor2   0.52110703
Linear1      .
Linear2     -0.42669146
Linear3      0.35514853
Linear4     -0.20726177
Linear5      0.10381320
Linear6      .
Nonlinear1   0.10478862
Nonlinear2   .
Nonlinear3   .
Noise1       .
Noise2       .
Noise3       .
Noise4       .
random1     -0.06581589

Now, this looks much better. LASSO threw out those nonsense noise features along with Linear1. However, before we start congratulating ourselves, notice that Linear6 was also constrained to zero. Does it belong in the model or not? We could certainly adjust the lambda value and see where it enters the model and what effect it has.
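
One way to probe that question is to trace the full coefficient paths of the underlying fit and see where Linear6 enters; a minimal sketch (cv.glmnet stores the un-cross-validated path in its glmnet.fit component):

> plot(lasso$glmnet.fit, xvar = "lambda", label = TRUE)

> coef(lasso, s = "lambda.min")  # the less conservative lambda; check whether Linear6 survives here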

It's time to check how it does on the training data:

> lasso_pred <-
    data.frame(predict(
      lasso,
      newx = x,
      type = "response",
      s = "lambda.1se"
    ))

> Metrics::auc(y, lasso_pred$X1)
[1] 0.8621664

> classifierplots::density_plot(y, lasso_pred$X1)

The output of the preceding code is as follows:

These results are quite similar to those from ridge regression. The proper evaluation, however, is on the test data:

> lasso_test <-
    data.frame(predict(
      lasso,
      newx = as.matrix(test[, -17]),
      type = "response",
      s = "lambda.1se"
    ))

> Metrics::auc(test$y, lasso_test$X1)
[1] 0.8684276

> Metrics::logLoss(test$y, lasso_test$X1)
[1] 0.4512764
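
For reference, Metrics::logLoss is just the mean negative Bernoulli log-likelihood; assuming test$y is coded as 0/1, a minimal manual equivalent would be:

> p <- lasso_test$X1

> -mean(test$y * log(p) + (1 - test$y) * log(1 - p))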

> classifierplots::density_plot(test$y, lasso_test$X1)

The output of the preceding code is as follows:

The LASSO model does have a slightly lower AUC and a marginally higher log-loss (0.45 versus 0.43). In the real world, I'm not sure that difference would be meaningful, given that LASSO gives us a more parsimonious model. I suppose that's another trade-off to weigh alongside bias-variance: predictive power versus complexity.

Speaking of complexity, let's move on to elastic net.
