
LASSO

It's a simple matter to update the code we used for ridge regression to accommodate LASSO. I'm going to change just two things: the random seed, and alpha, which I'll set to 1 (in glmnet, alpha = 1 gives the pure L1, or LASSO, penalty, while alpha = 0 gives ridge):

> set.seed(1876)

> lasso <- glmnet::cv.glmnet(
    x,
    y,
    nfolds = 5,
    type.measure = "auc",
    alpha = 1,
    family = "binomial"
  )
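If you're curious which lambda values the cross-validation settled on, they're stored on the returned object; a quick peek, using cv.glmnet's standard components:

> lasso$lambda.min # lambda with the best cross-validated AUC
> lasso$lambda.1se # largest lambda within one standard error of that best value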

The plot of the model is quite interesting:

> plot(lasso)

The output of the preceding code is as follows:

You can now see how the number of non-zero features changes as lambda changes. The number of features included at the one-standard-error lambda is just eight!
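You don't have to count by eye, either: cv.glmnet records the number of non-zero coefficients at every lambda in its nzero component, so a quick sanity check on the fitted object above looks like this:

> lasso$nzero[which(lasso$lambda == lasso$lambda.1se)]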

Let's have a gander at those coefficients:

> coef(lasso, s = "lambda.1se")
17 x 1 sparse Matrix of class "dgCMatrix"
                      1
(Intercept) -0.30046007
TwoFactor1  -0.53307368
TwoFactor2   0.52110703
Linear1      .
Linear2     -0.42669146
Linear3      0.35514853
Linear4     -0.20726177
Linear5      0.10381320
Linear6      .
Nonlinear1   0.10478862
Nonlinear2   .
Nonlinear3   .
Noise1       .
Noise2       .
Noise3       .
Noise4       .
random1     -0.06581589

Now, this looks much better. LASSO threw out those nonsense noise features along with Linear1. However, before we start congratulating ourselves, look at how Linear6 was also constrained to zero. Does it need to be in the model or not? We could adjust the lambda value and see where it enters the model and what effect it has, as sketched below.
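One way to explore that question is to plot the coefficient paths of the underlying glmnet fit, then request coefficients at a less aggressive lambda; here's a sketch (the value of s below is an arbitrary illustration, not a recommendation):

> plot(lasso$glmnet.fit, xvar = "lambda", label = TRUE)
> coef(lasso, s = 0.01) # a smaller lambda than lambda.1se, so fewer coefficients shrink to zero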

It's time to check how it does on the training data:

> lasso_pred <- data.frame(predict(
    lasso,
    newx = x,
    type = "response",
    s = "lambda.1se"
  ))

> Metrics::auc(y, lasso_pred$X1)
[1] 0.8621664

> classifierplots::density_plot(y, lasso_pred$X1)

The output of the preceding code is as follows:

These results are quite similar to those from ridge regression. The proper evaluation, however, is done on the test data:

> lasso_test <- data.frame(predict(
    lasso,
    newx = as.matrix(test[, -17]),
    type = "response",
    s = "lambda.1se"
  ))

> Metrics::auc(test$y, lasso_test$X1)
[1] 0.8684276

> Metrics::logLoss(test$y, lasso_test$X1)
[1] 0.4512764
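If you want hard class labels rather than probabilities, you can apply a cutoff to the predictions and cross-tabulate against the truth; a minimal sketch (the 0.5 cutoff is an arbitrary choice for illustration):

> table(actual = test$y, predicted = ifelse(lasso_test$X1 > 0.5, 1, 0))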

> classifierplots::density_plot(test$y, lasso_test$X1)

The output of the preceding code is as follows:

The LASSO model does have a slightly lower AUC and a marginally higher log-loss (0.45 versus 0.43). In the real world, I'm not sure that difference would be meaningful, given that LASSO gives us a more parsimonious model. I suppose that's another trade-off to weigh alongside bias-variance: predictive power versus complexity.
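To see that trade-off side by side, and assuming the ridge model from the earlier section is still in memory as ridge, you could score it on the same test matrix and compare the metrics directly; a sketch under that assumption:

> ridge_test <- data.frame(predict(
    ridge,
    newx = as.matrix(test[, -17]),
    type = "response",
    s = "lambda.1se"
  ))
> Metrics::auc(test$y, ridge_test$X1)
> Metrics::logLoss(test$y, ridge_test$X1)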

Speaking of complexity, let's move on to elastic net.
