
Regularizing the regressor

"It is vain to do with more what can be done with fewer."
– William of Occam

Originally, our objective was to minimize the MSE value of the regressor. Later on, we discovered that too many features are an issue. That's why we need a new objective. We still need to minimize the MSE value of the regressor, but we also need to incentivize the model to ignore the useless features. This second part of our objective is what regularization does in a nutshell.

Two algorithms are commonly used for regularized linear regression—lasso and ridge. Lasso pushes the model to have fewer coefficients—that is, it sets as many coefficients as possible to 0—while ridge pushes the model to keep the values of its coefficients as small as possible. Lasso uses a form of regularization called L1, which penalizes the absolute values of the coefficients, while ridge uses L2, which penalizes the squared values of the coefficients. These two algorithms have a hyperparameter (alpha), which controls how strongly the coefficients are regularized. Setting alpha to 0 means no regularization at all, which brings us back to an ordinary least squares regressor, while larger values of alpha specify stronger regularization. We will start with the default value for alpha, and then see how to set it correctly later on.
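To make the difference concrete, the two objectives can be sketched roughly as follows, where $w_j$ denotes the model's coefficients and $\alpha$ is the regularization hyperparameter (the exact scaling constants differ slightly in scikit-learn's implementation):

$$\text{lasso:} \quad \min_{w} \; \text{MSE}(w) + \alpha \sum_{j} |w_j|$$

$$\text{ridge:} \quad \min_{w} \; \text{MSE}(w) + \alpha \sum_{j} w_j^2$$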

The standard approach used in the ordinary least squares algorithm does not work here. We now have an objective function that aims to minimize the size of the coefficients, in addition to minimizing the predictor's MSE. So, a solver is used to find the optimum coefficients that minimize this new objective function. We will look further at solvers later in this chapter.

Training the lasso regressor

Training lasso is no different to training any other model. Similar to what we did in the previous section, we will set fit_intercept to False here:

from sklearn.linear_model import Ridge, Lasso

reg = Lasso(fit_intercept=False)
reg.fit(x_train_poly, y_train)

y_test_pred = reg.predict(x_test_poly)

Once done, we can print the R2, MAE, and MSE:
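In case you want to reproduce these numbers yourself, a minimal sketch using scikit-learn's metrics could look like the following. Here, I am assuming that y_test holds the test set's targets and that the baseline figures come from a dummy regressor that always predicts the mean of the training targets, since those names are not shown in this section:

from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Hypothetical baseline: always predict the mean of the training targets
baseline = DummyRegressor(strategy='mean')
baseline.fit(x_train_poly, y_train)
y_test_pred_baseline = baseline.predict(x_test_poly)

print(
    'R2 Regressor =', round(r2_score(y_test, y_test_pred), 3),
    'vs Baseline =', round(r2_score(y_test, y_test_pred_baseline), 3)
)
print(
    'MAE Regressor =', round(mean_absolute_error(y_test, y_test_pred), 3),
    'vs Baseline =', round(mean_absolute_error(y_test, y_test_pred_baseline), 3)
)
print(
    'MSE Regressor =', round(mean_squared_error(y_test, y_test_pred), 3),
    'vs Baseline =', round(mean_squared_error(y_test, y_test_pred_baseline), 3)
)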

R2 Regressor = 0.787 vs Baseline = -0.0
MAE Regressor = 2.381 vs Baseline = 6.2
MSE Regressor = 16.227 vs Baseline = 78.

Not only did we fix the problems introduced by the polynomial features, but we also have better performance than the original linear regressor. MAE is 2.4 here, compared to 3.6 before, MSE is 16.2, compared to 25.8 before, and R2 is 0.79, compared to 0.73 before.

Now that we have seen promising results after applying regularization, it's time to see how to set an optimum value for the regularization parameter.

Finding the optimum regularization parameter

Ideally, after splitting the data into training and test sets, we would further split the training set into N folds. Then, we would make a list of all the values of alpha that we would like to test and loop over them one after the other. With each iteration, we would apply N-fold cross-validation to estimate the error for that value of alpha, and in the end, we would pick the value that gives the minimal error. Thankfully, scikit-learn has an estimator called LassoCV (CV stands for cross-validation) that does all of this for us. Here, we are going to use it to find the best value for alpha using five-fold cross-validation:

import numpy as np
from sklearn.linear_model import LassoCV

# Make a list of 50 values between 0.000001 and 1,000,000
alphas = np.logspace(-6, 6, 50)

# We will do 5-fold cross validation
reg = LassoCV(alphas=alphas, fit_intercept=False, cv=5)
reg.fit(x_train_poly, y_train)

y_train_pred = reg.predict(x_train_poly)
y_test_pred = reg.predict(x_test_poly)

Once done, we can use the model for predictions. You may want to predict for both the training and test sets and see whether the model overfits on the training set. We can also print the chosen alpha, as follows:

print(f"LassoCV: Chosen alpha = {reg.alpha_}")

I got an alpha value of 1151.4.
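To check whether the model overfits on the training set, as suggested above, one simple sketch is to compare the errors on the training and test sets (again assuming y_test holds the test targets); a much lower training error would hint at overfitting:

from sklearn.metrics import mean_squared_error

print('Train MSE =', round(mean_squared_error(y_train, y_train_pred), 3))
print('Test MSE =', round(mean_squared_error(y_test, y_test_pred), 3))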

Furthermore, we can also see, for each value of alpha, what the MSE value for each of the five folds was. We can access this information via mse_path_.

Since we have five values for MSE for each value of alpha, we can plot the mean of these five values, as well as the confidence interval around the mean.

The confidence interval is used to show the expected range that observed data may take. A 95% confidence interval means that we expect 95% of our values to fall within this range. Having a wide confidence interval means that the data may take a wide range of values, while a narrower confidence interval means that we can almost pinpoint exactly what value the data will take.

A 95% confidence interval is calculated as follows:

$$\text{95% CI} = \text{mean} \pm 1.96 \times \text{standard error}$$

Here, the standard error is equal to the standard deviation divided by the square root of the number of samples ($\frac{\text{standard deviation}}{\sqrt{5}}$, since we have five folds here).

The equation for the confidence interval here is not 100% accurate. Statistically speaking, when dealing with small samples whose underlying variance is not known, a t-distribution should be used instead of a z-distribution. Thus, given the small number of folds here, the 1.96 coefficient should be replaced by a more accurate value from the t-distribution table, with its degrees of freedom inferred from the number of folds.
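For illustration only, the more accurate multiplier can be looked up with SciPy; this is just a sketch, assuming five folds and therefore four degrees of freedom:

from scipy.stats import t

n_folds = 5
# For a two-sided 95% interval, take the 97.5th percentile of the
# t-distribution with (n_folds - 1) degrees of freedom instead of 1.96
t_multiplier = t.ppf(0.975, df=n_folds - 1)
print(round(t_multiplier, 3))  # roughly 2.776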

The following code snippets calculate and plot the confidence intervals for MSE versus alpha:

  1. We start by calculating the descriptive statistics of the MSE values returned:
# n_folds equals 5 here
n_folds = reg.mse_path_.shape[1]

# Calculate the mean and standard error for MSEs
mse_mean = reg.mse_path_.mean(axis=1)
mse_std = reg.mse_path_.std(axis=1)
# Std Error = Std Deviation / SQRT(number of samples)
mse_std_error = mse_std / np.sqrt(n_folds)
  2. Then, we put our calculations into a data frame and plot them using the default line chart:
import pandas as pd
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(16, 8))

# We multiply by 1.96 for a 95% Confidence Interval
pd.DataFrame(
    {
        'alpha': reg.alphas_,
        'Mean MSE': mse_mean,
        'Upper Bound MSE': mse_mean + 1.96 * mse_std_error,
        'Lower Bound MSE': mse_mean - 1.96 * mse_std_error,
    }
).set_index('alpha')[
    ['Mean MSE', 'Upper Bound MSE', 'Lower Bound MSE']
].plot(
    title='Regularization plot (MSE vs alpha)',
    marker='.', logx=True, ax=ax
)

# Color the confidence interval
plt.fill_between(
    reg.alphas_,
    mse_mean + 1.96 * mse_std_error,
    mse_mean - 1.96 * mse_std_error,
)

# Plot a vertical line at the chosen alpha
ax.axvline(reg.alpha_, linestyle='--', color='k')
ax.set_xlabel('Alpha')
ax.set_ylabel('Mean Squared Error')

Here is the output of the previous code:

The MSE value is lowest at the chosen alpha value. The confidence interval is also narrower there, which reflects more confidence in the expected MSE result.

Finally, we can set the model's alpha value to the one suggested and use it to make predictions for the test data.
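A minimal sketch of that final step, assuming the same training and test matrices used earlier, could look like this:

from sklearn.linear_model import Lasso

# Refit a plain lasso regressor using the alpha chosen by LassoCV above
final_reg = Lasso(alpha=reg.alpha_, fit_intercept=False)
final_reg.fit(x_train_poly, y_train)
y_test_pred = final_reg.predict(x_test_poly)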


Clearly, regularization fixed the issues caused by the curse of dimensionality earlier. Furthermore, we were able to use cross-validation to find the optimum regularization parameter. We plotted the confidence intervals of errors to visualize the effect of alpha on the regressor. The fact that I have been talking about the confidence intervals in this section inspired me to dedicate the next section to regression intervals.
