- Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
- Tarek Amr
Regularizing the regressor
Originally, our objective was to minimize the MSE value of the regressor. Later on, we discovered that too many features are an issue. That's why we need a new objective. We still need to minimize the MSE value of the regressor, but we also need to incentivize the model to ignore the useless features. This second part of our objective is what regularization does in a nutshell.
Two algorithms are commonly used for regularized linear regression—lasso and ridge. Lasso pushes the model to have fewer coefficients—that is, it sets as many coefficients as possible to 0—while ridge pushes the model to have as small values as possible for its coefficients. Lasso uses a form of regularization called L1, which penalizes the absolute values of the coefficients, while ridge uses L2, which penalizes the squared values of the coefficients. These two algorithms have a hyperparameter (alpha), which controls how strongly the coefficients will be regularized. Setting alpha to 0 means no regularization at all, which brings us back to an ordinary least squares regressor. While larger values for alpha specify stronger regularization, we will start with the default value for alpha, and then see how to set it correctly later on.
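To make this concrete, here is a sketch of the two objectives, with w as the vector of coefficients, X and y as the training features and targets, and n as the number of training samples. The formulations mirror the ones in scikit-learn's documentation, but treat the exact scaling factors as an illustration rather than a guarantee:

$$\text{Lasso:}\quad \min_{w}\ \frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1$$

$$\text{Ridge:}\quad \min_{w}\ \lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_2^2$$

Because the L1 penalty keeps growing linearly even for tiny coefficients, it is often cheapest for lasso to set a useless feature's coefficient to exactly 0, whereas the squared L2 penalty only shrinks coefficients towards 0 without typically reaching it.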
Training the lasso regressor
Training lasso is no different from training any other model. Similar to what we did in the previous section, we will set fit_intercept to False here:
from sklearn.linear_model import Ridge, Lasso
reg = Lasso(fit_intercept=False)
reg.fit(x_train_poly, y_train)
y_test_pred = reg.predict(x_test_poly)
Once done, we can print the R2, MAE, and MSE values and compare them to the baseline model's scores, as we did before.
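A minimal sketch of how this printing might look, assuming the metric functions from sklearn.metrics and y_test_baseline as a hypothetical name for the baseline model's predictions from the earlier comparison:
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
# Compare the regularized regressor to the baseline predictions made earlier
print(
    'R2 Regressor =', round(r2_score(y_test, y_test_pred), 3),
    'vs Baseline =', round(r2_score(y_test, y_test_baseline), 3)
)
print(
    'MAE Regressor =', round(mean_absolute_error(y_test, y_test_pred), 3),
    'vs Baseline =', round(mean_absolute_error(y_test, y_test_baseline), 3)
)
print(
    'MSE Regressor =', round(mean_squared_error(y_test, y_test_pred), 3),
    'vs Baseline =', round(mean_squared_error(y_test, y_test_baseline), 3)
)
Running it produces output along these lines: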
R2 Regressor = 0.787 vs Baseline = -0.0
MAE Regressor = 2.381 vs Baseline = 6.2
MSE Regressor = 16.227 vs Baseline = 78.
Not only did we fix the problems introduced by the polynomial features, but we also have better performance than the original linear regressor. MAE is 2.4 here, compared to 3.6 before, MSE is 16.2, compared to 25.8 before, and R2 is 0.79, compared to 0.73 before.
Now that we have seen promising results after applying regularization, it's time to see how to set an optimum value for the regularization parameter.
Finding the optimum regularization parameter
Ideally, after splitting the data into training and test sets, we would further split the training set into N folds. Then, we would make a list of all the values of alpha that we would like to test and loop over them one after the other, using N-fold cross-validation to measure the error for each value and keeping the alpha that gives the minimal error. Thankfully, scikit-learn has a LassoCV estimator (CV stands for cross-validation) that does all of this for us. Here, we are going to use it to find the best value for alpha using five-fold cross-validation:
import numpy as np
from sklearn.linear_model import LassoCV
# Make a list of 50 values between 0.000001 & 1,000,000
alphas = np.logspace(-6, 6, 50)
# We will do 5-fold cross validation
reg = LassoCV(alphas=alphas, fit_intercept=False, cv=5)
reg.fit(x_train_poly, y_train)
y_train_pred = reg.predict(x_train_poly)
y_test_pred = reg.predict(x_test_poly)
Once done, we can use the model for predictions. You may want to predict for both the training and test sets and see whether the model overfits on the training set. We can also print the chosen alpha, as follows:
print(f"LassoCV: Chosen alpha = {reg.alpha_}")
I got an alpha value of 1151.4.
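As suggested above, comparing the training and test scores is a quick way to check whether the model overfits. Here is a minimal sketch, assuming r2_score from sklearn.metrics:
from sklearn.metrics import r2_score
# A training score far above the test score would suggest overfitting
print('R2 Train =', round(r2_score(y_train, y_train_pred), 3))
print('R2 Test =', round(r2_score(y_test, y_test_pred), 3))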
Furthermore, we can also see, for each value of alpha, what the MSE value for each of the five folds was. We can access this information via mse_path_.
Since we have five values for MSE for each value of alpha, we can plot the mean of these five values, as well as the confidence interval around the mean.
A 95% confidence interval is calculated as follows:

$$\text{CI}_{95\%} = \text{mean} \pm 1.96 \times \text{standard error}$$

Here, the standard error is equal to the standard deviation divided by the square root of the number of samples ($\sqrt{5}$, since we have five folds here).
The following code snippets calculate and plot the confidence intervals for MSE versus alpha:
- We start by calculating the descriptive statistics of the MSE values returned:
# n_folds equals 5 here
n_folds = reg.mse_path_.shape[1]
# Calculate the mean and standard error for MSEs
mse_mean = reg.mse_path_.mean(axis=1)
mse_std = reg.mse_path_.std(axis=1)
# Std Error = Std Deviation / SQRT(number of samples)
mse_std_error = mse_std / np.sqrt(n_folds)
- Then, we put our calculations into a data frame and plot them using the default line chart:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1, figsize=(16, 8))
# We multiply by 1.96 for a 95% Confidence Interval
pd.DataFrame(
{
'alpha': reg.alphas_,
'Mean MSE': mse_mean,
'Upper Bound MSE': mse_mean + 1.96 * mse_std_error,
'Lower Bound MSE': mse_mean - 1.96 * mse_std_error,
}
).set_index('alpha')[
['Mean MSE', 'Upper Bound MSE', 'Lower Bound MSE']
].plot(
title='Regularization plot (MSE vs alpha)',
marker='.', logx=True, ax=ax
)
# Color the confidence interval
plt.fill_between(
reg.alphas_,
mse_mean + 1.96 * mse_std_error,
mse_mean - 1.96 * mse_std_error,
)
# Print a vertical line for the chosen alpha
ax.axvline(reg.alpha_, linestyle='--', color='k')
ax.set_xlabel('Alpha')
ax.set_ylabel('Mean Squared Error')
Here is the output of the previous code:

[Figure: Regularization plot (MSE versus alpha), showing the mean MSE with its 95% confidence band and a dashed vertical line at the chosen alpha]
The MSE value is lowest at the chosen alpha value. The confidence interval is also narrower there, which reflects more confidence in the expected MSE result.
Finally, setting the model's alpha value to the one suggested and using it to make predictions for the test data gives us the following results:

[Output: the R2, MAE, and MSE scores of the regularized model on the test set]
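A minimal sketch of how this final step might be reproduced, feeding the alpha found by LassoCV into a plain lasso regressor (reusing reg.alpha_ from above; final_reg is just an illustrative name):
# Retrain a plain lasso regressor using the alpha chosen by LassoCV
final_reg = Lasso(alpha=reg.alpha_, fit_intercept=False)
final_reg.fit(x_train_poly, y_train)
y_test_pred = final_reg.predict(x_test_poly)
You can then print the same R2, MAE, and MSE comparisons as before to obtain these results.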
Clearly, regularization fixed the issues caused by the curse of dimensionality earlier. Furthermore, we were able to use cross-validation to find the optimum regularization parameter. We plotted the confidence intervals of errors to visualize the effect of alpha on the regressor. The fact that I have been talking about the confidence intervals in this section inspired me to dedicate the next section to regression intervals.