- Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
- Tarek Amr
Finding regression intervals
It's not always guaranteed that we have accurate models. Sometimes, our data is inherently noisy and we cannot model it well with a regressor. In these cases, it is important to be able to quantify how certain we are in our estimations. Usually, regressors make point predictions. These are the expected values (typically the mean) of the target (y) at each value of x. A Bayesian ridge regressor returns the expected values as usual, yet it can also return the standard deviation of the target (y) at each value of x.
To demonstrate how this works, let's create a noisy dataset where y = x + noise, with the noise drawn from a normal distribution with a mean of 0 and a standard deviation of 5:
import numpy as np
import pandas as pd

# np.random.random_integers has been removed from recent NumPy versions;
# np.random.randint excludes the upper bound, hence 31 to keep 0..30 inclusive
df_noisy = pd.DataFrame(
    {
        'x': np.random.randint(0, 31, size=150),
        'noise': np.random.normal(loc=0.0, scale=5.0, size=150)
    }
)
df_noisy['y'] = df_noisy['x'] + df_noisy['noise']
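As an optional sanity check on the simulated data, assuming the df_noisy DataFrame we just built, the noise column should show a mean near 0 and a standard deviation near 5:
print(df_noisy[['x', 'noise', 'y']].describe().round(2))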
Then, we can plot it in the form of a scatter plot:
df_noisy.plot(
    kind='scatter', x='x', y='y'
)
Plotting the resulting data frame will give us the following plot:

Now, let's train two regressors on the same data—LinearRegression and BayesianRidge. I will stick to the default values for the Bayesian ridge hyperparameters here:
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import BayesianRidge
lr = LinearRegression()
br = BayesianRidge()
lr.fit(df_noisy[['x']], df_noisy['y'])
df_noisy['y_lr_pred'] = lr.predict(df_noisy[['x']])
br.fit(df_noisy[['x']], df_noisy['y'])
df_noisy['y_br_pred'], df_noisy['y_br_std'] = br.predict(df_noisy[['x']], return_std=True)
Notice how the Bayesian ridge regressor returns two values when predicting.
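Before plotting, we can peek at the fitted parameters. This is a small optional sketch, assuming the fitted lr and br objects from above; both estimators expose coef_ and intercept_, and BayesianRidge additionally exposes alpha_, its estimated precision (inverse variance) of the noise, so 1 / sqrt(alpha_) should land close to the noise scale of 5 we used when generating the data:
# Compare the fitted lines and the estimated noise level
print('LinearRegression:', lr.coef_, lr.intercept_)
print('BayesianRidge:   ', br.coef_, br.intercept_)
print('Estimated noise std dev:', 1 / np.sqrt(br.alpha_))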
The predictions made by the two models are very similar. Nevertheless, we can use the returned standard deviation to calculate a range around the expected values that we expect most of the future data to fall into. The following code snippet creates plots for the two models and their predictions:
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 3, figsize=(16, 6), sharex=True, sharey=True)

# We plot the data 3 times
df_noisy.sort_values('x').plot(
    title='Data', kind='scatter', x='x', y='y', ax=axs[0]
)
df_noisy.sort_values('x').plot(
    kind='scatter', x='x', y='y', ax=axs[1], marker='o', alpha=0.25
)
df_noisy.sort_values('x').plot(
    kind='scatter', x='x', y='y', ax=axs[2], marker='o', alpha=0.25
)

# Here we plot the Linear Regression predictions
df_noisy.sort_values('x').plot(
    title='LinearRegression', kind='scatter', x='x', y='y_lr_pred',
    ax=axs[1], marker='o', color='k', label='Predictions'
)

# Here we plot the Bayesian Ridge predictions
df_noisy.sort_values('x').plot(
    title='BayesianRidge', kind='scatter', x='x', y='y_br_pred',
    ax=axs[2], marker='o', color='k', label='Predictions'
)

# Here we plot the range around the expected values
# We multiply by 1.96 for a 95% Confidence Interval
axs[2].fill_between(
    df_noisy.sort_values('x')['x'],
    df_noisy.sort_values('x')['y_br_pred'] - 1.96 * df_noisy.sort_values('x')['y_br_std'],
    df_noisy.sort_values('x')['y_br_pred'] + 1.96 * df_noisy.sort_values('x')['y_br_std'],
    color='k', alpha=0.2, label='Predictions +/- 1.96 * Std Dev'
)

fig.show()
Running the preceding code gives us the following graphs. In the BayesianRidge case, the shaded area shows where we expect 95% of our targets to fall:

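We can also verify that claim numerically. The following is a quick, optional check, assuming the df_noisy columns computed above; it counts the fraction of observed targets that land inside the +/- 1.96 standard deviation band, which should come out close to 0.95:
# Fraction of targets that fall inside the 95% interval
lower = df_noisy['y_br_pred'] - 1.96 * df_noisy['y_br_std']
upper = df_noisy['y_br_pred'] + 1.96 * df_noisy['y_br_std']
within = (df_noisy['y'] >= lower) & (df_noisy['y'] <= upper)
print(f'Empirical coverage: {within.mean():.2f}')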
Regression intervals are handy when we want to quantify our uncertainties. In Chapter 8, Ensembles – When One Model Is Not Enough, we will revisit regression intervals.