
Supervised hello world!

In this example, we want to show how to perform a simple linear regression with two-dimensional data. In particular, let's assume that we have a custom dataset containing 100 samples, as follows:

import numpy as np
import pandas as pd

# Independent variable: 100 evenly spaced points in [0, 10], shape (100, 1)
T = np.expand_dims(np.linspace(0.0, 10.0, num=100), axis=1)

# Dependent variable: a linear signal with a random per-sample slope factor
# plus additive zero-mean Gaussian noise
X = (T * np.random.uniform(1.0, 1.5, size=(100, 1))) + np.random.normal(0.0, 3.5, size=(100, 1))

df = pd.DataFrame(np.concatenate([T, X], axis=1), columns=['t', 'x'])

We have also created a pandas DataFrame because it's easier to create plots using the seaborn library (https://seaborn.pydata.org). In the book, the code for the plots (using Matplotlib or seaborn) is normally omitted, but it's always present in the repository.
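As a minimal sketch of the kind of plotting code that is omitted here (the repository version may differ; the figure size and styling below are assumptions), the dataset can be drawn with Matplotlib like this:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove for on-screen display
import matplotlib.pyplot as plt

# Recreate the dataset from the previous snippet
T = np.expand_dims(np.linspace(0.0, 10.0, num=100), axis=1)
X = (T * np.random.uniform(1.0, 1.5, size=(100, 1))) + np.random.normal(0.0, 3.5, size=(100, 1))

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(T, X, s=15, label='Samples')
ax.set_xlabel('t')
ax.set_ylabel('x')
ax.legend()
fig.savefig('dataset.png')
```

A seaborn equivalent would be a one-liner such as `sns.scatterplot(data=df, x='t', y='x')` on the DataFrame created above.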

We want to express the dataset in a synthetic way, as a linear model of the form x(t) = βt + α, where β is the slope and α is the intercept.

This task can be carried out using a linear regression algorithm, as follows:

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(T, X)

print('x(t) = {0:.3f}t + {1:.3f}'.format(lr.coef_[0][0], lr.intercept_[0]))

The output of the last command is the following:

x(t) = 1.169t + 0.628

We can also get visual confirmation, drawing the dataset together with the regression line, as shown in the following graph:

Dataset and regression line
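Beyond visual confirmation, the quality of the fit can also be checked numerically. A minimal sketch using scikit-learn's built-in R² score (using it here is an assumption; the original example does not evaluate the model):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Recreate the dataset (same structure as in the example)
T = np.expand_dims(np.linspace(0.0, 10.0, num=100), axis=1)
X = (T * np.random.uniform(1.0, 1.5, size=(100, 1))) + np.random.normal(0.0, 3.5, size=(100, 1))

lr = LinearRegression()
lr.fit(T, X)

# R^2 is the fraction of the variance of X explained by the regression line;
# score() on the training data with an intercept term is always in [0, 1]
r2 = lr.score(T, X)
print('R^2 = {0:.3f}'.format(r2))
```

Values close to 1 indicate that the line explains most of the variance; with the noise level used here, moderate values are expected.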

In this example, the regression algorithm minimized a squared-error cost function, trying to reduce the discrepancy between the predicted values and the actual ones. The additive Gaussian noise, having zero mean, has minimal impact on the estimated slope, thanks to its symmetric distribution.
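To make the squared-error argument concrete, here is a numpy-only sketch (a simplified setup with a fixed slope and seed, not the exact dataset above) that solves the same problem in closed form via least squares and shows that the recovered slope stays close to the noise-free one despite the zero-mean noise:

```python
import numpy as np

rng = np.random.RandomState(1000)

# Noise-free linear signal plus zero-mean Gaussian noise
t = np.linspace(0.0, 10.0, num=100)
true_slope, true_intercept = 1.25, 0.0
x = true_slope * t + true_intercept + rng.normal(0.0, 3.5, size=100)

# Design matrix with a bias column: each row is [t_i, 1]
A = np.column_stack([t, np.ones_like(t)])

# Closed-form least-squares solution: minimizes ||A w - x||^2
slope, intercept = np.linalg.lstsq(A, x, rcond=None)[0]

print('x(t) = {0:.3f}t + {1:.3f}'.format(slope, intercept))
```

Because the noise is symmetric around zero, its positive and negative deviations largely cancel in the squared-error minimization, so the estimated slope remains close to the true value of 1.25.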
