
Multiple linear regression model

Multiple linear regression is a straightforward generalization of single-predictor models. In a multiple linear regression model, the dependent variable is related to two or more independent variables. To perform a multiple linear regression analysis, we will use the scikit-learn library. In the sklearn.linear_model module, the LinearRegression class performs an ordinary least squares linear regression.
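
Before applying the model to our dataset, the idea can be illustrated with a minimal, self-contained sketch (the synthetic data and coefficient values here are purely illustrative, not taken from our dataset): with two independent variables and a noise-free linear relationship, ordinary least squares recovers the coefficients exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: two independent variables (illustrative only)
rng = np.random.default_rng(0)
X = rng.random((100, 2))

# Dependent variable with an exact linear relationship:
# y = 3*x1 - 2*x2 + 1
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0

model = LinearRegression().fit(X, y)

print(model.coef_)       # close to [3.0, -2.0]
print(model.intercept_)  # close to 1.0
```

After fitting, the coef_ attribute holds one coefficient per independent variable and intercept_ holds the constant term, which is how the single-predictor case generalizes to multiple predictors.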

As usual, we will load the library through the following command:

from sklearn.linear_model import LinearRegression

Now, we can use the LinearRegression() function, as follows:

LModel = LinearRegression()

To fit the linear model, the fit() function will be used:

LModel.fit(X_train, Y_train)

In this case, in the training phase, we used the data extracted for this phase. At this point, we can use the model to make predictions.

To do this, the predict() function is also available in the scikit-learn library:

Y_predLM = LModel.predict(X_test)

Usually, a scatterplot is used to determine whether there is a relationship between two variables, but it can also be used to analyze the performance of a linear model. By plotting the actual values on one axis and the predicted values on the other, we can check how the data is arranged. To help with the analysis, we can draw the bisector of the quadrant, that is, the line with equation Y = X. Ideally, all observations would lie on this line; in practice, we are satisfied if the data clusters close to it, with about half of the points falling below the line and half above it. Points that deviate significantly from this line are possible outliers.
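
As a quick illustration of this kind of actual-versus-predicted plot with the Y = X bisector drawn in (the data here is synthetic and purely illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative actual values and predictions with a small error
rng = np.random.default_rng(0)
y_test = rng.random(50)
y_pred = y_test + rng.normal(0.0, 0.05, 50)

plt.scatter(y_test, y_pred)

# Draw the Y = X bisector over the range of the data
lims = [min(y_test.min(), y_pred.min()),
        max(y_test.max(), y_pred.max())]
plt.plot(lims, lims, "r--")

plt.xlabel("Actual values")
plt.ylabel("Predicted values")
plt.show()
```

A good model produces a cloud of points hugging the dashed line, with no systematic bias above or below it.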

To plot the two scatterplots, we will use the matplotlib library:

plt.figure(1)
plt.subplot(121)
plt.scatter(Y_test, Y_predKM)
plt.xlabel("Actual values")
plt.ylabel("Predicted values")
plt.title("Keras Neural Network Model")

plt.subplot(122)
plt.scatter(Y_test, Y_predLM)
plt.xlabel("Actual values")
plt.ylabel("Predicted values")
plt.title("SKLearn Linear Regression Model")
plt.show()

In the following graphs, we can see the two scatterplots:

Analyzing the preceding graphs, the Keras model (on the left) appears to return better results: the points lie closer to the line, they are divided roughly equally above and below it, and there are fewer isolated points. To confirm this first impression, we will also calculate the MSE for the linear regression model. To do this, the sklearn.metrics.mean_squared_error() function will be used. This function computes the MSE regression loss.

First, we have to import the function:

from sklearn.metrics import mean_squared_error

Then, we can compute the MSE, as follows:

mse = mean_squared_error(Y_test, Y_predLM)
print('Linear Regression Model')
print(mse)

The following result is returned:

Linear Regression Model
0.014089115439987464

Comparing this value (0.014089115439987464) with the one returned by the Keras model (0.0038815933421901066), we can state that the Keras model performs better, recording an error roughly 3.6 times lower.
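
As a quick sanity check on this comparison, we can compute the ratio of the two MSE values reported above:

```python
# MSE values reported above for the two models
mse_linear = 0.014089115439987464
mse_keras = 0.0038815933421901066

# The linear regression error is roughly 3.6 times larger
print(mse_linear / mse_keras)
```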
