
  • Keras 2.x Projects
  • Giuseppe Ciaburro

Multiple linear regression model

Multiple linear regression is a straightforward generalization of single-predictor models. In a multiple linear regression model, the dependent variable is related to two or more independent variables. To perform a multiple linear regression analysis, the scikit-learn library will be used. The LinearRegression class, from the sklearn.linear_model module, performs an ordinary least squares linear regression.

As usual, we will load the library through the following command:

from sklearn.linear_model import LinearRegression

Now, we can use the LinearRegression() function, as follows:

LModel = LinearRegression()

To fit the linear model, the fit() function will be used:

LModel.fit(X_train, Y_train)

During the training phase, we used the data that was set aside for this purpose. At this point, we can use the model to make predictions.

To do this, the predict() function is also available in the scikit-learn library:

Y_predLM = LModel.predict(X_test)

Usually, a scatterplot is used to determine whether or not there is a relationship between data. However, a scatterplot can also be used to analyze the performance of a linear model. By plotting the actual and predicted values on the two axes, it is possible to check how the data is arranged. To help with the analysis, it is possible to trace the bisector of the quadrant, that is, the line of equation Y = X. Theoretically, all observations should lie on this line, but in practice we can be satisfied if the data lies close to it, with roughly half of the points falling below the line and the other half above it. Points that move away significantly from this line represent possible outliers.

To plot the two scatterplots, we will use the matplotlib library:

import matplotlib.pyplot as plt

plt.figure(1)
plt.subplot(121)
plt.scatter(Y_test, Y_predKM)
plt.xlabel("Actual values")
plt.ylabel("Predicted values")
plt.title("Keras Neural Network Model")

plt.subplot(122)
plt.scatter(Y_test, Y_predLM)
plt.xlabel("Actual values")
plt.ylabel("Predicted values")
plt.title("SKLearn Linear Regression Model")
plt.show()
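The listing above plots only the points; the Y = X bisector mentioned earlier is not drawn. One way to overlay it is sketched below, using small illustrative arrays in place of the actual test values and model predictions (which, in the text, come from the test set and the fitted models):

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative stand-ins for the actual and predicted values;
# in the text these are Y_test and the output of predict().
Y_test = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
Y_pred = np.array([0.12, 0.28, 0.55, 0.66, 0.93])

plt.scatter(Y_test, Y_pred)
# Trace the bisector Y = X over the range spanned by the data:
lims = [min(Y_test.min(), Y_pred.min()),
        max(Y_test.max(), Y_pred.max())]
plt.plot(lims, lims, 'r--')
plt.xlabel("Actual values")
plt.ylabel("Predicted values")
plt.show()
```

Points on the dashed line are predicted exactly; the vertical distance of each point from the line is its prediction error.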

In the following diagram, we can see two scatterplots:

Analyzing the preceding graphs, it is clear that the Keras model (on the left) returns better results: the points fit the line more closely, they are split roughly equally above and below it, and there are fewer isolated points. To confirm this first intuition, we must also calculate the MSE for the linear regression model. To do this, the sklearn.metrics.mean_squared_error() function will be used. This function computes the mean squared error (MSE) regression loss.

First, we have to import the function:

from sklearn.metrics import mean_squared_error

Then, we can compute the MSE, as follows:

mse = mean_squared_error(Y_test, Y_predLM)
print('Linear Regression Model')
print(mse)

The following result is returned:

Linear Regression Model
0.014089115439987464

Comparing this value (0.014089115439987464) with the one returned by the Keras model (0.0038815933421901066), we can state that the Keras model performs better, recording a decidedly lower error (less than a third of the linear model's MSE).
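The whole fit/predict/evaluate workflow described in this section can be sketched end to end on synthetic data. The arrays, coefficients, and noise level below are illustrative stand-ins, not the book's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: Y depends linearly on two predictors, plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
Y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 + rng.normal(scale=0.05, size=200)

# Hold out part of the data for testing, as in the chapter.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0)

# Fit the ordinary least squares model and predict on the test set.
LModel = LinearRegression()
LModel.fit(X_train, Y_train)
Y_predLM = LModel.predict(X_test)

# Evaluate with the mean squared error.
mse = mean_squared_error(Y_test, Y_predLM)
print('Linear Regression Model')
print(mse)
```

With noise of standard deviation 0.05, the resulting MSE should be close to the noise variance (about 0.0025), and the recovered coefficients close to the true values of 3.0 and -2.0.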
