- Mastering Machine Learning with scikit-learn(Second Edition)
- Gavin Hackeling
- 2021-07-02 19:01:12
Evaluating the model
We have used a learning algorithm to estimate a model's parameters from training data. How can we assess whether our model is a good representation of the real relationship? Let's assume that you have found another page in your pizza journal. We will use this page's entries as a test set to measure the performance of our model. We have added a fourth column; it contains the prices predicted by our model.
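The test-set table with its predicted-price column can be regenerated with a short sketch like the one below. It reuses the training and test values from this section's scikit-learn listing. The column headings are assumptions for illustration, and the single explanatory variable is assumed to be the pizza's diameter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Values taken from this section's scikit-learn listing
X_train = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
y_train = [7, 9, 13, 17.5, 18]
X_test = np.array([8, 9, 11, 16, 12]).reshape(-1, 1)
y_test = [11, 8.5, 15, 18, 11]

# Fit on the training data, then predict a price for each test instance
model = LinearRegression()
model.fit(X_train, y_train)
predicted = model.predict(X_test)

# Column headings are illustrative assumptions, not the book's exact labels
print('Instance  Diameter  Observed  Predicted')
for i, (diameter, observed, pred) in enumerate(
        zip(X_test.ravel(), y_test, predicted), start=1):
    print(f'{i:8d}  {int(diameter):8d}  {observed:8.2f}  {pred:9.2f}')
```

Each row pairs an observed price with the model's prediction, which is the information the evaluation measures in this section are computed from.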

Several measures can be used to assess our model's predictive capability. We will evaluate our pizza price predictor using a measure called R-squared. Also known as the coefficient of determination, R-squared measures how well the regression line fits the observed data. There are several methods for calculating R-squared. In the case of simple linear regression, R-squared is equal to the square of the Pearson product-moment correlation coefficient (PPMCC), or Pearson's r. Using this method, R-squared must be a number between zero and one. This method is intuitive; if R-squared describes the proportion of variance in the response variable that is explained by the model, it cannot be greater than one or less than zero. Other methods, including the method used by scikit-learn, do not calculate R-squared as the square of Pearson's r. Using these methods, R-squared can be negative if the model performs extremely poorly, that is, worse than always predicting the mean of the observed values.
It is important to note the limitations of performance metrics. R-squared in particular is sensitive to outliers, and can spuriously increase when features are added to the model.
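The difference between the two ways of calculating R-squared can be seen in a small sketch. It assumes the training and test values from this section's scikit-learn listing, and it uses `np.corrcoef` and `sklearn.metrics.r2_score`. The helper `pearson_r_squared` and the always-zero predictor are hypothetical, introduced only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Values taken from this section's scikit-learn listing
X_train = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
y_train = np.array([7, 9, 13, 17.5, 18])
X_test = np.array([8, 9, 11, 16, 12]).reshape(-1, 1)
y_test = np.array([11, 8.5, 15, 18, 11])

model = LinearRegression().fit(X_train, y_train)


def pearson_r_squared(y_true, y_pred):
    """Hypothetical helper: square of Pearson's r between two sequences."""
    return np.corrcoef(y_true, y_pred)[0, 1] ** 2


# On the data the line was fitted to, the two definitions agree
train_pred = model.predict(X_train)
pearson_train = pearson_r_squared(y_train, train_pred)
sklearn_train = r2_score(y_train, train_pred)

# On held-out data they need not agree
test_pred = model.predict(X_test)
pearson_test = pearson_r_squared(y_test, test_pred)
sklearn_test = r2_score(y_test, test_pred)

# A deliberately poor, hypothetical predictor: always predict a price of zero
bad_pred = np.zeros_like(y_test)
sklearn_bad = r2_score(y_test, bad_pred)

print(f'train: Pearson r^2 = {pearson_train:.4f}, r2_score = {sklearn_train:.4f}')
print(f'test:  Pearson r^2 = {pearson_test:.4f}, r2_score = {sklearn_test:.4f}')
print(f'always-zero predictor: r2_score = {sklearn_bad:.4f}')
```

The squared correlation is always between zero and one, while `r2_score` falls below zero for the always-zero predictor because its squared errors exceed the total sum of squares.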
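We will follow the method used by scikit-learn to calculate R-squared for our pizza price predictor. First we must measure the total sum of squares. $y_i$ is the observed value of the response variable for the $i$th test instance, $\bar{y}$ is the mean of the observed values of the response variable, and $n$ is the number of test instances.

$$SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

For our test set, $\bar{y} = 12.7$, so:

$$SS_{tot} = (11 - 12.7)^2 + (8.5 - 12.7)^2 + \dots + (11 - 12.7)^2 = 56.8$$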
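Next we must find the RSS. Recall that this is also our cost function. Here $y_i$ is the observed price and $f(x_i)$ is the price predicted by the model for the $i$th test instance.

$$SS_{res} = \sum_{i=1}^{n} (y_i - f(x_i))^2$$

$$SS_{res} = (11 - 9.7759)^2 + (8.5 - 10.7522)^2 + \dots + (11 - 13.6810)^2 \approx 19.198$$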
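Finally, we can find R-squared from the RSS ($SS_{res}$) and the total sum of squares ($SS_{tot}$) using the following:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{19.198}{56.8} \approx 0.662$$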
The R-squared score of 0.662 indicates that a large proportion of the variance in the test instances' prices is explained by the model. Now let's confirm our calculation using scikit-learn. The score method of LinearRegression returns the model's R-squared value, as seen in the following example:
# In[1]:
import numpy as np
from sklearn.linear_model import LinearRegression

# Training set: one explanatory variable per instance and the observed prices
X_train = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
y_train = [7, 9, 13, 17.5, 18]
# Test set, held out from training
X_test = np.array([8, 9, 11, 16, 12]).reshape(-1, 1)
y_test = [11, 8.5, 15, 18, 11]

model = LinearRegression()
model.fit(X_train, y_train)
# score() returns R-squared computed on the data passed to it
r_squared = model.score(X_test, y_test)
print(f'{r_squared:.4f}')
# Out[1]:
0.6620
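The hand calculation described earlier (total sum of squares, then RSS, then R-squared) can also be reproduced step by step with NumPy alone. This is a minimal sketch, assuming the same training and test values as the listing above. The closed-form slope and intercept used here are the standard least-squares estimates for simple linear regression, and the variable names are illustrative.

```python
import numpy as np

# Values taken from the scikit-learn listing above
x_train = np.array([6, 8, 10, 14, 18], dtype=float)
y_train = np.array([7, 9, 13, 17.5, 18])
x_test = np.array([8, 9, 11, 16, 12], dtype=float)
y_test = np.array([11, 8.5, 15, 18, 11])

# Least-squares estimates for simple linear regression
x_dev = x_train - x_train.mean()
slope = np.sum(x_dev * (y_train - y_train.mean())) / np.sum(x_dev ** 2)
intercept = y_train.mean() - slope * x_train.mean()

# Predicted prices for the test instances
y_pred = intercept + slope * x_test

# Total sum of squares: squared deviations of observed values from their mean
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
# Residual sum of squares: squared differences between observed and predicted
ss_res = np.sum((y_test - y_pred) ** 2)

r_squared = 1 - ss_res / ss_tot

print(f'SS_tot = {ss_tot:.4f}')        # SS_tot = 56.8000
print(f'SS_res = {ss_res:.4f}')        # SS_res = 19.1981
print(f'R-squared = {r_squared:.4f}')  # R-squared = 0.6620
```

The result agrees with the value returned by the score method, which suggests that the method evaluates the same ratio on the test data.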