
Evaluating supervised learning algorithms

When performing predictive modeling, otherwise known as supervised learning, performance is directly tied to the model’s ability to exploit structure in the data and use that structure to make appropriate predictions. In general, we can further break down supervised learning into two more specific types: classification (predicting qualitative responses) and regression (predicting quantitative responses).

When evaluating classification problems, we will directly calculate the accuracy of a logistic regression model using five-fold cross-validation:

# Example code for evaluating a classification problem
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
# X is the feature matrix and y is the response variable (placeholders)
X = some_data_in_tabular_format
y = response_variable
# instantiate a logistic regression model and score it with five-fold cross-validation
lr = LogisticRegression()
scores = cross_val_score(lr, X, y, cv=5, scoring='accuracy')
scores
>> [.765, .67, .8, .62, .99]
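
These five per-fold accuracies are usually collapsed into a single baseline number by averaging them. A minimal sketch, assuming scores is the array returned by cross_val_score above:

# average the five fold accuracies into one baseline number
scores.mean()
>> 0.769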

Similarly, when evaluating a regression problem, we will use the mean squared error (MSE) of a linear regression using five-fold cross-validation:

# Example code for evaluating a regression problem
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
# X is the feature matrix and y is the response variable (placeholders)
X = some_data_in_tabular_format
y = response_variable
# instantiate a linear regression model and score it with five-fold cross-validation;
# scikit-learn exposes MSE through the 'neg_mean_squared_error' scorer, which returns
# the negative MSE so that higher scores are always better
lr = LinearRegression()
scores = cross_val_score(lr, X, y, cv=5, scoring='neg_mean_squared_error')
scores
>> [-31.543, -29.5433, -32.543, -32.43, -27.5432]

We will use these two linear models instead of newer, more advanced models because of their speed and their low variance. This way, we can be more confident that any increase in performance is due to the feature engineering procedure rather than to the model’s ability to pick up on obscure and hidden patterns.
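
To make that comparison concrete, the same cross-validated scores can be computed before and after a candidate feature engineering step and their averages compared. The following is only a sketch: it assumes X and y are defined as above, and it uses scikit-learn's PolynomialFeatures purely as a stand-in for whatever feature engineering procedure is being evaluated, not as the procedure developed in this book:

# Hypothetical sketch: compare a baseline linear regression against the same model
# fit on engineered features (PolynomialFeatures is only an illustrative transform)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
base = LinearRegression()
base_scores = cross_val_score(base, X, y, cv=5, scoring='neg_mean_squared_error')
engineered = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
engineered_scores = cross_val_score(engineered, X, y, cv=5, scoring='neg_mean_squared_error')
# the model is held fixed, so any change between the two averages can be attributed
# to the engineered features rather than to a more powerful estimator
print(base_scores.mean(), engineered_scores.mean())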
