
Evaluating supervised learning algorithms

When performing predictive modeling, otherwise known as supervised learning, performance is directly tied to the model’s ability to exploit structure in the data and use that structure to make appropriate predictions. Supervised learning breaks down into two more specific types: classification (predicting qualitative responses) and regression (predicting quantitative responses).

When evaluating classification problems, we will calculate the accuracy of a logistic regression model using five-fold cross-validation:

# Example code for evaluating a classification problem
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
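X = some_data_in_tabular_format  # placeholder: feature matrix
y = response_variable  # placeholder: class labels
lr = LogisticRegression()  # the classifier imported above
# Accuracy on each of the five folds
scores = cross_val_score(lr, X, y, cv=5, scoring='accuracy')
scores
>> [.765, .67, .8, .62, .99]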
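
Because `X` and `y` in the listing are placeholders, the following minimal sketch runs the same kind of evaluation end to end. The synthetic dataset from `make_classification`, the helper name `evaluate_classifier`, and the `max_iter` setting are assumptions made for illustration and are not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def evaluate_classifier(X, y, folds=5):
    """Return per-fold accuracy of a logistic regression (hypothetical helper)."""
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X, y, cv=folds, scoring='accuracy')


# Synthetic stand-in for real tabular data (an assumption for this sketch)
X, y = make_classification(n_samples=500, n_features=10,
                           n_clusters_per_class=1, random_state=0)

scores = evaluate_classifier(X, y)
print(scores)         # one accuracy value per fold
print(scores.mean())  # a single summary number for comparisons
```

The mean of the fold scores is usually the number to compare across experiments, while the individual fold values give a rough sense of how stable that estimate is.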

Similarly, when evaluating a regression problem, we will use the mean squared error (MSE) of a linear regression model under five-fold cross-validation:

# Example code for evaluating a regression problem
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
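X = some_data_in_tabular_format  # placeholder: feature matrix
y = response_variable  # placeholder: numeric target
lr = LinearRegression()
# scikit-learn scorers follow a higher-is-better convention, so MSE is
# exposed as 'neg_mean_squared_error'; negate it to recover the MSE
scores = -cross_val_score(lr, X, y, cv=5, scoring='neg_mean_squared_error')
scores
>> [31.543, 29.5433, 32.543, 32.43, 27.5432]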
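
Per-fold errors are typically condensed into a mean and a spread, and the root mean squared error (RMSE) is often easier to read because it is in the units of the response. The sketch below shows one way to do that. The synthetic dataset from `make_regression` and the variable names are assumptions for illustration only. It relies on scikit-learn's `'neg_mean_squared_error'` scorer, which returns the MSE with its sign flipped.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for real tabular data (an assumption for this sketch)
X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

# The scorer returns negated MSE, so flip the sign to get positive errors
mse_per_fold = -cross_val_score(LinearRegression(), X, y, cv=5,
                                scoring='neg_mean_squared_error')
rmse_per_fold = np.sqrt(mse_per_fold)  # same units as the response

print(f"MSE:  {mse_per_fold.mean():.3f} +/- {mse_per_fold.std():.3f}")
print(f"RMSE: {rmse_per_fold.mean():.3f}")
```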

We will use these two linear models instead of newer, more advanced models because they are fast and have low variance. This way, we can be more confident that any increase in performance comes from the feature engineering procedure and not from the model’s ability to pick up on obscure and hidden patterns.
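
To make this reasoning concrete, here is a hedged sketch of how a feature engineering step might be judged with a fixed linear model. The model stays the same, and only the features change. The quadratic synthetic data, the squared-feature transformation, and the helper `cv_mse` are all hypothetical choices made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def cv_mse(X, y, folds=5):
    """Mean cross-validated MSE of a plain linear regression (hypothetical helper)."""
    neg_mse = cross_val_score(LinearRegression(), X, y, cv=folds,
                              scoring='neg_mean_squared_error')
    return -neg_mse.mean()


# Synthetic data with a quadratic relationship (an assumption for this sketch)
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = x ** 2 + rng.normal(scale=0.5, size=200)

X_raw = x.reshape(-1, 1)                     # original feature only
X_engineered = np.column_stack([x, x ** 2])  # original feature plus its square

baseline_mse = cv_mse(X_raw, y)
engineered_mse = cv_mse(X_engineered, y)
print(baseline_mse, engineered_mse)
```

Since the estimator is identical in both runs, the drop in MSE can be attributed to the added feature rather than to extra model capacity.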
