Evaluating supervised learning algorithms

When performing predictive modeling, otherwise known as supervised learning, performance is directly tied to the model’s ability to exploit structure in the data and use that structure to make appropriate predictions. In general, we can break supervised learning down into two more specific types: classification (predicting qualitative responses) and regression (predicting quantitative responses).

When evaluating classification problems, we will directly calculate the accuracy of a logistic regression model using five-fold cross-validation:

# Example code for evaluating a classification problem
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
X = some_data_in_tabular_format  # placeholder: feature matrix
y = response_variable  # placeholder: class labels
lr = LogisticRegression()
scores = cross_val_score(lr, X, y, cv=5, scoring='accuracy')
scores
>> [.765, .67, .8, .62, .99]
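The snippet above uses placeholder names for X and y, so it will not run as written. As a minimal runnable sketch, the same evaluation can be tried on synthetic data. The make_classification dataset, its size, the random seed, and the max_iter setting are all assumptions chosen for illustration and are not part of the original example:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real tabular dataset (an assumption for illustration)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# max_iter is raised only to avoid convergence warnings; it is not tuned
lr = LogisticRegression(max_iter=1000)

# Five-fold cross-validation returns one accuracy value per fold
scores = cross_val_score(lr, X, y, cv=5, scoring='accuracy')

print(scores)         # array of five per-fold accuracies
print(scores.mean())  # average accuracy across the folds
```

Reporting the mean of the fold scores, rather than any single fold, gives a steadier number to compare before and after a feature engineering step.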

Similarly, when evaluating a regression problem, we will use the mean squared error (MSE) of a linear regression model under five-fold cross-validation:

# Example code for evaluating a regression problem
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
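X = some_data_in_tabular_format  # placeholder: feature matrix
y = response_variable  # placeholder: numeric target
lr = LinearRegression()
# scikit-learn scorers treat higher as better, so MSE is exposed as
# 'neg_mean_squared_error'; negate the result to recover positive MSE values
scores = -cross_val_score(lr, X, y, cv=5, scoring='neg_mean_squared_error')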
scores
>> [31.543, 29.5433, 32.543, 32.43, 27.5432]
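A note on the scorer name may help here. Current scikit-learn releases expose this metric as 'neg_mean_squared_error' and return negated values, because every scorer follows a "higher is better" convention. The following is a minimal runnable sketch on synthetic data. The make_regression dataset, its noise level, and the random seed are assumptions for illustration only:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real tabular dataset (an assumption for illustration)
X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

lr = LinearRegression()

# The scorer returns negated MSE, one value per fold
neg_mse = cross_val_score(lr, X, y, cv=5, scoring='neg_mean_squared_error')

# Flip the sign to get conventional, non-negative MSE values
mse = -neg_mse

print(mse)         # array of five per-fold MSE values
print(mse.mean())  # average MSE across the folds
```

With the sign flipped, lower values mean better performance, which matches the positive per-fold numbers shown in the example output above.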

We will use these two linear models instead of newer, more advanced models because they are fast and have low variance. This way, we can be more confident that any increase in performance comes from the feature engineering procedure and not from the model’s ability to pick up on obscure and hidden patterns.
