
Evaluating supervised learning algorithms

When performing predictive modeling, otherwise known as supervised learning, performance is directly tied to the model’s ability to exploit structure in the data and use that structure to make appropriate predictions. Supervised learning can be further broken down into two more specific types: classification (predicting qualitative responses) and regression (predicting quantitative responses).
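As a rough illustration of the qualitative/quantitative split, scikit-learn's `type_of_target` utility can guess which kind of response a target holds. The `task_type` helper below is our own hypothetical wrapper, not part of the library, and it is only a heuristic: integer-valued quantitative targets such as counts are reported as discrete classes, so the final decision should come from what the response actually means.

```python
from sklearn.utils.multiclass import type_of_target


def task_type(y):
    """Hypothetical helper: guess 'classification' or 'regression' from a target.

    Heuristic only: integer-valued counts are treated as discrete classes.
    """
    kind = type_of_target(y)
    if kind in ("binary", "multiclass"):
        return "classification"
    if kind == "continuous":
        return "regression"
    raise ValueError(f"unsupported target type: {kind}")


print(task_type(["spam", "ham", "ham"]))  # qualitative labels -> classification
print(task_type([2.7, 3.1, 5.9]))         # quantitative values -> regression
```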

When evaluating classification problems, we will directly calculate the accuracy of a logistic regression model using five-fold cross-validation:

# Example code for evaluating a classification problem
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
X = some_data_in_tabular_format  # placeholder: feature matrix (one row per observation)
y = response_variable  # placeholder: class labels
lr = LogisticRegression()  # logistic regression classifier
scores = cross_val_score(lr, X, y, cv=5, scoring='accuracy')  # one accuracy per fold
scores
>> [.765, .67, .8, .62, .99]
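The snippet above uses placeholders for the data. A sketch that runs as-is, assuming scikit-learn's bundled iris dataset as a stand-in for the tabular data (our choice, not the data used in the text), could look like this. The `max_iter=1000` setting is also our assumption, raised from the default of 100 to give the solver room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in data: 150 iris flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# max_iter raised from the default (100) to give the solver room to converge.
lr = LogisticRegression(max_iter=1000)

# With an integer cv and a classifier, scikit-learn uses stratified folds.
scores = cross_val_score(lr, X, y, cv=5, scoring="accuracy")
print(scores)         # one accuracy per fold
print(scores.mean())  # single summary number across the five folds
```

Because `cv=5` is an integer and the estimator is a classifier, each fold keeps roughly the same class proportions as the full dataset.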

Similarly, when evaluating a regression problem, we will use the mean squared error (MSE) of a linear regression model, again with five-fold cross-validation:

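# Example code for evaluating a regression problem
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
X = some_data_in_tabular_format  # placeholder: feature matrix
y = response_variable  # placeholder: numeric response
lr = LinearRegression()
# scikit-learn scorers treat greater as better, so MSE is exposed as
# 'neg_mean_squared_error'; negating the result gives ordinary (positive) MSE values
scores = -cross_val_score(lr, X, y, cv=5, scoring='neg_mean_squared_error')
scores
>> [31.543, 29.5433, 32.543, 32.43, 27.5432]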
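For a version that runs as-is, here is a sketch that assumes scikit-learn's bundled diabetes dataset as a stand-in for the tabular data (our choice, not the book's data). It also shows the sign flip needed because scikit-learn reports MSE through its negated scorer.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Stand-in data: 442 patients, 10 numeric features, a continuous disease-progression target.
X, y = load_diabetes(return_X_y=True)

lr = LinearRegression()

# Scorers follow "greater is better", so MSE is exposed as its negative;
# flip the sign to recover ordinary MSE values (lower is better).
mse_scores = -cross_val_score(lr, X, y, cv=5, scoring="neg_mean_squared_error")
print(mse_scores)         # one MSE per fold
print(mse_scores.mean())  # average MSE across the five folds
```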

We use these two linear models instead of newer, more advanced models because they are fast to train and have low variance. This way, we can be more confident that any increase in performance comes from the feature engineering procedure and not from the model’s ability to pick up on obscure, hidden patterns.
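One way to act on this idea, sketched under our own assumptions, is to hold both the model and the cross-validation folds fixed and vary only the features. The helper `evaluate_features` and the interaction-term transformer below are hypothetical examples chosen for illustration, and scikit-learn's diabetes dataset stands in for real data.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = load_diabetes(return_X_y=True)

# Fix the folds once so every candidate feature set is scored on identical splits.
folds = KFold(n_splits=5, shuffle=True, random_state=0)


def evaluate_features(transformer=None):
    """Mean five-fold MSE of a fixed LinearRegression, optionally after a feature transformer.

    Hypothetical helper: only the features change between calls; the model and folds do not.
    """
    steps = [transformer] if transformer is not None else []
    model = make_pipeline(*steps, LinearRegression())
    scores = cross_val_score(model, X, y, cv=folds, scoring="neg_mean_squared_error")
    return float(-scores.mean())


baseline = evaluate_features()
# Hypothetical feature-engineering step: add pairwise interaction terms.
engineered = evaluate_features(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
)
print(f"baseline MSE:   {baseline:.1f}")
print(f"engineered MSE: {engineered:.1f}")
```

Placing the transformer inside the pipeline means it is fit only on each training fold, so the comparison does not leak information from the held-out fold. Whether the engineered features help depends on the data; the sketch reports both numbers rather than assuming an improvement.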
