- Feature Engineering Made Easy
- Sinan Ozdemir Divya Susarla
- 2021-06-25 22:45:51
Evaluating supervised learning algorithms
When performing predictive modeling, otherwise known as supervised learning, performance is directly tied to the model’s ability to exploit structure in the data and use that structure to make appropriate predictions. In general, we can break supervised learning down into two more specific types: classification (predicting qualitative responses) and regression (predicting quantitative responses).
When evaluating classification problems, we will directly calculate the accuracy of a logistic regression model using five-fold cross-validation:
# Example code for evaluating a classification problem
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
X = some_data_in_tabular_format  # placeholder: feature matrix
y = response_variable  # placeholder: class labels
lr = LogisticRegression()
scores = cross_val_score(lr, X, y, cv=5, scoring='accuracy')
scores
>> [.765, .67, .8, .62, .99]
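To make the five numbers above less opaque, here is a minimal standard-library sketch of what a five-fold accuracy evaluation computes. It is not how scikit-learn is implemented: the helper names (`k_fold_indices`, `majority_class`, `cross_val_accuracy`) and the toy labels are hypothetical, a majority-class predictor stands in for the logistic regression, and the folds are plain contiguous slices, whereas `cross_val_score` uses stratified folds for classifiers by default.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_class(labels):
    """Stand-in 'model': always predict the most frequent training label."""
    return max(sorted(set(labels)), key=labels.count)


def cross_val_accuracy(y, k=5):
    """Return one accuracy score per fold, each computed on held-out labels."""
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        # "Fit" on the other k-1 folds only
        train_labels = [y[i] for i in range(len(y)) if i not in held_out]
        prediction = majority_class(train_labels)
        # Score on the fold that was held out
        correct = sum(1 for i in test_idx if y[i] == prediction)
        scores.append(correct / len(test_idx))
    return scores


# Toy labels, made up for illustration
y = [0, 1, 1, 0, 1, 1, 0, 1, 1, 1]
scores = cross_val_accuracy(y, k=5)
print(scores)                     # [0.5, 0.5, 1.0, 0.5, 1.0]
print(sum(scores) / len(scores))  # mean accuracy across the five folds
```

Each sample is held out exactly once, so every score reflects data the model did not see while fitting. The spread across folds is also a rough indication of how stable the estimate is.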
Similarly, when evaluating a regression problem, we will use the mean squared error (MSE) of a linear regression using five-fold cross-validation:
# Example code for evaluating a regression problem
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
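X = some_data_in_tabular_format  # placeholder: feature matrix
y = response_variable  # placeholder: numeric response
lr = LinearRegression()
# scikit-learn scorers follow "higher is better", so MSE is exposed as 'neg_mean_squared_error'
scores = -cross_val_score(lr, X, y, cv=5, scoring='neg_mean_squared_error')  # negate to recover MSE
scores
>> [31.543, 29.5433, 32.543, 32.43, 27.5432]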
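For reference, the MSE of a fold is the average of the squared differences between the true and predicted responses. The sketch below computes it by hand with the standard library; the function name and the toy values are made up for illustration. Current scikit-learn versions expose this metric to `cross_val_score` as `'neg_mean_squared_error'`, which returns the negated value so that a higher score is always better.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared residuals between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values, made up for illustration
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.5]

mse = mean_squared_error(y_true, y_pred)
neg_mse = -mse  # the sign convention used by scikit-learn's scorer

print(mse)      # 0.375
print(neg_mse)  # -0.375
```

Because the errors are squared, MSE is expressed in squared units of the response, and lower values indicate a better fit.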
We will use these two linear models (logistic regression for classification and linear regression for regression) rather than newer, more advanced models because of their speed and their low variance. This way, we can be more confident that any increase in performance is directly related to the feature engineering procedure and not to the model’s ability to pick up on obscure and hidden patterns.
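As a rough illustration of this idea, the sketch below holds the model fixed (a plain `LinearRegression`) and changes only the features. The synthetic quadratic data, the squared-term feature, and the helper `cv_mse` are assumptions made up for this example, not a dataset or procedure from the text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption for illustration): the response is quadratic in x
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = x ** 2 + rng.normal(scale=0.1, size=200)

X_raw = x.reshape(-1, 1)                     # baseline: the raw feature only
X_engineered = np.column_stack([x, x ** 2])  # engineered: add a squared term


def cv_mse(X, y):
    """Mean five-fold MSE of a plain linear regression on the given features."""
    neg_mse = cross_val_score(LinearRegression(), X, y, cv=5,
                              scoring='neg_mean_squared_error')
    return -neg_mse.mean()  # flip the sign to get a conventional MSE


baseline_mse = cv_mse(X_raw, y)
engineered_mse = cv_mse(X_engineered, y)
print(f"baseline MSE:   {baseline_mse:.3f}")
print(f"engineered MSE: {engineered_mse:.3f}")
```

Since the estimator and the cross-validation setup are identical in both runs, any drop in MSE here can be attributed to the added feature rather than to a more flexible model.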