- Feature Engineering Made Easy
- Sinan Ozdemir Divya Susarla
- 2021-06-25 22:45:51
Evaluating supervised learning algorithms
When performing predictive modeling, otherwise known as supervised learning, performance is directly tied to the model’s ability to exploit structure in the data and use that structure to make appropriate predictions. In general, we can further break down supervised learning into two more specific types, classification (predicting qualitative responses) and regression (predicting quantitative responses).
When evaluating classification problems, we will directly calculate the accuracy of a logistic regression model using five-fold cross-validation:
# Example code for evaluating a classification problem
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
X = some_data_in_tabular_format
y = response_variable
lr = LogisticRegression()
scores = cross_val_score(lr, X, y, cv=5, scoring='accuracy')
scores
>> [.765, .67, .8, .62, .99]
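To make the five-fold procedure concrete, the following is a minimal sketch in plain Python of what happens conceptually on each fold: hold one fold out, fit on the other four, and score on the held-out fold. The helpers `k_fold_indices` and `cross_val_accuracy` are hypothetical names used only for illustration, and a trivial majority-class predictor stands in for logistic regression. The folds here are contiguous and unshuffled, which is a simplification of what scikit-learn does for classifiers.

```python
from collections import Counter

def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(y, k=5):
    """Per-fold accuracy of a majority-class baseline (a stand-in model)."""
    folds = k_fold_indices(len(y), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # "Train" on every fold except fold i: find the most common label.
        train_labels = [y[j] for f, fold in enumerate(folds) if f != i for j in fold]
        majority = Counter(train_labels).most_common(1)[0][0]
        # Score on the held-out fold: fraction of labels equal to the prediction.
        correct = sum(1 for j in test_idx if y[j] == majority)
        scores.append(correct / len(test_idx))
    return scores

y = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]   # made-up labels
scores = cross_val_accuracy(y, k=5)
mean_accuracy = sum(scores) / len(scores)
```

Reporting the mean of the five fold scores, rather than any single fold, gives a steadier estimate of how the model behaves on unseen data.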
Similarly, when evaluating a regression problem, we will use the mean squared error (MSE) of a linear regression model using five-fold cross-validation:
# Example code for evaluating a regression problem
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
X = some_data_in_tabular_format
y = response_variable
lr = LinearRegression()
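# scikit-learn scorers treat higher as better, so MSE is exposed as
# 'neg_mean_squared_error'; negate the result to recover the MSE
scores = -cross_val_score(lr, X, y, cv=5, scoring='neg_mean_squared_error')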
scores
>> [31.543, 29.5433, 32.543, 32.43, 27.5432]
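Each value in the list above is the MSE on one held-out fold: the average squared difference between the predicted and the true responses. A minimal sketch with made-up numbers follows; `mean_squared_error` here is a local helper written for illustration, not the scikit-learn function of the same name.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared residuals; lower is better and 0 is a perfect fit."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0, 7.0]   # made-up true responses
y_pred = [2.0, 5.0, 4.0, 8.0]   # made-up predictions
mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 4 + 1) / 4 = 1.5
```

Because scikit-learn scorers assume that higher is better, its scorer for this metric is named 'neg_mean_squared_error' and returns the negated MSE.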
We will use these two linear models rather than newer, more advanced models because they are fast and have low variance. This makes us more confident that any increase in performance comes from the feature engineering procedure and not from the model’s ability to pick up on obscure, hidden patterns.
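The idea of holding the model fixed and varying only the features can be sketched as follows. This is an illustrative toy in plain Python, not the book's own procedure: the data are made up, a single-feature least-squares line stands in for the linear models, and `fit_line` and `cv_mse` are hypothetical helpers.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return a, mean_y - a * mean_x

def cv_mse(xs, ys, k=5):
    """Mean of per-fold MSEs; fold i holds out every k-th sample starting at i."""
    fold_errors = []
    for i in range(k):
        train = [j for j in range(len(xs)) if j % k != i]
        test = [j for j in range(len(xs)) if j % k == i]
        a, b = fit_line([xs[j] for j in train], [ys[j] for j in train])
        fold_errors.append(sum((ys[j] - (a * xs[j] + b)) ** 2 for j in test) / len(test))
    return sum(fold_errors) / k

raw = [float(x) for x in range(-10, 10)]   # toy raw feature
target = [3 * x ** 2 + 2 for x in raw]     # toy response, quadratic in the raw feature
engineered = [x ** 2 for x in raw]         # hypothetical engineered feature

baseline_mse = cv_mse(raw, target)         # same model, raw feature
engineered_mse = cv_mse(engineered, target)  # same model, engineered feature
```

Since the model and the folds are identical in both runs, the drop from `baseline_mse` to `engineered_mse` can be attributed to the new feature alone.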