- Learning Data Mining with Python(Second Edition)
- Robert Layton
- 2021-07-02 23:40:07
Standard pre-processing
The pre-processing we will perform for this experiment is called feature-based normalization, which we perform using scikit-learn's MinMaxScaler class. Continuing in the Jupyter Notebook used throughout this chapter, we first import this class:
from sklearn.preprocessing import MinMaxScaler
This class takes each feature and scales it to the range 0 to 1. This pre-processor replaces the minimum value with 0, the maximum with 1, and the other values somewhere in between based on a linear mapping.
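The linear mapping described above can be sketched by hand with NumPy. This is a minimal illustration rather than scikit-learn's actual implementation; the array X_demo and its values are hypothetical, and the sketch assumes no column is constant (otherwise the denominator would be zero).

```python
import numpy as np

# Hypothetical toy data: 3 samples, 2 features on very different scales.
X_demo = np.array([[1.0, 200.0],
                   [2.0, 400.0],
                   [4.0, 800.0]])

# Per-feature (column-wise) minimum and maximum.
col_min = X_demo.min(axis=0)
col_max = X_demo.max(axis=0)

# Linear mapping: the minimum goes to 0, the maximum to 1,
# and every other value lands proportionally in between.
X_scaled = (X_demo - col_min) / (col_max - col_min)
print(X_scaled)
```

Both columns end up on the same 0 to 1 scale even though the second feature started out two orders of magnitude larger than the first.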
To apply our pre-processor, we call its transform function. Transformers often need to be trained (fitted) first, in the same way that classifiers do. We can combine these two steps by calling the fit_transform function instead:
X_transformed = MinMaxScaler().fit_transform(X)
Here, X_transformed will have the same shape as X. However, each column will have a maximum of 1 and a minimum of 0.
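As a quick check of these claims, the following sketch runs the two-step route (fit, then transform) next to the combined fit_transform call. The array X_toy is a hypothetical stand-in for the chapter's X, which is not reproduced here.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the chapter's dataset X.
X_toy = np.array([[1.0, 10.0],
                  [3.0, 30.0],
                  [5.0, 20.0]])

# Two-step route: learn each column's min and max, then apply the scaling.
scaler = MinMaxScaler()
scaler.fit(X_toy)
X_two_step = scaler.transform(X_toy)

# One-step route: fit and transform in a single call.
X_one_step = MinMaxScaler().fit_transform(X_toy)

# Both routes agree, the shape is unchanged, and every column spans 0 to 1.
print(np.allclose(X_two_step, X_one_step))
print(X_one_step.shape == X_toy.shape)
print(X_one_step.min(axis=0), X_one_step.max(axis=0))
```

Keeping the fitted scaler object around (the two-step route) is useful when the same scaling must later be applied to new data, since transform reuses the minimum and maximum learned during fit.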
There are various other forms of normalization, each of which is effective for other applications and feature types:
- Ensure the sum of the values for each sample equals 1, using sklearn.preprocessing.Normalizer with norm='l1' (the default, norm='l2', instead scales each sample to unit Euclidean length)
- Force each feature to have a zero mean and a variance of 1, using sklearn.preprocessing.StandardScaler, which is a commonly used starting point for normalization
- Turn numerical features into binary features, where any value above a threshold becomes 1 and any value at or below it becomes 0, using sklearn.preprocessing.Binarizer
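The three pre-processors in this list can be tried side by side on a small array. This is a minimal sketch: X_toy and the threshold of 1.5 are hypothetical values chosen for illustration, and Normalizer is given norm="l1" because its default, "l2", scales each row to unit Euclidean length rather than to a sum of 1.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, Normalizer, StandardScaler

# Hypothetical toy data: 3 samples, 2 non-negative features.
X_toy = np.array([[1.0, 3.0],
                  [2.0, 2.0],
                  [3.0, 1.0]])

# Per-sample scaling: each row is divided by the sum of its absolute values.
X_l1 = Normalizer(norm="l1").fit_transform(X_toy)

# Per-feature scaling: each column gets zero mean and unit variance.
X_std = StandardScaler().fit_transform(X_toy)

# Thresholding: values above 1.5 become 1, all others become 0.
X_bin = Binarizer(threshold=1.5).fit_transform(X_toy)

print(X_l1.sum(axis=1))                      # row sums
print(X_std.mean(axis=0), X_std.var(axis=0))  # column means and variances
print(X_bin)
```

Note the difference in direction: Normalizer works across each sample (row), while StandardScaler and MinMaxScaler work down each feature (column).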
We will use combinations of these pre-processors in later chapters, along with other types of Transformer objects.
Pre-processing is a critical step in the data mining pipeline, and one that can mean the difference between a bad result and a great one.