
Feature scaling

Feature scaling is a very important engineering technique that is necessary even with neural networks. Numerical inputs must be scaled so that all features lie on the same scale; otherwise, the network will give more importance to features with larger numerical values.

A very simple transformation is rescaling the input between 0 and 1, also known as MinMax scaling. Another common operation is standardization, a zero-mean translation that ensures the input has a mean of 0 and a standard deviation of 1; in the scikit-learn library, it is implemented in the preprocessing.scale function:

from sklearn import preprocessing
import numpy as np

# Standardize each feature (column) to zero mean and unit variance
X_train = np.array([[-3., 1., 2.],
                    [ 2., 0., 0.],
                    [ 1., 2., 3.]])
X_scaled = preprocessing.scale(X_train)

Inspecting X_scaled in the interpreter produces the following result:

Out[2]:
array([[-1.38873015,  0.        ,  0.26726124],
       [ 0.9258201 , -1.22474487, -1.33630621],
       [ 0.46291005,  1.22474487,  1.06904497]])
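MinMax scaling, mentioned earlier, is available through scikit-learn's MinMaxScaler. Here is a minimal sketch that rescales each feature of the same X_train to the [0, 1] range:

from sklearn.preprocessing import MinMaxScaler

# Rescale each feature (column) of X_train to the [0, 1] range
min_max_scaler = MinMaxScaler()
X_minmax = min_max_scaler.fit_transform(X_train)
print(X_minmax)  # each column now spans exactly [0, 1]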

You can find many other numerical transformations already available in scikit-learn. Some other important transformations from its documentation are as follows:

  • PowerTransformer: This transformation applies a power transform to each feature in order to make the data follow a Gaussian-like distribution. It estimates the optimal transformation parameter to stabilize the variance and, at the same time, minimize skewness. By default, scikit-learn's PowerTransformer also standardizes the output, forcing the mean to 0 and the variance to 1.
  • QuantileTransformer: This transformation has an output_distribution parameter that allows us to force a Gaussian distribution on the features instead of the default uniform distribution. It introduces saturation for extreme input values (both transformers are shown in the sketch after this list).
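Both transformations follow the usual scikit-learn fit/transform API. The following minimal sketch applies them to a synthetic, skewed (log-normal) feature; the data here is purely illustrative:

import numpy as np
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

# Hypothetical skewed data: 100 samples of a log-normal feature
rng = np.random.RandomState(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(100, 1))

# PowerTransformer fits a power parameter per feature (Yeo-Johnson by
# default) and, with standardize=True (the default), also forces the
# output to zero mean and unit variance
pt = PowerTransformer()
X_power = pt.fit_transform(X)

# QuantileTransformer maps each feature through its empirical quantiles;
# output_distribution='normal' yields a Gaussian instead of the default
# uniform output, saturating extreme input values
qt = QuantileTransformer(output_distribution='normal', n_quantiles=100,
                         random_state=0)
X_quantile = qt.fit_transform(X)

print(X_power.mean(), X_power.std())  # approximately 0 and 1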