
Normalizing and standardizing the features

We normalize (or standardize) data to improve computational efficiency and to keep values within the computer's numeric limits. Doing so is also advisable when we want to explore relationships between variables in a model, because it puts variables measured on different scales on a common footing.

Tip

Computers have limits: there is an upper bound on how large an integer value can be (although, on 64-bit machines, this is, for now, no longer an issue) and a limit on the precision with which floating-point values can be stored.

Normalization rescales the observations so that all their values fall between 0 and 1 (inclusive). Standardization shifts and rescales the distribution so that the resulting values have a mean of 0 and a standard deviation of 1.

Getting ready

To execute this recipe, you will need the pandas module.

No other prerequisites are required.

How to do it…

To perform normalization and standardization, we define two helper functions (the data_standardize.py file):

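def normalize(col):
    '''
        Normalize column: rescale the values to the [0, 1] range
    '''
    # subtract the minimum, then divide by the range (max - min)
    return (col - col.min()) / (col.max() - col.min())

def standardize(col):
    '''
        Standardize column: rescale to mean 0 and standard deviation 1
    '''
    # subtract the mean, then divide by the standard deviation
    return (col - col.mean()) / col.std()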

How it works…

To normalize a set of observations, that is, to rescale every one of them to lie between 0 and 1, we subtract the minimum value from each observation and divide the result by the range of the sample. In statistics, the range is defined as the difference between the maximum and minimum values in the sample. Our normalize(...) function does exactly this: it takes a set of values, subtracts the minimum from each observation, and divides the result by the range.
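As a quick check of the description above, the following sketch applies normalize(...) to a small, made-up pandas Series. The values and the prices name are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
import pandas as pd

def normalize(col):
    # subtract the minimum, then divide by the range (max - min)
    return (col - col.min()) / (col.max() - col.min())

# hypothetical sample: minimum is 100, maximum is 300, so the range is 200
prices = pd.Series([100.0, 150.0, 200.0, 300.0])

normalized = normalize(prices)
print(normalized.tolist())  # [0.0, 0.25, 0.5, 1.0]
```

The smallest observation maps to 0, the largest to 1, and every other value lands proportionally in between.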

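Standardization works in a similar way: it subtracts the mean from each observation and divides the result by the standard deviation of the sample (the pandas .std() method computes the sample standard deviation, with ddof=1, by default). This way, the resulting sample has a mean equal to 0 and a standard deviation equal to 1. Our standardize(...) function performs these steps for us. We apply both helper functions to the price_mean column: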

csv_read['n_price_mean'] = normalize(csv_read['price_mean'])
csv_read['s_price_mean'] = standardize(csv_read['price_mean'])
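To confirm that the standardized column behaves as described, we can inspect its mean and standard deviation. The sketch below is a minimal, self-contained check: the csv_read DataFrame here is built from made-up values (an assumption, standing in for the data read earlier in the recipe), and s_mean and s_std are illustrative names.

```python
import pandas as pd

def standardize(col):
    # subtract the mean, then divide by the sample standard deviation
    return (col - col.mean()) / col.std()

# hypothetical stand-in for the DataFrame used in the recipe
csv_read = pd.DataFrame({'price_mean': [100.0, 150.0, 200.0, 300.0]})
csv_read['s_price_mean'] = standardize(csv_read['price_mean'])

s_mean = csv_read['s_price_mean'].mean()
s_std = csv_read['s_price_mean'].std()   # pandas default: ddof=1

# both checks allow for floating-point rounding error
print(abs(s_mean) < 1e-9)
print(abs(s_std - 1.0) < 1e-9)
```

Note that the standard deviation equals 1 only when it is measured the same way it was computed. Using the population formula instead (for example, .std(ddof=0)) on the standardized column gives a value slightly below 1 for a small sample.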