
How it works...

We begin by reading in our dataset and then standardizing it, as in the recipe on standardizing data (steps 1 and 2); it is necessary to standardize the data before applying PCA. We then instantiate a new PCA transformer instance and use it both to learn the transformation (fit) and to apply it to the dataset, via fit_transform (step 3). In step 4, we analyze our transformation. In particular, note that the elements of pca.explained_variance_ratio_ indicate how much of the variance is accounted for by each direction. The elements sum to 1, indicating that all the variance is accounted for if we consider the full space in which the data lives. However, by taking just the first few directions, we can account for a large portion of the variance while reducing our dimensionality. In our example, the first 40 directions account for 90% of the variance:

sum(pca.explained_variance_ratio_[0:40])

This produces the following output:

0.9068522354673663

This means that we can reduce the number of features from 78 to 40 while preserving 90% of the variance. This implies that many of the PE header features are closely correlated, which is understandable, as they were not designed to be independent.
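The steps above can be sketched end to end as follows. Since the PE header dataset itself is not shown here, a randomly generated matrix with the same shape (78 features) stands in for it; the variable names are illustrative, not the recipe's own:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical stand-in for the PE-header feature matrix: 500 samples, 78 features
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 78))

# Steps 1-2: standardize the features (PCA expects zero-mean, unit-variance inputs)
X_std = StandardScaler().fit_transform(X)

# Step 3: learn the transformation and apply it in one call
pca = PCA()
X_pca = pca.fit_transform(X_std)

# Step 4: over the full space, the explained-variance ratios sum to 1
total = pca.explained_variance_ratio_.sum()

# The cumulative sum tells us how many leading components are needed
# to preserve a given fraction (here 90%) of the variance
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components_90 = int(np.searchsorted(cumulative, 0.90)) + 1
```

On the real dataset, `n_components_90` comes out to roughly 40, as shown in the output above; on this synthetic matrix the exact number will differ, since random data has no correlated features to compress.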
