
How it works...

We begin by reading in our dataset and then standardizing it, as in the recipe on standardizing data (steps 1 and 2); standardization is necessary before applying PCA, since the technique is sensitive to feature scale. We then instantiate a new PCA transformer and use fit_transform to both learn the transformation and apply it to the dataset (step 3). In step 4, we analyze the transformation. In particular, the elements of pca.explained_variance_ratio_ indicate the fraction of the variance accounted for by each direction. They sum to 1, meaning that all of the variance is accounted for if we consider the full space in which the data lives. By taking just the first few directions, however, we can account for a large portion of the variance while limiting our dimensionality. In our example, the first 40 directions account for about 90% of the variance:

sum(pca.explained_variance_ratio_[0:40])

This produces the following output:

0.9068522354673663

This means we can reduce the number of features from 78 to 40 while preserving about 90% of the variance. It also implies that many of the PE header features are closely correlated, which is understandable, as they are not designed to be independent.
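The workflow above can be sketched end to end as follows. This is a minimal illustration on synthetic data standing in for the 78 PE-header features (the array shape and random values are placeholders, not the recipe's actual dataset); it also shows that PCA accepts a float for n_components to keep just enough components for a target fraction of the variance:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Placeholder data: 500 samples with 78 features, mimicking the
# dimensionality of the PE-header dataset in the recipe.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 78))

# Steps 1-2: standardize the features before applying PCA.
X_std = StandardScaler().fit_transform(X)

# Step 3: learn the transformation and apply it in one call.
pca = PCA()
X_pca = pca.fit_transform(X_std)

# Step 4: inspect how much variance the leading directions capture.
ratio = pca.explained_variance_ratio_
print(ratio[:40].sum())  # variance captured by the first 40 directions

# Alternatively, let PCA choose the number of components needed
# to preserve 90% of the variance directly:
pca_90 = PCA(n_components=0.90)
X_reduced = pca_90.fit_transform(X_std)
print(X_reduced.shape[1])  # components required for 90% variance
```

Because the synthetic features here are independent by construction, far more than 40 components are needed to reach 90%; on the correlated PE-header data, 40 suffice.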
