
How it works...

We begin by reading in our dataset and then standardizing it, as in the recipe on standardizing data (steps 1 and 2); it is necessary to standardize the data before applying PCA. We then instantiate a new PCA transformer instance and use it both to learn the transformation (fit) and to apply it to the dataset, via fit_transform (step 3). In step 4, we analyze the transformation. In particular, the elements of pca.explained_variance_ratio_ indicate how much of the variance is accounted for in each direction. Their sum is 1, meaning that all of the variance is accounted for if we keep the full space in which the data lives. However, by taking just the first few directions, we can account for a large portion of the variance while limiting our dimensionality. In our example, the first 40 directions account for 90% of the variance (a minimal code sketch of these steps follows the output below):

sum(pca.explained_variance_ratio_[0:40])

This produces the following output:

0.9068522354673663
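
To make steps 1 to 3 concrete, here is a minimal end-to-end sketch. It assumes the 78 PE header features are stored in a CSV file; the file name pe_header_features.csv is a placeholder, not the recipe's actual dataset path:

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Step 1: read in the dataset of PE header features (placeholder file name)
X = pd.read_csv("pe_header_features.csv")

# Step 2: standardize each feature to zero mean and unit variance,
# since PCA is sensitive to the scale of its inputs
X_standardized = StandardScaler().fit_transform(X)

# Step 3: learn the PCA transformation and apply it in a single call
pca = PCA()
X_pca = pca.fit_transform(X_standardized)

# Step 4: each entry is the fraction of variance along one direction;
# over the full space, the entries sum to 1
print(pca.explained_variance_ratio_)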

This means that we can reduce the number of features from 78 to 40 while preserving 90% of the variance. The implication is that many of the PE header features are closely correlated, which is understandable, since they were not designed to be independent.
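
As a follow-up sketch, the reduction itself can be performed directly by requesting 40 components from PCA (reusing the hypothetical X_standardized array from the sketch above):

from sklearn.decomposition import PCA

# Keep only the first 40 principal directions
pca_40 = PCA(n_components=40)
X_reduced = pca_40.fit_transform(X_standardized)

print(X_reduced.shape)                        # (n_samples, 40)
print(sum(pca_40.explained_variance_ratio_))  # roughly 0.90 of the variance

Requesting n_components up front yields the same result as fitting the full transform and keeping the first 40 columns, but is more convenient when the reduced data feeds into a downstream model.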
