
Summarizing large data using principal component analysis

Suppose that you would like to build a predictor for an individual's expected net worth at age 45. There are a huge number of variables to consider: IQ, current net worth, marital status, height, geographical location, health, education, career status, age, and many others you might come up with, such as number of LinkedIn connections or SAT scores.

Having so many features causes several problems. First, the sheer amount of data incurs high storage costs and long computation times for your algorithm. Second, the larger the feature space, the more data the model needs in order to be accurate; without enough of it, it becomes harder to distinguish the signal from the noise. For these reasons, when dealing with high-dimensional data such as this, we often employ dimensionality reduction techniques, such as principal component analysis (PCA). More information on the topic can be found at https://en.wikipedia.org/wiki/Principal_component_analysis.

PCA takes our features and returns a smaller number of new features, formed from the original ones, with maximal explanatory power; that is, they capture as much of the variance in the data as possible. In addition, since the new features are linear combinations of the old features, the original variables are no longer directly readable, which helps to anonymize the data. This is very handy when working with financial information, for example.
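To make the idea of "new features as linear combinations of the old ones" concrete, here is a minimal sketch of PCA using NumPy's singular value decomposition. The `pca` helper, the synthetic dataset, and the choice of two components are assumptions made for illustration; they do not come from the text. The sketch only centers the data; with real features measured in different units (height, net worth, and so on), you would normally standardize each column first.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Hypothetical helper for illustration, not a library API.
    """
    X = np.asarray(X, dtype=float)
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance along each direction, and the fraction kept by the top ones.
    explained_variance = S**2 / (X.shape[0] - 1)
    explained_ratio = explained_variance[:n_components] / explained_variance.sum()
    # Each new feature is a linear combination of the original features.
    transformed = X_centered @ components.T
    return transformed, components, explained_ratio

# Synthetic data (an assumption): 200 samples with 5 observed features
# that are driven by only 2 underlying factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

Z, components, ratio = pca(X, n_components=2)
print("reduced shape:", Z.shape)
print("fraction of variance retained:", ratio.sum())
```

Each row of `components` holds the weights applied to the original features to produce one new feature, so the reduced columns of `Z` no longer correspond to any single original variable.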
