Machine Learning for Cybersecurity Cookbook
Emmanuel Tsukerman
Summarizing large data using principal component analysis
Suppose that you would like to build a predictor for an individual's expected net fiscal worth at age 45. There are a huge number of variables to consider: IQ, current fiscal worth, marital status, height, geographical location, health, education, career status, age, and many others you might come up with, such as number of LinkedIn connections or SAT scores.
The trouble with having so many features is severalfold. First, the sheer volume of data incurs high storage costs and long computation times for your algorithm. Second, with a large feature space, a large amount of data is needed for the model to be accurate; that is to say, it becomes harder to distinguish the signal from the noise. For these reasons, when dealing with high-dimensional data such as this, we often employ dimensionality reduction techniques, such as PCA. More information on the topic can be found at https://en.wikipedia.org/wiki/Principal_component_analysis.
PCA allows us to take our features and return a smaller number of new features, formed from our original ones, with maximal explanatory power. In addition, since the new features are linear combinations of the old ones, PCA lets us anonymize our data, which is very handy when working with financial information, for example.
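As a concrete illustration, here is a minimal sketch of this kind of dimensionality reduction using scikit-learn's PCA. The synthetic 40-feature dataset and the choice of 10 components are assumptions made for the example, not values from the recipe itself:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Simulate 1,000 individuals, each described by 40 numeric features
# (IQ, current fiscal worth, height, and so on) -- illustrative data only.
rng = np.random.RandomState(42)
X = rng.normal(size=(1000, 40))

# Standardize first: PCA is sensitive to the scale of each feature.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the 10 components that capture the most variance.
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                      # (1000, 10)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```

Note that each column of X_reduced is a linear combination of all 40 original columns, so no original value can be read off directly, which is the anonymization property mentioned above. If you would rather specify the variance to keep than the number of components, scikit-learn accepts a fraction, for example PCA(n_components=0.95).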