- Feature Engineering Made Easy
- Sinan Ozdemir Divya Susarla
- 2021-06-25 22:45:52
Feature selection – say no to bad attributes
By this chapter, we should be comfortable working with new datasets: we know how to understand and clean the data in front of us. Once we can work with the data we are given, we can start making bigger decisions, such as at what point a feature is really just an attribute. Recall that the distinction between feature and attribute comes down to one question: which columns are not helping my ML pipeline, and are therefore hurting it and should be removed? This chapter focuses on techniques for deciding which attributes to remove from our dataset. We will explore several statistical and iterative processes that aid in this decision.
Among these processes are:
- Correlation coefficients
- Identifying and removing multicollinearity
- Chi-squared tests
- ANOVA tests
- Interpretation of p-values
- Iterative feature selection
- Using machine learning to measure entropy and information gain
Each of these procedures will suggest features to remove, and each will give different reasons for doing so. Ultimately, it is up to us, the data scientists, to make the final call on which features remain and contribute to our machine learning algorithms.
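To make the first item on the list concrete, here is a minimal sketch of correlation-based filtering in plain Python. The column names, the toy values, the helper names `pearson` and `select_by_correlation`, and the 0.2 threshold are all assumptions made for illustration, not values taken from the chapter. The function only suggests which columns to keep; the final decision stays with us.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no linear relationship to measure.
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_by_correlation(columns, target, threshold=0.2):
    """Suggest columns whose absolute correlation with the target
    is at least `threshold` (a hypothetical cutoff)."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores


# Toy data, invented for this example.
columns = {
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "shoe_size": [8, 10, 9, 9, 10, 8],
    "constant": [1, 1, 1, 1, 1, 1],
}
target = [52, 58, 61, 70, 74, 79]

kept, scores = select_by_correlation(columns, target)
for name, r in scores.items():
    print(f"{name}: r = {r:.3f}")
print("Suggested features:", kept)
```

With this toy data, only `hours_studied` clears the threshold. A correlation filter like this looks at one column at a time, so it cannot detect multicollinearity or interactions between columns, which is one reason the other techniques on the list are worth having.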