- Machine Learning in Java
- AshishSingh Bhatia, Bostjan Kaluza
- 2021-06-10 19:29:56
Filling missing values
Machine learning algorithms generally do not work well with missing values. Rare exceptions include decision trees, the Naive Bayes classifier, and some rule-based learners. It is very important to understand why a value is missing. A value can be missing for many reasons, such as random error, systematic error, or sensor noise. Once we identify the reason, there are multiple ways to deal with the missing values, as shown in the following list:
- Remove the instance: If there is enough data, and only a couple of non-relevant instances have some missing values, then it is safe to remove these instances.
- Remove the attribute: Removing an attribute makes sense when most of its values are missing, its values are constant, or it is strongly correlated with another attribute.
- Assign a special value (N/A): Sometimes a value is missing for valid reasons: the value is out of scope, the discrete attribute value is not defined, or the value cannot be obtained or measured. For example, if a person never rates a movie, their rating of that movie does not exist.
- Take the average attribute value: If we have a limited number of instances, we might not be able to afford to remove instances or attributes. In that case, we can estimate the missing values by assigning the average attribute value.
- Predict the value from other attributes: Estimate the missing value from the values of the other attributes; if the attribute possesses time dependencies, predict it from previous entries.
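Several of the strategies in the list above can be sketched in plain Java. This is a minimal illustration, not the book's implementation: it assumes the dataset is a rectangular numeric matrix (rows are instances, columns are attributes) and that a missing value is encoded as `Double.NaN`. The class `MissingValues` and its method names are hypothetical. It covers removing incomplete instances, replacing missing values with the attribute average, and the simplest form of predicting from previous entries, which carries the last observed value forward in time-ordered data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper illustrating three strategies for missing values.
 * Assumption: rows are instances, columns are attributes, the matrix is
 * rectangular, and a missing value is encoded as Double.NaN.
 */
public class MissingValues {

    /** Deep-copies the matrix so the input is never modified. */
    private static double[][] copy(double[][] data) {
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            out[i] = data[i].clone();
        }
        return out;
    }

    /** Remove the instance: keep only rows without any missing value. */
    public static double[][] removeIncompleteInstances(double[][] data) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : data) {
            boolean complete = true;
            for (double v : row) {
                if (Double.isNaN(v)) {
                    complete = false;
                    break;
                }
            }
            if (complete) {
                kept.add(row.clone());
            }
        }
        return kept.toArray(new double[0][]);
    }

    /** Take the average: replace NaNs in each column with that column's observed mean. */
    public static double[][] imputeMean(double[][] data) {
        double[][] out = copy(data);
        int cols = data.length == 0 ? 0 : data[0].length;
        for (int j = 0; j < cols; j++) {
            double sum = 0.0;
            int count = 0;
            for (double[] row : data) {
                if (!Double.isNaN(row[j])) {
                    sum += row[j];
                    count++;
                }
            }
            if (count == 0) {
                continue; // every value missing: leave NaN, the attribute is a removal candidate
            }
            double mean = sum / count;
            for (double[] row : out) {
                if (Double.isNaN(row[j])) {
                    row[j] = mean;
                }
            }
        }
        return out;
    }

    /** Predict from previous entries: for time-ordered rows, carry the last observed value forward. */
    public static double[][] carryForward(double[][] data) {
        double[][] out = copy(data);
        int cols = data.length == 0 ? 0 : data[0].length;
        for (int j = 0; j < cols; j++) {
            double last = Double.NaN; // leading gaps stay NaN: there is no earlier entry
            for (double[] row : out) {
                if (Double.isNaN(row[j])) {
                    row[j] = last;
                } else {
                    last = row[j];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double na = Double.NaN;
        double[][] data = {{1.0, 10.0}, {na, 20.0}, {3.0, na}};
        System.out.println(Arrays.deepToString(removeIncompleteInstances(data))); // [[1.0, 10.0]]
        System.out.println(Arrays.deepToString(imputeMean(data)));   // [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
        System.out.println(Arrays.deepToString(carryForward(data))); // [[1.0, 10.0], [1.0, 20.0], [3.0, 20.0]]
    }
}
```

Each method returns a new matrix and leaves the input untouched. Mean imputation skips a column that is entirely missing, since such an attribute is a better candidate for removal. When working with Weka datasets, the `weka.filters.unsupervised.attribute.ReplaceMissingValues` filter performs a comparable replacement (means for numeric attributes, modes for nominal ones).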
As we have seen, a value can be missing for many reasons; hence, it is important to understand why it is missing, absent, or corrupted.