Filling missing values

Most machine learning algorithms do not handle missing values well. The exceptions include decision trees, the Naive Bayes classifier, and some rule-based learners. Before choosing a remedy, it is important to understand why a value is missing. Common causes include random error, systematic error, and sensor noise. Once the cause is identified, there are several ways to deal with the missing values:

  • Remove the instance: If there is enough data, and only a couple of non-relevant instances have some missing values, then it is safe to remove these instances.
  • Remove the attribute: Removing an attribute makes sense when most of its values are missing, its values are constant, or it is strongly correlated with another attribute.
  • Assign a special value (N/A): Sometimes a value is missing for a valid reason: the value is out of scope, the discrete attribute value is not defined, or the value cannot be obtained or measured. For example, if a person has never rated a movie, their rating for that movie does not exist.
  • Take the average attribute value: If we have a limited number of instances, we might not be able to afford removing instances or attributes. In that case, we can estimate the missing values by assigning the average attribute value.
  • Predict the value from other attributes: Estimate the missing value from the values of the other attributes, or from previous entries if the attribute has time dependencies.
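
Three of the strategies above (removing instances, assigning the average attribute value, and predicting from previous entries) can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: it assumes missing entries are encoded as None, and the function names drop_incomplete, fill_with_mean, and fill_forward are hypothetical helpers introduced only for this example.

```python
from statistics import mean

MISSING = None  # assumption: missing entries are encoded as None


def drop_incomplete(rows):
    """Remove every instance (row) that has at least one missing value."""
    return [row for row in rows if MISSING not in row]


def fill_with_mean(column):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in column if v is not MISSING]
    if not observed:
        raise ValueError("cannot compute a mean: every value is missing")
    avg = mean(observed)
    return [avg if v is MISSING else v for v in column]


def fill_forward(series):
    """Replace each missing entry with the most recent observed one.

    Leading missing entries stay missing, since nothing precedes them.
    """
    filled, last = [], MISSING
    for v in series:
        if v is not MISSING:
            last = v
        filled.append(last)
    return filled


rows = [[1.0, 2.0], [None, 3.0], [4.0, 5.0]]
complete_rows = drop_incomplete(rows)        # the middle row is dropped

ratings = [1.0, None, 3.0]
mean_filled = fill_with_mean(ratings)        # the gap becomes 2.0

temperatures = [None, 20.5, None, None, 21.0]
forward_filled = fill_forward(temperatures)  # gaps repeat the last reading
```

Carrying the last observation forward is only the simplest way to use a time dependency; a model fitted on the other attributes or on earlier entries could take its place. In practice, a data-processing library would normally provide these operations.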

Because a value can be missing, absent, or corrupted for many reasons, the right strategy depends on understanding why it is missing in the first place.
