- Python Machine Learning By Example
- Yuxi (Hayden) Liu
- 293 words
- 2021-07-02 22:57:18
Missing values
Values for certain features are often missing, and this can happen for various reasons. It can be inconvenient, expensive, or even impossible to always have a value. Maybe we were not able to measure a certain quantity in the past because we didn't have the right equipment, or we just didn't know that the feature was relevant. Either way, we are stuck with the missing values from the past. Sometimes missing values are easy to spot: we can discover them just by scanning the data, or by counting the number of values we have for a feature and comparing it to the number we expect based on the number of rows. Certain systems encode missing values with a placeholder such as 999999. This makes sense only if the valid values are much smaller than 999999. If you are lucky, whoever created the data will have provided information about the features in the form of a data dictionary or metadata.
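The counting approach described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the rows are dictionaries, the feature names and readings are made up, and the `is_missing` and `count_missing` helpers are hypothetical names. The sentinel 999999 is taken from the text.

```python
import math

# Placeholder that the (hypothetical) source system writes for a missing value.
SENTINEL = 999999

def is_missing(value, sentinel=SENTINEL):
    """Return True for None, NaN, or the sentinel code."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == sentinel

def count_missing(rows):
    """Count missing values per feature in a list of dict rows."""
    counts = {}
    for row in rows:
        for feature, value in row.items():
            counts.setdefault(feature, 0)
            if is_missing(value):
                counts[feature] += 1
    return counts

# Made-up sample data for illustration only.
rows = [
    {"temperature": 21.5, "humidity": 40},
    {"temperature": 999999, "humidity": 42},
    {"temperature": None, "humidity": float("nan")},
    {"temperature": 19.0, "humidity": 45},
]

missing = count_missing(rows)
for feature, n_missing in missing.items():
    # Compare the values we have with the number we expect from the row count.
    print(feature, "valid:", len(rows) - n_missing, "expected:", len(rows))
```

Note that this sketch only inspects keys that are present in a row; a feature that is absent from a row altogether would need an extra check against the expected list of features.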
Once we know that values are missing, the question arises of how to deal with them. The simplest answer is to just ignore them. However, some algorithms can't deal with missing values, and the program will just refuse to continue. In other circumstances, ignoring missing values will lead to inaccurate results. The second solution is to substitute missing values with a fixed value; this is called imputing.
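One common reading of "ignoring" missing values is to drop every row that contains one. The sketch below assumes missing values are stored as `None`; the `drop_incomplete` helper and the sample records are hypothetical. It also shows the cost of this approach: rows are lost even when only one of their features is missing.

```python
def drop_incomplete(rows):
    """Keep only the rows in which every feature has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]

# Made-up sample data for illustration only.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 29, "income": None},
    {"age": 41, "income": 61000},
]

complete = drop_incomplete(rows)
print("rows before:", len(rows), "rows after:", len(complete))
```

Here half of the rows disappear, including two valid values, which is one way that ignoring missing values can distort the results.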
We can impute the arithmetic mean, median, or mode of the valid values of a certain feature. Ideally, we will have a relation between features or within a variable that is somewhat reliable. For instance, we may know the seasonal averages of temperature for a certain location and be able to impute guesses for missing temperature values given a date.
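Both ideas can be sketched with the standard library alone. The first helper fills gaps with the mean, median, or mode of the valid values; the second fills a missing temperature with a seasonal average looked up from the date. Everything here is an assumption made for illustration: the function names, the `None` encoding for missing values, the sample readings, the seasonal averages, and the mapping of months to Northern Hemisphere meteorological seasons.

```python
import statistics
from datetime import date

def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the valid values."""
    valid = [v for v in values if v is not None]
    functions = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }
    fill = functions[strategy](valid)
    return [fill if v is None else v for v in values]

# Hypothetical seasonal averages for one location, in degrees Celsius.
SEASONAL_MEAN = {"winter": 2.0, "spring": 11.0, "summer": 22.0, "autumn": 12.0}

def season_of(month):
    """Map a month number (1-12) to a season; December to February is winter."""
    return ("winter", "spring", "summer", "autumn")[(month % 12) // 3]

def impute_by_season(readings):
    """Fill missing temperatures in (date, temperature) pairs from SEASONAL_MEAN."""
    return [
        (day, SEASONAL_MEAN[season_of(day.month)] if temp is None else temp)
        for day, temp in readings
    ]

# Made-up sample data for illustration only.
temps = [10, None, 14, 10, None, 18]
mean_filled = impute(temps, "mean")
median_filled = impute(temps, "median")
mode_filled = impute(temps, "mode")

readings = [(date(2021, 1, 15), None), (date(2021, 7, 2), 24.5), (date(2021, 10, 9), None)]
season_filled = impute_by_season(readings)
print(mean_filled, median_filled, mode_filled, season_filled, sep="\n")
```

The seasonal version uses information that a single overall mean throws away, which is why a reliable relation between a feature and something else we know (here, the date) tends to give better guesses.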