Missing values

Quite often, values are missing for certain features. This can happen for various reasons: it can be inconvenient, expensive, or even impossible to always record a value. Maybe we weren't able to measure a certain quantity in the past because we didn't have the right equipment, or we simply didn't know that the feature was relevant. Either way, we can't go back and collect those values, so we're stuck with the gaps.

Sometimes it's easy to figure out that values are missing. We can discover this just by scanning the data, or by counting the number of values we have for a feature and comparing that count to the number we expect based on the number of rows. Certain systems encode missing values with placeholder values such as 999,999 or -1. This makes sense only if the valid values can never equal the placeholder, for example, when they are all much smaller than 999,999 or can never be negative. If you're lucky, whoever created the data will have provided information about the features in the form of a data dictionary or metadata.
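
As a rough illustration of the counting approach described above, the following sketch uses only the Python standard library. The feature values, the names `SENTINELS`, `count_missing`, and `mask_sentinels`, and the choice of -1 and 999999 as placeholder codes are all assumptions made for this example; in practice the codes should come from the data dictionary.

```python
# Hypothetical placeholder codes; real ones should come from the data dictionary.
SENTINELS = {-1, 999999}


def count_missing(values, sentinels=SENTINELS):
    """Count entries that are None or equal to one of the placeholder codes."""
    return sum(1 for v in values if v is None or v in sentinels)


def mask_sentinels(values, sentinels=SENTINELS):
    """Replace placeholder codes with None so all missing entries look alike."""
    return [None if v is None or v in sentinels else v for v in values]


# Made-up feature column with one row per sample.
ages = [34, -1, 27, None, 999999, 45]

n_missing = count_missing(ages)
cleaned = mask_sentinels(ages)

print(f"{n_missing} of {len(ages)} values are missing")
print(cleaned)
```

Converting every placeholder to a single marker (here `None`) early on means later steps only have to check for one kind of missing value. Note that this sketch does not treat floating-point NaN as missing; that case would need an extra check.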

Once we know that values are missing, the question arises of how to deal with them. The simplest answer is to just ignore them. However, some algorithms can't deal with missing values, and the program will simply refuse to continue. In other circumstances, ignoring missing values will lead to inaccurate results. The second solution is to substitute a fixed value for each missing value; this is called imputing. We can impute the arithmetic mean, median, or mode of the valid values of a given feature. Ideally, we can exploit a reasonably reliable relation between features or within a variable. For instance, we may know the seasonal averages of temperature for a certain location and be able to impute guesses for missing temperature values given a date.

We'll talk about dealing with missing data in detail in Chapter 10, Machine Learning Best Practices. Likewise, the techniques in the following sections are discussed and applied in later chapters, so don't worry if they feel unfamiliar for now.
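
To make mean, median, and mode imputation concrete, here is a minimal sketch using Python's built-in `statistics` module. The `impute` helper and the temperature readings are hypothetical and exist only for illustration, and missing entries are assumed to be marked with `None`.

```python
from statistics import mean, median, mode


def impute(values, strategy="mean"):
    """Fill None entries with a summary statistic of the valid values."""
    funcs = {"mean": mean, "median": median, "mode": mode}
    valid = [v for v in values if v is not None]
    fill = funcs[strategy](valid)  # computed from the observed values only
    return [fill if v is None else v for v in values]


# Made-up temperature readings with two missing entries.
temps = [10, None, 14, 10, None, 18]

mean_filled = impute(temps, "mean")
median_filled = impute(temps, "median")
mode_filled = impute(temps, "mode")

print(mean_filled)
print(median_filled)
print(mode_filled)
```

The three strategies can give noticeably different fill values on the same data, so the choice is worth making deliberately: the median is less sensitive to outliers than the mean, and the mode is the usual option for categorical features.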
