Imputation of missing data

When dealing with imperfect or incomplete datasets, a missing entry adds no value to the model in itself, but the other elements of its row may still be useful. This is especially true when the dataset has a high percentage of incomplete values, so that rows cannot simply be discarded.

The main question in this process is "how do you interpret a missing value?" There are many ways, and they usually depend on the problem itself.

A very naive approach is to set the value to zero, which assumes that the mean of the data distribution is 0. An improvement is to relate the missing value to its surrounding content, assigning the average of the whole column or of an interval of n elements in the same column. Another option is to use the column's median or most frequent value.
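
The strategies described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `impute` and `impute_local_mean`, the use of `None` to mark a missing value, and the sample `ages` column are all assumptions made for this example.

```python
from statistics import mean, median, mode


def impute(column, strategy="mean"):
    """Return a copy of `column` with every None replaced by one fill value.

    Assumes the column contains at least one observed (non-None) value
    for every strategy other than "zero".
    """
    observed = [x for x in column if x is not None]
    if strategy == "zero":
        fill = 0
    elif strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "most_frequent":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if x is None else x for x in column]


def impute_local_mean(column, n=1):
    """Replace each None with the mean of the observed values located
    at most `n` positions before or after it in the same column."""
    result = list(column)
    for i, x in enumerate(column):
        if x is None:
            window = column[max(0, i - n): i + n + 1]
            neighbors = [v for v in window if v is not None]
            # Leave the gap in place if the window holds no observed value.
            if neighbors:
                result[i] = mean(neighbors)
    return result


# Hypothetical column with two missing entries and one outlier (100).
ages = [20, None, 30, 30, None, 100]

filled_zero = impute(ages, "zero")
filled_mean = impute(ages, "mean")
filled_median = impute(ages, "median")
filled_mode = impute(ages, "most_frequent")
filled_local = impute_local_mean(ages, n=1)
```

In this sample, the outlier pulls the column mean well above the median, so the choice of strategy visibly changes the imputed values. The local variant fills each gap from its immediate neighbors only, which may suit ordered data such as time series better than a single column-wide value.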

Additionally, there are more advanced techniques, such as robust methods and even k-nearest neighbors, that we won't cover in this book.