
Data preprocessing

The useful information in the data is usually referred to as the signal. The pieces of data that represent errors of different kinds, along with irrelevant data, are known as noise. Errors can enter the data during measurement or information transmission, or through human mistakes. The goal of data cleansing procedures is to increase the signal-to-noise ratio. During this stage, you will usually transform all data to one format, delete entries with missing values, and check suspicious outliers (they can be either noise or signal). It is widely believed among ML engineers that the data preprocessing stage usually consumes 90% of the time allocated for the ML project. Then, algorithm tweaking consumes another 90% of the time. This statement is only partially a joke (about 10% of it). In Chapter 13, Best Practices, we are going to discuss common problems with the data and how to fix them.
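The three cleansing steps mentioned above (converting to one format, dropping entries with missing values, and checking outliers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method prescribed by this book: the function names `clean_records` and `flag_outliers` are hypothetical, the input is assumed to be a flat list of numeric readings, and the outlier check uses the modified z-score based on the median absolute deviation, which is one common choice among many. Outliers are only flagged, not deleted, because they may turn out to be signal.

```python
from statistics import median


def clean_records(raw_records):
    """Convert every entry to float and drop missing or unparseable ones."""
    cleaned = []
    for value in raw_records:
        if value is None:
            continue  # missing value
        try:
            number = float(str(value).strip())  # unify the format
        except ValueError:
            continue  # unparseable entry such as "n/a" or ""
        if number != number:
            continue  # NaN is also treated as a missing value
        cleaned.append(number)
    return cleaned


def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    The score is 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. The threshold of 3.5 is a conventional default
    and is an assumption here, not a value taken from the text.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical raw readings in mixed formats, with gaps and one odd value.
raw = ["1.0", " 2.5", None, "n/a", 3, "2.0", 250.0, ""]
values = clean_records(raw)
suspicious = flag_outliers(values)
print(values)
print(suspicious)
```

The flagged values are returned separately so that they can be inspected by hand before deciding whether they are measurement errors or genuinely interesting observations.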
