Cleaning and preparing data

Feature selection is not the only consideration when preprocessing your data. You may need to take several other steps to prepare the data for the algorithm that will ultimately analyze it. Measurement errors may create significant outliers. Instrumentation noise may need to be smoothed out. Some features may have missing values. Each of these issues can be either ignored or addressed, depending, as always, on the context, the data, and the algorithm involved.
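As a rough illustration of two of these issues, the following Python sketch fills in missing values and smooths noise. It assumes missing values are represented as `None`, that the median is an acceptable fill value, and that a centered moving average is an acceptable smoother; the function names `fill_missing` and `moving_average` and the sample readings are made up for this example, and other strategies may suit your data better.

```python
from statistics import median

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def moving_average(values, window=3):
    """Smooth with a centered window that shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical sensor readings: one missing value and one large spike.
readings = [1.0, None, 3.0, 100.0, 5.0]
filled = fill_missing(readings)
smoothed = moving_average(filled)
```

Note that smoothing only dampens the spike at 100.0 and spreads it across neighboring points. Whether to remove, clip, or keep such an outlier is a separate decision that depends on the context.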

Additionally, the algorithm you use may require the data to be normalized to some range of values. Or your data may be in a format that the algorithm cannot use directly. This is often the case with neural networks, which expect a vector of values as input, while what you have may be JSON objects that come from a database. Sometimes you need to analyze only a specific subset of data from a larger source. If you're working with images, you may need to resize, scale, pad, or crop them, or convert them to grayscale.
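The sketch below shows one possible way to handle the first two cases in Python: converting JSON objects into numeric vectors and then rescaling each feature to the range 0 to 1 (min-max normalization). The field names `age` and `income`, the sample records, and the helper names `to_vector` and `min_max_normalize` are assumptions made for this example. Your algorithm may call for a different range or a different scaling method.

```python
import json

def to_vector(record, keys):
    """Turn a dict into a list of floats, one per key, in a fixed order."""
    return [float(record[k]) for k in keys]

def min_max_normalize(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Hypothetical records, as they might arrive from a database as JSON.
raw = ('[{"age": 20, "income": 30000}, '
       '{"age": 40, "income": 50000}, '
       '{"age": 30, "income": 40000}]')
records = json.loads(raw)

# A fixed key order keeps every vector's features in the same positions.
vectors = [to_vector(r, ["age", "income"]) for r in records]
normalized = min_max_normalize(vectors)
```

Normalizing per column keeps a large-valued feature such as income from dominating a small-valued one such as age when the algorithm compares vectors.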

These tasks all fall into the realm of data preprocessing. Let's take a look at some specific scenarios and discuss possible approaches for each.
