Cleaning and preparing data

Feature selection is not the only consideration when preprocessing your data. You may need to do many other things to prepare your data for the algorithm that will ultimately analyze it. Measurement errors may create significant outliers. Instrumentation noise may need to be smoothed out. Some features may have missing values. Each of these issues can be either ignored or addressed, depending, as always, on the context, the data, and the algorithm involved.
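As a rough illustration of the three issues just mentioned, the following sketch fills missing values, clamps outliers, and smooths noise on a small list of readings. It is only one possible approach under stated assumptions: the helper names (`fill_missing`, `clip_outliers`, `moving_average`), the sample readings, and the clipping bounds are all hypothetical. Whether median filling or clipping is appropriate depends on your data and algorithm.

```python
from statistics import median


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, lower, upper):
    """Clamp each value into the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def moving_average(values, window=3):
    """Smooth with a trailing moving average; early points use a shorter window."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical sensor readings: one missing value and one obvious outlier.
readings = [10.0, None, 11.0, 250.0, 12.0]

filled = fill_missing(readings)
clipped = clip_outliers(filled, lower=0.0, upper=50.0)  # bounds are assumed
smoothed = moving_average(clipped, window=2)
```

The order of the steps matters here: the missing value is filled before clipping, so the outlier still influences the median. Because the median is robust to a single extreme value, the effect is small in this example, but a mean-based fill would be skewed much more.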

Additionally, the algorithm you use may require the data to be normalized to some range of values. Or your data may be in a format that the algorithm cannot use directly, as is often the case with neural networks, which expect a vector of values when what you have is JSON objects from a database. Sometimes you need to analyze only a specific subset of data from a larger source. If you're working with images, you may need to resize, scale, pad, or crop them, or convert them to grayscale.
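To make the format-conversion and normalization steps concrete, here is a minimal sketch that turns JSON records into numeric vectors and then rescales each feature to the range 0 to 1 (min-max normalization). The field names `age` and `income`, the sample records, and the helper names `to_vector` and `min_max_normalize` are assumptions made for illustration. A real pipeline would also need to handle missing or non-numeric fields.

```python
import json


def to_vector(record, feature_names):
    """Pick the named fields from a dict, in a fixed order, as floats."""
    return [float(record[name]) for name in feature_names]


def min_max_normalize(vectors):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*vectors))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    normalized = []
    for row in vectors:
        normalized.append([
            (value - low) / (high - low) if high != low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical JSON payload, as might be returned by a database query.
raw = (
    '[{"age": 20, "income": 30000},'
    ' {"age": 40, "income": 50000},'
    ' {"age": 30, "income": 40000}]'
)

records = json.loads(raw)
vectors = [to_vector(record, ["age", "income"]) for record in records]
scaled = min_max_normalize(vectors)
```

Fixing the feature order in one list (`["age", "income"]` here) keeps every vector aligned, which matters because JSON objects carry no guaranteed field order from one record to the next.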

These tasks all fall into the realm of data preprocessing. Let's take a look at some specific scenarios and discuss possible approaches for each.
