
Processing data

The processing (or transformation) of data is where the data scientist's programming skills come into play (although a data scientist will often perform some sort of processing in other steps as well, such as collecting, visualizing, or learning).

Keep in mind that processing in data science has many aspects. The most common are formatting (and reformatting), which involves activities such as mechanically setting data types, aggregating values, and reordering or dropping columns; cleansing (or addressing the quality of the data), which resolves such things as default or missing values and incomplete or inapposite values; and profiling, which adds context to the data by creating a statistical understanding of it.
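As a rough illustration of these three aspects, the following minimal Python sketch formats, cleanses, and profiles a handful of records using only the standard library. The records, the column names (name, age, city), and the choice to fill missing ages with the median are all assumptions made for this example, not a prescribed method.

```python
from statistics import mean, median

# Hypothetical raw records, as they might arrive from a text export.
raw_rows = [
    {"name": "Ann", "age": "34", "city": "Leeds"},
    {"name": "Bob", "age": "", "city": "York"},
    {"name": "Cy", "age": "41", "city": "Leeds"},
    {"name": "Dee", "age": "n/a", "city": "York"},
    {"name": "Eve", "age": "29", "city": "Hull"},
]


def format_row(row):
    """Formatting: set the age data type and drop the unused city column."""
    age_text = row["age"].strip()
    # Anything that is not a plain whole number is treated as missing.
    age = int(age_text) if age_text.isdigit() else None
    return {"name": row["name"], "age": age}


def cleanse(rows, default_age):
    """Cleansing: replace missing ages with an agreed default value."""
    return [
        {**r, "age": default_age if r["age"] is None else r["age"]}
        for r in rows
    ]


def profile(rows):
    """Profiling: build a small statistical summary of the age column."""
    ages = [r["age"] for r in rows]
    return {
        "count": len(ages),
        "min": min(ages),
        "max": max(ages),
        "mean": mean(ages),
        "median": median(ages),
    }


formatted = [format_row(r) for r in raw_rows]
known_ages = [r["age"] for r in formatted if r["age"] is not None]
# Assumption for this sketch: fill gaps with the median of the known ages.
cleaned = cleanse(formatted, default_age=median(known_ages))
summary = profile(cleaned)
print(summary)
```

Filling gaps with the median is only one possible cleansing rule. Depending on the data, dropping the incomplete rows or flagging them for review may be more appropriate.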

The processing to be completed on the data can be simple (for example, a manual task requiring repetitious updates to data in an MS Excel worksheet), more complex (as with the use of programming languages such as R or Python), or more sophisticated still (as when processing logic is coded into routines that can then be scheduled and rerun automatically on new populations of data).
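To make the last case concrete, here is a small sketch of processing logic coded into a routine that can be rerun unchanged on each new batch of data. The function name process_batch, the region and amount columns, and the two sample batches are hypothetical. In practice a scheduler would call such a routine as new files arrive.

```python
import csv
import io


def process_batch(csv_text):
    """Parse one batch of CSV text and return the total amount per region."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount_text = row["amount"].strip()
        if not amount_text:
            # Cleansing step: skip rows whose amount is missing.
            continue
        # Formatting step: normalize the region label's capitalization.
        region = row["region"].strip().title()
        totals[region] = totals.get(region, 0.0) + float(amount_text)
    return totals


# Two hypothetical batches standing in for successive data deliveries.
january = "region,amount\nnorth,10.5\nsouth,4\nNorth,2.5\n"
february = "region,amount\nsouth,\nsouth,7\neast,1\n"

# The same routine is applied, unmodified, to each new population of data.
for label, batch in [("january", january), ("february", february)]:
    print(label, process_batch(batch))
```

Because the logic lives in one function, a change to the cleansing or formatting rules is made once and applies to every subsequent run.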
