- Mastering Machine Learning with Spark 2.x
- Alex Tellez Max Pumperla Michal Malohlava
- 2021-07-02 18:46:05
Data munging
Raw data often comes from multiple sources in different, and often incompatible, formats. The beauty of the Spark programming model is that it lets you define data operations that process the incoming data and transform it into a regular form suitable for further feature engineering and model building. This process is commonly referred to as data munging, and it is where much of the battle is won in data science projects. We keep this section intentionally brief because the best way to showcase the power, and the necessity, of data munging is by example. So take heart: this book offers plenty of practice that emphasizes this essential process.
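To make the idea concrete before the larger examples, here is a minimal sketch of munging two incompatible sources into one regular form. It uses plain Python rather than Spark so that it runs without a cluster; the sample data, field names, and helper functions (`from_csv`, `from_json`, `munge`) are all hypothetical and chosen only for illustration. In Spark, the same steps would typically be expressed as transformations on an RDD or DataFrame.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical raw inputs: the same kind of record arriving in two formats,
# with different column names and different date conventions.
CSV_SOURCE = "user_id,age,signup\n1,34,07/02/2021\n2,,12/15/2020\n"
JSON_SOURCE = '[{"id": "3", "Age": "41", "signup_date": "2021-03-09"}]'


def from_csv(text):
    """Parse CSV rows into the common schema: id, age, signup."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "id": int(row["user_id"]),
            # A blank age becomes None so missing values are explicit.
            "age": int(row["age"]) if row["age"] else None,
            "signup": datetime.strptime(row["signup"], "%m/%d/%Y").date(),
        }


def from_json(text):
    """Parse JSON records into the same common schema."""
    for rec in json.loads(text):
        yield {
            "id": int(rec["id"]),
            "age": int(rec["Age"]) if rec.get("Age") else None,
            "signup": datetime.strptime(rec["signup_date"], "%Y-%m-%d").date(),
        }


def munge(csv_text, json_text):
    """Combine both sources into one list of uniform records, sorted by id."""
    records = list(from_csv(csv_text)) + list(from_json(json_text))
    return sorted(records, key=lambda r: r["id"])


records = munge(CSV_SOURCE, JSON_SOURCE)
for rec in records:
    print(rec)
```

Once every record shares the same field names and types, and missing values are marked explicitly, later feature engineering no longer needs to know which source a record came from.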