- Mastering Machine Learning with Spark 2.x
- Alex Tellez Max Pumperla Michal Malohlava
- 2021-07-02 18:46:05
Data munging
Raw data often comes from multiple sources in different, frequently incompatible formats. The beauty of the Spark programming model is that it lets you define data operations that process the incoming data and transform it into a regular form suitable for further feature engineering and model building. This process is commonly referred to as data munging, and it is where much of the battle is won in data science projects. We keep this section intentionally brief because the best way to showcase the power (and necessity!) of data munging is by example. So take heart: the rest of this book offers plenty of practice with this essential process.
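As a small illustration of bringing incompatible sources into a regular form, here is a minimal sketch in plain Python rather than Spark, so it runs without a cluster. Everything in it is an assumption made for illustration: the two inline sources, their field names, the unit conversion, and the helper names `from_csv`, `from_json_lines`, and `munge` are hypothetical and do not come from the book.

```python
import csv
import io
import json

# Hypothetical source A: CSV text with a header and heights in centimetres.
CSV_SOURCE = "id,name,height_cm\n1, Alice ,170\n2,Bob,\n"

# Hypothetical source B: JSON lines with different field names and heights in metres.
JSON_SOURCE = '{"user_id": "3", "full_name": "carol", "height_m": 1.65}\n'


def from_csv(text):
    """Map each CSV row to the common (id, name, height_cm) form."""
    for row in csv.DictReader(io.StringIO(text)):
        height = row["height_cm"].strip()
        yield {
            "id": int(row["id"]),
            "name": row["name"].strip().title(),
            # An empty field becomes None so it can be filtered out later.
            "height_cm": float(height) if height else None,
        }


def from_json_lines(text):
    """Map each JSON line to the same common form, converting metres to centimetres."""
    for line in text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        yield {
            "id": int(rec["user_id"]),
            "name": rec["full_name"].strip().title(),
            "height_cm": round(rec["height_m"] * 100.0, 1),
        }


def munge(csv_text, json_text):
    """Combine both sources and drop records that lack a height."""
    records = list(from_csv(csv_text)) + list(from_json_lines(json_text))
    return [r for r in records if r["height_cm"] is not None]


if __name__ == "__main__":
    for record in munge(CSV_SOURCE, JSON_SOURCE):
        print(record)
```

The shape of the work is the point here: each source gets its own small mapping into a shared schema, and cleaning rules such as trimming names or dropping incomplete rows are applied once on the unified records. In Spark, the same steps would typically be expressed as transformations over distributed datasets.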