Data munging

The raw data for a problem often comes from multiple sources in different, and often incompatible, formats. The beauty of the Spark programming model is that it lets you define data operations that process the incoming data and transform it into a regular form suitable for further feature engineering and model building. This process is commonly referred to as data munging, and it is where much of the battle is won in data science projects. We keep this section intentionally brief because the best way to showcase the power (and necessity!) of data munging is by example. So take heart: the rest of this book offers plenty of practice that emphasizes this essential process.
