Data munging

Raw data for a given problem often comes from multiple sources in different, frequently incompatible formats. The beauty of the Spark programming model is that it lets you define data operations that process the incoming data and transform it into a regular form suitable for further feature engineering and model building. This process is commonly referred to as data munging, and it is where much of the battle is won in data science projects. We keep this section intentionally brief because the best way to showcase the power (and necessity!) of data munging is by example. So, take heart; this book offers plenty of practice that emphasizes this essential process.
