Apache Spark Quick Start Guide
Apache Spark is a flexible framework for processing both batch and real-time data, and its unified engine has made it one of the most popular big data processing frameworks. This book will help you get started with Apache Spark 2.0 and write big data applications for a variety of use cases. Although it is intended as an introduction, it also focuses on explaining the core concepts. This practical guide provides a quick start to the Spark 2.0 architecture and its components. It teaches you how to set up Spark on your local machine. As we move ahead, you will be introduced to resilient distributed datasets (RDDs) and DataFrame APIs, and their corresponding transformations and actions. Then we move on to the life cycle of a Spark application and learn about the techniques used to debug slow-running applications. You will also go through Spark's built-in modules for SQL, streaming, machine learning, and graph analysis. Finally, the book will lay out the best practices and optimization techniques that are key for writing efficient Spark applications. By the end of this book, you will have a sound fundamental understanding of the Apache Spark framework and you will be able to write and optimize Spark applications.
· 28,000 words