Big Data Analysis with Python
Processing big data in real time is challenging due to scalability, information inconsistency, and fault tolerance. Big Data Analysis with Python teaches you how to use tools that can control this data avalanche for you. With this book, you'll learn practical techniques to aggregate data into useful dimensions for posterior analysis, extract statistical measurements, and transform datasets into features for other systems. The book begins with an introduction to data manipulation in Python using pandas. You'll then get familiar with statistical analysis and plotting techniques. With multiple hands-on activities in store, you'll be able to analyze data that is distributed on several computers by using Dask. As you progress, you'll study how to aggregate data for plots when the entire data cannot be accommodated in memory. You'll also explore Hadoop (HDFS and YARN), which will help you tackle larger datasets. The book also covers Spark and explains how it interacts with other tools. By the end of this book, you'll be able to bootstrap your own Python environment, process large files, and manipulate data to generate statistics, metrics, and graphs.
· 40,000 words