
Preface

Big Data Analytics aims to provide the fundamentals of Apache Spark and Hadoop and to show, in an accessible way, how they integrate with the most commonly used tools and techniques. All Spark components (Spark Core, Spark SQL, DataFrames, Datasets, conventional streaming, Structured Streaming, MLlib, and GraphX) and the Hadoop core components (HDFS, MapReduce, and YARN) are explored in depth, with implementation examples on Spark + Hadoop clusters.

The Big Data analytics industry is moving away from MapReduce toward Spark, so the advantages of Spark over MapReduce are explained in depth to help you reap the benefits of in-memory speeds. The DataFrames API, the Data Sources API, and the new Dataset API are explained for building Big Data analytical applications. Real-time data analytics using Spark Streaming with Apache Kafka and HBase is covered to help in building streaming applications. The new Structured Streaming concept is explained with an Internet of Things (IoT) use case. Machine learning techniques are covered using MLlib, ML Pipelines, and SparkR; graph analytics is covered with the GraphX and GraphFrames components of Spark.

This book also introduces web-based notebooks such as Jupyter and Apache Zeppelin, and the data flow tool Apache NiFi, to analyze and visualize data, and covers offering Spark as a Service using Livy Server.
