Summary

Apache Hadoop provides a reliable and scalable framework (HDFS) for Big Data storage and a powerful cluster resource management framework (YARN) to run and manage multiple Big Data applications. Apache Spark provides in-memory performance for Big Data processing, along with libraries and APIs for interactive exploratory analytics, real-time analytics, machine learning, and graph analytics. Although MapReduce (MR) was the primary processing engine on top of Hadoop, it had multiple drawbacks, such as poor performance and inflexibility in designing applications. Apache Spark is a replacement for MR. MR-based tools such as Hive, Pig, Mahout, and Crunch have already started offering Apache Spark as an additional execution engine alongside MR.

Nowadays, Big Data projects are being implemented in many businesses, from large Fortune 500 companies to small start-ups. Organizations gain an edge if they can go from raw data to decisions quickly with easy-to-use tools to develop applications and explore data. Apache Spark will bring this speed and sophistication to Hadoop clusters.

In the next chapter, let's take a deep dive into Spark.
