  • Hands-On Big Data Modeling
  • James Lee, Tao Wei, Suresh Kumar Mukhiya
  • 185 words
  • 2021-06-10 18:58:53

Reasons to choose Apache Spark

Apache Spark is widely used in the big data community. The most prominent reasons for choosing it for big data modeling and computation are:

  • Speed: Processing speed matters when working with large datasets. Spark can run computations up to one hundred times faster than Hadoop MapReduce in memory, or ten times faster on disk.
  • Accessibility: Spark was developed to be highly accessible, offering simple APIs in Python, Java, Scala, and SQL, along with rich built-in libraries. It also integrates with other big data tools, including Hadoop clusters and data sources such as Cassandra.
  • Platform support: Apache Spark was built to run on Hadoop, on Mesos, in standalone mode, or in the cloud. It can access diverse data sources, including HDFS, Cassandra, HBase, and S3.
  • Generality: Spark was developed to cover a wide range of workloads, including batch applications, iterative algorithms, interactive queries, and streaming. Because it supports these workloads in the same engine, Spark makes it easy and inexpensive to combine different processing types, which production data analysis pipelines often require.