
Reasons to choose Apache Spark

Apache Spark is widely used in the big data community. Here are some of the most prominent reasons for choosing it for big data modeling and computation:

  • Speed: Speed matters when processing large datasets. Spark can run computations up to 100 times faster than Hadoop MapReduce in memory, or up to 10 times faster on disk.
  • Accessibility: Spark was designed to be highly accessible, offering simple APIs in Python, Java, Scala, and SQL, along with rich built-in libraries. It also integrates with other big data tools, including Hadoop clusters and data sources such as Cassandra.
  • Platform support: Apache Spark was built to run on Hadoop and Mesos, in standalone mode, or in the cloud. It can access diverse data sources, including HDFS, Cassandra, HBase, and S3.
  • Generality: Spark was designed to cover a wide range of workloads, including batch applications, iterative algorithms, interactive queries, and streaming. Because one engine supports all of these workloads, Spark makes it easy and inexpensive to combine different processing types, which production data analysis pipelines often require.