Why Hadoop plus Spark?

Apache Spark shines brightest when it is combined with Hadoop. To understand why, let's take a look at the features of Hadoop and Spark.

Hadoop features

Spark features

When both frameworks are combined, we get the power of enterprise-grade applications with in-memory performance, as shown in Figure 2.11:

Figure 2.11: Spark applications on the Hadoop platform

Frequently asked questions about Spark

The following are frequent questions that practitioners raise about Spark:

  • My dataset does not fit in memory. How can I use Spark?

    Spark's operators spill data to disk if it does not fit in memory, allowing Spark to run on data of any size. Likewise, cached datasets that do not fit in memory are either spilled to disk or recomputed on the fly when needed, as determined by the RDD's storage level. By default, Spark recomputes the partitions that don't fit in memory. The storage level can be changed to MEMORY_AND_DISK to spill those partitions to disk instead (see the PySpark sketch after this list).

    Figure 2.12 shows you the performance difference between fully cached data and data read from disk:

    Figure 2.12: Spark performance: Fully cached versus disk

  • How does fault recovery work in Spark?

    Spark's built-in fault tolerance, based on RDD lineage, automatically recovers from failures by recomputing lost partitions from their parent RDDs (see the lineage sketch after this list). Figure 2.13 shows you the performance when a failure occurs in the 6th iteration of a k-means algorithm:

    Figure 2.13: Fault recovery performance
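
To make the storage-level discussion from the first question concrete, here is a minimal PySpark sketch. The application name and input path are hypothetical placeholders, not taken from the book; the point is the contrast between the default MEMORY_ONLY behavior (recompute what doesn't fit) and MEMORY_AND_DISK (spill what doesn't fit).

```python
from pyspark import SparkContext, StorageLevel

# Hypothetical application name and input path -- adjust for your environment.
sc = SparkContext(appName="StorageLevelDemo")
lines = sc.textFile("hdfs:///data/events.log")

# Default persistence (MEMORY_ONLY): partitions that do not fit in memory
# are recomputed from the lineage the next time they are needed.
words = lines.flatMap(lambda line: line.split())
words.cache()

# Alternative: spill partitions that do not fit in memory to disk,
# so they are read back later instead of being recomputed.
errors = lines.filter(lambda line: "ERROR" in line)
errors.persist(StorageLevel.MEMORY_AND_DISK)

# Actions materialize the RDDs and populate the cache.
print(words.count())
print(errors.count())
print(errors.getStorageLevel())  # Confirms the storage level in use

sc.stop()
```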
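For the fault-recovery question, the following sketch simply inspects the lineage that Spark relies on for recovery; the transformations and path are illustrative assumptions. toDebugString() prints the chain of parent RDDs from which any lost partition can be rebuilt, which is why no explicit checkpointing or replication is needed for basic recovery.

```python
from pyspark import SparkContext

# Hypothetical input path -- any text file works for this illustration.
sc = SparkContext(appName="LineageDemo")

lines = sc.textFile("hdfs:///data/events.log")
pairs = (lines.flatMap(lambda line: line.split())
              .map(lambda word: (word, 1))
              .reduceByKey(lambda a, b: a + b))

# The lineage (the DAG of transformations) is what Spark replays to rebuild
# lost partitions after an executor failure.
print(pairs.toDebugString().decode("utf-8"))

print(pairs.count())
sc.stop()
```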
