- Big Data Analytics
- Venkat Ankam
- 2021-08-20 10:32:23
Why Hadoop plus Spark?
Apache Spark works best when it is combined with Hadoop. To understand why, let's look at the features of Hadoop and Spark.
Hadoop features

Spark features

When both frameworks are combined, we get the power of enterprise-grade applications with in-memory performance, as shown in Figure 2.11:

Figure 2.11: Spark applications on the Hadoop platform
Frequently asked questions about Spark
The following are frequent questions that practitioners raise about Spark:
- My dataset does not fit in memory. How can I use Spark?
Spark's operators spill data to disk when it does not fit in memory, which allows Spark to run on data of any size. Likewise, cached datasets that do not fit in memory are either spilled to disk or recomputed on the fly when needed, as determined by the RDD's storage level. By default, Spark recomputes the partitions that don't fit in memory. The storage level can be changed to MEMORY_AND_DISK to spill those partitions to disk instead.
Figure 2.12 compares performance when the data is fully cached with performance when it is read from disk:
Figure 2.12: Spark performance: Fully cached versus disk
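The difference between the two storage behaviors described above can be sketched with a small toy model. This is not Spark code: `ToyStorageLevel`, `ToyCache`, `compute_partition`, and `run` are hypothetical names invented for illustration, and memory is modeled as a fixed number of partition slots. In PySpark, the real setting is applied with `rdd.persist(StorageLevel.MEMORY_AND_DISK)`.

```python
# Toy model only: this does not use Spark and does not mirror its internals.
from enum import Enum


class ToyStorageLevel(Enum):
    MEMORY_ONLY = "recompute partitions that do not fit in memory"
    MEMORY_AND_DISK = "spill partitions that do not fit in memory"


class ToyCache:
    """Holds cached partitions in a limited 'memory' and an unlimited 'disk'."""

    def __init__(self, level, memory_slots):
        self.level = level
        self.memory_slots = memory_slots  # assumed capacity, in partitions
        self.memory = {}
        self.disk = {}
        self.recomputed = 0

    def put(self, pid, data):
        if len(self.memory) < self.memory_slots:
            self.memory[pid] = data
        elif self.level is ToyStorageLevel.MEMORY_AND_DISK:
            self.disk[pid] = data  # spilled instead of dropped
        # With MEMORY_ONLY the partition is simply not kept.

    def get(self, pid, compute):
        if pid in self.memory:
            return self.memory[pid]
        if pid in self.disk:
            return self.disk[pid]
        self.recomputed += 1  # not cached anywhere: rebuild it on the fly
        return compute(pid)


def compute_partition(pid):
    # Stand-in for the work that produces one partition.
    return [pid * 10 + i for i in range(3)]


def run(level):
    cache = ToyCache(level, memory_slots=2)
    for pid in range(4):
        cache.put(pid, compute_partition(pid))
    total = sum(sum(cache.get(pid, compute_partition)) for pid in range(4))
    return total, len(cache.disk), cache.recomputed
```

Both levels return the same result; they differ only in whether the partitions that did not fit are read back from disk or recomputed.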
- How does fault recovery work in Spark?
Spark's built-in fault tolerance, based on RDD lineage, automatically recovers from failures: lost partitions are recomputed from the transformations that produced them. Figure 2.13 shows the performance impact of a failure in the sixth iteration of a k-means algorithm:
Figure 2.13: Fault recovery performance
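The idea of lineage-based recovery can be illustrated with a minimal sketch. It is a toy model, not Spark's implementation: `ToyRDD` and its methods are hypothetical names, and a "failure" is simulated by discarding one computed partition. Each dataset remembers its parent and the function applied to it, so a lost partition can be rebuilt without recomputing the others.

```python
# Toy model only: it imitates the lineage idea, not Spark's actual classes.
class ToyRDD:
    def __init__(self, num_partitions, parent=None, fn=None, source=None):
        self.num_partitions = num_partitions
        self.parent = parent    # lineage: the dataset this one derives from
        self.fn = fn            # lineage: the per-element transformation
        self.source = source    # used only by the root dataset
        self.cache = {}         # partitions computed so far
        self.computations = 0   # how many partitions were (re)built

    def map(self, fn):
        return ToyRDD(self.num_partitions, parent=self, fn=fn)

    def partition(self, pid):
        if pid not in self.cache:
            self.computations += 1
            if self.parent is None:
                data = self.source(pid)
            else:
                data = [self.fn(x) for x in self.parent.partition(pid)]
            self.cache[pid] = data
        return self.cache[pid]

    def lose(self, pid):
        # Simulate a failure that destroys one computed partition.
        self.cache.pop(pid, None)

    def collect(self):
        return [x for pid in range(self.num_partitions)
                for x in self.partition(pid)]


base = ToyRDD(3, source=lambda pid: [pid * 2, pid * 2 + 1])
squared = base.map(lambda x: x * x)

before = squared.collect()   # computes all three partitions
squared.lose(1)              # simulated failure
after = squared.collect()    # rebuilds only partition 1 from its lineage
```

After the simulated failure, the result is unchanged and only the lost partition is recomputed, which is why recovery costs a fraction of a full rerun.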