- Mastering Apache Spark 2.x (Second Edition)
- Romeo Kienzler
- 249 words
- 2021-07-02 18:55:26
Hadoop Distributed File System
You might consider using an alternative to HDFS, depending on your cluster requirements. For instance, IBM offers GPFS (General Parallel File System) for improved performance.
GPFS might be a better choice because it comes from a high-performance computing background and offers full read-write capability, whereas HDFS is designed as a write-once, read-many filesystem. It also improves on HDFS performance because it runs at the kernel level, whereas HDFS runs in a Java Virtual Machine (JVM), which in turn runs as an operating system process. GPFS integrates with Hadoop and the Spark cluster tools, and IBM runs setups of several hundred petabytes on it.
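The difference between the two access models can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class names `WriteOnceStore` and `ReadWriteStore` and their methods are hypothetical and do not correspond to any HDFS or GPFS API. The point is that a write-once store rejects changes to an existing file, while a read-write store can patch bytes in place.

```python
class WriteOnceStore:
    """Toy write-once, read-many store (hypothetical, not an HDFS API)."""

    def __init__(self):
        self._files = {}

    def create(self, path, data):
        # A path can be written exactly once; later writes are rejected.
        if path in self._files:
            raise PermissionError(f"{path} already exists and is immutable")
        self._files[path] = bytes(data)

    def read(self, path):
        return self._files[path]


class ReadWriteStore(WriteOnceStore):
    """Toy read-write store (hypothetical): also allows in-place updates."""

    def write_at(self, path, offset, data):
        # Overwrite a byte range inside an existing file.
        buf = bytearray(self._files[path])
        buf[offset:offset + len(data)] = data
        self._files[path] = bytes(buf)


# Write-once model: a second write to the same path fails.
worm = WriteOnceStore()
worm.create("/logs/day1", b"hello world")
try:
    worm.create("/logs/day1", b"HELLO world")
    worm_rejected = False
except PermissionError:
    worm_rejected = True

# Read-write model: the same change is applied in place.
rw = ReadWriteStore()
rw.create("/logs/day1", b"hello world")
rw.write_at("/logs/day1", 0, b"HELLO")
```

In the write-once model, changing data means writing a new file and switching readers over to it, which suits large batch outputs but not workloads that update records in place.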
Another commercial alternative is the MapR file system, which, besides performance improvements, supports mirroring, snapshots, and high availability.
Ceph is an open source alternative. Like HDFS, it is a distributed, fault-tolerant, and self-healing filesystem for commodity hard drives. It runs in the Linux kernel as well and addresses many of the performance issues that HDFS has. Other promising candidates in this space are Alluxio (formerly Tachyon), the Quantcast File System (QFS), GlusterFS, and Lustre.
Finally, Cassandra is not a filesystem but a NoSQL key-value store. It is tightly integrated with Apache Spark and is therefore regarded as a valid and powerful alternative to HDFS, or even to any other distributed filesystem, especially as it supports predicate push-down using Apache Spark SQL and the Catalyst optimizer, which we will cover in the following chapters.
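Predicate push-down means that a filter is evaluated by the data source itself, so only matching rows travel to the processing engine. The following is a minimal pure-Python sketch of that idea, not Spark or Cassandra code: `ToySource`, `scan`, `rows_shipped`, and the sample rows are all hypothetical names and data introduced for illustration.

```python
class ToySource:
    """Hypothetical data source that counts how many rows it ships out."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.rows_shipped = 0

    def scan(self, predicate=None):
        # With a predicate, filtering happens inside the source (push-down).
        # Without one, every row is shipped and the caller must filter.
        for row in self._rows:
            if predicate is None or predicate(row):
                self.rows_shipped += 1
                yield row


def is_swiss(row):
    return row["country"] == "CH"


# Made-up sample data: every tenth row has country "CH".
rows = [{"id": i, "country": "CH" if i % 10 == 0 else "DE"} for i in range(1000)]

# No push-down: the engine receives all rows, then filters them itself.
source_a = ToySource(rows)
result_a = [r for r in source_a.scan() if is_swiss(r)]

# Push-down: the source applies the filter before shipping any rows.
source_b = ToySource(rows)
result_b = list(source_b.scan(predicate=is_swiss))
```

Both variants return the same rows, but the second ships only the matching ones. In a real distributed setup, that reduction in transferred data is where the benefit comes from.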