- Apache Spark 2.x for Java Developers
- Sourav Gulati Sumit Kumar
HDFS I/O
An HDFS read operation from a client involves the following:
- The client requests NameNode to determine where the actual data blocks are stored for a given file.
- NameNode obliges by providing the block IDs and locations of the hosts (DataNode) where the data can be found.
- The client then contacts the DataNodes directly with the respective block IDs to fetch the data, preserving the order of the blocks.
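The read flow above can be sketched in plain Java. This is a minimal, self-contained mock of the protocol, not the real Hadoop API: `MockNameNode`, `MockDataNode`, and `readFile` are hypothetical names used only to illustrate the metadata-lookup-then-direct-fetch pattern.

```java
import java.util.*;

// Mock NameNode: maps a file name to its ordered block IDs, and each
// block ID to the DataNodes holding a replica (metadata only, no data).
class MockNameNode {
    private final Map<String, List<String>> fileToBlocks = new HashMap<>();
    private final Map<String, List<String>> blockToHosts = new HashMap<>();

    void addFile(String file, List<String> blockIds) { fileToBlocks.put(file, blockIds); }
    void addBlock(String blockId, List<String> hosts) { blockToHosts.put(blockId, hosts); }
    List<String> getBlockIds(String file) { return fileToBlocks.get(file); }
    List<String> getHosts(String blockId) { return blockToHosts.get(blockId); }
}

// Mock DataNode: holds actual block contents.
class MockDataNode {
    private final Map<String, String> blocks = new HashMap<>();
    void store(String blockId, String data) { blocks.put(blockId, data); }
    String read(String blockId) { return blocks.get(blockId); }
}

public class HdfsReadSketch {
    // Steps 1-2: ask the NameNode for block IDs and host locations.
    // Step 3: fetch each block directly from a DataNode, in block order.
    static String readFile(String file, MockNameNode nn, Map<String, MockDataNode> cluster) {
        StringBuilder result = new StringBuilder();
        for (String blockId : nn.getBlockIds(file)) {
            String host = nn.getHosts(blockId).get(0);       // pick the first replica's host
            result.append(cluster.get(host).read(blockId));  // data never passes through the NameNode
        }
        return result.toString();
    }

    public static void main(String[] args) {
        MockNameNode nn = new MockNameNode();
        Map<String, MockDataNode> cluster = new HashMap<>();
        cluster.put("dn1", new MockDataNode());
        cluster.put("dn2", new MockDataNode());

        nn.addFile("/user/demo.txt", Arrays.asList("blk_1", "blk_2"));
        nn.addBlock("blk_1", Arrays.asList("dn1", "dn2"));
        nn.addBlock("blk_2", Arrays.asList("dn2"));
        cluster.get("dn1").store("blk_1", "Hello, ");
        cluster.get("dn2").store("blk_2", "HDFS!");

        System.out.println(readFile("/user/demo.txt", nn, cluster)); // prints "Hello, HDFS!"
    }
}
```

Note how the NameNode answers only metadata questions; the bytes flow straight from DataNodes to the client.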

An HDFS write operation from a client involves the following:
- The client contacts NameNode to update the namespace with the filename and verify the necessary permissions.
- If the file already exists, NameNode throws an error; otherwise, it returns an FSDataOutputStream to the client, which buffers the data to be written in a data queue.
- A background streamer consumes the data queue and asks NameNode to allocate new blocks on suitable DataNodes.
- The data is then copied to that DataNode, and, as per the replication strategy, the data is further copied from that DataNode to the rest of the DataNodes.
- It's important to note that the data never moves through the NameNode, as that would cause a performance bottleneck.
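The write path can likewise be sketched with mock classes. This is a hypothetical illustration (the names `WriteNameNode` and `PipelineDataNode` are not real Hadoop classes): the NameNode only checks the namespace and allocates block IDs, while the block data cascades down a replication pipeline of DataNodes.

```java
import java.util.*;

// Mock NameNode for the write path: tracks the namespace and hands out
// block IDs, but never touches the block data itself.
class WriteNameNode {
    private final Set<String> namespace = new HashSet<>();
    private int nextBlock = 0;

    // Steps 1-2: register the file name, failing if it already exists.
    void create(String file) {
        if (!namespace.add(file)) throw new IllegalStateException("File exists: " + file);
    }

    // Step 3: allocate a new block ID for the stream to write to.
    String allocateBlock() { return "blk_" + (nextBlock++); }
}

// Mock DataNode: stores a block locally, then forwards it to the next
// DataNode in the replication pipeline (step 4).
class PipelineDataNode {
    final String name;
    final Map<String, String> blocks = new HashMap<>();
    PipelineDataNode(String name) { this.name = name; }

    void write(String blockId, String data, List<PipelineDataNode> rest) {
        blocks.put(blockId, data);
        if (!rest.isEmpty()) {
            rest.get(0).write(blockId, data, rest.subList(1, rest.size()));
        }
    }
}

public class HdfsWriteSketch {
    public static void main(String[] args) {
        WriteNameNode nn = new WriteNameNode();
        List<PipelineDataNode> pipeline = Arrays.asList(
            new PipelineDataNode("dn1"),
            new PipelineDataNode("dn2"),
            new PipelineDataNode("dn3"));

        nn.create("/user/out.txt");
        String blockId = nn.allocateBlock();

        // The client sends the block only to the first DataNode; replication
        // cascades through the pipeline without touching the NameNode.
        pipeline.get(0).write(blockId, "block contents", pipeline.subList(1, pipeline.size()));

        for (PipelineDataNode dn : pipeline) {
            System.out.println(dn.name + " holds " + blockId + ": " + dn.blocks.containsKey(blockId));
        }
    }
}
```

Chaining the replicas this way means the client uploads each block once, and the DataNodes bear the cost of fan-out, which is why the NameNode stays off the data path.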