- Apache Spark 2.x for Java Developers
- Sourav Gulati Sumit Kumar
HDFS I/O
An HDFS read operation from a client involves the following:
- The client asks the NameNode where the actual data blocks of a given file are stored.
- The NameNode responds with the block IDs and the locations of the hosts (DataNodes) where the data can be found.
- The client contacts those DataNodes with the respective block IDs and fetches the data directly from them, preserving the order of the blocks so the file can be reassembled.
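The read path above is hidden behind a single `FileSystem.open()` call in the Hadoop Java API: `open()` queries the NameNode for block locations, and the returned stream pulls the blocks from the DataNodes in order. A minimal sketch, assuming a NameNode reachable at `hdfs://localhost:9000` and a hypothetical file path:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode address; adjust to your cluster (assumed value here)
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        // open() asks the NameNode for block locations; the stream then
        // reads each block directly from the DataNodes, in order
        Path file = new Path("/user/data/sample.txt"); // hypothetical path
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(file)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}
```

Note that the data never flows through the NameNode: after the metadata lookup, the client talks to the DataNodes directly. Running this requires a reachable HDFS cluster and the `hadoop-client` dependency on the classpath.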

An HDFS write operation from a client involves the following:
- The client contacts the NameNode to update the namespace with the filename and to verify the necessary permissions.
- If the file already exists, the NameNode throws an error; otherwise, it returns an FSDataOutputStream to the client, which writes the data into an internal data queue.
- The data queue is consumed by a streamer, which asks the NameNode to allocate new blocks on suitable DataNodes.
- The data is then copied to that DataNode, and, as per the replication strategy, the data is further copied from that DataNode to the rest of the DataNodes.
- It's important to note that the data is never moved through the NameNode, as that would cause a performance bottleneck.
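In the Java API, the write path above corresponds to `FileSystem.create()`: the NameNode check and block allocation happen behind the returned `FSDataOutputStream`, and the DataNodes replicate the blocks among themselves. A minimal sketch, assuming the same hypothetical NameNode address and target path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode address; adjust to your cluster (assumed value here)
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        // create() contacts the NameNode to register the filename and check
        // permissions; with overwrite=false it fails if the file exists,
        // mirroring the error described in the steps above
        Path target = new Path("/user/data/output.txt"); // hypothetical path
        try (FSDataOutputStream out = fs.create(target, false)) {
            // bytes go to the data queue and on to the first DataNode;
            // replication to further DataNodes is handled by the DataNodes
            out.writeBytes("hello HDFS\n");
        }
        fs.close();
    }
}
```

As with the read example, the client ships data to the DataNodes directly, never through the NameNode. Running this requires a reachable HDFS cluster and the `hadoop-client` dependency on the classpath.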