
HDFS

HDFS is a popular distributed filesystem for storing and retrieving data files in IoT solutions. It can hold large amounts of data in a reliable and scalable manner. Its design is based on the Google File System (https://ai.google/research/pubs/pub51). HDFS splits individual files into fixed-size blocks that are stored on machines across the cluster. To ensure reliability, it replicates each block and distributes the replicas across the cluster; by default, the replication factor is 3. HDFS has two main architectural components:

  • The first, NameNode, stores the metadata for the entire filesystem, such as filenames, their permissions, and the location of each block of each file.
  • The second, DataNode (one or more), is where file blocks are stored. DataNodes communicate through Remote Procedure Calls (RPCs) using protobufs.
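
The split between metadata and block storage can be sketched as a toy model. This is only an illustration, not how HDFS is implemented: the `NameNode` class, its `add_file` method, the DataNode names, and the round-robin placement are all hypothetical, and the 128 MiB block size is an assumed value (the text above states only the replication factor of 3).

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MiB)
REPLICATION = 3                 # default replication factor from the text


@dataclass
class NameNode:
    """Toy metadata store: filename -> list of (block_id, [DataNode names])."""
    datanodes: list
    files: dict = field(default_factory=dict)

    def add_file(self, name, size, block_size=BLOCK_SIZE, replication=REPLICATION):
        if replication > len(self.datanodes):
            raise ValueError("not enough DataNodes for the requested replication")
        n_blocks = -(-size // block_size)  # ceiling division
        blocks = []
        for i in range(n_blocks):
            # Round-robin placement is a simplification chosen for this sketch;
            # it is not the placement policy HDFS actually uses.
            replicas = [
                self.datanodes[(i + r) % len(self.datanodes)]
                for r in range(replication)
            ]
            blocks.append((f"{name}#blk{i}", replicas))
        self.files[name] = blocks  # only metadata is kept here, never file data
        return blocks


namenode = NameNode(["dn1", "dn2", "dn3", "dn4"])
blocks = namenode.add_file("sensors.csv", 300 * 1024 * 1024)
for block_id, replicas in blocks:
    print(block_id, replicas)
```

A 300 MiB file needs three blocks at this block size, with the last block only partly filled, and every block is recorded on three distinct DataNodes.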

RPC is a protocol that one program can use to request a service from a program located on another computer on a network without having to know the network's details. A procedure call is also sometimes known as a function call or a subroutine call.
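
To make the idea concrete, here is a minimal sketch of a remote procedure call using Python's standard-library `xmlrpc` modules. Note that this uses XML-RPC over HTTP, which is not the protobuf-based RPC that HDFS uses; it only shows the general pattern of calling a function that runs in another process. The function names `get_block_count`, `start_server`, and `call_remote` are hypothetical.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def get_block_count(file_size, block_size):
    """Procedure exposed by the server: blocks needed to hold a file."""
    return -(-file_size // block_size)  # ceiling division


def start_server():
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(get_block_count)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_remote(file_size, block_size):
    server = start_server()
    host, port = server.server_address
    try:
        with ServerProxy(f"http://{host}:{port}") as proxy:
            # Reads like a local function call, but the work is done
            # by the server on the other end of the connection.
            return proxy.get_block_count(file_size, block_size)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(call_remote(300, 128))  # sizes in MiB
```

The caller never deals with sockets or message formats directly; the proxy object hides those details, which is the property the definition above describes.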

There are many options for programmatically accessing HDFS in Python, such as snakebite, pyarrow, hdfs3, pywebhdfs, hdfscli, and so on. In this section, we will focus mainly on libraries that provide native RPC client interfaces and work with Python 3.
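
For contrast with the native RPC clients, REST-based libraries such as pywebhdfs talk to HDFS through the WebHDFS HTTP API. The sketch below only builds the URL for a directory listing request and sends nothing over the network. The helper name `webhdfs_list_url`, the host name, the user name, and the port 9870 are assumptions made for illustration; the port in particular depends on the Hadoop version and cluster configuration.

```python
from urllib.parse import quote, urlencode


def webhdfs_list_url(host, path, user, port=9870):
    """Build a WebHDFS LISTSTATUS URL for an absolute HDFS path (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": "LISTSTATUS", "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


url = webhdfs_list_url("namenode.example", "/data/sensors", "iot")
print(url)
```

Sending a GET request to such a URL on a cluster with WebHDFS enabled returns the directory listing as JSON, so any HTTP client can be used, at the cost of going through HTTP rather than the native RPC interface.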

Snakebite is a pure Python module and CLI that allows you to access HDFS from Python programs. At present, it only works with Python 2; Python 3 is not supported. Moreover, it does not yet support write operations, so we are not covering it in this book. However, if you are interested in learning more, you can refer to Spotify's GitHub repository: https://github.com/spotify/snakebite.