
Using hdfs3 with HDFS

hdfs3 is a lightweight Python wrapper around the C/C++ libhdfs3 library. It lets Python code work with HDFS natively. The first step is to connect to the HDFS NameNode, which is done with the HDFileSystem class:

from hdfs3 import HDFileSystem
hdfs = HDFileSystem(host='localhost', port=8020)

This automatically establishes a connection with the NameNode. Now, we can access a directory listing using the following:

print(hdfs.ls('/tmp')) 

This lists all the files and directories in the /tmp directory. You can use functions such as mkdir to make a directory and cp to copy a file from one location to another. To write to a file, open it in binary write mode ('wb') with the open method and call write:

with hdfs.open('/tmp/file1.txt', 'wb') as f:
    f.write(b'You are Awesome!')

The data can then be read back from the file. The open method defaults to binary read mode, so read returns bytes:

with hdfs.open('/tmp/file1.txt') as f:
    print(f.read())
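
The individual calls shown in this section can be combined into one small helper. The following is only a minimal sketch: the function name round_trip and the /tmp/demo path are hypothetical, chosen for illustration, and the helper assumes it is handed an object that behaves like HDFileSystem, meaning it provides exists, mkdir, open, and ls. Taking the filesystem as a parameter keeps the helper independent of any particular cluster.

```python
def round_trip(fs, directory, filename, payload):
    """Write payload to directory/filename on fs, then read it back.

    fs is assumed to behave like hdfs3.HDFileSystem, i.e. to provide
    exists(), mkdir(), open() and ls(). The function name and its
    arguments are illustrative, not part of hdfs3.
    """
    # Create the target directory only if it is not already there.
    if not fs.exists(directory):
        fs.mkdir(directory)

    path = directory.rstrip('/') + '/' + filename

    # Files are opened in binary mode, so the payload must be bytes.
    with fs.open(path, 'wb') as f:
        f.write(payload)

    # Read the same file back in binary read mode.
    with fs.open(path, 'rb') as f:
        data = f.read()

    # Return the path, the bytes read back, and the directory listing.
    return path, data, fs.ls(directory)


# Usage against a live cluster (needs hdfs3 and a reachable NameNode;
# host and port are the values used earlier in this section):
#     from hdfs3 import HDFileSystem
#     hdfs = HDFileSystem(host='localhost', port=8020)
#     print(round_trip(hdfs, '/tmp/demo', 'file1.txt', b'You are Awesome!'))
```

If the bytes returned by the helper differ from the payload that was passed in, the write or the read did not go through as expected.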

You can learn more about hdfs3 from its documentation: https://media.readthedocs.org/pdf/hdfs3/latest/hdfs3.pdf
