Using hdfs3 with HDFS

hdfs3 is a lightweight Python wrapper around the C/C++ libhdfs3 library. It lets us work with HDFS natively from Python. To start, we connect to the HDFS NameNode using the HDFileSystem class:

from hdfs3 import HDFileSystem
hdfs = HDFileSystem(host='localhost', port=8020)

This automatically establishes a connection with the NameNode. Now, we can access a directory listing using the following:

print(hdfs.ls('/tmp')) 
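
When names alone are not enough, ls can also return metadata for each entry. The following is a minimal sketch, assuming that ls accepts detail=True and returns one dictionary per entry with 'name', 'kind', and 'size' keys; check this against your installed hdfs3 version. format_listing is a hypothetical helper written for this illustration, not part of hdfs3, and the sample data stands in for a real cluster response:

```python
def format_listing(entries):
    """Turn hdfs3-style detail dicts into aligned text lines.

    Assumes each entry has 'name', 'kind' ('file' or 'directory'),
    and 'size' (in bytes) keys; adjust if your version differs.
    """
    lines = []
    for entry in sorted(entries, key=lambda e: e['name']):
        marker = 'd' if entry['kind'] == 'directory' else '-'
        lines.append('{} {:>10} {}'.format(marker, entry['size'], entry['name']))
    return lines


# With a live connection this would be:
#     for line in format_listing(hdfs.ls('/tmp', detail=True)):
#         print(line)
# Sample data standing in for a real cluster response:
sample = [
    {'name': '/tmp/logs', 'kind': 'directory', 'size': 0},
    {'name': '/tmp/file1.txt', 'kind': 'file', 'size': 16},
]
for line in format_listing(sample):
    print(line)
```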

The ls call returns the files and directories under /tmp. You can use methods such as mkdir to make a directory and cp to copy a file from one location to another. To write to a file, we first open it in binary write mode ('wb') using the open method and then call write:

with hdfs.open('/tmp/file1.txt', 'wb') as f:
    f.write(b'You are Awesome!')

Data can be read from the file:

with hdfs.open('/tmp/file1.txt') as f:
    print(f.read())
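
hdfs3 file objects work with bytes, which is why the write uses a bytes literal and the read prints a bytes object. The following is a minimal sketch of helpers that handle the encoding for us. write_text and read_text are hypothetical names, not hdfs3 functions, and the only filesystem method they rely on is open(path, mode). InMemoryFS is a stand-in written for this illustration so that the sketch runs without a cluster:

```python
import io


def write_text(fs, path, text, encoding='utf-8'):
    """Encode text and write it to path in binary mode."""
    with fs.open(path, 'wb') as f:
        f.write(text.encode(encoding))


def read_text(fs, path, encoding='utf-8'):
    """Read the whole file at path and decode it to str."""
    with fs.open(path, 'rb') as f:
        return f.read().decode(encoding)


class InMemoryFS:
    """Hypothetical stand-in for HDFileSystem; NOT part of hdfs3.

    It mimics only open(path, mode) so the helpers can be tried locally.
    """

    def __init__(self):
        self._files = {}

    def open(self, path, mode='rb'):
        if mode == 'wb':
            return _Writer(self._files, path)
        return io.BytesIO(self._files[path])


class _Writer(io.BytesIO):
    """Buffer that stores its contents in the owning dict on close."""

    def __init__(self, files, path):
        super().__init__()
        self._files, self._path = files, path

    def close(self):
        if not self.closed:
            self._files[self._path] = self.getvalue()
        super().close()


fs = InMemoryFS()  # with a real cluster, pass the hdfs object instead
write_text(fs, '/tmp/file1.txt', 'You are Awesome!')
print(read_text(fs, '/tmp/file1.txt'))
```

Because the helpers only call open, the same functions should work unchanged when given the hdfs object created earlier.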

You can learn more about hdfs3 from its documentation: https://media.readthedocs.org/pdf/hdfs3/latest/hdfs3.pdf
