- Hands-On Artificial Intelligence for IoT
- Amita Kapoor
Using PyArrow's filesystem interface for HDFS
PyArrow has a C++-based interface for HDFS. By default, it uses libhdfs, a JNI-based wrapper around the Java Hadoop client. Alternatively, we can use libhdfs3, a C++ library for HDFS. We connect to the NameNode using hdfs.connect:
import pyarrow as pa
# Connect to the NameNode; the default driver is the JNI-based libhdfs
hdfs = pa.hdfs.connect(host='hostname', port=8020, driver='libhdfs')
If we change the driver to libhdfs3, we will be using the C++ library for HDFS from Pivotal Labs. Once the connection to the NameNode is made, the filesystem is accessed using the same methods as for hdfs3.
HDFS is preferred when the data is extremely large. It allows us to read and write data in chunks; this is helpful for accessing and processing streaming data. A nice comparison of the three native RPC client interfaces is presented in the following blog post: http://wesmckinney.com/blog/python-hdfs-interfaces/.
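The chunked access pattern mentioned above can be sketched as follows. This is a minimal illustration using an in-memory buffer so it runs without a cluster; with a real deployment you would pass a file handle returned by hdfs.open (the host, port, and path in the comments are placeholders, not values from a specific setup):

```python
import io

CHUNK_SIZE = 1 << 20  # 1 MiB per read


def process_in_chunks(f, chunk_size=CHUNK_SIZE):
    """Read a file-like object in fixed-size chunks.

    Returns the total number of bytes read. In practice, each chunk
    would be parsed or forwarded to a stream processor instead of
    merely counted.
    """
    total = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # empty bytes object signals end of file
            break
        total += len(chunk)
    return total


# Against a real cluster you would write something like:
#   hdfs = pa.hdfs.connect(host='hostname', port=8020)
#   with hdfs.open('/data/sensor_log.bin', 'rb') as f:
#       process_in_chunks(f)
# Here we stand in a BytesIO buffer for the HDFS file handle:
buf = io.BytesIO(b'x' * 2500)
print(process_in_chunks(buf, chunk_size=1024))  # 2500
```

Because the object returned by hdfs.open follows the standard Python file interface, the same loop works unchanged for local files, HDFS files, and any other file-like stream.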