- Hadoop 2.x Administration Cookbook
- Gurmukh Singh
- 270 words
- 2021-07-09 20:10:30
Distcp usage
In Hadoop we deal with very large datasets, so a simple copy operation may not be the best approach. Imagine copying a 1 TB file from one cluster to another, or to a different path within the same cluster, and the copy times out after 50% of the data has been transferred. In this situation, the copy has to be restarted from the beginning.
Getting ready
This recipe shows the steps needed to copy files within a cluster and across clusters. Ensure that the user has a running cluster with YARN configured to run MapReduce, as discussed in Chapter 1, Hadoop Architecture and Deployment.
For this recipe, no configuration is needed to run Distcp; just make sure HDFS and YARN are up and running.
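One way to confirm that HDFS and YARN are up before submitting a Distcp job is a small pre-flight check. This is a minimal sketch, assuming bash and that the `hdfs` and `yarn` client commands are on the PATH of the node you log in to; the function name `preflight_check` and its return codes are hypothetical. It lists the HDFS root to reach the NameNode and lists NodeManagers to reach the ResourceManager.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm HDFS and YARN respond before
# submitting a distcp job. Assumes the hdfs and yarn clients are on PATH.

preflight_check() {
  local bin
  # Fail early if the Hadoop client tools are not installed on this node.
  for bin in hdfs yarn; do
    if ! command -v "$bin" >/dev/null 2>&1; then
      echo "missing client: $bin" >&2
      return 1
    fi
  done

  # Listing the HDFS root only succeeds if the NameNode is reachable.
  if ! hdfs dfs -ls / >/dev/null 2>&1; then
    echo "HDFS is not reachable" >&2
    return 2
  fi

  # Listing NodeManagers only succeeds if the ResourceManager is reachable.
  if ! yarn node -list >/dev/null 2>&1; then
    echo "YARN is not reachable" >&2
    return 3
  fi

  echo "HDFS and YARN are up"
}

# Example: preflight_check && hadoop distcp /projects /new
```

If the ResourceManager is down, the `yarn` client may keep retrying for several minutes before giving up, so this check can be slow to fail in that case.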
How to do it...
- SSH to the Namenode or the edge node and execute the following command to copy the /projects directory to the /new directory:
$ hadoop distcp /projects /new
- The preceding command submits a map-only MapReduce job to the cluster, in which each map task copies a share of the files. Once the job finishes, the copied data is available at the destination.
- We can perform an incremental copy as well by using the following command:
- The copy can be performed across clusters as a backup, or simply to move data from one cluster to another:
$ hadoop distcp hdfs://master1.cyrus.com:9000/projects hdfs://nn1.cluster1.com:9000/projects
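The copy operations in the steps above can be wrapped in a small script so the same invocation is reused for full and incremental runs. This is a minimal sketch, assuming bash and the standard distcp options `-update` (incremental copy) and `-m` (maximum number of simultaneous maps); the function `distcp_copy` and the `HADOOP_CMD` variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the distcp commands in this recipe.
# HADOOP_CMD is an illustrative assumption: set HADOOP_CMD=echo to print
# the command instead of submitting a MapReduce job.

# distcp_copy MODE SRC DST [MAPS]
#   MODE "full":        plain copy, as in: hadoop distcp /projects /new
#   MODE "incremental": adds -update so files that already match at the
#                       destination are skipped
#   MAPS (optional):    cap on the number of simultaneous maps (-m)
distcp_copy() {
  local mode="$1" src="$2" dst="$3" maps="${4:-}"

  # Rebuild the positional parameters as the distcp argument list.
  set -- distcp
  case "$mode" in
    full) ;;
    incremental) set -- "$@" -update ;;
    *) echo "unknown mode: $mode" >&2; return 64 ;;
  esac
  if [ -n "$maps" ]; then
    set -- "$@" -m "$maps"
  fi
  set -- "$@" "$src" "$dst"

  # Runs "hadoop distcp ..." unless HADOOP_CMD overrides the binary.
  "${HADOOP_CMD:-hadoop}" "$@"
}
```

For example, `HADOOP_CMD=echo distcp_copy incremental /projects /new` prints `distcp -update /projects /new` without touching the cluster, and `distcp_copy full hdfs://master1.cyrus.com:9000/projects hdfs://nn1.cluster1.com:9000/projects 20` runs the cross-cluster copy with at most 20 maps. The Apache DistCp guide notes that with `-update` the contents of the source directory are copied into the destination rather than the directory itself, so check the target layout before mixing full and incremental runs.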