- Hadoop Real-World Solutions Cookbook(Second Edition)
- Tanmay Deshpande
Changing the replication factor of an existing file in HDFS
In this recipe, we are going to take a look at how to change the replication factor of a file in HDFS. The default replication factor is 3.
Getting ready
To perform this recipe, you should already have a running Hadoop cluster.
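The recipe assumes the cluster's default replication factor is 3. If you want to confirm what your cluster is actually configured with, one quick check (assuming the hdfs command is on your path) is to print the dfs.replication property:
hdfs getconf -confKey dfs.replication
This is the replication factor applied to newly written files unless a different value is requested at write time.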
How to do it...
Sometimes, there might be a need to increase or decrease the replication factor of a specific file in HDFS. In this case, we'll use the setrep command.
This is how you can use the command:
hadoop fs -setrep [-R] [-w] <noOfReplicas> <path> ...
In this command, the path can either be a file or a directory; if it's a directory, the command recursively sets the replication factor for all the files under it (a directory example follows the options below).
- The -w option requests that the command wait until the replication is complete; this can take a long time for large files
- The -R option is accepted only for backward compatibility and has no effect
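For instance, to set the replication factor to 2 for every file under the /mydir1 directory used in the previous recipe (a sketch; adjust the path to wherever you actually copied your data), you can run:
hadoop fs -setrep -w 2 /mydir1
Because the path is a directory, the new replication factor is applied recursively to all the files inside it, and the -w option makes the command block until the DataNodes finish adjusting the replicas.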
First, let's check the replication factor of the file we copied to HDFS in the previous recipe:
hadoop fs -ls /mydir1/LICENSE.txt
-rw-r--r--   3 ubuntu supergroup   15429 2015-10-29 03:04 /mydir1/LICENSE.txt
Once you list the file, it shows the read/write permissions on this file, and the very next column is the replication factor. The replication factor is set to 3 for our cluster, hence the number shown is 3.
Let's change it to 2 using this command:
hadoop fs -setrep -w 2 /mydir1/LICENSE.txt
It will wait until the replication is adjusted. Once done, you can verify this again by running the ls command:
hadoop fs -ls /mydir1/LICENSE.txt
-rw-r--r--   2 ubuntu supergroup   15429 2015-10-29 03:04 /mydir1/LICENSE.txt
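If you also want to see which DataNodes currently hold the two replicas, a quick check (assuming you have permission to run fsck; the exact output format varies slightly between Hadoop versions) is:
hdfs fsck /mydir1/LICENSE.txt -files -blocks -locations
This lists each block of the file along with the DataNode addresses storing its replicas, so you can confirm that only two copies remain.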
How it works...
Once the setrep command is executed, the NameNode is notified, and it then decides whether replicas need to be added to or removed from certain DataNodes. When you use the -w option, the command can take a long time to return if the file is very large, because it waits until every block reaches the target replication.
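If you would rather not block on a large file, a minimal alternative sketch is to omit -w and monitor the file afterwards (paths reused from the example above):
hadoop fs -setrep 3 /mydir1/LICENSE.txt
hadoop fs -stat %r /mydir1/LICENSE.txt
hdfs fsck /mydir1/LICENSE.txt
Here setrep returns immediately and the NameNode schedules the re-replication in the background; -stat %r prints the target replication factor recorded in the file's metadata, while fsck reports whether any blocks are still under-replicated.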