
HDFS health and FSCK

A healthy filesystem is essential for reliable data retrieval and good performance. In a distributed system this matters even more: HDFS must stay healthy to keep blocks fully replicated and to stream data blocks from many nodes in near-parallel.

In this recipe, we will see how to check the health of the filesystem and do repairs, if any are needed.

Getting ready

Make sure you have a running cluster that has been up for a few days and holds data. The commands also work on a new cluster, but for this lab a cluster with a large dataset gives more insight.

How to do it...

  1. Connect over ssh to the NameNode, master1.cyrus.com, and switch to the hadoop user.
  2. To check the HDFS root filesystem, execute the hdfs fsck / command, as shown in the following screenshot:
    [Screenshot: output of hdfs fsck /]
  3. We can also check the status of just one file instead of the entire filesystem, as shown in the following screenshot:
    [Screenshot: fsck run against a single file]
  4. The output of the fsck command will show the blocks for a file, the replication status, whether blocks are corrupted, and many more details, as shown in the following screenshot:
    [Screenshot: fsck output showing blocks and replication status]
  5. We can also see how the blocks of a file are laid out across the cluster, using the commands shown in the following screenshot:
    [Screenshot: block layout of a file across the cluster]
  6. In the cluster named cyrus, you can see that there are some corrupt blocks. We can simulate this by manually deleting a block of a file on the underlying filesystem. Each HDFS block is stored as an ordinary file on the DataNode's local filesystem, such as ext4.
    [Screenshot: corrupt blocks reported in the cyrus cluster]
  7. Corrupt blocks can be dealt with by deleting them. For an under-replicated block, we can use the hdfs dfs -setrep 2 /input/new.txt command to set that file to the desired replication factor. To set many files to a given replication factor, loop through the list and run setrep on each one.
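
The commands behind steps 2 to 7 can be sketched as shell functions. This is a minimal sketch under stated assumptions: the function names are hypothetical, the path /input/new.txt and the factor 2 come from step 7, and the -files, -blocks, and -locations flags are one plausible way to get the per-block detail described in steps 4 and 5, not necessarily the exact commands the recipe ran.

```shell
#!/bin/sh
# Hypothetical helper names; only the hdfs commands themselves are Hadoop CLI.

# Step 2: check the whole namespace, starting at the root.
check_root() {
    hdfs fsck /
}

# Steps 3 to 5: check one path and print its files, its blocks,
# and the DataNodes that hold each replica.
check_path() {
    hdfs fsck "$1" -files -blocks -locations
}

# Step 7: set the replication factor ($1) for every path read
# from standard input, one path per line.
set_replication() {
    factor="$1"
    while IFS= read -r path; do
        # Skip empty lines in the list.
        [ -n "$path" ] || continue
        # Redirect stdin so the command cannot consume the path list.
        hdfs dfs -setrep "$factor" "$path" < /dev/null
    done
}
```

For example, `check_path /input/new.txt` inspects a single file, and `printf '%s\n' /input/new.txt | set_replication 2` applies step 7 to a list of paths. Reading the list from standard input keeps the loop usable with a file of paths (`set_replication 2 < paths.txt`, where paths.txt is a hypothetical file name).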

How it works...

The hdfs fsck / command is similar to the Linux fsck command. Unlike its Linux counterpart, however, it does not repair the filesystem automatically; repairs need manual intervention. To see the options available for this command, use the hdfs fsck -help command.
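
Because fsck only reports problems, a small wrapper can turn its verdict into something a script can act on. The following is a sketch, not part of the recipe: the function names are hypothetical, and it assumes the report ends with a summary line of the form "The filesystem under path '/' is HEALTHY", so check that against the output of your Hadoop version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the fsck report on standard
# input contains a HEALTHY verdict (assumed summary-line format).
fsck_is_healthy() {
    grep "is HEALTHY" > /dev/null
}

# Hypothetical helper: report on a path (default /) and, if it is not
# healthy, list the files that have corrupt blocks.
report_health() {
    target="${1:-/}"
    if hdfs fsck "$target" | fsck_is_healthy; then
        echo "$target: healthy"
    else
        echo "$target: needs attention"
        hdfs fsck "$target" -list-corruptfileblocks
    fi
}

# The repair itself stays a manual decision, for example:
#   hdfs fsck / -move     (move corrupted files to /lost+found)
#   hdfs fsck / -delete   (delete corrupted files)
```

Running `report_health /input` prints a one-line verdict for that path. The destructive -move and -delete options are left as comments on purpose, so that nothing is removed without an operator deciding to do so.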

See also

  • The Configuring rack awareness recipe