
Recycling deleted data from trash to HDFS

In this recipe, we are going to see how to recover deleted data from the trash to HDFS.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

To recover accidentally deleted data from HDFS, we first need to enable the trash feature, which is disabled by default in HDFS. To enable it, add the following property to core-site.xml:

<property>
    <name>fs.trash.interval</name>
    <value>120</value>
</property>

Then, restart the HDFS daemons:

/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh

The fs.trash.interval value is given in minutes, so this configuration sets the retention period for deleted files to 120 minutes.
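Before relying on the trash, it can help to confirm which value the cluster actually picked up. The following is a minimal sketch: describe_trash_interval is a hypothetical helper (not part of Hadoop) that assumes an integer value, and the hdfs getconf call only runs if the hdfs command is on the PATH.

```shell
# Hypothetical helper: interpret an fs.trash.interval value (in minutes).
# A value of 0 is the HDFS default and means trash is disabled.
describe_trash_interval() {
  minutes="$1"
  if [ "$minutes" -gt 0 ]; then
    echo "trash enabled: retention interval is $minutes minutes"
  else
    echo "trash disabled: deleted files are removed immediately"
  fi
}

# On a cluster node, pass in the value that HDFS resolved from its configuration.
if command -v hdfs >/dev/null 2>&1; then
  describe_trash_interval "$(hdfs getconf -confKey fs.trash.interval)"
fi
```

With the property shown in this recipe, the helper should report a retention interval of 120 minutes.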

Now, let's try to delete a file from HDFS:

hadoop fs -rm /LICENSE.txt
 15/10/30 10:26:26 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 120 minutes, Emptier interval = 0 minutes.
 Moved: 'hdfs://localhost:9000/LICENSE.txt' to trash at: hdfs://localhost:9000/user/ubuntu/.Trash/Current

We have 120 minutes to recover this file before it is permanently deleted from HDFS. To restore the file to its original location, we can execute the following commands.
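The recovery window can be estimated with simple arithmetic. The sketch below treats the interval as a plain countdown, as this recipe does; minutes_left is a hypothetical helper, and because the actual purge is carried out by a background process on the NameNode, the result should be read as an estimate rather than an exact deadline.

```shell
# Hypothetical helper: estimate the minutes left to recover a trashed file.
# Arguments: deletion time and current time (epoch seconds), interval (minutes).
minutes_left() {
  deleted="$1"
  now="$2"
  interval="$3"
  left=$(( interval - (now - deleted) / 60 ))
  # Never report a negative window.
  if [ "$left" -lt 0 ]; then
    left=0
  fi
  echo "$left"
}

# Example: a file deleted 45 minutes ago with a 120-minute interval.
now=$(date +%s)
minutes_left "$(( now - 45 * 60 ))" "$now" 120
```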

First, let's confirm whether the file exists:

hadoop fs -ls /user/ubuntu/.Trash/Current
 Found 1 items
 -rw-r--r-- 1 ubuntu supergroup 15429 2015-10-30 10:26 /user/ubuntu/.Trash/Current/LICENSE.txt
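As the listing above shows, the trash keeps the file under .Trash/Current with its original path appended. A small sketch of that mapping follows; trash_path is a hypothetical helper, and it assumes the default /user home directory prefix seen in this recipe's output.

```shell
# Hypothetical helper: map an original HDFS path to its expected location
# in the Current trash directory of the given user.
trash_path() {
  user="$1"
  original="$2"
  printf '/user/%s/.Trash/Current%s\n' "$user" "$original"
}

# Where should /LICENSE.txt deleted by user "ubuntu" end up?
trash_path ubuntu /LICENSE.txt
```

For the file deleted in this recipe, this prints /user/ubuntu/.Trash/Current/LICENSE.txt, which matches the path in the listing.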

Now, restore the deleted file or folder; it's better to use the distcp command instead of copying each file one by one:

hadoop distcp hdfs://localhost:9000/user/ubuntu/.Trash/Current/LICENSE.txt hdfs://localhost:9000/

This will start a MapReduce job to restore the data from the trash to its original HDFS folder. Check the HDFS path; the deleted file should be back in its original location.
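The final check can also be scripted. This is a minimal sketch using the existing hadoop fs -test -e command, which exits with status 0 when the path exists; the file_restored function and the HADOOP_CMD variable are hypothetical names introduced here so that the command can be swapped out, for example in a dry run.

```shell
# HADOOP_CMD is a hypothetical override; it defaults to the real hadoop command.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Succeeds (exit status 0) if the given HDFS path exists.
file_restored() {
  "$HADOOP_CMD" fs -test -e "$1"
}

# Report whether the file from this recipe is back at its original path.
if file_restored /LICENSE.txt; then
  echo "restored: /LICENSE.txt"
else
  echo "still missing: /LICENSE.txt"
fi
```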

How it works...

Enabling trash applies a retention policy to deleted files for the configured amount of time. When trash is enabled, HDFS does not delete or move any blocks immediately; it only updates the metadata of the file and its location, so the file appears under the trash directory. This protects us from accidentally deleting files from HDFS. Make sure that trash is enabled before experimenting with this recipe.
