
Time for action – formatting the NameNode

Before starting Hadoop in either pseudo-distributed or fully distributed mode for the first time, we need to format the HDFS filesystem that it will use. Type the following:

$ hadoop namenode -format

The output of this should look like the following:

$ hadoop namenode -format
12/10/26 22:45:25 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = vm193/10.0.0.193
STARTUP_MSG: args = [-format]

12/10/26 22:45:25 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
12/10/26 22:45:25 INFO namenode.FSNamesystem: supergroup=supergroup
12/10/26 22:45:25 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/10/26 22:45:25 INFO common.Storage: Image file of size 96 saved in 0 seconds.
12/10/26 22:45:25 INFO common.Storage: Storage directory /var/lib/hadoop-hadoop/dfs/name has been successfully formatted.
12/10/26 22:45:26 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at vm193/10.0.0.193
$ 

What just happened?

This output is not very exciting because the step is only an enabler for our future use of HDFS. However, it does help us think of HDFS as a filesystem: just as with any new storage device on any operating system, we need to format it before we can use it. The same is true for HDFS. Initially there is a default location for the filesystem data, but it holds no actual data for the equivalents of filesystem indexes; formatting creates that initial metadata, which is what the "Image file of size 96 saved" and "Storage directory ... has been successfully formatted" lines report.

Note

Do this every time!

If your experience with Hadoop is anything like mine, there is a series of simple mistakes that are made again and again when setting up new installations. It is very easy to forget to format the NameNode and then get a cascade of failure messages when the first Hadoop activity is tried.

But do it only once!

The command to format the NameNode can be executed multiple times, but doing so destroys all existing filesystem data. It can only be executed when the Hadoop cluster is shut down. Occasionally you will want to do this deliberately, but in most other cases it is a quick way to irrevocably delete every piece of data on HDFS, and it does not take much longer on large clusters. So be careful!
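One way to honor both halves of this note is to wrap the format command in a small guard. The following is only a sketch, not part of Hadoop: the function names needs_format and format_once are hypothetical, the default path is simply the storage directory reported in the sample output above (yours may differ), and it assumes that a formatted NameNode directory contains a current/VERSION file.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the given NameNode
# storage directory does not look formatted yet. Assumption: a formatted
# directory contains a current/VERSION file.
needs_format() {
    [ ! -f "$1/current/VERSION" ]
}

# Hypothetical wrapper: runs the format command only when the directory
# looks unformatted; otherwise it refuses and returns a non-zero status.
# The default path is taken from the sample output; adjust it to your setup.
format_once() {
    name_dir="${1:-/var/lib/hadoop-hadoop/dfs/name}"
    if needs_format "$name_dir"; then
        hadoop namenode -format
    else
        echo "NameNode at $name_dir looks formatted; refusing to format again." >&2
        return 1
    fi
}
```

After sourcing these functions, calling format_once with no arguments checks the default directory; pass a different path as the first argument if your installation stores its NameNode data elsewhere.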

Starting and using Hadoop

After all that configuration and setup, let's now start our cluster and actually do something with it.
