Time for action – formatting the NameNode

Before starting Hadoop in either pseudo-distributed or fully distributed mode for the first time, we need to format the HDFS filesystem that it will use. Type the following:

$ hadoop namenode -format

The output of this should look like the following:

$ hadoop namenode -format
12/10/26 22:45:25 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = vm193/10.0.0.193
STARTUP_MSG: args = [-format]

12/10/26 22:45:25 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
12/10/26 22:45:25 INFO namenode.FSNamesystem: supergroup=supergroup
12/10/26 22:45:25 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/10/26 22:45:25 INFO common.Storage: Image file of size 96 saved in 0 seconds.
12/10/26 22:45:25 INFO common.Storage: Storage directory /var/lib/hadoop-hadoop/dfs/name has been successfully formatted.
12/10/26 22:45:26 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at vm193/10.0.0.193
$ 

What just happened?

This output is not very exciting because the step is only an enabler for our future use of HDFS. However, it does help us think of HDFS as a filesystem: just like any new storage device on any operating system, it must be formatted before it can be used. Initially there is a default location for the filesystem data, but no actual data for the equivalents of filesystem indexes.

Note

Do this every time!

If your experience with Hadoop has been similar to the one I have had, there will be a series of simple mistakes that are frequently made when setting up new installations. It is very easy to forget about the formatting of the NameNode and then get a cascade of failure messages when the first Hadoop activity is tried.

But do it only once!

The command to format the NameNode can be executed multiple times, but doing so destroys all existing filesystem data. It can only be executed when the Hadoop cluster is shut down. Sometimes you will want to do this, but in most other cases it is a quick way to irrevocably delete every piece of data on HDFS; it does not take much longer on large clusters. So be careful!
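
One way to guard against an accidental second format is to wrap the command in a small check. The following is only a sketch under stated assumptions: the function names is_formatted and format_once are hypothetical, and the check assumes that a formatted NameNode storage directory contains a current/VERSION file, which you should confirm on your own installation before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given NameNode storage directory
# looks formatted already.
# Assumption: a formatted directory contains the file current/VERSION.
is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Hypothetical wrapper: format the NameNode only if the storage directory
# does not look formatted; otherwise print a message and return non-zero.
format_once() {
    name_dir="$1"
    if is_formatted "$name_dir"; then
        echo "already formatted: $name_dir; skipping"
        return 1
    fi
    # Same command as used earlier in this section.
    hadoop namenode -format
}
```

With the storage directory reported in the output above, this would be invoked as format_once /var/lib/hadoop-hadoop/dfs/name. The wrapper does not check whether the cluster is shut down; that remains your responsibility.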

Starting and using Hadoop

After all that configuration and setup, let's now start our cluster and actually do something with it.