
Running the WordCount program in a distributed cluster environment

This recipe describes how to run a MapReduce computation in a distributed Hadoop v2 cluster.

Getting ready

Start the Hadoop cluster by following the Setting up HDFS recipe or the Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution recipe.

How to do it...

Now let's run the WordCount sample in the distributed Hadoop v2 setup:

  1. Upload the wc-input directory in the source repository to the HDFS filesystem. Alternatively, you can upload any other set of text documents.
    $ hdfs dfs -copyFromLocal wc-input .
    
  2. Execute the WordCount example from the HADOOP_HOME directory:
    $ hadoop jar hcb-c1-samples.jar \
    chapter1.WordCount \
    wc-input wc-output
    
  3. Run the following commands to list the output directory and then look at the results:
    $ hdfs dfs -ls wc-output
    Found 3 items
    -rw-r--r--   1 joe supergroup          0 2013-11-09 09:04 /data/output1/_SUCCESS
    drwxr-xr-x   - joe supergroup          0 2013-11-09 09:04 /data/output1/_logs
    -rw-r--r--   1 joe supergroup       1306 2013-11-09 09:04 /data/output1/part-r-00000
    
    $ hdfs dfs -cat wc-output/part*
    
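Each line of the part-r-* output file contains a word and its count, separated by a tab. The exact words and counts depend entirely on the documents you uploaded to wc-input; the sample lines below are hypothetical, shown only to illustrate how the output can be parsed for post-processing:

```python
# Parse WordCount output lines ("word<TAB>count") into a dict.
# The sample lines are hypothetical; real output depends on your wc-input files.
sample_output = """\
hadoop\t3
mapreduce\t2
yarn\t1
"""

counts = {}
for line in sample_output.splitlines():
    word, count = line.split("\t")
    counts[word] = int(count)

print(counts)  # {'hadoop': 3, 'mapreduce': 2, 'yarn': 1}
```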

How it works...

When we submit a job, YARN schedules a MapReduce ApplicationMaster to coordinate and execute the computation. The ApplicationMaster requests the necessary resources from the ResourceManager and executes the MapReduce computation using the containers it receives in response to that request.

There's more...

You can also see the results of the WordCount application through the HDFS monitoring UI by visiting http://NAMENODE:50070.
