- Hadoop MapReduce v2 Cookbook (Second Edition)
- Thilina Gunarathne
Running the WordCount program in a distributed cluster environment
This recipe describes how to run a MapReduce computation in a distributed Hadoop v2 cluster.
Getting ready
Start the Hadoop cluster by following the Setting up HDFS recipe or the Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution recipe.
How to do it...
Now let's run the WordCount sample in the distributed Hadoop v2 setup:
- Upload the wc-input directory in the source repository to the HDFS filesystem. Alternatively, you can upload any other set of text documents as well:
$ hdfs dfs -copyFromLocal wc-input .
- Execute the WordCount example from the HADOOP_HOME directory:
$ hadoop jar hcb-c1-samples.jar \
    chapter1.WordCount \
    wc-input wc-output
- Run the following commands to list the output directory and then look at the results:
$ hdfs dfs -ls wc-output
Found 3 items
-rw-r--r--   1 joe supergroup       0 2013-11-09 09:04 wc-output/_SUCCESS
drwxr-xr-x   - joe supergroup       0 2013-11-09 09:04 wc-output/_logs
-rw-r--r--   1 joe supergroup    1306 2013-11-09 09:04 wc-output/part-r-00000
$ hdfs dfs -cat wc-output/part*
How it works...
When we submit a job, YARN schedules a MapReduce ApplicationMaster to coordinate and execute the computation. The ApplicationMaster requests the necessary resources from the ResourceManager and executes the MapReduce computation using the containers it receives in response to that resource request.
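The map and reduce phases that those containers carry out can be sketched as a plain-Java simulation of WordCount (the class and method names here are illustrative; this is not the book's chapter1.WordCount source, which runs on the Hadoop APIs):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Map phase: each map container tokenizes its input split and
    // emits a (word, 1) pair for every token.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.split("\\s+")) {
                if (!token.isEmpty()) {
                    pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle + reduce phase: pairs are grouped by key, and each reduce
    // container sums the counts for the keys in its partition.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("hadoop yarn hadoop", "yarn mapreduce");
        System.out.println(reduce(map(input)));  // {hadoop=2, mapreduce=1, yarn=2}
    }
}
```

In the real job, the grouping step is the shuffle performed by the framework between the map and reduce containers; the reducer only ever sees the sorted (word, [1, 1, ...]) groups for its own partition.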
There's more...
You can also view the results of the WordCount application through the HDFS monitoring UI by visiting http://NAMENODE:50070.