- Hadoop MapReduce v2 Cookbook(Second Edition)
- Thilina Gunarathne
- 2021-07-23 20:32:53
Setting up Hadoop YARN in a distributed cluster environment using Hadoop v2
A Hadoop v2 YARN deployment consists of the ResourceManager service on the master node and NodeManager services on the slave nodes. The YARN ResourceManager is the service that arbitrates all the resources of the cluster, and a NodeManager is the service that manages the resources of a single node.
Hadoop MapReduce applications can run on YARN using a YARN ApplicationMaster to coordinate each job and a set of resource containers to run the Map and Reduce tasks.
Tip
Installing Hadoop directly using Hadoop release artifacts, as mentioned in this recipe, is recommended for development testing and for advanced use cases only. For regular production clusters, we recommend using a packaged Hadoop distribution as mentioned in the Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution recipe. Packaged Hadoop distributions make it much easier to install, configure, maintain, and update the components of the Hadoop ecosystem.
Getting ready
You can follow this recipe using either a single machine as a pseudo-distributed installation or a multi-machine cluster. If you are using multiple machines, choose one machine as the master node, where you will run the HDFS NameNode and the YARN ResourceManager. If you are using a single machine, use it as both the master node and the slave node.
Set up HDFS by following the Setting up HDFS recipe.
How to do it...
Let's set up Hadoop YARN by setting up the YARN ResourceManager and the NodeManagers.
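- On each machine, create a directory named local inside {HADOOP_DATA_DIR}, which you created in the Setting up HDFS recipe. Change the directory permissions to 755.
- Add the following to the {HADOOP_HOME}/etc/hadoop/mapred-site.xml template (shipped as mapred-site.xml.template in Hadoop 2.x releases) and save it as {HADOOP_HOME}/etc/hadoop/mapred-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
Note that fs.default.name is the deprecated name of fs.defaultFS and is normally set in core-site.xml. For MapReduce jobs to be submitted to YARN rather than run in the default local mode, mapred-site.xml must also set the mapreduce.framework.name property to yarn.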
- Add the following to the {HADOOP_HOME}/etc/hadoop/yarn-site.xml file:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
- Start HDFS using the following command:
$ $HADOOP_HOME/sbin/start-dfs.sh
- Run the following command to start the YARN services:
$ $HADOOP_HOME/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to ………
xxx.xx.xxx.xxx: starting nodemanager, logging to ………
- Run the following command to start the MapReduce JobHistoryServer. This enables the web console for MapReduce job histories:
$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
- Verify the installation by listing the processes through the jps command. The master node will list the NameNode, ResourceManager, and JobHistoryServer services. The slave nodes will list DataNode and NodeManager services:
$ jps
27084 NameNode
2073 JobHistoryServer
2106 Jps
2588
1536 ResourceManager
- Visit the web-based monitoring pages for ResourceManager available at http://{MASTER_NODE}:8088/.
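The jps check in the verification step can also be scripted. The following is a minimal sketch, not part of Hadoop: check_daemons is a hypothetical helper that assumes jps prints one "<pid> <name>" pair per line and reports every expected daemon name that is absent from that output.

```shell
#!/usr/bin/env bash
# check_daemons is a hypothetical helper, not part of Hadoop.
# It reads jps-style output ("<pid> <name>" per line) on stdin and
# prints "missing: <name>" for every expected daemon that is absent.
# Returns 0 when all expected daemons are present, 1 otherwise.
check_daemons() {
  local output name missing=0
  output="$(cat)"
  for name in "$@"; do
    # Match the daemon name against the whole second field.
    if ! printf '%s\n' "$output" |
        awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Intended use (assumes jps is on the PATH):
#   on the master node: jps | check_daemons NameNode ResourceManager JobHistoryServer
#   on a slave node:    jps | check_daemons DataNode NodeManager
```

On a single-machine, pseudo-distributed installation, all five daemon names would be passed in one call, since the same machine acts as both master and slave.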
How it works...
As described in the introduction to the chapter, a Hadoop v2 installation consists of HDFS nodes, a YARN ResourceManager, and worker nodes. When we start HDFS with start-dfs.sh, the script reads the list of slave hosts from the slaves file in the Hadoop configuration directory ({HADOOP_HOME}/etc/hadoop/slaves in Hadoop 2.x) and uses SSH to start a DataNode on each of those hosts. Similarly, start-yarn.sh starts the ResourceManager and then uses the same slaves file to start a NodeManager on each slave host.
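The way the slaves file drives the startup can be illustrated with a simplified sketch. This is not Hadoop's own code: for_each_slave and the REMOTE_SHELL variable are hypothetical names, and the sketch assumes a slaves file with one host per line, where blank lines and # comments are ignored.

```shell
#!/usr/bin/env bash
# for_each_slave is a hypothetical helper that mimics, in simplified form,
# how a start script can use the slaves file; it is not Hadoop's own code.
# Usage: for_each_slave <slaves-file> <command...>
# REMOTE_SHELL is an assumed knob: it defaults to ssh and can be set to
# "echo" for a dry run that only prints what would be executed.
for_each_slave() {
  local slaves_file="$1" host
  shift
  # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
  while IFS= read -r host || [ -n "$host" ]; do
    # Drop trailing comments and all whitespace, then skip empty lines.
    host="${host%%#*}"
    host="$(printf '%s' "$host" | tr -d '[:space:]')"
    if [ -z "$host" ]; then
      continue
    fi
    # Redirect stdin so the remote shell cannot consume the host list.
    ${REMOTE_SHELL:-ssh} "$host" "$@" < /dev/null
  done < "$slaves_file"
}

# Dry run against a slaves file (the path is passed in by the caller):
#   REMOTE_SHELL=echo for_each_slave /path/to/slaves jps
```

Running the dry run first shows which hosts would be contacted; with the default REMOTE_SHELL, the same call requires passwordless SSH access to each listed host.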
See also
The Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution recipe explores how to use a packaged Hadoop distribution to install the Hadoop ecosystem in your cluster.