- Hadoop 2.x Administration Cookbook
- Gurmukh Singh
- 2021-07-09 20:10:31
Configuring YARN history server
Whenever a MapReduce job runs, it launches containers on multiple nodes, and each container's logs are written only on the node where that container ran. A user who needs the details of a job must visit every one of those nodes to fetch the logs, which is tedious on large clusters.
A better approach is to aggregate the logs at a common location once the job finishes, so that they can be accessed through a web server or other means. The History Server was introduced in Hadoop for this purpose: it aggregates logs and provides a Web UI where users can see the logs for all the containers of a job in one place.
Getting ready
You need to have a running cluster with YARN set up and should have completed the previous recipe to make sure the cluster is working fine in terms of HDFS and YARN.
The following steps will guide you through the process of setting up the JobHistory server.
How to do it...
1. Connect to the ResourceManager node, which is the YARN master, and switch to the user hadoop.
2. Navigate to the directory /opt/cluster/hadoop/etc/hadoop.
3. Edit the yarn-site.xml file to add the configurations described in the upcoming steps.
4. First, enable YARN log aggregation using the following parameter:
    <property>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
    </property>
5. Add the JobHistory server address (the RPC configuration parameter).
6. Add the JobHistory web server address.
7. Configure a location to store logs on HDFS.
8. Copy the yarn-site.xml file to all nodes in the cluster.
9. Start the history server on the master using the following command:
    $ mr-jobhistory-daemon.sh start historyserver
10. Restart the YARN daemons for the changes to take effect, as shown next:
    $ stop-yarn.sh
    $ start-yarn.sh
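The recipe spells out only the log-aggregation property; the values for the RPC address, the web server address, and the HDFS log location are not reproduced in the text. The following is a minimal sketch of what those entries might look like, not the book's exact configuration. It assumes the standard Hadoop 2.x property names mapreduce.jobhistory.address, mapreduce.jobhistory.webapp.address, and yarn.nodemanager.remote-app-log-dir, together with their default ports and path. The host name master1.example.com and the helper emit_property are hypothetical.

```shell
#!/bin/sh
# Print one Hadoop <property> element for the given name and value.
emit_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Hypothetical host name; replace it with your own master node.
HISTORY_HOST="master1.example.com"

# Enable log aggregation (the only property given in the recipe text).
emit_property yarn.log-aggregation-enable true
# Assumed: JobHistory RPC address; 10020 is the Hadoop default port.
emit_property mapreduce.jobhistory.address "${HISTORY_HOST}:10020"
# Assumed: JobHistory web server address; 19888 is the Hadoop default port.
emit_property mapreduce.jobhistory.webapp.address "${HISTORY_HOST}:19888"
# Assumed: HDFS directory for aggregated logs; /tmp/logs is the Hadoop default.
emit_property yarn.nodemanager.remote-app-log-dir /tmp/logs
```

The printed elements can be pasted between the <configuration> tags of the file being edited. Generating them with a helper keeps the name and value tags consistent when more properties are added later.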
How it works...
Let's take a look at what we did throughout this recipe. In steps 1 through 7, we enabled YARN log aggregation, which is disabled by default. Then, we configured the RPC and web server ports and also the location where logs will be stored.
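One way to confirm that the configured web server port is answering is to query the history server's REST interface. The sketch below assumes that curl is installed and that the history server exposes the /ws/v1/history/info endpoint of the MapReduce History Server REST API. The function names are hypothetical helpers, and the host and port must match whatever you configured.

```shell
#!/bin/sh
# Build the JobHistory server "info" REST URL from a host and a web port.
history_info_url() {
    printf 'http://%s:%s/ws/v1/history/info\n' "$1" "$2"
}

# Query the endpoint; returns success only if the server answers over HTTP.
# Gives up after 5 seconds so an unreachable host does not block the shell.
check_history_server() {
    curl --silent --fail --max-time 5 "$(history_info_url "$1" "$2")"
}
```

For example, running check_history_server master1.example.com 19888 (a hypothetical host with the default web port) should print a short status document if the daemon is up, and return a non-zero exit code otherwise.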
Whenever a container is cleaned up, a log collection thread wakes up and uploads the logs to the configured location. The log location is similar to a web hosting directory: the history server publishes its contents and makes them accessible through the Web UI. How long the aggregated logs are retained is controlled by the yarn.log-aggregation.retain-seconds parameter.
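Because the retention period is expressed in seconds, it is easy to get the value wrong by a factor of 60 or 24. The sketch below converts a retention period in days to seconds and prints the corresponding property. The seven-day period and the helper days_to_seconds are illustrative assumptions, not values from the recipe.

```shell
#!/bin/sh
# Convert a retention period in whole days to seconds.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

RETAIN_DAYS=7   # hypothetical retention period; choose one that fits your cluster
RETAIN_SECONDS="$(days_to_seconds "$RETAIN_DAYS")"

# Print the property element for yarn-site.xml (7 days is 604800 seconds).
printf '<property>\n  <name>yarn.log-aggregation.retain-seconds</name>\n  <value>%s</value>\n</property>\n' "$RETAIN_SECONDS"
```

Once aggregation is working, the logs of a finished application can also be fetched from the command line with yarn logs -applicationId followed by the application ID, which reads them from the configured HDFS location.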