
Configuring YARN history server

Whenever a MapReduce job runs, it launches containers on multiple nodes, and the logs for each container are written only on the node where that container ran. A user who needs details of the job must visit every one of those nodes to fetch the logs, which is very tedious on large clusters.

A better approach is to aggregate the logs at a common location once the job finishes, so that they can be accessed through a web server or other means. To address this, the History Server was introduced in Hadoop. It aggregates logs and provides a Web UI where users can see the logs for all the containers of a job in one place.

Getting ready

You need a running cluster with YARN set up, and you should have completed the previous recipe to make sure that HDFS and YARN are working correctly.

The following steps guide you through the process of setting up the JobHistory server.

How to do it...

  1. Connect to the ResourceManager node, which is the YARN master, and switch to the user hadoop.
  2. Navigate to the directory /opt/cluster/hadoop/etc/hadoop.
  3. Edit the yarn-site.xml file to add the configurations described in the upcoming steps.
  4. First, enable YARN log aggregation using the following parameter:
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
  5. Add the JobHistory server address, which is the RPC configuration parameter.
  6. Add the JobHistory web server address.
  7. Configure a location on HDFS to store the logs.
  8. Copy the yarn-site.xml file to all nodes in the cluster.
  9. Start the history server on the master using the following command:
    $ mr-jobhistory-daemon.sh start historyserver
    
  10. Restart the YARN daemons for the changes to take effect, as shown next:
    $ stop-yarn.sh
    $ start-yarn.sh
    

How it works...

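Let's take a look at what we did throughout this recipe. In steps 1 through 7, we enabled YARN log aggregation, which is disabled by default, and configured the RPC and web server addresses along with the location where the logs are stored. In steps 8 through 10, we distributed the configuration to all nodes, started the history server, and restarted the YARN daemons.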

Whenever a container is cleaned up, a log collection thread wakes up and uploads the logs to the configured location. The log location is similar to a web hosting directory: the history server publishes its contents, which are then accessible through the Web UI. The yarn.log-aggregation.retain-seconds parameter sets the retention period, that is, how long the aggregated logs are kept.
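
As a minimal sketch of the retention setting, the following entry would keep aggregated logs for seven days. The value 604800 (7 x 24 x 3600 seconds) is an assumption chosen for illustration and does not come from this recipe; in stock Hadoop the default is -1, which disables deletion of aggregated logs.

```xml
<!-- Illustrative value only: 604800 seconds = 7 days.
     The Hadoop default is -1, which keeps aggregated logs indefinitely. -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
```

A shorter retention period saves space on HDFS at the cost of losing access to the logs of older jobs, so the right value depends on how long your users typically need to debug completed jobs.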

There's more...

In upcoming releases, a new server called the Timeline server is used to maintain the history logs, and the job history server might be deprecated in the future.
