官术网_书友最值得收藏!

YARN cluster mode

YARN (http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html), which was introduced in Apache Hadoop 2.0, brought significant improvements in terms of scalability, high availability, and support for different paradigms. In the Hadoop version 1 MapReduce framework, job execution was controlled by types of processes—a single master process called JobTracker coordinates all the jobs running on the cluster and assigns map and reduce tasks to run on the TaskTrackers, which are a number of subordinate processes running assigned tasks and periodically reporting the progress to the JobTracker. Having a single JobTracker was a scalability bottleneck. The maximum cluster size was a little more than 4,000 nodes, with the number of concurrent tasks limited to 40,000. Furthermore, the JobTracker was a single point of failure and the only available programming model was MapReduce

The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling or monitoring into separate daemons. The idea is to have a global ResourceManager and per-application ApplicationMaster (App Mstr). An application is either a single job or a DAG of jobs. The following is a diagram of YARN's architecture:

Figure 1.15

The ResourceManager and the NodeManager form the YARN framework. The ResourceManager decides on resource usage across all the running applications, while the NodeManager is an agent running on any machine in the cluster and is responsible for the containers by monitoring their resource usage (including CPU and memory) and reporting to the ResourceManager. The ResourceManager consists of two components – the scheduler and the ApplicationsManager. The scheduler is the component that's responsible for allocating resources to the various applications running, and it doesn't perform any monitoring of applications' statuses, nor offer guarantees about restarting any failed tasks. It performs scheduling based on an application's resource requirements.

The ApplicationsManager accepts job submissions and provides a service to restart the App Mstr container on any failure. The per-application App Mstr is responsible for negotiating the appropriate resource containers from the scheduler and monitoring their status and progress. YARN, by its nature, is a general scheduler, so support for non-MapReduce jobs (such as Spark jobs) is available for Hadoop clusters.

主站蜘蛛池模板: 巫山县| 越西县| 寻乌县| 桂林市| 井冈山市| 云和县| 广灵县| 民乐县| 渝中区| 康马县| 上高县| 宝山区| 合水县| 满洲里市| 许昌县| 延边| 宁城县| 洛宁县| 昌宁县| 乌鲁木齐县| 县级市| 炉霍县| 德钦县| 本溪| 财经| 普洱| 景泰县| 平罗县| 牟定县| 娄底市| 盘锦市| 苏尼特左旗| 巴里| 芜湖市| 东台市| 崇明县| 湟源县| 色达县| 黑河市| 台南县| 兴文县|