- Apache Spark 2.x for Java Developers
- Sourav Gulati Sumit Kumar
- 2021-07-02 19:01:53
Processing the flow of application submission in YARN
The following steps trace the flow of application submission in YARN:
- Using a client or the APIs, the user submits the application; let's say a Spark job JAR. The ResourceManager (RM), whose primary task is to track and report all the applications running on the Hadoop cluster and the resources available on each Hadoop node, accepts the newly submitted application, subject to the privileges of the user submitting the job.
- The RM then delegates the task to a scheduler, which searches for a container that can host the application-specific ApplicationMaster (AM). Before scheduling or launching an AM, the scheduler takes into consideration parameters such as availability of resources, task priority, and data locality, but it has no role in monitoring or restarting a failed job. It is the responsibility of the RM to keep track of an AM and restart it in a new container if it fails.
- Once the ApplicationMaster is launched, it is the AM's responsibility to negotiate resources with the RM for launching task-specific containers. Negotiations with the RM are typically over:
  - The priority of the tasks at hand.
  - The number of containers to be launched to complete the tasks.
  - The resources needed to execute the tasks, such as memory and CPU cores (vcores).
  - The available nodes where job containers can be launched with the required resources.
Depending on the priority and the availability of resources, the RM grants containers, each identified by a container ID and the hostname of the node on which it can be launched.
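The negotiation described above can be pictured with a small toy model. The sketch below is plain Java with no Hadoop dependency; the class and method names (`ContainerNegotiationSketch`, `Request`, `Node`, `Grant`, `grant`) are hypothetical and are not part of the YARN API. It assumes that a lower priority value is served first and that the first node with enough free memory and vcores wins, which is a simplification of what a real scheduler does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of container negotiation. Hypothetical names; this is not the YARN API. */
final class ContainerNegotiationSketch {

    /** One container the AM asks for. */
    static final class Request {
        final int priority;  // assumption of this sketch: lower value is served first
        final int memoryMb;
        final int vcores;

        Request(int priority, int memoryMb, int vcores) {
            this.priority = priority;
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Free capacity on one node, as tracked by the RM in this model. */
    static final class Node {
        final String hostname;
        int freeMemoryMb;
        int freeVcores;

        Node(String hostname, int freeMemoryMb, int freeVcores) {
            this.hostname = hostname;
            this.freeMemoryMb = freeMemoryMb;
            this.freeVcores = freeVcores;
        }
    }

    /** What the RM hands back: a container ID and the host to launch it on. */
    static final class Grant {
        final String containerId;
        final String hostname;

        Grant(String containerId, String hostname) {
            this.containerId = containerId;
            this.hostname = hostname;
        }
    }

    /** Serves requests in priority order; a request no node can fit is left ungranted. */
    static List<Grant> grant(List<Request> requests, List<Node> nodes) {
        List<Request> ordered = new ArrayList<>(requests);
        ordered.sort(Comparator.comparingInt((Request r) -> r.priority));

        List<Grant> grants = new ArrayList<>();
        int nextId = 1;
        for (Request r : ordered) {
            for (Node n : nodes) {
                if (n.freeMemoryMb >= r.memoryMb && n.freeVcores >= r.vcores) {
                    // Reserve the capacity on this node and record the grant.
                    n.freeMemoryMb -= r.memoryMb;
                    n.freeVcores -= r.vcores;
                    grants.add(new Grant("container_" + nextId++, n.hostname));
                    break;
                }
            }
        }
        return grants;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("node1", 4096, 2));
        nodes.add(new Node("node2", 8192, 4));

        List<Request> requests = new ArrayList<>();
        requests.add(new Request(2, 2048, 1));
        requests.add(new Request(1, 6144, 2));
        requests.add(new Request(1, 16384, 1)); // too large for any node

        for (Grant g : grant(requests, nodes)) {
            System.out.println(g.containerId + " -> " + g.hostname);
        }
    }
}
```

With the sample data in `main`, the priority-1 request for 6144 MB is placed on `node2` first, the oversized request is left ungranted, and the priority-2 request lands on `node1`. A real AM would keep re-requesting the ungranted container until capacity frees up.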
- The AM then requests the NodeManager (NM) on each of the respective hosts to launch the containers with the specific IDs and resource configuration. The NM launches the containers and keeps a watch on the resource usage of each task. If, for example, a container starts utilizing more resources than it has been provisioned, the NM kills that container. This enforcement underpins the job isolation and fair sharing of resources that YARN guarantees; without it, a misbehaving container would impact the execution of other containers. However, it is important to note that the job status and the application status as a whole are managed by the AM. It falls to the AM to continuously watch for delayed or dead containers and, at the same time, to negotiate with the RM for new containers to which the tasks of dead containers can be reassigned.
- The containers executing on different nodes send application-specific statistics to the AM at specific intervals.
- The AM also reports the status of the application directly to the client that submitted the specific application, in our case a Spark job.
- Each NM monitors the resources being utilized by all the containers on its node and sends periodic updates to the RM.
- The AM sends periodic statistics, such as application status, task failures, and log information, to the RM.
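The division of labor between the NM and the AM in the steps above can be sketched in a few lines of plain Java. This is only a toy model under stated assumptions: the names (`ContainerMonitoringSketch`, `RunningContainer`, `enforceLimits`, `tasksToReassign`) are hypothetical and not part of the YARN API, and memory is the only resource checked here for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of NM enforcement and AM recovery. Hypothetical names; not the YARN API. */
final class ContainerMonitoringSketch {

    /** A container as seen by the NM in this model. */
    static final class RunningContainer {
        final String containerId;
        final String task;
        final int allocatedMemoryMb;
        int usedMemoryMb;

        RunningContainer(String containerId, String task, int allocatedMemoryMb, int usedMemoryMb) {
            this.containerId = containerId;
            this.task = task;
            this.allocatedMemoryMb = allocatedMemoryMb;
            this.usedMemoryMb = usedMemoryMb;
        }
    }

    /** NM side: removes every container that uses more memory than it was allocated. */
    static List<RunningContainer> enforceLimits(List<RunningContainer> running) {
        List<RunningContainer> killed = new ArrayList<>();
        for (RunningContainer c : running) {
            if (c.usedMemoryMb > c.allocatedMemoryMb) {
                killed.add(c);
            }
        }
        running.removeAll(killed);
        return killed;
    }

    /** AM side: lists the tasks that need a replacement container from the RM. */
    static List<String> tasksToReassign(List<RunningContainer> killed) {
        List<String> tasks = new ArrayList<>();
        for (RunningContainer c : killed) {
            tasks.add(c.task);
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<RunningContainer> running = new ArrayList<>();
        running.add(new RunningContainer("container_1", "task-0", 2048, 1500));
        running.add(new RunningContainer("container_2", "task-1", 2048, 3000)); // over its limit

        List<RunningContainer> killed = enforceLimits(running);
        System.out.println("Still running: " + running.size());
        System.out.println("Tasks to reassign: " + tasksToReassign(killed));
    }
}
```

The split into two methods mirrors the text: the NM only enforces per-container limits on its own node, while deciding what to do about the lost work stays with the AM.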