Spark architecture overview

Spark follows a master-slave architecture, which allows it to scale on demand. Spark's architecture has two main components:

  • Driver Program: A driver program is where a user writes Spark code using the Scala, Java, Python, or R APIs. It is responsible for launching various parallel operations on the cluster.
  • Executor: An executor is a Java Virtual Machine (JVM) process that runs on a worker node of the cluster. Executors provide the compute resources for running the tasks launched by the driver program.

As soon as a Spark job is submitted, the driver program launches various operations on each executor. The driver and executors together make up a Spark application.

The following diagram demonstrates the relationships between the Driver, Workers, and Executors. As the first step, the driver process parses the user code (the Spark program) and creates executors on the worker nodes. The driver process not only forks the executors on the worker machines, but also sends them the tasks that run the application in parallel.

Once the computation is completed, the output is either sent back to the driver program or saved to the file system:

Driver, Workers, and Executors