Spark architecture overview

Spark follows a master-slave architecture, which allows it to scale on demand. Spark's architecture has two main components:

  • Driver Program: A driver program is where a user writes Spark code using the Scala, Java, Python, or R APIs. It is responsible for launching various parallel operations on the cluster.
  • Executor: An executor is a Java Virtual Machine (JVM) process that runs on a worker node of the cluster. Executors provide the compute resources for running the tasks launched by the driver program.

As soon as a Spark job is submitted, the driver program launches various operations on each executor. The driver and executors together make up a Spark application.

The following diagram demonstrates the relationships between the Driver, Workers, and Executors. As the first step, the driver process parses the user code (the Spark program) and creates executors on the worker nodes. The driver process not only forks the executors on the worker machines, but also sends them the tasks that run the application in parallel.

Once the computation is completed, the output is either sent back to the driver program or saved to the file system:

Driver, Workers, and Executors