- Hands-On Deep Learning with Apache Spark
- Guglielmo Iozzia
- 319字
- 2021-07-02 13:34:22
Cluster mode using different managers
The following diagram shows how Spark applications run on a cluster. They are independent sets of processes that are coordinated by the SparkContext object in the Driver Program. SparkContext connects to a Cluster Manager, which is responsible for allocating resources across applications. Once the SparkContext is connected, Spark gets executors across cluster nodes.
Executors are processes that execute computations and store data for a given Spark application. SparkContext sends the application code (which could be a JAR file for Scala or .py files for Python) to the executors. Finally, it sends the tasks to run to the executors:

To isolate applications from each other, every Spark application receives its own executor processes. They stay alive for the duration of the whole application and run tasks in multithreading mode. The downside to this is that it isn't possible to share data across different Spark applications – to share it, data needs to be persisted to an external storage system.
Spark supports different cluster managers, but it is agnostic to the underlying type.
The driver program, at execution time, must be network addressable from the worker nodes because it has to listen for and accept incoming connections from its executors. Because it schedules tasks on the cluster, it should be executed close to the worker nodes, on the same local area network (if possible).
The following are the cluster managers that are currently supported in Spark:
- Standalone: A simple cluster manager that makes it easy to set up a cluster. It is included with Spark.
- Apache Mesos: An open source project that's used to manage computer clusters, and was developed at the University of California, Berkeley.
- Hadoop YARN: The resource manager available in Hadoop starting from release 2.
- Kubernetes: An open source platform for providing a container-centric infrastructure. Kubernetes support in Spark is still experimental, so it's probably not ready for production yet.
- 大數(shù)據(jù)技術(shù)與應(yīng)用基礎(chǔ)
- 自動(dòng)控制工程設(shè)計(jì)入門
- PowerShell 3.0 Advanced Administration Handbook
- 機(jī)器自動(dòng)化控制器原理與應(yīng)用
- 新手學(xué)電腦快速入門
- AutoCAD 2012中文版繪圖設(shè)計(jì)高手速成
- 高維聚類知識(shí)發(fā)現(xiàn)關(guān)鍵技術(shù)研究及應(yīng)用
- 工業(yè)機(jī)器人應(yīng)用案例集錦
- 單片機(jī)技術(shù)一學(xué)就會(huì)
- 智能生產(chǎn)線的重構(gòu)方法
- Photoshop行業(yè)應(yīng)用基礎(chǔ)
- Linux Shell編程從初學(xué)到精通
- INSTANT Puppet 3 Starter
- ASP.NET 2.0 Web開(kāi)發(fā)入門指南
- 中老年人學(xué)電腦與上網(wǎng)