
Cluster mode using different managers

The following diagram shows how Spark applications run on a cluster. Each application runs as an independent set of processes coordinated by the SparkContext object in the Driver Program. SparkContext connects to a Cluster Manager, which is responsible for allocating resources across applications. Once connected, Spark acquires executors on the cluster nodes.

Executors are processes that execute computations and store data for a given Spark application. SparkContext sends the application code (which could be a JAR file for Scala or .py files for Python) to the executors. Finally, it sends the tasks to run to the executors:

Figure 1.12

To isolate applications from each other, every Spark application receives its own executor processes. They stay alive for the duration of the whole application and run tasks in multiple threads. The downside is that data can't be shared directly across different Spark applications; to share it, the data needs to be persisted to an external storage system.

Spark supports several cluster managers and is agnostic to which one is used.

The driver program, at execution time, must be network addressable from the worker nodes because it has to listen for and accept incoming connections from its executors. Because it schedules tasks on the cluster, it should be executed close to the worker nodes, on the same local area network (if possible).
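
As a minimal sketch of the addressability requirement, the following Python helper assembles `--conf` arguments for `spark-submit` that pin the address the driver advertises and the port it listens on. `spark.driver.host` and `spark.driver.port` are Spark configuration properties; the helper name, the IP address, and the port number are assumptions chosen only for illustration.

```python
def driver_network_args(host, port):
    """Build spark-submit --conf arguments that fix the driver's address.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    conf = {
        # Hostname or IP address that executors use to reach the driver.
        "spark.driver.host": host,
        # Port on which the driver listens for executor connections.
        "spark.driver.port": str(port),
    }
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


# Example values only: an address on the same LAN as the worker nodes.
print(" ".join(driver_network_args("10.0.0.5", 40000)))
```

Fixing the port, rather than letting Spark pick one, may make it easier to open the right firewall rule between the worker nodes and the driver.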

The following are the cluster managers that are currently supported in Spark:

  • Standalone: A simple cluster manager that makes it easy to set up a cluster. It is included with Spark.
  • Apache Mesos: An open source project that's used to manage computer clusters, and was developed at the University of California, Berkeley.
  • Hadoop YARN: The resource manager available in Hadoop starting from release 2.
  • Kubernetes: An open source platform for providing a container-centric infrastructure. At the time of writing, Kubernetes support in Spark is still experimental, so it's probably not ready for production yet.
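
An application selects one of these managers through the master URL it passes to Spark, for example with the `--master` option of `spark-submit`. The sketch below shows the URL form for each manager. `master_url` is a hypothetical helper, the hostnames are placeholders, and the fallback ports (7077 for a standalone master, 5050 for a Mesos master) are the usual defaults, which a given cluster may override.

```python
def master_url(manager, host=None, port=None):
    """Return the Spark master URL for a cluster manager.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    manager = manager.lower()
    if manager == "yarn":
        # YARN takes no host or port: the cluster location is read from the
        # Hadoop configuration (HADOOP_CONF_DIR or YARN_CONF_DIR).
        return "yarn"
    if host is None:
        raise ValueError(f"{manager} requires a host")
    if manager == "standalone":
        return f"spark://{host}:{port or 7077}"
    if manager == "mesos":
        return f"mesos://{host}:{port or 5050}"
    if manager == "kubernetes":
        if port is None:
            # The API server port must be given explicitly.
            raise ValueError("kubernetes requires an explicit port")
        return f"k8s://https://{host}:{port}"
    raise ValueError(f"unknown cluster manager: {manager}")


# Placeholder hostnames, for illustration only.
for name, host, port in [
    ("standalone", "master.example.com", None),
    ("mesos", "master.example.com", None),
    ("yarn", None, None),
    ("kubernetes", "k8s.example.com", 6443),
]:
    print(name, "->", master_url(name, host, port))
```

Because Spark is agnostic to the manager type, the application code stays the same in each case; only the master URL changes.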