- Spark Cookbook
- Rishi Yadav
- 2021-07-16 13:43:59
Deploying on a cluster with YARN
Yet Another Resource Negotiator (YARN) is Hadoop's compute framework that runs on top of HDFS, which is Hadoop's storage layer.
YARN follows the master-slave architecture. The master daemon is called ResourceManager and the slave daemon is called NodeManager. Besides this, application life cycle management is done by ApplicationMaster, which can be spawned on any slave node and is alive for the lifetime of an application.
When Spark is run on YARN, ResourceManager performs the role of Spark master and NodeManagers work as executor nodes.
While running Spark with YARN, each Spark executor runs as a YARN container.
Getting ready
Running Spark on YARN requires a binary distribution of Spark that has YARN support. In both Spark installation recipes, we have taken care of it.
How to do it...
- To run Spark on YARN, the first step is to set the configuration:
HADOOP_CONF_DIR: to write to HDFS
YARN_CONF_DIR: to connect to YARN ResourceManager
$ cd /opt/infoobjects/spark/conf (or /etc/spark)
$ sudo vi spark-env.sh
export HADOOP_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop
export YARN_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop
You can see this in the following screenshot:
- The following command launches Spark on YARN in the yarn-client mode:
$ spark-submit --class path.to.your.Class --master yarn-client [options] <app jar> [app options]
Here's an example:
$ spark-submit --class com.infoobjects.TwitterFireHose --master yarn-client --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 target/sparkio.jar 10
- The following command launches Spark shell in the yarn-client mode:
$ spark-shell --master yarn-client
- The command to launch in the yarn-cluster mode is as follows:
$ spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]
Here's an example:
$ spark-submit --class com.infoobjects.TwitterFireHose --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 target/sparkio.jar 10
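The steps above can be sketched as a small Python helper that checks the two configuration variables and assembles the spark-submit argument list. This is only an illustration: the function names missing_conf_dirs and build_submit_command are hypothetical, the defaults simply mirror the example command in this recipe, and the script prints the command rather than running it.

```python
import os
import shlex


def missing_conf_dirs(env):
    """Return the required variables that are unset or do not point to a directory."""
    required = ("HADOOP_CONF_DIR", "YARN_CONF_DIR")
    return [name for name in required if not os.path.isdir(env.get(name, ""))]


def build_submit_command(main_class, app_jar, master="yarn-client",
                         num_executors=3, driver_memory="4g",
                         executor_memory="2g", executor_cores=1, app_args=()):
    """Assemble a spark-submit argument list shaped like the recipe's examples."""
    return [
        "spark-submit",
        "--class", main_class,
        "--master", master,
        "--num-executors", str(num_executors),
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        app_jar,
        *[str(arg) for arg in app_args],
    ]


if __name__ == "__main__":
    problems = missing_conf_dirs(os.environ)
    if problems:
        print("Set these in spark-env.sh first:", ", ".join(problems))
    command = build_submit_command(
        "com.infoobjects.TwitterFireHose", "target/sparkio.jar",
        master="yarn-cluster", app_args=[10],
    )
    # Printed only; pass `command` to subprocess.run to actually submit the job.
    print(shlex.join(command))
```

Building the command as a list, rather than as one string, avoids quoting problems if a path or application argument contains spaces.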
How it works…
Spark applications on YARN run in two modes:
- yarn-client: Spark Driver runs in the client process outside of the YARN cluster, and ApplicationMaster is only used to negotiate resources from ResourceManager
- yarn-cluster: Spark Driver runs in ApplicationMaster spawned by NodeManager on a slave node
The yarn-cluster mode is recommended for production deployments, while the yarn-client mode is good for development and debugging when you would like to see immediate output. There is no need to specify the Spark master address in either mode as it's picked from the Hadoop configuration, and the master parameter is either yarn-client or yarn-cluster.
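As a rough illustration of the two modes, the following Python sketch records where the driver runs for each master value and translates it into the form used by newer Spark releases. Spark 2.0 and later deprecate the yarn-client and yarn-cluster master strings in favor of --master yarn combined with --deploy-mode; the names DRIVER_LOCATION and master_args are hypothetical and not part of Spark.

```python
# Where the Spark Driver runs for each master value described in this recipe.
DRIVER_LOCATION = {
    "yarn-client": "client process, outside the YARN cluster",
    "yarn-cluster": "ApplicationMaster, inside the YARN cluster",
}


def master_args(mode):
    """Translate a legacy master value into --master yarn plus --deploy-mode."""
    if mode not in DRIVER_LOCATION:
        raise ValueError(f"unknown YARN mode: {mode!r}")
    # "yarn-client" -> "client", "yarn-cluster" -> "cluster"
    deploy_mode = mode.split("-", 1)[1]
    return ["--master", "yarn", "--deploy-mode", deploy_mode]


if __name__ == "__main__":
    for mode, location in DRIVER_LOCATION.items():
        print(mode, "->", " ".join(master_args(mode)), "| driver in", location)
```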
The following figure shows how Spark is run with YARN in the client mode:

The following figure shows how Spark is run with YARN in the cluster mode:

In the YARN mode, the following configuration parameters can be set:
- --num-executors: Configure how many executors will be allocated
- --executor-memory: RAM per executor
- --executor-cores: CPU cores per executor
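To see how these three parameters combine, here is a minimal Python sketch that totals the executor memory and cores an application asks for. It assumes memory sizes are written with an m or g suffix, as in the examples in this recipe, and it ignores the driver and the per-container memory overhead that YARN adds on top of the requested executor memory. The function names are hypothetical.

```python
def parse_memory_mb(value):
    """Convert a size such as '2g' or '512m' to megabytes."""
    units = {"m": 1, "g": 1024}
    number, unit = value[:-1], value[-1].lower()
    if unit not in units:
        raise ValueError(f"unsupported memory unit in {value!r}")
    return int(number) * units[unit]


def requested_resources(num_executors, executor_memory, executor_cores):
    """Total executor memory (MB) and cores requested, excluding driver and overhead."""
    return {
        "memory_mb": num_executors * parse_memory_mb(executor_memory),
        "cores": num_executors * executor_cores,
    }


if __name__ == "__main__":
    # Values from the example command: 3 executors, 2g each, 1 core each.
    print(requested_resources(3, "2g", 1))  # {'memory_mb': 6144, 'cores': 3}
```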