
Submitting Spark applications on YARN

To launch Spark applications on YARN, the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable must be set to the directory that contains the client-side configuration files for the Hadoop cluster (a minimal setup example follows the list below). These configurations are needed to connect to the YARN ResourceManager and to write to HDFS, and they are distributed to the YARN cluster so that all the containers used by the Spark application share the same configuration. Two deployment modes are available for launching Spark applications on YARN:

  • Cluster mode: In this case, the Spark driver runs inside an application master process that is managed by YARN on the cluster. The client can exit after initiating the application.
  • Client mode: In this case, the driver runs in the same process as the client. The application master is used for the sole purpose of requesting resources from YARN.

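For example, if the client-side Hadoop configuration files mentioned earlier live in /etc/hadoop/conf (a common location, though the exact path depends on your Hadoop distribution and installation), the variable can be exported before invoking spark-submit:

export HADOOP_CONF_DIR=/etc/hadoop/conf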
Unlike the other modes, in which the master's address is specified in the master parameter, in YARN mode, the ResourceManager's address is retrieved from the Hadoop configuration. Therefore, the master parameter value is always yarn.

You can use the following command to launch a Spark application in cluster mode:

$SPARK_HOME/bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]
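As a concrete illustration, the SparkPi example class that ships with Spark can be submitted in cluster mode as follows (the name and location of the examples JAR depend on the Spark version installed, hence the wildcard):

$SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster $SPARK_HOME/examples/jars/spark-examples*.jar 10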

In cluster mode, since the driver runs on a different machine than the client, the SparkContext.addJar method doesn't work with files that are local to the client. The only choice is to include them using the --jars option in the launch command.
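For instance, assuming the application depends on two hypothetical local JARs, mylib.jar and myotherlib.jar, they can be shipped along with the application by passing them as a comma-separated list to --jars:

$SPARK_HOME/bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster --jars /path/to/mylib.jar,/path/to/myotherlib.jar <app jar> [app options]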

Launching a Spark application in client mode happens the same way; the only difference is that the deploy-mode option value needs to change from cluster to client.
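A client-mode submission therefore looks like this (same placeholders as in the cluster-mode command shown earlier):

$SPARK_HOME/bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode client [options] <app jar> [app options]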
