Submitting Spark applications on YARN

To launch Spark applications on YARN, the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable must be set to point to the directory that contains the client-side configuration files for the Hadoop cluster (a minimal example of setting this variable follows the list below). These configurations are needed to connect to the YARN ResourceManager and to write to HDFS. This configuration is distributed to the YARN cluster so that all of the containers used by the Spark application use the same configuration. To launch Spark applications on YARN, two deployment modes are available:

  • Cluster mode: In this case, the Spark driver runs inside an application master process that's managed by YARN on the cluster. The client can finish its execution after initiating the application.
  • Client mode: In this case, the driver runs in the same process as the client. The application master is used for the sole purpose of requesting resources from YARN.
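
As a minimal sketch, the environment variable might be exported like this before invoking spark-submit; the path shown is only an assumption and depends on where your Hadoop client configuration actually lives:

export HADOOP_CONF_DIR=/etc/hadoop/conf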

Unlike the other modes, in which the master's address is specified in the master parameter, in YARN mode, the ResourceManager's address is retrieved from the Hadoop configuration. Therefore, the master parameter value is always yarn.

You can use the following command to launch a Spark application in cluster mode:

$SPARK_HOME/bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]
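
For instance, a filled-in invocation might look like the following sketch; the class name, resource settings, and JAR name are purely hypothetical placeholders:

$SPARK_HOME/bin/spark-submit --class org.example.MyApp --master yarn --deploy-mode cluster --num-executors 4 --executor-memory 2g my-app.jar arg1 arg2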

In cluster mode, since the driver runs on a different machine than the client, the SparkContext.addJar method doesn't work with files that are local to the client. The only choice is to include them using the --jars option in the launch command.
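
A brief sketch of this follows; the dependency paths and the application JAR name are assumptions made only for illustration:

$SPARK_HOME/bin/spark-submit --class org.example.MyApp --master yarn --deploy-mode cluster --jars /path/to/dep1.jar,/path/to/dep2.jar my-app.jar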

Launching a Spark application in client mode works the same way; the only difference is that the --deploy-mode option value changes from cluster to client, as shown in the sketch below.
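
Again, the class and JAR names here are hypothetical placeholders:

$SPARK_HOME/bin/spark-submit --class org.example.MyApp --master yarn --deploy-mode client my-app.jar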
