
Submitting Spark applications on YARN

To launch Spark applications on YARN, the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable must be set to point to the directory that contains the client-side configuration files for the Hadoop cluster. These configurations are needed to connect to the YARN ResourceManager and to write to HDFS. They are also distributed to the YARN cluster so that all the containers used by the Spark application share the same configuration. To launch Spark applications on YARN, two deployment modes are available:
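As a minimal sketch, setting this up before submitting might look as follows; the configuration path shown is a common default, not a requirement, so adjust it to your cluster layout:

```shell
# Point Spark at the Hadoop client-side configuration directory
# (/etc/hadoop/conf is a typical location; yours may differ).
export HADOOP_CONF_DIR=/etc/hadoop/conf

# spark-submit and the YARN client read files such as yarn-site.xml and
# core-site.xml from this directory to locate the ResourceManager and HDFS.
echo "$HADOOP_CONF_DIR"
```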

  • Cluster mode: In this case, the Spark driver runs inside an application master process that's managed by YARN on the cluster. The client can finish its execution after initiating the application.
  • Client mode: In this case, the driver runs in the client process itself. The application master is used for the sole purpose of requesting resources from YARN.

Unlike the other modes, in which the master's address is specified in the master parameter, in YARN mode, the ResourceManager's address is retrieved from the Hadoop configuration. Therefore, the master parameter value is always yarn.

You can use the following command to launch a Spark application in cluster mode:

$SPARK_HOME/bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]

In cluster mode, since the driver runs on a different machine than the client, the SparkContext.addJar method doesn't work with files that are local to the client. The only choice is to include them using the --jars option in the launch command.
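For illustration, a cluster-mode launch command that ships two client-local dependency jars via --jars might be assembled as below; the jar paths and class name are hypothetical:

```shell
# Comma-separated list of client-local dependency jars to ship to the
# YARN containers (hypothetical paths).
EXTRA_JARS="/opt/libs/dep1.jar,/opt/libs/dep2.jar"

# Assemble the cluster-mode submit command; inside double quotes the
# backslash-newlines are line continuations, so this is one command line.
SUBMIT_CMD="$SPARK_HOME/bin/spark-submit \
  --class path.to.your.Class \
  --master yarn --deploy-mode cluster \
  --jars $EXTRA_JARS \
  app.jar"

# Printed rather than executed here, since running it needs a live YARN cluster.
echo "$SUBMIT_CMD"
```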

Launching a Spark application in client mode works the same way; the only change is setting the deploy-mode option value to client instead of cluster.
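Under that assumption, the client-mode counterpart of the earlier command would look like the following sketch (the application class is again hypothetical):

```shell
# Same submission as before, but the driver stays in the client process;
# only the --deploy-mode value differs from the cluster-mode command.
SUBMIT_CMD="$SPARK_HOME/bin/spark-submit \
  --class path.to.your.Class \
  --master yarn --deploy-mode client \
  app.jar"
echo "$SUBMIT_CMD"
```

Because the driver runs locally in this mode, its console output (including logs and any results printed by the application) appears directly in the client terminal.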
