
SparkContext and SparkConf

The starting point of writing any Spark program is SparkContext (or JavaSparkContext in Java). SparkContext is initialized with an instance of a SparkConf object, which contains various Spark cluster-configuration settings (for example, the URL of the master node).

It is the main entry point for Spark functionality. A SparkContext represents a connection to a Spark cluster, and it can be used to create RDDs, accumulators, and broadcast variables on that cluster.

Only one SparkContext may be active per JVM. You must call stop() on the active SparkContext before creating a new one.
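For example, the following minimal sketch (the application names and master URL here are illustrative placeholders) stops the active context before constructing a replacement in the same JVM:

import org.apache.spark.{SparkConf, SparkContext}

// Stop the running context to release the resources it holds...
val first = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("First App"))
first.stop()
// ...and only then create a new one in the same JVM.
val second = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("Second App"))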

Once initialized, we will use the various methods found in the SparkContext object to create and manipulate distributed datasets and shared variables. The Spark shell (available in both Scala and Python, but unfortunately not in Java) takes care of this context initialization for us, but the following lines of code show an example of creating a context running in local mode in Scala:

val conf = new SparkConf()
  .setAppName("Test Spark App")
  .setMaster("local[4]")
val sc = new SparkContext(conf)

This creates a context running in local mode with four threads, with the name of the application set to Test Spark App. If we wish to use the default configuration values, we can instead call the following simple constructor for our SparkContext object, which works in exactly the same way:

val sc = new SparkContext("local[4]", "Test Spark App")
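With a context in hand, we can exercise the primitives mentioned earlier. The following is a minimal sketch (the sample data and the multiplier are assumed purely for illustration) that creates an RDD, a broadcast variable, and an accumulator using the sc from the preceding example:

val rdd = sc.parallelize(1 to 4)         // an RDD: a distributed dataset
val factor = sc.broadcast(10)            // a broadcast variable: a read-only shared value
val counter = sc.accumulator(0)          // an accumulator: a write-only shared counter
rdd.foreach(x => counter += 1)           // each task increments the accumulator
println(rdd.map(_ * factor.value).sum()) // workers read the broadcast value; prints 100.0
println(counter.value)                   // the driver reads the final count; prints 4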
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book from any other source, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.