
SparkContext and SparkConf

The starting point of writing any Spark program is SparkContext (or JavaSparkContext in Java). SparkContext is initialized with an instance of a SparkConf object, which contains various Spark cluster-configuration settings (for example, the URL of the master node).

It is the main entry point for Spark functionality: a SparkContext represents a connection to a Spark cluster, and it can be used to create RDDs, accumulators, and broadcast variables on that cluster.

Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one.
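
The following is a minimal sketch of this rule (the variable names and configuration values are our own, purely for illustration): the active context is stopped before its replacement is initialized.

import org.apache.spark.{SparkConf, SparkContext}

// Stop the currently active context; it can no longer be used afterwards
sc.stop()

// Only now is it safe to create a fresh context
val newConf = new SparkConf()
  .setAppName("Replacement App")
  .setMaster("local[2]")
val newSc = new SparkContext(newConf)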

Once initialized, we will use the various methods found in the SparkContext object to create and manipulate distributed datasets and shared variables. The Spark shell (available in both Scala and Python, but unfortunately not in Java) takes care of this context initialization for us, but the following lines of code show an example of creating a context running in local mode in Scala:

val conf = new SparkConf()
  .setAppName("Test Spark App")
  .setMaster("local[4]")
val sc = new SparkContext(conf)

This creates a context running in local mode with four threads, with the application name set to Test Spark App. If we wish to use the default configuration values, we could also call the following simpler constructor for our SparkContext object, which works in exactly the same way:

val sc = new SparkContext("local[4]", "Test Spark App")
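
To make the earlier description of SparkContext concrete, here is a short sketch (the data and variable names are our own, and it uses the classic Spark 1.x accumulator API) that uses the sc we just created to build an RDD, a broadcast variable, and an accumulator:

// Create an RDD by distributing a local collection across the cluster
val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5))

// A broadcast variable: a read-only value cached on each worker
val factor = sc.broadcast(10)

// An accumulator: workers can add to it; only the driver reads its value
val sum = sc.accumulator(0)

// Multiply each element by the broadcast factor, accumulating the results
numbers.foreach(n => sum += n * factor.value)
println(sum.value) // prints 150
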
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book from any other source, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.