
The Spark shell

The Spark shell is an excellent tool for rapid prototyping with Spark. It works with Scala and Python. It allows you to interact with the Spark cluster, which puts the full API at your command. It is great for debugging, trying things out, or interactively exploring new datasets or approaches.
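For instance, a quick session might look like the following (a minimal sketch; README.md stands in for any text file in your current directory):

scala> val lines = spark.read.textFile("README.md")
scala> lines.count()                              // number of lines in the file
scala> lines.filter(_.contains("Spark")).count()  // lines mentioning Spark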

The previous chapter should have gotten you to the point of having a Spark instance running; now all you need to do is start the Spark shell and point it at your running instance with the commands we will see shortly.

For local mode, Spark will start an instance when you invoke the Spark shell or start a Spark program from an IDE. So, a local installation on a Mac or Linux PC/laptop is sufficient to start exploring the Spark shell. Not having to spin up a real cluster to do the prototyping is an important and useful feature of Spark. The Quick Start guide at http://spark.apache.org/docs/latest/quick-start.html is a good reference.
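From inside the shell, you can confirm which mode you are running in; sc.master returns the master URL (typically local[*] when no master was specified):

scala> sc.master    // for example: local[*]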

Assuming that you have installed Spark in the /opt directory and also have a soft link to it (say, /opt/spark), run the following commands:
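cd /opt/spark
./bin/spark-shell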

Tip

The documentation link http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell has a list of Spark shell options. For example, ./bin/spark-shell --master local[2] will start the Spark shell with two threads.
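You can verify the setting from inside the shell (sc is the SparkContext that spark-shell creates for you):

scala> sc.master                // local[2]
scala> sc.defaultParallelism    // 2, matching the two threads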

You will see the shell prompt as shown in the following screenshot:

I have downloaded and compiled Spark in ~/Downloads/spark-2.0.0 and it is running in local mode.

A few points of interest are as follows:

  • The shell has instantiated a connection object (SparkSession) to the Spark instance in the spark variable. This is new in Spark 2.0.0. Earlier versions had SparkContext, sqlContext, and hiveContext; from version 2.0.0 onward, these are consolidated under SparkSession and remain accessible through the SparkSession object, as the short session after this list demonstrates. We will explore all these concepts in later chapters.
  • The Spark monitor UI can be accessed at port 4040, as shown in the following screenshot:

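Back in the shell, a few quick checks confirm these objects (the exact output will vary with your build):

scala> spark                // the SparkSession object
scala> spark.version        // for example: 2.0.0
scala> sc                   // the SparkContext, still available directly
scala> spark.sqlContext     // the SQLContext, now reached through SparkSession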
Exiting out of the shell

When we start any program, the first thing we should know is how to exit it. Exiting the shell is easy: use the :quit command and you will be dropped out of spark-shell.
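At the prompt, that is simply:

scala> :quit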

Using the Spark shell to run the book code

As a convention that makes it easy to navigate directories, let's start the Spark shell from the directory into which you have downloaded the code and data for this book, either from GitHub (https://github.com/xsankar/fdps-v3) or from the Packt support site.

Assuming the book code/data is at ~/fdps-v3 and Spark at ~/Downloads/spark-2.0.0, start the Spark shell as follows:

cd ~/fdps-v3
~/Downloads/spark-2.0.0/bin/spark-shell

Tip

If you have used a different directory structure, please adjust accordingly; that is, change to your fdps-v3 directory and start spark-shell from there.

The fdps-v3/code directory contains the code and fdps-v3/data contains the data.
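Because the shell was started from ~/fdps-v3, relative paths resolve against it, so data files can be read directly. Here is a hedged sketch; cars.csv is a hypothetical name, so substitute a file that actually exists under fdps-v3/data:

scala> val df = spark.read.option("header", "true").csv("data/cars.csv")  // hypothetical file name
scala> df.show(5)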
