
The Spark shell

The Spark shell is an excellent tool for rapid prototyping with Spark. It works with Scala and Python. It allows you to interact with the Spark cluster, putting the full API at your command. It is great for debugging, trying things out, or interactively exploring new Datasets and approaches.

The previous chapter should have gotten you to the point of having a Spark instance running; now all you need to do is start the Spark shell and point it at your running instance, using the commands we will see shortly.

For local mode, Spark will start an instance when you invoke the Spark shell or start a Spark program from an IDE. So, a local installation on a Mac or Linux PC/laptop is sufficient to start exploring the Spark shell. Not having to spin up a real cluster to do the prototyping is an important and useful feature of Spark. The Quick Start guide at http://spark.apache.org/docs/latest/quick-start.html is a good reference.

Assuming that you have installed Spark in the /opt directory and also have a soft link to it, run the following commands.
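
This is a minimal sketch; the path /opt/spark is an assumption based on the soft link, so adjust it to match your installation:

cd /opt/spark
bin/spark-shell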

Tip

The documentation link http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell has a list of Spark shell options. For example, bin/spark-shell --master local[2] will start the Spark shell with two worker threads.
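
For quick reference, a few common --master values are shown below; the host and port in the last line are placeholders for your own cluster (see the linked documentation for the full list):

bin/spark-shell --master local               # run with one worker thread
bin/spark-shell --master local[2]            # run with two worker threads
bin/spark-shell --master local[*]            # run with one worker thread per core
bin/spark-shell --master spark://host:7077   # connect to a standalone cluster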

You will see the shell prompt as shown in the following screenshot:

I have downloaded and compiled Spark in ~/Downloads/spark-2.0.0 and it is running in local mode.

A few points of interest are as follows:

  • The shell has instantiated a connection object (SparkSession) to the Spark instance in the spark variable. This is new in Spark 2.0.0; earlier versions had SparkContext, sqlContext, and hiveContext. From version 2.0.0 onward, these subcontexts are consolidated under SparkSession but remain accessible from the SparkSession object, as the short example after this list shows. We will explore all these concepts in later chapters.
  • The Spark monitor UI can be accessed at port 4040, as shown in the following screenshot:
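
To get a feel for the spark variable, here is a minimal sketch you can type at the shell prompt; the values in the comments are illustrative:

scala> spark.version                  // the running Spark version, for example 2.0.0
scala> val sc = spark.sparkContext    // the SparkContext, now reached via SparkSession
scala> sc.master                      // the master URL, for example local[*] in local mode
scala> spark.range(5).count()         // a tiny Dataset to confirm the session works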

Exiting the shell

When we start any program, the first thing we should know is how to exit it. Exiting the shell is easy: type the :quit command and you will be dropped out of spark-shell.
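
For example, at the shell prompt:

scala> :quit

Pressing Ctrl + D at the prompt has the same effect.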

Using Spark shell to run the book code

As a convention that makes it easy to navigate directories, let's start the Spark shell from the directory into which you have downloaded the code and data for this book, from either GitHub (https://github.com/xsankar/fdps-v3) or the Packt support site.

Assuming the book code/data is at ~/fdps-v3 and Spark at ~/Downloads/spark-2.0.0, start the Spark shell as follows:

cd ~/fdps-v3
~/Downloads/spark-2.0.0/bin/spark-shell

Tip

If you have used a different directory structure, adjust accordingly; that is, change to your fdps-v3 directory and start spark-shell from there.

The fdps-v3/code directory has the code and fdps-v3/data has the data.
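
Because the shell was started from ~/fdps-v3, relative paths resolve against that directory. Here is a minimal sketch; the file name some-file.csv is hypothetical, so substitute any file that actually exists under fdps-v3/data:

scala> val lines = spark.read.textFile("data/some-file.csv")  // hypothetical file name
scala> lines.count()                                          // number of lines read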
