
The Spark shell

The Spark shell is an excellent tool for rapid prototyping with Spark. It works with Scala and Python. It allows you to interact with the Spark cluster, putting the full API at your command. It is great for debugging, trying things out, or interactively exploring new Datasets and approaches.

The previous chapter should have gotten you to the point of having a Spark instance running; now all you need to do is start the Spark shell and point it at your running instance, using the commands we will see shortly.

For local mode, Spark will start an instance when you invoke the Spark shell or start a Spark program from an IDE. So, a local installation on a Mac or Linux PC/laptop is sufficient to start exploring the Spark shell. Not having to spin up a real cluster to do the prototyping is an important and useful feature of Spark. The Quick Start guide at http://spark.apache.org/docs/latest/quick-start.html is a good reference.

Assuming that you have installed Spark in the /opt directory and also have a soft link to it, run the following commands.
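
This is a minimal sketch; the path /opt/spark is an assumption based on the soft link, so adjust it to match your installation:

cd /opt/spark
bin/spark-shell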

Tip

The documentation link http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell has a list of Spark shell options. For example, bin/spark-shell --master local[2] will start the Spark shell with two worker threads.
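
For quick reference, a few common --master values are shown below; the host and port in the last line are placeholders for your own cluster (see the linked documentation for the full list):

bin/spark-shell --master local               # run with one worker thread
bin/spark-shell --master local[2]            # run with two worker threads
bin/spark-shell --master local[*]            # run with one worker thread per core
bin/spark-shell --master spark://host:7077   # connect to a standalone cluster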

You will see the shell prompt as shown in the following screenshot:

I have downloaded and compiled Spark in ~/Downloads/spark-2.0.0 and it is running in local mode.

A few points of interest are as follows:

  • The shell has instantiated a connection object (SparkSession) to the Spark instance in the spark variable. This is new in Spark 2.0.0; earlier versions had SparkContext, sqlContext, and hiveContext. From version 2.0.0 onward, these subcontexts are consolidated under SparkSession but remain accessible from the SparkSession object, as the short example after this list shows. We will explore all these concepts in later chapters.
  • The Spark monitor UI can be accessed at port 4040, as shown in the following screenshot:
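
To get a feel for the spark variable, here is a minimal sketch you can type at the shell prompt; the values in the comments are illustrative:

scala> spark.version                  // the running Spark version, for example 2.0.0
scala> val sc = spark.sparkContext    // the SparkContext, now reached via SparkSession
scala> sc.master                      // the master URL, for example local[*] in local mode
scala> spark.range(5).count()         // a tiny Dataset to confirm the session works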

Exiting the shell

When we start any program, the first thing we should know is how to exit it. Exiting the shell is easy: type the :quit command and you will be dropped out of spark-shell.
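
For example, at the shell prompt:

scala> :quit

Pressing Ctrl + D at the prompt has the same effect.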

Using Spark shell to run the book code

As a convention that makes it easy to navigate directories, let's start the Spark shell from the directory into which you have downloaded the code and data for this book, from either GitHub (https://github.com/xsankar/fdps-v3) or the Packt support site.

Assuming the book code/data is at ~/fdps-v3 and Spark at ~/Downloads/spark-2.0.0, start the Spark shell as follows:

cd ~/fdps-v3
~/Downloads/spark-2.0.0/bin/spark-shell

Tip

If you have used a different directory structure, adjust accordingly; that is, change to your fdps-v3 directory and start spark-shell from there.

The fdps-v3/code directory has the code and fdps-v3/data has the data.
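
Because the shell was started from ~/fdps-v3, relative paths resolve against that directory. Here is a minimal sketch; the file name some-file.csv is hypothetical, so substitute any file that actually exists under fdps-v3/data:

scala> val lines = spark.read.textFile("data/some-file.csv")  // hypothetical file name
scala> lines.count()                                          // number of lines read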
