Fast Data Processing with Spark 2 (Third Edition)
Krishna Sankar
The Spark shell
The Spark shell is an excellent tool for rapid prototyping with Spark. It works with Scala and Python. It lets you interact with the Spark cluster, putting the full API at your command. It is great for debugging, trying things out, or interactively exploring new datasets and approaches.
The previous chapter should have gotten you to the point of having a Spark instance running; now all you need to do is start your Spark shell and point it at your running instance, using the commands shown shortly.
In local mode, Spark starts an instance when you invoke the Spark shell or start a Spark program from an IDE, so a local installation on a Mac or Linux PC/laptop is sufficient to start exploring the Spark shell. Not having to spin up a real cluster for prototyping is an important and useful feature of Spark. The Quick Start guide at http://spark.apache.org/docs/latest/quick-start.html is a good reference.
Assuming that you have installed Spark in the /opt directory and also have a soft link to it, run the commands shown below.

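A minimal sketch of those commands, assuming the soft link is /opt/spark (the exact link name is an assumption; adjust it to your own layout):

```
cd /opt/spark       # follow the soft link to the Spark installation
./bin/spark-shell   # launch the interactive Scala shell
```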
Tip
The documentation link http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell has a list of Spark shell options. For example, bin/spark-shell --master local[2] will start the Spark shell with two worker threads.
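A few other common --master values, for reference (the standalone host below is a placeholder):

```
bin/spark-shell --master local               # one worker thread
bin/spark-shell --master local[*]            # one worker thread per CPU core
bin/spark-shell --master spark://host:7077   # connect to a standalone cluster
```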
You will see the shell prompt as shown in the following screenshot:

I have downloaded and compiled Spark in ~/Downloads/spark-2.0.0 and it is running in local mode.
A few points of interest are as follows:
- The shell has instantiated a connection object (SparkSession) to the Spark instance in the spark variable. This is new to Spark 2.0.0. Earlier versions had SparkContext, sqlContext, and hiveContext. From Version 2.0.0 onward, all of these are consolidated into SparkSession and remain accessible from the SparkSession object. We will explore these concepts in later chapters; a quick check you can type at the shell is sketched after this list.
- The Spark monitor UI can be accessed at port 4040, as shown in the following screenshot:

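As a minimal check of the consolidated entry points, you can type the following at the scala> prompt (output elided):

```scala
spark                 // the pre-instantiated SparkSession
spark.version         // should report 2.0.0 for this setup
spark.sparkContext    // the underlying SparkContext (also bound to sc)
spark.sqlContext      // the SQLContext, kept for backward compatibility
```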
Exiting out of the shell
When we start any program, the first thing we should know is how to exit. Exiting the shell is easy: use the :quit command and you will drop out of the spark-shell session.
Using Spark shell to run the book code
As a convention that makes it easy to navigate directories, let's start the Spark shell from the directory in which you have downloaded the code and data for this book, either from GitHub (https://github.com/xsankar/fdps-v3) or from the Packt support site.
Assuming the book code/data is at ~/fdps-v3 and Spark is at ~/Downloads/spark-2.0.0, start the Spark shell as follows:
cd ~/fdps-v3
~/Downloads/spark-2.0.0/bin/spark-shell
Tip
If you have used a different directory structure, please adjust accordingly; that is, change the directory to fdps-v3 and start spark-shell from there.
The fdps-v3/code directory has the code and fdps-v3/data has the data.
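Once the shell is up, a quick smoke test is to read one of the data files; the file name below is hypothetical, so substitute any CSV that actually exists under fdps-v3/data:

```scala
// Hypothetical file name, for illustration only; the relative path works
// because we started spark-shell from ~/fdps-v3.
val df = spark.read.option("header", "true").csv("data/sample.csv")
df.show(5)   // display the first five rows
```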