- Fast Data Processing with Spark 2 (Third Edition)
- Krishna Sankar
The Spark shell
The Spark shell is an excellent tool for rapid prototyping with Spark. It works with Scala and Python. It allows you to interact with the Spark cluster, which puts the full API at your command. It is great for debugging, trying things out, or interactively exploring new Datasets or approaches.
The previous chapter should have gotten you to the point of having a Spark instance running; now all you need to do is start your Spark shell and point it at your running instance with the commands given in the table that follows.
For local mode, Spark will start an instance when you invoke the Spark shell or start a Spark program from an IDE. So, a local installation on a Mac or Linux PC/laptop is sufficient to start exploring the Spark shell. Not having to spin up a real cluster to do the prototyping is an important and useful feature of Spark. The Quick Start guide at http://spark.apache.org/docs/latest/quick-start.html is a good reference.
Assuming that you have installed Spark in the /opt directory and also have a soft link to Spark, run the commands shown in the following table:
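As a sketch of what those commands look like, assuming /opt/spark is the soft link to your Spark installation, the Scala and Python shells are launched as follows:

```
# Scala shell (assuming /opt/spark is the soft link to the Spark install)
/opt/spark/bin/spark-shell

# Python shell
/opt/spark/bin/pyspark
```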

Tip
The documentation link http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell has a list of Spark shell options. For example, bin/spark-shell --master local[2] will start the Spark shell with two threads.
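To verify the setting from inside the shell, one quick check (a sketch, assuming the Spark 2.x spark session variable) is to ask the underlying SparkContext for its master URL:

```scala
scala> spark.sparkContext.master
res0: String = local[2]
```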
You will see the shell prompt as shown in the following screenshot:

I have downloaded and compiled Spark in ~/Downloads/spark-2.0.0 and it is running in local mode.
A few points of interest are as follows:
- The shell has instantiated a connection object (SparkSession) to the Spark instance in the spark variable. This is new to Spark 2.0.0. Earlier versions had SparkContext, sqlContext, and hiveContext. From Version 2.0.0 onward, all these subcontexts are consolidated under SparkSession, and they remain accessible from the SparkSession object (see the sketch after this list). We will explore all these concepts in later chapters.
- The Spark monitor UI can be accessed at port 4040, as shown in the following screenshot:

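As a small sketch of that consolidation, the legacy contexts can be reached from the spark variable at the shell prompt (the exact output will vary with your build):

```scala
scala> spark.sparkContext    // the SparkContext, formerly exposed as sc
scala> spark.sqlContext      // the SQLContext, kept for backward compatibility
scala> spark.version         // quick sanity check of the running version
```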
Exiting out of the shell
When we start any program, the first thing we should know is how to exit. Exiting the shell is easy: use the :quit command and you will drop out of the spark-shell.
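At the prompt, this is simply:

```
scala> :quit
```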
Using Spark shell to run the book code
As a convention that makes it easy to navigate directories, let's start the Spark shell from the directory in which you have downloaded the code and data for this book, that is, from either GitHub at https://github.com/xsankar/fdps-v3 or the Packt support site.
Assuming the book code/data is at ~/fdps-v3 and Spark at ~/Downloads/spark-2.0.0, start the Spark shell as follows:
cd ~/fdps-v3
~/Downloads/spark-2.0.0/bin/spark-shell
Tip
If you have used a different directory structure, please adjust accordingly, that is, change the directory to fdps-v3 and start spark-shell from there.
The fdps-v3/code directory has the code and fdps-v3/data has the data.
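Once the shell is started from ~/fdps-v3, files under fdps-v3/data can be read with relative paths. A minimal sketch, where the CSV file name is hypothetical rather than one of the book's actual files:

```scala
scala> // "cars.csv" is a hypothetical name; substitute a real file from fdps-v3/data
scala> val df = spark.read.option("header", "true").csv("data/cars.csv")
scala> df.show(5)
```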