
Getting started with Spark

In this section, we will run Apache Spark in local, or standalone, mode. First, we will set up Scala, which is a prerequisite for Apache Spark. After the Scala setup, we will set up and run Apache Spark, and perform some basic operations on it. So let's start.

Since Apache Spark is written in Scala, it needs Scala to be set up on the system. You can download Scala from http://www.scala-lang.org/download/ (we will set up Scala 2.11.8 in the following examples).

Once Scala is downloaded, we can set it up on a Linux system as follows:

tar -zxf scala-2.11.8.tgz
sudo mv scala-2.11.8 /usr/local/scala-2.11.8

Also, it is recommended to set the SCALA_HOME environment variable and add the Scala binaries to the PATH variable. You can set these in the .bashrc file or the /etc/environment file as follows:

export SCALA_HOME=/usr/local/scala-2.11.8
export PATH=$PATH:/usr/local/scala-2.11.8/bin

It is also shown in the following screenshot:
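After reloading the shell configuration (for example, with source ~/.bashrc), you can verify the setup. As a quick check, the following one-liner prints the version of the Scala library in use; it should match the version installed above (2.11.8 is assumed here):

```scala
// Prints the Scala version string, for example "version 2.11.8".
// If this runs, the scala binary is on the PATH and working.
println(scala.util.Properties.versionString)
```

You can also run scala -version on the command line to confirm the same thing without starting the REPL.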

Now, we have set up a Scala environment successfully. So, it is time to download Apache Spark. You can download it from http://spark.apache.org/downloads.html.

The Spark version you choose can be different, as per your requirements.

After Apache Spark is downloaded, run the following commands to set it up:

tar -zxf spark-2.0.0-bin-hadoop2.7.tgz
sudo mv spark-2.0.0-bin-hadoop2.7 /usr/local/spark

The directory location can be different, as per the user's requirements.

Also, you can set the SPARK_HOME environment variable. It is not mandatory; however, it helps the user find the installation directory of Spark. You can also add the path of the Spark binaries to the PATH variable, so that they can be accessed without specifying their full path:

export SPARK_HOME=/usr/local/spark
export PATH=$PATH:/usr/local/scala-2.11.8/bin:$SPARK_HOME/bin

It is shown in the following screenshot:

Now, we are ready to start Spark in standalone mode. Let's run the following command to start it:

$SPARK_HOME/bin/spark-shell

Alternatively, we can simply execute the spark-shell command, as the Spark binaries have been added to the PATH environment variable.
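Once the shell starts, you can try a few basic operations directly. A hypothetical session is sketched below (it assumes the SparkContext that spark-shell predefines as sc; the exact console output varies by version):

```
scala> val data = sc.parallelize(1 to 10)
data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> data.map(_ * 2).reduce(_ + _)
res0: Int = 110
```

Here, parallelize distributes a local collection as an RDD, map doubles each element, and reduce sums them up (2 + 4 + ... + 20 = 110).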


Also, you can access the Spark driver's UI at http://localhost:4040:

We will discuss more about Spark UI in the Spark Driver Web UI section of this chapter.

In this section, we have completed the Spark setup in standalone mode. In the next section, we will get some hands-on experience with Apache Spark, using spark-shell.
