
Getting started with Spark

In this section, we will run Apache Spark in local, or standalone, mode. First, we will set up Scala, which is a prerequisite for Apache Spark. After the Scala setup, we will set up and run Apache Spark, and perform some basic operations on it. So let's start.

Since Apache Spark is written in Scala, it needs Scala to be set up on the system. You can download Scala from http://www.scala-lang.org/download/ (we will set up Scala 2.11.8 in the following examples).

Once Scala is downloaded, we can set it up on a Linux system as follows:
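A minimal sketch of the setup commands follows; it assumes the downloaded archive is named scala-2.11.8.tgz and sits in the current directory, and that /usr/local is used as the installation location (both of these can differ on your system):

```shell
# Extract the downloaded Scala archive (filename assumed; adjust to your download)
tar -zxf scala-2.11.8.tgz

# Move the extracted directory to a system-wide location (assumed path)
sudo mv scala-2.11.8 /usr/local/scala-2.11.8
```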

It is also recommended to set the SCALA_HOME environment variable and to add the Scala binaries to the PATH variable. You can set them in the .bashrc file or the /etc/environment file as follows:

export SCALA_HOME=/usr/local/scala-2.11.8
export PATH=$PATH:/usr/local/scala-2.11.8/bin

It is also shown in the following screenshot:
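To confirm that the environment is picked up, you can reload your shell configuration and check the Scala version (a quick sanity check; the exact version string printed depends on your installation):

```shell
# Reload .bashrc so the new SCALA_HOME and PATH take effect in this shell
source ~/.bashrc

# Print the installed Scala version; should report 2.11.8 if the setup above was used
scala -version
```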

Now, we have set up a Scala environment successfully. So, it is time to download Apache Spark. You can download it from http://spark.apache.org/downloads.html.

The Spark version may differ, as per your requirements.

After Apache Spark is downloaded, run the following commands to set it up:

tar -zxf spark-2.0.0-bin-hadoop2.7.tgz
sudo mv spark-2.0.0-bin-hadoop2.7 /usr/local/spark

The directory location can be different, as per the user's requirements.

You can also set the SPARK_HOME environment variable. It is not mandatory; however, it helps the user find the installation directory of Spark. You can also add the path of the Spark binaries to the $PATH variable, so that they can be accessed without specifying their full path:

export SPARK_HOME=/usr/local/spark
export PATH=$PATH:/usr/local/scala-2.11.8/bin:$SPARK_HOME/bin

It is shown in the following screenshot:

Now, we are ready to start Spark in standalone mode. Let's run the following command to start it:

$SPARK_HOME/bin/spark-shell

Alternatively, we can simply execute the spark-shell command, as the Spark binaries have been added to the PATH environment variable.
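Once the shell is up, you can try a few basic operations on it. The following is a minimal sketch of commands typed at the spark-shell prompt; it assumes the preconfigured SparkContext is available as sc, which is the default in spark-shell:

```scala
// Inside spark-shell, the SparkContext is available as `sc`.
// Create an RDD from a local collection and run a few basic actions on it.
val numbers = sc.parallelize(1 to 10)

numbers.count()                       // res0: Long = 10
numbers.reduce(_ + _)                 // res1: Int = 55
numbers.filter(_ % 2 == 0).collect()  // res2: Array[Int] = Array(2, 4, 6, 8, 10)
```

Each action (count, reduce, collect) triggers a Spark job, which you can then observe in the driver's web UI.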


You can also access the Spark driver's web UI at http://localhost:4040:

We will discuss more about Spark UI in the Spark Driver Web UI section of this chapter.

In this section, we completed the Spark setup in standalone mode. In the next section, we will do some hands-on work with Apache Spark, using spark-shell or spark-cli.
