
Downloading and installing Spark 1.4.1

In this section, we will go through the Spark installation process in detail. Spark is built on Scala and runs on the Java Virtual Machine (JVM). Before installing Spark, you should first have the Java Development Kit (JDK) 7 installed on your computer.

Make sure you install the JDK rather than the Java Runtime Environment (JRE). You can download it from http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html.
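
One quick way to tell a JDK from a JRE is to look for the Java compiler, javac, which only the JDK ships. The following is a minimal sketch; the helper name have_cmd is our own and not part of any tool:

```shell
# Hypothetical helper: succeed if the given command is found on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd javac; then
  # javac reports its version; with JDK 7 this looks like "javac 1.7.0_xx".
  javac -version 2>&1
else
  echo "javac not found: install the JDK, not just the JRE"
fi
```

If the second message appears although Java programs run on your machine, you most likely have only the JRE installed.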

Next, download the Spark 1.4.1 release from the project website, https://spark.apache.org/downloads.html. Perform the following three steps to get Spark installed on your computer:

  1. Select the package type Pre-built for Hadoop 2.6 and later, and then select Direct Download as the download type. Make sure you choose a version prebuilt for Hadoop rather than the source code.
  2. Download the compressed TAR file called spark-1.4.1-bin-hadoop2.6.tgz and place it into a directory on your computer.
  3. Open the terminal and change to that directory. Using the following commands, extract the TAR file, rename the Spark root folder to spark-1.4.1, and then list the installed files and subdirectories:
      tar -xf spark-1.4.1-bin-hadoop2.6.tgz
      mv spark-1.4.1-bin-hadoop2.6 spark-1.4.1
      cd spark-1.4.1
      ls

That's it! You now have Spark and its libraries installed on your computer. Note the following files and directories in the spark-1.4.1 home folder:

  • core: This directory contains the source code for the core components and API of Spark
  • bin: This directory contains the executable files used to submit and deploy Spark applications, and to interact with Spark through the Spark shell
  • graphx, mllib, sql, and streaming: These are Spark libraries that provide a unified interface to do different types of data processing, namely graph processing, machine learning, queries, and stream processing
  • examples: This directory contains demos and examples of Spark applications
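
To confirm that the extraction worked, you can check for a couple of the entries listed above. This is only a sketch: the function name is_spark_home is hypothetical, and it assumes that your current directory is the renamed spark-1.4.1 folder.

```shell
# Hypothetical helper: succeed only if DIR contains an executable
# bin/spark-shell and an examples directory.
is_spark_home() {
  [ -x "$1/bin/spark-shell" ] && [ -d "$1/examples" ]
}

# Assumes the current directory is the spark-1.4.1 folder.
if is_spark_home "$PWD"; then
  echo "Spark home looks complete: $PWD"
else
  echo "bin/spark-shell or examples/ is missing under $PWD"
fi
```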

It is often convenient to create shortcuts to the Spark home folder and the Spark examples folder. On Linux or macOS, open or create the ~/.bash_profile file in your home folder and insert the following lines:

export SPARKHOME="/[Where you put Spark]/spark-1.4.1"
export SPARKSCALAEX="$SPARKHOME/examples/src/main/scala/org/apache/spark/examples"

Then, execute the following command for the previous shortcuts to take effect:

source ~/.bash_profile

As a result, you can quickly access these folders in the terminal or Spark shell. For example, the example named LiveJournalPageRank.scala can be accessed with:

$SPARKSCALAEX/graphx/LiveJournalPageRank.scala
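
As an illustration of how such a shortcut might be used, the following sketch defines a small function that prints the beginning of an example source file. The function name show_example is our own invention, and the code assumes that SPARKSCALAEX holds the path of the Scala examples directory.

```shell
# Hypothetical helper: print the first 20 lines of an example, given its
# path relative to $SPARKSCALAEX. Fails with a message if the variable
# is unset or empty.
show_example() {
  head -n 20 "${SPARKSCALAEX:?SPARKSCALAEX is not set}/$1"
}

# Example call, to be run after sourcing ~/.bash_profile:
# show_example graphx/LiveJournalPageRank.scala
```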