
The Spark shell

Spark supports writing programs interactively using the Scala, Python, or R REPL (that is, the Read-Eval-Print Loop, or interactive shell). The shell provides instant feedback: code is evaluated as soon as we enter it. In the Scala shell, the return value and its type are also displayed after a piece of code is run.

To use the Spark shell with Scala, simply run ./bin/spark-shell from the Spark base directory. This will launch the Scala shell and initialize a SparkContext, which is available to us as the Scala value sc. With Spark 2.0, a SparkSession instance is also available in the console as the spark variable.

Your console output should look similar to the following:

$ ~/work/spark-2.0.0-bin-hadoop2.7/bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/06 22:14:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/06 22:14:25 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.22.180 instead (on interface eth1)
16/08/06 22:14:25 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/08/06 22:14:26 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
16/08/06 22:14:27 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.

Spark context Web UI available at http://192.168.22.180:4041
Spark context available as 'sc' (master = local[*], app id = local-1470546866779).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_60)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
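As a quick check that the shell is working, we can evaluate a simple expression at the scala> prompt. The following is a minimal, illustrative session (the RDD identifier and line numbers echoed in the output may differ slightly on your machine):

scala> val data = sc.parallelize(1 to 5)
data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> data.sum
res0: Double = 15.0

Note how the shell echoes both the value and its type after each expression, as mentioned earlier.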

To use the Python shell with Spark, simply run the ./bin/pyspark command. As in the Scala shell, a SparkContext object is initialized and available as the Python variable sc, and with Spark 2.0, a SparkSession is available as spark. Your output should be similar to this:

$ ~/work/spark-2.0.0-bin-hadoop2.7/bin/pyspark
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/06 22:16:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/06 22:16:15 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.22.180 instead (on interface eth1)
16/08/06 22:16:15 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/08/06 22:16:16 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Using Python version 2.7.6 (default, Jun 22 2015 17:58:13)
SparkSession available as 'spark'.
>>>
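As with the Scala shell, we can verify that everything is wired up by evaluating a simple expression at the >>> prompt. A minimal, illustrative session using the sc variable shown above:

>>> data = sc.parallelize([1, 2, 3, 4, 5])
>>> data.sum()
15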

R is a language and runtime environment for statistical computing and graphics. It is a GNU project and a different implementation of S, a language developed at Bell Labs.

R provides statistical techniques (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering) as well as graphical ones, and it is considered highly extensible.

To use Spark from R, run the following command to open the SparkR shell:

$ ~/work/spark-2.0.0-bin-hadoop2.7/bin/sparkR

R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Launching java with spark-submit command /home/ubuntu/work/spark-2.0.0-bin-hadoop2.7/bin/spark-submit "sparkr-shell" /tmp/RtmppzWD8S/backend_porta6366144af4f
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/06 22:26:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/06 22:26:22 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.22.186 instead (on interface eth1)
16/08/06 22:26:22 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/08/06 22:26:22 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

SparkSession available as 'spark'.
During startup - Warning message:
package 'SparkR' was built under R version 3.1.1
>
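As a quick sanity check, we can create a small Spark DataFrame from a local R data.frame at the > prompt. This is a minimal, illustrative session using SparkR's createDataFrame and head functions (the exact formatting of the output may vary):

> df <- createDataFrame(data.frame(x = 1:5))
> head(df)
  x
1 1
2 2
3 3
4 4
5 5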