Configuring and running Spark on Amazon Elastic MapReduce

Launch a Hadoop cluster with Spark installed using Amazon Elastic MapReduce (EMR). Perform the following steps to create an EMR cluster with Spark installed:

  1. Launch an Amazon EMR cluster:
     1. Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.
     2. Choose Create cluster.
     3. Choose an appropriate Amazon AMI version, 3.9.0 or later, as shown in the following screenshot:
     4. In the applications to be installed field, choose Spark 1.5.2 or later from the list and click on Add.
     5. Select the other hardware options as necessary:
        • The instance type
        • The key pair to be used with SSH
        • Permissions
        • IAM roles (Default or Custom)

        Refer to the following screenshot:

     6. Click on Create cluster. The cluster will start instantiating as shown in the following screenshot:
  2. Log in to the master. Once the EMR cluster is ready, you can SSH into the master:
     $ ssh -i rd_spark-user1.pem hadoop@ec2-52-3-242-138.compute-1.amazonaws.com
The output will be similar to the following listing:
     Last login: Wed Jan 13 10:46:26 2016

            __|  __|_  )
            _|  (     /   Amazon Linux AMI
           ___|\___|___|

     https://aws.amazon.com/amazon-linux-ami/2015.09-release-notes/
     23 package(s) needed for security, out of 49 available
     Run "sudo yum update" to apply all updates.
     [hadoop@ip-172-31-2-31 ~]$
  3. Start the Spark shell:
     [hadoop@ip-172-31-2-31 ~]$ spark-shell
     16/01/13 10:49:36 INFO SecurityManager: Changing view acls to: hadoop
     16/01/13 10:49:36 INFO SecurityManager: Changing modify acls to: hadoop
     16/01/13 10:49:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
     16/01/13 10:49:36 INFO HttpServer: Starting HTTP Server
     16/01/13 10:49:36 INFO Utils: Successfully started service 'HTTP class server' on port 60523.
     Welcome to
           ____              __
          / __/__  ___ _____/ /__
         _\ \/ _ \/ _ `/ __/  '_/
        /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
           /_/
     scala> sc
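The `sc` value typed at the prompt is the SparkContext that the shell creates at startup. As a minimal sketch, assuming the commands are entered in the same spark-shell session on the EMR master, a few of its properties can be printed to confirm what the shell is connected to. The values printed depend on the cluster, so none are shown here.

```scala
// Run inside spark-shell, where `sc` (a SparkContext) is predefined.
println(sc.version)            // Spark version the shell is running
println(sc.master)             // master URL the shell is connected to
println(sc.appName)            // application name registered by the shell
println(sc.defaultParallelism) // default parallelism used by operations such as parallelize
```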
  4. Run a basic Spark sample on the EMR cluster:
     scala> val textFile = sc.textFile("s3://elasticmapreduce/samples/hive-ads/tables/impressions/dt=2009-04-13-08-05/ec2-0-51-75-39.amazon.com-2009-04-13-08-05.log")

     scala> val linesWithCartoonNetwork = textFile.filter(line => line.contains("cartoonnetwork.com")).count()
Your output will be as follows:
     linesWithCartoonNetwork: Long = 9
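The sample above chains a transformation (`filter`) and an action (`count`) in one expression. The following sketch, meant for the same spark-shell session, separates the two so the filtered lines can be reused. The name `cartoonLines` and the use of `cache()` are illustrative assumptions, not part of the original sample.

```scala
// Run inside spark-shell on the EMR master, where `sc` is predefined.
// Same sample log file as in the step above.
val textFile = sc.textFile("s3://elasticmapreduce/samples/hive-ads/tables/impressions/dt=2009-04-13-08-05/ec2-0-51-75-39.amazon.com-2009-04-13-08-05.log")

// filter is a lazy transformation: nothing is read from S3 yet.
// cache() asks Spark to keep the filtered lines in memory once they are computed.
// `cartoonLines` is a hypothetical name introduced for this sketch.
val cartoonLines = textFile.filter(line => line.contains("cartoonnetwork.com")).cache()

// count() is an action: it triggers the job and returns a Long.
val linesWithCartoonNetwork = cartoonLines.count()

// A second action can reuse the cached lines instead of re-reading the file.
cartoonLines.take(3).foreach(println)
```

Keeping the filtered RDD in its own value is useful when more than one action is run against the same subset of the log.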