Configuring and running Spark on Amazon Elastic MapReduce

Launch a Hadoop cluster with Spark installed using Amazon Elastic MapReduce (EMR). Perform the following steps to create an EMR cluster with Spark installed:

  1. Launch an Amazon EMR cluster.
  2. Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.
  3. Choose Create cluster.
  4. Choose an appropriate Amazon AMI version, 3.9.0 or later, as shown in the following screenshot.
  5. In the applications to be installed field, choose Spark 1.5.2 or later from the list shown in the user interface and click on Add.
  6. Select other hardware options as necessary:
    • The instance type
    • The key pair to be used with SSH
    • Permissions
    • IAM roles (Default or Custom)

Refer to the following screenshot:

  7. Click on Create cluster. The cluster will start instantiating as shown in the following screenshot:
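The console choices above can also be sketched as a single AWS CLI call. The following is a minimal dry-run sketch, not the method used in the text: it only prints the command unless RUN=1 is set. The cluster name, release label, instance type, and instance count are assumptions chosen for illustration, and the key pair name is assumed to match the .pem file used for SSH in the next step. The text selects AMI version 3.9.0 in the console, so check the EMR release documentation for the value that provides the Spark version you need.

```shell
# Dry-run sketch: build the AWS CLI command and print it.
# Every default below is an illustrative assumption; override via environment.
CLUSTER_NAME="${CLUSTER_NAME:-spark-cluster}"   # hypothetical cluster name
RELEASE_LABEL="${RELEASE_LABEL:-emr-4.2.0}"     # assumed release label
INSTANCE_TYPE="${INSTANCE_TYPE:-m3.xlarge}"     # assumed instance type
INSTANCE_COUNT="${INSTANCE_COUNT:-3}"           # assumed cluster size
KEY_NAME="${KEY_NAME:-rd_spark-user1}"          # assumed EC2 key pair name

# Store the command in the positional parameters so each argument
# stays correctly quoted when it is executed.
set -- aws emr create-cluster \
  --name "$CLUSTER_NAME" \
  --release-label "$RELEASE_LABEL" \
  --applications Name=Spark \
  --ec2-attributes "KeyName=$KEY_NAME" \
  --instance-type "$INSTANCE_TYPE" \
  --instance-count "$INSTANCE_COUNT" \
  --use-default-roles

create_cmd="$*"
echo "$create_cmd"

# Set RUN=1 to actually create the cluster (this incurs AWS charges).
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

The --use-default-roles flag corresponds to the Default choice for IAM roles in the console; those roles must already exist in the account.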
  8. Log in to the master node. Once the EMR cluster is ready, you can SSH into the master:
   $ ssh -i rd_spark-user1.pem hadoop@ec2-52-3-242-138.compute-1.amazonaws.com
The output will be similar to the following listing:
     Last login: Wed Jan 13 10:46:26 2016

            __|  __|_  )
            _|  (     /   Amazon Linux AMI
           ___|\___|___|

     https://aws.amazon.com/amazon-linux-ami/2015.09-release-notes/
     23 package(s) needed for security, out of 49 available
     Run "sudo yum update" to apply all updates.
     [hadoop@ip-172-31-2-31 ~]$
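If the login is rejected, one common cause is a private key file whose permissions are too open, which OpenSSH refuses to use. The following sketch restricts the key file and prints the SSH command rather than running it. The key file name and master hostname are copied from the command above; the variable names are only illustrative, and your own cluster will have a different hostname.

```shell
# Values copied from the ssh command in the text; override via environment.
KEY_FILE="${KEY_FILE:-rd_spark-user1.pem}"
MASTER_DNS="${MASTER_DNS:-ec2-52-3-242-138.compute-1.amazonaws.com}"

if [ -f "$KEY_FILE" ]; then
  # Make the key readable by its owner only.
  chmod 400 "$KEY_FILE"
else
  echo "Key file $KEY_FILE not found in $(pwd)" >&2
fi

# Print the command instead of connecting, so the sketch is safe to run anywhere.
ssh_cmd="ssh -i $KEY_FILE hadoop@$MASTER_DNS"
echo "$ssh_cmd"
```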
  9. Start the Spark shell:
      [hadoop@ip-172-31-2-31 ~]$ spark-shell
      16/01/13 10:49:36 INFO SecurityManager: Changing view acls to: hadoop
      16/01/13 10:49:36 INFO SecurityManager: Changing modify acls to: hadoop
      16/01/13 10:49:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
      16/01/13 10:49:36 INFO HttpServer: Starting HTTP Server
      16/01/13 10:49:36 INFO Utils: Successfully started service 'HTTP class server' on port 60523.
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
            /_/
      scala> sc
  10. Run a basic Spark sample from the EMR cluster:
      scala> val textFile = sc.textFile("s3://elasticmapreduce/samples/hive-ads/tables/impressions/dt=2009-04-13-08-05/ec2-0-51-75-39.amazon.com-2009-04-13-08-05.log")

      scala> val linesWithCartoonNetwork = textFile.filter(line => line.contains("cartoonnetwork.com")).count()
Your output will be as follows:
     linesWithCartoonNetwork: Long = 9
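The filter and count above keep every line that contains the literal string cartoonnetwork.com and then count the lines that remain. To check that logic without a cluster, the same count can be sketched locally with grep. The three sample lines below are hypothetical and are not taken from the S3 impressions log, so the count differs from the 9 reported above.

```shell
# Hypothetical sample data, not taken from the S3 impressions log.
sample_file="$(mktemp)"
cat > "$sample_file" <<'EOF'
{"referrer": "cartoonnetwork.com", "adId": "a1"}
{"referrer": "example.org", "adId": "a2"}
{"referrer": "cartoonnetwork.com", "adId": "a3"}
EOF

# -F treats the pattern as a literal string, like String.contains;
# -c prints the number of matching lines, like filter(...).count().
lines_with_cartoon_network="$(grep -c -F 'cartoonnetwork.com' "$sample_file")"
echo "linesWithCartoonNetwork: $lines_with_cartoon_network"   # prints 2 for this sample

rm -f "$sample_file"
```

Unlike grep, the Spark version distributes the filter across the cluster and reads directly from S3, which is what makes it suitable for logs too large for one machine.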