Creating an Amazon EMR job flow using the AWS Command Line Interface

The AWS Command Line Interface (CLI) is a tool that allows us to manage our AWS services from the command line. In this recipe, we use the AWS CLI to manage Amazon EMR services.

This recipe creates an EMR job flow using the AWS CLI to execute the WordCount sample from the Running Hadoop MapReduce computations using Amazon Elastic MapReduce recipe of this chapter.

Getting ready

The following are the prerequisites to get started with this recipe:

  • Python 2.6.3 or later
  • pip, the Python package management system

How to do it...

The following steps show you how to create an EMR job flow using the EMR command-line interface:

  1. Install the AWS CLI on your machine using the pip installer:
    $ sudo pip install awscli
    

    Note

    Refer to http://docs.aws.amazon.com/cli/latest/userguide/installing.html for more information on installing the AWS CLI. This guide provides instructions on installing AWS CLI without sudo as well as instructions on installing AWS CLI using alternate methods.

  2. Create an access key ID and a secret access key by logging in to the AWS IAM console (https://console.aws.amazon.com/iam). Download and save the key file in a safe location.
  3. Use the aws configure utility to configure your AWS account with the AWS CLI. Provide the access key ID and the secret access key you obtained in the previous step. This information is stored in the .aws/config and .aws/credentials files in your home directory.
    $ aws configure
    AWS Access Key ID [None]: AKIA….
    AWS Secret Access Key [None]: GC…
    Default region name [None]: us-east-1
    Default output format [None]: 
    

    Tip

    You can skip to step 7 if you have completed steps 2 to 6 of the Running Hadoop MapReduce computations using Amazon Elastic MapReduce recipe in this chapter.
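For reference, the aws configure command stores these values in plain-text files similar to the following (the key values are the placeholders from the session above):

```
# ~/.aws/credentials
[default]
aws_access_key_id = AKIA....
aws_secret_access_key = GC....

# ~/.aws/config
[default]
region = us-east-1
```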

  4. Create a bucket to upload the input data by clicking on Create Bucket in the Amazon S3 monitoring console (https://console.aws.amazon.com/s3). Provide a unique name for your bucket. Upload your input data to the newly-created bucket by selecting the bucket and clicking on Upload. The input data for the WordCount sample should be one or more text files.
  5. Create an S3 bucket to upload the JAR file needed for our MapReduce computation. Upload hcb-c1-samples.jar to the newly created bucket.
  6. Create an S3 bucket to store the output data of the computation. Create another S3 bucket to store the logs of the computation. Let's assume the name of this bucket is hcb-c2-logs.
  7. Create an EMR cluster by executing the following command. This command will output the cluster ID of the created EMR cluster:
    $ aws emr create-cluster --ami-version 3.1.0 \
    --log-uri s3://hcb-c2-logs \
    --instance-groups \
    InstanceGroupType=MASTER,InstanceCount=1,\
    InstanceType=m3.xlarge \
    InstanceGroupType=CORE,InstanceCount=2,\
    InstanceType=m3.xlarge
    {
     "ClusterId": "j-2X9TDN6T041ZZ"
    }
    
  8. You can use the list-clusters command to check the status of the created EMR cluster:
    $ aws emr list-clusters
    {
      "Clusters": [
        {
          "Status": {
            "Timeline": {
              "ReadyDateTime": 1421128629.1830001,
              "CreationDateTime": 1421128354.4130001
            },
            "State": "WAITING",
            "StateChangeReason": {
              "Message": "Waiting after step completed"
            }
          },
          "NormalizedInstanceHours": 24,
          "Id": "j-2X9TDN6T041ZZ",
          "Name": "Development Cluster"
        }
      ]
    }
    
  9. Add a job step to this EMR cluster by executing the following command. Replace the paths of the JAR file, input data location, and the output data location with the S3 locations you used in steps 4 to 6. Replace cluster-id with the cluster ID of your newly created EMR cluster.
    $ aws emr add-steps \
    --cluster-id j-2X9TDN6T041ZZ \
    --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
    Jar=s3n://[S3 jar file bucket]/hcb-c1-samples.jar,\
    Args=chapter1.WordCount,\
    s3n://[S3 input data path]/*,\
    s3n://[S3 output data path]/wc-out
    {
      "StepIds": [
        "s-1SEEPDZ99H3Y2"
      ]
    }
    
  10. Check the status of the submitted job step using the describe-step command as follows. You can also check the status and debug your job flow using the Amazon EMR console (https://console.aws.amazon.com/elasticmapreduce).
    $ aws emr describe-step \
    --cluster-id j-2X9TDN6T041ZZ \
    --step-id s-1SEEPDZ99H3Y2
    
  11. Once the job flow is completed, check the result of the computation in the output data location using the S3 console.
  12. Terminate the cluster using the terminate-clusters command:
    $ aws emr terminate-clusters --cluster-ids j-2X9TDN6T041ZZ
    

There's more...

You can use EC2 Spot Instances with your EMR clusters to reduce the cost of your computations. Add a bid price to your request by adding the BidPrice parameter to each instance group in your create-cluster command:

$ aws emr create-cluster --ami-version 3.1.0 \
--log-uri s3://hcb-c2-logs \
--instance-groups \
InstanceGroupType=MASTER,InstanceCount=1,\
InstanceType=m3.xlarge,BidPrice=0.10 \
InstanceGroupType=CORE,InstanceCount=2,\
InstanceType=m3.xlarge,BidPrice=0.10

Refer to the Saving money using Amazon EC2 Spot Instances to execute EMR job flows recipe in this chapter for more details on Amazon Spot Instances.
