2. Run word count - Hive job on EMR- V1_Reviewed_sks_Lab Guides

Run Hive Job on EMR – Demo

Table of Contents

Steps to create EMR Cluster – Demo

Step 1: Select the S3 service.
Step 2: Click Create bucket.
Step 3: Enter the bucket name. Click Create.
Step 4: Click the bucket name.
Step 5: Click Create folder.
Step 6: Type the folder name. Click Save.
Step 7: Select the EMR service.
Step 8: Click Create cluster.
Step 9: Type the cluster name. Click the folder icon.
Step 10: Select the S3 bucket created earlier. Click Select.
Step 11: Choose the latest release version.
Step 12: Choose “m4.large” as the instance type and set the number of instances as required. Select your EC2 key pair. Click Create cluster.
Step 13: Check the cluster status.
Step 14: Go to the EC2 service. Three instances are created automatically.
Step 15: Click the security group of the master node.
Step 16: Click the Inbound tab. Click Edit.
Step 17: Click the Add Rule button.
Step 18: Add a rule of type SSH with the source set to Anywhere. Click Save.
Step 19: SSH into the master node instance.

Steps to run Hive Job on EMR – Demo

Step 1: Click the cluster you created earlier.

Step 2: Click the Steps tab. Click the “Add Step” button.

Step 3: Select the step type “Hive program” and give the step a name. Enter the script S3 location and the input S3 location:
Script location: s3://us-east-1.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q

Input location: s3://us-east-1.elasticmapreduce.samples
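Both values in Step 3 are S3 URIs: the part after `s3://` up to the first `/` is the bucket, and the rest is the object key. The script location therefore names a single object, while the input location is the root of the sample bucket. The helper below is a minimal sketch of that split; `split_s3_uri` is a hypothetical name, and accepting only the lowercase `s3://` scheme is a choice of this sketch.

```python
def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an S3 URI of the form s3://bucket/key into (bucket, key).

    Only the lowercase "s3://" scheme is accepted in this sketch.
    An empty key means the URI points at the bucket itself.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a URI starting with {prefix!r}: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the object key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


script = "s3://us-east-1.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q"
print(split_s3_uri(script))
# ('us-east-1.elasticmapreduce.samples', 'cloudfront/code/Hive_CloudFront.q')
```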

Step 4: Meanwhile, open the S3 service in a new browser tab and click the bucket you created earlier.

Step 5: Click Create folder.

Step 6: Type the folder name “outputs”. Click Save.
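S3 has no real directories: the console represents a folder as a zero-byte object whose key is the folder name followed by `/`. The sketch below shows the key that Steps 5 and 6 create and the folder URI that is later used as the step's output location. The function names and the bucket name `my-emr-bucket` are hypothetical.

```python
def folder_key(name: str) -> str:
    """Return the object key the S3 console creates for a folder.

    The console stores a folder as a zero-byte object whose key is the
    folder name followed by "/".
    """
    name = name.strip("/")
    if not name:
        raise ValueError("folder name must not be empty")
    return name + "/"


def folder_uri(bucket: str, name: str) -> str:
    """Return the s3:// URI of the folder, without a trailing slash."""
    return f"s3://{bucket}/{folder_key(name).rstrip('/')}"


# "my-emr-bucket" is a hypothetical bucket name.
print(folder_key("outputs"))                   # outputs/
print(folder_uri("my-emr-bucket", "outputs"))  # s3://my-emr-bucket/outputs
```

With boto3, the same empty folder object could be created with `s3.put_object(Bucket="my-emr-bucket", Key=folder_key("outputs"))`, where `s3 = boto3.client("s3")` and the bucket name is again a placeholder.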

Step 7: Return to the EMR tab. Click the folder icon next to the output S3 location.

Step 8: Select the “outputs” folder in your bucket. Click Select, then click Add to add the step.
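The Hive program step configured in Steps 3 to 8 can also be described as an entry of the `Steps` list accepted by the EMR AddJobFlowSteps API. The sketch below assumes an EMR release 4.x or later, where Hive steps run through `command-runner.jar` with `hive-script --run-hive-script`, and that the sample script reads the `${INPUT}` and `${OUTPUT}` variables defined with `-d`. Compare the result with the step details your console shows before relying on it. `hive_step` and the bucket name are hypothetical.

```python
import json


def hive_step(name: str, script: str, input_uri: str, output_uri: str,
              action_on_failure: str = "CONTINUE") -> dict:
    """Build one Hive step definition for the EMR AddJobFlowSteps API.

    Assumes EMR 4.x+ (command-runner.jar). The -d options define the
    INPUT and OUTPUT variables that the Hive script is assumed to use.
    """
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hive-script", "--run-hive-script", "--args",
                "-f", script,
                "-d", f"INPUT={input_uri}",
                "-d", f"OUTPUT={output_uri}",
            ],
        },
    }


step = hive_step(
    name="Hive program",
    script="s3://us-east-1.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
    input_uri="s3://us-east-1.elasticmapreduce.samples",
    output_uri="s3://my-emr-bucket/outputs",  # hypothetical bucket
)
print(json.dumps(step, indent=2))
```

To submit it without the console, `step` could be passed as `Steps=[step]` to boto3's `emr.add_job_flow_steps(JobFlowId=...)`, using your cluster ID.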

Step 9: Check the cluster status. On the Steps tab, wait until the step status changes to Completed.
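Watching the status in Step 9 is a polling loop: read the step state, stop on a terminal state (COMPLETED, CANCELLED, FAILED or INTERRUPTED), otherwise wait and try again. The sketch below takes the state reader as a parameter. With boto3 it could wrap `emr.describe_step(ClusterId=..., StepId=...)["Step"]["Status"]["State"]`; the function name and the default intervals are assumptions of this sketch.

```python
import time
from typing import Callable

# Terminal states of an EMR step; any other state means "keep waiting".
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def wait_for_step(get_state: Callable[[], str], poll_seconds: float = 30.0,
                  timeout_seconds: float = 3600.0,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_state() until the step reaches a terminal state.

    Returns the final state, or raises TimeoutError once the total
    waiting time reaches timeout_seconds.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"step still {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Demo with a scripted sequence of states instead of a real cluster.
states = iter(["PENDING", "RUNNING", "COMPLETED"])
print(wait_for_step(lambda: next(states), sleep=lambda s: None))  # COMPLETED
```

Injecting `sleep` keeps the loop testable without waiting in real time.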

Step 10: In the S3 tab, open the bucket you created earlier, then open the outputs folder.

Step 11: Open the os_requests folder.

Step 12: Download the output file inside it and open it in a text editor such as Notepad.

Step 13: Check the file contents. Each line should show an operating system and its request count.
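When Hive writes query results to a directory without an explicit row format, it separates fields with Ctrl-A (`\x01`). A plain text editor may show this character as a box or nothing at all, so the columns can appear run together. The sketch below assumes that default delimiter and one `<key><sep><count>` row per line. `parse_hive_counts` is a hypothetical helper, and the sample string is made up, not real job output.

```python
def parse_hive_counts(text: str, sep: str = "\x01") -> dict[str, int]:
    """Parse rows of the form <key><sep><count> into a {key: count} dict.

    sep defaults to Ctrl-A (\\x01), Hive's default field delimiter.
    """
    counts: dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines such as a trailing newline
        key, found, value = line.rpartition(sep)
        if not found:
            raise ValueError(f"no field separator in line: {line!r}")
        counts[key] = int(value)
    return counts


# Hypothetical two-row sample, not real job output.
sample = "os-a\x017\nos-b\x0112\n"
print(parse_hive_counts(sample))  # {'os-a': 7, 'os-b': 12}
```

For the downloaded file, read it with `open(path, encoding="utf-8")`, where `path` is wherever you saved it, and pass the text to `parse_hive_counts`.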

