BigData Challenge - L2 [Spark]

Problem Statement
Fix the broken Spark code present in "/home/hadoop/mycode.py" and run the code using spark-submit.
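As a rough illustration of the "run the code using spark-submit" step, the sketch below builds the spark-submit command line for the script and shows how it could be launched from Python. The helper names (build_spark_submit_command, run_spark_job) are hypothetical, and the optional --master value is an assumption; the lab does not say which options, if any, are required. Nothing here reflects the actual contents of mycode.py.

```python
import shlex
import subprocess


def build_spark_submit_command(script_path, master=None):
    """Return the spark-submit argument list for a PySpark script."""
    command = ["spark-submit"]
    if master is not None:
        # --master is a standard spark-submit option; when it is left out,
        # the cluster's configured default is used.
        command += ["--master", master]
    command.append(script_path)
    return command


def run_spark_job(script_path, master=None):
    """Run the job and return its exit code (0 means success)."""
    completed = subprocess.run(build_spark_submit_command(script_path, master))
    return completed.returncode


# Print the command rather than running it, so the sketch is safe to try
# on a machine without Spark installed.
print(shlex.join(build_spark_submit_command("/home/hadoop/mycode.py")))
```

On the master node, calling run_spark_job("/home/hadoop/mycode.py") would be equivalent to typing the printed command in the shell; a non-zero return value suggests the script still contains an error.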

Deployed Resource Names (AWS Region: us-east-1)

● EMR cluster - EMR_Hackathon_Scenario2

Instructions

● As a prerequisite, create an EC2 key pair by following the steps in the documentation below.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair
● Navigate to the CloudFormation console - https://console.aws.amazon.com/cloudformation/home?region=us-east-1 and choose the us-east-1 region.
● Click on “Create stack” and select the “With new resources (standard)” option.

● Use the “BigData_L2.json” template provided in the BigData-Lab.zip file.


● Choose the “Template is ready” option and upload the above template from your local filesystem.

● Then click on Next.


● Specify the stack name as “Hackathon-Lab2”. Under Parameters → KeyName, select the key pair created in the first step and click on Next.

● Step 3: On the “Configure stack options” page, click Next.


● Step 4: On the Review page, tick the acknowledgement and select “Create stack”.

● Once the stack is provisioned, navigate to the Outputs section to fetch the EMR master node public hostname (“EMRPublicDNSMaster”).
● SSH into the master node using the key pair specified in the CloudFormation stack.
Example: ssh -i test.pem hadoop@ec2.xxxx.
● Fix the broken Spark code present in "/home/hadoop/mycode.py" and run the code using spark-submit.
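The hostname lookup and SSH steps above can also be scripted. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the helper names are hypothetical. It uses the CloudFormation describe_stacks call and reads the “EMRPublicDNSMaster” output of the “Hackathon-Lab2” stack. The sample response and the hostname master.example.invalid are made up so the demonstration runs offline.

```python
def extract_stack_output(response, output_key):
    """Return the value of output_key from a describe_stacks response, or None."""
    for stack in response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == output_key:
                return output.get("OutputValue")
    return None


def fetch_master_dns(stack_name="Hackathon-Lab2", region="us-east-1"):
    """Ask CloudFormation for the EMR master hostname (needs boto3 and credentials)."""
    import boto3  # imported lazily so the other helpers work without it

    client = boto3.client("cloudformation", region_name=region)
    response = client.describe_stacks(StackName=stack_name)
    return extract_stack_output(response, "EMRPublicDNSMaster")


def build_ssh_command(host, key_file):
    """Build the ssh argument list for logging in as the hadoop user."""
    return ["ssh", "-i", key_file, f"hadoop@{host}"]


# Offline demonstration with a made-up response; the hostname is a placeholder.
sample = {
    "Stacks": [
        {
            "Outputs": [
                {
                    "OutputKey": "EMRPublicDNSMaster",
                    "OutputValue": "master.example.invalid",
                }
            ]
        }
    ]
}
host = extract_stack_output(sample, "EMRPublicDNSMaster")
print(" ".join(build_ssh_command(host, "test.pem")))
```

Against the real stack, fetch_master_dns() would replace the sample lookup; it returns None if the stack has no output with that key, for example while provisioning is still in progress.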

Submissions

● General Info
- Your Name
- Email ID
- AWS Account ID
● Resource Info
- EMR cluster ID from the CloudFormation output.
- Share the output of “cat /home/hadoop/mycode.py”.
- Also share a screenshot of the /home/hadoop/out.txt file content.

After collating the above information, send an email to ps-hackathon-bigdata@amazon.com

Documentation Links
● What is an EC2 key pair? - https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair
● What is Amazon EMR? - https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html
● Apache Spark - https://spark.apache.org/documentation.html
