BigData Challenge - L2 (Spark)
Problem Statement
Fix the broken Spark code present in "/home/hadoop/mycode.py" and run the code using spark-submit.
Instructions
● As a prerequisite, create an EC2 key pair by following the steps in the documentation below.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair
● Navigate to the CloudFormation console at https://console.aws.amazon.com/cloudformation/home?region=us-east-1 and choose the us-east-1 region.
● Click “Create stack” and select the “With new resources (standard)” option.
● Once the stack is provisioned, navigate to the Outputs section to fetch the EMR master node public hostname (“EMRPublicDNSMaster”).
● SSH into the master node using the key pair specified in the CloudFormation (CFN) stack.
Example: ssh -i test.pem hadoop@ec2.xxxx.
● Fix the broken Spark code present in "/home/hadoop/mycode.py" and run the code using spark-submit.
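The hostname lookup, SSH, and spark-submit steps above can be sketched in Python. This is only a minimal sketch, not part of the challenge. It assumes the stack description was saved as JSON in the shape returned by `aws cloudformation describe-stacks` (a `Stacks` list whose entries carry `Outputs` with `OutputKey`/`OutputValue` pairs). The function names, the sample JSON, and the sample hostname are hypothetical placeholders. The code only builds the command lines; it does not run them and does not fix mycode.py.

```python
import json
import shlex


def find_stack_output(describe_stacks_json, key):
    """Return the OutputValue for `key` from describe-stacks style JSON."""
    data = json.loads(describe_stacks_json)
    for stack in data.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == key:
                return output["OutputValue"]
    raise KeyError(f"stack output {key!r} not found")


def build_ssh_command(key_file, host, user="hadoop"):
    """Build the argument list for: ssh -i <key_file> <user>@<host>."""
    return ["ssh", "-i", key_file, f"{user}@{host}"]


def build_spark_submit_command(script="/home/hadoop/mycode.py"):
    """Build the argument list for: spark-submit <script>."""
    return ["spark-submit", script]


# Hypothetical sample data: the hostname is a placeholder, not a real node.
SAMPLE_DESCRIBE_STACKS = json.dumps({
    "Stacks": [{
        "Outputs": [{
            "OutputKey": "EMRPublicDNSMaster",
            "OutputValue": "ec2-203-0-113-10.compute-1.amazonaws.com",
        }]
    }]
})

if __name__ == "__main__":
    host = find_stack_output(SAMPLE_DESCRIBE_STACKS, "EMRPublicDNSMaster")
    # Command to run from your workstation.
    print(shlex.join(build_ssh_command("test.pem", host)))
    # Command to run on the master node once the script is fixed.
    print(shlex.join(build_spark_submit_command()))
```

Returning argument lists rather than a single string keeps the commands safe to pass to `subprocess.run` without shell quoting, should you choose to execute them.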
Submissions
● General Info
- Your name
- Email ID
- AWS account ID
● Resource Info
- EMR cluster ID from the CloudFormation output.
- Share the output of “cat /home/hadoop/mycode.py”.
- Also share a screenshot of the contents of /home/hadoop/out.txt.
Documentation Links
● What is an EC2 key pair? - https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair
● What is Amazon EMR? - https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html
● Apache Spark - https://spark.apache.org/documentation.html