2 DE +Installing+Apache+Spark+on+CDH+EC2
wget http://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.3.0.cloudera2.jar
cp SPARK2_ON_YARN-2.3.0.cloudera2.jar /opt/cloudera/csd/
ls -ltrh
4. Change the owner using the following command
ls -ltrh
● Integrate with services such as S3, HBase, HDFS, Hive, YARN (MR2 Included), and ZooKeeper.
● You will see the pop-up window as shown below. Select the checkbox, then click OK to go back.
● Click on Continue
● Click on Continue
● Click on Continue
● Click on Finish.
10. Now restart YARN by clicking the Stale configuration icon, as shown below
mkdir /var/log/spark2/lineage
chown spark:spark /var/log/spark2/lineage
chmod 777 /var/log/spark2/lineage
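The result of the three commands above can be checked with a short script. This is a minimal sketch, assuming GNU stat is available (as it is on typical EC2 Linux images); describe_path is a hypothetical helper name, not part of any Cloudera tooling.

```shell
# Print "owner:group mode" for a path (relies on GNU stat's -c option).
describe_path() {
  stat -c '%U:%G %a' "$1"
}

# LINEAGE_DIR is the directory created in the step above.
LINEAGE_DIR="/var/log/spark2/lineage"

if [ -d "$LINEAGE_DIR" ]; then
  # After the chown and chmod above, this should print: spark:spark 777
  describe_path "$LINEAGE_DIR"
else
  echo "missing: $LINEAGE_DIR"
fi
```

If the script reports the directory as missing, check that /var/log/spark2 exists, since mkdir without -p does not create parent directories.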
12. By default, a Spark2 job reads its data from HDFS, so now let's
create an ec2-user directory in HDFS (skip this step if it already exists).
In PuTTY, run the following commands as the root user:
sudo su hdfs
hadoop fs -mkdir -p /user/ec2-user
hadoop fs -chown -R ec2-user:ec2-user /user/ec2-user
hadoop fs -chmod -R 777 /user/ec2-user
For your Spark jobs, keep your data files in the /user/ec2-user/ directory. (You can also create
subdirectories there and store your data files in them.)
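As an illustration of storing a data file under /user/ec2-user/, the sketch below creates a subdirectory and uploads a local file with standard hadoop fs commands. The names data and sample.csv are hypothetical, and the DRY_RUN switch is only an assumption added so the commands can be previewed on a machine without Hadoop. Run it as ec2-user, which owns the directory after the chown above.

```shell
#!/usr/bin/env bash
# Sketch: copy a local file into a subdirectory of /user/ec2-user in HDFS.
HDFS_HOME="/user/ec2-user"

# Run a command, or only print it when DRY_RUN=1 (useful off-cluster).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# upload_to_hdfs <local_file> <subdir>: hypothetical helper for illustration.
upload_to_hdfs() {
  local local_file="$1" subdir="$2"
  run hadoop fs -mkdir -p "${HDFS_HOME}/${subdir}"                # create target directory
  run hadoop fs -put -f "${local_file}" "${HDFS_HOME}/${subdir}/" # upload, overwriting if present
  run hadoop fs -ls "${HDFS_HOME}/${subdir}"                      # list to confirm the upload
}

# Preview the commands without touching HDFS:
DRY_RUN=1 upload_to_hdfs sample.csv data
```

Dropping DRY_RUN=1 from the last line executes the commands against the cluster.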
spark2-submit --version
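To use the reported version in a script rather than reading it off the banner, the output can be filtered. This is a rough sketch, assuming the banner contains the word "version" followed by the version string and that it is written to stderr; extract_spark_version is a hypothetical helper name.

```shell
# Read spark2-submit --version output on stdin and print the first
# version string that follows the word "version".
extract_spark_version() {
  grep -o 'version [0-9][^ ,]*' | head -n 1 | cut -d' ' -f2
}

# Only call spark2-submit when it is actually installed on this machine.
if command -v spark2-submit >/dev/null 2>&1; then
  spark2-submit --version 2>&1 | extract_spark_version
fi
```

With the CSD installed in the earlier steps, the printed value would be expected to start with 2.3.0.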
Note: Before opening the History Server Web UI, you first need to configure the security group of
your instance:
II. Scroll down the Description tab and under your security group click on avdmap.
III. Then go to Inbound > Edit > Add Rule
IV. A new rule will be added to the list. Configure the rule as below:
A. Type section - “All TCP”
B. Protocol - TCP.
C. Port range - 0-65535
D. Source - My IP
E. Click on Save.
V. Copy public IP address from the ec2 dashboard.
VII. Now paste the public IP address into the browser as shown below. The format is