Exam Question Paper - BDT - 35
Pre-requisites:
1. Each candidate must have the following hardware:
a. i7 processor
b. 8GB RAM
c. 500+ GB HDD
d. Internet connectivity
2. Each candidate must have the following software:
a. Windows or CentOS 7 operating system
b. VirtualBox installed
c. Docker installed (on CentOS 7 only)
d. Docker image for CDH 5.X
e. MySQL installed
3. Candidates may use a standalone installation, VirtualBox, or Docker images.
4. Candidates are assumed to have hands-on proficiency with the technology stack they choose for installation.
For example, if a candidate decides to use Docker, he/she should be well versed in Docker; no assistance will
be provided for performing operations on Docker.
Q2. Below is the data for the chikoo plantation from the market.
- 10 Marks
Sample data: (Please use the attached file: Chikoo_Numbers.csv)
Software Stack needed: Spark, HDFS (using data from HDFS is optional)
Analyse it using Spark and answer the following questions (2 marks each):
1. Find all the states participating in the data collection
2. Find the total number of data rows in the data collection
3. Find the number of districts from each state participating in the data collection
4. Find the market where a buyer can get chikoo at its cheapest
5. Find the average of min_price and max_price for the chikoo
Q3. Create a Hive table for analysing market data (Chikoo_Numbers.csv).
Write Hive queries for the following:
- 5 Marks
Software Stack needed: Hive, HDFS
Q5. Create a MapReduce program that counts the words in README.txt, using any one of the following options
- 5 Marks
Software Stack needed: HDFS, MapReduce, Java / Python + hadoop-streaming
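For the Python + hadoop-streaming option, a minimal sketch of a mapper and reducer might look like the following. It is one possible shape, not a model answer. The file name wordcount.py, the function names, and the choice to lowercase words and split on whitespace only (punctuation is not stripped) are all assumptions made for illustration. The reducer relies on the streaming contract that its input arrives sorted by key.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            # Assumption: counting is case-insensitive.
            yield f"{word.lower()}\t1"


def reducer(records):
    """Sum the counts per word; records must arrive sorted by word."""
    pairs = (rec.rstrip("\n").split("\t", 1) for rec in records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def local_wordcount(lines):
    """Simulate map -> sort -> reduce in one process, for local testing."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and len(sys.argv) > 1:
    # The role is chosen by the first argument: "map" or "reduce".
    step = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for out in step(sys.stdin):
        print(out)
```

The same logic can be checked without a cluster by piping the file through a shell sort, for example: cat README.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce. On a cluster, the two roles would be passed to the hadoop-streaming jar through its -mapper and -reducer options. The jar location depends on the installation.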
Q6. Create an Apache Airflow DAG that implements a pipeline for triggering a PySpark action.
- 5 Marks
Software Stack needed: Airflow, Spark