Question Bank Big Data Analytics

Big Data Analytics (15CS82) Question Bank

QUESTION BANK
BIG DATA ANALYTICS
[As per Choice Based Credit System (CBCS) scheme]
(Effective from the academic year 2016-2017)
SEMESTER – VIII

Subject Code                     15CS82    IA Marks      40
Number of Lecture Hours/Week     04        Exam Marks    60
Total Number of Lecture Hours    50        Exam Hours    03

Module 1
1. List and briefly describe the HDFS user commands that facilitate navigation within HDFS. 8M (July’19)
2. Write a Java program to read, write, and delete files on HDFS. 8M (July’19)
3. Describe the HDFS components with a diagram. 8M (July’19)
4. What are the steps to run the TeraSort benchmark in Hadoop? 4M NS
5. What are the steps to run the TestDFSIO benchmark in Hadoop? 4M (Dec’19)
6. How are Hadoop MapReduce jobs managed using the mapred job command? 4M (July’19)
7. Explain the Hadoop MapReduce model with simple mapper and reducer scripts. 8M (July’19)
8. Illustrate the Hadoop parallel MapReduce data flow with a diagram. Write down the steps of MapReduce parallel execution. 8M (July’19)
9. Write a Java program that counts the number of occurrences of each word in a given input. 8M (Dec’19)
10. Explain the Hadoop Streaming interface with Python mapper and reducer scripts. 8M (Dec’19)
11. Explain the Hadoop Pipes interface and write a word count program using the Pipes interface. 8M (Dec’19)
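
As a starting point for the word count and Hadoop Streaming questions in this module, the following is a minimal Python sketch of a streaming-style mapper and reducer. It is an illustration under stated assumptions, not a model answer: the function names map_lines and reduce_lines are hypothetical, both stages are kept in one file for brevity, and the shuffle/sort step that Hadoop performs between the two stages is imitated locally with sorted(). In a real Streaming job the mapper and reducer would normally be separate scripts that read standard input and write tab-separated records to standard output.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_records):
    """Reducer: sum the counts of consecutive records that share a key.

    Hadoop Streaming hands the reducer its input sorted by key, so records
    for the same word arrive next to each other.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Local stand-in for the shuffle/sort step: sort the mapper output
    # before passing it to the reducer.
    for record in reduce_lines(sorted(map_lines(sys.stdin))):
        print(record)
```

Piping a text file into this script on a single machine is a quick way to check the logic before submitting anything to a cluster.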
Module 2
1. Describe the Apache Pig scripting tool for examining data both locally and on a Hadoop cluster. 8M (Dec’19)
2. Write the steps and procedure to summarize, run ad hoc queries on, and analyze a data set using Hadoop HiveQL, with an example. 8M NS
3. How is relational data acquired using Hadoop Sqoop? Explain the import and export methods. 8M (Dec’19)
4. Detail the steps of importing data from MySQL to HDFS and exporting data from HDFS to MySQL, with example data. 8M (Dec’19)
5. Write the procedure to collect, transport, and store data in HDFS using Apache Flume. 8M NS
6. How is Apache Oozie used to manage Hadoop workflows? Explain in detail. 8M (July’19)
7. Write a short summary of Apache Oozie job commands. 4M (July’19)
8. What are the features of Apache HBase? How does it manage a distributed database? Explain the commands for basic operations. 8M (July’19)
9. Explain the structure of YARN applications. 6M (July’19)
10. Describe the framework of YARN applications with HDFS. 8M (July’19)
11. Describe the procedure for restarting a stopped Hadoop service using Apache Ambari. 8M (Dec’19)
12. Describe the procedure for changing Hadoop properties using Apache Ambari. 8M (Dec’19)
13. What are the built-in administrative features and commands of Hadoop YARN applications? 8M (Dec’19)
14. How is the NFSv3 Gateway to HDFS configured? Explain the steps with scripts and commands. 8M (Dec’19)
15. How are Hadoop version 1 applications transformed to version 2? What compatibilities need to be adopted? 4M (Dec’19)

Module 3
1. What is Business Intelligence? Write a note on its role in decision making. 4M (July’19)
2. List and explain any two application areas of Business Intelligence and Data Mining. 8M (July’19)
3. List the features of a good data warehouse. 4M (Dec’19)
4. Describe the data warehouse architecture with a neat diagram. 8M (July’19)
5. How is raw data prepared for mining? Explain the processes involved in preparing the data. 8M (Dec’19)
6. Which data mining techniques are involved in supervised and unsupervised learning? Explain briefly. 8M (July’19)
7. Write notes on tools and platforms for data mining. 4M (Dec’19)
8. What is a confusion matrix? What is it used for? 4M NS
9. Compare popular data mining platforms on different features. 4M (July’19)
10. How can data mining techniques be used effectively and successfully? Briefly describe the CRISP-DM steps for effective data mining. 6M (July’19)
11. Write down the myths of data mining in the business industry. 4M (Dec’19)
12. What are the major mistakes to be avoided when doing data mining? 8M (Dec’19)
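
For the confusion matrix question in this module, a small Python sketch may help when checking a hand-worked answer. It counts the four cells of a binary confusion matrix and derives accuracy, precision, and recall from them. The function names and the label lists in the test data are hypothetical, chosen only for illustration; they do not come from the question bank.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    cells = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            cells["tp" if a == positive else "fp"] += 1
        else:
            cells["fn" if a == positive else "tn"] += 1
    return cells


def summarize(cells):
    """Derive accuracy, precision, and recall from the four cell counts."""
    total = sum(cells.values())
    predicted_pos = cells["tp"] + cells["fp"]
    actual_pos = cells["tp"] + cells["fn"]
    return {
        "accuracy": (cells["tp"] + cells["tn"]) / total,
        "precision": cells["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": cells["tp"] / actual_pos if actual_pos else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    actual = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 0, 1, 0, 1, 0, 1, 0]
    cells = confusion_matrix(actual, predicted)
    print(cells)
    print(summarize(cells))
```

Keeping the raw counts separate from the derived ratios mirrors how the matrix is used in practice: the same four counts support several evaluation measures.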

Module 4
1. Draw the Decision tree for the given data set. 8M (July’19)

2. What is a decision tree? Why are decision trees the most popular classification technique? 8M NS
3. What is a splitting variable? Describe three criteria for choosing a splitting variable. 4M NS
4. What is pruning? What are pre-pruning and post-pruning techniques? Why choose one over the other? 8M (July’19)
5. What is logistic regression? Describe the advantages and disadvantages of regression models. 8M (Dec’19)
6. How is a neural network represented? Explain the design principles of artificial neural networks. 8M (Dec’19)
7. List the steps to build an ANN. Briefly describe the advantages and disadvantages of using ANNs. 8M (Dec’19)
8. How are association rules represented? Describe the Apriori algorithm for association rules. 8M (July’19)
9. Create a regression model to predict the Test2 score from the Test1 score. Then predict the Test2 score for a student who got 46 in Test1. 8M NS
Test1 Test2
59 56
52 63
44 55
51 50
42 66
42 48
41 58
45 36
27 13
63 50
54 81
44 56
50 64
47 50
10. X and Y are the two dimensions of interest. Determine the number of clusters and the center points of those clusters. 8M (July’19)
X Y
2 4
2 6
5 6
4 7
8 3
6 6
5 2
5 7
6 3
4 4
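
The two data-based questions above (the Test1/Test2 regression and the X/Y clustering) can be checked numerically with a short Python sketch. It assumes a simple least-squares line for the regression and plain k-means (Lloyd's algorithm) for the clustering; neither method is prescribed by the questions. The function names are hypothetical, and seeding k-means with the first k points is a simplification chosen so that runs are repeatable. Only the data values are taken from the tables.

```python
from statistics import mean

# Data copied from the Test1/Test2 table and the X/Y table above.
TEST1 = [59, 52, 44, 51, 42, 42, 41, 45, 27, 63, 54, 44, 50, 47]
TEST2 = [56, 63, 55, 50, 66, 48, 58, 36, 13, 50, 81, 56, 64, 50]
POINTS = [(2, 4), (2, 6), (5, 6), (4, 7), (8, 3),
          (6, 6), (5, 2), (5, 7), (6, 3), (4, 4)]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points seed the centers (an assumption)."""
    centers = [tuple(map(float, p)) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            (mean(x for x, _ in c), mean(y for _, y in c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


def sse(centers, clusters):
    """Total within-cluster sum of squared distances."""
    return sum(sq_dist(p, c) for c, pts in zip(centers, clusters) for p in pts)


if __name__ == "__main__":
    intercept, slope = fit_line(TEST1, TEST2)
    print(f"Test2 = {intercept:.2f} + {slope:.3f} * Test1")
    print(f"Predicted Test2 for Test1 = 46: {intercept + slope * 46:.1f}")
    # Compare the within-cluster error for several candidate values of k.
    for k in range(1, 5):
        centers, clusters = k_means(POINTS, k)
        print(k, round(sse(centers, clusters), 2), centers)
```

Printing the within-cluster error for several values of k gives one informal way to argue for a cluster count (the point where the error stops dropping sharply); a different seeding strategy may yield different centers.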

Module 5
1. Briefly describe text mining applications in different domains. 8M NS
2. Compare text mining with data mining along different dimensions. 6M NS
3. What is the Naïve Bayes model? Explain with a classification example. 8M NS
4. What is SVM? Explain the SVM model with the kernel method. 8M NS
5. Write notes on the three types of web mining. 6M NS
6. What is social network analysis? How is it different from other data mining techniques such as clustering and decision trees? 8M NS

Dept. of ISE, DSATM 2019-20