
Data Mining Question bank

Chapter-1 (Introduction to Data warehouse and Data Mining)

Expected Questions

1 Mark Questions:

1. What is OLTP?
2. Define Data Cube.
3. What is Meta Data?
4. What is Data Mining?
5. Mention any 2 applications of data mining.
6. What is an attribute?
7. What is supervised learning?
8. Define social media data mining.
9. What is a data warehouse?
10. What is pattern matching?
11. Expand KDD.

3 Marks Questions:
1. Write short notes on: 1) missing values 2) noisy data 3) duplicate data.
2. Write a short note on data mining tools.
3. Discuss the applications of Data Mining.
4. Explain ETL Process.

5 Marks Questions:
1. Differentiate between data warehouse and data mining.
2. What are the characteristics of OLAP?
3. Explain KDD process in detail.
4. Explain some popular OLAP operations for multidimensional data / OLAP Cube.
5. Differentiate between OLAP and OLTP.
6. Define the following terminologies: i) data ii) knowledge iii) information iv) data mining
metrics

K. C. Silpa, Asst. Professor, Dept. of B.C.A Page 1


7. Explain the star schema. What is the main difference between star schema and snowflake
schema?
8. Briefly explain Data Mining Tasks.
9. Explain how to handle noisy data using data smoothing techniques.

7 Marks Questions:
1. Explain architecture of data warehouse with neat diagram.
2. Explain architecture of data mining with a neat diagram.
3. Explain steps involved in data processing in data mining.
4. What is machine learning? Explain the categories of machine learning.

Chapter-2 (Classification and Prediction) Expected Questions

1 Mark Questions:

1. State the Naïve Bayes theorem.
2. What are training data and test data?
3. Mention popular classification software.
4. Define classification and prediction.

3 Marks Questions:

1. List the differences between classification and prediction.
2. Given a confusion matrix, compute the error and accuracy measures.

5 Marks Questions:

1. Mention the different classifiers used to solve classification problems and explain any one.
2. Explain Associative Classification with an example.
5. Explain the rule-based classifier with an example.
6. Briefly explain the backpropagation technique.

7 Marks Questions:


1. Explain how a decision tree classifier works, with an example using either information gain
or the Gini index.
2. Problem on the Naïve Bayes theorem.

Chapter-3 (Cluster Analysis) Expected Questions

1 Mark Questions:

1. What is clustering?
2. Write the formula for Euclidean distance.
3. Mention any 2 cluster analysis software.
4. What is the difference between classification and clustering?
5. What is a dendrogram?

3 Marks Questions:
1. Explain the partition method used in clustering.
2. Explain different data types with appropriate examples.
3. Identify advantages and disadvantages of k-means clustering.

5 Marks Questions:

1. Explain core, border and outlier points in density-based clustering with a neat diagram.
2. Differentiate between K-means and density-based clustering.
3. Given K={2,3,4,10,11,12,20,25,30}, divide the data into 2 clusters with the centroids 4 and
12.
4. Apply K-means clustering to the given dataset {10,1,15,12,4,3,13,4,5,8} with k=2 and
centroids C1=3 and C2=10.
5. Explain divisive hierarchical clustering with an algorithm.
6. Describe the working of the DBSCAN algorithm.

7 Marks Questions:

1. What is agglomerative clustering? Cluster the data using the agglomerative approach and
represent the result as a dendrogram. (problem)

2. Explain any one cluster analysis method in detail.

Chapter-4 (Web Data Mining) Expected Questions

1 Mark Questions:

1. What is web usage mining?
2. What is web content mining?
3. What is Web Mining?
4. What is the role of Web crawlers?
5. What are the damping factor and PageRank?
6. What is a search engine?

5 Marks Questions:

1. Explain any 5 web terminologies in web mining.
2. Explain types of Web Mining in detail.
3. Explain different types of web pages.
4. Explain the functionality of search engine.
5. Mention and explain the factors that affect the ranking of a web page.

7 Marks Questions:

1. Briefly explain the architecture of a search engine and its working.
2. Describe the page ranking algorithm with an appropriate example. (problem)

Chapter-5 (Big Data) Expected Questions

1 Mark Questions:

1. What is big data and big data mining?
2. Why do we need big data?
3. Mention any 2 tools used in big data.

5 Marks Questions:

1. Give an example of structured, semi-structured and unstructured data / What are the
different types of data in big data? Explain.
2. What are the applications of big data?
3. Explain the 5 characteristics of big data (5 V's).
4. List the differences between HBase and HDFS.
5. List the advantages and disadvantages of a NoSQL database.
7 Marks Questions:
1. What is MapReduce? Explain the MapReduce framework with a neat labeled sketch.
2. Define NoSQL. Explain the types of data in NoSQL.


The main characteristics of OLAP

1. Multidimensional conceptual view: OLAP systems give business users a dimensional and
logical view of the data in the data warehouse, which supports slice and dice
operations.
2. Multi-user support: Since OLAP systems are shared, they should provide the normal
database operations, including retrieval, update, concurrency control, integrity, and
security.
3. Accessibility: OLAP acts as a mediator, sitting between the data sources (e.g., data
warehouses) and an OLAP front-end.
4. Storing OLAP results: OLAP results are kept separate from the data sources.
5. Uniform reporting performance: Increasing the number of dimensions or the database size
should not significantly degrade the reporting performance of the OLAP system.
6. OLAP distinguishes between zero values and missing values so that aggregates are
computed correctly.
7. An OLAP system should ignore all missing values and compute correct aggregate values.
8. OLAP facilitates interactive querying and complex analysis for users.
9. OLAP allows users to drill down for greater detail or roll up for aggregations of metrics
along a single business dimension or across multiple dimensions.
10. OLAP provides the ability to perform intricate calculations and comparisons.
11. OLAP presents results in a number of meaningful ways, including charts and graphs.
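
Characteristics 1, 6, 7 and 9 above can be illustrated with a minimal Python sketch. It is not taken from any OLAP product: the fact rows, the names FACTS, roll_up and slice_cube, and the choice of None to mark a missing value are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, sales).
# None marks a missing value; 0.0 is a genuine zero.
FACTS = [
    ("North", "Q1", 100.0),
    ("North", "Q2", 0.0),    # true zero: nothing was sold
    ("South", "Q1", 50.0),
    ("South", "Q2", None),   # missing: the figure was never recorded
]


def roll_up(facts, dim_index):
    """Aggregate sales along one dimension (0 = region, 1 = quarter).

    Missing values are skipped, so they affect neither the total
    nor the average. Zero values are counted like any other number.
    Returns {member: (total, average)}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in facts:
        value = row[2]
        if value is None:
            continue  # ignore missing values (characteristic 7)
        key = row[dim_index]
        totals[key] += value
        counts[key] += 1
    return {key: (totals[key], totals[key] / counts[key]) for key in totals}


def slice_cube(facts, dim_index, member):
    """Slice: keep only the rows where one dimension equals a fixed member."""
    return [row for row in facts if row[dim_index] == member]


by_region = roll_up(FACTS, 0)     # roll up over quarters
by_quarter = roll_up(FACTS, 1)    # roll up over regions
q1_only = slice_cube(FACTS, 1, "Q1")

print(by_region)
print(by_quarter)
print(q1_only)
```

In this sketch the average for South is 50.0, because the missing Q2 figure is ignored. Treating it as zero would give 25.0, which is why zero and missing values must be distinguished. North's true zero is counted, so its average is 50.0 over two quarters.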
