Data Mining Question Bank
Chapter-1 (Introduction to Data Warehouse and Data Mining)
Expected Questions
1 Mark Questions:
1. What is OLTP?
2. Define Data Cube.
3. What is Meta Data?
4. What is Data Mining?
5. Mention any 2 applications of data mining.
6. What is an attribute?
7. What is supervised learning?
8. Define social media data mining.
9. What is data warehouse?
10. What is pattern matching?
11. Expand KDD.
3 Marks Questions:
1. Write short notes on: 1) missing values 2) noisy data 3) duplicate data.
2. Write a short note on data mining tools.
3. Discuss the applications of Data Mining.
4. Explain ETL Process.
5 Marks Questions:
1. Differentiate between data warehouse and data mining.
2. What are the characteristics of OLAP?
3. Explain KDD process in detail.
4. Explain some popular OLAP operations for multidimensional data / OLAP Cube.
5. Differentiate between OLAP and OLTP.
6. Define the following terminologies: i) data ii) knowledge iii) information iv) data mining
Metrics
7. Explain the star schema. What is the main difference between star schema and snowflake
schema?
8. Briefly explain Data Mining Tasks.
9. How is noisy data handled using data smoothing techniques?
7 Marks Questions:
1. Explain the architecture of a data warehouse with a neat diagram.
2. Explain the architecture of data mining with a neat diagram.
3. Explain the steps involved in data preprocessing in data mining.
4. What is machine learning? Explain the categories of machine learning.
1 Mark Questions:
3 Marks Questions:
5 Marks Questions:
1. Mention the different classifiers used to solve classification problems and explain any one.
2. Explain Associative Classification with an example.
5. Explain the rule-based classifier with an example.
6. Briefly explain the backpropagation technique.
7 Marks Questions:
1. How does a decision tree classifier work? Explain with an example using either information
gain or the Gini index.
2. Problem on the Naïve Bayes classifier (Bayes' theorem).
1 Mark Questions:
1. What is clustering?
2. Write the formula for Euclidean distance.
3. Mention any 2 cluster analysis software.
4. What is the difference between classification and clustering?
5. What is a dendrogram?
3 Marks Questions:
1. Explain the partitioning method used in clustering.
2. Explain different data types with appropriate examples.
3. Identify advantages and disadvantages of k-means clustering.
5 Marks Questions:
1. Explain core, border and outlier points in density-based clustering with a neat diagram.
2. Differentiate between K-means and density-based clustering.
3. Given K={2,3,4,10,11,12,20,25,30} divide data into 2 clusters with the centroids of 4 and
12.
4. Apply K-means clustering for the given dataset {10,1,15,12,4,3,13,4,5,8} with k=2 and
centroids are C1=3 and C2=10.
5. Explain divisive hierarchical clustering with an algorithm.
6. Describe the working of the DBSCAN algorithm.
7 Marks Questions:
1. What is agglomerative clustering? Cluster the data using the agglomerative approach and
represent the result as a dendrogram. (problem)
1 Mark Questions:
5 Marks Questions:
7 Marks Questions:
1 Mark Questions:
5 Marks Questions:
1. Give an example of structured, semi-structured and unstructured data / What are the
different types of data in big data? Explain.
2. What are the applications of big data?
3. Explain the 5 characteristics of big data (the 5 Vs).
4. List the differences between HBase and HDFS.
5. List the advantages and disadvantages of NoSQL databases.
7 Marks Questions:
1. What is MapReduce? Explain the MapReduce framework with a neat labeled sketch.
2. Define NoSQL. Explain the types of data in NoSQL.
Characteristics of OLAP:
1. Multidimensional conceptual view: OLAP systems give business users a dimensional
and logical view of the data in the data warehouse. This view supports slice and dice
operations.
2. Multi-user support: Since OLAP systems are shared among many users, they should
provide the normal database operations, including retrieval, update, concurrency control,
integrity, and security.
3. Accessibility: OLAP acts as a mediator between the data sources (e.g., data warehouses)
and the OLAP front-end, so the OLAP layer should sit between the two.
4. Storing OLAP results: OLAP results are kept separate from data sources.
5. Uniform reporting performance: Increasing the number of dimensions or the database size
should not significantly degrade the reporting performance of the OLAP system.
6. OLAP provides for distinguishing between zero values and missing values so that aggregates
are computed correctly.
7. The OLAP system should ignore all missing values and compute correct aggregate values.
8. OLAP facilitates interactive querying and complex analysis for users.
9. OLAP allows users to drill down for greater details or roll up for aggregations of metrics
along a single business dimension or across multiple dimensions.
10. OLAP provides the ability to perform intricate calculations and comparisons.
11. OLAP presents results in a number of meaningful ways, including charts and graphs.
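Points 1, 6, 7 and 9 above can be illustrated with a small sketch. The fact rows, the dimension names (region, city, quarter) and the helper functions `aggregate` and `slice_rows` are all hypothetical and chosen only for illustration; a real OLAP engine works on a cube stored in the warehouse, not on a Python list.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, city, quarter, sales).
# None marks a missing value; 0 is a genuine zero.
FACTS = [
    ("North", "Delhi", "Q1", 100),
    ("North", "Delhi", "Q2", 0),      # real zero: nothing was sold
    ("North", "Jaipur", "Q1", None),  # missing: no figure was reported
    ("North", "Jaipur", "Q2", 50),
    ("South", "Chennai", "Q1", 70),
    ("South", "Chennai", "Q2", 30),
]

# Position of each dimension inside a fact row (assumed layout).
DIMENSIONS = {"region": 0, "city": 1, "quarter": 2}


def aggregate(rows, by):
    """Group rows by the given dimensions; return {key: (total, average)}.

    Missing values (None) are skipped, zeros are counted.
    """
    positions = [DIMENSIONS[name] for name in by]
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[p] for p in positions)
        groups[key]  # create the group even if all its values are missing
        if row[3] is not None:
            groups[key].append(row[3])
    return {
        key: (sum(vals), sum(vals) / len(vals) if vals else None)
        for key, vals in groups.items()
    }


def slice_rows(rows, dimension, value):
    """Slice: keep only the rows where one dimension has a fixed value."""
    return [r for r in rows if r[DIMENSIONS[dimension]] == value]


by_city = aggregate(FACTS, ["region", "city"])  # drill-down: finer detail
by_region = aggregate(FACTS, ["region"])        # roll-up: coarser summary
q1_by_region = aggregate(slice_rows(FACTS, "quarter", "Q1"), ["region"])

print(by_region)
```

In this sketch the North average is 150 / 3 = 50.0, because the missing Jaipur Q1 figure is ignored while the Delhi Q2 zero is counted. Treating the missing value as zero would give 150 / 4 = 37.5, which is the kind of incorrect aggregate that points 6 and 7 warn against.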