Data Mining Doubt Clearing Session Questions
Tejaswini Bhosale
Unit 1
• What kinds of data preprocessing are needed before applying a data mining
algorithm to a data set? Explain a method for handling noisy data, with an
example.
• In a given dataset, identify noisy data, missing values, inconsistencies, and
outliers, and address these problems with different preprocessing methods.
• How is a data warehouse different from a database? Explain with an example.
• Describe the steps involved in data mining when viewed as a process of
KDD.
• What is data mining? How is a data warehouse different from a database?
• Explain the steps involved in handling redundancy in data integration.
• "Data mining is a part of KDD." Do you agree or disagree? Justify. Explain the
different stages in KDD.
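For the noisy-data question in this unit, one commonly taught method is smoothing by bin means: sort the values, split them into equal-frequency bins, and replace each value with the mean of its bin. The following is only a minimal sketch. The function name `smooth_by_bin_means` and the sample `prices` list are illustrative assumptions, not part of the question set.

```python
def smooth_by_bin_means(values, bin_size):
    """Smooth noisy values by replacing each one with the mean of its bin.

    Values are sorted first, then cut into consecutive bins of `bin_size`
    items (equal-frequency binning). The last bin may be shorter.
    """
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        # Every member of the bin takes the bin mean.
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Hypothetical sample data, chosen only for illustration.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed_prices = smooth_by_bin_means(prices, bin_size=3)
print(smoothed_prices)
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Smoothing by bin medians or bin boundaries follows the same pattern, with a different replacement value for each bin.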
Unit 1
• Explain the data mining techniques.
• Explain the architecture and implementation of a data
warehouse, with an example.
• Differentiate between a data warehouse and data mining. Explain the
stages of knowledge discovery in databases, with an example.
• What is KDD? Explain data mining as a step in the process of
knowledge discovery.
Unit 1
• List and discuss the data mining functionalities. Explain any two with
examples.
• What are data objects? Explain data attributes and the different types of
data attribute.
• Why preprocess the data? Explain the different tasks of data preprocessing.
• Describe any three methods to normalize a group of data.
• Explain the architecture and implementation of a data warehouse, with an
example.
• Explain the applications of data warehouses and data mining.
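For the normalization question above, three commonly cited methods are min-max normalization, z-score normalization, and normalization by decimal scaling. The sketch below is one possible illustration. The function names and the sample `data` list are assumptions made here, and the z-score version uses the population standard deviation.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max: map [min, max] linearly onto [new_min, new_max].

    Assumes the values are not all equal (otherwise max - min is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Decimal scaling: divide by 10**j, with j the smallest integer
    such that the largest absolute scaled value is below 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


# Hypothetical sample data, chosen only for illustration.
data = [200, 300, 400, 600, 1000]
print(min_max(data))          # [0.0, 0.125, 0.25, 0.5, 1.0]
print(z_score(data))
print(decimal_scaling(data))  # [0.02, 0.03, 0.04, 0.06, 0.1]
```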
Unit 2
• Suppose that a data warehouse for Big University consists of the following
four dimensions: student, course, semester, and instructor, and two
measures count and avg-grade. When at the lowest conceptual level (e.g.,
for a given student, course, semester, and instructor combination), the
avg-grade measure stores the actual course grade of the student. At higher
conceptual levels, avg-grade stores the average grade for the given combination.
a) Draw a snowflake schema diagram for the data warehouse.
b) Starting with the base cuboid [student, course, semester, instructor],
what specific OLAP operations (e.g., roll-up from semester to year) should
one perform in order to list the average grade of CS courses for each Big
University student?
c) If each dimension has five levels (including all), such as “student < major
< status < university < all”, how many cuboids will this cube contain
(including the base and apex cuboids)?
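As a pointer for part (c): for a cube with n dimensions, where dimension i has L_i levels excluding the virtual top level "all", the standard count of cuboids is the product of (L_i + 1) over all dimensions. The sketch below applies this formula. The helper name `count_cuboids` is hypothetical, and it takes each level count with "all" already included, as the question states it.

```python
from math import prod


def count_cuboids(levels_including_all):
    """Total cuboids in a cube with concept hierarchies.

    Each entry is the number of hierarchy levels of one dimension,
    counting the virtual top level 'all'. A cuboid picks one level per
    dimension, so the total is the product of the entries.
    """
    return prod(levels_including_all)


# Four dimensions (student, course, semester, instructor), five levels each.
total = count_cuboids([5, 5, 5, 5])
print(total)  # 625
```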
Unit 2
• Differentiate between data marts and data cubes
• Design a data warehouse multi-tier architecture for a University.
• A data warehouse for a shopping mall can be served by either OLAP or OLTP
operations. Briefly describe the differences between the two, and then analyse
their advantages and disadvantages with regard to one another.
• Suppose that a data warehouse for a company consists of four dimensions
(employee, product, salary, sale) and two measures (count, profit).
Draw a star schema diagram for the data warehouse.
Write the DMQL definition for the same schema.
• Explain OLAP operations with examples.
• Differentiate between star schema and snowflake schema. List any two methods
for data normalization.
Unit 2
• Suppose that a data warehouse for Big University consists of the four
dimensions student, course, semester, and instructor, and two measures
count and avg grade. At the lowest conceptual level (e.g., for a given
student, course, semester, and instructor combination), the avg grade
measure stores the actual course grade of the student. At higher
conceptual levels, avg grade stores the average grade for the given
combination.
(a) Draw a snowflake schema diagram for the data warehouse.
(b) Starting with the base cuboid [student, course, semester, instructor], what
specific OLAP operations (e.g., roll-up from semester to year) should you perform
in order to list the average grade of CS courses for each Big University
student?
Unit 3
• How is a concept hierarchy used in extracting information? Generate
the frequent patterns from the following data set using FP-growth, where
minimum support = 3.
Unit 3
• What is the significance of association rules in data mining? List the types of
association rules with examples.
• Apriori needs to scan the dataset many times, which reduces its efficiency.
Explain some mechanisms to improve its efficiency.
• A database has 4 transactions, shown below: