Data Mining Doubt Clearing Session Questions


Doubt Clearing Session

Tejaswini Bhosale
Unit 1
• What kind of data preprocessing is needed before applying a data mining
algorithm to a data set? Explain a method to handle noisy data, with an
example.
• In the given dataset, identify noisy data, missing values, inconsistencies, and outliers,
and address these problems with different preprocessing methods.
• How is a data warehouse different from a database? Explain with an example.
• Describe the steps involved in data mining when viewed as a process of
KDD.
• What is data mining? How is a data warehouse different from a database?
• Explain the steps involved in handling redundancy in data integration.
• "Data mining is a part of KDD." Do you agree or disagree? Justify. Explain the
different stages in KDD.
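One common answer to the noisy-data question above is smoothing by binning: sort the values, split them into bins, and replace each value with a representative of its bin. The following is a minimal Python sketch, assuming equal-frequency (equal-depth) bins of three values; the price list and the helper names (equal_frequency_bins, smooth_by_means, smooth_by_boundaries) are hypothetical and not taken from any dataset in this session.

```python
def equal_frequency_bins(values, depth):
    """Sort the values and split them into consecutive bins of `depth` values each."""
    data = sorted(values)
    return [data[i:i + depth] for i in range(0, len(data), depth)]


def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of its bin's min and max (ties go to the min)."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins come from sorted data, so these are the boundaries
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


# Hypothetical price values, for illustration only
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, depth=3)
print(smooth_by_means(bins))
print(smooth_by_boundaries(bins))
```

With these values the bins are [4, 8, 15], [21, 21, 24] and [25, 28, 34]; smoothing by means replaces them with 9, 22 and 29, and smoothing by boundaries gives [4, 4, 15], [21, 21, 24] and [25, 25, 34].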
Unit 1
• Explain the data mining techniques.
• Explain the architecture and implementation of a data
warehouse with an example.
• Differentiate between a data warehouse and data mining. Explain the
stages of knowledge discovery in databases with an example.
• What is KDD? Explain data mining as a step in the process of
knowledge discovery.
Unit 1
• List and discuss data mining functionalities. Explain any two with an
example.
• What are data objects? Explain data attributes and the different types of data
attributes.
• Why preprocess the data? Explain the different tasks of data preprocessing.
• Describe any three methods to normalize a group of data.
• Explain the architecture and implementation of a data warehouse with an
example.
• Explain the applications of data warehousing and data mining.
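For the question on normalization methods, three standard choices are min-max normalization, z-score normalization and normalization by decimal scaling. The sketch below implements all three in plain Python, assuming a hypothetical list of values and the population standard deviation for the z-score; min_max as written also assumes the values are not all equal.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map [min, max] onto [new_min, new_max] (assumes not all values are equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer with max(|v|) / 10**j < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


values = [200, 300, 400, 600, 1000]  # hypothetical sample values
print(min_max(values))
print(z_score(values))
print(decimal_scaling(values))
```

For this sample, min-max maps the values to [0, 0.125, 0.25, 0.5, 1], and decimal scaling divides by 10^4, because j = 4 is the smallest j for which 1000 / 10^j < 1.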

Unit 2
• Suppose that a data warehouse for Big University consists of the following
four dimensions: student, course, semester, and instructor, and two
measures: count and avg-grade. At the lowest conceptual level (e.g.,
for a given student, course, semester, and instructor combination), the
avg-grade measure stores the actual course grade of the student. At higher
conceptual levels, avg-grade stores the average grade for the given combination.
a) Draw a snowflake schema diagram for the data warehouse.
b) Starting with the base cuboid [student, course, semester, instructor],
what specific OLAP operations (e.g., roll-up from semester to year) should
one perform in order to list the average grade of CS courses for each Big
University student?
c) If each dimension has five levels (including all), such as “student < major
< status < university < all”, how many cuboids will this cube contain
(including the base and apex cuboids)?
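For part (c), the usual textbook formula for a cube with concept hierarchies is T = (L_1 + 1) x (L_2 + 1) x ... x (L_n + 1), where L_i is the number of levels of dimension i not counting the virtual top level "all". Here each of the 4 dimensions has 5 levels including "all", so each contributes 5 choices and T = 5^4 = 625. A tiny Python check of that arithmetic (the function name cuboid_count is hypothetical):

```python
from math import prod


def cuboid_count(levels_per_dimension):
    """Total cuboids of a data cube with concept hierarchies.

    levels_per_dimension[i] counts the levels of dimension i WITHOUT the
    virtual top level "all"; the +1 below adds "all" back for each dimension.
    """
    return prod(levels + 1 for levels in levels_per_dimension)


# Big University: 4 dimensions, 5 levels each including "all" -> 4 real levels each
print(cuboid_count([4, 4, 4, 4]))
```

With no hierarchies (one level per dimension) the same formula reduces to the familiar 2^n cuboids.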
Unit 2
• Differentiate between data marts and data cubes
• Design a multi-tier data warehouse architecture for a university.
• Operations on a data warehouse for a shopping mall can be performed using either OLAP or OLTP.
Briefly describe the differences between the two, and then analyse their
advantages and disadvantages with regard to one another.
• Suppose that a data warehouse for a company consists of four dimensions
(employee, product, salary, sale) and two measures (count, profit).
Draw a star schema diagram for the data warehouse.
Write the DMQL definition for the same schema.
• Explain OLAP operations with an example.
• Differentiate between star schema and snowflake schema. List any two methods
for data normalization.
Unit 2
• Suppose that a data warehouse for Big University consists of the four
dimensions student, course, semester, and instructor, and two measures
count and avg grade. At the lowest conceptual level (e.g., for a given
student, course, semester, and instructor combination), the avg grade
measure stores the actual course grade of the student. At higher
conceptual levels, avg grade stores the average grade for the given
combination.
(a) Draw a snowflake schema diagram for the data warehouse.
(b) Starting with the base cuboid [student, course, semester, instructor], what
specific OLAP operations (e.g., roll-up from semester to year) should you perform
in order to list the average grade of CS courses for each Big University
student?
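One possible sequence for part (b): roll up course from course to department, roll up semester and instructor to "all", keep student at its base level, and slice on department = "CS". The sketch below applies those operations to a handful of hypothetical base-cuboid cells (the records, the department attribute and the function name are all illustrative assumptions). Note that avg-grade is not additive, so cells must be combined weighted by count.

```python
from collections import defaultdict

# Hypothetical base-cuboid cells:
# (student, course, department, semester, instructor, count, avg_grade).
# At the base level count == 1 and avg_grade is the student's actual course grade.
base_cells = [
    ("s1", "CS101", "CS", "Fall-2023", "i1", 1, 4.0),
    ("s1", "CS201", "CS", "Spring-2024", "i2", 1, 3.0),
    ("s1", "MA101", "Math", "Fall-2023", "i3", 1, 2.0),
    ("s2", "CS101", "CS", "Fall-2023", "i1", 1, 3.0),
    ("s2", "PH101", "Physics", "Spring-2024", "i4", 1, 2.5),
]


def avg_cs_grade_per_student(cells):
    """Roll up course -> department and semester, instructor -> all,
    then slice on department == "CS"; the result has one cell per student."""
    totals = defaultdict(lambda: [0, 0.0])  # student -> [total count, count-weighted grade sum]
    for student, _course, department, _semester, _instructor, count, avg_grade in cells:
        if department != "CS":  # slice
            continue
        totals[student][0] += count
        # avg_grade is not additive, so weight each cell by its count
        totals[student][1] += count * avg_grade
    return {student: grade_sum / n for student, (n, grade_sum) in totals.items()}


print(avg_cs_grade_per_student(base_cells))
```

Filtering before or after aggregating gives the same result here, because the slice condition is on the department level that the course dimension is rolled up to.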
Unit 3
• How is a concept hierarchy used in extracting information? Generate
the frequent patterns from the following data set using FP-growth, where
minimum support = 3.
Unit 3
• What is the significance of association rules in data mining? List the types of
association rules with examples.
• Apriori needs to scan the dataset many times, which reduces its efficiency.
Explain some mechanisms to improve its efficiency.
• A database has 4 transactions, shown below:

Assuming a minimum support min_sup = 60% and a minimum confidence
min_conf = 80%, find all frequent itemsets using the Apriori algorithm.
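The transaction table for this question did not survive, so the sketch below runs the Apriori join-and-prune loop on four hypothetical transactions (not the original data) with the same thresholds, min_sup = 60% and min_conf = 80%, and then derives strong rules from the frequent itemsets. Function names are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frequent itemset: support} using level-wise candidate generation."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        # one full scan of the data per candidate: the cost that Apriori optimisations target
        return sum(1 for t in sets if itemset <= t) / n

    items = sorted({i for t in sets for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_sup]
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # join step: unions of two frequent (k-1)-itemsets that form a k-itemset
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_sup]
    return frequent


def strong_rules(frequent, min_conf):
    """Return (antecedent, consequent, support, confidence) for rules meeting min_conf."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                antecedent = frozenset(lhs)
                conf = sup / frequent[antecedent]  # subsets of frequent sets are frequent
                if conf >= min_conf:
                    rules.append((set(antecedent), set(itemset - antecedent), sup, conf))
    return rules


# Hypothetical transactions, NOT the table from the original question
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
]
frequent = apriori(transactions, min_sup=0.6)
rules = strong_rules(frequent, min_conf=0.8)
print(frequent)
print(rules)
```

On this toy data the frequent itemsets are {bread} (100%), {milk} (75%) and {bread, milk} (75%); only milk -> bread (confidence 100%) is strong, while bread -> milk (75%) falls below min_conf.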
Unit 4
• Refer to the TA2 paper.
Unit 5
• Define clustering. Explain the partitioning and hierarchical clustering methods
with examples.
• Write the algorithm for K-means clustering. Compare it with the k-nearest neighbor
(k-NN) algorithm.
• Apply the K-means algorithm (K = 2) to the data (185, 72), (170, 56), (168, 60), (179, 68),
(182, 72), (188, 77) for up to two iterations and show the clusters. Choose the first two
objects as the initial centroids.
• Which approach is better for clustering, hierarchical or partitioning? Justify. List some
drawbacks of K-means.
• Explain the K-medoids algorithm.
• Explain why K-means is sensitive to outliers and how K-medoids minimizes this issue.
• What is the purpose of cluster analysis in data mining? Explain.
• Illustrate the strengths and weaknesses of K-means in comparison with the K-medoids algorithm.
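The K-means exercise above can be checked with a short script. This is a minimal sketch assuming Euclidean distance and the batch update (assign every point first, then recompute each centroid as the mean of its cluster); if a course solution updates the centroid after every single assignment, its intermediate centroids may differ.

```python
from math import dist


def kmeans(points, centroids, iterations):
    """Batch K-means: assign all points to the nearest centroid, then recompute centroids."""
    clusters = []
    for it in range(1, iterations + 1):
        clusters = [[] for _ in centroids]
        for p in points:
            # nearest centroid by Euclidean distance (ties go to the lower index)
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # each new centroid is the mean of its cluster; an empty cluster keeps its old centroid
        centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        print(f"Iteration {it}: clusters={clusters}, centroids={centroids}")
    return clusters, centroids


points = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
clusters, centroids = kmeans(points, centroids=[points[0], points[1]], iterations=2)
```

Both iterations give C1 = {(185, 72), (179, 68), (182, 72), (188, 77)} and C2 = {(170, 56), (168, 60)}, with centroids (183.5, 72.25) and (169, 58); since the second iteration changes nothing, the algorithm has converged.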
Unit 5
• Explain the K-means and K-medoids algorithms with an example.
• What do you mean by clustering? Explain the K-means and K-medoids
algorithms with an example.
• What is the objective of the K-means algorithm?
• Write the agglomerative hierarchical clustering algorithm. Explain the single-link
and complete-link techniques.
• Differentiate between classification and clustering.
• Explain different data types used in clustering
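To contrast single link and complete link, the sketch below runs a naive agglomerative clustering on the same six points used in the K-means exercise, merging the closest pair of clusters until two remain. It assumes Euclidean distance; the function names are illustrative, and the O(n^3) pair search is only meant for small classroom examples.

```python
from itertools import combinations
from math import dist


def linkage_distance(c1, c2, linkage):
    """Single link = distance of the closest pair of points; complete link = farthest pair."""
    pairwise = [dist(p, q) for p in c1 for q in c2]
    return min(pairwise) if linkage == "single" else max(pairwise)


def agglomerative(points, k, linkage="single"):
    """Repeatedly merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    merge_heights = []                # distance at which each merge happened
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        merge_heights.append(linkage_distance(clusters[i], clusters[j], linkage))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so position i is unaffected
    return clusters, merge_heights


points = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
for method in ("single", "complete"):
    result, heights = agglomerative(points, k=2, linkage=method)
    print(method, result, [round(h, 2) for h in heights])
```

For this data both linkages end with the same two clusters as K-means, but the merge heights differ: single link merges at 3, 4.47, 5 and 5.83, while complete link merges at 3, 4.47, 7.21 and 12.73, because it measures the farthest pair between clusters.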