J 3025-Data Mining and Warehousing
Answer Key
10 × 1 = 10 marks
1) Data mining is the extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge from huge amounts of data.
2) An Online Analytical Processing (OLAP) server is based on the multidimensional data
model. It allows managers and analysts to gain insight into information through
fast, consistent, and interactive access to it.
3) Clustering is the process of partitioning the data (or objects) into classes, so that
objects in the same class are more similar to each other than to objects in other classes.
5) A decision tree is a structure that includes a root node, branches, and leaf nodes.
Each internal node denotes a test on an attribute, each branch denotes the outcome of
a test, and each leaf node holds a class label.
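The structure described in answer 5 can be sketched in a few lines. This is only an illustration: the tree, the attribute names (`age`, `student`), and the class labels are hypothetical and are not taken from the answer key.

```python
# Hypothetical tree: internal nodes are dicts (a test on one attribute),
# branches are the possible outcomes, and leaves are class-label strings.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"yes": "buys", "no": "does not buy"},
        },
        "middle_aged": "buys",
        "senior": "does not buy",
    },
}

def classify(node, record):
    """Follow one branch per test until a leaf (class label) is reached."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node

print(classify(tree, {"age": "youth", "student": "yes"}))  # buys
```

The root node tests `age`; only the `youth` branch leads to a second test, and every path ends in a leaf holding a class label.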
6) Prediction is the process of finding unavailable data values, pending trends,
or class labels for some data, or of forecasting missing numerical values or
increase/decrease trends in time-related data.
7) A data cube is a three-dimensional (3D) (or higher) range of values that is
generally used to explain the time sequence of an image’s data.
8) The star schema is the simplest style of data mart schema. It consists of
one or more fact tables referencing any number of dimension tables.
9) Data objects that do not comply with the general behaviour of the data are called
outliers.
10) A frequent pattern is a pattern (a set of items, subsequences, substructures, etc.)
that occurs frequently in a data set.
15) Parametric methods: Assume the data fits some model, estimate the model parameters,
store only the parameters, and discard the data (except possible outliers). Example:
log-linear models. Non-parametric data reduction does not use a model. It
summarizes data with sample statistics or pictures.
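The contrast in answer 15 can be illustrated with a small sketch. It swaps in a simple least-squares line as the parametric model (not the log-linear model the answer names) and an equal-width histogram as the non-parametric summary. The sample data and both function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample: (x, y) pairs that roughly follow y = 2x + 1.
points = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.0)]

def fit_line(data):
    """Parametric: least-squares fit of y = slope * x + intercept."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def equal_width_histogram(values, bins):
    """Non-parametric: count values per equal-width bucket, no model assumed."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = Counter(min(int((v - low) / width), bins - 1) for v in values)
    return [counts.get(b, 0) for b in range(bins)]

# Parametric reduction keeps only two numbers in place of the data.
slope, intercept = fit_line(points)
# Non-parametric reduction keeps bucket counts in place of the data.
histogram = equal_width_histogram([y for _, y in points], bins=2)
print(round(slope, 2), round(intercept, 2), histogram)  # 1.97 1.09 [2, 3]
```

In the parametric case the five points are replaced by a slope and an intercept; in the non-parametric case they are replaced by two bucket counts.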
16) Features of a data warehouse
• Subject-oriented
• Integrated
• Time-variant (historic perspective)
• Non-volatile
17) Market Basket Analysis is a modelling technique based upon the theory that if you
buy a certain group of items, you are more (or less) likely to buy another group of
items.
Explanation with an example.
18) Association rule mining is a procedure which aims to observe frequently occurring
patterns, correlations, or associations from datasets found in various kinds of
databases such as relational databases, transactional databases, and other forms of
repositories.
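Association rules are commonly scored with support and confidence, two measures the answer above does not name explicitly. The sketch below computes both for a rule over a hypothetical set of transactions; the item names and function names are illustrative assumptions.

```python
# Hypothetical transactional database: one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

print(support({"bread", "milk"}, transactions))                 # 0.5
print(round(confidence({"bread"}, {"milk"}, transactions), 3))  # 0.667
```

Here the rule bread -> milk holds in 2 of the 3 transactions that contain bread, so its confidence is about 0.667.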
19) The data mining system is linked with a database or data warehouse system and, in
addition, efficient implementations of a few data mining primitives are provided, including
sorting, indexing, aggregation, histogram analysis, multi-way join, and
pre-computation of some essential statistical measures such as
sum, count, max, min, standard deviation, and so on.
20) The requirements of clustering techniques in data mining are
• Scalability
• Ability to deal with different kinds of attributes
• Discovery of clusters with arbitrary shape
• High dimensionality
• Ability to deal with noisy data
• Interpretability (any four)
21) A concept hierarchy defines a sequence of mappings from a set of low-level
concepts to higher-level, more general concepts. Consider a concept
hierarchy for the dimension location. Explanation with an example diagram.
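A concept hierarchy for location can be represented as a mapping from a low-level concept to a higher-level one. The sketch below assumes a two-level city -> country hierarchy and made-up sales figures, and uses the mapping to roll a measure up one level.

```python
from collections import defaultdict

# Hypothetical hierarchy for the location dimension: city -> country.
city_to_country = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Chicago": "USA",
    "New York": "USA",
}

# Hypothetical measure recorded at the low (city) level.
sales_by_city = {"Vancouver": 120, "Toronto": 80, "Chicago": 200, "New York": 150}

def roll_up(measures, mapping):
    """Aggregate low-level values into their higher-level concepts."""
    totals = defaultdict(int)
    for low_level, value in measures.items():
        totals[mapping[low_level]] += value
    return dict(totals)

print(roll_up(sales_by_city, city_to_country))  # {'Canada': 200, 'USA': 350}
```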
22) The key concept of apriori algorithm:
• Frequent item sets
• Apriori property
• Join operation
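The three concepts above fit together as in the following minimal sketch. It is an illustration under stated assumptions, not an optimized implementation: the function name `apriori`, the use of an absolute support count, and the sample transactions are all hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {frozenset: count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Frequent 1-itemsets.
    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join operation: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Apriori property: every (k-1)-subset of a frequent itemset is frequent,
        # so candidates with an infrequent subset are pruned before counting.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = {c: n for c, n in count(candidates).items()
                   if n >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical transactions over items A, B, C.
data = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
result = apriori(data, min_support_count=3)
print(len(result))  # 6: three single items and three pairs
```

With a minimum support count of 3, each single item and each pair is frequent, while {A, B, C} appears in only two transactions and is dropped.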
5. Multimedia Databases
6. Spatial Database
7. Time-series Databases
8. WWW
(With explanation of any four)
24)
27) A multidimensional model views data in the form of a data cube. A data cube
enables data to be modelled and viewed in multiple dimensions. It is defined by
dimensions and facts. Explanation with an example diagram.
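As a rough illustration of dimensions and facts, the sketch below stores a few fact rows with three assumed dimensions (time, item, location) and one numeric measure, then aggregates the measure over any chosen subset of dimensions. All names and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (time, item, location, units_sold).
DIMENSIONS = ("time", "item", "location")
facts = [
    ("Q1", "phone", "Delhi", 10),
    ("Q1", "laptop", "Delhi", 4),
    ("Q2", "phone", "Mumbai", 7),
    ("Q2", "phone", "Delhi", 5),
]

def aggregate(facts, group_by):
    """Sum the measure for each combination of the chosen dimensions."""
    positions = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[p] for p in positions)
        totals[key] += row[-1]
    return dict(totals)

print(aggregate(facts, ("item",)))
# {('phone',): 22, ('laptop',): 4}
print(aggregate(facts, ("time", "location")))
# {('Q1', 'Delhi'): 14, ('Q2', 'Mumbai'): 7, ('Q2', 'Delhi'): 5}
```

Each choice of `group_by` corresponds to one view of the cube; grouping by no dimension at all gives the grand total.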
28) Market Basket Analysis is a technique which identifies the strength of association
between pairs of products purchased together and identifies patterns of
co-occurrence. A co-occurrence is when two or more things take place together.
Explanation with an example.
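Counting how often each pair of products appears in the same basket is one simple way to see co-occurrence. The baskets below are hypothetical, and raw pair counts are only a starting point for measuring strength of association.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: one set of products per purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]

pair_counts = Counter()
for basket in baskets:
    # Sorting gives each pair one canonical order, e.g. ("bread", "milk").
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts[("bread", "milk")])   # 2
print(pair_counts[("butter", "milk")])  # 1
```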
29) The major issue is preparing the data for Classification and Prediction. Preparing the data
involves the following activities -
Data Cleaning, Relevance Analysis, Data Transformation and reduction -
■ Normalization
■ Generalization.
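Normalization can be illustrated with min-max scaling, one common choice among several. The function name and the income values are assumptions made for this sketch.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    if span == 0:
        # All values are identical, so map them all to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - old_min) / span * (new_max - new_min) for v in values]

incomes = [12000, 73600, 98000]  # hypothetical attribute values
normalized = min_max_normalize(incomes)
print([round(v, 3) for v in normalized])  # [0.0, 0.716, 1.0]
```

Scaling attributes to a common range keeps attributes with large values from dominating distance-based methods.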
30) Hierarchical clustering involves creating clusters that have a predetermined ordering
from top to bottom.
There are two types of hierarchical clustering: divisive and agglomerative (with
explanation).
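The agglomerative (bottom-up) variant can be sketched as follows. The sketch assumes one-dimensional points and single-link distance (the closest pair of members); both choices, along with the function name and data, are illustrative assumptions.

```python
def agglomerative(points, num_clusters):
    """Single-link agglomerative clustering of 1-D points."""
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        best = None
        # Find the two clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair and continue one level up the hierarchy.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

print(agglomerative([1, 2, 9, 10, 25], 3))  # [[1, 2], [9, 10], [25]]
print(agglomerative([1, 2, 9, 10, 25], 2))  # [[1, 2, 9, 10], [25]]
```

A divisive method works in the opposite direction: it starts from one cluster containing all points and splits it repeatedly.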
Bottom tier
1) Relational OLAP (ROLAP)
2) Multi-dimensional OLAP (MOLAP)
Top tier: front-end client layer which contains query and reporting tools, analysis
tools, and/or data mining tools. Explanation with diagram.
35) Bayesian classification is based on Bayes' Theorem. Bayesian classifiers are
statistical classifiers. They can predict class membership probabilities,
such as the probability that a given tuple belongs to a particular class.
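The naive Bayesian classifier predicts that tuple X belongs to class Ci if and only if
Ci has the highest posterior probability P(Ci|X), which by Bayes' theorem is

    P(Ci|X) = P(X|Ci) P(Ci) / P(X)

Explanation of the working, and prediction of the label using the naive Bayesian
classifier, with an example.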