J 3025-Data Mining and Warehousing

FOURTH SEMESTER BCA DEGREE EXAMINATION
CAREER RELATED UNDER FDP UNDER CBCS
Group 2(b)-COMPUTER APPLICATIONS
Core Course-CP 1444
DATA MINING AND WAREHOUSING
(2018 Admission)
Time: 3 Hours / Total Marks: 80

Answer Key

SECTION A [Very Short Answer type]


(One word to maximum of one sentence. Answer ALL questions)

10 X 1 =10 marks
1) Data mining is the extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge from huge amounts of data.
2) An Online Analytical Processing (OLAP) server is based on the multidimensional data
model. It allows managers and analysts to gain insight into information through
fast, consistent, and interactive access to it.
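3) Clustering is the process of partitioning the data (or objects) into classes (clusters)
such that objects in the same class are similar to one another and dissimilar to
objects in other classes.
4) Classification is the process of finding a model (function) that describes and
distinguishes data classes or concepts.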

5) A decision tree is a structure that includes a root node, branches, and leaf nodes.
Each internal node denotes a test on an attribute, each branch denotes the outcome of
a test, and each leaf node holds a class label.
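The structure described in answer 5 can be sketched in a few lines of Python. This is only an illustration: the class names `Node` and `Leaf`, the attributes `outlook` and `windy`, and the toy tree itself are assumptions made for the example, not part of the answer key.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # a leaf node holds a class label


@dataclass
class Node:
    attribute: str  # an internal node tests one attribute
    # one branch per outcome of the test
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree, record):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label


# Hypothetical toy tree: the root tests "outlook", one internal node tests "windy".
tree = Node("outlook", {
    "sunny": Leaf("no"),
    "overcast": Leaf("yes"),
    "rainy": Node("windy", {"true": Leaf("no"), "false": Leaf("yes")}),
})

print(classify(tree, {"outlook": "rainy", "windy": "false"}))  # yes
```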
6) Prediction is the process of finding unavailable data values, pending trends,
or class labels for some data; that is, forecasting missing numerical values or
increase/decrease trends in time-related data.
7) A data cube is a three-dimensional (3D) or higher range of values, generally
used to explain the time sequence of an image's data.
8) The star schema is the simplest style of data mart schema and the star
schema consists of one or more fact tables referencing any number of dimension
tables.
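A minimal sketch of a star schema, using Python's built-in `sqlite3` module with an in-memory database. The table names (`fact_sales`, `dim_product`, `dim_store`), their columns, and the rows are hypothetical and chosen only to show one fact table referencing two dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
-- The fact table sits at the centre of the star and references each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units      INTEGER
);
INSERT INTO dim_product VALUES (1, 'phone'), (2, 'laptop');
INSERT INTO dim_store   VALUES (1, 'Delhi'), (2, 'Kochi');
INSERT INTO fact_sales  VALUES (1, 1, 10), (1, 2, 5), (2, 1, 4);
""")

# A typical query joins the fact table to a dimension and aggregates a measure.
rows = conn.execute("""
SELECT p.name, SUM(f.units)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
GROUP BY p.name
ORDER BY p.name
""").fetchall()

print(rows)  # [('laptop', 4), ('phone', 15)]
```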
9) Data objects that do not comply with the general behaviour of the data are called
outliers.
10) A frequent pattern is a pattern (a set of items, subsequences, substructures, etc.)
that occurs frequently in a data set.

SECTION B [Short Answer type]


(Not to exceed one paragraph, answer any EIGHT questions each question carries TWO marks)

8X2 =16 marks


11) Data mining functionalities:
• Characterization and discrimination
• The mining of frequent patterns, associations and correlations
• Classification and regression
• Cluster analysis
• Outlier analysis (any four)

12) Applications of data mining: Market Basket Analysis, Manufacturing Engineering,
Fraud Detection, Intrusion Detection, Customer Segmentation and Financial
Banking. (Listing any four applications)
13) Four types of schema are available in the data warehouse:
• Star Schema
• Snowflake Schema
• Galaxy Schema
• Fact Constellation Schema
(Galaxy schema and fact constellation schema are two names for the same design.)
14) Web mining is the application of data mining techniques to extract knowledge from
web data such as web content, web structure, and web usage data.
Text mining exploits the information contained in textual documents in various ways.
15) Parametric methods: assume the data fits some model, estimate the model parameters,
store only the parameters, and discard the data (except possible outliers). Example:
log-linear models. Non-parametric methods: data reduction does not assume a model;
the data are summarized with sample statistics or pictures, such as histograms.
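The contrast in answer 15 can be illustrated with a small sketch. Note the substitution: the answer names log-linear models, whereas the parametric half below uses a simple least-squares line as a stand-in. The function names and the sample numbers are assumptions made for the illustration.

```python
from collections import Counter


def fit_line(xs, ys):
    """Parametric reduction: keep only (slope, intercept) of a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def histogram(values, width):
    """Non-parametric reduction: keep only the count of values per bucket."""
    return Counter((v // width) * width for v in values)


# Five (x, y) pairs reduce to two stored parameters.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0

# Ten prices reduce to three bucket counts; no model is assumed.
prices = [1, 3, 5, 12, 14, 18, 21, 25, 25, 29]
buckets = histogram(prices, 10)
print(dict(buckets))  # {0: 3, 10: 3, 20: 4}
```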
16) Features of a data warehouse:
• Subject-oriented
• Integrated
• Time-variant (historic perspective)
• Nonvolatile
17) Market Basket Analysis is a modelling technique based upon the theory that if you
buy a certain group of items, you are more (or less) likely to buy another group of
items.
(Explanation with an example)
18) Association rule mining is a procedure which aims to observe frequently occurring
patterns, correlations, or associations from datasets found in various kinds of
databases such as relational databases, transactional databases, and other forms of
repositories.
19) In semi-tight coupling, the data mining system is linked with a database or a data
warehouse system and, in addition to that, efficient implementations of a few data
mining primitives are provided by that system. These primitives include
sorting, indexing, aggregation, histogram analysis, multi-way join and
pre-computation of some essential statistical measures such as
sum, count, max, min, standard deviation and so on.
20) The requirements of clustering techniques in data mining are
• Scalability
• Ability to deal with different kinds of attributes
• Discovery of clusters with arbitrary shape
• High dimensionality
• Ability to deal with noisy data
• Interpretability (any four)
21) A concept hierarchy defines a sequence of mappings from a set of low-level
concepts to higher-level, more general concepts. Consider a concept
hierarchy for the dimension location: cities map to states or provinces, which in
turn map to countries. (Explanation with an example diagram)
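A concept hierarchy for location can be sketched as two lookup tables, one per level. The city, state and country values and the sales figures below are hypothetical, and `generalize` and `roll_up` are illustrative helper names.

```python
from collections import defaultdict

# Hypothetical hierarchy for the dimension "location": city -> state -> country.
CITY_TO_STATE = {
    "Vancouver": "British Columbia",
    "Toronto": "Ontario",
    "Chicago": "Illinois",
    "New York": "New York",
}
STATE_TO_COUNTRY = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "Illinois": "USA",
    "New York": "USA",
}


def generalize(city, level):
    """Map a low-level concept (city) to the requested higher level."""
    if level == "city":
        return city
    state = CITY_TO_STATE[city]
    if level == "state":
        return state
    if level == "country":
        return STATE_TO_COUNTRY[state]
    raise ValueError(f"unknown level: {level}")


def roll_up(sales_by_city, level):
    """Aggregate a measure after replacing each city by its ancestor at `level`."""
    totals = defaultdict(int)
    for city, amount in sales_by_city.items():
        totals[generalize(city, level)] += amount
    return dict(totals)


sales = {"Vancouver": 100, "Toronto": 150, "Chicago": 200, "New York": 250}
print(roll_up(sales, "country"))  # {'Canada': 250, 'USA': 450}
```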
22) The key concepts of the Apriori algorithm:
• Frequent item sets
• Apriori property
• Join operation
SECTION C [Short Essay]


(Not to exceed 120 words, answer any SIX questions each question carries FOUR marks)

23) Different Sources of Data that can be mined


1. Files
2. Relational Databases
3. Data Warehouse
4. Transactional Databases
5. Multimedia Databases
6. Spatial Databases
7. Time-series Databases
8. WWW
(With explanation of any four)
24)

Sl.No. | Data Warehouse (OLAP) | Operational Database (OLTP)
1 | It involves historical processing of information. | It involves day-to-day processing.
2 | OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.
3 | It is used to analyze the business. | It is used to run the business.
4 | It focuses on information out. | It focuses on data in.
5 | It is based on Star Schema, Snowflake Schema, and Fact Constellation Schema. | It is based on the Entity Relationship Model.
6 | It is subject oriented. | It is application oriented.
7 | It contains historical data. | It contains current data.

25) The major tasks in data pre-processing are:
• Data cleaning
• Data integration
• Data transformation
• Data reduction
26)

S.No. | Data Warehouse | Data Mart
1 | A data warehouse is a centralised system. | A data mart is a decentralised system.
2 | Light denormalization takes place. | Heavy denormalization takes place.
3 | It follows a top-down model. | It follows a bottom-up model.
4 | It is difficult to build. | It is easy to build.
5 | Fact constellation schema is used. | Star schema and snowflake schema are used.
6 | It is flexible. | It is not flexible.
7 | It is data-oriented in nature. | It is project-oriented in nature.
8 | It has a long life. | It has a shorter life than a warehouse.

(Any four comparisons)

27) A multidimensional model views data in the form of a data-cube. A data cube
enables data to be modelled and viewed in multiple dimensions. It is defined by
dimensions and facts. Explanation with example diagram
28) Market Basket Analysis is a technique which identifies the strength of association
between pairs of products purchased together and identifies patterns of
co-occurrence. A co-occurrence is when two or more things take place together.
(Explanation with an example)
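One way to quantify the strength of association is with the usual support, confidence and lift measures. The sketch below computes them for the rule bread -> milk; the four baskets are invented for the example and the function names are illustrative.

```python
def support(baskets, itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= set(b)) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent | antecedent) from co-occurrence counts."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


def lift(baskets, antecedent, consequent):
    """Above 1: bought together more often than chance; below 1: less often."""
    return confidence(baskets, antecedent, consequent) / support(baskets, consequent)


# Hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]

s = support(baskets, {"bread", "milk"})       # 2 of 4 baskets
c = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
l = lift(baskets, {"bread"}, {"milk"})
print(s, round(c, 3), round(l, 3))  # 0.5 0.667 0.889
```

Here the lift is below 1, so in this toy data a bread buyer is slightly less likely than an average customer to buy milk.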
29) The major issue is preparing the data for classification and prediction. Preparing the data
involves the following activities:
Data cleaning, relevance analysis, and data transformation and reduction. The data can be transformed by:
■ Normalization
■ Generalization
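As an illustration of normalization, the sketch below applies min-max scaling, one common normalization method (the answer key does not say which method is intended). The function name and the income values are assumptions.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min, the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them all to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


incomes = [12000, 73600, 98000]  # hypothetical attribute values
scaled = min_max(incomes)
print([round(v, 3) for v in scaled])  # [0.0, 0.716, 1.0]
```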
30) Hierarchical clustering involves creating clusters that have a predetermined ordering
from top to bottom.
There are two types of hierarchical Clustering, Divisive and Agglomerative (with
explanation)
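The agglomerative (bottom-up) approach can be sketched as follows: start with every object in its own cluster and repeatedly merge the two closest clusters. The sketch assumes one-dimensional points and single-linkage distance (distance between the closest members); both choices, and the sample points, are made only to keep the example short. The divisive approach runs in the opposite direction, starting from one cluster and splitting.

```python
def agglomerative(points, target_clusters=1):
    """Single-linkage agglomerative clustering of 1-D points.

    Returns the final clusters and the list of merges performed, in order.
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


points = [1, 2, 6, 7, 15]  # hypothetical data
clusters, merges = agglomerative(points, target_clusters=2)
print(clusters)  # [[1, 2, 6, 7], [15]]
for left, right, distance in merges:
    print(left, "+", right, "at distance", distance)
```

The recorded merges, read from first to last, give the bottom-to-top ordering that a dendrogram would show.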

31) Data mining techniques:
■ Classification
■ Clustering
■ Regression
■ Association rules
■ Outlier detection
■ Sequential patterns
■ Prediction

Section D [Long Essay]


[Answer any TWO questions each question carries 15 marks] 2X 15 =30 marks
32) Steps in KDD process
o Data cleaning
o Data integration
o Data selection
o Data transformation
o Data mining
o Pattern Evaluation
o Knowledge presentation
(Explanation with diagram)

33) Three-tier architecture:

Bottom tier:

• Warehouse database server
• Back-end tools and utilities are used to feed data into the bottom tier from other
databases
• Contains the metadata repository
• Data are extracted using application program interfaces known as gateways,
e.g. ODBC, OLE DB, JDBC

Middle tier: OLAP server, implemented using

1) Relational OLAP (ROLAP)

2) Multi-dimensional OLAP (MOLAP)

Top tier: front-end client layer, which contains query and reporting tools, analysis
tools and/or data mining tools. (Explanation with diagram)

34) The OLAP operations in the multidimensional data model:

■ Roll-up (drill-up): summarizes data
■ Drill-down (roll-down): the reverse of roll-up
■ Slice and dice
■ Pivot (rotate)
(Explanation with example)
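The operations can be tried on a tiny cube held in a dictionary. This is a sketch under stated assumptions: the dimensions (`quarter`, `item`, `city`), the cell values and the helper names are all hypothetical, and roll-up is shown in its dimension-reduction form.

```python
from collections import defaultdict

DIMS = ("quarter", "item", "city")

# Hypothetical sales cube: (quarter, item, city) -> units sold.
cube = {
    ("Q1", "phone", "Delhi"): 10,
    ("Q1", "phone", "Kochi"): 5,
    ("Q1", "laptop", "Delhi"): 4,
    ("Q2", "phone", "Delhi"): 7,
    ("Q2", "laptop", "Kochi"): 6,
}


def roll_up(cube, drop):
    """Roll-up by dimension reduction: sum the measure over one dimension."""
    idx = DIMS.index(drop)
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:idx] + key[idx + 1:]] += value
    return dict(out)


def slice_(cube, dim, value):
    """Slice: fix a single value on one dimension."""
    idx = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[idx] == value}


def dice(cube, **allowed):
    """Dice: select a sub-cube by restricting two or more dimensions."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }


def pivot(table):
    """Pivot: swap the two axes of a 2-D view."""
    return {(b, a): v for (a, b), v in table.items()}


by_quarter_item = roll_up(cube, "city")
print(by_quarter_item[("Q1", "phone")])            # 15
print(len(slice_(cube, "quarter", "Q1")))          # 3
print(dice(cube, item={"phone"}, city={"Delhi"}))
print(pivot(by_quarter_item)[("phone", "Q1")])     # 15
```

Drill-down is the reverse direction: going from `by_quarter_item` back to the full `cube`, which is possible only because the detailed cells were kept.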

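35) Bayesian classification is based on Bayes' theorem. Bayesian classifiers are
statistical classifiers. They can predict class membership probabilities,
such as the probability that a given tuple belongs to a particular class.

The naive Bayesian classifier predicts that tuple $X$ belongs to class $C_i$ if and only if

$P(C_i \mid X) > P(C_j \mid X)$ for $1 \le j \le m$, $j \ne i$

Classification is to derive the maximum posteriori, i.e., the maximal $P(C_i \mid X)$, where by Bayes' theorem

$P(C_i \mid X) = \dfrac{P(X \mid C_i)\,P(C_i)}{P(X)}$

Since $P(X)$ is the same for every class, only $P(X \mid C_i)\,P(C_i)$ needs to be maximized.
Estimate the probabilities $P(x_1 \mid C_i), P(x_2 \mid C_i), P(x_3 \mid C_i), \ldots, P(x_n \mid C_i)$ from the training
tuples and, assuming the attributes are conditionally independent given the class, evaluate

$P(X \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times P(x_3 \mid C_i) \times \cdots \times P(x_n \mid C_i)$

(Explain the working and predict the label using the naive Bayesian classifier with an example.)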
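A small worked example of the naive Bayesian classifier, written as a Python sketch. The eight training tuples (attributes `outlook` and `windy`, class `play`) are invented for the illustration. Probabilities are estimated as plain relative frequencies with no smoothing, and the common factor P(X) is left out because it does not affect which class scores highest.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def train(rows):
    """rows: list of (attribute tuple, label). Count classes and per-class values."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (label, attribute index) -> value counts
    for attrs, label in rows:
        for k, value in enumerate(attrs):
            value_counts[(label, k)][value] += 1
    return class_counts, value_counts


def scores(model, x):
    """Return P(X|Ci) * P(Ci) for every class Ci."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    result = {}
    for label, n in class_counts.items():
        score = Fraction(n, total)                                 # P(Ci)
        for k, value in enumerate(x):
            score *= Fraction(value_counts[(label, k)][value], n)  # P(xk|Ci)
        result[label] = score
    return result


def predict(model, x):
    s = scores(model, x)
    return max(s, key=s.get)


# Hypothetical training tuples: ((outlook, windy), play).
rows = [
    (("sunny", "false"), "no"),
    (("sunny", "true"), "no"),
    (("overcast", "false"), "yes"),
    (("rainy", "false"), "yes"),
    (("rainy", "true"), "no"),
    (("overcast", "true"), "yes"),
    (("sunny", "false"), "yes"),
    (("rainy", "false"), "yes"),
]
model = train(rows)

x = ("sunny", "true")
print(scores(model, x))   # yes: 5/8 * 1/5 * 1/5 = 1/40, no: 3/8 * 2/3 * 2/3 = 1/6
print(predict(model, x))  # no
```

Because 1/6 is greater than 1/40, the tuple (sunny, true) is assigned the class "no".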