Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 15

Data Mining and Data

Warehousing
18CS641
Module 1

 Data Warehousing & modelling


Basic Concepts: Data Warehousing: A multitier Architecture, Data
warehouse models: Enterprise warehouse, Data mart and virtual
warehouse, Extraction, Transformation and loading, Data Cube: A
multidimensional data model, Stars, Snowflakes and Fact
constellations: Schemas for multidimensional Data models,
Dimensions: The role of concept Hierarchies, Measures: Their
Categorization and computation, Typical OLAP Operations
Module 2

 Data warehouse implementation & Data mining:


Efficient Data Cube computation: An overview, Indexing OLAP
Data: Bitmap index and join index, Efficient processing of OLAP
Queries, OLAP server Architecture ROLAP versus MOLAP Versus
HOLAP. : Introduction: What is data mining, Challenges, Data
Mining Tasks, Data: Types of Data, Data Quality, Data
Preprocessing, Measures of Similarity and Dissimilarity.
Module 3

 Association Analysis
Association Analysis: Problem Definition, Frequent Item set
Generation, Rule generation. Alternative Methods for Generating
Frequent Item sets, FPGrowth Algorithm, Evaluation of Association
Patterns.
Module 4

 Classification
Decision Trees Induction, Method for Comparing Classifiers, Rule
Based Classifiers, Nearest Neighbor Classifiers, Bayesian
Classifiers.
Module 5

 Clustering Analysis
Overview, K-Means, Agglomerative Hierarchical Clustering,
DBSCAN, Cluster Evaluation, Density-Based Clustering, Graph-
Based Clustering, Scalable Clustering Algorithms.
Couse Outcomes

 The student will be able to


• Identify data mining problems and implement the data warehouse
• Write association rules for a given data pattern.
• Choose between classification and clustering solution
Textbooks

1. Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining,


Pearson, First impression,2014.
2. Jiawei Han, Micheline Kamber, Jian Pei: Data Mining -Concepts and
Techniques, 3rd Edition, Morgan Kaufmann Publisher, 2012.
What is a Data Warehouse?

Definition: “Data Warehouse is a subject-oriented , integrated, time variant and


non-volatile collection of data in support of management’s decision making
process” -William H. Inmon
 It provides architectures and tools for business executives to systematically
organize, understand, and use their data to make strategic decisions.
 It is data repository that is maintained separately from an organization’s
operational databases.
 It allows for integration of a variety of application systems.
 It supports information processing by providing a solid platform of
consolidated historic data for analysis
Key features of Data Warehouse are

 Subject oriented
 Integrated
 Time -variant
 Non- volatile
Subject oriented

 Data warehouse is organized around major subjects such


as customer, supplier, product, and sales rather than day
to day transaction
 It only focusses on modelling and analysis of data for
decision makers (subjects)
 It provides a concise view of particular subject issue which
is useful for decision making process
Integrated

 A data warehouse usually constructed by integrating


multiple heterogeneous such as relational data base, flat
file and online transaction records.
 Data cleaning and data integration techniques are applied
to ensure consistency in naming conventions, encoding
structures, attribute measures
Time – Variant

 Data are stored to provide information from an historic


perspective (e.g., the past 5–10 years).
 Every key structure in the data warehouse contains, either
implicitly or explicitly, a time element
Non-volatile

 A data warehouse is always a physically separate store of


data transformed from the application data found in the
operational environment.
 Due to this separation, a data warehouse does not require
transaction processing, recovery, and concurrency control
mechanisms.
 It usually requires only two operations in data accessing:
initial loading of data and access of data.

You might also like