DM Lect4
Data Warehousing
Lec 4
Relational Database Theory
• In the relational database modeling process, known as
normalization, relations (tables) are progressively
decomposed into smaller relations, to the point where all
attributes in a relation are very tightly coupled with the
primary key of the relation.
• The process of normalization generally breaks a
table into many independent tables.
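The decomposition described above can be illustrated with a minimal Python sketch. The rows, the column names, and the decompose helper are hypothetical and chosen only for illustration; the point is that repeated customer details move into their own relation, keyed by customer_id.

```python
# Hypothetical flat rows: customer details are repeated on every order.
orders_flat = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ada", "city": "Lahore", "amount": 250},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ada", "city": "Lahore", "amount": 100},
    {"order_id": 3, "customer_id": 11, "customer_name": "Bilal", "city": "Karachi", "amount": 75},
]


def decompose(rows):
    """Split flat rows into a customers relation and an orders relation."""
    customers = {}  # keyed by customer_id, so each customer is stored once
    orders = []     # each order keeps only a reference to its customer
    for row in rows:
        customers[row["customer_id"]] = {
            "customer_name": row["customer_name"],
            "city": row["city"],
        }
        orders.append({
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "amount": row["amount"],
        })
    return customers, orders


customers, orders = decompose(orders_flat)
print(len(customers), "customers,", len(orders), "orders")
```

After the split, every attribute in each relation depends on that relation's own key, which is the tight coupling the slide refers to. Reassembling a full view requires joining the two relations again.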
Different interfaces
Different data representations
Duplicate and inconsistent information
The Need for Data Warehouses
• Two major factors drive the need for data
warehousing in most organizations today:
• Business requires an integrated company-wide view of
high-quality information.
• The IS department must separate informational from
operational systems in order to dramatically improve
performance in managing company data.
Need for a Company Wide View
• Data in operational systems is typically fragmented
and of poor quality.
• It is generally distributed across a variety of incompatible
hardware and software platforms, for example:
• Unix running the Oracle DBMS
• IBM MVS running the DB2 DBMS
• It is often necessary to provide a single, corporate
view of that information for decision making.
Need to Separate Operational and
Informational Systems
• Operational systems are used to run a business in real
time, based on current data.
• E.g. sales order processing, reservation systems, patient
registration.
• They process large volumes of relatively simple read/write
transactions while providing fast response.
• Informational systems are designed to support decision
making based on historical data.
• They are designed for complex, read-only queries or data
mining applications.
• E.g. sales trend analysis, customer segmentation, and human
resource planning.
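The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. The sales table, its columns, and the sample rows are assumptions made for illustration. In practice the two workloads would run on separate systems, which is the point of the separation; here they share one in-memory database only to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
    [("North", 2022, 100.0), ("South", 2022, 50.0), ("North", 2023, 200.0)],
)

# Operational style: a small read/write transaction on current data.
cur = conn.execute(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
    ("North", 2024, 120.0),
)
conn.commit()
new_order = conn.execute(
    "SELECT region, amount FROM sales WHERE order_id = ?",
    (cur.lastrowid,),
).fetchone()

# Informational style: a read-only aggregate over historical data.
trend = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()

print(new_order)
print(trend)
```

The first query touches a single row by key and must return quickly. The second scans the whole history, which is the kind of load that slows an operational system if both run on the same machine.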
Goal: Unified Access to Data
• A data warehouse is based on a multidimensional data model,
which views data in the form of a data cube.
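As a rough illustration of the data cube idea, the sketch below stores a few hypothetical sales facts with three dimensions (year, region, product) and one measure (units sold). The roll_up helper is an assumed name, not a standard API; it aggregates the measure over whichever dimensions are left out.

```python
from collections import defaultdict

# Hypothetical facts: (year, region, product, units_sold).
DIMENSIONS = ("year", "region", "product")
facts = [
    ("2023", "North", "Laptop", 5),
    ("2023", "South", "Laptop", 3),
    ("2024", "North", "Phone", 7),
    ("2024", "North", "Laptop", 2),
]


def roll_up(facts, keep):
    """Sum units over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for *coords, units in facts:
        key = tuple(coords[i] for i in positions)
        totals[key] += units
    return dict(totals)


by_year = roll_up(facts, ("year",))
by_region_product = roll_up(facts, ("region", "product"))
grand_total = roll_up(facts, ())

print(by_year)
print(by_region_product)
print(grand_total)
```

Each choice of kept dimensions corresponds to one view (one cuboid) of the cube: keeping all three gives the base facts, and keeping none gives the grand total.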
The Data Warehouse
• Key characteristics
• Subject-oriented
• Integrated
• Time-variant
• Nonvolatile
Subject Oriented
• Data is stored by business subject rather than by
application