Lecture#5: Components of Data Warehouse
[Figure: network-based, hierarchical, object-oriented, and file-based applications supply records, objects, and text to the data warehouse. Some standard data format is required. The warehouse holds subject-oriented data from multiple applications.]
[Figure: a bank's online deposit/withdraw system, ATM card deposit/withdraw system, and credit card processing system all feed the account of a customer: subject-oriented data coming from multiple applications.]
Extraction
Data coming from different sources will be in different
formats.
Data from sources may be in:
A relational data model.
A network or hierarchical data model.
Flat files.
Tools for data extraction:
Third party tools
In house tools
For data extraction, the data warehouse development team
may establish an environment in which data is
extracted from each data source into a common data
repository.
This data repository may be based on:
Flat file.
Relational database system.
Combination of both.
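The extraction step above can be sketched as follows. This is a minimal illustration, not a production tool: it assumes one relational source and one flat-file source, and it uses an in-memory SQLite database as the common data repository. The table names `customers` and `staging_customers`, the column names, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3


def extract_from_relational(conn):
    """Read rows from a relational source (hypothetical table `customers`)."""
    rows = conn.execute("SELECT id, name, city FROM customers").fetchall()
    return [
        {"source": "relational", "id": str(r[0]), "name": r[1], "city": r[2]}
        for r in rows
    ]


def extract_from_flat_file(text_stream):
    """Read rows from a comma-separated flat file that has a header line."""
    reader = csv.DictReader(text_stream)
    return [
        {"source": "flat_file", "id": row["id"], "name": row["name"], "city": row["city"]}
        for row in reader
    ]


def load_into_repository(repo, records):
    """Store extracted records in the common repository in one shared layout."""
    repo.executemany(
        "INSERT INTO staging_customers (source, id, name, city) "
        "VALUES (:source, :id, :name, :city)",
        records,
    )


# Hypothetical relational source with one made-up row.
source_db = sqlite3.connect(":memory:")
source_db.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
source_db.execute("INSERT INTO customers VALUES (1, 'Ali', 'Lahore')")

# Hypothetical flat-file source with one made-up row.
flat_file = io.StringIO("id,name,city\n7,Sara,Karachi\n")

# Common data repository (relational in this sketch).
repository = sqlite3.connect(":memory:")
repository.execute(
    "CREATE TABLE staging_customers (source TEXT, id TEXT, name TEXT, city TEXT)"
)
load_into_repository(repository, extract_from_relational(source_db))
load_into_repository(repository, extract_from_flat_file(flat_file))

staged = repository.execute(
    "SELECT source, id, name, city FROM staging_customers ORDER BY source"
).fetchall()
```

Keeping a `source` column in the repository is a design choice of this sketch: later transformation steps can then apply source-specific rules to each row.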
[Figure: Common data repository. Relational (relation/table), network-based (record), hierarchical (record), object-oriented (object), and file-based (text) applications feed a common data repository (relational, file based, or both), which in turn feeds the data warehouse.]
Transformation
Clean the data extracted from each source.
Correction of spelling errors.
Resolution of conflicts between postal codes, NIC
numbers, etc.
Providing default values for missing data elements.
Standardization of data elements.
Standardize data types and lengths for the same elements
extracted from different data sources.
Semantic Standardization: resolving synonyms and
homonyms.
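Two of the cleaning tasks above, default values for missing elements and standardized types and lengths, might look like the sketch below. The field names, default values, and length limits are assumptions chosen for illustration. Truncating an overlong value is a deliberate simplification; a real pipeline would more likely flag such rows for review.

```python
def clean_record(record, defaults, max_lengths):
    """Return a copy of `record` with defaults filled in and fields standardized.

    Every field is converted to a stripped string and cut to its maximum length.
    """
    cleaned = {}
    for field, default in defaults.items():
        value = record.get(field)
        # Treat None and blank strings as missing data elements.
        if value is None or str(value).strip() == "":
            value = default
        # Standardize the data type (string) and remove stray whitespace.
        value = str(value).strip()
        # Standardize the length where a limit is defined.
        limit = max_lengths.get(field)
        if limit is not None:
            value = value[:limit]
        cleaned[field] = value
    return cleaned


# Hypothetical standards for the warehouse.
DEFAULTS = {"name": "UNKNOWN", "city": "UNKNOWN", "postal_code": "00000"}
MAX_LENGTHS = {"name": 30, "city": 20, "postal_code": 5}

# Made-up rows: source S1 stores the postal code as an integer,
# source S2 has a missing city and an overlong postal code.
from_s1 = {"name": "  Ali ", "city": "Lahore", "postal_code": 54000}
from_s2 = {"name": "Sara", "city": "", "postal_code": "75500-extra"}

clean_s1 = clean_record(from_s1, DEFAULTS, MAX_LENGTHS)
clean_s2 = clean_record(from_s2, DEFAULTS, MAX_LENGTHS)
```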
Resolving synonyms:
Two or more terms from different sources mean the same
thing.
For example: "reluctance" in source S1 and "unwillingness"
in source S2 (both terms mean a lack of enthusiasm).
Resolving homonyms:
A single term means different things in different
source systems.
For example: "bank" in source S1 means a river bank, while in
S2 it means a place where people manage their money.
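One way to sketch semantic standardization is with two lookup tables: one that maps synonyms to a single warehouse term, and one that resolves homonyms by looking at the source system as well as the term. The table contents reuse the examples above; the standardized names `river_bank` and `financial_institution` are hypothetical.

```python
# Synonyms: different source terms map to one agreed warehouse term.
SYNONYMS = {
    "reluctance": "unwillingness",
    "unwillingness": "unwillingness",
}

# Homonyms: the same term maps to different warehouse terms per source.
HOMONYMS = {
    ("S1", "bank"): "river_bank",
    ("S2", "bank"): "financial_institution",
}


def standardize_term(source, term):
    """Return the warehouse term for `term` as used in `source`."""
    key = term.strip().lower()
    # Homonyms need the source to disambiguate, so check them first.
    if (source, key) in HOMONYMS:
        return HOMONYMS[(source, key)]
    # Otherwise collapse synonyms; unknown terms pass through unchanged.
    return SYNONYMS.get(key, key)
```

For example, `standardize_term("S1", "Reluctance")` and `standardize_term("S2", "unwillingness")` both return `"unwillingness"`, while `"bank"` resolves differently for S1 and S2.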
Combining pieces of data from multiple sources.
Purging of data: removal of data that is not useful for
the data warehouse.
Sorting and merging of data is also performed.
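Purging, sorting, and merging can be sketched together as below. The rule for what counts as "not useful" (rows whose `status` is `"test"`) and the record layout are assumptions made for the example.

```python
import heapq


def purge(records, is_useful):
    """Drop records that are not useful for the data warehouse."""
    return [r for r in records if is_useful(r)]


def sort_and_merge(*sources, key):
    """Sort each source on `key`, then merge them into one ordered list."""
    sorted_sources = [sorted(s, key=key) for s in sources]
    return list(heapq.merge(*sorted_sources, key=key))


# Made-up rows from two hypothetical sources.
s1 = [
    {"customer_id": 3, "amount": 50, "status": "posted"},
    {"customer_id": 1, "amount": 20, "status": "test"},
]
s2 = [{"customer_id": 2, "amount": 75, "status": "posted"}]


def is_useful(record):
    # Assumed purge rule: test transactions do not belong in the warehouse.
    return record["status"] != "test"


merged = sort_and_merge(
    purge(s1, is_useful),
    purge(s2, is_useful),
    key=lambda r: r["customer_id"],
)
```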
Primary keys in operational databases are fields with
some built-in meanings.
For example: a product key may contain the product
number, the product category, and the number of the store in which
the product is stored.
Data warehouses use keys that do not have built-in meanings.
Surrogate Keys: Generated by the system for each row of a table.
Oracle provides sequences.
SQL Server provides the IDENTITY column property for generating these keys.
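As a rough illustration of surrogate keys, the sketch below uses SQLite's `INTEGER PRIMARY KEY AUTOINCREMENT` as a stand-in for an Oracle sequence or a SQL Server identity column. The table `product_dim`, its columns, and the operational key format (`1042-GROC-017`, meaning product number, category, store number) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, no meaning
        source_product_code TEXT UNIQUE,                -- operational key, kept as data
        product_name TEXT
    )
    """
)


def get_or_create_product_key(conn, source_code, name):
    """Return the surrogate key for an operational key, creating a row if needed."""
    row = conn.execute(
        "SELECT product_key FROM product_dim WHERE source_product_code = ?",
        (source_code,),
    ).fetchone()
    if row:
        return row[0]
    cursor = conn.execute(
        "INSERT INTO product_dim (source_product_code, product_name) VALUES (?, ?)",
        (source_code, name),
    )
    # The database generated the key; the caller never builds it by hand.
    return cursor.lastrowid


k1 = get_or_create_product_key(conn, "1042-GROC-017", "Rice 5kg")
k2 = get_or_create_product_key(conn, "2210-ELEC-003", "Kettle")
k1_again = get_or_create_product_key(conn, "1042-GROC-017", "Rice 5kg")
```

Because the surrogate key carries no meaning, a product that moves to another store or category keeps the same `product_key`; only the descriptive columns would change.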
A data warehouse may not store data at the detailed level.
Example:
A grocery store may keep unit sales and revenue information for each
transaction performed in a day in its operational database.
For a data warehouse, it is more suitable to calculate, from the operational
database, summary totals of sales for a day and the number of products
sold for a day.
This summarization must be performed in the data
transformation phase.
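The grocery store example can be sketched as a simple aggregation from transaction-level rows to one summary row per day. The transaction layout and the sample values are made up for illustration.

```python
from collections import defaultdict


def summarize_daily(transactions):
    """Roll transaction-level rows up to per-day totals of units and revenue."""
    totals = defaultdict(lambda: {"units": 0, "revenue": 0.0})
    for t in transactions:
        day = totals[t["date"]]
        day["units"] += t["units"]
        day["revenue"] += t["revenue"]
    return dict(totals)


# Made-up operational data: one row per transaction.
transactions = [
    {"date": "2024-01-01", "product": "milk", "units": 2, "revenue": 3.0},
    {"date": "2024-01-01", "product": "bread", "units": 1, "revenue": 1.5},
    {"date": "2024-01-02", "product": "eggs", "units": 4, "revenue": 2.5},
]

daily = summarize_daily(transactions)
```

The warehouse would then store one row per day from `daily` instead of one row per transaction.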