Data Warehouse
Agenda
What is a Data Warehouse?
Transaction System vs. Data Warehouse
Data Warehouse Architecture
Metadata
Data Flows
Issues in Building a Data Warehouse
Warehouse Schema
Tools & Technologies
Advantages of a Data Warehouse
Problems
Data Mart
Data Mining
What is a Data Warehouse?
A collection of integrated, subject-oriented, time-variant
and non-volatile data in support of management's
decision-making process.
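The "time-variant" and "non-volatile" parts of the definition can be illustrated with a minimal sketch, assuming a hypothetical in-memory store: data is only ever appended as a dated snapshot, never updated in place, and queries ask for a value "as of" a date. The class, its methods (`load`, `as_of`) and the sample values are illustrative assumptions, not part of any product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SnapshotStore:
    """Hypothetical append-only store: rows are never updated in place."""
    rows: list = field(default_factory=list)  # (load_date, key, value) tuples

    def load(self, load_date: date, key: str, value: float) -> None:
        # Non-volatile: each load appends a new snapshot; older ones stay untouched.
        self.rows.append((load_date, key, value))

    def as_of(self, key: str, when: date):
        # Time-variant: return the latest snapshot loaded on or before `when`.
        candidates = [(d, v) for d, k, v in self.rows if k == key and d <= when]
        return max(candidates)[1] if candidates else None


store = SnapshotStore()
store.load(date(1996, 1, 31), "acct-100000", 30_000.0)  # made-up values
store.load(date(1996, 2, 29), "acct-100000", 42_000.0)
print(store.as_of("acct-100000", date(1996, 2, 15)))  # 30000.0
print(store.as_of("acct-100000", date(1996, 3, 1)))   # 42000.0
```

Because the January snapshot is never overwritten, a report run today for mid-February returns the same answer it returned in February.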
Transaction System vs. Data Warehouse

♦ Transaction System
Supports day-to-day operational processes
Contains raw, detailed data that has not been refined or cleansed
Volatile: data changes from day to day, with frequent updates
Technical issues drive the data structure and system design
Disparate data structures, physical locations, query types, etc.
Users rely on technical analysts for reporting needs
Operational processes are impacted by queries run off of the system

♦ Data Warehouse
Supports management analysis and decision-making processes
Contains summarized, refined, and cleansed information
Non-volatile: provides a data "snapshot"; adjustments are not permitted, or are limited
Business analysis requirements drive the data structure and system design
Integrated, consistent information on a single technology platform
Users have direct, fast access via On-line Analytical Processing (OLAP) tools
Minimal impact on operational processes
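To make the "raw, detailed" versus "summarized, refined, and cleansed" contrast concrete, here is a minimal sketch assuming hypothetical transaction rows with inconsistently typed region codes and an incomplete record: the warehouse load cleanses them and rolls them up to monthly totals per region. The rows and the `summarize` helper are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical raw OLTP rows: (date, region code as typed by clerks, amount)
raw_transactions = [
    ("1996-01-05", "sw", 12_000),
    ("1996-01-20", " SW", 18_000),
    ("1996-02-11", "NE", 23_000),
    ("1996-02-12", "NE", None),  # incomplete row, rejected by cleansing
]


def summarize(rows):
    """Cleanse raw rows and aggregate them into (month, region) totals."""
    totals = defaultdict(int)
    for day, region, amount in rows:
        if amount is None:                # cleansing: drop incomplete records
            continue
        month = day[:7]                   # '1996-01-05' -> '1996-01'
        region = region.strip().upper()   # cleansing: normalize region codes
        totals[(month, region)] += amount
    return dict(totals)


print(summarize(raw_transactions))
# {('1996-01', 'SW'): 30000, ('1996-02', 'NE'): 23000}
```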
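Data Warehouse Architecture
[Figure: Data warehouse architecture. Sources labelled ODS 1, ODS 2 and ODS 3, together with the operational data store (ODS), feed the load manager. The warehouse manager maintains the meta-data and the data held in the DBMS: detailed data, lightly summarized data and highly summarized data. The query manager serves the end-user access tools: reporting, query, application development and EIS tools; OLAP tools; and data mining.]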
Operational Data Store (ODS)
A repository of current and integrated operational data
used for analysis.
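A minimal sketch of what "current and integrated" can mean in practice, assuming two hypothetical source systems that describe the same customer under different field names: the ODS maps both layouts onto one record and lets the most recent update of each field win. The source names (`billing`, `crm`), fields and the `integrate` helper are illustrative assumptions; only the account number and name come from the example tables later in this deck.

```python
# Hypothetical extracts from two operational systems (field names differ).
billing = [{"cust_no": 100000, "name": "ABC Electronics", "updated": "1996-03-01"}]
crm = [{"customer_id": 100000, "phone": "555-0100", "updated": "1996-03-04"}]


def integrate(billing_rows, crm_rows):
    """Build current, integrated ODS records keyed by a common customer id."""
    # Map each source's layout onto one common layout.
    mapped = [
        {"customer_id": r["cust_no"], "name": r["name"], "updated": r["updated"]}
        for r in billing_rows
    ] + [
        {"customer_id": r["customer_id"], "phone": r["phone"], "updated": r["updated"]}
        for r in crm_rows
    ]
    ods = {}
    # Apply older updates first so the newest value of each field wins.
    for row in sorted(mapped, key=lambda r: r["updated"]):
        ods.setdefault(row["customer_id"], {}).update(row)
    return ods


print(integrate(billing, crm)[100000])
```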
End-user access tools can be categorized into five main groups:
data reporting and query tools, application development tools,
executive information system (EIS) tools, online analytical
processing (OLAP) tools, and data mining tools.
Meta-data
Importance of Meta-data
Meta-data: data about data.
The purpose of meta-data is to show the pathway back to where the
data began, so that the warehouse administrators know the history
of any item in the warehouse.
The meta-data associated with data transformation and loading
must describe the source data and any changes that were made to
the data.
The meta-data associated with data management describes the
data as it is stored in the warehouse.
The query manager requires meta-data to generate appropriate
queries; this meta-data is also associated with the users of queries.
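The "pathway back to where the data began" can be sketched as a lineage record per warehouse column. This is a minimal sketch assuming a hypothetical record format: each column stores its source and the ordered transformations applied during loading. The class name, the source system `orders_db` and its column are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """Hypothetical lineage meta-data for one warehouse column."""
    warehouse_column: str
    source_system: str
    source_column: str
    transformations: list = field(default_factory=list)  # in load order

    def lineage(self) -> str:
        # Walk back from the warehouse column through each change to the source.
        parts = [self.warehouse_column]
        parts.extend(reversed(self.transformations))
        parts.append(f"{self.source_system}.{self.source_column}")
        return " <- ".join(parts)


net_sales = ColumnMetadata(
    warehouse_column="Monthly_Sales_Summary_Table.net_sales",
    source_system="orders_db",          # hypothetical source system
    source_column="order_line.amount",  # hypothetical source column
    transformations=["remove cancelled orders", "sum by month"],
)
print(net_sales.lineage())
# Monthly_Sales_Summary_Table.net_sales <- sum by month <- remove cancelled orders <- orders_db.order_line.amount
```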
Data Flows
[Figure: Data flows in a data warehouse. Operational data sources 1 to n and the operational data store (ODS) feed the load manager; the warehouse manager maintains meta-data, detailed data, lightly summarized data and highly summarized data in the DBMS; the query manager serves reporting, query, application development and EIS (executive information system) tools and OLAP (online analytical processing) tools. The figure labels five flows: inflow, upflow, downflow, outflow and meta-flow.]
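The five labelled flows are commonly read as: inflow (extract, cleanse and load source data), upflow (add value by summarizing it), downflow (archive and back it up), outflow (deliver it to end-user tools) and meta-flow (record meta-data about the other flows). The sketch below wires these up as plain functions over a hypothetical in-memory warehouse; all names are illustrative assumptions, and a real warehouse would use a DBMS with separate load, warehouse and query managers.

```python
from collections import defaultdict

# Hypothetical in-memory warehouse areas.
warehouse = {"detailed": [], "summary": {}, "archive": [], "metadata": []}


def record(flow, note):
    # Meta-flow: every other flow logs what it did.
    warehouse["metadata"].append((flow, note))


def inflow(source_rows):
    # Extract, cleanse (drop rows without an amount) and load detailed data.
    clean = [r for r in source_rows if r.get("amount") is not None]
    warehouse["detailed"].extend(clean)
    record("inflow", f"loaded {len(clean)} of {len(source_rows)} rows")


def upflow():
    # Add value by summarizing detailed data by month.
    totals = defaultdict(int)
    for r in warehouse["detailed"]:
        totals[r["month"]] += r["amount"]
    warehouse["summary"] = dict(totals)
    record("upflow", f"monthly totals: {len(totals)}")


def downflow(before_month):
    # Archive detailed rows older than the cut-off month.
    old = [r for r in warehouse["detailed"] if r["month"] < before_month]
    warehouse["archive"].extend(old)
    warehouse["detailed"] = [r for r in warehouse["detailed"] if r["month"] >= before_month]
    record("downflow", f"archived rows: {len(old)}")


def outflow(month):
    # Deliver summarized data to an end-user query.
    record("outflow", f"served {month}")
    return warehouse["summary"].get(month, 0)


inflow([{"month": "1996-01", "amount": 30_000},
        {"month": "1996-02", "amount": 23_000},
        {"month": "1996-02", "amount": None}])
upflow()
downflow("1996-02")
print(outflow("1996-01"))  # 30000
```

Note that archiving detailed rows (downflow) does not affect the summary already produced by upflow, which is why older months can still be served from summarized data.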
Issues to Be Addressed in Building a Data Warehouse
When and how to gather data?
What schema to use?
Data cleansing
How to propagate updates?
What data to summarize?
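One common answer to "How to propagate updates?" is an incremental load: remember the latest change timestamp already loaded (a high-water mark) and copy only newer source rows on the next run. A minimal sketch, assuming the hypothetical source rows carry a `changed_at` column; the names and values are illustrative.

```python
from datetime import datetime

source = [  # hypothetical operational rows with change timestamps
    {"id": 1, "amount": 30_000, "changed_at": datetime(1996, 1, 31, 18, 0)},
    {"id": 2, "amount": 23_000, "changed_at": datetime(1996, 2, 29, 18, 0)},
]
warehouse_rows = []
high_water_mark = datetime.min  # nothing loaded yet


def propagate_updates():
    """Copy only source rows changed since the previous load."""
    global high_water_mark
    new_rows = [r for r in source if r["changed_at"] > high_water_mark]
    warehouse_rows.extend(new_rows)  # appended, never overwritten
    if new_rows:
        high_water_mark = max(r["changed_at"] for r in new_rows)
    return len(new_rows)


print(propagate_updates())  # 2: the first run loads everything
source.append({"id": 3, "amount": 32_000, "changed_at": datetime(1996, 3, 31, 18, 0)})
print(propagate_updates())  # 1: only the new row is loaded
print(propagate_updates())  # 0: nothing changed since the last run
```

The design keeps each run cheap, but a production loader would also need to handle rows that arrive late with a timestamp at or below the mark.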
Warehouse Schema
Fact Table:
Stores the business data. Each record in a fact table is
called a fact; fact tables contain multidimensional data.
Dimension Table:
To minimize storage requirements, the dimension attributes in a
fact table are usually short identifiers that are foreign keys
into other tables, called dimension tables.
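A minimal sketch of the fact/dimension split, using two rows from the sales-processing example tables in this deck (modelled here as plain Python dicts purely for illustration): the fact rows store only short keys plus numeric measures, and the descriptive attributes are looked up in the dimension tables. The `describe` helper is a hypothetical name.

```python
# Dimension tables: short key -> descriptive attributes.
product_dim = {100: {"prod_desc": "Power supply"}, 140: {"prod_desc": "Motherboard"}}
region_dim = {"SW": {"region": "Southwest"}, "NE": {"region": "Northeast"}}

# Fact table: foreign keys into the dimensions plus the measures (the facts).
sales_facts = [
    {"prod_id": 100, "region_id": "SW", "net_sales": 30_000},
    {"prod_id": 140, "region_id": "NE", "net_sales": 23_000},
]


def describe(fact):
    """Resolve a fact row's foreign keys into readable dimension attributes."""
    return (product_dim[fact["prod_id"]]["prod_desc"],
            region_dim[fact["region_id"]]["region"],
            fact["net_sales"])


for fact in sales_facts:
    print(describe(fact))
# ('Power supply', 'Southwest', 30000)
# ('Motherboard', 'Northeast', 23000)
```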
Schema with Fact & Dimension Tables
[Figure: Fact and dimension tables. PRODUCT: Name of the Product, Product Number, Description of Product. AREA: Area 1, Area 2, Area 3. DURATION: Year, Beginning Date, Completion Date.]
Star Schema
A fact table in the center, with all the dimension tables
attached directly to the central fact table.
Example: Sales Processing
[Figure: Star schema for sales processing. The SALES fact table in the center, surrounded by the dimension tables PRODUCT, AREA, TIME and CUSTOMER.]
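The star shape can be sketched as relational DDL. This is a minimal sketch using Python's built-in `sqlite3` with an in-memory database; the table and column names (`sales_fact`, `product_dim`, `amount`, and so on) are hypothetical stand-ins for the SALES, PRODUCT, AREA, TIME and CUSTOMER tables in the figure. It checks that every dimension is exactly one foreign-key hop from the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE area_dim     (area_id     INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE time_dim     (time_id     INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, name TEXT);

-- Central fact table: one foreign key per dimension plus a numeric measure.
CREATE TABLE sales_fact (
    product_id  INTEGER REFERENCES product_dim(product_id),
    area_id     INTEGER REFERENCES area_dim(area_id),
    time_id     INTEGER REFERENCES time_dim(time_id),
    customer_id INTEGER REFERENCES customer_dim(customer_id),
    amount      REAL,
    PRIMARY KEY (product_id, area_id, time_id, customer_id)
);
""")

# Column 2 of PRAGMA foreign_key_list is the referenced (parent) table.
targets = sorted(row[2] for row in conn.execute("PRAGMA foreign_key_list(sales_fact)"))
print(targets)  # ['area_dim', 'customer_dim', 'product_dim', 'time_dim']
```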
Dimension Tables

Region_Dimension_Table
region_id  description
NE         Northeast
NW         Northwest
SE         Southeast
SW         Southwest

Product_Dimension_Table
prod_grp_id  prod_id  prod_grp_desc   prod_desc
10           100      Fewer devices   Power supply
20           140      Circuit boards  Motherboard
30           220      Components      Co-processor

Account_Dimension_Table
account_id  account_doc
100000      ABC Electronics
110000      Midway Electric
120000      Victor Components
130000      Washburn, Inc.
140000      Zerox

Fact Table: Monthly_Sales_Summary_Table
month    prod_id  region_id  account_id  vend_id  net_sales  gross_sales
01-1996  100      SW         100000      100      30,000     50,000
02-1996  140      NE         110000      200      23,000     42,000
03-1996  220      SW         100000      300      32,000     49,000

Time_Dimension_Table
month    mo_in_fiscal_yr  month_name
01-1996  4                January
02-1996  5                February
03-1996  6                March

Vendor_Dimension_Table
vend_id  vendor_desc
100      PowerAge, Inc.
200      Advanced Micro Devices
300      Farad Incorporated
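The example tables can be loaded into an in-memory SQLite database to show a star join in action. This is a minimal sketch using Python's built-in `sqlite3`; the rows are copied from the tables above, the product and account dimensions are omitted for brevity, and the region description column name `region_desc` is an assumption (that header is not legible in the slide).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region_dim (region_id TEXT PRIMARY KEY, region_desc TEXT);
CREATE TABLE time_dim   (month TEXT PRIMARY KEY, mo_in_fiscal_yr INTEGER, month_name TEXT);
CREATE TABLE vendor_dim (vend_id INTEGER PRIMARY KEY, vendor_desc TEXT);
CREATE TABLE monthly_sales_summary (
    month TEXT, prod_id INTEGER, region_id TEXT, account_id INTEGER,
    vend_id INTEGER, net_sales INTEGER, gross_sales INTEGER
);
""")
conn.executemany("INSERT INTO region_dim VALUES (?, ?)", [
    ("NE", "Northeast"), ("NW", "Northwest"), ("SE", "Southeast"), ("SW", "Southwest")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)", [
    ("01-1996", 4, "January"), ("02-1996", 5, "February"), ("03-1996", 6, "March")])
conn.executemany("INSERT INTO vendor_dim VALUES (?, ?)", [
    (100, "PowerAge, Inc."), (200, "Advanced Micro Devices"), (300, "Farad Incorporated")])
conn.executemany("INSERT INTO monthly_sales_summary VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("01-1996", 100, "SW", 100000, 100, 30000, 50000),
    ("02-1996", 140, "NE", 110000, 200, 23000, 42000),
    ("03-1996", 220, "SW", 100000, 300, 32000, 49000)])

# Roll the facts up by region, labelling each group via the region dimension.
by_region = conn.execute("""
    SELECT r.region_desc, SUM(f.net_sales), SUM(f.gross_sales)
    FROM monthly_sales_summary f
    JOIN region_dim r ON r.region_id = f.region_id
    GROUP BY r.region_desc
    ORDER BY r.region_desc
""").fetchall()
print(by_region)  # [('Northeast', 23000, 42000), ('Southwest', 62000, 99000)]

# Join two dimensions at once: fiscal month and vendor for each fact row.
by_month = conn.execute("""
    SELECT t.month_name, t.mo_in_fiscal_yr, v.vendor_desc, f.net_sales
    FROM monthly_sales_summary f
    JOIN time_dim t ON t.month = f.month
    JOIN vendor_dim v ON v.vend_id = f.vend_id
    ORDER BY t.mo_in_fiscal_yr
""").fetchall()
print(by_month)
```

Each dimension is reached with a single join on its key, which is what makes star-schema queries simple to write.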
Snowflake Schema
Consists of a fact table and normalized dimension tables.
Disadvantages:
Data becomes harder to manage
Data is harder to retrieve, because queries need more joins
Metadata becomes complex
[Figure: Snowflake schema. The SALES fact table in the center with the dimension tables PRODUCT, AREA, TIME and CUSTOMER; the PRODUCT dimension is further linked to Product Category and Product Manufacturer tables.]
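A minimal sketch of a snowflaked PRODUCT dimension, using Python's built-in `sqlite3`: category and manufacturer move into their own tables, so each name is stored once, but reaching them from the fact table takes an extra join. All table names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked PRODUCT dimension: category and manufacturer in their own tables.
CREATE TABLE category_dim (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE manufacturer_dim (manufacturer_id INTEGER PRIMARY KEY, manufacturer TEXT);
CREATE TABLE product_dim (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES category_dim(category_id),
    manufacturer_id INTEGER REFERENCES manufacturer_dim(manufacturer_id)
);
CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES product_dim(product_id),
    amount INTEGER
);

-- Hypothetical sample rows.
INSERT INTO category_dim VALUES (1, 'Circuit boards'), (2, 'Components');
INSERT INTO manufacturer_dim VALUES (1, 'Maker A'), (2, 'Maker B');
INSERT INTO product_dim VALUES
    (140, 'Motherboard', 1, 1), (220, 'Co-processor', 2, 2), (221, 'Cache chip', 2, 1);
INSERT INTO sales_fact VALUES (140, 23000), (220, 32000), (221, 5000);
""")

# Reaching a category or manufacturer now takes two joins instead of one.
by_category = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN category_dim c ON c.category_id = p.category_id
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
by_manufacturer = conn.execute("""
    SELECT m.manufacturer, SUM(f.amount)
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN manufacturer_dim m ON m.manufacturer_id = p.manufacturer_id
    GROUP BY m.manufacturer
    ORDER BY m.manufacturer
""").fetchall()
print(by_category)      # [('Circuit boards', 23000), ('Components', 37000)]
print(by_manufacturer)  # [('Maker A', 28000), ('Maker B', 32000)]
```

The extra join per query is the retrieval cost of normalization; in return, renaming a manufacturer touches one row instead of every product row that mentions it.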
Starflake Schema
A combination of the star schema and the snowflake schema.
Consists of a fact table, star (denormalized) dimensions and
snowflake (normalized) dimensions.
[Figure: Starflake schema. A Product dimension with Price and Weight attributes, and a snowflake dimension, Location, linked to Location 1 and Location 2.]
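A minimal sketch of the starflake idea with plain Python dicts: one dimension (product) stays flat as in a star schema, while another (location) is normalized into two linked tables as in a snowflake schema. The table contents, key names and the `enrich` helper are hypothetical.

```python
# Star (denormalized) dimension: all product attributes in one flat table.
product_dim = {1: {"name": "Power supply", "price": 80.0, "weight_kg": 1.5}}

# Snowflake (normalized) dimension: location split into two linked tables.
location_1 = {10: {"store": "Store 10", "city_id": 7}}
location_2 = {7: {"city": "Springfield"}}

sales_fact = [{"product_id": 1, "location_id": 10, "units": 12}]


def enrich(fact):
    """Resolve a fact row: one hop for the star dimension, two for the snowflake."""
    product = product_dim[fact["product_id"]]  # star: single lookup
    store = location_1[fact["location_id"]]    # snowflake: first hop
    city = location_2[store["city_id"]]        # snowflake: second hop
    return {"product": product["name"], "price": product["price"],
            "store": store["store"], "city": city["city"], "units": fact["units"]}


print(enrich(sales_fact[0]))
```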
Tools and Technologies
Advantages of Using a Data Warehouse
End users can access a wide variety of data
Supports business decision making for the future
Increases data consistency
Increases productivity
Decreases computing costs
Combines data from multiple sources
Problems
Increased end-user demands
High demand for resources
High maintenance
Extracting, cleansing and loading data can be time-consuming.
Data warehousing increases project scope.
Compatibility problems with systems already in place,
e.g. the transaction processing system.
Providing training to end users, who sometimes end up not using the
data warehouse.
Security can develop into a serious issue, especially if
the data warehouse is web-accessible.
Data Mart
A subset of a data warehouse that supports
the requirements of a particular department or
business function.
The characteristics that differentiate data marts
and data warehouses include:
A data mart focuses only on the requirements of users
associated with one department or business function.
Data marts do not normally contain detailed
operational data, unlike data warehouses.
As data marts contain less data than data
warehouses, data marts are more easily understood
and navigated.
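A minimal sketch of deriving a data mart from warehouse data, assuming hypothetical detailed rows tagged by department: the mart keeps only one department's rows (subset) and drops per-customer detail down to monthly totals (less detailed, easier to navigate). The rows and the `build_data_mart` helper are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical detailed warehouse rows covering several departments.
warehouse = [
    {"dept": "sales",   "month": "1996-01", "customer": 100000, "amount": 30_000},
    {"dept": "sales",   "month": "1996-01", "customer": 110000, "amount": 12_000},
    {"dept": "finance", "month": "1996-01", "customer": 100000, "amount": 9_000},
    {"dept": "sales",   "month": "1996-02", "customer": 110000, "amount": 23_000},
]


def build_data_mart(rows, dept):
    """Keep one department's rows and reduce the detail to monthly totals."""
    totals = defaultdict(int)
    for r in rows:
        if r["dept"] == dept:                  # subset: one business function
            totals[r["month"]] += r["amount"]  # summarized: no per-customer detail
    return dict(totals)


sales_mart = build_data_mart(warehouse, "sales")
print(sales_mart)  # {'1996-01': 42000, '1996-02': 23000}
```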
[Figure: Data warehouse with data marts. Operational data source 1 and ODS 1 feed the load manager; the warehouse manager maintains meta-data, highly summarized data, lightly summarized data and archive/backup data; the query manager serves the end-user access tools (reporting, query, application development and EIS tools); summarized data is passed to second-tier data marts held in a relational database or a multi-dimensional database.]
Reasons for Creating a Data Mart
To give users access to the data they need to analyze
most often
To provide data in a form that matches the collective
view of the data by a group of users in a department or
business function
To improve end-user response time due to the reduction
in the volume of data to be accessed
To provide appropriately structured data, as dictated by the
requirements of end-user access tools
Data marts normally use less data, so tasks such as data cleansing,
loading, transformation, and integration are far easier,
and hence implementing and setting up a data mart is
simpler than establishing a corporate data warehouse
Data Mining
The process of extracting previously unknown, valid and actionable
information from large databases and then using that information to make
crucial business decisions.
Data Analysis
Forecasting and business modeling
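Forecasting is one of the uses listed above. As a deliberately simple sketch (an ordinary least-squares linear trend, not a full data-mining method), assuming hypothetical monthly sales figures, a trend fitted to history can project the next month. The `linear_trend` helper and the data are illustrative assumptions; it needs at least two data points.

```python
def linear_trend(values):
    """Fit y = a + b*x by ordinary least squares, with x = 0, 1, 2, ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) / sum(
        (x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b


monthly_sales = [20_000, 22_000, 24_000, 26_000]  # hypothetical history
a, b = linear_trend(monthly_sales)
forecast = a + b * len(monthly_sales)  # next month, x = 4
print(round(forecast))  # 28000
```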