Professional Documents
Culture Documents
2 DW
2 DW
2 DW
all
0-D(apex) cuboid
time,location,supplier
time,item,location 3-D cuboids
time,item,supplier item,location,supplier
4-D(base) cuboid
time, item, location, supplier
January 7, 2021 Data Mining: Concepts and Techniques 14
Conceptual Modeling of
Data Warehouses
Modeling data warehouses: dimensions & measures
Star schema: A fact table in the middle connected to a
set of dimension tables
Snowflake schema: A refinement of star schema
where some dimensional hierarchy is normalized into a
set of smaller dimension tables, forming a shape
similar to snowflake
Fact constellations: Multiple fact tables share
dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact constellation
January 7, 2021 Data Mining: Concepts and Techniques 15
Example of Star Schema
time
time_key item
day item_key
day_of_the_week Sales Fact Table item_name
month brand
quarter time_key type
year supplier_type
item_key
branch_key
branch location
location_key
branch_key location_key
branch_name units_sold street
branch_type city
dollars_sold province_or_street
country
avg_sales
Measures
branch_key
branch location
location_key
location_key
branch_key
units_sold street
branch_name
city_key city
branch_type
dollars_sold
city_key
avg_sales city
province_or_street
Measures country
<dimension_name_first_time> in cube
<cube_name_first_time>
all all
Specification of hierarchies
Schema hierarchy
day < {month <
quarter; week} < year
Set_grouping hierarchy
{1..10} < inexpensive
Office Day
Month
January 7, 2021 Data Mining: Concepts and Techniques 26
A Sample Data Cube
Total annual sales
Date of TV in U.S.A.
1Qtr 2Qtr 3Qtr 4Qtr sum
TV
PC U.S.A
VCR
Country
sum
Canada
Mexico
sum
all
0-D(apex) cuboid
product date country
1-D cuboids
3-D(base) cuboid
product, date, country
ORDER
TRUCK
PRODUCT LINE
Time Product
ANNUALY QTRLY DAILY PRODUCT ITEM PRODUCT GROUP
CITY
SALES PERSON
COUNTRY
DISTRICT
REGION
DIVISION
Location Each circle is
called a footprint Promotion Organization
January 7, 2021 Data Mining: Concepts and Techniques 30
Chapter 2: Data Warehousing and
OLAP Technology for Data Mining
Choose the dimensions that will apply to each fact table record
Choose the measure that will populate each fact table record
Monitor
& OLAP Server
other Metadata
sources Integrator
Analysis
Operational Extract Query
Transform Data Serve Reports
DBs
Load
Refresh
Warehouse Data mining
Data Marts
materialized
January 7, 2021 Data Mining: Concepts and Techniques 35
Data Warehouse Development:
A Recommended Approach
Multi-Tier Data
Warehouse
Distributed
Data Marts
Enterprise
Data Data
Mart Mart Data
Warehouse
techniques)
fast indexing to pre-computed summarized data