Data Mining: Concepts and Techniques
— Chapter 3 —
Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber, All rights reserved
November 11, 2019 Data Mining: Concepts and Techniques 1
Chapter 3: Data Warehousing and
OLAP Technology: An Overview
records
Data cleaning and data integration techniques are applied.
Ensure consistency in naming conventions, encoding
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)
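The lattice of cuboids over the four dimensions time, item, location, and supplier can be enumerated mechanically, since each cuboid is one subset of the dimensions. The following is a minimal sketch; the function name `cuboid_lattice` is a hypothetical helper, not something defined in the slides.

```python
from itertools import combinations

# Dimension names as they appear in the lattice of cuboids.
DIMENSIONS = ("time", "item", "location", "supplier")

def cuboid_lattice(dimensions):
    """Return {k: [cuboids with k dimensions]} for k = 0..n."""
    return {
        k: [tuple(c) for c in combinations(dimensions, k)]
        for k in range(len(dimensions) + 1)
    }

lattice = cuboid_lattice(DIMENSIONS)
for k, cuboids in lattice.items():
    print(f"{k}-D cuboids ({len(cuboids)}):", cuboids)
```

With n dimensions and no concept hierarchies, the lattice holds 2^n cuboids, so 16 here: one apex cuboid, four 1-D, six 2-D, four 3-D, and one base cuboid.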
[Figure: schema diagram. Fact table keys: branch_key, location_key. Measures: units_sold, dollars_sold, avg_sales. branch dimension: branch_key, branch_name, branch_type. location dimension: location_key, street, city_key. city dimension: city_key, city, state_or_province, country.]
<dimension_name_first_time> in cube
<cube_name_first_time>
Specification of hierarchies
Schema hierarchy
day < {month < quarter; week} < year
Set_grouping hierarchy
{1..10} < inexpensive
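The two kinds of hierarchy can be sketched in code. A schema hierarchy is a partial order over levels (week does not roll up to quarter), while a set-grouping hierarchy maps a range of values to a label. The names `PARENTS`, `is_below`, `SET_GROUPS`, and `group_of` are illustrative assumptions, and only the one group given on the slide is encoded.

```python
# Schema hierarchy: day < {month < quarter; week} < year (a partial order).
PARENTS = {
    "day": ["month", "week"],
    "month": ["quarter"],
    "quarter": ["year"],
    "week": ["year"],
    "year": [],
}

def is_below(lower, higher):
    """True if `lower` can be rolled up to `higher` along some path."""
    stack = list(PARENTS[lower])
    while stack:
        level = stack.pop()
        if level == higher:
            return True
        stack.extend(PARENTS[level])
    return False

# Set-grouping hierarchy: {1..10} < inexpensive.
SET_GROUPS = {"inexpensive": range(1, 11)}

def group_of(value):
    """Return the group label containing `value`, or None if no group matches."""
    for label, members in SET_GROUPS.items():
        if value in members:
            return label
    return None

print(is_below("day", "year"))      # True
print(is_below("week", "quarter"))  # False
print(group_of(7))                  # inexpensive
```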
A Sample Data Cube
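[Figure: data cube of sales with three dimensions. Date: 1Qtr, 2Qtr, 3Qtr, 4Qtr, sum. Product: TV, PC, VCR, sum. Country: U.S.A, Canada, Mexico, sum. The highlighted cell gives the total annual sales of TV in U.S.A.]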
Cuboids of the sample cube:
0-D (apex) cuboid: all
1-D cuboids: product; date; country
3-D (base) cuboid: (product, date, country)
Visualization
OLAP capabilities
Interactive manipulation
Typical OLAP Operations
Roll up (drill-up): summarize data
by climbing up hierarchy or by dimension reduction
Drill down (roll down): reverse of roll-up
from higher level summary to lower level summary or
detailed data, or introducing new dimensions
Slice and dice: project and select
Pivot (rotate):
reorient the cube, visualization, 3D to series of 2D planes
Other operations
drill across: involving (across) more than one fact table
drill through: through the bottom level of the cube to its
back-end relational tables (using SQL)
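The core operations can be illustrated on a tiny in-memory cube. This is a sketch under stated assumptions: the sales figures are made up, the cube is a plain dictionary keyed by (quarter, product, country), and the helper names are hypothetical. Roll-up is shown only in its dimension-reduction form; drill-down is the reverse step back to the finer cells.

```python
from collections import defaultdict

DIMS = ("quarter", "product", "country")

# Hypothetical sales figures, keyed by (quarter, product, country).
cube = {
    ("1Qtr", "TV", "U.S.A"): 100,
    ("1Qtr", "PC", "U.S.A"): 80,
    ("2Qtr", "TV", "U.S.A"): 120,
    ("2Qtr", "TV", "Canada"): 40,
    ("3Qtr", "VCR", "Mexico"): 30,
}

def roll_up(cells, dims, keep):
    """Roll up by dimension reduction: sum the measure over dropped dimensions."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def slice_(cells, dims, dim, value):
    """Slice: fix one dimension to a single value and project it away."""
    i = dims.index(dim)
    return {key[:i] + key[i + 1:]: v for key, v in cells.items() if key[i] == value}

def dice(cells, dims, **allowed):
    """Dice: select cells whose coordinates fall in the allowed value sets."""
    return {
        key: value
        for key, value in cells.items()
        if all(key[dims.index(d)] in vals for d, vals in allowed.items())
    }

def pivot(cells, dims, new_order):
    """Pivot: reorder the axes without changing any cell value."""
    idx = [dims.index(d) for d in new_order]
    return {tuple(key[i] for i in idx): v for key, v in cells.items()}

print(roll_up(cube, DIMS, ("product",)))
print(slice_(cube, DIMS, "country", "U.S.A"))
print(dice(cube, DIMS, quarter={"1Qtr", "2Qtr"}, product={"TV"}))
```

Rolling up to an empty list of dimensions yields the apex cuboid, a single cell holding the grand total.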
[Figure: query model drawn as radial lines, one per dimension. Time: ANNUALY, QTRLY, DAILY. Product: PRODUCT ITEM, PRODUCT GROUP, PRODUCT LINE. Location: CITY, COUNTRY, REGION. Organization: SALES PERSON, DISTRICT, DIVISION. Promotion. Other labels: ORDER, TRUCK. Each circle is called a footprint.]
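[Figure: data warehouse architecture. Operational DBs and other sources feed Extract, Transform, Load, and Refresh steps, overseen by a monitor and integrator with metadata. These populate the data warehouse and data marts, which serve an OLAP server. Front-end uses: analysis, query, reports, data mining.]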
materialized
Data Warehouse Development:
A Recommended Approach
[Figure: development path relating the enterprise data warehouse, data marts, distributed data marts, and the multi-tier data warehouse.]
Data transformation
convert data from legacy or host format to warehouse format
Load
sort, summarize, consolidate, compute views, check integrity, and build indices and partitions
Refresh
propagate the updates from the data sources to the
warehouse
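The transformation, load, and refresh utilities can be sketched as three small functions. This is an illustration only: the pipe-delimited legacy record layout, the field names, and the function names are assumptions, and refresh is implemented as a full reload rather than an incremental update.

```python
from collections import defaultdict

def transform(legacy_row):
    """Convert a hypothetical pipe-delimited legacy record to a warehouse row."""
    date, item, amount = legacy_row.split("|")
    return {
        "date": f"{date[:4]}-{date[4:6]}-{date[6:]}",
        "item": item.strip().upper(),
        "amount": int(amount),
    }

def load(rows):
    """Check integrity, sort, summarize, and build an index over the rows."""
    for row in rows:
        if row["amount"] < 0:
            raise ValueError(f"negative amount in {row}")
    table = sorted(rows, key=lambda r: (r["date"], r["item"]))
    summary = defaultdict(int)   # total amount per item
    index = defaultdict(list)    # item -> row positions in `table`
    for pos, row in enumerate(table):
        summary[row["item"]] += row["amount"]
        index[row["item"]].append(pos)
    return {"table": table, "summary": dict(summary), "index": dict(index)}

def refresh(warehouse, new_legacy_rows):
    """Propagate source updates by transforming new rows and reloading."""
    rows = warehouse["table"] + [transform(r) for r in new_legacy_rows]
    return load(rows)

warehouse = load([transform(r) for r in ["20191111|tv|0100", "20191110|pc|0250"]])
warehouse = refresh(warehouse, ["20191112|tv|0050"])
print(warehouse["summary"])
```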
Metadata Repository
Metadata is the data defining warehouse objects. It stores:
Description of the structure of the data warehouse
schema, view, dimensions, hierarchies, derived data definitions, data mart locations and contents
Operational metadata
data lineage (history of migrated data and transformation path),
currency of data (active, archived, or purged), monitoring
information (warehouse usage statistics, error reports, audit trails)
The algorithms used for summarization
The mapping from operational environment to the data warehouse
Data related to system performance
warehouse schema, view and derived data definitions
Business data
business terms and definitions, ownership of data, charging policies
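One way to picture a repository entry is as a record that groups the structural, operational, and business metadata listed above. The sketch below is an assumption about shape only; the class names, field names, and sample values are hypothetical and do not come from any particular warehouse product.

```python
from dataclasses import dataclass, field

CURRENCY_STATES = {"active", "archived", "purged"}

@dataclass
class LineageStep:
    source: str          # where the data came from
    transformation: str  # what was applied on the way in

@dataclass
class WarehouseObjectMetadata:
    name: str
    dimensions: list                              # structural metadata
    hierarchies: dict                             # dimension -> ordered levels
    lineage: list = field(default_factory=list)   # operational: data lineage
    currency: str = "active"                      # active, archived, or purged
    summarization: str = ""                       # algorithm used for summarization
    owner: str = ""                               # business metadata

    def set_currency(self, state):
        """Change the currency of the data, rejecting unknown states."""
        if state not in CURRENCY_STATES:
            raise ValueError(f"unknown currency state: {state}")
        self.currency = state

# Hypothetical entry for a sales cube.
entry = WarehouseObjectMetadata(
    name="sales",
    dimensions=["time", "item", "location"],
    hierarchies={"time": ["day", "month", "quarter", "year"]},
    summarization="sum",
    owner="sales department",
)
entry.lineage.append(LineageStep("operational DB", "legacy format to warehouse format"))
entry.set_currency("archived")
print(entry.currency, len(entry.lineage))
```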
OLAP Server Architectures
Relational OLAP (ROLAP)
Uses a relational or extended-relational DBMS to store and manage warehouse data, together with OLAP middleware
Includes optimization of the DBMS back end, implementation of aggregation navigation logic, and additional tools and services
Greater scalability
Multidimensional OLAP (MOLAP)
Sparse array-based multidimensional storage engine
Fast indexing to pre-computed summarized data
Hybrid OLAP (HOLAP) (e.g., Microsoft SQL Server)
Flexibility, e.g., low level: relational, high-level: array
Specialized SQL servers (e.g., Redbricks)
Specialized support for SQL queries over star/snowflake schemas
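The difference between the ROLAP and MOLAP approaches can be shown on a toy fact set. In this sketch the ROLAP side keeps facts in a relational table (SQLite from the Python standard library stands in for the DBMS) and aggregates with SQL, while the MOLAP side addresses cells by array coordinates and reads from a pre-computed summary. The sample figures and variable names are assumptions for illustration.

```python
import sqlite3

# Hypothetical facts: (quarter, product, amount).
facts = [("1Qtr", "TV", 100), ("1Qtr", "PC", 80), ("2Qtr", "TV", 120)]

# ROLAP style: facts live in a relational table; aggregates come from SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, product TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", facts)
rolap_totals = dict(
    conn.execute("SELECT product, SUM(amount) FROM sales GROUP BY product")
)
conn.close()

# MOLAP style: cells are addressed by array coordinates, and summaries
# are pre-computed so that a lookup needs no scan of the facts.
quarters = {"1Qtr": 0, "2Qtr": 1}
products = {"TV": 0, "PC": 1}
cells = {}  # sparse array: only non-empty cells are stored
for quarter, product, amount in facts:
    cells[(quarters[quarter], products[product])] = amount
product_totals = [0] * len(products)  # pre-computed summary over all quarters
for (_, p), amount in cells.items():
    product_totals[p] += amount
molap_totals = {name: product_totals[i] for name, i in products.items()}

print(rolap_totals == molap_totals)  # True
```

A HOLAP design would mix the two, for example keeping detailed rows in the relational table and only the summaries in array form.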
threshold
Avoid explosive growth of the cube
and product
A join index on city maintains for each
data warehouses
ODBC, OLEDB, Web accessing, service facilities,
[Figure: surviving labels are Layer2, MDDB, and Meta Data.]
Summary
Summary: Data Warehouse and OLAP Technology