3 AdvMDImpl
[Figure: project lifecycle diagram with the boxes Project Planning, Business Requirements Definition, Dimensional Modeling, Physical Design, Data Staging Design & Development, End-User Application Specification, End-User Application Development, Deployment, Maintenance & Growth, and Project Management]

• Versioning dimension values
• Capturing the previous and the current value
• Timestamping
• Split into changing and constant attributes
Aalborg University 2007 - DWML course 3 Aalborg University 2007 - DWML course 4
Changing Dimensions
• So far, we assume that dimensions are stable over time
  Existing rows do not change
  New rows in dimension tables can be inserted
• “Slowly changing dimensions” phenomenon
  Dimension information changes, but changes are not frequent
  Still assume that the schema is fixed
• Study techniques for handling changes in dimensions

Example
Time dimension: TimeID, Weekday, Week, Month, Quarter, Year, DayNo, Holiday
Store dimension: StoreID, Address, City, District, Size, SCategory
Product dimension: ProductID, Description, Brand, PCategory
Fact table: TimeID, StoreID, ProductID, …, ItemsSold, Amount
• Attribute values in dimensions vary over time
  A store changes Size
  A product changes Description
  Districts are changed
• Problems
  Update dimensions: wrong information in historical data
  Don’t update dimensions: DW is not up-to-date
[Figure: timeline with a point of change and question marks on either side]
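
The first problem above (updating a dimension row gives wrong information in historical data) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than SQL Server, the table names Store and Sales are assumptions for illustration, and the values are borrowed from the store example used later in the slides.

```python
import sqlite3

# Hypothetical two-table star schema: a Store dimension and a Sales fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store (StoreID INTEGER PRIMARY KEY, Size INTEGER);
    CREATE TABLE Sales (StoreID INTEGER, TimeID INTEGER, ItemsSold INTEGER);
    INSERT INTO Store VALUES (1, 250);
    INSERT INTO Sales VALUES (1, 234, 2000);   -- sold while Size was 250
""")

def items_sold_by_size(conn):
    """Total ItemsSold grouped by the Size of the Store row each fact joins to."""
    rows = conn.execute("""
        SELECT s.Size, SUM(f.ItemsSold)
        FROM Sales f JOIN Store s ON s.StoreID = f.StoreID
        GROUP BY s.Size
    """).fetchall()
    return dict(rows)

before = items_sold_by_size(conn)

# The store is enlarged and the dimension row is overwritten in place.
conn.execute("UPDATE Store SET Size = 450 WHERE StoreID = 1")
conn.execute("INSERT INTO Sales VALUES (1, 456, 2500)")  # sold at Size 450

after = items_sold_by_size(conn)
print(before)  # sales attributed to Size 250
print(after)   # all sales, old and new, now attributed to Size 450
```

After the in-place update, the 2000 items sold at Size 250 are reported under Size 450, which is the "wrong information in historical data" case. Leaving the row untouched instead would keep history correct but report new sales under the outdated Size.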
Solution 2A: inserting special facts

Initial state
  Fact table:       StoreID  TimeID  …  ItemsSold  …
                    001      234        2000
  Store dimension:  StoreID  …  Size  …
                    001         250

A special fact for capturing changes is inserted together with the new dimension row
  Fact table:       StoreID  TimeID  …  ItemsSold  …
                    001      234        2000
                    002      345        -
  Store dimension:  StoreID  …  Size  …
                    001         250
                    002         450

Later facts refer to the new dimension row
  Fact table:       StoreID  TimeID  …  ItemsSold  …
                    001      234        2000
                    002      345        -
                    002      456        2500
  Store dimension:  StoreID  …  Size  …
                    001         250
                    002         450

Solution 2A
• Solution 2A: Use special facts for capturing changes in dimensions via the Time dimension.
  Assume that no simultaneous, new fact refers to the new dimension row
  Insert a new special fact that points to the new dimension row, and through its reference to the Time dimension, timestamps the row
• Pros
  Possible to capture the development over time of the subjects that the dimensions describe
• Cons
  Even larger tables
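
A minimal sketch of Solution 2A with the values from the tables above, using Python's built-in sqlite3 module. The table names Store and Sales, the helper add_store_version, and the choice of NULL for the "-" measure are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store (StoreID INTEGER PRIMARY KEY, Size INTEGER);
    CREATE TABLE Sales (StoreID INTEGER, TimeID INTEGER, ItemsSold INTEGER);
    INSERT INTO Store VALUES (1, 250);
    INSERT INTO Sales VALUES (1, 234, 2000);
""")

def add_store_version(conn, new_store_id, size, time_id):
    """Hypothetical helper: insert a new dimension row plus the special fact
    that timestamps it through its TimeID reference."""
    conn.execute("INSERT INTO Store VALUES (?, ?)", (new_store_id, size))
    # Special fact: no measure value (NULL stands in for the '-' on the slide).
    conn.execute("INSERT INTO Sales VALUES (?, ?, NULL)", (new_store_id, time_id))

# The store changes Size from 250 to 450 at TimeID 345.
add_store_version(conn, 2, 450, 345)
# An ordinary fact arrives later and refers to the new dimension row.
conn.execute("INSERT INTO Sales VALUES (2, 456, 2500)")

# The earliest fact per dimension row tells when that row became valid.
valid_from = dict(conn.execute(
    "SELECT StoreID, MIN(TimeID) FROM Sales GROUP BY StoreID"))

# SUM skips NULL, so the special fact does not distort the measure.
totals = dict(conn.execute("""
    SELECT s.Size, SUM(f.ItemsSold)
    FROM Sales f JOIN Store s ON s.StoreID = f.StoreID
    GROUP BY s.Size
"""))
print(valid_from, totals)
```

Without the special fact, row 002 would only be dated by its first ordinary fact (TimeID 456), so the change at TimeID 345 would go unrecorded. Counting queries such as COUNT(*) would need to filter the special facts out.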
Solution 2B
• Solution 2B: examples
• Product descriptions are versioned when products are changed, e.g., new package sizes
• Old versions are still in the stores; new facts can refer to both the newest and older versions of products
• Time value for a fact is not necessarily between the “From” and “To” values in the fact’s Product dimension row
• Unlike changes in Size for a store, where all facts from a certain point in time will refer to the newest Size value
• Unlike alternative categorizations that one wants to choose between

Rapidly Changing Dimensions
• Difference between “slowly” and “rapidly” is subjective
  Solution 2 is often still feasible
  The problem is the size of the dimension
• Example
  Assume an Employee dimension with 100,000 employees, each using 2K bytes and many changes every year
  Solution 2B is recommended
• Other typical examples of (large) dimensions with many changes are Product and Customer
• The more attributes in a dimension table, the more changes per row can be expected
• Example:
  A Customer dimension with 100M customers and many attributes
  Solution 2 yields a dimension that is too large
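
To see why the size of the dimension becomes the problem, here is a rough back-of-the-envelope sketch for the Employee example. It assumes that every change adds one full new row (as in versioning), that "2K bytes" means 2048 bytes, and, since the slides only say "many changes", a hypothetical rate of 10 changes per employee per year over 5 years.

```python
def dimension_size_bytes(rows, bytes_per_row, changes_per_row_per_year, years):
    """Size of a versioned dimension where every change adds a full new row."""
    versions_per_row = 1 + changes_per_row_per_year * years
    return rows * bytes_per_row * versions_per_row

# Figures from the slide: 100,000 employees at 2K bytes each.
base = dimension_size_bytes(100_000, 2 * 1024, 0, 0)

# Assumed change rate (not from the slides): 10 changes/employee/year, 5 years.
after_5_years = dimension_size_bytes(100_000, 2 * 1024, 10, 5)

print(base)           # 204800000 bytes, about 205 MB
print(after_5_years)  # 10444800000 bytes, about 10.4 GB
```

Under these assumed rates the dimension grows 51-fold, and the same arithmetic applied to 100M customers quickly reaches a size where versioning every change is no longer practical.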
• Pros
  DW size (dimension tables) is kept down
  Changes in a customer’s demographic values do not result in changes in dimensions
• Cons
  More dimensions and more keys in the star schema
  Using value groups gives less detail
  The construction of groups is irreversible and makes it hard to make other groupings
  Navigation of customer attributes is more cumbersome as these are in more than one dimension

• Why change dimensions?
  Applications change
  The modeled reality changes
• Multidimensional models realized as star schemas support change over time to a large extent
• A number of techniques for handling change over time at the instance level were described
  Solution 2 (and the derived 2A and 2B) is the most useful
  Possible to capture change precisely
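
The value-group trade-off in the pros and cons above can be sketched as follows. This is only an illustration under assumptions: the income bands, the names income_group and demographics_dim, and the fact layout are all hypothetical, since the slides name no concrete groups.

```python
import bisect

# Hypothetical income bands; each bound is the inclusive lower edge of the next group.
INCOME_BOUNDS = [20_000, 50_000, 100_000]
INCOME_GROUPS = ["<20K", "20K-50K", "50K-100K", ">=100K"]

def income_group(income):
    """Map an exact income to its value group."""
    return INCOME_GROUPS[bisect.bisect_right(INCOME_BOUNDS, income)]

# Separate demographics dimension: one fixed row per group, never updated.
demographics_dim = {group: key for key, group in enumerate(INCOME_GROUPS, start=1)}

def make_fact(customer_id, income, amount):
    """A fact carries an extra key into the demographics dimension."""
    return {
        "CustomerID": customer_id,
        "DemographicsID": demographics_dim[income_group(income)],
        "Amount": amount,
    }

# The same customer before and after a raise: only the key in the new fact
# differs, and no dimension row is inserted or changed.
fact_before = make_fact(7, 45_000, 100)
fact_after = make_fact(7, 55_000, 120)
print(fact_before["DemographicsID"], fact_after["DemographicsID"])
```

The cost is visible in the same sketch: incomes of 45,000 and 49,000 land in the same group, so the exact value is lost, and each fact needs one more key.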
DW Bus Architecture
• Data marts built independently by departments
  Good (small projects, focus, independence,…)
  Problems with “stovepipes” (reuse across marts impossible)
• Conformed dimensions
  Same structure and content across data marts
  Take data from the best source
  Dimensions are copied to data marts (not a space problem)
• Conformed fact definitions
  The same definition across data marts (price excl. sales tax)
  Facts are not copied between data marts (facts > 95% of data)
  Observe units of measurement (also currency, etc.)
  Use the same name only if it is exactly the same concept
• Allows several data marts to work together
  Combining data from several fact tables is no problem

DW Bus Architecture
• Dimension content managed by dimension owner
  The Customer dimension is made and published in one place
• Tools query each data mart separately
  Separate (SQL) queries to each data mart
  Results combined (outer join) by tool (or OLAP server)
• Hard to make conformed dimensions and facts
  Organizational and political challenge, not technical
  Get everyone together and get a top manager (CIO) to back the conformance decision
• Exception: business areas totally separated
  No common management/control
  Build several DWs
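
The "separate queries, results combined by outer join" pattern can be sketched with Python's built-in sqlite3 module. The table names SalesFact and CostFact, the helper by_brand, and the data are hypothetical, and for brevity both data marts live in one in-memory database, whereas the slides describe them as separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two data marts sharing one conformed Product dimension (made-up data).
    CREATE TABLE Product   (ProductID INTEGER PRIMARY KEY, Brand TEXT);
    CREATE TABLE SalesFact (ProductID INTEGER, Amount REAL);
    CREATE TABLE CostFact  (ProductID INTEGER, Cost REAL);
    INSERT INTO Product   VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO SalesFact VALUES (1, 100), (1, 50), (2, 70);
    INSERT INTO CostFact  VALUES (1, 80), (3, 20);
""")

def by_brand(conn, fact_table, measure):
    """One separate query per data mart, grouped by the conformed attribute.
    Table and column names are fixed strings here, not user input."""
    sql = f"""
        SELECT p.Brand, SUM(f.{measure})
        FROM {fact_table} f JOIN Product p ON p.ProductID = f.ProductID
        GROUP BY p.Brand
    """
    return dict(conn.execute(sql))

sales = by_brand(conn, "SalesFact", "Amount")
costs = by_brand(conn, "CostFact", "Cost")

# The tool combines the two result sets with a full outer join on Brand;
# a brand missing from one mart gets None for that measure.
combined = {
    brand: (sales.get(brand), costs.get(brand))
    for brand in sorted(sales.keys() | costs.keys())
}
print(combined)
```

The join on Brand is only meaningful because both marts use the same Product dimension with the same content. With non-conformed dimensions the brand values could differ between marts and the rows would not line up.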
Matrix Method
[Figure: matrix of data marts (rows: Sales, Costs, Profit) against dimensions (columns), with a + marking each dimension that a data mart uses]

Overview
Transact-SQL
• SQL Server’s SQL dialect = SQL + procedural code
• Procedures/functions, variables, IF-THEN, loops, DDL…
• Built-in functions
• Exception handling (RAISERROR)

CREATE PROCEDURE CustOrdersDetail @OrderID int
AS
SELECT ProductName,
    UnitPrice=ROUND(Od.UnitPrice, 2),
    Quantity,
    Discount=CONVERT(int, Discount * 100),
    ExtendedPrice=ROUND(CONVERT(money, Quantity*(1 - Discount)*Od.UnitPrice), 2)
FROM Products P, [Order Details] Od
WHERE Od.ProductID = P.ProductID and Od.OrderID = @OrderID
GO

Replication
• Publisher
  Server that publishes data to other servers
  Has one or more publications consisting of articles (tables, etc.)
• Distributor
  Server that manages distribution of data
  Keeps track of publications and subscriptions, history, etc.
• Subscriber
  Server that receives data
  Has subscriptions to a number of publications
• Push/pull both possible
• Types of replication
  Snapshot replication - copies data at a specific point in time
  Transactional replication - first send snapshot, then send updates
  Merge replication - distributed, disconnected replication
Mini Project
• New subtasks
  See mini project page
• After this part, you should have the following:
  Description of business processes
  Choice of data source(s) + considerations described
  A “true” multidimensional schema + description
  A relational DW schema design, e.g., star schema + description
  An implementation in SQL Server and Analysis Services
  You are welcome to discuss your design with me
• “Building Cubes” subtask can only be completed after the ETL subtask is completed
  Cube must be refreshed or rebuilt after ETL is complete
  But a “test cube” can be built based on data you type
  Create relational DW tables by ETL or directly in MS SQL Server

Summary
• Advanced multidimensional modeling
  Mainly handling changes in dimensions
• Large-scale dimensional modeling
  Coordinating cubes/data marts
• Multidimensional database implementation
  MS SQL Server
  MS Analysis Services