Advanced Data Warehousing
MicroStrategy
This Course (course and course materials) and any Software are provided "as is" and without express or limited warranty of any kind by either MicroStrategy, Inc. or anyone who has been involved in the creation, production, or distribution of the Course or Software, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the quality and performance of the Course and Software is with you. Should the Course or Software prove defective, you (and not MicroStrategy, Inc. or anyone else who has been involved with the creation, production, or distribution of the Course and Software) assume the entire cost of all necessary servicing, repair, or correction.
In no event will MicroStrategy, Inc. or anyone else who has been involved in the creation, production, or distribution of the Course or Software be liable to you on account of any claim for damage, including any lost profits, lost savings, or other special, incidental, consequential, or exemplary damages, including but not limited to any damages assessed against you, or for damage arising from the use, quality, warranty, or performance of such Course and Software, the failure of any remedy to achieve its essential purpose, or any claim based upon principles of contract, negligence, strict liability for the negligence of MicroStrategy, Inc., contribution, or indemnity, or otherwise.
Copyright Information

The information contained in this Course and the Course and Software are copyrighted and all rights are reserved by MicroStrategy, Inc. MicroStrategy, Inc. reserves the right to make periodic modifications to the Course or the Software without obligation to notify any person or entity of such revision. Copying, duplicating, selling, or otherwise distributing any part of the Course or Software without the prior written consent of an authorized representative of MicroStrategy, Inc. is prohibited.

U.S. Government Restricted Rights. It is acknowledged that the Course and Software are copyrighted, with all rights reserved with respect to unpublished portions of the Software.
Trademark Information
MicroStrategy7i
MicroStrategy, MicroStrategy6, MicroStrategy 7,MicroStrategy 7i,
Evaluation Edition, MicroStrategy 7i Olap Services,
MicroStrategy 8, MicroStrategy 9, MicroStrategy Distribution Services,
MicroStrategy MultiSource Option, MicroStrategy Command Manager,
MicroStrategy Enterprise Manager, MicroStrategy Object Manager,
MicroStrategy Reporting Suite, MicroStrategy Power User, MicroStrategy
Analyst, MicroStrategy Consumer, MicroStrategy Email Delivery, MicroStrategy
BI Author, MicroStrategy BI Modeler, MicroStrategy Evaluation Edition,
MicroStrategy
MicroStrategyAdministrator,
BI Developer Kit,
MicroStrategy
MicroStrategy
Agent,MicroStrategy
BroadcastServer, MicroStrategy
Architect,
MicroStrategy
Adapter,
Broadcaster,MicroStrategy
IntelligencePlatform,
Applications,MicroStrategy
MicroStrategyDesktop
eCRM
Executive,MicroStrategy
7,MicroStrategy
MicroStrategy
OLAP
IntelligenceServer
Provider,
MicroStrategyConsulting,
Analyst,
Education,
Narrowcast
Infocenter,
Broadcaster
Customer
MicroStrategy
MicroStrategy
UniversalEdition,
MicroStrategy
Server,
MicroStrategy
Analyzer,
Server,
MicroStrategy
SDK,MicroStrategy
Desktop
MicroStrategyBusiness
MicroStrategyDesktop,
MicroStrategyCRM
eTrainer,MicroStrategy
Intelligence
MicroStrategy
Designer,
Objects,
Server,
MicroStrategy
Support,
MDX
Centralized
MicroStrategyWebBusiness
Developmentand
MicroStrategyTelecaster,
Application
SophisticatedAnalysis,
Management,
MicroStrategy
Analyzer,Information
MicroStrategyWorld,Application
Transactor,MicroStrategy
BestInBusiness
Like Water,
Intelligence,
Intelligence
Web,
Through Every Phone, Intelligence To Every Decision Maker, Intelligent E-
Business, Personalized Intelligence Portal, Query Tone, Rapid Application
Development, MicroStrategy Intelligent Cubes, The Foundation For Intelligent
Enterprise,The
Intelligence
E-Business,The
Platform
PlatformForIntelligent
Integrated
Built ForTheInternet,
Business Intelligence
E-Business,
Office
PlatformBuilt
Intelligence,
The Scalable ForThe
MicroStrategy
Business
All other company and product names may be trademarks of the respective companies with which they are associated. Specifications subject to change without notice. MicroStrategy is not responsible for errors or omissions. MicroStrategy makes no warranties or commitments concerning the availability of future products or versions.
Patent Information
This product is patented. One or more of the following patents may apply to the
product sold herein: U.S. Patent Nos. 6,154,766, 6,173,310, 6,260,050,
6,263,051, 6,269,393, 6,279,033, 6,567,796, 6,587,547, 6,606,596, 6,658,093,
6,658,432, 6,662,195, 6,671,715, 6,691,100, 6,694,316, 6,697,808, 6,704,723,
6,741,980, 6,765,997, 6,768,788, 6,772,137, 6,788,768, 6,798,867, 6,801,910,
6,820,073, 6,829,334, 6,836,537, 6,850,603, 6,859,798, 6,873,693, 6,885,734,
6,940,953, 6,964,012, 6,977,992, 6,996,568, 6,996,569, 7,003,512, 7,010,518,
7,016,480, 7,020,251, 7,039,165, 7,082,422, 7,113,993, 7,127,403, 7,174,349,
7,181,417, 7,194,457, 7,197,461, 7,228,303, 7,260,577, 7,266,181, 7,272,212,
7,302,639, 7,324,942, 7,330,847, 7,340,040, 7,356,758, 7,356,840, 7,415,438,
7,428,302, 7,430,562, 7,440,898, 7,486,780, 7,509,671, 7,516,181, 7,559,048,
7,574,376, 7,617,201, 7,725,811, 7,801,967, 7,836,178, 7,861,161, 7,861,253, 7,881,443, 7,925,616, 7,945,584, 7,970,782, 8,005,870, 8,051,168, 8,051,369, 8,094,788, 8,130,918, 8,296,287, 8,321,411, and 8,452,755. Other patent applications are pending.
How to Contact Us
Course Description
This 2-day course covers advanced issues in data warehousing design and
explains how to work with these complexities when implementing a
MicroStrategy project. The course assumes an understanding of basic report
building and project design concepts from the 2-day MicroStrategy Desktop: Reporting Essentials course, the 2-day MicroStrategy Architect: Project Design Essentials course, and the 1-day MicroStrategy Architect: Advanced Project
Design course as well as a basic knowledge of SQL.
The course covers a variety of data warehousing topics, explaining the issues that can affect report analysis in a MicroStrategy reporting environment and then examining how to design the data warehouse model and schema to resolve them. Students will learn how to model complex hierarchies and attribute relationships, implement role attributes and slowly changing dimensions, design the schema for optimal performance, use logical views to solve data modeling and schema design issues, and optimize query performance.

After taking this course, students will understand the complex issues that often arise when designing, building, and querying a data warehouse, and they will know how best to accommodate them within the MicroStrategy reporting environment.
• Project architects
Course Prerequisites
Before starting this course, you should know all topics covered in the following
courses:
• MicroStrategy Desktop: Reporting Essentials
Follow-up Courses
This course does not have any recommended follow-up courses.
Related Certifications
To validate your proficiency in the content of this course, you might consider
taking the following certification:
in a star schema.
• Describe advanced data modeling concepts and explain how to design the data warehouse model and schema to support them in a MicroStrategy project.
• Describe attribute roles and explain the four methods for implementing them in a MicroStrategy project.
This course is organized into lessons and reference appendices. Each lesson
focuses on major concepts and skills that help you to better understand MicroStrategy products and use them to implement MicroStrategy projects. The
appendices provide you with supplemental information to enhance your
knowledge of MicroStrategy products.
Content Descriptions
Each major section of this course begins with a Description heading. The
Description introduces you to the content contained in that section.
Learning Objectives
Learning objectives enable you to focus on the key knowledge and skills you
should obtain by successfully completing this course. Objectives are provided for you at the following three levels:
Lessons
Each lesson sequentially presents concepts and guides you with step-by-step
procedures. Illustrations, screen examples, bulleted text, notes, and definition
tables help you to achieve the learning objectives.
Opportunities for Practice

This version of this course manual excludes hands-on exercises. If you are interested in taking the complete course, please contact MicroStrategy Education at education@microstrategy.com.
Typographical Standards
Following are explanations of the font style changes, icons, and different types of notes that you see in this course.
Actions
Sum(Sales)/Number of Months
Data Entry
References to literal data you must type in an exercise or procedure are in bold
Arial font style. References to data you type that could vary from user to user or
system to system are in bold italic Arial font style. The following example shows
this style:
Keyboard Keys
Press CTRL+B.
New Terms
New terms to note are in regular italic font style. These terms are defined when they are first encountered in the course. The following example shows this style:
A warning icon calls your attention to very important information that you
should read before continuing the course.
Other MicroStrategy Courses
Core Courses
• Implementing MicroStrategy: Development and Deployment
• MicroStrategy Office Essentials
Advanced Courses
• MicroStrategy Administration: Configuration and Security
• MicroStrategy Web SDK: Portal Integration
Lesson Description
This lesson describes how the design of the data warehouse affects performance
in the reporting environment and provides an overview of the concepts that are
covered in this course.
In this lesson, you will learn why data warehouse design is so crucial to
achieving efficient report performance and resolving complex report requirements. Then, you will be introduced at a high level to the topics that are
covered in this course.
Lesson Objectives
Explain how data warehouse design affects the reporting environment and
describe the topics that are covered in this course.
After completing the topics in this lesson, you will be able to:

• Explain how data warehouse design contributes to an efficient reporting environment.
• Report Objects—With MicroStrategy OLAP Services, developers can include additional information in the SQL generated for Intelligent Cube reports beyond the data initially displayed to users. Enabling developers to "plan ahead" for user activities, such as drilling, can reduce the number of times a query executes against the data warehouse.
• Report Filter Qualifications—Developers can use reports they have already created as filters as well as other application objects, like custom groups and consolidations, that enable developers to produce more complex reports.
MicroStrategy Architect also provides functionality that helps you better design
projects to achieve your reporting requirements. For example, in the
MicroStrategy Architect: Project Design Essentials course, you learned about
functionality that simplifies project creation and report development, including the following features:
• Heterogeneous Facts and Attributes—Project architects can create attributes and facts that map to multiple physical columns in the data warehouse, eliminating the need for identical column names across tables.
The ways in which you map schema objects and construct reports affect whether a business intelligence system can answer complex queries and how efficiently it can do so. However, even though you can answer many questions using MicroStrategy functionality, the design of the database itself is just as integral to successfully and efficiently querying the data warehouse. Specifically, data warehouse design contributes to an efficient reporting environment by enabling you to do the following:
• Satisfy report requirements that you either cannot achieve using only database functionality or that you can more efficiently resolve at the tool level
• Increase query performance

Satisfying Report Requirements
the relationships that exist within a single set of source data. For instance, users
may want to see a list of employees, and then, within that list, they need to know
which employees manage other employees. One way to resolve this requirement
is to modify the lookup table structures to support such a query.
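One way to sketch that lookup table structure is a self-referencing manager column, so the employee list can be joined to itself. The SQLite snippet below is illustrative only; the table and column names are assumptions, not from the course:

```python
import sqlite3

# Hypothetical lookup table: each employee row carries a self-referencing
# Manager_ID so the table can be joined to itself.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LU_EMPLOYEE (
    Emp_ID     INTEGER PRIMARY KEY,
    Emp_Name   TEXT,
    Manager_ID INTEGER   -- NULL for employees with no manager
);
INSERT INTO LU_EMPLOYEE VALUES
    (1, 'Ann',  NULL),
    (2, 'Bob',  1),
    (3, 'Cara', 1);
""")

# Self-join: list each employee beside the employee who manages them.
rows = conn.execute("""
    SELECT e.Emp_Name, m.Emp_Name AS Manager_Name
    FROM LU_EMPLOYEE e
    LEFT JOIN LU_EMPLOYEE m ON e.Manager_ID = m.Emp_ID
    ORDER BY e.Emp_ID
""").fetchall()
print(rows)  # [('Ann', None), ('Bob', 'Ann'), ('Cara', 'Ann')]
```

The LEFT JOIN keeps employees with no manager in the result, which a plain inner join would silently drop.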
schema, joins between tables can be made more or less complex. The schema structure itself can also impact the number of joins required for a given query as well as the size of the tables being joined, both factors that seriously affect performance. Inevitably, some types of data always exist in high volume, like customer information. However, you can make the task of retrieving records from large tables as efficient as possible by paying attention to schema design and ensuring that the table structure itself does not make retrieving records more difficult or time consuming.
Overview of Advanced Data
Warehousing
of MicroStrategy functionality,
or some combination thereof
• Selecting the optimal schema design for lookup tables and fact tables to achieve the best possible report performance
• Creating logical views in MicroStrategy Architect to resolve a variety of data modeling and schema design issues
This course is divided into the following seven sections, which include an appendix:
• Advanced Schema Design—Describes the various types of schemas,
analyzes their differences, reviews fact table key structure, and provides
optimal schema recommendations for MicroStrategy project performance in the MicroStrategy environment
• Logical Views—Describes how to create logical views in MicroStrategy Architect and provides examples of using them to build complex attribute and fact expressions
• Many-to-Many Relationships—Describes the challenges caused by many-to-many relationships and methods for resolving them in a MicroStrategy project
• Slowly Changing Dimensions—Describes the challenges caused any time you have attributes that vary over time and methods for resolving them in a MicroStrategy project
modifying the logical data model or schema of your data warehouse. You can also resolve many of these issues at the application level using logical views, a feature of MicroStrategy Architect that is covered in the Logical Views lesson.
In this lesson, you learned:
• While MicroStrategy provides functionality that can help you answer many
complex questions, the design of the database itself is also integral to
successfully and efficiently querying the data warehouse.
Lesson Description
In this lesson, you will learn about each of these concepts and their impact on report analysis. You will learn how to design the data warehouse schema for optimal use in the MicroStrategy reporting environment.
Lesson Objectives
After completing the topics in this lesson, you will be able to:

• Describe the characteristics of both snowflake and star schemas and the differences between the two types of schemas.
Describe the characteristics ofboth snowflake and star schemas and the
differences between the two types of schemas. Explain the impact of each
schema type on query performance.
Two primary types of schemas exist, both of which have different effects on the
SQL generated for queries. The two schema types are the following:
• Snowflake
• Star
In discussing these two schema types, you will look at how the structure of the
schema differs for the following data model:
• Completely normalized
• Moderately denormalized
• Completely denormalized
Completely Normalized Snowflake
This schema contains a separate lookup table for each attribute in each of the two hierarchies, a characteristic that is common to all forms of snowflake schemas. As a result, snowflake schemas contain multiple lookup tables for each hierarchy present in a data model. However, in this normalized form of the snowflake schema, each lookup table contains only the following information:
• Attribute ID
• ID of the immediate parent attribute

In addition to the attribute's ID and description columns, each table contains the ID of the immediate parent attribute, which is necessary to map the relationship between the parent and child attribute. As such, this structure stores only the bare minimum of information necessary to relate the data. No redundant data is
stored, so the schema is normalized. Also, because the tables are storing the
minimum amount of data, they are as small as they can possibly be.
When you need to query information from the fact table and join it to
information in the higher-level lookup tables, more joins are necessary in the
SQL to achieve the desired result.
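A toy version of the join chain makes this concrete. In the SQLite sketch below (hypothetical table names patterned on the course examples), rolling sales up to customer state from a fully normalized snowflake takes three joins, one per level:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized snowflake: each lookup stores only its own ID/description
-- and the ID of its immediate parent.
CREATE TABLE LU_CUST_STATE (State_ID INTEGER PRIMARY KEY, State_Desc TEXT);
CREATE TABLE LU_CUST_CITY  (City_ID INTEGER PRIMARY KEY, City_Desc TEXT,
                            State_ID INTEGER);
CREATE TABLE LU_CUSTOMER   (Cust_ID INTEGER PRIMARY KEY, Cust_Name TEXT,
                            City_ID INTEGER);
CREATE TABLE FACT_SALES    (Cust_ID INTEGER, Sale_Amt REAL);

INSERT INTO LU_CUST_STATE VALUES (1, 'Colorado');
INSERT INTO LU_CUST_CITY  VALUES (10, 'Colorado Springs', 1);
INSERT INTO LU_CUSTOMER   VALUES (100, 'Cassie', 10), (101, 'Tim', 10);
INSERT INTO FACT_SALES    VALUES (100, 50.0), (101, 25.0);
""")

# Three joins: fact -> customer -> city -> state.
rows = conn.execute("""
    SELECT st.State_Desc, SUM(f.Sale_Amt)
    FROM FACT_SALES f
    JOIN LU_CUSTOMER   c  ON f.Cust_ID  = c.Cust_ID
    JOIN LU_CUST_CITY  ci ON c.City_ID  = ci.City_ID
    JOIN LU_CUST_STATE st ON ci.State_ID = st.State_ID
    GROUP BY st.State_Desc
""").fetchall()
print(rows)  # [('Colorado', 75.0)]
```

In a denormalized lookup, the same query would join the fact table directly to a table carrying State_ID, dropping two of the three joins.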
For example, you could run the following report:

Joins in a Completely Normalized Snowflake Schema Design
• Contains relatively smaller tables (does not store more data than is absolutely necessary to map relationships)
• Attribute ID
• ID of immediate parent attribute
• IDs of all other higher-level attributes
Even though this structure stores more data, depending on the volume of data, it can have a performance advantage. When you need to query information from the fact table and join it to information in the higher-level lookup tables, fewer joins are necessary in the SQL to achieve the desired result.
Now, if you run the same report to display customer state sales, fewer joins are required to obtain the result set:

Joins in a Moderately Denormalized Snowflake Schema Design
• Requires fewer joins when querying fact data in conjunction with higher-
level lookup tables
• Attribute ID
• ID of immediate parent attribute
• IDs of all other higher-level attributes
higher-level lookup tables, fewer joins are necessary in the SQL to achieve the
desired result.
Now, if you run the same report to display customer state sales, fewer joins are required to obtain the result set:

Joins in Completely Denormalized Snowflake Schema Design
In summary, a completely denormalized snowflake schema has the following characteristics:
Characteristics of Completely Denormalized Snowflake Schema Design
Star Schemas
A star schema is a design that contains only one lookup table for each hierarchy
in the data model instead of having separate lookup tables for each attribute.
With only a single lookup table for each hierarchy, the IDs and descriptions of
all of the attributes in the hierarchy are stored in the same table. This type of structure involves a great degree of redundancy. As such, star schemas are always completely denormalized. If you apply a star schema to the sample data model, the schema looks like the following:

Star Schema Design
This schema contains only two lookup tables, one for each hierarchy. LU_LOCATION stores the data for all of the attributes in the Location hierarchy, while LU_CUSTOMER stores the data for all of the attributes in the Customer hierarchy. As a result, star schemas contain very few lookup
tables—one for each hierarchy present in the data model. Each lookup table
contains the IDs and descriptions (if they exist) for all of the attribute levels in
the hierarchy.
Even though you have fewer tables in a star schema than a snowflake, the tables can be much larger because each one stores all of the information for an entire hierarchy. When you need to query information from the fact table and join it to information in the lookup tables, only a single join is necessary in the SQL to achieve the desired result.

For example, if you run the same report to display customer state sales, only one join is required to obtain the result set:
Even though achieving this result set requires only a single join, star schemas do
not necessarily equate to better performance. Depending on the volume of data in any one hierarchy, you may be joining a very large lookup table to a very large fact table. In such cases, more joins between smaller tables can yield better performance.
• schemas
Contains due
verylarge
to storingall
tables(much
attribute
largerthan
ID and description
someformsof
columns)
snowflake
• Stores
table theIDs and descriptions of allthe attributes ina hierarchyin a single
• Requires only a single join when querying fact data regardless of the
attributelevelat which you arequerying data
In this first example, the LU_CUSTOMER table joins to the FACT_SALES table,
a base fact table in which the data is stored at the lowest level of each of the
hierarchies. As a result, fact records join to the LU_CUSTOMER table through
the Cust_ID column. Since the report template requires the sales data to be
displayed at the level of Customer City, the total for each customer is aggregated and grouped by customer city for the result set. Because the join to the fact table occurs at the lowest level of the hierarchy, the report returns a correct result set for each of the three cities, with the sales for Cassie and Tim combined to comprise the sales for Colorado Springs.
In the illustration above, only the columns that are used in the report are
included in the sample data for the LU_CUSTOMER and
FACT_SALES_AGG tables. However, the actual tables would contain all of
the columns referenced in the schema.
Now that the sales data has been aggregated to a higher level, the records in the
FACT_SALES_AGG table need to be able to join to a distinct list of customer
cities to prevent multiple counting from occurring. However, in a star schema, a
distinct list of any higher-level attribute does not exist because there is only a
single lookup table for the entire hierarchy. Thus, IDs and descriptions for all
but the lowest-level attribute are repeated many times within a lookup table.
Since records for Colorado Springs occur twice in the LU_CUSTOMER table, one for each customer who resides there, the sales for Colorado Springs in the
FACT_SALES_AGG table can join to two different records. As a result, the sales
for Colorado Springs are counted twice in the report and are therefore inflated.
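This double counting is easy to reproduce. In the SQLite sketch below (illustrative data modeled on the Colorado Springs example), the aggregate fact row matches both customer rows in the star lookup, so the city total doubles; a distinct city list, which is what a separate higher-level lookup table provides, restores the correct total:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star lookup: one row per customer, city repeated on each row.
CREATE TABLE LU_CUSTOMER (Cust_ID INTEGER PRIMARY KEY,
                          City_ID INTEGER, City_Desc TEXT);
-- Aggregate fact table already rolled up to the city level.
CREATE TABLE FACT_SALES_AGG (City_ID INTEGER, Sale_Amt REAL);

INSERT INTO LU_CUSTOMER VALUES
    (100, 10, 'Colorado Springs'),
    (101, 10, 'Colorado Springs');   -- city repeats for each resident
INSERT INTO FACT_SALES_AGG VALUES (10, 75.0);
""")

# The aggregate fact row joins to BOTH customer rows for the city,
# so the 75.0 total is counted twice.
inflated = conn.execute("""
    SELECT l.City_Desc, SUM(f.Sale_Amt)
    FROM FACT_SALES_AGG f
    JOIN LU_CUSTOMER l ON f.City_ID = l.City_ID
    GROUP BY l.City_Desc
""").fetchall()
print(inflated)  # [('Colorado Springs', 150.0)]

# A distinct city list (what a higher-level lookup table provides)
# restores the correct total.
correct = conn.execute("""
    SELECT l.City_Desc, SUM(f.Sale_Amt)
    FROM FACT_SALES_AGG f
    JOIN (SELECT DISTINCT City_ID, City_Desc FROM LU_CUSTOMER) l
         ON f.City_ID = l.City_ID
    GROUP BY l.City_Desc
""").fetchall()
print(correct)  # [('Colorado Springs', 75.0)]
```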
Because aggregate fact tables must join to a distinct list of attributes, you need individual lookup tables for the higher-level attributes if you plan to use them.
You should use star schemas only if you do not intend to create aggregate tables.
If the query profile in your environment requires the use of aggregate fact tables
to achieve desired performance, you need to design a completely denormalized snowflake schema. For example, in this scenario, you could use a completely denormalized snowflake schema. You still retain the benefits of having fewer joins when accessing base fact tables, but the higher-level lookup tables enable you to use aggregate fact
tables where appropriate.
Recommended Schema Design
• Lookup table volume
• Fact table volume
Describe the optimal schema design to use with MicroStrategy, identify schema
factors that affect query performance, and define the logical key for a fact table
in MicroStrategy Architect.
• Data type of ID columns
• Fact table keys
Generally, the recommended optimal schema design for use with MicroStrategy
is to completely denormalize the lookup tables. As you already saw earlier, when
you completely denormalize a table, the ID and description columns for all of
the attributes in a hierarchy are available in the lowest-level lookup table. As a result, regardless of the level to which you aggregate data, queries require only one join between the base fact table and the lowest-level lookup table. Avoiding the additional joins that arise from a more normalized schema increases query performance.

Depending on table volume, complete denormalization is not always required to achieve good performance. For example, the following illustration shows the number of joins required to retrieve customer data for a report:

High Lookup Table Volume
In this example, the report queries the Customer hierarchy, and the result set is aggregated to the level of Customer State. The lookup tables use a completely denormalized schema. In this scenario, the table volume for customer
information in any data warehouse is likely to be large since the customer base
often comprises the largest volume of data in a warehouse. Because the table
volume is high, avoiding the extra joins that a more normalized schema would
entail is the optimal route for increasing performance.
If table volume is low, the extra joins are not so problematic. For example, the following illustration shows the number of joins required to retrieve location data for a report:
In this example, the report queries the Location hierarchy, and the result set is aggregated to the level of State. The lookup tables are shown using both a completely denormalized schema and a completely normalized schema. The denormalized schema requires only a single join, whereas the normalized schema requires three joins to produce the same result. In this scenario, the table volume for store information in any data warehouse is likely to be small.
At most, a company might have a couple thousand stores and maybe only a
couple hundred. The normalized schema increases the number of joins, but the
denormalized schema unnecessarily increases the size of what is originally a
very small lookup table. The performance difference between using the extra
joins or increasing the size of the tables is not significant. Because the volume of
the lookup tables is so low, you could retain a normalized schema for the lookup tables in this hierarchy without negatively affecting performance.
Fact Table Volume
For fact tables, the volume is determined by the number of records stored and the amount of information (attribute IDs) you store for each record.
You could modify the fact table to add the characteristic attributes as follows:
Adding these attributes denormalizes the fact table, which increases the volume
of the fact table. With a denormalized structure, you can retrieve the
information for Gender, Income and Occupation from the FACT_SALES table
itself without needing to join to the LU_CUSTOMER table (unless you have
descriptions for characteristic attributes stored in the lookup table that you want to display). You still have to join from the FACT_SALES table to the LU_CUSTOMER table if you want to display the customer names on the report.
The LU_CUSTOMER table is most likely very large, and the FACT_SALES table
may also be very large. In such cases, it may increase performance to denormalize the fact table (even though it increases the size of the table) rather than joining two very large tables.

When you denormalize fact tables, you add columns to the table, which could affect your indexing strategy. To ensure efficient use of indexes, you should index only the table columns on which users typically filter when running reports.
Data Type of ID Columns

Because you use attribute ID columns for qualification, table joins, and indexes, it is important to ensure that they use an efficient data type.
Whether they are present in lookup, relationship, or fact tables, you should
always create ID columns as either an integer, number, or date data type. You
should avoid using text data types, including varchar, to define ID columns as it
takes longer to qualify or join on text fields, and you cannot take advantage of
indexes as efficiently.
The first issue of concern with regard to fact tables has to do with defining the
physical primary key of the table. For example, you have the following fact table:
SalesFact Table
In this report, you filter on Store, Customer, and Date, which represent the first
three columns in the index. When you run this report, it uses the index because
the report contains filter elements for the first three columns defined in the
index.
You may choose to run a slightly different report in which you view the data for
all stores not just Colorado Springs, as shown below:
In this example, the Store filter is no longer part of the report filter. However, because the Store_ID column is first in the index sequence, its absence from the filter means that the Customer and Date indexes are not invoked. The query runs as if there were no index on the table at all. Depending on how you filter reports, queries may not take advantage of the index you defined on this fact
table when setting up the primary key.
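One commonly recommended alternative for fact tables, sketched below in SQLite with illustrative table and index names, is a separate single-column index per filtered ID column, so a query that omits Store_ID can still seek on Cust_ID:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FACT_SALES (Store_ID INTEGER, Cust_ID INTEGER,
                         Date_ID INTEGER, Sale_Amt REAL);

-- One single-column index per attribute users typically filter on,
-- instead of a composite primary key on (Store_ID, Cust_ID, Date_ID).
CREATE INDEX idx_fact_store ON FACT_SALES (Store_ID);
CREATE INDEX idx_fact_cust  ON FACT_SALES (Cust_ID);
CREATE INDEX idx_fact_date  ON FACT_SALES (Date_ID);

INSERT INTO FACT_SALES VALUES (1, 100, 20240101, 50.0);
""")

# A filter that omits Store_ID can still use the index on Cust_ID;
# the query plan names the index it searches with.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM FACT_SALES WHERE Cust_ID = 100"
).fetchall()
print(plan)
```

With a composite primary key, the same filter would have fallen back to a full table scan because the leading Store_ID column is absent.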
To make fact table indexes more effective, MicroStrategy recommends that you do not define primary keys on the fact table. Instead, you can build individual indexes on each attribute ID column in the fact table on which users typically filter when they run reports. In this manner, you can run a report that filters on any single attribute or combination of attributes contained in the fact table, and the query uses the individual indexes you created for each of those attributes.

Fact Table Logical Keys
The primary keys defined for a fact table in the data warehouse are not the only
keys that affect queries. When you add a table to the project, MicroStrategy
Architect automatically defines a logical key for that table. The logical key
consists of any columns in a table for which you have defined an attribute in the
project. For example, you have the following fact table:
The logical key enables the MicroStrategy SQL Engine to optimize the SQL it
generates whenever you run a report in which the template contains all of the
logical keys for a given fact table. For example, you could run the following
report against the FACT_SALES table:
Report with All Fact Table Logical Keys on the Template
In the illustration above, only the columns that are used in the report are included in the sample data for the FACT_SALES table. However, the actual table would contain all of the columns referenced in the schema. For all of the logical key illustrations in this lesson, the same subset of columns is shown in the sample data.
a12.[Customer_Name] AS Customer_Name,
a11.[Date_ID] AS Date_ID,
a11.[Item_ID] AS Item_ID,
a13.[Item_Desc] AS Item_Desc,
a11.[Sale_Amt] AS WJXBFS1
from [FACT_SALES] a11,
[LU_CUSTOMER] a12,
[LU_ITEM] a13,
[LU_STORE] a14
where a11.[Customer_ID] = a12.[Customer_ID]
and a11.[Item_ID] = a13.[Item_ID]
and a11.[Store_ID] = a14.[Store_ID]
The Source_ID column identifies the origin of thedatafor each record in the
table. While this information may be useful for tracking or troubleshooting
purposes, it may not be information that users want to analyze on reports. As such, if it is not necessary to report on the column, you do not define an attribute for it. If the column does not map to an attribute, it is not part of the logical key. Essentially, the SQL Engine ignores the presence of this column in the table.
A similar situation could also arise any time you have a fact table that contains
attribute columns you have not defined in your project. For example, the
FACT_SALES table contains several attribute ID columns, including an
Item_ID column:

Sales Fact Table with an Undefined Attribute
Even though Item_ID exists as a column in the table, users may not want to analyze item information on reports. If no reporting requirements exist for item
information, you may choose not to define an Item attribute. If no
corresponding attribute exists, Item is not part of the logical key, and the SQL
Engine ignores the presence of this column in the table.
In cases like these, where there are columns in a fact table that are not defined in the project, the logical key for a table is different from the actual key in the data warehouse. When the logical key referenced in MicroStrategy does not match the true key, the SQL Engine may not aggregate rows of data correctly when running queries against the table.

For example, you could run the following report against the FACT_SALES table:
In this example, Store, Customer, and Date are all present on the report template. Since these three attributes comprise the logical key, the SQL Engine optimizes the SQL and does not aggregate the Sale_Amt fact or group by any of the attributes on the report. The SQL for this report looks like the following:
select a11.[Store_ID] AS Store_ID,
a13.[Store_Desc] AS Store_Desc,
a11.[Customer_ID] AS Customer_ID,
a12.[Customer_Name] AS Customer_Name,
a11.[Date_ID] AS Date_ID,
a11.[Sale_Amt] AS WJXBFS1
from [FACT_SALES] a11,
[LU_CUSTOMER] a12,
[LU_STORE] a13
where a11.[Customer_ID] = a12.[Customer_ID]
and a11.[Store_ID] = a13.[Store_ID]
Regardless of the reason, any time you have additional attribute columns in a fact table that you do not define in your project, you need to ensure that the SQL Engine is aware that the logical key does not represent the true key of the data
warehouse table. Otherwise, optimizations in place for logical keys could result
in SQL that does not produce an accurately aggregated result set.
The setting that controls how SQL is generated with regard to logical keys is found in the Logical Table Editor.

To disable the logical key setting:
4 In the Logical Table Editor, on the Logical View tab, clear The key
specified is the true key for the warehouse table check box.
The following image shows the Logical Table Editor with the option to disable the logical key setting.
Report with All Fact Table Logical Keys on the Template (Logical Key
Not Set as True Data Warehouse Key)
Now, the SQL Engine knows that the logical key, which consists of Store, Customer, and Date, does not represent all of the attribute columns in the actual fact table. Therefore, even though all of the members of the logical key are
present on the report template, the SQL Engine still aggregates the Sale_Amt
fact and groups by the attributes on the report. The SQL for this report looks
like the following:
select a11.[Store_ID] AS Store_ID,
max(a13.[Store_Desc]) AS Store_Desc,
a11.[Customer_ID] AS Customer_ID,
max(a12.[Customer_Name]) AS Customer_Name,
a11.[Date_ID] AS Date_ID,
sum(a11.[Sale_Amt]) AS WJXBFS1
from [FACT_SALES] a11,
[LU_CUSTOMER] a12,
[LU_STORE] a13
where a11.[Customer_ID] = a12.[Customer_ID]
and a11.[Store_ID] = a13.[Store_ID]
group by a11.[Store_ID],
a11.[Customer_ID],
a11.[Date_ID]
Because the SQL statement aggregates the Sale_Amt fact and groups by the
attributes on the report, the two records for the items that Ian Rey purchased
are aggregated and grouped together in the result set. Therefore, the report
displays a single, aggregated row for Ian Rey.
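The effect is easy to reproduce outside MicroStrategy. The following sketch (Python with an in-memory SQLite database; the table layout and values are hypothetical stand-ins for the FACT_SALES example) contrasts the unaggregated SQL the Engine emits when it trusts the logical key with the aggregated form it emits when the setting is disabled:

```python
import sqlite3

# Hypothetical miniature FACT_SALES: the true warehouse key includes Item_ID,
# but suppose only Store, Customer, and Date are defined in the project.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE FACT_SALES (Store_ID, Customer_ID, Date_ID, Item_ID, Sale_Amt)")
con.executemany("INSERT INTO FACT_SALES VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 20240101, 500, 25.0),  # Ian Rey buys item 500
    (1, 10, 20240101, 501, 15.0),  # Ian Rey buys item 501 -- same logical key
])

# If the SQL Engine trusts the logical key, it selects the fact column raw:
raw = con.execute(
    "SELECT Store_ID, Customer_ID, Date_ID, Sale_Amt FROM FACT_SALES").fetchall()
print(len(raw))  # 2 -- the report shows two rows for one customer/store/day

# With the logical key setting disabled, it aggregates and groups instead:
agg = con.execute(
    "SELECT Store_ID, Customer_ID, Date_ID, SUM(Sale_Amt) FROM FACT_SALES "
    "GROUP BY Store_ID, Customer_ID, Date_ID").fetchall()
print(agg)  # [(1, 10, 20240101, 40.0)] -- one correctly aggregated row
```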
Lesson Summary
In this lesson, you learned:
• The two primary types of schemas are snowflake and star schemas.
• A moderately denormalized snowflake schema stores some data redundantly. It contains many relatively small tables. The child tables store the ID of the immediate parent and all higher-level attributes. However, a moderately denormalized snowflake schema requires fewer joins when querying fact data in conjunction with higher-level lookup tables.
• A completely denormalized snowflake schema stores data redundantly to reduce the number of joins that queries require.
• Generally, the recommended optimal schema design for use with MicroStrategy is to completely denormalize the lookup tables to reduce the number of joins that queries require and increase performance.
• You can often normalize low-volume lookup tables because the performance difference between using extra joins or increasing the size of the tables is not significant.
• Fact table volume is determined by the number of records stored and the amount of information (attribute IDs) you store for each record.
• You should always create ID columns using integer, number, or date data types. You should not use text data types as it takes longer to qualify or join on text fields, and you cannot take advantage of indexes as efficiently.
• The logical key consists of any columns in a table for which you have defined attributes in a project. The logical key enables the MicroStrategy SQL Engine to optimize the SQL it generates whenever you run a report in which the template contains all of the logical keys for a given fact table.
• If you have undefined columns in a fact table, you may need to disable the
logical key setting for that table to ensure that the MicroStrategy SQL Engine correctly aggregates and groups its fact data in reports.
3 LOGICAL VIEWS
Lesson Description
After completing the topics in this lesson, you will be able to:
• Describe the purpose of logical views in MicroStrategy Architect.
Logical Tables
The physical table contains columns thatstore the actual data, and it resides
only in the data warehouse. The logical table resides in the metadata, and it
maps directly to the physical table. Rather than storing actual data, it shows
which attributes and facts map to columns in the physical table. It also stores
information about the structure of the physical table. The Logical Table Editor
in MicroStrategy Architect enables you to view information about a logical table:
Table Aliases
You can also create a logical table in MicroStrategy Architect by creating an
explicit table alias. A table alias enables you to create multiple logical tables that all point to the same physical table in the data warehouse. For example, you have a LU_DATE table in your data warehouse. When you add this table to a project, you automatically create a LU_DATE logical table. However, you have
multiple attributes such as Date, Ship Date, and Order Date that all use this
same lookup table. You can map the Date attribute to the original logical table.
Then, you can create table aliases that point to the LU_DATE table to map the
Ship Date and Order Date attributes:
Table Aliases
In this example, the LU_DATE logical table maps to the physical LU_DATE
table in the data warehouse. The two table aliases, ALIAS_O_DATE and
ALIAS_S_DATE, also point to the LU_DATE table. All three of these logical
tables use the same physical table to support different attributes in the project.
You will learn how to use table aliases later in the course.
As with any other logical table, you can use the Logical Table Editor to view the
attributes or facts that are associated with a table alias, as well as the structure of the physical table to which it is mapped.
A logical view is a SQL query that you create and then execute against tables in the data warehouse. For example, you store order and shipping information in different tables in the data warehouse, LU_ORDER and
LU_SHIPMENT. You want to calculate the processing time between the order
dates in the LU_ORDER table and the ship dates in the LU_SHIPMENT table
to analyze the processing time for each order. You can create a logical view that
performs these functions:
Logical Views
When you define the logical view, you write a SQL statement that selects the
pertinent information from the LU_ORDER and LU_SHIPMENT tables and
then calculates the difference between the order and ship dates to determine the
processing time. The following image shows an example of what this logical
view might look like in the Logical Table Editor:
Logical Table Editor—Logical View
Notice that the logical view contains a SQL statement and definitions for each
column that is created as part of the logical view. You can then map attributes
and facts to these columns. You will learn more about creating logical views
later in this lesson.
When you execute a report that contains objects that map to a logical view, the
SQL for the logical view is inserted into the report SQL where the table name
would normally occur. The SQL is inserted as either a derived table expression
or a common table expression, depending on your data warehouse database
platform.
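As a sketch of that substitution (Python with SQLite standing in for the warehouse; the view SQL and table names are illustrative, not actual MicroStrategy output), the logical view's SELECT statement simply replaces the table name in the report's FROM clause, either inline as a derived table or up front in a WITH clause:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE FACT_SALES (Region_ID, Sales)")
con.executemany("INSERT INTO FACT_SALES VALUES (?, ?)",
                [(1, 10.0), (1, 5.0), (2, 2.0)])

# SQL that a (hypothetical) logical view definition might store:
view_sql = ("SELECT Region_ID, SUM(Sales) AS Region_Sales "
            "FROM FACT_SALES GROUP BY Region_ID")

# Derived table expression: the view SQL replaces the table name in FROM.
derived = f"SELECT Region_ID, Region_Sales FROM ({view_sql}) a11 WHERE Region_ID = 1"
print(con.execute(derived).fetchall())  # [(1, 15.0)]

# Common table expression: the same view SQL is placed in a WITH clause.
cte = f"WITH LVW AS ({view_sql}) SELECT Region_ID, Region_Sales FROM LVW WHERE Region_ID = 1"
print(con.execute(cte).fetchall())  # [(1, 15.0)]
```

Either form produces the same result set; which one the SQL Engine emits depends on the database platform, as described below.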
The SQL Engine generates logical views as either derived or common table expressions based on the default value for the Intermediate Table Type VLDB property. In general, the SQL Engine inserts the logical view SQL as a common table expression for some versions of IBM DB2®. For other databases, it inserts it as a derived table expression.
Derived Table Expressions
In this example, notice that the WITH clause contains a SELECT statement that
extracts information from a table. This result is then used in the outer query.
Besides the obvious syntax differences, derived and common table expressions
support different features.
Common table expressions enable you to reference the same expression multiple times within a query. For example, notice that the following query references the same common table expression two different times:
with CTE1 as (select ColA, ColB, SUM(ColC) as ColD
from TableA
group by ColA, ColB)
select ColA, ColB, ColD
from CTE1
where ColD > 1000
union
select ColA, ColB, ColD
from CTE1
where ColA = 32
In this example, the FROM clauses of both the SELECT statements that follow
the WITH clause (indicated in bold) reference the common table expression
contained in the WITH clause.
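The same two-reference pattern can be verified against any database that supports WITH. Here is a minimal sketch in Python with SQLite (the sample rows are invented; UNION combines the two branches):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TableA (ColA, ColB, ColC)")
con.executemany("INSERT INTO TableA VALUES (?, ?, ?)",
                [(32, 1, 600), (32, 1, 500), (7, 2, 40)])

# One WITH clause; two SELECTs (combined with UNION here) reuse the expression:
rows = con.execute("""
    WITH CTE1 AS (SELECT ColA, ColB, SUM(ColC) AS ColD
                  FROM TableA GROUP BY ColA, ColB)
    SELECT ColA, ColB, ColD FROM CTE1 WHERE ColD > 1000
    UNION
    SELECT ColA, ColB, ColD FROM CTE1 WHERE ColA = 32
""").fetchall()
print(rows)  # [(32, 1, 1100)] -- both branches referenced the same CTE
```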
If you construct this same query using a derived table expression, you have to insert the expression each time you want to use it:
select ColA, ColB, ColD
from (select ColA, ColB, SUM(ColC) as ColD
from TableA
group by ColA, ColB)
where ColD > 1000
select ColA, ColB, ColD
from (select ColA, ColB, SUM(ColC) as ColD
from TableA
group by ColA, ColB)
where ColA = 32
As you can see, an advantage of common table expressions is that they provide
the flexibility to reference the same expression multiple times in a SQL query.
If your project runs against a data warehouse platform for which the SQL Engine generates logical views as common table expressions in the report SQL,
the query results in invalid SQLsyntax because you cannot nest one common
table expression (the one that defines the logical view) inside another common
table expression (the one generated to use the logical view in the report SQL).
Even if your project runs against a data warehouse platform for which the SQL Engine generates logical views as derived table expressions in the report SQL, you may still encounter problems. Some databases also do not allow common table expressions to be nested inside of derived table expressions. Again, in this case, the query results in invalid SQL syntax because you cannot nest the common table expression (the one that defines the logical view) inside a derived table expression (the one generated to use the logical view in the report SQL).
Logical Views vs. Database Views
Logical views are stored as logical tables in the MicroStrategy metadata. They
do not store results of reports, only the SQL statement and column definitions used to retrieve the data for the logical view and display it in reports.
For information on the individual components of logical views, see "Creating Logical Views".
Given the similarity in characteristics between logical and database views, logical views carry the same advantages and limitations as a database view. Generally, there is little or no difference between them in terms of performance. One primary difference between logical and database views is that you cannot build indexes on logical view columns. Of course, logical views do take advantage of indexes on the underlying data warehouse tables that they access.
When you are considering how to resolve a modeling or schema issue, logical views offer yet another possible solution that you can explore. Depending on the nature of your data and the characteristics of your reporting environment, you have to weigh whether an application-level solution (such as logical views) or a database-level solution (such as changing the structure of physical tables) makes the most sense and delivers the best performance.
Examples of Logical Views Usage
• Slowly changing dimensions (SCDs)—track and analyze attribute relationships that change over time. For example, in a sales organization, sales representatives switch districts over time.
• Time trend analysis—if you need to perform month-to-date and year-to-date calculations, you can define transformations using tables in the database that map each date to all the previous dates that make up "month-to-date" or "year-to-date".
• Joins between attribute lookup tables—for a report that contains only attributes (that is, no metrics).
• Recursive hierarchies—split the recursive hierarchy table into several logical views, one for each level in the hierarchy.
You will learn more about star schemas, SCDs, and recursive hierarchies later in the course.
Creating Logical Views
Using Logical Views to Create Complex Attribute and Fact Expressions
Creating a New Logical Table
Defining the SQL Statement
Defining the Columns
Mapping Logical View Columns to Attributes and Facts
After completing this topic, you will be able to:
Now that you understand the purpose of logical views and how they work, you
are ready to learn how to create a logical view in MicroStrategy Architect.
The desired report display is what causes the complexity. This report contains
not only the attributes from the data model, but it also contains a Processing
Time metric. Processing time is the number of days in between the order date and the ship date for an order. Therefore, the Processing Time metric requires a fact expression that calculates the difference between these two dates. However, these dates are in two different tables. The Order_Date column is in the LU_ORDER table, and the Ship_Date column is in the LU_SHIPMENT table. Creating this fact requires that you combine two tables.
You cannot create this fact through the Fact Editor, but you can calculate the processing time and create a Processing_Time column as part of a logical view. Then you can map the fact to this column. You will learn how to create a logical view using this business scenario.
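Before building the view in Architect, you can prototype its SELECT against sample data. The sketch below (Python with SQLite; `julianday` is a SQLite-specific date function used as a stand-in for your database's date arithmetic, and all rows are invented) derives the Processing_Time column from the two tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE LU_ORDER (Order_ID, Order_Date, Customer_ID)")
con.execute("CREATE TABLE LU_SHIPMENT (Order_ID, Ship_Date, Shipper_ID)")
con.executemany("INSERT INTO LU_ORDER VALUES (?, ?, ?)",
                [(1, '2024-01-01', 10), (2, '2024-01-02', 11)])
con.executemany("INSERT INTO LU_SHIPMENT VALUES (?, ?, ?)",
                [(1, '2024-01-04', 7), (2, '2024-01-07', 7)])

# The view's SELECT combines both tables and derives the new column:
rows = con.execute("""
    SELECT o.Order_ID, o.Order_Date, o.Customer_ID,
           s.Ship_Date, s.Shipper_ID,
           CAST(julianday(s.Ship_Date) - julianday(o.Order_Date) AS INTEGER)
               AS Processing_Time
    FROM LU_ORDER o JOIN LU_SHIPMENT s ON o.Order_ID = s.Order_ID
    ORDER BY o.Order_ID
""").fetchall()
print([(r[0], r[5]) for r in rows])  # [(1, 3), (2, 5)]
```

Once the SELECT returns the expected values, the same statement becomes the body of the logical view and Processing_Time becomes the column you map the fact to.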
4 Map the columns in the logical view to the appropriate attributes and facts.
3 In the Tables folder, right-click a blank area of the Object Viewer, point to
New, and select Logical Table.
OR
• Mapping Pane—You use this pane to define each of the columns contained
in the logical view.
• Object Browser—You use this browser to find table and columns names
that you want to include in the logical view SQL or that you want to define
as column objects in the logical view.
In addition, the Physical View tab enables you to choose whether the logical
view’s SQL expression should be enclosed in parentheses in the report’s FROM
clause by selecting the Enclose SQL expression in parentheses check box.
This option is enabled by default.
You will learn how to use each of these components as you learn how to define a logical view.
1 In the Logical Table Editor, on the Physical View tab, in the SQL Statement pane, type the SQL statement that you want to use to define the logical view.
When you type the SQL statement, you can press CTRL + TAB to indent lines of SQL as needed. This format makes the finished SQL statement easier to read.
You can use the Object Browser to the left of the SQL Statement pane to add table or column names to your SQL statement. By default, the Object Browser displays all of the tables that are in the project warehouse catalog. You can then expand individual tables to view the columns in each table.
To add tables or column names to the SQL statement using the Object
Browser:
1 In the Logical Table Editor, on the Physical View tab, in the SQL Statement
pane, position the cursor at the point where you want to insert a table or
column name.
2 If you want to insert a table name, in the Object Browser, select the table
and drag it to the SQL Statement pane.
OR
Select the column name and drag it to the SQL Statement pane.
You still need to type necessary SQL commands, table aliases, and operators into your statement. Using the Object Browser keeps you from having to type the column and table names, which comprise the bulk of the statement.
The following image shows the SQL statement for the logical view:
• Selects the order, order date, and customer information from the
LU_ORDER table
• Selects the ship date and shipper information from the LU_SHIPMENT
table
This SQL statement creates a logical view that has all of the necessary order and shipping information as well as the processing time.
Defining the Columns
The next step in creating the logical view is to define each of the columns that are part of the logical view. You must define a column object for every column
referenced in the SELECT clause of the SQL statement. The column object
names must exactly match the column names used in the SQL, but you can
define the columns in any order. You do not need to follow the order in which you reference them in the SQL statement.
You can define column objects in one of two ways. If your logical view contains a column that already exists in other logical tables, you can simply locate that column in the Object Browser and drag it to the logical view definition. If your logical view creates a column that does not exist in other logical tables, you have to manually add the column to the logical view definition.
1 In the Logical Table Editor, on the Physical View tab, in the Object Browser, expand the table that contains the column that you want to use.
2 Select the column and drag it to the Mapping pane.
1 In the Logical Table Editor, on the Physical View tab, beside the Mapping pane, click Add.
The column name must match the name used in the SQL statement.
3 In the Data Type drop-down list, select the data type for the column.
4 In the Precision/Length box, type the length for the column.
The following image shows the columns defined for the logical view:
Notice that there is a column in the Mapping pane for each column referenced in the SQL statement. All of the columns, with the exception of
Processing_Time, are existing columns that you can drag and drop from other
logical tables. You have to add Processing_Time manually as this column exists
only in this logical view.
With the SQL and columns defined, you have finished creating the logical view. You can now save the logical view. When you do, it is added as a new logical table in the Tables folder of your project.
Structure
In this example, because all of the attribute columns in the logical view are defined using existing columns from other logical tables, mapping the attributes is very simple.
In the Logical Table Editor, when you define a logical view column by dragging an existing column from another logical table into the Mapping pane, the logical view automatically appears as a potential source table for the attribute or fact that maps to that existing column.
Mapping Attribute Columns
When you create the logical view, if you define the Order_ID column using the
existing Order_ID column in the LU_ORDER table, LVW_PROCESSING
automatically appears as a potential source table for the ID form of the Order
attribute. All you need to do to map the Order ID form to the logical view is to
select LVW_PROCESSING as a source table.
If you have a logical view column that maps to an existing attribute or fact but uses a different column name than those already defined for the object, you have to create a new expression to map the logical view column just as you would for any heterogeneous column expression.
To map the attributes to the LVW_PROCESSING table, you need to perform the following steps:
2 Edit the ID form of each attribute so that LVW_PROCESSING is selected as a source table for the attribute form expression.
To map the fact column in the LVW_PROCESSING table, you need to perform
the following steps:
The following image shows the definition for the Processing Time fact:
Processing Time Fact
After mapping the attribute and fact columns, you can then update the schema.
You can create a Processing Time metric that aggregates the Processing Time
fact and then design a report to display the desired information from the logical
view. The result set looks like the following:
[LU_SHIPPER] a13
where a11.[Customer_ID] = a12.[Customer_ID]
and a11.[Shipper_ID]= a13.[Shipper_ID]
After completing this topic, you will be able to:
• Use logical views to create a distinct list of elements for attributes in star schemas.
In the Advanced Schema Design lesson, you learned that star schemas, which
contain a single lookup table for each dimension, can be problematic. Joining
aggregate fact tables to dimension lookup tables at any level other than the lowest level results in multiple counting. If you use a star schema, you need to include the higher-level lookup tables to use aggregate fact tables.
Logical views provide an alternative to creating physical tables for the higher-level lookup tables. Instead, you can define logical views that select the information for particular attribute levels from the dimension table. These logical views can then function as the lookup tables for higher-level attributes.
For example, you have a single lookup table in your star schema that stores all of the geography data as follows:
Dimension Table
In this example, the LU_GEOGRAPHY table stores the data for all levels of the
Geography dimension—Store, District, and Region. Store_ID is the primary key,
so the table alreadycontains a unique list of stores. However, it does not contain
a distinct list of districts or regions.
You have an aggregate fact table that you want to use in conjunction with this
dimension lookup table:
If you look at the sample data for the FACT_REGION_SALES table, notice that it does not match the values displayed for the Sales metric in the report. Because the FACT_REGION_SALES table does not join to a distinct list of
of a corresponding region ID in the LU_GEOGRAPHY table.
group by a11.[Region_ID]
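This multiple counting can be reproduced with a few rows of sample data. The sketch below (Python with an in-memory SQLite database; the values are invented) joins a single aggregate fact row to a dimension table in which the same region appears three times:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE LU_GEOGRAPHY (Store_ID PRIMARY KEY, District_ID, Region_ID)")
con.executemany("INSERT INTO LU_GEOGRAPHY VALUES (?, ?, ?)",
                [(1, 10, 100), (2, 10, 100), (3, 11, 100)])  # region 100 three times
con.execute("CREATE TABLE FACT_REGION_SALES (Region_ID, Sales)")
con.execute("INSERT INTO FACT_REGION_SALES VALUES (100, 50.0)")

# Joining the aggregate fact table straight to the dimension table
# counts the sales once per matching dimension row:
bad = con.execute("""
    SELECT a11.Region_ID, SUM(a11.Sales)
    FROM FACT_REGION_SALES a11, LU_GEOGRAPHY a12
    WHERE a11.Region_ID = a12.Region_ID
    GROUP BY a11.Region_ID
""").fetchall()
print(bad)  # [(100, 150.0)] -- triple the actual 50.0 of sales
```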
This same behavior would also occur if you tried to use an aggregate fact table at the district level. With only the dimension table in place, aggregate fact tables at the region and district levels are useless because they need to join to a distinct list of region and district elements.
One method of creating a distinct list of elements for the Region and District attributes is to build two logical views (one for each attribute). Depending on your reporting requirements, you may also choose to build a logical view for the Store attribute so that you can join store data to district data without using the dimension lookup table.
To create distinct lookup tables for the District and Region attributes using
logical views, you need to do the following:
1 Create a logical view that selects the necessary district information from the LU_GEOGRAPHY table.
2 Create a logical view that selects the necessary region information from the LU_GEOGRAPHY table.
3 Map the ID and DESC forms for the District and Region attributes to the
appropriate columns in the logical views.
Logical View—District
The SQL statement and column definitions for LVW_DISTRICT look like the
following:
Logical View—Region
LVW_REGION contains the Region_ID and Region_Desc columns from the
LU_GEOGRAPHY table. The Region_ID and Region_Desc columns provide a
distinct list of region elements.
The SQL statement and column definitions for LVW_REGION look like the
following:
1 Open the District attribute in the Attribute Editor.
2 Modify the ID form so that LVW_DISTRICT is selected as a source table for the attribute.
After you map the District and Region attributes to the logical views, you can update the schema and then use these logical views to join to aggregate fact tables.
If you look at the sample data for the FACT_REGION_SALES table, notice that
it now matches the values displayed for the Sales metric in the report. Because the FACT_REGION_SALES table joins to a distinct list of regions in
LVW_REGION, the sales values are accurately aggregated.
The FROM clause contains a derived table expression. This expression is the
SQL statement for LVW_REGION. The WHERE clause joins the
FACT_REGION_SALES table to the region information in LVW_REGION.
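The corrected join can be sketched the same way; the derived table below plays the role of LVW_REGION (Python with SQLite, invented sample rows):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE LU_GEOGRAPHY (Store_ID PRIMARY KEY, Region_ID, Region_Desc)")
con.executemany("INSERT INTO LU_GEOGRAPHY VALUES (?, ?, ?)",
                [(1, 100, 'North'), (2, 100, 'North'), (3, 100, 'North')])
con.execute("CREATE TABLE FACT_REGION_SALES (Region_ID, Sales)")
con.execute("INSERT INTO FACT_REGION_SALES VALUES (100, 50.0)")

# The LVW_REGION-style derived table supplies a distinct list of regions,
# so the fact row joins to exactly one dimension row:
rows = con.execute("""
    SELECT a11.Region_ID, a12.Region_Desc, SUM(a11.Sales)
    FROM FACT_REGION_SALES a11,
         (SELECT DISTINCT Region_ID, Region_Desc FROM LU_GEOGRAPHY) a12
    WHERE a11.Region_ID = a12.Region_ID
    GROUP BY a11.Region_ID, a12.Region_Desc
""").fetchall()
print(rows)  # [(100, 'North', 50.0)] -- no multiple counting
```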
Lesson Summary
In this lesson, you learned:
• A physical table stores the actual data and resides in the data warehouse.
• A logical table resides in the metadata and stores information about the attributes and facts that map to columns in its associated physical table. It also stores information about the structure of the physical table.
• A table alias enables you to create multiple logical tables that all point to the
same physical table in the data warehouse.
• A logical view is a SQL query that you create and then execute against tables in the data warehouse. It contains a SQL statement and definitions for each column that is created as part of the logical view.
• When you execute a report that contains objects that map to a logical view, the SQL for the logical view is inserted into the report SQL where the table name would normally occur.
• You can reference the same expression multiple times when you use a common table expression in a query. You do not have this flexibility with derived table expressions.
• You can nest a derived table expression inside another derived table
expression in a query. You do not have this flexibility with common table
expressions.
• The primary difference between logical views and database views is that you
create and maintain logical views at the application level rather than the
database level.
• You can use logical views to create complex attribute form expressions and
fact expressions using multiple tables.
• A logical view can only contain a single SQL statement. The SQL statement
can have multiple levels (such as nested expressions).
• When you create a logical view, you must define a column object for every column referenced in the SELECT clause of the SQL statement.
• You can also use logical views to address SCDs, create lookup tables to
handle star schemas, remove recursive hierarchies, and a variety of other
reporting issues.
4 MANY-TO-MANY RELATIONSHIPS
Lesson Description
In this lesson, you will learn about many-to-many relationships and their
impact on report analysis. You will learn how to design the data warehouse
model and schema to best support them in the MicroStrategy reporting
environment.
Lesson Objectives
Describe advanced data modeling concepts and explain how to design the
data warehouse model and schema to support them in a MicroStrategy project.
Many-to-Many Relationships
Review of Attribute Relationships
Examples of Many-to-Many Relationships
The following logical data model shows a variety of direct and indirect attribute
relationships:
Directly related attributes have one of the following types of relationships:
• One-to-one
• One-to-many
• Many-to-many
• Course and Student—a college offers many courses and a specific course can
be taken by one or more students.
• Product and Invoice—a product can appear on many invoices and an invoice
can have many products.
If the structure of your logical data model and data warehouse schema does not adequately address the complexities of querying attribute data that contains many-to-many relationships, you can have the following problems:
When you have attributes with many-to-many relationships, you need to design
the logical data model and data warehouse schema to ensure that users can
answer any relevant questions about the data. Otherwise, users may lose the
ability to analyze certain business scenarios.
An item can come in multiple colors, and the same color can apply to multiple items. The LU_COLOR and LU_ITEM tables store a distinct list of all colors and items respectively. The REL_ITEM_COLOR table enables you to join data in the two lookup tables, mapping the relationships between items and colors. The FACT_ITEM_SALES table stores sales by item and date.
The presence of the REL_ITEM_COLOR relationship table is not sufficient to answer the second question. This table only lets you analyze which item and color combinations are available, not which item and color combinations actually sell. Answering the second question requires a fact table that has sales information along with both item and color information. The FACT_ITEM_SALES table only contains the Item_ID column, not Color_ID. Because the fact table only stores item information, you can only analyze which items sell, not the colors of the items sold.
If you want to be able to analyze the sales of item and color combinations, you have to capture both the item and color data for sales in your source system. Then you have to modify the FACT_ITEM_SALES table to include both item and color information. The following illustration shows the same scenario, but the FACT_ITEM_SALES table now contains both the Item_ID and Color_ID columns:
The fact table alone is not sufficient to answer the first business question.
You can only retrieve item and color combinations that actually sold from
the FACT_ITEM_SALES table. If you have item and color combinations
that are available but have never sold, those item and color combinations
are not present in the fact table.
To ensure that you do not lose analytical flexibility when dealing with many-to-
many attribute relationships, you need the following tables in your data
warehouse schema:
• A table that relates the attributes, identifying all the possible combinations of elements
• A fact table with columns that enable you to accurately join to both the parent and child attributes
This same structure has to be in place for any fact table that contains facts you want to analyze with respect to this particular attribute relationship.
Multiple Counting
Lost analytical capability is not the only issue that can arise when dealing with many-to-many relationships. You can also experience problems with multiple
counting in certain scenarios. If you try to aggregate data to the level of the
parent attribute in the many-to-many relationship (or any attribute level above the parent), multiple counting can occur when the relationship exists in a relationship table but not in the fact table. The following illustration shows the Color and Item scenario with the original table structure, where only the Item attribute is present in the fact table:
To understand how multiple counting can occur, consider the following sample
data:
The description data shown in the illustration is for clarity only. It is not intended to indicate that the ID columns in the physical table include description data.
Multiple counting occurs when you run any report that requests sales information in conjunction with item colors. The difficulty lies in the fact that color is not in the FACT_ITEM_SALES table. Within the fact table, there is no way to relate the sale of an item to the color of that particular item. You can only determine which items sold, not the colors of the items that sold.
For example, you may want to run a report that aggregates the total sales for hats. Based on the sample data, the result set looks like the following:
However, what if you want to know the colors of items that sold? You may want
to run a report that aggregates the total sales for all red items. Based on the
sample data, the result set looks like the following:
sum(a11.[Sales_Amount]) AS WJXBFS1
from [FACT_ITEM_SALES] a11,
[REL_ITEM_COLOR] a12,
[LU_COLOR] a13
where a11.[Item_ID] = a12.[Item_ID] and
a12.[Color_ID] = a13.[Color_ID]
and a12.[Color_ID] in (1)
group by a12.[Color_ID]
Notice that the REL_ITEM_COLOR table is in the FROM clause along with the
LU_COLOR and FACT_ITEM_SALES tables. In the WHERE clause, the SQL
Engine joins the FACT_ITEM_SALES and REL_ITEM_COLOR tables using
only the Item_ID column since Color_ID is not part of the fact table. However,
the SQL Engine has to join the REL_ITEM_COLOR and LU_COLOR tables
based on the Color_ID column. There is also a filtering condition in the WHERE clause to retrieve only items whose Color_ID value is 1 (red). As a result, the SQL Engine retrieves the sales for all items that are available in red, even though there is no way of knowing whether the items sold were actually red.
Finally, what if you want to know how much you sold of an item in a particular color? You may want to run a report that aggregates the total sales for all red dresses. Based on the sample data, the result set looks like the following:
Again, there is no way to accurately determine the total sales for all red dresses since the Color attribute is not represented in the FACT_ITEM_SALES table. Nevertheless, the report shows that you sold $50 of red dresses. This amount is the total sales for all dresses. This number could be correct if all the dresses that were sold were red, but it is more likely incorrect. Then again, given the available data, this total is what you obtain when the SQL Engine aggregates the dress sales data from the Item level to the Color level. The SQL for this report looks like the following:
select a12.[Color_ID] AS Color_ID,
max(a13.[Color_Desc]) AS Color_Desc,
a11.[Item_ID] AS Item_ID,
max(a14.[Item_Desc]) AS Item_Desc,
sum(a11.[Sales_Amount]) AS WJXBFS1
from [FACT_ITEM_SALES] a11,
[REL_ITEM_COLOR] a12,
[LU_COLOR] a13,
[LU_ITEM] a14
where a11.[Item_ID] = a12.[Item_ID] and
a12.[Color_ID] = a13.[Color_ID] and
a11.[Item_ID] = a14.[Item_ID]
and (a12.[Color_ID] in (1)
and a11.[Item_ID] in (2))
group by a12.[Color_ID],
a11.[Item_ID]
Notice that the REL_ITEM_COLOR table is in the FROM clause, along with the
LU_COLOR, LU_ITEM, and FACT_ITEM_SALES tables. In the WHERE
clause, the SQL Engine joins the FACT_ITEM_SALES and REL_ITEM_COLOR
tables using only the Item_ID column since Color_ID is not part of the fact
table. However, the SQL Engine has to join the REL_ITEM_COLOR and
LU_COLOR tables based on the Color_ID column. There are also two filtering
conditions in the WHERE clause—one to retrieve only items whose Color_ID
value is 1 (red) and one to retrieve only items whose Item_ID value is 2 (dress).
As a result, the SQL Engine retrieves the sales for all dresses that are sold, even
though there is no way of knowing whether the dresses sold were actually red.
In both of the last two report examples, multiple counting occurs precisely because the Color_ID column is not present in the FACT_ITEM_SALES table.
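Both failure modes are easy to demonstrate with a few rows of sample data (Python with SQLite; the values are invented and do not match the course's sample tables):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE REL_ITEM_COLOR (Item_ID, Color_ID)")
# The hat (item 1) comes in red (1) and blue (2); sales do not record color.
con.executemany("INSERT INTO REL_ITEM_COLOR VALUES (?, ?)", [(1, 1), (1, 2)])
con.execute("CREATE TABLE FACT_ITEM_SALES (Item_ID, Sales_Amount)")
con.execute("INSERT INTO FACT_ITEM_SALES VALUES (1, 100.0)")

# "Total sales for all red items": every hat sale is pulled in through the
# relationship table, whether or not the hats sold were actually red.
red = con.execute("""
    SELECT SUM(a11.Sales_Amount)
    FROM FACT_ITEM_SALES a11, REL_ITEM_COLOR a12
    WHERE a11.Item_ID = a12.Item_ID AND a12.Color_ID IN (1)
""").fetchone()[0]
print(red)  # 100.0 -- attributed to red with no evidence of the color sold

# Without the color filter, the same join counts each sale once per color:
total = con.execute("""
    SELECT SUM(a11.Sales_Amount)
    FROM FACT_ITEM_SALES a11, REL_ITEM_COLOR a12
    WHERE a11.Item_ID = a12.Item_ID
""").fetchone()[0]
print(total)  # 200.0 -- the multiple counting described above
```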
Resolving Many-to-Many Relationships
Just as you need a relationship table and a fact table that enables you to join to both the parent and child attributes to ensure that you do not lose analytical capability, these same requirements are also necessary to prevent multiple counting when you aggregate data above the level of the child attribute.
As you can see, working with attributes that have many-to-many relationships is
more complex than other types of direct relationships. You need to design the
logical data model and data warehouse schema to account for the difficulties
they can pose. You also need a fact table structure that enables you to accurately
join fact data to both the parent and child attributes. This same structure has to
be in place for any fact table that contains facts you want to analyze with
respect to this particular attribute relationship.
You can implement many-to-many relationships using one of the following
methods:
• Creating a separate relationship table
• Creating a compound child attribute
• Creating a common child attribute
Each of these methods involves a different design for the logical data model and
data warehouse schema.
In the following topics, you will learn about each of these methods for resolving
many-to-many relationships using the following sample data:
Sample Data
Creating a Separate Relationship Table
Creating a separate relationship table is the most straightforward way in which
to effectively manage many-to-many relationships. This method keeps the
many-to-many relationship intact and structures the data warehouse schema to
resolve any analysis or aggregation issues. You have already seen this method in
the examples provided earlier.
You retain the many-to-many relationship between the two attributes and
create a separate relationship table that stores all of the possible attribute
element combinations. You also add both the parent and child attribute IDs to
any fact tables that contain facts you want to analyze with respect to this
attribute relationship. The following illustration shows the structure of the
logical data model and schema for the Color and Item scenario if you use this
method:
You map the Color attribute to the Color_ID column in the LU_COLOR,
REL_ITEM_COLOR, and FACT_ITEM_SALES tables. You map the Item
attribute to the Item_ID column in the LU_ITEM, REL_ITEM_COLOR, and
FACT_ITEM_SALES tables.
You can then configure a many-to-many relationship between the Color and
Item attributes using REL_ITEM_COLOR as the relationship table. In this
relationship, Color is the parent attribute, and Item is its child.
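The table structures this method relies on can be sketched in DDL as follows (a sketch only—the course illustrations show table and column names, so the column types and constraints here are assumptions):

```sql
-- Sketch of the separate-relationship-table design (types are assumed).
CREATE TABLE LU_COLOR (
    Color_ID   INTEGER PRIMARY KEY,
    Color_Desc VARCHAR(50)
);

CREATE TABLE LU_ITEM (
    Item_ID    INTEGER PRIMARY KEY,
    Item_Desc  VARCHAR(50)
);

-- Stores every valid item/color combination.
CREATE TABLE REL_ITEM_COLOR (
    Item_ID  INTEGER REFERENCES LU_ITEM (Item_ID),
    Color_ID INTEGER REFERENCES LU_COLOR (Color_ID),
    PRIMARY KEY (Item_ID, Color_ID)
);

-- The fact table carries BOTH attribute IDs so sales can be joined
-- accurately to the parent (Color) as well as the child (Item).
CREATE TABLE FACT_ITEM_SALES (
    Item_ID      INTEGER,
    Color_ID     INTEGER,
    Sales_Amount DECIMAL(18,2)
);
```

Because Color_ID is stored directly in the fact table, aggregating sales to the Color level never has to pass through the relationship table, which is what prevents the multiple counting described earlier.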
If you want to view a list of all the possible item and color combinations, you
can run a report that just contains the Item and Color attributes.
[REL_ITEM_COLOR] a12,
[LU_ITEM] a13
where a11.[Color_ID] = a12.[Color_ID] and
a12.[Item_ID] = a13.[Item_ID]
If you want to view the item and color combinations that have sold, you can run
a report that contains the Item and Color attributes along with a Sales metric.
Report Result Set with Sales for Item and Color Combinations
This report contains the two attributes and a metric. When you have both
attributes on a report with a metric, the SQL Engine uses the fact table to
retrieve the list of all item and color combinations that have sold. The SQL for
this report looks like the following:
select a11.[Item_ID] AS Item_ID,
max(a13.[Item_Desc]) AS Item_Desc,
a11.[Color_ID] AS Color_ID,
max(a12.[Color_Desc]) AS Color_Desc,
sum(a11.[Sales_Amount]) AS WJXBFS1
from [FACT_ITEM_SALES] a11,
[LU_COLOR] a12,
[LU_ITEM] a13
where a11.[Color_ID] = a12.[Color_ID] and
a11.[Item_ID] = a13.[Item_ID]
group by a11.[Item_ID],
a11.[Color_ID]
Creating a Compound Child Attribute
You create a compound key for the lower-level attribute in the relationship,
which eliminates the need for a separate relationship table. This compound key
consists of the IDs of the child attribute and its parent.
Since Item is the lower-level attribute, you create a compound key for this
attribute that consists of the Item_ID and Color_ID columns.
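Under this design, the lookup and fact tables might be sketched as follows (types and constraints are assumptions; the course shows only table and column names):

```sql
-- Sketch of the compound child attribute design (types are assumed).
-- Item's key is the combination of Item_ID and Color_ID, so no
-- separate relationship table is required.
CREATE TABLE LU_ITEM (
    Item_ID   INTEGER,
    Color_ID  INTEGER REFERENCES LU_COLOR (Color_ID),
    Item_Desc VARCHAR(50),
    PRIMARY KEY (Item_ID, Color_ID)
);

CREATE TABLE FACT_ITEM_SALES (
    Item_ID      INTEGER,
    Color_ID     INTEGER,
    Sales_Amount DECIMAL(18,2),
    FOREIGN KEY (Item_ID, Color_ID)
        REFERENCES LU_ITEM (Item_ID, Color_ID)
);
```

The compound primary key is what makes each item/color combination a distinct record, which is also the source of the row-header and subtotal limitations described below.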
You map the Color attribute to the Color_ID column in the LU_COLOR
table. You map the Item attribute to the combination of the Item_ID and
Color_ID columns in the LU_ITEM and FACT_ITEM_SALES tables.
You can then configure a one-to-many relationship between the Color and Item
attributes, using the LU_ITEM table to relate the two attributes.
While this method eliminates the many-to-many relationship and the need for a
separate relationship table, it also has disadvantages. You have to store an extra
column in the LU_ITEM table. Also, joins between the LU_ITEM and
FACT_ITEM_SALES tables are more complex because you have to join on a
compound key.
Because this method uses a compound attribute, it also presents some
challenges in the reporting environment. With Item defined as a compound
attribute, you lose the ability to view items independent of their colors. Each
item and color combination is treated as a separate record. Therefore, you
cannot merge row headers for items with the same description but different
colors. If you run a report that lists all item and color combinations, the result
set looks like the following:
The description for each item is repeated for every color in which it is available
since you cannot merge the row headers. The SQL for this report looks like the
following:
select a11.[Item_ID] AS Item_ID,
a11.[Color_ID] AS Color_ID,
a11.[Item_Desc] AS Item_Desc,
a11.[Color_ID] AS Color_ID0,
a12.[Color_Desc] AS Color_Desc
from [LU_ITEM] a11,
[LU_COLOR] a12
where a11.[Color_ID] = a12.[Color_ID]
In the WHERE clause, notice that the SQL Engine relates the two attributes
using the Color_ID columns in the LU_ITEM and LU_COLOR tables. A
separate relationship table is no longer necessary.
Having Item defined as a compound attribute also prevents you from
aggregating all the sales data for an item, regardless of color. If you run a report
that shows the sales for all item and color combinations that have sold and
include subtotals and a grand total on the report, the result set looks like the
following:
The grand total on the report correctly displays the total sales for all items.
However, because each item and color combination is treated as a separate
item, the item-level subtotal cannot show the total sales for each item across all
colors that sold. Instead, you have a separate subtotal for every item and color
combination. The SQL for this report looks like the following:
Creating a Hidden Common Compound Child Attribute
Another alternative for resolving a many-to-many relationship is to remove the
direct relationship between the two attributes and create a common child
attribute that relates them. This child attribute is hidden because it is not used
for display on a report but only to ensure that the report results are accurate.
With this method, you create a new child attribute that is a concatenation of the
original attributes. This attribute has a one-to-many relationship to both parent
attributes. It is a child of each of the original attributes. You still need to
include the ID of both attributes in the fact table.
For any illustrations in this course that show logical data models, hidden
attributes are always indicated by dotted lines.
The ItemColor attribute represents all the item and color combinations. It is a
compound attribute, and it relates to both the Color and Item attributes in the
LU_ITEM_COLOR table. The FACT_ITEM_SALES table is still keyed with the
Item_ID and Color_ID columns.
The ItemColor attribute is a real attribute that exists in the logical data model.
However, because it does not carry any logical meaning for users, you should
not include it in the user hierarchy for users to view or browse. Its primary
purpose is to relate the Color and Item attributes in the reporting environment
so that
you can consolidate the join path to the lookup tables. Because this attribute is
used only in the background, you should make it a hidden attribute.
For example, if you want to view a list of all the possible item and color
combinations, you can run a report that just contains the Item and Color
attributes:
Report Result Set with All Item and Color Combinations
This report correctly displays the various item and color combinations.
Although the ItemColor attribute is not on the template, the SQL Engine uses it
to join the item and color data. The SQL for this report looks like the following:
select distinct a12.[Item_ID] AS Item_ID,
a13.[Item_Desc] AS Item_Desc,
a11.[Color_ID] AS Color_ID,
a11.[Color_Desc] AS Color_Desc
from [LU_COLOR] a11,
[LU_ITEM_COLOR] a12,
[LU_ITEM] a13
where a11.[Color_ID] = a12.[Color_ID] and
a12.[Item_ID] = a13.[Item_ID]
If you want to view the item and color combinations that have sold, you can run
a report that contains the Item and Color attributes along with a Sales metric:
Report Result Set with Sales for Item and Color Combinations
The following illustration shows the logical data model and schema if you use
this method:
Logical Data Model and Schema—Common Child Attribute
The SKU attribute represents all the item and color combinations. It has itsown
ID column, but it relates to both the Color and Item attributes in the LU_SKU
table. You key the FACT_ITEM_SALES table using the SKU attribute ID rather
than the Item and Color attribute IDs.
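The SKU variant of this design can be sketched in DDL as follows (column types are assumptions; only the table and column names come from the course):

```sql
-- Sketch of the common child attribute (SKU) design (types are assumed).
-- LU_SKU relates each SKU to its Item and Color parents, and the fact
-- table is keyed by SKU_ID alone.
CREATE TABLE LU_SKU (
    SKU_ID   INTEGER PRIMARY KEY,
    Item_ID  INTEGER REFERENCES LU_ITEM (Item_ID),
    Color_ID INTEGER REFERENCES LU_COLOR (Color_ID)
);

CREATE TABLE FACT_ITEM_SALES (
    SKU_ID       INTEGER REFERENCES LU_SKU (SKU_ID),
    Sales_Amount DECIMAL(18,2)
);
```

Keying the fact table by SKU_ID alone means every join from sales data to Item or Color is forced through LU_SKU, which is what consolidates the join path.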
You map the SKU attribute to the SKU_ID columns in the LU_SKU and
FACT_ITEM_SALES tables. You map the Item attribute to the Item_ID column
in the LU_SKU and LU_ITEM tables, and you map the Color attribute to the
Color_ID column in the LU_SKU and LU_COLOR tables.
You can then configure a one-to-many relationship between the Color and
SKU and the Item and SKU attributes, using the LU_SKU table to relate the SKU
attribute to both Color and Item. In this relationship, Color and Item are the
parent attributes, and SKU is a child of both attributes. The Color and Item
attributes are no longer directly related to each other.
This method produces the same result sets as if you created a separate
relationship table.
For example, if you want to view a list of all the possible item and color
combinations, you can run a report that just contains the Item and Color
attributes:
Report Result Set with All Item and Color Combinations
This report correctly displays the various item and color combinations.
Although the SKU attribute is not on the template, the SQL Engine uses it to
join the item and color data. The SQL for this report looks like the following:
[LU_SKU] a12,
[LU_ITEM] a13
where a11.[Color_ID] = a12.[Color_ID] and
a12.[Item_ID] = a13.[Item_ID]
If you want to view the item and color combinations that have sold, you can run
a report that contains the Item and Color attributes along with a Sales metric:
Report Result Set with Sales for Item and Color Combinations
This report correctly displays the sales for each item and color combination as
well as the appropriate subtotals and the grand total. Although the SKU
attribute is not on the template, the SQL Engine uses it to join the sales data to
the item and color data. The SQL for this report looks like the following:
select a12.[Item_ID] AS Item_ID,
max(a14.[Item_Desc]) AS Item_Desc,
a12.[Color_ID] AS Color_ID,
max(a13.[Color_Desc]) AS Color_Desc,
sum(a11.[Sales_Amount]) AS WJXBFS1
from [FACT_ITEM_SALES] a11,
[LU_SKU] a12,
[LU_COLOR] a13,
[LU_ITEM] a14
where a11.[SKU_ID] = a12.[SKU_ID] and
a12.[Color_ID] = a13.[Color_ID] and
a12.[Item_ID] = a14.[Item_ID]
group by a12.[Item_ID],
a12.[Color_ID]
The only disadvantage of this method lies in the changes you have to make to
the data warehouse schema to support creating the new attribute. You have to
create the lookup table for the new attribute and key fact tables using the ID of
the new attribute.
In this lesson, you learned:
• If the structure of your logical data model and data warehouse schema does
not adequately address the complexities of querying attribute data that
contains many-to-many relationships, you can lose analytical capability and
have problems with multiple counting.
• When you have attributes with a many-to-many relationship, you need a
table that defines the direct attribute relationship and a fact table structure
that enables you to join to both attributes.
• Multiple counting occurs when you aggregate data to the level of the parent
attribute or any attribute level above the parent.
Lesson Description
In this lesson, you will learn about the impact of attribute roles on report analysis.
You will learn how to design the data warehouse model and schema to best
support them in the MicroStrategy reporting environment.
Lesson Objectives
Describe attribute roles and explain the four methods for implementing them in
a MicroStrategy project.
After completing the topics in this lesson, you will be able to:
• Describe role attributes and explain the four methods for implementing them
in a MicroStrategy project.
Attribute Roles
What Is a Role Attribute?
Solution 1: Creating Explicit Table Aliases
Solution 2: Enabling Automatic Attribute Role Recognition
Solution 3: Creating Table Views
Solution 4: Creating Logical Views
Describe role attributes and explain the four methods for implementing them in
a MicroStrategy project.
What Is a Role Attribute?
A role attribute refers to any time you have a column in a single lookup table
that is used to define more than one attribute. A single attribute in a dimension
may function as a role attribute, or all of the attributes in a dimension may be
role attributes. If an entire dimension consists of role attributes, it is referred to
as a role-playing dimension. For example, you could have Ship Time and Order
Time dimensions where all levels of time function as role attributes and
reference the same set of lookup tables. To remove ambiguity when writing the
SQL statement, table aliases are used so that both roles can be distinguished.
Translating role attributes to SQL, the join type is referred to as a self-join.
In your reporting environment, you analyze information by both Store City and
Customer City. These are two separate attributes in the reporting environment.
Therefore, you create two attributes in the project, but you map both attributes
to the City_ID and City_Desc columns in the LU_CITY table. The logical data
model for these two attributes looks like the following:
The ID forms for the Store City and Customer City attributes each have two
attribute form expressions. Both attributes have an attribute form
expression
that maps them to the City_ID column in the LU_CITY table, which functions
as the primary lookup table. In addition, the Store City attribute also maps to
the St_City_ID column in the FACT_CUST_SALES table, and the Customer
City attribute maps to the Cust_City_ID column in the FACT_CUST_SALES
table.
a11.[St_City_ID] = a12.[City_ID]
group by a11.[St_City_ID],
a11.[Cust_City_ID]
In the SELECT clause, notice that the SQL Engine attempts to retrieve the
descriptions for Store City and Customer City from the same table. Similarly, in
the WHERE clause, the SQL Engine tries to join from the fact table to the
lookup table at the same time for both attributes. This join retrieves data only
for records where the store city and customer city are the same. If no records
exist where the store and customer cities are identical, the query does not return
any data. For example, the FACT_CUST_SALES table contains the following
records:
Because the join only finds the rows where the store city and customer city are
alike, the only record that is returned in the result set is the last row in the table,
in which Herndon is the value for both the Store City and Customer City
attributes. The other records, in which the values for both attributes are
different, are not included in the result set. If the record for the Herndon store
were removed from the FACT_CUST_SALES table, the report would not return
any data.
To obtain an accurate result set, the SQL Engine must be able to alias the table
so that it can instantiate the table twice in the same query—once to get the
descriptions for the Store City attribute and once to get the descriptions for the
Customer City attribute. For the SQL Engine to be able to alias a table multiple
times, you must configure your environment so that you can use role attributes.
You can enable attribute roles using the following four methods:
• Creating explicit table aliases
• Enabling automatic attribute role recognition
• Creating table views in the data warehouse
• Creating logical views in MicroStrategy
Solution 1: Creating Explicit Table Aliases
One possible way to deal with role attributes is to manually define explicit table
aliases for any lookup tables that reference role attributes.
3 In the Tables folder, right-click the table you want to alias and select Create
Table Alias.
This action creates a logical table alias for the table in the Tables folder.
4 Repeat steps 1 to 3 for each table alias you want to create for the table.
You create one table alias for each role attribute that you want to map to
the lookup table.
The following image shows the option for creating explicit table aliases:
2 Ensure that the LU_CITY table alias is not selected as a source table for the
ID or DESC forms of the Store City attribute.
4 Select the table alias created for the LU_CITY table as the primary lookup
table for the Customer City attribute.
5 Ensure that the LU_CITY table is not selected as a source table for the ID or
DESC forms of the Customer City attribute.
6 Update the project schema.
Now, if you run the same report, the SQL looks like the following:
select a11.[St_City_ID] AS City_ID,
max(a13.[City_Desc]) AS City_Desc,
a11.[Cust_City_ID] AS City_ID0,
max(a12.[City_Desc]) AS City_Desc0,
sum(a11.[Revenue]) AS WJXBFS1
from [FACT_CUST_SALES] a11,
[LU_CITY] a12,
[LU_CITY] a13
where a11.[Cust_City_ID] = a12.[City_ID] and
a11.[St_City_ID] = a13.[City_ID]
group by a11.[St_City_ID],
a11.[Cust_City_ID]
The SQL Engine uses the LU_CITY table to obtain the descriptions for the Store
City attribute and the LU_CITY table alias to obtain the descriptions for the
Customer City attribute.
When you use explicit table aliasing, the SQL references both the original
lookup table and the table alias by the physical name of the lookup table.
For example, the SQL shown on the previous page references both the
LU_CITY table and the LU_CITY table alias as LU_CITY in the FROM
clause.
5 Clear the Use default inherited value (Default Settings) check box.
The following image shows the VLDB property for automatic attribute role
recognition:
Now, if you run the same report, the SQL looks like the following:
select a11.[St_City_ID] AS City_ID,
max(a13.[City_Desc]) AS City_Desc,
a11.[Cust_City_ID] AS City_ID0,
max(a12.[City_Desc]) AS City_Desc0,
sum(a11.[Revenue]) AS WJXBFS1
from [FACT_CUST_SALES] a11,
[LU_CITY] a12,
[LU_CITY] a13
where a11.[Cust_City_ID] = a12.[City_ID] and
a11.[St_City_ID] = a13.[City_ID]
group by a11.[St_City_ID],
a11.[Cust_City_ID]
When you enable automatic attribute role recognition, the SQL Engine
basically creates an alias for the lookup table in memory for each table alias
that it must create. Because MicroStrategy Intelligence Server has to
allocate memory for this task, there is a limit on the number of role
attributes you can map in a project if you are using the automatic
recognition option. You can create up to 100 role attributes in any given
project.
Now, if you run the same report, the SQL looks like the following:
select a11.[St_City_ID] AS St_City_ID,
max(a13.[St_City_Desc]) AS St_City_Desc,
a11.[Cust_City_ID] AS City_ID,
max(a12.[City_Desc]) AS City_Desc,
sum(a11.[Revenue]) AS WJXBFS1
from [FACT_CUST_SALES] a11,
[LU_CITY] a12,
[LU_CITY_ST] a13
where a11.[Cust_City_ID] = a12.[City_ID] and
a11.[St_City_ID] = a13.[St_City_ID]
group by a11.[St_City_ID],
a11.[Cust_City_ID]
The SQL Engine aliases the original lookup table to obtain the descriptions for
the Customer City attribute, and it aliases the view to obtain the descriptions for
the Store City attribute.
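A table view such as the LU_CITY_ST view referenced in the SQL above can be created in the data warehouse itself. The view body below is an assumption based on the column names that appear in this lesson:

```sql
-- Hypothetical sketch: a database view that re-exposes LU_CITY under
-- Store City-specific column names so the role can be mapped to it
-- as if it were an independent lookup table.
CREATE VIEW LU_CITY_ST (St_City_ID, St_City_Desc) AS
SELECT City_ID, City_Desc
FROM LU_CITY;
```

Because the view has distinct column names, the Store City attribute can be mapped to it without colliding with the Customer City mapping on LU_CITY.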
Although you can use table views to enable attribute roles, there is some
overhead associated with the creation and maintenance of these views,
especially as the number of roles increases for any single attribute. If you do not
If you do not already have views in place, a better solution in terms of
maintenance is to use the explicit table aliasing functionality in MicroStrategy.
You could also take advantage of the SQL Engine attribute role recognition
VLDB property. The SQL statement and column definitions for the
LVW_STORE_CITY look like the following:
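The original illustration of the logical view definition is not reproduced in this text; a plausible sketch of its SELECT statement, assuming the LU_CITY columns shown earlier in this lesson, is:

```sql
-- Hypothetical sketch of the LVW_STORE_CITY logical view's SQL.
-- It renames the shared city columns into Store City-specific ones.
SELECT City_ID   AS St_City_ID,
       City_Desc AS St_City_Desc
FROM LU_CITY
```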
Given the similarity in characteristics between logical views and database
views, logical views carry the same advantages and limitations as a database
view. Finally, there is little or no difference between them in terms of
performance. Generally, explicit table aliasing enables you to control exactly
how a role attribute is defined.
Here is another example of role attributes where you have two attributes,
Origin Airport and Destination Airport, that have the same definition but play
different roles in the reporting environment. In this example, Origin Airport
and Destination Airport are defined using the same lookup table and columns
(Airport_ID, Airport_Code, and Airport_Desc), as shown below:
Lookup Table for Airport
You also have the following fact table in your data warehousethat stores daily
flight details including the number of flights between Origin Airport and
Destination Airport:
Both Origin Airport and Destination Airport share the same lookup
table; however, in the fact table, a separate column exists for each
of their roles (Origin_ID and Destination_ID).
1 Create an explicit table alias for the LU_AIRPORT table.
2 Map the Destination Airport attribute to the ID and all descriptive columns
in the original LU_AIRPORT table.
3 Ensure that the LU_AIRPORT table alias is not selected as a source table
for the ID or any of the descriptive forms of the Destination Airport
attribute.
4 Map the Origin Airport attribute to the ID and all descriptive columns in the
table alias created for the LU_AIRPORT table.
5 Select the table alias created for the LU_AIRPORT table as the primary
lookup table for the Origin Airport attribute.
6 Ensure that the LU_AIRPORT table is not selected as a source table for the
Origin Airport attribute.
Now, if you run the same report, the SQL looks like the following:
select a11.[Origin_ID] AS AIRPORT_ID,
max(a13.[AIRPORT_CODE]) AS AIRPORT_CODE,
max(a13.[AIRPORT_DESC]) AS AIRPORT_DESC,
a11.[Destination_ID] AS AIRPORT_ID0,
max(a12.[AIRPORT_CODE]) AS AIRPORT_CODE0,
max(a12.[AIRPORT_DESC]) AS AIRPORT_DESC0,
sum(a11.[NUM_FLIGHTS]) AS WJXBFS1
from [FACT_AIRPORT] a11,
[LU_AIRPORT] a12,
[LU_AIRPORT] a13
Notice that the SQL references both the original lookup table and the table alias
by the physical name of the lookup table.
Lesson Summary
In this lesson, you learned:
• A role attribute refers to any time you have a column in a single lookup table
that is used to define more than one attribute. One set of data plays multiple
roles inthe reporting environment.
• Mapping two attributes to the same ID and description columns in the data
warehouse causes problems when you need to join from fact tables to the
lookup table to retrieve attribute descriptions to display on a report.
• You can enable attribute roles by creating explicit table aliases.
Lesson Description
In this lesson, you will learn about each of these concepts and their impact on
report analysis. You will learn how to design the data warehouse model and
schema to best support them in the MicroStrategy reporting environment.
Lesson Objectives
Describe advanced data modeling concepts and explain how to design the
data warehouse model and schema to support them in a MicroStrategy project.
ragged structure to the hierarchy:
Ragged Structure of Sales Data
There are four levels of data in the warehouse for the Sales hierarchy. They map
to the logical data model as follows:
does not exist uniformly at every level to populate each cell in the report.
Ragged hierarchies also pose a challenge when drilling. If a report contains an
attribute from a ragged hierarchy and you need to drill to other levels in the
hierarchy, values may not exist for every row of data in the original report.
Changing the data model to directly relate the Region and Account Executive
attributes also means modifying the underlying structure of the
LU_ACCT_EXEC table. You need to add the ID column for Region to the
LU_ACCT_EXEC table to map the relationship between the two attributes:
While this approach enables you to drill from a ragged hierarchy, it does not
resolve issues with displaying data in the ragged structure of the hierarchy.
Because the drill logic for the Account Executive attribute differs from the rest
of the hierarchy, this method works only for very specific drill paths. Resolving
ragged hierarchies using this method can lead to even bigger issues when it
comes to rolling up data from the lowest level to higher levels in a hierarchy.
Populating with Null Attribute Values
A better method for resolving ragged hierarchies is to populate the null values
with attribute elements of either the child or parent attribute or with system-
generated values. Inserting values effectively eliminates gaps in a ragged
hierarchy.
For example, the following illustration shows the original data in the
LU_ACCT_EXEC table:
However, if you run this report, Sara Kaplan and Dave Williams are not
included in the result set since their respective market IDs are null:
Now, if you run the same report, the result set looks like the following:
Alternately, you could populate the empty cells for the Market_ID column with
the values for the Account Executive attribute. Then, the LU_ACCT_EXEC table
looks like the following:
Now, if you run the same report, the result set looks like the following:
Whether you choose to populate empty cells with the parent or child attribute
values completely depends on which action provides the most business value to
users as they view reports.
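Either population strategy is a simple update against the lookup table. The statement below is a sketch only; the Acct_Exec_ID column name is an assumption standing in for whatever ID column the LU_ACCT_EXEC table actually uses:

```sql
-- Sketch: fill null Market_ID cells with the child (Account Executive)
-- IDs so that every account executive rolls up to some market element.
UPDATE LU_ACCT_EXEC
SET Market_ID = Acct_Exec_ID
WHERE Market_ID IS NULL;
```

Populating with the parent attribute's values instead would assign an existing Region or Market ID in the same WHERE clause.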
If inserting parent or child attribute values does not make sense in your
business environment, you can also populate the empty cells with system-
generated IDs that map to descriptions that indicate that a value does not exist.
For example, you could generate market IDs for account executives who are not
assigned to a market. In the lookup table for the Market attribute, these IDs
map to description columns that indicate that no market is assigned. Then, the
LU_ACCT_EXEC table looks like the following:
Lookup Table for Account Executive with Generated Values
Report Result
Split Hierarchies
WhatIsa SplitHierarchy?
Creatinga Joint Child
Split hierarchies can be present without many-to-many relationships
between the parent and child attributes.
The problem with a split hierarchy is that it provides two paths that you can use
to join to fact tables. The lookup tables (and relationship tables in the case of
many-to-many relationships) for each parent-child attribute pair form separate,
distinct join paths. You can use either path to join to fact tables for metrics that
are contained in a report. Nonetheless, the SQL Engine optimizes the path to
fact tables, so it is forced to make a choice.
A split hierarchy may not pose join issues if each child attribute in the split
joins to the same fact table. For split hierarchies in which there are one-to-one
or one-to-many relationships between the parent and children, and you never
use the attributes to join to a different set of fact tables, the split results in the
SQL Engine consistently choosing one join path over the other, even though
that path may not be the most efficient way of joining from the fact tables to
the parent attribute for all queries. This same problem also arises when you
have split hierarchies in which there are many-to-many relationships. However,
the many-to-many relationship further compounds the issue. Because the SQL
Engine chooses one join path over the other, the join may not occur through the
proper relationship table, which can lead to an inaccurate result set.
For example, the Prescriber hierarchy contains the following tables:
Tables in Prescriber Hierarchy
There are lookup tables for the Prescriber, Drug, and Patient attributes as well
as two separate relationship tables—one to map the relationship between
Prescriber and Drug and one to map the relationship between Prescriber and
Patient. These tables contain the following data:
Table Data
If you run a report to view the drugs that prescribers have prescribed, the result
set looks like the following:
The result set correctly displays each prescriber along with the drugs they have
prescribed. The SQL for this report looks like the following:
select a12.[Prescriber_ID] AS Prescriber_ID,
a13.[Prescriber_Name] AS [Prescriber_Name],
a11.[Drug_ID] AS Drug_ID,
a11.[Drug_Name] AS Drug_Name
from [LU_DRUG] a11,
[REL_DRUG_PRESCRIBER] a12,
[LU_PRESCRIBER] a13
where a11.[Drug_ID] = a12.[Drug_ID] and
a12.[Prescriber_ID] = a13.[Prescriber_ID]
prescribers and drugs.
You could also run a report to view the patients for each prescriber. The result
set looks like the following:
Result Set for Prescriber Patient Information
The result set correctly displays each prescriber along with the patients for
whom they have prescribed drugs.
[LU_PRESCRIBER] a13
where a11.[Patient_ID] = a12.[Patient_ID] and
a12.[Prescriber_ID] = a13.[Prescriber_ID]
With the Prescription Amount metric as part of the report, this result set does
not correctly display the prescriber and patient relationships. Instead of relating
patients to prescribers who have prescribed drugs forthem, the result set relates
patients to any prescriber that prescribes drugs they have taken, regardless of
whether they actually obtained their prescription from that particular
prescriber. The SQL for this report looks like the following:
select a12.[Prescriber_ID] AS Prescriber_ID,
max(a14.[Prescriber_Name]) AS Prescriber_Name,
a11.[Patient_ID] AS Patient_ID,
max(a13.[Patient_Name]) AS Patient_Name,
sum(a11.[Presc_Amt]) AS WJXBFS1
from [FACT_PRESCRIPTIONS] a11,
[REL_DRUG_PRESCRIBER] a12,
[LU_PATIENT] a13,
[LU_PRESCRIBER] a14
where a11.[Drug_ID] = a12.[Drug_ID] and
a11.[Patient_ID] = a13.[Patient_ID] and
a12.[Prescriber_ID] = a14.[Prescriber_ID]
group by a12.[Prescriber_ID],
a11.[Patient_ID]
In this case, the result set is incorrect because the SQL Engine chooses to join
through the REL_DRUG_PRESCRIBER table.
At this point, if the logical table size is equal, the SQL Engine cannot distinguish
which path is themost efficient. Therefore,it simply picks the lookup table
based on the order of the attributes (Drug and Patient) in the system hierarchy.
Essentially, the SQL Engine chooses to join through the LU_DRUG table
because Drug is first in the system hierarchy (it was created before Patient).
The SQL Engine has no way of differentiating between the join paths because
both choices seem to be equally efficient. In actuality, depending on which
attributes you have on the report, sometimes the chosen join path creates an
incorrect result set. In a situation when you have a split hierarchy such as this
one, you need to join through the proper relationship table.
The best way to ensure that the SQL Engine always selects the most efficient
join path and uses tables that provide the desired result set is to remove the split
from the hierarchy. You can resolve split hierarchies by creating joint child
relationships.
Creating a Joint Child
The joint child relates each of the original parent and child attributes that are
involved in the split. It requires the following steps:
1 Create a relationship table that includes the parent attribute and both child
attributes.
2 Create a joint child relationship between the parent and the two child
attributes using this relationship table.
First, you need to create a relationship table that maps the relationship between
the parent attribute and both child attributes. The relationship table looks like
the following:
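A DDL sketch of such a relationship table (column types and constraints are assumptions; the table and column names follow the example):

```sql
-- Sketch: one relationship table covering the parent (Prescriber) and
-- both children (Drug, Patient), so a single join path exists from
-- fact tables to any attribute in the former split.
CREATE TABLE REL_PRESCRIBER_DRUG_PATIENT (
    Prescriber_ID INTEGER REFERENCES LU_PRESCRIBER (Prescriber_ID),
    Drug_ID       INTEGER REFERENCES LU_DRUG (Drug_ID),
    Patient_ID    INTEGER REFERENCES LU_PATIENT (Patient_ID),
    PRIMARY KEY (Prescriber_ID, Drug_ID, Patient_ID)
);
```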
After creating the relationship table, you need to perform the following steps:
2 Map the ID forms of the Prescriber, Drug and Patient attributes to the
respective ID columns in the REL_PRESCRIBER_DRUG_PATIENT table.
3 Unmap the ID forms of the Prescriber, Drug and Patient attributes from the
REL_PATIENT_PRESCRIBER and REL_DRUG_PRESCRIBER tables.
4 Remove the REL_PATIENT_PRESCRIBER and REL_DRUG_PRESCRIBER
tables.
The following image shows the option for setting the joint child relationship:
In the illustration above, when you make the Patient and Drug attributes joint
children of Prescriber, you create a structure where the two separate children
attributes involved in the original split are now jointly related to each other
through the parent attribute. Essentially, you modify the logical data model to
look like the following:
Prescriber_Name,
a12.[Patient_ID] AS Patient_ID,
max(a13.[Patient_Name]) AS Patient_Name,
sum(a11.[Presc_Amt]) AS WJXBFS1
from [FACT_PRESCRIPTIONS] a11,
[REL_PRESCRIBER_DRUG_PATIENT] a12,
[LU_PATIENT] a13,
[LU_PRESCRIBER] a14
Notice that the relationship table appears in the FROM clause. The WHERE
clause uses the same relationship table to join to the fact table. As a result, the
SQL Engine can now efficiently join to the fact table for either the Drug or
Patient attribute and deliver a valid result set that correctly portrays the
relationships between prescribers and drugs or prescribers and patients.
Recursive Hierarchies
What Is a Recursive Hierarchy?
Flattening a Recursive Hierarchy
Handling Complexities in Recursive Hierarchies
Generally, when you have a recursive attribute like Employee, you want to be
able to run reports that show managers and their corresponding employees. In
the database, the data for all three levels of employees comes from the same
columns in the same table. However, on a report, they are logically different
attributes. For example, you could run a report that looks like the following:
Report with Manager and Employee Attributes
The Level 1 Manager and Level 2 Manager attributes represent the two levels of
management, and the Employee attribute represents the lowest-level employees
who do not manage anyone. Because all three attributes map to the same
columns in the same lookup table, you need to be able to alias the table three
times in the SQL to retrieve the employee name for each of the three attributes
on the template. By default, the SQL Engine aliases a table only once. To
resolve this issue with recursive hierarchies, you need to flatten the recursive
hierarchy.
When you run a report with any one of the three attributes (Level 1 Manager,
Level 2 Manager, or Employee), it displays every employee in the lookup table
that is aliased for each attribute.
Sometimes, tables with a recursive structure also contain a level column, which
indicates the level of an element in the recursive hierarchy. For example, an
Employee would be "3," a Level 1 Manager would be "1," and a Level 2 Manager
would be "2." Using a level column like this does not resolve the SQL issues
with recursive hierarchies because the MicroStrategy SQL Engine does not look
at the specific data elements inside a table.
Flattening a Recursive Hierarchy
To flatten the recursive LU_EMPLOYEE table, you need to create three separate
lookup tables or views. The three tables or views look like the following:
Flattened Lookup Tables
After flattening the LU_EMPLOYEE table,you map the ID andDESC forms of
the three attributes as follows:
You can then change the data model to reflect the relationships between the
three attributes as follows:
a13.[Level1_Mgr_Name] AS Level1_Mgr_Name,
a11.[Level2_Mgr_ID] AS Level2_Mgr_ID,
a12.[Level2_Mgr_Name] AS Level2_Mgr_Name,
a11.[Employee_ID] AS Employee_ID,
a11.[Employee_Name] AS Employee_Name
from [LU_EMPLOYEE] a11,
[LU_LEVEL2MANAGER] a12,
[LU_LEVEL1MANAGER] a13
where a11.[Level2_Mgr_ID] = a12.[Level2_Mgr_ID] and
a12.[Level1_Mgr_ID] = a13.[Level1_Mgr_ID]
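The flattening step can be sketched as SQL views over a recursive source table. The miniature schema below is assumed for illustration (the source table name and sample employees are invented); the view and column names follow the lesson's example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EMPLOYEE_SRC (          -- recursive source: employee -> manager
    Employee_ID INTEGER, Employee_Name TEXT, Manager_ID INTEGER);
INSERT INTO EMPLOYEE_SRC VALUES
    (1, 'Joseph Duke', NULL),        -- level 1 manager (top of the hierarchy)
    (2, 'Mary Jones', 1),            -- level 2 manager
    (3, 'Paul Smith', 2);            -- lowest-level employee

-- One view (or physical table) per level of the flattened hierarchy.
CREATE VIEW LU_LEVEL1MANAGER AS
    SELECT Employee_ID AS Level1_Mgr_ID, Employee_Name AS Level1_Mgr_Name
    FROM EMPLOYEE_SRC WHERE Manager_ID IS NULL;
CREATE VIEW LU_LEVEL2MANAGER AS
    SELECT Employee_ID AS Level2_Mgr_ID, Employee_Name AS Level2_Mgr_Name,
           Manager_ID  AS Level1_Mgr_ID
    FROM EMPLOYEE_SRC
    WHERE Manager_ID IN (SELECT Level1_Mgr_ID FROM LU_LEVEL1MANAGER);
CREATE VIEW LU_EMPLOYEE AS
    SELECT Employee_ID, Employee_Name, Manager_ID AS Level2_Mgr_ID
    FROM EMPLOYEE_SRC
    WHERE Manager_ID IN (SELECT Level2_Mgr_ID FROM LU_LEVEL2MANAGER);
""")

# The report SQL can now alias each level independently, as in the lesson.
rows = con.execute("""
    SELECT a13.Level1_Mgr_Name, a12.Level2_Mgr_Name, a11.Employee_Name
    FROM LU_EMPLOYEE a11
    JOIN LU_LEVEL2MANAGER a12 ON a11.Level2_Mgr_ID = a12.Level2_Mgr_ID
    JOIN LU_LEVEL1MANAGER a13 ON a12.Level1_Mgr_ID = a13.Level1_Mgr_ID
""").fetchall()
print(rows)
```

Each view exposes one level of the hierarchy, so the SQL Engine can reference each level as a distinct table rather than aliasing one table three times.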
In the above example, if you also have fact tables in your data warehouse that
store data at the Employee level, then you could have resolved this issue using a
completely denormalized lookup table as shown below:
However, there could be situations where the fact tables contain data from
higher levels. For instance, let us assume the above example represents a service
organization. Both the level 1 and 2 managers, along with the employees who
report to them, perform billable work. The billable hours are recorded in one
fact table.
In the example discussed so far, it made sense to model three separate
attributes. There are different solutions possible based on whether there is a
need to see a fixed number of levels. In the relationship table, there is another
attribute that represents the distance that any employee is from the top of the
hierarchy.
These relationship tables are also referred to as bridge, helper, and explosion
tables.
Essentially, the relationship table captures information in such a way that it
effectively represents the parent-child relation in the hierarchy. Here is a
sample organization structure:
Employee Organization Structure
In our example, for Paul Smith the relationship table would contain multiple
entries, capturing all of Paul’s managers.
So for Paul, there would be three records in the relationship table, each
indicating the overall relationship in the hierarchy. It may
also be useful to add
another row indicating a relationship from Paul to himself with a distance of
zero. This will prove useful when the fact tables include data for both managers
and employees. The following image shows the relationship table structure and
the number of records for Paul:
Caution: if you are working with large and deep hierarchies, the
relationship table may become very large. Note that it essentially captures
all paths from any employee to the top-level manager.
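One way to populate such a relationship table is to walk each employee's management chain and emit one row per ancestor. The sketch below assumes a simple employee-to-manager mapping (invented IDs) and includes the distance-zero self-rows mentioned above:

```python
# Sketch of generating the REL_HIERARCHY (bridge) rows from parent-child data:
# for every employee, emit one row per ancestor plus a distance-zero self-row.
# The helper name and sample IDs are illustrative, not from the course.
def build_rel_hierarchy(manager_of):
    """manager_of maps employee_id -> manager_id (None at the top level)."""
    rows = []
    for emp in manager_of:
        rows.append((emp, emp, 0))           # self-relationship, distance 0
        node, dist = emp, 0
        while manager_of[node] is not None:  # walk up to the top-level manager
            node = manager_of[node]
            dist += 1
            rows.append((node, emp, dist))   # (Manager_ID, Employee_ID, Distance)
    return rows

# Joseph (1) manages Mary (2), who manages Paul (3).
rel = build_rel_hierarchy({1: None, 2: 1, 3: 2})
print(sorted(rel))
```

Note how the row count grows with depth: every employee contributes one row per level above them, which is why deep hierarchies inflate the table.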
After creating the REL_HIERARCHY table above, you need to perform the
following steps:
1 Add the REL_HIERARCHY table to the project.
2 Create the Employee attribute by mapping it to the Employee_ID column in
the LU_EMPLOYEE table.
3 Create the description form for the Employee attribute using the
Employee_Name column in the LU_EMPLOYEE table.
You could also use the other options discussed in the Attribute Roles
lesson.
6 Use heterogeneous mapping to map the Manager attribute to the
Employee_ID column in the LU_EMPLOYEE_ALIAS table.
7 Create the description form for the Manager attribute using the
Employee_Name column in the LU_EMPLOYEE_ALIAS table.
9 Make Manager the parent and Employee the child of the Distance attribute
using the REL_HIERARCHY table.
The following image shows the revised data model for the Employee hierarchy:
Now, if you wanted to see all the employees who report to Joseph Duke, you can
create the following report:
It is easy to exclude Joseph Duke from the result set by creating a report
filter on the Distance attribute.
The SQL for the above report looks like the following:
select a12.[Manager_ID] AS Manager_ID,
a13.[Employee_Name] AS Employee_Name,
a11.[Employee_ID] AS Employee_ID,
a11.[Employee_Name] AS Employee_Name0
from [LU_EMPLOYEE] a11,
[REL_HIERARCHY] a12,
[LU_EMPLOYEE] a13
where a11.[Employee_ID] = a12.[Employee_ID] and
a12.[Manager_ID]= a13.[Employee_ID]
and (a12.[Distance] > 0
and a12.[Manager_ID] in (2))
Notice that the relationship table is used to retrieve all employees and managers
who have a relationship with a distance greater than zero.
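A runnable sketch of this query pattern, using invented employee IDs and names (manager ID 2 stands in for Joseph Duke, as in the SQL above):

```python
import sqlite3

# Miniature versions of the lesson's tables; all sample data is invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_EMPLOYEE   (Employee_ID INTEGER, Employee_Name TEXT);
CREATE TABLE REL_HIERARCHY (Manager_ID INTEGER, Employee_ID INTEGER,
                            Distance INTEGER);
INSERT INTO LU_EMPLOYEE VALUES (2,'Joseph Duke'),(5,'Mary Jones'),(7,'Paul Smith');
INSERT INTO REL_HIERARCHY VALUES
    (2,2,0),(2,5,1),(2,7,2),(5,5,0),(5,7,1),(7,7,0);
""")

# Distance > 0 excludes the manager's self-row, so only subordinates appear.
rows = con.execute("""
    SELECT a13.Employee_Name AS Manager_Name,
           a11.Employee_Name
    FROM LU_EMPLOYEE a11
    JOIN REL_HIERARCHY a12 ON a11.Employee_ID = a12.Employee_ID
    JOIN LU_EMPLOYEE   a13 ON a12.Manager_ID  = a13.Employee_ID
    WHERE a12.Distance > 0 AND a12.Manager_ID IN (2)
    ORDER BY a11.Employee_ID
""").fetchall()
print(rows)
```

Dropping the `Distance > 0` condition would add the manager's own distance-zero row back into the result.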
Sometimes you may not want to query a specific branch of the hierarchy, but
rather, for any given employee, you want to find the chain of managers. The
relationship table makes this possible. For example, if you include the Distance
attribute on the report template, you can then sort the result set to see an
employee's entire reporting structure. The following image shows the report:
When you run the report, the following results are displayed:
Report Result Showing Reporting Structure for a Specific Employee
By sorting on the Distance attribute it is easy to see who Paul’s immediate
manager is and all of his indirect managers up to the highest level.
The SQL for the above report looks like the following:
select distinct a12.[Manager_ID] AS Manager_ID,
a13.[Employee_Name] AS Employee_Name,
a12.[Distance] AS Distance
from [REL_HIERARCHY] a12,
[LU_EMPLOYEE] a13
where a12.[Manager_ID] = a13.[Employee_ID]
Notice that the relationship table is used to retrieve all the managers and their
distances for Paul Smith.
Finally, what if you wanted to include business fact data in your report, such as
the hours billed by each employee. The following image shows the fact table:
3 Edit the Employee attribute and make sure that the Employee_ID attribute
form is mapped to the FACT_EMPLOYEE_BILLING_HOURS table.
5 Create the Billed Hours metric.
Then, you could run a report that shows the total number of hours billed by all
employees who are part of Joseph Duke’s reporting chain. The result set looks
like the following:
[LU_EMPLOYEE] a13
where a11.[Employee_ID]=a12.[Employee_ID] and
a11.[Employee_ID]=a13.[Employee_ID]
and a12.[Manager_ID] in (2)
group by a11.[Employee_ID]
Notice how the relationship table is used to join to the fact table to calculate the
employees' billed hours. The relationship table is also used to filter the report
for only one manager, in this case Joseph Duke.
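The billed-hours rollup can be sketched the same way; the data below is invented, and manager ID 2 again plays the role of Joseph Duke:

```python
import sqlite3

# Invented miniature data: Joseph (2) manages Mary (5), who manages Paul (7).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE REL_HIERARCHY (Manager_ID INTEGER, Employee_ID INTEGER,
                            Distance INTEGER);
CREATE TABLE FACT_EMPLOYEE_BILLING_HOURS (Employee_ID INTEGER,
                                          Billed_Hours REAL);
INSERT INTO REL_HIERARCHY VALUES (2,2,0),(2,5,1),(2,7,2);
INSERT INTO FACT_EMPLOYEE_BILLING_HOURS VALUES (2,10),(5,25),(7,40);
""")

# The relationship table both joins to the fact table and filters to one manager.
total = con.execute("""
    SELECT SUM(a11.Billed_Hours)
    FROM FACT_EMPLOYEE_BILLING_HOURS a11
    JOIN REL_HIERARCHY a12 ON a11.Employee_ID = a12.Employee_ID
    WHERE a12.Manager_ID IN (2)
""").fetchone()[0]
print(total)
```

Because the distance-zero self-row is present, the manager's own billed hours are included; filtering on `Distance > 0` would exclude them.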
Lesson Summary
In this lesson, you learned:
• In a ragged hierarchy, every child attribute element does not always have a
corresponding parent attribute element. Instead, the child attribute element
may have a direct relationship only with a grandparent attribute element.
• You can resolve issues with reporting on recursive hierarchies by flattening
the recursive attributes, creating separate lookup tables or views for each
level of recursion.
Lesson Description
In this lesson, you will learn about the impact of slowly changing dimensions on
report analysis. You will learn how to design the data warehouse model and
schema to best support them in the MicroStrategy reporting environment.
Lesson Objectives
Describe slowly changing dimensions and explain the three methods for
implementing them in a MicroStrategy project.
each store contain the following data:
Although SCDs are well documented in data warehousing literature, different
terminologies are used to distinguish the different types of SCDs. In this lesson,
we will discuss four types of SCDs:
• As Is vs. As Is (also referred to as Type I SCDs)
• As Is vs. As Was (also referred to as Type II SCDs)
The lookup tables are structured as follows:
Lookup Table for As Is vs. As Is (Type I SCDs)
LU_STORE
The LU_STORE table stores only the current relationships, reflecting the stores
to which managers are currently assigned. As a result, when you run a report to
aggregate the amount of sales for each manager, the fact table rolls up to the
store to which the manager is currently assigned:
The illustration above displays part of the LU_STORE table showing the
stores associated with a manager. The Manager_ID column is not in the
FACT_STORE_SALES table, but this column is shown in several of the
SCD illustrations in this lesson to make it easier to understand how store
sales are rolled up to managers.
Notice that for Missy and Jim, the sales for October are only for the stores
they manage as of November. For Liz, even though she did not start as an
employee until November, she has sales for October.
As the manager-store relationships change over time, there will be multiple
records in a single lookup table to map managers not only to their current
stores, but also to every store to which they were previously assigned. You can
also map these relationships along with date range values and flags that denote
which records are current.
Since Liz did not start as a manager until November, she does not have any sales
numbers for October.
Similar to the As Is vs. As Was or Type II SCDs, the schema preserves both
historical and current relationships. As a result, when you run a report to
aggregate the total sales for each manager, only data that exists unchanged in
both time periods is part of the final result set:
Missy and Jim were reassigned stores in November and Liz came on as a new
manager in November. Only Jena stayed identical in both time periods. Thus,
the report contains sales from those stores that had the same managers in both
time periods.
As Was vs. As Was
As Was vs. As Was involves analyzing data only in accordance with the attribute
relationships as they existed historically. Similar to the As Is vs. As Was or Type
II SCDs, the schema preserves both the historical and current relationships.
With this type of analysis, however, you reference only historical relationships in
queries. As a result, when you run a report to aggregate the sales for each
manager, the fact table rolls up the store sales to which the manager was
historically assigned, not the stores to which they are currently assigned:
Missy’s sales for both months roll up into the stores that she was initially
assigned to, even though in November she no longer manages Metro South and
is instead responsible for Metro West. Similarly, Jim’s sales for both months roll
up into the three stores he was initially assigned to, even though he no longer
manages Metro West in November. Since Liz did not start as an employee until
November, she is not included in the report. To include her November sales for
the Metro South store, you must query for data based on current relationships.
• As Is vs. As Is (Type I)
• Like vs. Like
Each of these types of analysis returns a different result set when querying the
same data:
• Create columns in the affected lookup table to denote which values are
current and which are historical
If reporting requirements include the need for SCDs, you can implement them
using one of the following methods:
• Creating a life stamp (uses a single lookup table)
• Creating a hidden attribute that relates to both current and historical values
(uses multiple lookup tables)
• Denormalizing the fact table (changes the fact table structure to
accommodate SCDs)
Creating a Life Stamp
If a report requires time-dependent analysis, one way to ensure that the query
retrieves the appropriate data is to include the time period for which you want
to view data in the SQL itself. You can then aggregate records according to the
attribute relationships as they existed at that point in time. You can implement
this solution by creating a life stamp.
A life stamp consists of a start date and end date that indicate the time period
for which specific records are valid. For example, in the sample scenario, you
could modify the LU_STORE table as follows:
the record itself. For records that represent the current store assignments, the
end date is set arbitrarily large (in this example, 12/31/2099).
If you implement versioning using life stamps, you need to do the following:
1 Modify the lookup table to include start date and end date columns.
The modified table includes duplicate store IDs with different start dates. As a
result, the store ID column is no longer sufficient to uniquely identify each row.
You need to create a compound primary key for the table that consists of the
store ID and start date columns.
2 Create a Start Date attribute and map it to the start date column in the
lookup table.
3 Create an End Date attribute and map it to the end date column in the
lookup table.
These two attributes are logically different from the Date attribute that
maps to the date column in the fact table.
4 Make the Start Date and End Date both parents of the Store attribute with a
one-to-many relationship.
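The steps above can be sketched as follows; the dates, IDs, and the 12/31/2099 sentinel follow the example in the text, while the sample rows themselves are invented:

```python
import sqlite3

# Life-stamp sketch: each store row is valid between Start_Date and End_Date;
# current rows get the arbitrary far-future end date from the lesson.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_STORE (
    Store_ID INTEGER, Manager_ID INTEGER,
    Start_Date TEXT, End_Date TEXT,
    PRIMARY KEY (Store_ID, Start_Date));   -- compound key from step 1
INSERT INTO LU_STORE VALUES
    (1, 100, '2003-01-01', '2003-10-31'),  -- historical assignment
    (1, 200, '2003-11-01', '2099-12-31'),  -- current assignment
    (2, 100, '2003-11-01', '2099-12-31');
""")

# As Was: which manager ran store 1 on a given historical date?
as_was = con.execute("""
    SELECT Manager_ID FROM LU_STORE
    WHERE Store_ID = 1 AND ? BETWEEN Start_Date AND End_Date
""", ("2003-10-15",)).fetchone()[0]

# As Is: filter on the arbitrary end date to keep only current rows.
as_is = con.execute("""
    SELECT Manager_ID FROM LU_STORE
    WHERE Store_ID = 1 AND End_Date = '2099-12-31'
""").fetchone()[0]
print(as_was, as_is)
```

ISO-formatted date strings compare correctly as text, which is what makes the `BETWEEN` filter work in this sketch.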
For example, in the sample scenario, you could use the following filter to
achieve As Is vs. As Was (Type II) analysis:
select a12.[Manager_ID] AS Manager_ID,
max(a12.[Manager_Name]) AS Manager_Name,
a13.[Month_ID] AS Month_ID,
max(a13.[Month_Desc]) AS Month_Desc,
sum(a11.[Sales]) AS WJXBFS1
from [FACT_STORE_SALES] a11,
[LU_STORE] a12,
[LU_DATE] a13,
If you have all of your current records set to the same arbitrary end date (in
this example, 12/31/2099), you could also retrieve current data from the
table by filtering on the End Date attribute. Only the current records would
be associated with the arbitrary end date.
Implementing SCDs using a life stamp requires a time filter; you have to create
all of the appropriate filters and prompt users on which one to select for a
particular query.
2 Create separate lookup tables for the versioned attributes to store current
and historical values.
4 Create a separate lookup table that contains a record for every manager for
every store to which they were assigned.
5 Create a hidden attribute that maps to this lookup table and relate it to both
the currentand historical attributes.
In the sample scenario, the original data model for the Sales Organization
hierarchy looks like the following:
The modified Sales Organization hierarchy contains a branch for the current
version of each attribute and a branch for the historical version of each
attribute. Right now, the data model does not show a join between the two
branches. Later, you will use the hidden attribute to relate the two branches,
enabling you to join fact table data to either current or historical manager
information.
Creating Current and Historical Lookup Tables
After modifying the data model for the Sales Organization hierarchy, you need
to create current and historical versions of the lookup tables in the data
warehouse so that you have tables to which you can map the current and
historical attributes.
The LU_CURR_STORE table stores only the most current information for
managers. In this table, each manager is related only to the stores to which
they are currently assigned.
The lookup table for the Historical Store attribute looks like the following:
The higher-level attributes in both branches of the Sales Organization
hierarchy (for example, Current Region and Historical Region)
also require tables to which you can map their attribute definitions.
If you have a distinct list of current versus historical regions, you need to build
separate lookup tables. At each attribute level, you would have both historical
and current versions of the lookup tables. However, if Region does not change
over time, then the Current Region and Historical Region attributes can pull
from the same list of regions. You do not need separate lookup tables. You can
treat them as role attributes and map them using table views, automatic
attribute role recognition, or explicit table aliasing.
Creating Current and Historical Attributes
After you create the necessary lookup tables, you are ready to create the current
and historical attributes from the modified data model.
You also need to create the other higher-level attributes in both branches and
map them totheir respective lookup tables. After creating the higher-level
attributes, youneedtodefinetheparent-childrelationshipsforbothbranches.
After creating the current and historical lookup tables and defining the current
and historical attributes, you are ready to create the hidden attribute that you
will use to create a join path from either branch of the hierarchy to the fact
tables.
To set up the hidden attribute, you first need to modify the logical data model
for the Sales Organization hierarchy to look like the following:
Modified Logical Data Model for Sales Organization Hierarchy with the
Hidden Attribute
The Store attribute ties together the two branches of the hierarchy and enables
you to join either branch to fact table data. It exists only to provide a
consolidated join path to the fact tables. It is not logically relevant to users, so it
should not be visible to them. In reports, users will see the Current Store and
Historical Store attributes, depending on the type of analysis they want to
perform. These two objects comprise the logical representation of stores that
users see. In the background, the Store attribute, which is a hidden attribute,
will join elements from either the LU_CURR_STORE or LU_HIST_STORE
table to the relevant fact data.
To set up the hidden attribute, you need to create a third lookup table that
looks like the following:
1 Create the Store attribute, mapping it to the Store_SPK_ID column in the
LU_STORE table.
2 Make the Store attribute a child of both the Current Store and Historical
Store attributes (one-to-many relationship to both attributes).
3 Make the Store attribute a hidden attribute.
3 In the Attributes folder, right-click the attribute you want to hide and select
Properties.
4 In the Properties window, in the Categories list, under the General category,
select the Hidden checkbox.
5 Click OK.
The last step in implementing a hidden attribute to handle SCDs is to key fact
tables based on the ID column of the hidden attribute. For any facts you want to
analyze for current or historical employee information, you need to ensure that
the fact tables are keyed using the Store_SPK_ID column, the surrogate
primary key in the LU_STORE table. Doing so ensures that joins from the fact
tables to either the historical or current data always occur through the Store
attribute, which references both branches of the hierarchy. The rekeyed
FACT_STORE_SALES table looks like the following:
Fact Table Keyed on Hidden Attribute
Because the FACT_STORE_SALES table is now keyed based on the Store
attribute, which relates to both the Current Store and Historical Store
attributes, you can join from the fact table to either “version” of store
information.
You can join the Hist_Store_SPK_ID column in the LU_STORE table to the
same column in the LU_HIST_STORE table.
With the hidden attribute solution in place, you can perform the various types of
analyses.
For example, if you want to see sales just for stores to which managers are currently
assigned, you could run the following report:
Since you want to view current information, the template contains the Current
Manager and Store attributes, which ensures that the result set is retrieved from
the LU_CURR_STORE table. The SQL for this report looks like the following:
select a13.[Curr_Manager_ID] AS Curr_Manager_ID,
max(a13.[Curr_Manager_Name]) AS Curr_Manager_Name,
a12.[Curr_Store_ID] AS Curr_Store_ID,
max(a13.[Curr_Store_Name]) AS Curr_Store_Name,
sum(a11.[Sales]) AS WJXBFS1
from [FACT_STORE_SALES] a11,
[LU_STORE] a12,
[LU_CURR_STORE] a13
where a11.[Store_SPK_ID] = a12.[Store_SPK_ID] and
a12.[Curr_Store_ID] = a13.[Curr_Store_ID]
group by a13.[Curr_Manager_ID],
a12.[Curr_Store_ID]
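A minimal sketch of this surrogate-key join, assuming (for illustration only) two surrogate rows for a single logical store, one per manager "version":

```python
import sqlite3

# Invented miniature data. Two Store_SPK_ID values (1 and 2) represent the
# same logical store under different manager versions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_STORE (Store_SPK_ID INTEGER, Curr_Store_ID INTEGER,
                       Hist_Store_SPK_ID INTEGER);
CREATE TABLE LU_CURR_STORE (Curr_Store_ID INTEGER, Curr_Store_Name TEXT,
                            Curr_Manager_ID INTEGER, Curr_Manager_Name TEXT);
CREATE TABLE FACT_STORE_SALES (Store_SPK_ID INTEGER, Sales REAL);
INSERT INTO LU_STORE VALUES (1, 10, 1), (2, 10, 2);
INSERT INTO LU_CURR_STORE VALUES (10, 'Metro South', 7, 'Liz');
INSERT INTO FACT_STORE_SALES VALUES (1, 100.0), (2, 250.0);
""")

# The fact table joins through the hidden Store attribute (Store_SPK_ID) and
# then to the current branch, exactly as in the report SQL above.
rows = con.execute("""
    SELECT a13.Curr_Manager_ID,
           MAX(a13.Curr_Manager_Name),
           a12.Curr_Store_ID,
           MAX(a13.Curr_Store_Name),
           SUM(a11.Sales)
    FROM FACT_STORE_SALES a11
    JOIN LU_STORE a12      ON a11.Store_SPK_ID  = a12.Store_SPK_ID
    JOIN LU_CURR_STORE a13 ON a12.Curr_Store_ID = a13.Curr_Store_ID
    GROUP BY a13.Curr_Manager_ID, a12.Curr_Store_ID
""").fetchall()
print(rows)
```

Swapping `LU_CURR_STORE` for `LU_HIST_STORE` (and the corresponding key columns) would route the same fact rows through the historical branch instead.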
If you want to view just historical information for managers who have
previously managed other stores, you could run the following report:
Since you want to view historical information, the template contains the
Historical Manager and Store attributes, which ensures that the result set is
retrieved from the LU_HIST_STORE table. However, the LU_HIST_STORE
table contains both current and historical information (for performing
current-historical comparisons), so you need to ensure that you only retrieve records
from this table that are historical store assignments for managers. You limit the
result set to the historical records by filtering on the MRR_Flag column in the
LU_HIST_STORE table:
The MRR_Flag column (MRR stands for most recent record) exists in the
LU_HIST_STORE table to denote which records are current store assignments.
Current assignments have a value of “Y,” while past assignments have a value of
“N.”
To filter on this column, you need to create an MRR Flag attribute that maps to
the MRR_Flag column in the LU_HIST_STORE table. This attribute is a parent
of the Historical Store attribute. When you create the MRR Flag attribute, you
include it in the logical data model as follows:
Logical Data Model for Sales Organization Hierarchy with MRR Flag
After you have created this attribute, you can use it in the report filter to include
only records where the MRR Flag is set to “N.” This filter limits the result set to
historical records. The SQL for this report looks like the following:
select a13.[Hist_Manager_ID] AS Hist_Manager_ID,
max(a13.[Hist_Manager_Name]) AS Hist_Manager_Name,
a12.[Hist_Store_SPK_ID] AS Hist_Store_SPK_ID,
max(a13.[Hist_Store_Name]) AS Hist_Store_Name,
sum(a11.[Sales]) AS WJXBFS1
from [FACT_STORE_SALES] a11,
[LU_STORE] a12,
[LU_HIST_STORE] a13
where a11.[Store_SPK_ID] = a12.[Store_SPK_ID] and
a12.[Hist_Store_SPK_ID] =
a13.[Hist_Store_SPK_ID] and
a13.[MRR_Flag] in (‘N’)
group by a13.[Hist_Manager_ID],
a12.[Hist_Store_SPK_ID]
Since you want to view current and historical information, the query needs to
access the LU_HIST_STORE table, which contains bothcurrent and historical
store assignments for each manager. Therefore, the template contains the
Historical Manager and Store attributes. If the MRR Flag is also used on the
template, then you can further distinguish between the current and historical
records. The SQL for this report looks like the following:
select a13.[Hist_Manager_ID] AS Hist_Manager_ID,
max(a13.[Hist_Manager_Name]) AS Hist_Manager_Name,
a13.[MRR_Flag] AS MRR_Flag,
a12.[Hist_Store_SPK_ID] AS Hist_Store_SPK_ID,
max(a13.[Hist_Store_Name]) AS Hist_Store_Name,
sum(a11.[Sales]) AS WJXBFS1
from [FACT_STORE_SALES] a11,
[LU_STORE] a12,
[LU_HIST_STORE] a13
where a11.[Store_SPK_ID] = a12.[Store_SPK_ID] and
a12.[Hist_Store_SPK_ID] = a13.[Hist_Store_SPK_ID]
group by a13.[Hist_Manager_ID],
a12.[Hist_Store_SPK_ID],
a13.[MRR_Flag]
The LU_HIST_STORE table contains records that associate stores with all
of their managers, both current and historical. However, if you run a report that
displays only the Historical Store and Sales, the result looks like the following:
Even if you display the store names along with the IDs, you still get a result set
that does not group the records into a single row for each store. Because the
LU_STORE table relates all records (historical and current) to the fact table,
it contains multiple sales records for each store.
If you want to display the sales by each store regardless of the manager, you can
modify the structure of the Sales Organization hierarchy and underlying tables
to achieve such a report. To resolve this requirement, you need to do the
following:
First, you can modify the Sales Organization hierarchy to look like the following:
After modifying the hierarchy, you need to create a lookup table for the All Store
attribute and modify the LU_STORE table so that you can relate the All Store
and Store attributes. The LU_ALL_STORE and LU_STORE tables look like the
following:
Lookup Table for All Store Attribute
The LU_ALL_STORE table contains a single record for each store. Therefore,
you can use it as the lookup table from which to pull store information if you
want to group sales by each store, regardless of manager. You modify the
LU_STORE table to include the ID column from the LU_ALL_STORE table.
This foreign key relates the two tables.
After modifying the hierarchy and schema, you can create the All Store attribute
and relate it to the hidden Store attribute as follows:
Mapping of All Store Attribute
The All Store attribute maps to the All_Store_ID and All_Store_Name columns
in the LU_ALL_STORE table as well as the All_Store_ID column in the
LU_STORE table. You need to add the hidden Store attribute as a child of the
All Store attribute with a one-to-many relationship.
Now, if you want to view the total sales for each store regardless of the
managers, you can build a report that looks like the following:
Notice that the LU_STORE and LU_ALL_STORE tables are in the FROM
clause. In the WHERE clause, the query joins to the fact table based on the
Store_SPK_ID column, which maps to the hidden Store attribute. Then, it joins
the Store attribute to the All Store attribute based on the All_Store_ID column
that relates the two attributes.
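A sketch of that join path, with invented data: several surrogate rows in LU_STORE collapse to one row per store in LU_ALL_STORE, so sales group by store regardless of manager:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_ALL_STORE (All_Store_ID INTEGER, All_Store_Name TEXT);
CREATE TABLE LU_STORE (Store_SPK_ID INTEGER, All_Store_ID INTEGER);
CREATE TABLE FACT_STORE_SALES (Store_SPK_ID INTEGER, Sales REAL);
INSERT INTO LU_ALL_STORE VALUES (10, 'Metro South');
-- Two surrogate rows (manager versions) map to the same All Store row.
INSERT INTO LU_STORE VALUES (1, 10), (2, 10);
INSERT INTO FACT_STORE_SALES VALUES (1, 100.0), (2, 250.0);
""")

# Fact -> hidden Store (Store_SPK_ID) -> All Store (All_Store_ID).
rows = con.execute("""
    SELECT s.All_Store_Name, SUM(f.Sales)
    FROM FACT_STORE_SALES f
    JOIN LU_STORE a     ON f.Store_SPK_ID = a.Store_SPK_ID
    JOIN LU_ALL_STORE s ON a.All_Store_ID = s.All_Store_ID
    GROUP BY s.All_Store_Name
""").fetchall()
print(rows)
```

The All_Store_ID foreign key is what lets both manager versions of the store roll up into a single result row.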
Denormalizing Fact Tables
A final alternative for implementing SCDs is to denormalize the fact tables that
contain the versioned attribute. Denormalization means introducing
redundancy into the fact table structure.
For example, the original structure of the FACT_STORE_SALES table looks
like the following:
Original Fact Table Structure
The FACT_STORE_SALES table stores only the store and date IDs, so the sales
data is available only by store and date. Performing “version” analysis using this
table is difficult because it does not contain manager information. Since a
manager can be assigned to different stores at different times, you have no way
of knowing which store is associated with which manager on a given date.
You can denormalize the fact table to include not only the lowest-level attribute
(in this case, Store), but also the higher-level attribute (in this case, Manager),
as follows:
manager and store levels removes the need to incorporate SCDs into the lookup
tables themselves.
The fact table is larger if manager information is stored in it, but this
denormalization does provide a less complex answer to SCDs. Depending on the
volume of data in your fact tables and the way in which data is captured in the
source system, this option may or may not be viable in your own environment.
If you want users to see different attributes on reports depending on which
"version" of an attribute they are viewing, you can further denormalize fact
tables to include separate ID columns for current and historical values.
In this lesson, you learned:
• As Is vs. As Is (Type I) is a type of SCDs that involves analyzing all data in
accordance with the attribute relationships as they exist currently.
• As Was vs. As Was is a type of SCDs that involves analyzing data only in
accordance with the attribute relationships as they existed historically.
lookup table that indicate the time period for which specific records are
valid. When you run reports, you can then use filters with the appropriate
date ranges to determine how data is aggregated.
• With a hidden attribute solution, you have multiple lookup tables that store
current and historical values, and you use the hidden attribute to facilitate
joins between the lookup tables and any fact tables.
• A final alternative for implementing SCDs is to denormalize the fact tables
that contain the "versioned" attribute. You can denormalize the fact table to
include the lowest-level attribute as well as the higher-level attribute, thus
moving the relationship in which the versioning occurs into the fact table
itself.
DATA WAREHOUSE OPTIMIZATION
Appendix Description
In this appendix, you will learn about each of these concepts and their
impact on report analysis. You will learn about recommendations for
implementing aggregation, partitioning, and indexing in your data warehouse to
optimally supportyour MicroStrategy reporting environment.
Review of Aggregation Concepts
Define base and aggregate fact tables and describe the purpose of pre-
aggregating information in the data warehouse.
A base fact table stores fact data at the lowest levels at which a source system
records transactions. For example, the following base fact table shows data
aggregated to the lowest levels of the Customer, Time, and Location hierarchies:
Sales Base Fact Table
The Sale_Amt and Txn_Qty fact data are stored at the level of Store, Customer,
and Date, which are the lowest-level attributes in each of the respective
hierarchies. Depending on the type of query you run, the database can either
select the records directly from the FACT_SALES table, or it has to aggregate
records on the fly. For example, you could run two different reports with the
following templates:
Querying Against the Sales Base Fact Table
The first report requests data at the same level at which it is stored in the
FACT_SALES table. To retrieve the result set for this report, the query only
needs to select the desired records from the fact table.
The second report contains the State attribute, so it requests data at a higher
level than it is stored in the FACT_SALES table. To retrieve the result set for
this report, the query must aggregate all of the store records in the
FACT_SALES table into their corresponding states. This calculation is done on
the fly since the data is not stored at the state level in the fact table.
If you often run reports like the second one in your environment, the database
always has to do this calculation on the fly. Performing this aggregation at run
time results in a more complex query, longer processing time, and more
database resources allocated to processing the query. All of these factors can degrade
performance.
To optimize performance, you could build an aggregate fact table, which is
simply a fact table where the data is pre-aggregated and stored at a higher level
for one or more hierarchies. This type of table is also called a summary table.
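A minimal sketch of building such a summary table, assuming an invented store-to-state mapping and invented sales rows:

```python
import sqlite3

# Pre-aggregate store-level sales to the state level once, so state-level
# reports read the small summary table instead of re-aggregating on the fly.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_STORE (Store_ID INTEGER, State_ID TEXT);
CREATE TABLE FACT_SALES (Store_ID INTEGER, Date_ID TEXT, Sale_Amt REAL);
INSERT INTO LU_STORE VALUES (1,'VA'),(2,'VA'),(3,'MD');
INSERT INTO FACT_SALES VALUES
    (1,'2003-10-01',10),(2,'2003-10-01',20),(3,'2003-10-01',5);

-- The aggregate (summary) fact table, built once from the base fact table.
CREATE TABLE AGG_STATE_SALES AS
    SELECT s.State_ID, f.Date_ID, SUM(f.Sale_Amt) AS Sale_Amt
    FROM FACT_SALES f JOIN LU_STORE s ON f.Store_ID = s.Store_ID
    GROUP BY s.State_ID, f.Date_ID;
""")

# A state-level report now reads the summary table directly.
rows = con.execute(
    "SELECT State_ID, Sale_Amt FROM AGG_STATE_SALES ORDER BY State_ID"
).fetchall()
print(rows)
```

The trade-off discussed below is exactly this: the summary table must be stored and refreshed, so it only pays off if state-level queries are frequent.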
To illustrate this, you could build an aggregate fact table that could be used for
the second report:
Querying Against the Sales Aggregate Fact Table
You build aggregate fact tables in your data warehouse. To use them in a
MicroStrategy project, you add them to your project’s warehouse catalog. The
SQL Engine references the logical table size of each table in your project to
know when it is appropriate to use an aggregate fact table.
Compression Ratios
When you judiciously build aggregate fact tables in your data warehouse,
pre-aggregating data increases query performance. However, building aggregate fact
tables that are not used at all, are used infrequently, or require a lot of
maintenance can negate the benefits of pre-aggregation. In such cases,
aggregate fact tables can be ineffective or time-consuming to maintain.
Therefore, it is important to devise an aggregation strategy that takes into
account critical factors for determining whether an aggregate fact table is
necessary. You should consider the following factors:
• Query profile
• Compression ratios
• Attribute relationship volatility
Query Profile
It only makes sense to build, store, and maintain aggregate fact tables that users
will frequently query when running reports. You may have a data warehouse
comprised of 15 hierarchies, all of which contain multiple levels of attributes.
However, if you have certain hierarchies where users rarely query higher-level
attributes, the cost of occasionally aggregating data for those hierarchies is less
than storing and maintaining aggregate fact tables that are seldom used.
In some cases, determining that queries rarely access certain tables is very
straightforward. For example, you have the following data model:
In the image above, users run many reports that contain the Year attribute, so
data is often aggregated to the highest level of the Time hierarchy. They seldom
run reports where they view customer information above the Customer State
level, so data is rarely aggregated to the highest level of the Customer hierarchy.
Given this query profile, you can easily conclude that building a FACT_SALES
table aggregated to the Year level is worthwhile because users frequently query
data at this level. However, you do not need to pre-aggregate data for the
Customer hierarchy beyond the level of Customer State. Users seldom run
queries at the Customer Region level. When they do need to query customer data at the Customer Region level, you can easily aggregate data on the fly from the Customer State level without a significant decline in query performance.
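The kind of on-the-fly rollup described above can be sketched in a few lines of Python (the state, region, and sales values here are illustrative, not from the course's sample warehouse):

```python
from collections import defaultdict

# Hypothetical state-level sales rows: (state, region, sales).
state_sales = [
    ("Virginia", "South", 1200.0),
    ("Maryland", "South", 800.0),
    ("New York", "Northeast", 2500.0),
    ("Vermont", "Northeast", 300.0),
]

def rollup_to_region(rows):
    """Aggregate state-level records to the region level on the fly."""
    totals = defaultdict(float)
    for state, region, sales in rows:
        totals[region] += sales
    return dict(totals)

print(rollup_to_region(state_sales))
# {'South': 2000.0, 'Northeast': 2800.0}
```

Because Customer Region has few elements, this rollup touches relatively few State-level records, which is why the on-the-fly aggregation is cheap.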
In this example, you need to take a closer look at the actual reports that users
are running and the tables these reports use. Specifically, users are interested in
analyzing sales data at the State level, but they often just want to see sales data
for new stores or old stores. This type of query requires qualification on the New
attribute that is related to the Store attribute. You can access the New attribute
only through the LU_STORE table.
Even though users want to view the data aggregated to the State level, accessing
this information from a FACT_SALES table pre-aggregated to the State level is
possible only if they want to see sales data for all stores. If users run reports that qualify on old or new stores, those reports have to query the base fact table to satisfy the filter conditions. Since reports that request store sales for only new stores or old stores make up the bulk of the analysis, queries would largely ignore an aggregate fact table that stores the data at the State level.

Attribute Relationship Volatility

Attribute relationships can be volatile, and relationships that involve employee structures, geographic divisions, or customer demographics tend to change frequently. For example, you could have a Location hierarchy that looks like the following:

District-Store Relationships

The Northern Virginia district contains stores in Arlington and Fairfax, while the Western Virginia district consists of stores in Winchester and Manassas. If
you build an aggregate fact table at the District level, sales for Arlington and
Fairfax stores roll up into the Northern Virginia district, while sales for the
Manassas and Winchester stores roll up into the Western Virginia district:
The configuration of stores into districts can change frequently as new stores
open, old stores close, and some stores shift to new districts. For example, the
previous district-store structure is in place through November 2012. In December 2012, the company changes their district-store structure as follows:
Revised District-Store Relationships
The Arlington store has closed, so it no longer exists as part of the Northern
Virginia district. A new store has opened in Harrisonburg, so it has been added
to the Western Virginia district. Also, the Manassas store has been moved from
the Western Virginia to the Northern Virginia district. If you look at December
2012 data for the aggregate fact table built at the District level, sales for Fairfax
and Manassas stores now roll up into the Northern Virginia district, while sales
for the Winchester and Harrisonburg stores roll up into the Western Virginia
district.
Every time these district-store relationships change, you have to rebuild the aggregate fact table. When these changes occur frequently, they consume
time and resources, and they significantly add to batch processing time and
complicate batch processing routines. Therefore, attributes with volatile parent-
child relationships are often not the best choices for inclusion in aggregate fact
tables since the maintenance overhead can outweigh any performance benefits,
especially if the tables involved are large or the length of your batch processing window is an issue in your database environment.
With aggregate fact tables, the database administrator manually defines and maintains the procedures that determine how data from a
lower level is pre-aggregated to a higher level. Volatility is a problem because of
the time and effort involved in updating these procedures to rebuild aggregate
fact tables when changes occur.
Depending on the database platform you use for your data warehouse, some database vendors (for example, Oracle®, DB2, Sybase®) provide database-level functionality that enables you to take advantage of the benefits of pre-aggregation even for volatile attributes. This functionality is referred to as materialized views, although various database vendors may use different terminology. Instead of building separate aggregate fact tables, you build materialized views of the base fact table. Materialized views work just like a regular view except that they are populated at the same time as the base fact table. The database administrator can define the logic that is used to roll up data from the level of the base fact table to the level of the materialized view, which then acts as an aggregate fact table.

Besides not having to maintain separate aggregate fact tables, one of the primary benefits of materialized views is how they handle changes in the data aggregation. As relationships between attributes change, the logic for rolling up data from the base fact table to the materialized view automatically changes as well. Therefore, you can use materialized views to remove the maintenance overhead generally associated with creating aggregate fact tables for volatile attributes.
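The behavior described above can be illustrated with a small Python sketch (this is a toy model, not vendor materialized-view syntax): the aggregate is repopulated whenever the base data is loaded, and a change to the rollup logic is picked up automatically on the next refresh.

```python
class MaterializedAggregate:
    """Toy model of a materialized view: the aggregate is repopulated
    whenever the base 'fact table' is loaded. Names are illustrative."""
    def __init__(self, rollup):
        self.rollup = rollup        # maps a child key to its parent key
        self.base = []              # base fact rows: (child_key, value)
        self.view = {}              # parent key -> aggregated value

    def load(self, rows):
        self.base.extend(rows)
        self._refresh()             # populated at the same time as the base

    def _refresh(self):
        self.view = {}
        for child, value in self.base:
            parent = self.rollup(child)
            self.view[parent] = self.view.get(parent, 0) + value

# The district rollup logic may change (volatile relationships); the
# refresh picks up whatever the current mapping says on the next load.
districts = {"Arlington": "N. Virginia", "Fairfax": "N. Virginia",
             "Manassas": "W. Virginia", "Winchester": "W. Virginia"}
agg = MaterializedAggregate(lambda store: districts[store])
agg.load([("Arlington", 100), ("Manassas", 50), ("Winchester", 25)])
print(agg.view)  # {'N. Virginia': 100, 'W. Virginia': 75}
```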
Compression Ratios
When you build aggregate fact tables, you aggregate a set of child records in the base fact table to produce a single parent record. For example, in the following two fact tables, the
two records for the Manassas store are rolled up into a single
record for the Northern Virginia district.
When you pre-aggregate data, the average number of child records you combine
to create a single parent record is the compression ratio between the two
attributes involved. The size of the compression ratio provides an effective way
to measure how much an aggregate fact table reduces the number of records
that must be read to satisfy queries that access the table. The primary reason for building aggregate fact tables is to reduce the number of records a query has to read.

You calculate compression ratios using the cardinality, or number of elements, that exist for each attribute. For example, the cardinalities of the attributes in the Location hierarchy are as follows:

Attribute Cardinalities for the Location Hierarchy
There are 5 regions, 20 states, 30 districts, and 3000 stores. In an environment
where users routinely query data at each level in this hierarchy, you need to
consider building aggregate fact tables. Given their respective cardinalities, the
compression ratios between attributes are as follows:
The compression ratios between Region and Store and District and Store are
both very large. However, the ratio between District and State is small. An
aggregate fact table at the Region level would average 1 record for every 600
records in the base fact table. An aggregate fact table at the District level would
average 1 record for every 100 records in the base fact table. However, an
aggregate fact table at the State level would average 2 records for every 3
records in an aggregate fact table at the District level—not a significant
difference in table size. The State-level aggregate fact table would be almost as
big as the District-level aggregate fact table, so it would not significantly reduce
the number of records being queried. Plus, you would have the additional
storage space and maintenance on the State-level aggregate fact table.
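A quick sketch of these calculations in Python, using the cardinalities from the Location hierarchy example:

```python
# Cardinalities from the Location hierarchy example in the text.
cardinality = {"Region": 5, "State": 20, "District": 30, "Store": 3000}

def compression_ratio(child, parent):
    """Average number of child-level records combined into one
    parent-level record, estimated from attribute cardinalities."""
    return cardinality[child] / cardinality[parent]

assert compression_ratio("Store", "Region") == 600    # 1 record per 600
assert compression_ratio("Store", "District") == 100  # 1 record per 100
# District -> State barely compresses: about 2 State records per 3
# District records, so a State-level aggregate saves little.
assert compression_ratio("District", "State") == 1.5
```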
Partitioning is the division of a larger table into smaller tables. You often
implement partitioning in a data warehouse to increase query performance by reducing the number of records that queries must scan to retrieve a result set.
You can also use partitioning to decrease the amount of time necessary to load
data into warehouse tables and perform batch processing.
There are two basic types of partitioning:
• Application level
• Server level

Application-level partitioning involves dividing one large table into several separate, smaller tables called partition base tables. You split the table into smaller physical tables in the database itself. Then, the application that is running
queries against the database (in this case, MicroStrategy) manages which
partitions are used for any given query. Since multiple physical tables exist, the
SQL Engine has to write SQL against different tables, depending on which tables are needed to retrieve the result set for a query. MicroStrategy supports application-level partitioning for fact tables through one of two methods—warehouse partition mapping and metadata partition mapping.

For more information on the differences between warehouse and metadata partition mapping, see the MicroStrategy Architect: Advanced Project Design course.
Advantages of Application-Level Partitioning

Partitioning by Multiple Hierarchies • Differences in Partitioning Logic • Reduced Time to Read Partitions

Implementing Application-Level Partitioning
The SQL Engine would easily be able to determine that all of the information it
needs for the query is in the FACT_SALES_MAR partition. The SQL Engine
would generate the SQL for the report against this table, and it would be the only partition accessed by the query.

With server-level partitioning, a very different scenario may occur. Depending on the database platform, the partitioning functionality of the database does not necessarily contain equivalent logic for determining ahead of time which partition to access for this query. The database cannot determine the month to which the dates in the report filter
belong. The database can only “know” that the FACT_SALES_MAR partition is
the correct table if you include a specific month in the report filter, such as
March 2012.
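The partition-pruning decision that application-level partitioning makes can be sketched as follows (the month-to-table map and the function name are hypothetical, for illustration only):

```python
from datetime import date

# Hypothetical application-level partition map: month ID -> physical table.
PARTITIONS = {"201201": "FACT_SALES_JAN", "201202": "FACT_SALES_FEB",
              "201203": "FACT_SALES_MAR"}

def partitions_for_filter(dates):
    """Resolve which partition base tables a query must touch, the way
    an application-level engine prunes partitions from the report filter."""
    months = {d.strftime("%Y%m") for d in dates}
    return sorted(PARTITIONS[m] for m in months)

# A filter entirely inside March 2012 touches a single partition.
print(partitions_for_filter([date(2012, 3, 5), date(2012, 3, 20)]))
# ['FACT_SALES_MAR']
```

Because the application holds the partition map itself, it can prune on dates; a server-level scheme that only recognizes month IDs cannot.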
Since the filter in this example is based on dates rather than a specific month ID, the database would
have no way of determining that the desired data is in
the FACT_SALES_MAR partition. As a result, the database would scan all of
the logical partitions to resolve this query. This action is the same as doing a full
table scan on an unpartitioned table, so it negates any benefits to having the partitions.
You can observe the same behavior with server-level partitioning if the filter is based on some element of time other than a specific month ID. When the filter is set to something like Current Month or Current Week, the database scans all partitions.
This behavior in the partitioning logic has been observed in several different versions of Oracle. The logic is supposed to have been fixed in Oracle 9i®, Release 2.
The reduced time to read partitions with application-level partitioning comes from the fact that each table in the partition is a physically separate table, so the SQL Engine
writes SQL that reads a much smaller slice of data from a partition table rather
than the original fact table. With server-level partitioning, the partitions are by
definition logical, not physical. Only one table exists in the data warehouse, but
you define logical partitions within the table. For a query to access one of those
logical partitions, there is always the additional overhead of the translation that
is required to find where the data is stored in the table. Simply put, it requires
less time to read a physical partition than a logical one.
Implementing Application-Level Partitioning
If you choose to use application-level partitioning, you build each of the partition base tables in your data warehouse. To use them in a MicroStrategy
project, you either add the partition base tables (if you are using metadata
partition mapping) or the partition mapping table (if you are using warehouse
partition mapping) to your project’s warehouse catalog and complete the necessary mappings. At that point, MicroStrategy Architect has the partition information for the SQL Engine to determine which partitions it needs to access for any given query.

For more information on configuring either warehouse or metadata partition mapping, see the MicroStrategy Architect: Advanced Project Design course.
Partitioning Guidelines
Attribute Relationship Volatility • Distribution of Data
After completing this topic, you will be able to:
Define guidelines for building an effective partitioning strategy in a data warehouse.
When you appropriately partition fact tables in a data warehouse, it reduces the
time it takes to run queries that access the partitioned tables, and it can make
the loading of data more efficient as well. If you have a poorly designed
partitioning strategy, partitioning can actually increase query times and maintenance overhead.
Therefore, it is important to devise a partitioning
strategy that takes into account critical factors for determining the best way to
implement partitioning. You should consider the following factors:
• Partition table size and the number of partition tables accessed by queries
• Attribute relationship volatility
• Distribution of data among partitions
• Impact of the partitioning strategy on the ETL and batch processes
In this example, each month, new stores open, old ones close, and existing ones switch to new districts. Even though this hierarchy reflects the company's structure, its volatility makes partitioning by these attributes difficult to maintain.
Distribution of Data

Another important factor in determining your partitioning strategy should be the distribution of data across the various partitioned tables. Devising partitions with an even amount of data in each table aids parallel processing by ensuring that the load time for the different partitions is fairly even.
If you partition by the Monthattribute, each of the partitions contains the sales
data for a single month. Because each month contains a nearly equal number of days, the size of each of the 12 partitions is naturally pretty even. Your company
may experience some variations in sales (for example, promotions that occur in
particular months). However, overall, the amount of data is fairly evenly
distributed among each of the partitions.
If you partition by the Region attribute, each of the partitions contains the sales
data for a single region. In this example, the company has more stores in some
regions than in others. Because of the variation in the number of stores that exist within each region, the Northeast and Southeast regions have significantly more sales data than the Central, West, and Pacific regions. Because of the geographic location of the stores, the partitions for these five regions are naturally uneven. As a result, the Northeast and Southeast partitions are much larger than the other partitions. The other problem with an uneven data distribution is that the load times for the partitions are uneven as well.
If a particular attribute you want to use to partition fact tables naturally results in an uneven distribution of data, you can combine multiple attribute elements into the same partition to address variations in partition size.
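One simple way to combine elements into balanced partitions is a greedy assignment, sketched below with illustrative row counts; this is one possible balancing approach, not a MicroStrategy feature:

```python
# Hypothetical row counts per region, deliberately uneven.
region_rows = {"Northeast": 900, "Southeast": 800, "Central": 300,
               "West": 250, "Pacific": 150}

def balance(elements, n_partitions):
    """Greedily assign elements (largest first) to the currently
    lightest partition, evening out partition sizes."""
    parts = [{"members": [], "rows": 0} for _ in range(n_partitions)]
    for name, rows in sorted(elements.items(), key=lambda kv: -kv[1]):
        lightest = min(parts, key=lambda p: p["rows"])
        lightest["members"].append(name)
        lightest["rows"] += rows
    return parts

parts = balance(region_rows, 3)
print(sorted(p["rows"] for p in parts))  # [700, 800, 900]
```

The three small regions end up combined in one partition, so the three partitions are far closer in size than five per-region tables would be.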
Table Size and Number of Partitions

Another factor to account for when determining your partitioning strategy is the size of partitions in comparison to the number of tables you must access to resolve a query. Ideally, you want to choose a partitioning strategy that minimizes table size while also reducing the number of partition tables accessed by queries.
For example, you may choose to partition a fact table by both the Month and
Region attributes as follows:
The illustration above displays the partitions only for the first 3 months of
the year for a single region since those are the only partitions needed for
the sample report. However, 12 partitions (one for each month of the year)
would exist for each region.
In this scenario, you have 12 partitions for each region—one for each month.
Partitioning by both the Region and the Month attributes makes each of the
tables much smaller than the original fact table. Also, partitioning by a time
attribute automatically limits the size of each table since you no longer add data
to the table after that particular time period has passed. While reducing table size is desirable, partitioning by an attribute that decreases the table size can also create partitions that are so small that most queries have to cross partitions.
In the sample report, users want to view sales information for Q1 2012 for the
Southeast region. Because each partition contains only 1 month of data, the query has to access all three partition tables to retrieve the result set. Although the smaller table size reduces the number of records a query has to scan, if users most frequently query at levels of time higher than month (like quarter), most queries have to access multiple tables and then join the data from each one
to produce the result set. When queries cross tables, the series of joins required
to compile the result set often override any performance benefits that come
from accessing smaller tables.
In this example, a better strategy is to partition by the Quarter attribute:

Report Query Accesses a Single Partition
The size of each partition is still smaller than the original FACT_SALES table.
However, with the table partitioned by the Quarter attribute, you can now
retrieve the result set for the same report by accessing a single table, eliminating
the unnecessary and time-consuming joins that occur if you partition the tables
by the Month attribute.
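The difference in the number of tables a Q1 query touches under the two strategies can be sketched as follows (the table names are illustrative):

```python
# A Q1 2012 report needs months 1-3. Count the tables accessed under
# each hypothetical partitioning scheme.
report_months = [1, 2, 3]

def tables_accessed(partition_level):
    if partition_level == "month":
        return {f"FACT_SALES_2012_M{m}" for m in report_months}
    if partition_level == "quarter":
        # Months 1-3 all map to quarter 1.
        return {f"FACT_SALES_2012_Q{(m - 1) // 3 + 1}" for m in report_months}

assert len(tables_accessed("month")) == 3    # three tables to scan and join
assert len(tables_accessed("quarter")) == 1  # a single partition satisfies Q1
```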
Generally, you want to ensure that you consider not just the table size but also
your query profile when determining which attributes to use for a partition. If
you partition a fact table at a lower level than what is requested by most queries,
having to frequently scan multiple partitions and join data for report results can
negate any performance benefits that may come from reducing the table size.

You can use the MicroStrategy Enterprise Manager™ to learn more about how users query the data in your warehouse. This application enables you to view reports that display statistics about user actions, object information, and report processing.
For more information on Enterprise
Manager, see the MicroStrategy Administration: Application Management
course.
For example, you do not want to establish partitions that are so complex and so far removed from how the data is structured in the source system that you end up adding hours to the length of the ETL or batch process. You also do not want to set up partitions that are tedious to back up or that unnecessarily complicate the backup routine. Again, you want to find a balance between configuring partitions so that you achieve a performance gain, while ensuring that you do not make the loading, maintenance, or backup of data more complex and time consuming.
Overview of Indexing
Indexes are database objects that enable quick access to the data in a table. They operate by storing pointers (index values) to the data values that are stored in table columns. You can base indexes on key values (a primary key or foreign key) and create them on a single column or multiple columns. The database administrator specifies the sort order of these pointers when creating the index.
Just as you might search an index in a book to find a page number where information on a certain topic is contained, when you run a query against an indexed table, the database searches the index to find the requested values and then follows the pointers to the rows in the table where those values are stored.
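A minimal Python sketch of this idea, using the sample customer records from this lesson: the index maps each Cust_ID value to row positions, so a lookup follows pointers instead of scanning every row.

```python
# Sample rows: (name, Cust_ID). A table scan would check every row;
# the index jumps straight to the matching positions.
rows = [("Henry Miller", 17), ("Sara Wilder", 12), ("Natalie Evans", 25),
        ("Todd Elliott", 15), ("Mark Stevens", 19)]

index = {}  # Cust_ID value -> list of row numbers (the "pointers")
for rownum, (name, cust_id) in enumerate(rows):
    index.setdefault(cust_id, []).append(rownum)

def lookup(cust_id):
    """Follow the index pointers to the matching rows."""
    return [rows[r] for r in index.get(cust_id, [])]

print(lookup(15))  # [('Todd Elliott', 15)]
```

Real database indexes keep these pointers sorted on disk (B-trees, bitmaps, and so on, discussed below), but the pointer-following principle is the same.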
Using an index to find a particular row is more efficient than scanning the entire base table without an index, but indexes do consume disk space on the database server. Because you have to update the index of pointers as data is added, modified, or deleted, indexes increase the time it takes to insert a new record or update or delete an existing record. Provided you use indexes judiciously and base them on the requirements of your query profile, the benefits of querying tables with an index far outweigh the cost of building and maintaining the index itself.
Index Types and Guidelines

B-Tree Index • Bitmap Index • Index-Organized Tables • Indexing Guidelines

Describe various types of indexes, identify the best uses of each type of index, and define guidelines for building effective indexes.
The State_ID column is defined as the primary key for the LU_STATE table. As a result, an index is automatically created based on the values in the State_ID column. Depending on the volume and degree of normalization of a lookup
table, you may choose to define indexes on additional columns beyond the
primary key. If the table volume is high or if the table is highly denormalized,
you may want additional indexes so that values in the tables are easier to find
when joining it to another table.
For fact tables, MicroStrategy recommends that you do not define the
foreign keys that reference the related hierarchies as primary keys.
Therefore, you need to individually define indexes on fact table columns
that you frequently query.
When you need to define individual indexes beyond the primary key, it is important to understand the various types of indexes, how they work, and when it is best to implement a particular type. Depending on the type of index you are using, you can create multiple indexes on lookup and fact tables to more efficiently access data. The names for these index types vary from one database vendor to another, but the conditions for which each index type is best suited are fairly constant, regardless of the database platform you use. The following are some of the common types of indexes:
• B-tree
• Bitmap
• Index-organized tables
B-Tree Index
A B-tree index has a structure somewhat like a family tree. The index begins with a root. The root can be any record within the table, but it is determined by
a database algorithm to ensure a balance in the number of records on both sides
of the tree. Each row in the table is compared to the root value. Values greater
than the root value are placed to the right in the B-tree structure, and values less
than the root value are placed to the left.
Using a B-tree index on the Cust_ID column for these five records results in the following structure:

B-Tree Index
The database selects Henry Miller as the root, and it assigns an index value to
his record. From there, the database compares each row in the table to the root
first. It looks at Sara Wilder. Her ID value of 12 is smaller than Henry Miller’s,
which is 17, so the database places her record to the left of the root. Next comes Natalie Evans. Her ID value of 25 is greater than the value of the root, so the database places her record to the right of the root. The next row is Todd Elliott, who has an ID value of 15. This value is less than the root value (17), but greater than the value of Sara Wilder (12). Accordingly, working along the tree, the database places his record to the right of Sara Wilder. Finally, the value for Mark Stevens is 19, which is greater than the root value (17) but less than the value for Natalie Evans (25). The database places his record to the left of Natalie Evans. This process continues for each row in the table. As the database compares each row ID to the structure of the B-tree, it assigns an index value to the corresponding record and determines its place in the B-tree index.

It is most appropriate to use B-tree indexes for higher-cardinality attributes like Customer. For example, consider the cardinalities of the attributes in the Customer hierarchy:

Attribute Cardinalities for the Customer Hierarchy
B-tree indexes would be useful for the lookup tables for the Customer City and
Customer attributes, which both have high cardinalities. Although the tree
structure for an attribute like Customer would be very complex due to the
volume of customers, B-tree indexes are generally lower maintenance and do
not require too much reorganizing or rebuilding unless a significant number of
updates or inserts occur in the table. This lower maintenance cost makes them
more suitable for high-cardinality attributes than other types of indexes. You
should remember that due to the size of these indexes, they require a lot of disk
space.
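The left/right placement walk-through above can be sketched with a simple binary search tree (a real B-tree node holds many keys and stays balanced, but the comparison logic is the same):

```python
# Sketch of the binary-tree placement described in the text.
class Node:
    def __init__(self, key, name):
        self.key, self.name = key, name
        self.left = self.right = None

def insert(root, key, name):
    if root is None:
        return Node(key, name)
    if key < root.key:
        root.left = insert(root.left, key, name)   # smaller values go left
    else:
        root.right = insert(root.right, key, name) # larger values go right
    return root

# Henry Miller (17) is the root; the other rows are compared against it.
root = None
for key, name in [(17, "Henry Miller"), (12, "Sara Wilder"),
                  (25, "Natalie Evans"), (15, "Todd Elliott"),
                  (19, "Mark Stevens")]:
    root = insert(root, key, name)

assert root.left.name == "Sara Wilder"         # 12 < 17
assert root.left.right.name == "Todd Elliott"  # 15 > 12 but < 17
assert root.right.name == "Natalie Evans"      # 25 > 17
assert root.right.left.name == "Mark Stevens"  # 17 < 19 < 25
```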
Bitmap Index
A bitmap index orders the rows in a table using binary strings that are
generated and assigned by the database. The database uses an algorithm to
create the binary strings used for the index. Each string is the same byte length, but the pattern of each string varies. The differences in the pattern denote the
order of binary indexes. The database determines which patterns are considered
smaller or larger. The smallest (or lowest) binary string references the row with the smallest (or lowest) value. Accordingly, the database orders the rows in the table by their corresponding binary strings, from lowest (smallest) to highest (largest). For example, using a bitmap index on the sample customer data orders the rows as follows:

Bitmap Index
Since Sara Wilder has the lowest ID value, the database orders this row first in
the table index. From there, the database assigns indexes to each row in the
table working up to the customer with the highest ID value.
Because bitmap indexes require the database to generate and assign a binary
string to each row value, they are best reserved for use on lookup tables for low
cardinality attributes, such as some higher-level attributes, flags, and status and
type indicators. For example, consider the cardinalities of the attributes in the
Customer hierarchy.
You should use bitmap indexes only on lookup tables for the higher-level
attributes like Customer Region and Customer State. The cardinality of the
Customer City and Customer attributes is so high that the resources involved in
building and maintaining the bitmap indexes would impose a burden on the
database server that outweighs any performance benefits you could derive from
the indexes.
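The idea can be sketched in Python as one bit vector per distinct value, with one bit per row (the region values are illustrative):

```python
# Sketch of a bitmap index on a low-cardinality column.
regions = ["Northeast", "Southeast", "Northeast", "Central", "Southeast"]

bitmaps = {}  # distinct value -> bit vector, one bit per row
for rownum, value in enumerate(regions):
    bitmaps.setdefault(value, [0] * len(regions))[rownum] = 1

# Each row sets exactly one bit across the value bitmaps.
assert bitmaps["Northeast"] == [1, 0, 1, 0, 0]
assert bitmaps["Southeast"] == [0, 1, 0, 0, 1]

# Qualifying on a value is a cheap scan of one bit vector; combining
# conditions is bitwise AND/OR. This is why bitmaps suit low-cardinality
# attributes like Customer Region, flags, and status indicators.
matching_rows = [r for r, bit in enumerate(bitmaps["Southeast"]) if bit]
assert matching_rows == [1, 4]
```

With a high-cardinality column, this scheme would need one bit vector per distinct value, which illustrates why the maintenance cost grows so quickly.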
Index-Organized Tables

Another indexing alternative is to create an index-organized table, one in which the data is actually stored in the physical table in index order. This method is very useful for fact tables or lookup tables for high-cardinality attributes. The following illustration shows sample source data for the FACT_SALES table and how the same data is stored in an index-organized table in the data warehouse:
In the illustration above, only the columns in the FACT_SALES table that
are used in the report are included in the source and index-organized
sample data. However, theactual table would contain all of the columns
referenced in the schema.
Notice the order of the records in the source system table. In the data
warehouse, the FACT_SALES table is index organized. The physical table
actually stores the data with each record ordered by Store_ID, Date_ID, and
Cust_ID, since that is how the index organization is defined. If you run a report
for sales data from the FACT_SALES table, the data is retrieved and displayed in the indexed order. Defining an index-organized table is an excellent solution for facilitating data retrieval from large tables without placing an undue burden on the database server.
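The storage idea can be sketched in a few lines (the data values are illustrative): loading an index-organized table amounts to keeping the rows physically sorted on the composite index key.

```python
# Sketch: an index-organized table physically stores rows in index-key
# order (Store_ID, Date_ID, Cust_ID), so reads come back sorted.
source_rows = [  # (Store_ID, Date_ID, Cust_ID, Sales) - illustrative data
    (2, 20120101, 7, 50.0),
    (1, 20120102, 3, 20.0),
    (1, 20120101, 5, 35.0),
    (2, 20120101, 1, 10.0),
]

# "Loading" the index-organized table sorts on the composite key.
iot = sorted(source_rows, key=lambda r: (r[0], r[1], r[2]))

assert [r[:3] for r in iot] == [(1, 20120101, 5), (1, 20120102, 3),
                                (2, 20120101, 1), (2, 20120101, 7)]
```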
Indexing Guidelines
Various types of indexes can be effective depending on the size of a table, and
choosing the most appropriate type of index is certainly an important step in
setting up an index that you can efficiently use and maintain. Regardless of the
type of index you select for a given table, the following are some general points to consider when you build indexes on any table:
• Table joins
• Degree of denormalization
• Frequently filtered elements
• Number of indexes on a single table
Table Joins
If you are building an index for a table, you should pay close attention to how
that table is joined to other tables when users run reports. If you examine the SQL being generated for queries and you notice that certain columns in a table are frequently used to join that table to other tables, you may want to consider building an index on that column. This way, the database can more quickly
locate the data used in joins.
Degree of Denormalization