Advanced Data Warehousing


MICROSTRATEGY UNIVERSITY

MicroStrategy

ADVANCED DATA WAREHOUSING


© 2000–2013 MicroStrategy Incorporated. All rights reserved.

This Course (course and course materials) and any Software are provided “as is”
and without express or limited warranty of any kind by either MicroStrategy,
Inc. or anyone who has been involved in the creation, production, or
distribution of the Course or Software, including, but not limited to, the
implied warranties of merchantability and fitness for a particular purpose. The
entire risk as to the quality and performance of the Course and Software is with
you. Should the Course or Software prove defective, you (and not MicroStrategy,
Inc. or anyone else who has been involved with the creation, production, or
distribution of the Course or Software) assume the entire cost of all necessary
servicing, repair, or correction.

In no event will MicroStrategy, Inc. or any other person involved with the
creation, production, or distribution of the Course or Software be liable to you
on account of any claim for damage, including but not limited to any special,
incidental, consequential, or exemplary damages, including any lost profits,
lost savings, or damages assessed against


arising fromthe use, inability to use, quality, or performance of such Course and
Software, even if MicroStrategy, Inc. or any such other person or entity has been
advised of the possibility of such damages, or for the claim by any other party.

In addition, MicroStrategy, Inc. or any other person involved in the creation,
production, or distribution of the Course and Software shall not be liable for
any claim by you or any other party for damages arising from the use, inability
to use, quality, or performance of such Course and Software, based upon
principles of contract or warranty, negligence, strict liability for
negligence, principles of indemnity or contribution, the failure of any remedy
to achieve its essential purpose, or otherwise.

The information contained in this Course and the Software are copyrighted and
all rights are reserved by MicroStrategy, Inc. MicroStrategy, Inc. reserves the
right to make periodic modifications to the Course or the Software without
obligation to notify any person or entity of such revision. Copying,
duplicating, selling, or otherwise distributing any part of the Course or
Software without prior written consent of an authorized representative of
MicroStrategy, Inc. is prohibited.

U.S. Government Restricted Rights. It is acknowledged that the

Course and Software were developed at private expense, that no part is public


domain, andthat the Course and Software are Commercial Computer Software
provided with RESTRICTED RIGHTS under Federal Acquisition Regulations
and agency supplements to them. Use, duplication, or disclosure by the U.S.
Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of
the Rights in Technical Data and Computer Software clause at DFAR 252.227-
7013 et seq. or subparagraphs (c)(1) and (2) of the Commercial Computer
Software—Restricted Rights at FAR 52.227-19, as applicable. Contractor is
MicroStrategy, Inc., 1850 Towers Crescent Plaza, Tysons Corner, Virginia 22182.
Rights are reserved under copyright laws of the United States with respect to
unpublished portions of the Software.

Copyright Information

All Contents Copyright © 2013 MicroStrategy Incorporated. All Rights Reserved.

Trademark Information

MicroStrategy, MicroStrategy 6, MicroStrategy 7, MicroStrategy 7i,
MicroStrategy 7i Evaluation Edition, MicroStrategy 7i Olap Services,
MicroStrategy 8, MicroStrategy 9, MicroStrategy Distribution Services,
MicroStrategy MultiSource Option, MicroStrategy Command Manager,
MicroStrategy Enterprise Manager, MicroStrategy Object Manager,
MicroStrategy Reporting Suite, MicroStrategy Power User, MicroStrategy
Analyst, MicroStrategy Consumer, MicroStrategy Email Delivery, MicroStrategy
BI Author, MicroStrategy BI Modeler, MicroStrategy Evaluation Edition,
MicroStrategy Administrator, MicroStrategy Agent, MicroStrategy Architect,
MicroStrategy BI Developer Kit, MicroStrategy Broadcast Server, MicroStrategy
Broadcaster, MicroStrategy Broadcaster Server, MicroStrategy Business
Intelligence Platform, MicroStrategy Consulting, MicroStrategy CRM
Applications, MicroStrategy Customer Analyzer, MicroStrategy Desktop,
MicroStrategy Desktop Analyst, MicroStrategy Desktop Designer, MicroStrategy
eCRM 7, MicroStrategy Education, MicroStrategy eTrainer, MicroStrategy
Executive, MicroStrategy Infocenter, MicroStrategy Intelligence Platform,
MicroStrategy Intelligence Server, MicroStrategy Intelligence Server Universal
Edition, MicroStrategy MDX Adapter, MicroStrategy Narrowcast Server,
MicroStrategy Objects, MicroStrategy OLAP Provider, MicroStrategy SDK,
MicroStrategy Support, MicroStrategy Telecaster, MicroStrategy Transactor,
MicroStrategy Web, MicroStrategy Web Business Analyzer, MicroStrategy World,
Application Development and Sophisticated Analysis, Best In Business
Intelligence, Centralized Application Management, Information Like Water,
Intelligence Through Every Phone, Intelligence To Every Decision Maker,
Intelligent E-Business, Personalized Intelligence Portal, Query Tone, Rapid
Application Development, MicroStrategy Intelligent Cubes, The Foundation For
Intelligent E-Business, The Integrated Business Intelligence Platform Built For
The Enterprise, The Platform For Intelligent E-Business, The Scalable Business
Intelligence Platform Built For The Internet, Office Intelligence,
MicroStrategy Office, MicroStrategy Report Services, MicroStrategy Web MMT,
MicroStrategy Web Services, Pixel Perfect, Pixel-Perfect, MicroStrategy Mobile,
MicroStrategy Integrity Manager and MicroStrategy Data Mining Services are all
registered trademarks or trademarks of MicroStrategy Incorporated.

All other company and product names may be trademarks of the respective
companies with which they are associated. Specifications subject to change
without notice. MicroStrategy is not responsible for errors or omissions.
MicroStrategy makes no warranties or commitments concerning the availability
of future products or versions that may be planned or under development.

Patent Information

This product is patented. One or more of the following patents may apply to the
product sold herein: U.S. Patent Nos. 6,154,766, 6,173,310, 6,260,050,
6,263,051, 6,269,393, 6,279,033, 6,567,796, 6,587,547, 6,606,596, 6,658,093,
6,658,432, 6,662,195, 6,671,715, 6,691,100, 6,694,316, 6,697,808, 6,704,723,
6,741,980, 6,765,997, 6,768,788, 6,772,137, 6,788,768, 6,798,867, 6,801,910,
6,820,073, 6,829,334, 6,836,537, 6,850,603, 6,859,798, 6,873,693, 6,885,734,
6,940,953, 6,964,012, 6,977,992, 6,996,568, 6,996,569, 7,003,512, 7,010,518,
7,016,480, 7,020,251, 7,039,165, 7,082,422, 7,113,993, 7,127,403, 7,174,349,
7,181,417, 7,194,457, 7,197,461, 7,228,303, 7,260,577, 7,266,181, 7,272,212,
7,302,639, 7,324,942, 7,330,847, 7,340,040, 7,356,758, 7,356,840, 7,415,438,
7,428,302, 7,430,562, 7,440,898, 7,486,780, 7,509,671, 7,516,181, 7,559,048,

7,574,376, 7,617,201, 7,725,811, 7,801,967, 7,836,178, 7,861,161, 7,861,253,
7,881,443, 7,925,616, 7,945,584, 7,970,782, 8,005,870, 8,051,168, 8,051,369,
8,094,788, 8,130,918, 8,296,287, 8,321,411, and 8,452,755. Other patent
applications are pending.

How to Contact Us

MicroStrategy University
1850 Towers Crescent Plaza
Tysons Corner, VA 22182
Phone: 877.232.7168
Fax: 703.848.8602
E-mail: education@microstrategy.com
http://www.microstrategy.com/education

MicroStrategy Incorporated
1850 Towers Crescent Plaza
Tysons Corner, VA 22182
Phone: 703.848.8600
Fax: 703.848.8610
E-mail: info@microstrategy.com
http://www.microstrategy.com
PREFACE

Course Description
This 2-day course covers advanced issues in data warehousing design and
explains how to work with these complexities when implementing a
MicroStrategy project. The course assumes an understanding of basic report
building and project design concepts from the 2-day MicroStrategy Desktop:
Reporting Essentials course, the 2-day MicroStrategy Architect: Project Design
Essentials course, and the 1-day MicroStrategy Architect: Advanced Project
Design course as well as a basic knowledge of SQL.
The course covers a variety of data warehousing topics, explaining the issues
that can affect report analysis in a MicroStrategy reporting environment and
then examining how to design the data warehouse model and schema to resolve
them. Students will learn how to model complex hierarchies and attribute
relationships, implement role attributes and slowly changing dimensions, design
the schema for optimal performance, use logical views to solve data modeling
and schema design issues, and optimize query performance using both database
and MicroStrategy functionality.

After taking this course, students will understand the complex issues that
often arise when designing, building, and querying a data warehouse, and they
will know how to best accommodate them within the MicroStrategy reporting
environment.

Who Should Take This Course


This course is designed for:

• Project architects

Course Prerequisites
Before starting this course, you should know all topics covered in the following
courses:

• MicroStrategy Desktop: Reporting Essentials

• MicroStrategy Architect: Project Design Essentials

• MicroStrategy Architect: Advanced Project Design

You should also have a basic knowledge of SQL.

Follow-Up Courses
This course does not have any recommended follow-up courses.

Related Certifications
To validate your proficiency in the content of this course, you might consider
taking the following certification:

• Certified Engine Specialist


Course Objectives
After completing this course, you will be able to:

• Describe various schema designs, identify factors to consider in choosing a
particular type of schema, and design the data warehouse schema for optimal
performance in the MicroStrategy environment.
• Describe the concept of logical views in MicroStrategy Architect and create
logical views. Use logical views to define complex attribute and fact
expressions and create distinct lists of elements for attributes in a star
schema.
• Describe advanced data modeling concepts and explain how to design the data
warehouse model and schema to support them in a MicroStrategy project.

• Describe attribute roles and explain the four methods for implementing them
in a MicroStrategy project.

• Describe slowly changing dimensions and the three methods for implementing
them in a MicroStrategy project.
About the Course Materials

This course is organized into lessons and reference appendices. Each lesson
focuses on major concepts and skills that help you to better understand
MicroStrategy products and use them to implement MicroStrategy projects. The
appendices provide you with supplemental information to enhance your
knowledge of MicroStrategy products.

Content Descriptions

Each major section of this course begins with a Description heading. The
Description introduces you to the content contained in that section.

Learning Objectives
Learning objectives enable you to focus on the key knowledge and skills you
should obtain by successfully completing this course. Objectives are provided
for you at the following three levels:

• Course—You will achieve these overall objectives by successfully


completing all the lessons in this course. The Course Objectives heading in
this Preface contains the list of course objectives.

• Lesson—You will achieve these main objectives by successfully completing
all the topics in the lesson. You can find the primary lesson objectives
directly under the Lesson Objectives heading at the beginning of each
lesson.
• Main Topic—You will achieve this secondary objective by successfully
completing the main topic. The topic objective is stated at the beginning of
the topic text. You can find a list of all the topic objectives in each lesson
under the Lesson Objectives heading at the beginning of each lesson.

Lessons
Each lesson sequentially presents concepts and guides you with step-by-step
procedures. Illustrations, screen examples, bulleted text, notes, and definition
tables help you to achieve the learning objectives.

Opportunities for Practice

This version of this course manual excludes hands-on exercises. If you are
interested in taking the complete course, please contact MicroStrategy
Education at education@microstrategy.com.

Typographical Standards
Following are explanations of the font style changes, icons, and different
types of notes that you see in this course.

Actions

References to screen elements and keys that are the focus of actions are in
bold Arial font style. The following example shows this style:

Click Select Warehouse.


Code

References to code, formulas, or calculations within paragraphs are formatted


in regular Courier New font style. The following example shows this style:

Sum(Sales)/Number of Months

Data Entry

References to literal data you must type in an exercise or procedure are in bold
Arial font style. References to data you type that could vary from user to user or
system to system are in bold italic Arial font style. The following example shows
this style:

Type copy c:\filename d:\foldername\filename.

Keyboard Keys

References to a keyboard key or shortcut keys are in uppercase letters in bold


Arial font style. The following example shows this style:

Press CTRL+B.

New Terms

New terms to note are in regular italic font style. These terms are defined
when they are first encountered in the course. The following example shows
this style:

The aggregation level is the level of calculation for the metric.

Notes and Warnings


A note icon indicates helpful information.

A warning icon calls your attention to very important information that you
should read before continuing the course.
Other MicroStrategy Courses

Core Courses
• Implementing MicroStrategy: Development and Deployment

• MicroStrategy Web Essentials

• MicroStrategy Web for Reporters and Analysts

• MicroStrategy Web for Professionals

• MicroStrategy Visual Insight Essentials

• MicroStrategy Report Services: Dynamic Dashboards

• MicroStrategy Mobile for App Developers

• MicroStrategy Architect: Project Design Essentials

• MicroStrategy Desktop: Reporting Essentials

• MicroStrategy Desktop: Advanced Reporting

• MicroStrategy Office Essentials

Advanced Courses
• MicroStrategy Administration: Configuration and Security

• MicroStrategy Administration: Application Management


• MicroStrategy Engine Essentials

• MicroStrategy Architect: Advanced Project Design

• MicroStrategy Advanced Data Warehousing

• MicroStrategy Data Mining and Advanced Analytics

• Deploying MicroStrategy High Performance BI

• MicroStrategy Desktop: Advanced Reporting Case Studies

• MicroStrategy Freeform SQL Essentials

• MicroStrategy Transaction Services for Dashboard and Mobile App


Developers

• MicroStrategy Web SDK: Customization Essentials

• MicroStrategy Web SDK: Customizing Security

• MicroStrategy Web SDK: Portal Integration

All courses are subject to change. Please visit the MicroStrategy website


for the latest education offerings.
1
INTRODUCTION TO ADVANCED
DATA WAREHOUSING

Lesson Description

This lesson describes how the design of the data warehouse affects performance
in the reporting environment and provides an overview of the concepts that are
covered in this course.

In this lesson, you will learn why data warehouse design is so crucial to
achieving efficient report performance and resolving complex report
requirements. Then, you will be introduced at a high level to the topics that are
covered in this course.
Lesson Objectives

After completing this lesson, you will be able to:

Explain how data warehouse design affects the reporting environment and
describe the topics that are covered in this course.

After completing the topics in this lesson, you will be able to:

• Explain how data warehouse design contributes to an efficient reporting
environment.

• Describe the topics that are covered in this course.


Data Warehouse Design and Reporting

After completing this topic, you will be able to:

Explain how data warehouse design contributes to an efficient reporting
environment.

The MicroStrategy product platform includes various applications designed to


help you build a powerful, robust reporting environment. For example, in the
MicroStrategy Desktop: Reporting Essentials course, you learned about
functionality that renders report development more efficient, including the
following features:

• Prompts—Because prompts make reports dynamic, embedding prompts


into reports enables developers to create fewer reports to satisfy multiple
reporting requirements. A single report can answer many types of queries
since the result set changes based on how users respond to prompts.

• Report Objects—With MicroStrategy OLAP Services, developers can include
additional information in the SQL generated for Intelligent Cube reports
beyond the data initially displayed to users. Enabling developers to “plan
ahead” for user activities, such as drilling, can reduce the number of times a
query executes against the data warehouse.

• Report Filter Qualifications—Developers can use reports they have already
designed to filter the result sets of other reports.

These features are just a few examples of functionality available in
MicroStrategy Desktop and MicroStrategy OLAP Services that you can use to
build better reports. The MicroStrategy Desktop: Advanced Reporting course
further expands on the concepts that are introduced in the MicroStrategy
Desktop: Reporting Essentials course. It covers more advanced uses of metrics
and filters as well as other application objects, like custom groups and
consolidations, that enable developers to produce more complex reports.
MicroStrategy Architect also provides functionality that helps you better design
projects to achieve your reporting requirements. For example, in the
MicroStrategy Architect: Project Design Essentials course, you learned about
functionality that simplifies project creation and report development,
including the following features:

• Heterogeneous Facts and Attributes—Project architects can create
attributes and facts that map to multiple physical columns in the data
warehouse, eliminating the need for completely homogeneous tables. This
functionality also provides a layer of abstraction between the structure of
the underlying data and the users who must create and run reports but are
not necessarily knowledgeable about how data is stored.

• Attribute Forms—For descriptive data that users want to view on reports
but that do not need to aggregate, project architects can create attribute
forms, making this data available for display, sorting, and qualification
without having to create separate attributes.

The ways in which you map schema objects and construct reports affect whether
a business intelligence system can answer complex queries and how efficiently
it can do so. However, even though you can answer many questions using
MicroStrategy functionality, the design of the database itself is just as
integral to successfully and efficiently querying the data warehouse.
Specifically, data warehouse design contributes to an efficient reporting
environment by enabling you to do the following:

• Satisfy report requirements that you either cannot achieve using only
reporting tool functionality or that you can more efficiently resolve at the
database level

• Increase query performance

Satisfying Report Requirements
You can easily handle many of your users’ report requirements through
reporting functionality, such as advanced qualifications or metric calculations.
However, you must resolve some complex types of analysis by focusing on the
structure of the data itself.
For example, if relationships between certain data elements change frequently
and users need to analyze how those relationships change over time, you can
provide this capability by specifically designing tables to support querying
various “versions” of the data.

Other common business scenarios are also ones you can readily resolve at the
database level. For example, users may have reports in which they need to view

the relationships that exist within a single set of source data. For instance, users
may want to see a list of employees, and then, within that list, they need to know
which employees manage other employees. One way to resolve this requirement
is to modify the lookup table structures to support such a query.
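As a minimal sketch of this idea, a lookup table can carry a self-referencing manager ID column so that a single self-join answers the "who manages whom" question. The table and column names (LU_EMPLOYEE, Emp_ID, Mgr_ID) and the sample rows are hypothetical, and sqlite3 stands in for the warehouse database:

```python
# Hypothetical self-referencing employee lookup table: each row stores the
# ID of the employee's manager, so one self-join lists managers and reports.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LU_EMPLOYEE (
        Emp_ID   INTEGER PRIMARY KEY,
        Emp_Name TEXT,
        Mgr_ID   INTEGER REFERENCES LU_EMPLOYEE (Emp_ID)
    );
    INSERT INTO LU_EMPLOYEE VALUES
        (1, 'Adams', NULL),   -- top-level manager has no Mgr_ID
        (2, 'Baker', 1),
        (3, 'Chen',  2),
        (4, 'Diaz',  2);
""")

# Join the lookup table to itself to resolve the manager relationship.
rows = conn.execute("""
    SELECT m.Emp_Name AS manager, e.Emp_Name AS employee
    FROM   LU_EMPLOYEE e
           JOIN LU_EMPLOYEE m ON e.Mgr_ID = m.Emp_ID
    ORDER  BY manager, employee
""").fetchall()

for manager, employee in rows:
    print(manager, "manages", employee)
```

Employees with a NULL Mgr_ID simply drop out of the inner join, which is why only the managed employees appear in the result.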

Increasing Query Performance


In addition to helping achieve certain reporting requirements, the design of a
data warehouse is critically important in determining the level of performance
that users can expect in the reporting environment. Report developers can
carefully design every object used in a report so that it generates the most
efficient SQL possible. Project architects can meticulously map every schema
object in a project to the data warehouse tables, making sure to account for all
relationships in the data. However, the best SQL query or data mapping is only
as good as the underlying logical data model and schema design. Carefully
constructed reports and projects can still provide less than stellar
performance if the data warehouse model or schema is poorly designed.

Depending on how data is architected in the logical data model and the
supporting schema, joins between tables can be made more or less complex. The
schema structure itself can also impact the number of joins required for a
given query as well as the size of the tables being joined—both factors that
seriously affect performance. Inevitably, some types of data always exist in
high volume, like customer information. However, you can make retrieving
records from large tables as efficient as possible by paying attention to
schema design and ensuring that the table structure itself does not make the
task more difficult or time consuming.
Overview of Advanced Data
Warehousing

After completing this topic, you will be able to:

Describe the topics that are covered in this course.

Given its impact on the reporting environment, data warehouse design is


essential to the ultimate success of any business intelligence effort, both in
terms of the breadth and complexity of analysis that is possible and the
efficiency with which you can accomplish that analysis.

Many complex reporting challenges originate in the structure of the data
warehouse itself. Therefore, modifying that structure often provides the best
answer to a problem. You can overcome or reduce many performance issues by
creating an optimal data warehouse design. Because the design of the database
is one factor over which companies exercise their own internal control
independent of any reporting tool, it can be the best offensive strategy in
getting the most out of your data analysis.

General recommendations exist for data warehouse design that are well
documented in various business intelligence books and publications. Although
this course does contain some generic recommendations, the focus of this
course is to explain how to create the best data warehouse design for a
MicroStrategy project, including the following:

• Resolving common or complex data modeling issues through changes to the
logical data model or schema design, use of MicroStrategy functionality,
or some combination thereof

• Selecting the optimal schema design for lookup tables and fact tables to
achieve the best possible report performance

• Creating logical views in MicroStrategy Architect to resolve a variety of
data modeling and schema design issues

This course is divided into the following seven sections, which include an
appendix:
• Advanced Schema Design—Describes the various types of schemas,
analyzes their differences, reviews fact table key structure, and provides
optimal schema recommendations for MicroStrategy project performance
in the MicroStrategy environment

• Logical Views—Describes how to create logical views in MicroStrategy
Architect and provides examples of using them to build complex attribute
and fact expressions

• Many-to-Many Relationships—Describes the challenges caused by many-to-
many relationships and methods for resolving them in a MicroStrategy
project

• Attribute Roles—Describes the challenges caused any time you have a
column in a lookup table that is used to define more than one attribute and
methods for resolving them in a MicroStrategy project

• Hierarchies—Describes the challenges caused by ragged, recursive, and split
hierarchies and methods for resolving them in a MicroStrategy project

• Slowly Changing Dimensions—Describes the challenges caused any time
you have attributes that vary over time and methods for resolving them in a
MicroStrategy project

• Appendix A—Describes guidelines for implementing various types of


optimization, including aggregate tables, partitioning, and indexing

The solutions to data modeling and schema design issues involve

modifying the logical data model or schema of your data warehouse. You can
also resolve many of these issues at the application level using logical views, a
feature of MicroStrategy Architect that is covered in the Logical Views lesson.
The database and application levels are both viable alternatives for
implementing solutions, depending on the nature of your data and the
characteristics of your reporting environment.
Lesson Summary

In this lesson, you learned:

• While MicroStrategy provides functionality that can help you answer many
complex questions, the design of the database itself is also integral to
successfully and efficiently querying the data warehouse.

• You may have reporting requirements that you can only resolve by making
changes to the structure of the data warehouse.

• The design of the data warehouse is critically important in determining the
level of performance that users can expect in the reporting environment.

• The focus of this course is to explain how to create the best data warehouse
design for a MicroStrategy project.
2
ADVANCED SCHEMA DESIGN

Lesson Description

This lesson describes concepts related to schema design, including snowflake


and star schemas, normalization and denormalization, table volume, and fact
table key structure.

In this lesson, you will learn about each of these concepts and their impact on
report analysis. You will learn how to design the data warehouse schema for
optimal use in the MicroStrategy reporting environment.
Lesson Objectives

After completing this lesson, you will be able to:

Describe various schema designs, identify factors to consider in choosing


a particular type of schema, and design the data warehouse schema for
optimal performance in the MicroStrategy environment.

After completing the topics in this lesson, you will be able to:

• Describe the characteristics of both snowflake and star schemas and the
differences between the two types of schemas. Explain the impact of each
schema type on query performance.

• Describe the optimal schema design to use with MicroStrategy, identify
the schema factors that affect query performance, and define the logical
key for a fact table in MicroStrategy Architect.
Snowflake and Star Schemas

After completing this topic, you will be able to:

Describe the characteristics of both snowflake and star schemas and the
differences between the two types of schemas. Explain the impact of each
schema type on query performance.

The design of the physical schema of the data warehouse is central to
optimizing query performance. The way in which you design the structure of
tables in a data warehouse has a significant impact on the overall performance
users experience when running queries against the database.

Two primary types of schemas exist, both of which have different effects on the
SQL generated for queries. The two schema types are the following:

• Snowflake

• Star

In discussing these two schema types, you will look at how the structure of the
schema differs for the following data model:

Sample Logical Data Model


Snowflake Schemas
A snowflake schema can take a variety of forms, depending on the physical
structure of the table. These forms differ in the degree of denormalization they
employ.

Denormalization occurs any time data is stored multiple times, or stored


redundantly. Conversely, when a schema contains no data redundancy, it is
completely normalized. Snowflake schemas have three basic forms:

• Completely normalized

• Moderately denormalized

• Completely denormalized

CompletelyNormalized Snowflake

A completely normalized snowflake schema does not store any data


redundantly. It only stores enough information to relate parent and child
attributes. If you apply a completely normalized snowflake schema to the
sample data model, the schema looks like the following:

Completely Normalized Snowflake Schema Design

This schema contains a separate lookup table for each attribute in each of the
two hierarchies, a characteristic that is common to all forms of snowflake
schemas. As a result, snowflake schemas contain multiple lookup tables for each
hierarchy present in a data model. Though, in this normalized form of the
snowflake schema, each lookup table contains only the following information:

• Attribute ID

• Attribute description (if one exists)

• ID of the immediate parent attribute

In addition to the attribute’s ID and description columns, each table contains
the ID of the immediate parent, which is necessary to map the relationship
between the parent and child attribute. As such, this structure stores only the
bare minimum of information necessary to relate the data. No redundant data is
stored, so the schema is normalized. Also, because the tables are storing the
minimum amount of data, they are as small as they can possibly be.
When you need to query information from the fact table and join it to
information in the higher-level lookup tables, more joins are necessary in the
SQL to achieve the desired result.

For example, you could run the following report:

Joins in a Completely Normalized Snowflake Schema Design

To join the Customer State description (Cust_State_Desc) to the Sales metric
(calculated from Sale_Amt) requires three joins between tables since the
Customer State description is not stored in either the LU_CUSTOMER or
LU_CUST_CITY tables.
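The join chain above can be sketched with a toy version of the lesson's tables. The lookup and column names (LU_CUSTOMER, LU_CUST_CITY, Cust_State_Desc, Sale_Amt) come from the lesson; the fact table name and the sample rows are hypothetical, and sqlite3 stands in for the warehouse database:

```python
# Minimal normalized snowflake: each lookup stores only its own ID,
# description, and the ID of its immediate parent.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LU_CUST_STATE (Cust_State_ID INTEGER PRIMARY KEY,
                                Cust_State_Desc TEXT);
    CREATE TABLE LU_CUST_CITY  (Cust_City_ID INTEGER PRIMARY KEY,
                                Cust_State_ID INTEGER);
    CREATE TABLE LU_CUSTOMER   (Customer_ID INTEGER PRIMARY KEY,
                                Cust_City_ID INTEGER);
    CREATE TABLE FACT_SALES    (Customer_ID INTEGER, Sale_Amt REAL);

    INSERT INTO LU_CUST_STATE VALUES (1, 'Virginia');
    INSERT INTO LU_CUST_CITY  VALUES (10, 1);
    INSERT INTO LU_CUSTOMER   VALUES (100, 10);
    INSERT INTO FACT_SALES    VALUES (100, 250.0), (100, 150.0);
""")

# Three joins are needed to reach Cust_State_Desc from the fact table:
# FACT_SALES -> LU_CUSTOMER -> LU_CUST_CITY -> LU_CUST_STATE.
rows = conn.execute("""
    SELECT s.Cust_State_Desc, SUM(f.Sale_Amt) AS Sales
    FROM   FACT_SALES f
           JOIN LU_CUSTOMER   c  ON f.Customer_ID    = c.Customer_ID
           JOIN LU_CUST_CITY  ct ON c.Cust_City_ID   = ct.Cust_City_ID
           JOIN LU_CUST_STATE s  ON ct.Cust_State_ID = s.Cust_State_ID
    GROUP  BY s.Cust_State_Desc
""").fetchall()
print(rows)
```

Because the state ID lives only on the city lookup, every state-level query must pass through LU_CUST_CITY, even though no city data appears on the report.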

In summary, a completely normalized snowflake schema has the following


characteristics:

• Contains many tables (one per attribute)

• Contains relatively smaller tables (does not store more data than is
absolutely necessary to map relationships)

• Stores only the ID of the immediate parent in child tables


• Requires more joins when querying fact data in conjunction with higher-
level lookup tables

Characteristics of Completely Normalized Snowflake Schema Design

Moderately Denormalized Snowflake

A moderately denormalized snowflake schema stores some data redundantly.


You generally introduce redundancy to increase query performance. Each table
stores not only information to relate parent and child attributes, but also
contains additional columns to make querying data faster. If you apply a
moderately denormalized snowflake schema to the sample data model, the
schema looks like the following:

Moderately Denormalized Snowflake Schema Design


Like the normalized snowflake, this schema also contains a separate lookup
table for each attribute in each of the two hierarchies. In this moderately
denormalized form of the snowflake schema, each lookup table contains the
following information:

• Attribute ID

• Attribute description (if one exists)

• ID of immediate parent attribute

• IDs of all other higher-level attributes

In addition to the attribute’s ID and description columns, each table contains
the ID of the immediate parent, which is necessary to map the relationship
between the parent and child attribute. Unlike the completely normalized
schema, each table also contains the IDs of all other higher-level attributes in
the hierarchy. As such, this structure replicates the same data in multiple tables.
For example, in the LU_CUSTOMER table, the same customer region IDs are
stored in the Cust_Region_ID column multiple times for every customer who
belongs to that particular region. Because this schema introduces some degree
of redundancy in the storage of attribute IDs, the schema is moderately
denormalized. Also, because the tables are storing the additional ID columns,
they are slightly larger than in the completely normalized snowflake, though
minimally so since ID columns do not take up much space.

Even though this structure stores more data, depending on the volume of data,
it can have a performance advantage. When you need to query information from
the fact table and join it to information in the higher-level lookup tables, fewer
joins are necessary in the SQL to achieve the desired result.

Now, if you run the same report to display customer state sales, fewer joins are
required to obtain the result set:

Joins in a Moderately Denormalized Snowflake Schema Design

To join the Customer State description (Cust_State_Desc) to the Sales metric
(calculated from Sale_Amt) requires only two joins between tables since the
Customer State ID is stored in the LU_CUSTOMER table.
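The difference in join counts can be sketched with a few rows of hypothetical data. The example below uses SQLite as a stand-in for the warehouse (the SQL Engine would generate comparable SQL automatically); the table and column names follow the schema discussed above, but the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Completely normalized: each lookup table stores only its immediate parent ID.
cur.executescript("""
CREATE TABLE LU_CUST_STATE (Cust_State_ID INTEGER PRIMARY KEY,
                            Cust_State_Desc TEXT);
CREATE TABLE LU_CUST_CITY  (Cust_City_ID INTEGER PRIMARY KEY,
                            Cust_City_Desc TEXT,
                            Cust_State_ID INTEGER);
CREATE TABLE LU_CUSTOMER   (Cust_ID INTEGER PRIMARY KEY,
                            Cust_Name TEXT,
                            Cust_City_ID INTEGER);
CREATE TABLE FACT_SALES    (Cust_ID INTEGER, Sale_Amt REAL);

INSERT INTO LU_CUST_STATE VALUES (1, 'Colorado'), (2, 'New Mexico');
INSERT INTO LU_CUST_CITY  VALUES (10, 'Colorado Springs', 1), (20, 'Taos', 2);
INSERT INTO LU_CUSTOMER   VALUES (100, 'Cassie', 10), (200, 'Ian', 20);
INSERT INTO FACT_SALES    VALUES (100, 50.0), (200, 75.0), (100, 25.0);
""")

# Normalized schema: three joins to reach the state description from the fact.
normalized = cur.execute("""
    SELECT s.Cust_State_Desc, SUM(f.Sale_Amt)
    FROM FACT_SALES f
    JOIN LU_CUSTOMER c   ON f.Cust_ID = c.Cust_ID
    JOIN LU_CUST_CITY ci ON c.Cust_City_ID = ci.Cust_City_ID
    JOIN LU_CUST_STATE s ON ci.Cust_State_ID = s.Cust_State_ID
    GROUP BY s.Cust_State_Desc
""").fetchall()

# Moderately denormalized: LU_CUSTOMER also carries Cust_State_ID,
# so the city table can be skipped (two joins instead of three).
cur.executescript("""
ALTER TABLE LU_CUSTOMER ADD COLUMN Cust_State_ID INTEGER;
UPDATE LU_CUSTOMER SET Cust_State_ID =
    (SELECT Cust_State_ID FROM LU_CUST_CITY
     WHERE LU_CUST_CITY.Cust_City_ID = LU_CUSTOMER.Cust_City_ID);
""")
denormalized = cur.execute("""
    SELECT s.Cust_State_Desc, SUM(f.Sale_Amt)
    FROM FACT_SALES f
    JOIN LU_CUSTOMER c   ON f.Cust_ID = c.Cust_ID
    JOIN LU_CUST_STATE s ON c.Cust_State_ID = s.Cust_State_ID
    GROUP BY s.Cust_State_Desc
""").fetchall()

print(normalized)    # same totals either way
print(denormalized)
```

Both queries return identical totals; the denormalized version simply reaches LU_CUST_STATE one join sooner because LU_CUSTOMER already carries Cust_State_ID.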

In summary, a moderately denormalized snowflake schema has the following
characteristics:

• Contains many tables (one per attribute)

• Contains relatively smaller tables (slightly larger than a normalized schema
due to additional ID columns)

• Stores the ID of the immediate parent and all higher-level attributes in child
tables

• Requires fewer joins when querying fact data in conjunction with higher-
level lookup tables

Characteristics of Moderately Denormalized Snowflake Schema Design

Completely Denormalized Snowflake

A completely denormalized snowflake schema stores data redundantly to the


greatest degree possible. You generally maximize redundancy to increase query
performance. Each table stores not only information to relate parent and child
attributes, but also contains additional columns to make querying data faster. If
you apply a completely denormalized snowflake schema to the sample data
model, the schema looks like the following:

Completely Denormalized Snowflake Schema Design


Like the other two snowflakes, this schema also contains a separate lookup table
for each attribute in each of the two hierarchies. Yet, in this completely
denormalized form of the snowflake schema, each lookup table contains the
following information:

• Attribute ID

• Attribute description (if one exists)

• ID of the immediate parent attribute

• IDs of all other higher-level attributes

• Descriptions of all other higher-level attributes (if they exist)

In addition to the attribute’s ID and description columns, each table contains
the ID of the immediate parent, which is necessary to map the relationship
between the parent and child attribute. Unlike the completely normalized
schema, each table also contains the IDs and descriptions of all other
higher-level attributes in the hierarchy. As such, this structure replicates the
same data in multiple tables. For example, in the LU_CUSTOMER table, the same
customer region IDs and descriptions are stored in the Cust_Region_ID and
Cust_Region_Desc columns multiple times for every customer who belongs to
that particular region. Because this schema introduces the maximum degree of
redundancy in the storage of attribute IDs and descriptions, the schema is
completely denormalized. Also, because the tables are storing the additional ID
and description columns, they are significantly larger than in the other two
snowflakes, specifically because the description columns take up much more
space.

Even though this structure stores more data, depending on the volume of data,
it can have a performance advantage over the other two snowflakes. When you
need to query information from the fact table and join it to information in the
higher-level lookup tables, fewer joins are necessary in the SQL to achieve the
desired result.

Now, if you run the same report to display customer state sales, fewer joins are
required to obtain the result set:

Joins in a Completely Denormalized Snowflake Schema Design

To join the Customer State description (Cust_State_Desc) to the Sales metric


(calculated from Sale_Amt) requires only one join between tables since the
Customer State ID and description are both stored in the LU_CUSTOMER
table. As a result, the query has to access only the lowest-level lookup table to
obtain all of the necessary information for the report.

In summary, a completely denormalized snowflake schema has the following
characteristics:

• Contains many tables (one per attribute)

• Contains significantly larger tables (much larger than other snowflake
schemas due to additional ID and description columns)

• Stores the IDs and descriptions of the immediate parent and all higher-level
attributes in child tables

• Requires only a single join when querying fact data in conjunction with
higher-level lookup tables

Characteristics of Completely Denormalized Snowflake Schema Design

Star Schemas
A star schema is a design that contains only one lookup table for each hierarchy
in the data model instead of having separate lookup tables for each attribute.
With only a single lookup table for each hierarchy, the IDs and descriptions of
all attributes in the hierarchy are stored in the same table. This type of structure
involves a great degree of redundancy. As such, star schemas are always
completely denormalized. If you apply a star schema to the sample data model,
the schema looks like the following:

Star Schema Design

This schema contains only two lookup tables, one for each hierarchy.
LU_LOCATION stores the data for all of the attributes in the Location
hierarchy, while LU_CUSTOMER stores the data for all of the attributes in the
Customer hierarchy. As a result, star schemas contain very few lookup
tables—one for each hierarchy present in the data model. Each lookup table
contains the IDs and descriptions (if they exist) for all of the attribute levels in
the hierarchy.

Even though you have fewer tables in a star schema than a snowflake, the tables
can be much larger because each one stores all of the information for an entire
hierarchy. When you need to query information from the fact table and join it to
information in the lookup tables, only a single join is necessary in the SQL to
achieve the desired result.

For example, if you run the same report to display customer state sales, only one
join between the lookup and fact table is required to obtain the result set:

Joins in a Star Schema Design


To join the Customer State description (Cust_State_Desc) to the Sales metric
(calculated from Sale_Amt) requires only one join between tables since the
Customer State ID and description are both stored in the LU_CUSTOMER
table. As a result, the query has to access only one lookup table to obtain all of
the necessary information for the report.
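The single-lookup-per-hierarchy structure can be sketched the same way. This is a minimal SQLite illustration with invented rows (not Engine-generated SQL), assuming the hierarchy columns described above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star schema: one lookup table per hierarchy. LU_CUSTOMER holds the IDs and
# descriptions for every level of the Customer hierarchy. (Hypothetical data.)
cur.executescript("""
CREATE TABLE LU_CUSTOMER (
    Cust_ID INTEGER PRIMARY KEY, Cust_Name TEXT,
    Cust_City_ID INTEGER,  Cust_City_Desc TEXT,
    Cust_State_ID INTEGER, Cust_State_Desc TEXT);
CREATE TABLE FACT_SALES (Cust_ID INTEGER, Sale_Amt REAL);

INSERT INTO LU_CUSTOMER VALUES
    (100, 'Cassie', 10, 'Colorado Springs', 1, 'Colorado'),
    (200, 'Tim',    10, 'Colorado Springs', 1, 'Colorado'),
    (300, 'Ian',    20, 'Taos',             2, 'New Mexico');
INSERT INTO FACT_SALES VALUES (100, 50.0), (200, 25.0), (300, 75.0);
""")

# One join suffices at any level of the hierarchy, state in this case.
by_state = cur.execute("""
    SELECT c.Cust_State_Desc, SUM(f.Sale_Amt)
    FROM FACT_SALES f
    JOIN LU_CUSTOMER c ON f.Cust_ID = c.Cust_ID
    GROUP BY c.Cust_State_Desc
""").fetchall()
print(by_state)
```

Grouping by Cust_City_Desc or Cust_Region_Desc (if present) would still require only the same single join, which is the defining convenience of the star layout.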

Even though achieving this result set requires only a single join, star schemas do
not necessarily equate to better performance. Depending on the volume of data
in any one hierarchy, you may be joining a very large lookup table to a very large
fact table. In such cases, more joins between smaller tables can yield better
performance.

In summary, a star schema has the following characteristics:

• Contains fewer tables (one per hierarchy)

• Contains very large tables (much larger than some forms of snowflake
schemas due to storing all attribute ID and description columns)

• Stores the IDs and descriptions of all the attributes in a hierarchy in a single
table

• Requires only a single join when querying fact data regardless of the
attribute level at which you are querying data

Characteristics of Star Schema Design


Another form of star schema is the consolidated star schema, which
resembles a star schema except it contains a level ID that indicates the
attribute level of each record within a hierarchy’s lookup table. For
example, in our sample data model, Customer Region might have a level
ID of 1, Customer State a level ID of 2, and so on. The MicroStrategy SQL
Engine looks at the columns to which attributes are mapped to determine
the level of aggregation and generate the appropriate SQL, not the data
itself. Therefore, MicroStrategy can support consolidated star schemas
only if you create logical views for each attribute on which you qualify reports.
For information on logical views, see “Logical Views.”

Use of Aggregate Tables with Star Schemas

Although MicroStrategy generally supports star schemas, using a star schema is
not recommended if you plan to use aggregate fact tables. For example, you


could run a report to view sales by customer city. This report executes against
the LU_CUSTOMER and FACT_SALES tables as follows:

Base Fact Table with Star Schema Design


In the illustration above, only the columns that are used in the report are
included in the sample data for the LU_CUSTOMER and FACT_SALES
tables. However, the actual tables would contain all of the columns
referenced in the schema.

In this first example, the LU_CUSTOMER table joins to the FACT_SALES table,
a base fact table in which the data is stored at the lowest level of each of the
hierarchies. As a result, fact records join to the LU_CUSTOMER table through
the Cust_ID column. Since the report template requires the sales data to be
displayed at the level of Customer City, the total for each customer is aggregated
and grouped by customer city for the result set. The report returns a correct
result set for each of the three cities, with sales for Cassie and Tim combined to
comprise the sales for Colorado Springs. Because the join to the fact table occurs
at the lowest level, the sales data are appropriately aggregated.


If users tend to perform this query frequently, you may want to build an
aggregate fact table that summarizes the data at the level of Customer City. In
this manner, the aggregation is done when data is loaded into the warehouse,
rather than being performed on the fly every time the report is run. Now, if you
run the same report, it executes against the LU_CUSTOMER and
FACT_SALES_AGG tables as follows:

Aggregate Fact Table with Star Schema Design

In the illustration above, only the columns that are used in the report are
included in the sample data for the LU_CUSTOMER and
FACT_SALES_AGG tables. However, the actual tables would contain all of
the columns referenced in the schema.

Now that the sales data has been aggregated to a higher level, the records in the
FACT_SALES_AGG table need to be able to join to a distinct list of customer
cities to prevent multiple counting from occurring. However, in a star schema, a
distinct list of any higher-level attribute does not exist because there is only a
single lookup table for the entire hierarchy. Thus, IDs and descriptions for all
but the lowest-level attribute are repeated many times within a lookup table.
Since records for Colorado Springs occur twice in the LU_CUSTOMER table,
one for each customer who resides there, the sales for Colorado Springs in the
FACT_SALES_AGG table can join to two different records. As a result, the sales
for Colorado Springs are counted twice in the report and are therefore inflated.
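The inflation is easy to reproduce in miniature. In the SQLite sketch below (hypothetical rows), the aggregate fact row for Colorado Springs matches two customer rows in the star lookup and is counted twice; a distinct city lookup, of the kind a snowflake schema provides, returns the correct total:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star-style lookup: city columns are repeated on every customer row, so the
# table holds no distinct list of cities. (Hypothetical sample data.)
cur.executescript("""
CREATE TABLE LU_CUSTOMER (Cust_ID INTEGER PRIMARY KEY, Cust_Name TEXT,
                          Cust_City_ID INTEGER, Cust_City_Desc TEXT);
CREATE TABLE FACT_SALES_AGG (Cust_City_ID INTEGER, Sale_Amt REAL);

INSERT INTO LU_CUSTOMER VALUES
    (100, 'Cassie', 10, 'Colorado Springs'),
    (200, 'Tim',    10, 'Colorado Springs');
-- One aggregate row per city: Cassie (50) + Tim (25) already summed.
INSERT INTO FACT_SALES_AGG VALUES (10, 75.0);
""")

# The aggregate fact row matches TWO lookup rows for city 10,
# so the city total is double counted.
inflated = cur.execute("""
    SELECT c.Cust_City_Desc, SUM(f.Sale_Amt)
    FROM FACT_SALES_AGG f
    JOIN LU_CUSTOMER c ON f.Cust_City_ID = c.Cust_City_ID
    GROUP BY c.Cust_City_Desc
""").fetchall()

# A separate city lookup table, as a snowflake provides, supplies the
# distinct list and yields the correct total.
cur.executescript("""
CREATE TABLE LU_CUST_CITY AS
    SELECT DISTINCT Cust_City_ID, Cust_City_Desc FROM LU_CUSTOMER;
""")
correct = cur.execute("""
    SELECT c.Cust_City_Desc, SUM(f.Sale_Amt)
    FROM FACT_SALES_AGG f
    JOIN LU_CUST_CITY c ON f.Cust_City_ID = c.Cust_City_ID
    GROUP BY c.Cust_City_Desc
""").fetchall()

print(inflated)  # doubled: [('Colorado Springs', 150.0)]
print(correct)   # [('Colorado Springs', 75.0)]
```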

Because aggregate fact tables must join to a distinct list of attributes, you need
individual lookup tables for the higher-level attributes if you plan to use them.

You should use star schemas only if you do not intend to create aggregate tables.
If the query profile in your environment requires the use of aggregate fact tables
to achieve desired performance, you need to design a snowflake schema. For
example, in this scenario, you could use a completely denormalized snowflake
schema. You still retain the benefits of having fewer joins when accessing base
fact tables, but the higher-level lookup tables enable you to use aggregate fact
tables where appropriate.
Recommended Schema Design

After completing this topic, you will be able to:

Describe the optimal schema design to use with MicroStrategy, identify schema
factors that affect query performance, and define the logical key for a fact table
in MicroStrategy Architect.

You can build a MicroStrategy project against a data warehouse that uses a


snowflake or star schema design. For optimal query performance, there are
several factors you should consider when designing your schema, including the
following:

• Lookup table volume

• Fact table volume

• Data type of ID columns

• Fact table keys

Lookup Table Volume


Table volume refers to the amount of data in a table. For lookup tables, the
volume is determined by the cardinality of the attributes and the amount of
descriptive information you store about those attributes. For example, a
company may have only 200 stores, which results in a relatively small lookup
table with only 200 rows. That same company may have 2,000,000 customers,
which results in a very large lookup table with 2,000,000 rows. In addition, the
company may store a lot of demographic information in the customer lookup
table such as address, phone, email, and so forth, which also increases table
size.

Generally, the recommended optimal schema design for use with MicroStrategy
is to completely denormalize the lookup tables. As you already saw earlier, when
you completely denormalize a table, the ID and description columns for all of
the attributes in a hierarchy are available in the lowest-level lookup table. As a
result, regardless of the level to which you aggregate data, queries require only
one join between the base fact table and the lowest-level lookup table. Avoiding
the additional joins that arise from a more normalized schema increases query
performance.

Depending on table volume, complete denormalization is not always required to
achieve good performance. For example, the following illustration shows the
number of joins required to retrieve customer data for a report:

High Lookup Table Volume

In this example, the report queries the Customer hierarchy, and the result set is
aggregated to the level of Customer State. The lookup tables use a completely
denormalized schema. In this scenario, the table volume for customer
information in any data warehouse is likely to be large since the customer base
often comprises the largest volume of data in a warehouse. Because the table
volume is high, avoiding the extra joins that a more normalized schema would
entail is the optimal route for increasing performance.

If table volume is low, the extra joins are not so problematic. For example, the
following illustration shows the number of joins required to retrieve location

data for a report:

Low Lookup Table Volume

In this example, the report queries the Location hierarchy, and the result set is
aggregated to the level of State. The lookup tables are shown using both a
completely denormalized schema and a completely normalized schema. The
denormalized schema requires only a single join, whereas the normalized
schema requires three joins to produce the same result. In this scenario, the
table volume for store information in any data warehouse is likely to be small.
At most, a company might have a couple thousand stores, and maybe only a
couple hundred. The normalized schema increases the number of joins, but the
denormalized schema unnecessarily increases the size of what is originally a
very small lookup table. The performance difference between using the extra
joins or increasing the size of the tables is not significant. Because the volume of
the lookup tables is so low, you could retain a normalized schema for the lookup
tables in this hierarchy without negatively affecting performance.
Fact Table Volume
For fact tables, the volume is determined by the number of records stored and
the amount of information (attribute IDs) you store for each record.

In certain cases, you may want to denormalize fact tables to include
characteristic attributes. For example, you have a customer lookup table that
contains the following information:

Lookup Table for Customer

The LU_CUSTOMER table stores certain characteristic attributes about the


customer, including Income, Gender, and Occupation. If these characteristic
attributes are not stored in the fact table, when you run a report that contains
information about any of them, a join must be performed between the fact table
and lookup table through the Customer attribute. For example, you could run a
report that looks like the following:

Normalized Fact Table


Since the fact table stores only the customer ID, to display Gender, Income, and
Occupation on the report, you must join the records in the FACT_SALES table
to matching customer records in the LU_CUSTOMER table based on the
Cust_ID column.

You could modify the fact table to add the characteristic attributes as follows:

Denormalized Fact Table

Adding these attributes denormalizes the fact table, which increases the volume
of the fact table. With a denormalized structure, you can retrieve the
information for Gender, Income and Occupation from the FACT_SALES table
itself without needing to join to the LU_CUSTOMER table (unless you have
descriptions for characteristic attributes stored in the lookup table that you
want to display).

You still have to join from the FACT_SALES table to the LU_CUSTOMER
table if you want to display the customer names on the report.
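In miniature, the denormalized fact table answers characteristic-attribute questions without any join at all. A small SQLite sketch with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized fact table: characteristic attributes (Gender, Income,
# Occupation) are stored on each fact row alongside the customer ID.
# All rows here are hypothetical sample data.
cur.executescript("""
CREATE TABLE FACT_SALES (Cust_ID INTEGER, Gender TEXT, Income TEXT,
                         Occupation TEXT, Sale_Amt REAL);
INSERT INTO FACT_SALES VALUES
    (100, 'F', '50-75K', 'Engineer', 50.0),
    (100, 'F', '50-75K', 'Engineer', 25.0),
    (200, 'M', '25-50K', 'Teacher',  75.0);
""")

# Sales by gender without touching LU_CUSTOMER at all.
by_gender = cur.execute("""
    SELECT Gender, SUM(Sale_Amt)
    FROM FACT_SALES
    GROUP BY Gender
""").fetchall()
print(by_gender)
```

The trade-off, as the text notes, is a wider (and therefore larger) fact table in exchange for dropping the join to the customer lookup.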

The LU_CUSTOMER table is most likely very large, and the FACT_SALES table
may also be very large. In such cases, it may increase performance to
denormalize the fact table (even though it increases the size of the table) rather
than requiring additional joins between the two tables. The best way to
determine which method is faster, joining two large tables or querying a larger
fact table, is to test both scenarios and analyze response times, as it largely
depends on the power of the database server.

When you denormalize fact tables, you add columns to the table, which could
affect your indexing strategy. To ensure efficient use of indexes, you should
index only the table columns on which users typically filter when running
reports.

Data Type of ID Columns

Because you use attribute ID columns for qualification, table joins, and indexes,
it is important to ensure that the data type of ID columns enables you to
perform these tasks as efficiently as possible.

Whether they are present in lookup, relationship, or fact tables, you should
always create ID columns as either an integer, number, or date data type. You
should avoid using text data types, including varchar, to define ID columns as it
takes longer to qualify or join on text fields, and you cannot take advantage of
indexes as efficiently.

Fact Table Keys


When designing your data warehouse schema for optimal performance, fact
table structure is a primary concern. By their very nature, fact tables can be
quite large due to the number of rows and columns they contain. As such,
ensuring that you can retrieve information from fact tables as efficiently as
possible can significantly improve performance.

Fact Table Primary Keys

The first issue of concern with regard to fact tables has to do with defining the
physical primary key of the table. For example, you have the following fact table:

Sales Fact Table

The FACT_SALES table contains IDs for the following attributes—Store,


Customer, Date, and Item. Sometimes when fact tables are originally built, each
of the attribute ID columns in the table is defined as part of the compound
primary key for the table. However, MicroStrategy recommends that you don’t
define a primary key for fact tables as it can cause indexes to be used
inefficiently.

For example, in the scenario above, if you define Store_ID, Customer_ID,


Item_ID, and Date_ID as the compound primary key for the FACT_SALES
table, you create an index on this key ordered in whatever manner you define.
For example, you could choose to order the index as follows:

Index on Sales Fact Table


With this index in place, you run the following report:

Report with Filter That Follows Index Order

In this report, you filter on Store, Customer, and Date, which represent the first
three columns in the index. When you run this report, it uses the index because
the report contains filter elements for the first three columns defined in the
index.

You may choose to run a slightly different report in which you view the data for
all stores, not just Colorado Springs, as shown below:

Report with Filter That Does Not Follow IndexOrder

In this example, the Store filter is no longer part of the report filter. However,
because the Store_ID column is first in the index sequence, its absence from the
filter means that the Customer and Date indexes are not invoked. The query
runs as if there were no index on the table at all. Depending on how you filter
reports, queries may not take advantage of the index you defined on this fact
table when setting up the primary key.
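You can observe this leading-column behavior directly in a database's query plans. The sketch below uses SQLite as a convenient stand-in for the warehouse RDBMS (plan text and optimizer behavior vary by database and version, so treat the details as illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE FACT_SALES (Store_ID INTEGER, Customer_ID INTEGER,
                         Date_ID INTEGER, Item_ID INTEGER, Sale_Amt REAL);
-- Compound index ordered Store, Customer, Date, Item, as in the example.
CREATE INDEX idx_fact_key
    ON FACT_SALES (Store_ID, Customer_ID, Date_ID, Item_ID);
""")

def plan(sql):
    # Return the plan detail strings SQLite reports for the statement.
    return [row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql)]

# Filter includes the leading index column (Store_ID): the index is usable.
with_store = plan("""
    SELECT SUM(Sale_Amt) FROM FACT_SALES
    WHERE Store_ID = 1 AND Customer_ID = 100 AND Date_ID = 20121117""")

# Filter omits Store_ID: the compound index cannot be entered,
# so SQLite falls back to scanning the table.
without_store = plan("""
    SELECT SUM(Sale_Amt) FROM FACT_SALES
    WHERE Customer_ID = 100 AND Date_ID = 20121117""")

print(with_store)     # mentions USING INDEX idx_fact_key
print(without_store)  # a SCAN of FACT_SALES instead
```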

To make fact table indexes more effective, MicroStrategy recommends that you
don’t define primary keys on the fact table. Instead, you can build individual
indexes on each attribute ID column in the fact table on which users typically
filter when they run reports. In this manner, you can run a report that filters on
any single attribute or combination of attributes contained in the fact table, and
the query uses the individual indexes you created for each of those attributes.

Fact Table Logical Keys

The primary keys defined for a fact table in the data warehouse are not the only
keys that affect queries. When you add a table to the project, MicroStrategy
Architect automatically defines a logical key for that table. The logical key
consists of any columns in a table for which you have defined an attribute in the
project. For example, you have the following fact table:

Sales Fact Table

If the Store_ID, Customer_ID, Date_ID, and Item_ID columns are mapped as


form expressions for attributes in the project, each of these columns is
considered part of the logical key for the table.

If a fact table contains more than one attribute column from the same
hierarchy, only the lowest-level attribute is part of the logical key.

The logical key enables the MicroStrategy SQL Engine to optimize the SQL it
generates whenever you run a report in which the template contains all of the
logical keys for a given fact table. For example, you could run the following
report against the FACT_SALES table:
Report with All Fact Table Logical Keys on the Template

In the illustration above, only the columns that are used in the report are
included in the sample data for the FACT_SALES table. However, the
actual table would contain all of the columns referenced in the schema. For
all of the logical key illustrations in this lesson, the same subset of columns
is shown in the sample data.

Since an attribute is mapped to each attribute ID column in the FACT_SALES


table, all of them are part of the logical key. Each of the attributes that are part
of the logical key are present on the template. Therefore, the query only needs
to select the records from the fact table. It does not need to aggregate the
Sale_Amt fact or group by any of the attributes on the template, since they are
all part of the logical key and exist on the report.

The SQL for this report looks like the following:


select a11.[Store_ID] AS Store_ID,
    a14.[Store_Desc] AS Store_Desc,
    a11.[Customer_ID] AS Customer_ID,
    a12.[Customer_Name] AS Customer_Name,
    a11.[Date_ID] AS Date_ID,
    a11.[Item_ID] AS Item_ID,
    a13.[Item_Desc] AS Item_Desc,
    a11.[Sale_Amt] AS WJXBFS1
from [FACT_SALES] a11,
    [LU_CUSTOMER] a12,
    [LU_ITEM] a13,
    [LU_STORE] a14
where a11.[Customer_ID] = a12.[Customer_ID]
    and a11.[Item_ID] = a13.[Item_ID]
    and a11.[Store_ID] = a14.[Store_ID]

Because of the logical key, the SQL Engine can recognize when report SQL can
be simplified and optimize it accordingly.

If all of the attribute columns in a fact table are mapped to attributes in the
project, the logical key setting always produces the appropriate SQL. Often, you


do define attributes for all of the columns in a particular fact table. In some
circumstances, you may deliberately choose not to define a column in your
project. For example, a fact table may contain a column that identifies the data
source for a specific record:

Sales Fact Table with an Undefined Column

The Source_ID column identifies the origin of thedatafor each record in the
table. While this information may be useful for tracking or troubleshooting
purposes, it may not be information that users want to analyze on reports. As
such, if it is not necessary to report on the column, you do not define an
attribute for it. If the column does not map to an attribute, it is not part of the
logical key. Essentially, the SQL Engine ignores the presence of this column in
the table.
A similar situation could also arise any time you have a fact table that contains
attribute columns you have not defined in your project. For example, the
FACT_SALES table contains several attribute ID columns, including an
Item_ID column:

Sales Fact Table with an Undefined Attribute

Even though Item_ID exists as a column in the table, users may not want to
analyze item information on reports. If no reporting requirements exist for item
information, you may choose not to define an Item attribute. If no
corresponding attribute exists, Item is not part of the logical key, and the SQL
Engine ignores the presence of this column in the table.

In cases like these where there are columns in a fact table that are not defined in
the project, the logical key for a table is different from the actual key in the data
warehouse. When the logical key in MicroStrategy does not match the key
referenced in the data warehouse, the SQL Engine may not aggregate rows of
data correctly when running queries against the table.

For example, you could run the following report against the FACT_SALES
table:

Report Without All Fact Table Logical Keys on the Template


In this example, the logical key consists of Store, Customer, and Date, but the
report template contains only Customer. Since Store and Date, which are part of
the logical key, are not on the report template, the SQL Engine aggregates the
Sale_Amt fact and groups by the Customer attribute. The SQL for this report
looks like the following:
select a11.[Customer_ID] AS Customer_ID,
    max(a12.[Customer_Name]) AS Customer_Name,
    sum(a11.[Sale_Amt]) AS WJXBFS1
from [FACT_SALES] a11,
    [LU_CUSTOMER] a12
where a11.[Customer_ID] = a12.[Customer_ID]
group by a11.[Customer_ID]

In this example, the result is aggregated correctly because the SQL Engine
recognizes that the entire logical key is not present on the template and includes
the aggregation operator for the fact and the GROUP BY clause in the SQL
statement. As a result, the two items that Ian Rey purchased are aggregated and
grouped together as one row in the report.

You could also run the following report against the FACT_SALES table.

Report with All Fact Table Logical Keys on the Template (Not All
Columns Part of Logical Key)

In this example, Store, Customer, and Date are all present on the report
template. Since these three attributes comprise the logical key, the SQL Engine
optimizes the SQL and does not aggregate the Sale_Amt fact or group by any of
the attributes on the report. The SQL for this report looks like the following:

select a11.[Store_ID] AS Store_ID,
    a13.[Store_Desc] AS Store_Desc,
    a11.[Customer_ID] AS Customer_ID,
    a12.[Customer_Name] AS Customer_Name,
    a11.[Date_ID] AS Date_ID,
    a11.[Sale_Amt] AS WJXBFS1
from [FACT_SALES] a11,
    [LU_CUSTOMER] a12,
    [LU_STORE] a13
where a11.[Customer_ID] = a12.[Customer_ID]
    and a11.[Store_ID] = a13.[Store_ID]

In this example, the SQL Engine automatically assumes that the logical key also
represents the data warehouse key (the entire set of attribute columns in the
actual fact table). The result is incorrect since it fails to take into account the
Item_ID column. If there is only one record for a particular store, customer,
and date, the result is the same as if you had aggregated the data. However, Ian
Rey purchased two items at the Taos store on 11/17/2012. These two records are
not aggregated and grouped together in the result set. Therefore, the report
displays two rows for Ian Rey instead of a single, aggregated row.
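The two result shapes can be contrasted in miniature. The SQLite sketch below uses invented rows for the Ian Rey example; the two SELECT statements mimic the shape of the Engine-generated SQL shown in this section:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE FACT_SALES (Store_ID INTEGER, Customer_ID INTEGER,
                         Date_ID TEXT, Item_ID INTEGER, Sale_Amt REAL);
-- Hypothetical rows: customer 200 (Ian Rey) bought two items at store 5
-- (Taos) on 11/17/2012.
INSERT INTO FACT_SALES VALUES
    (5, 200, '2012-11-17', 1, 30.0),
    (5, 200, '2012-11-17', 2, 45.0);
""")

# Shape of the SQL when the Engine assumes the logical key (Store, Customer,
# Date) is the true warehouse key: no aggregation, so two rows come back.
unaggregated = cur.execute("""
    SELECT Store_ID, Customer_ID, Date_ID, Sale_Amt FROM FACT_SALES
""").fetchall()

# Shape of the SQL once the "true key" setting is cleared: the fact is summed
# and grouped, collapsing the two item rows into one.
aggregated = cur.execute("""
    SELECT Store_ID, Customer_ID, Date_ID, SUM(Sale_Amt)
    FROM FACT_SALES
    GROUP BY Store_ID, Customer_ID, Date_ID
""").fetchall()

print(len(unaggregated))  # 2 rows for Ian Rey
print(aggregated)         # one row totaling 75.0
```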

Regardless of the reason, any time you have additional attribute columns in a
fact table that you do not define in your project, you need to ensure that the SQL
Engine is aware that the logical key does not represent the true key of the data
warehouse table. Otherwise, optimizations in place for logical keys could result
in SQL that does not produce an accurately aggregated result set.

The setting that controls how SQL is generated with regard to logical keys is
found in the Logical Table Editor.

To disable the logical key setting:

1 Expand the Schema Objects folder.

2 In the Schema Objects folder, select the Tables folder.

3 In the Tables folder, double-click the desired table.

4 In the Logical Table Editor, on the Logical View tab, clear The key
specified is the true key for the warehouse table check box.

This setting is enabled by default.

5 Click Save and Close.

6 Update the project schema.

The following image shows the Logical Table Editor with the option to disable
the logical key setting.

Logical Table Editor


After disabling this setting, if you run the same report, the result set looks like
the following:

Report with All Fact Table Logical Keys on the Template (Logical Key
Not Set as True Data Warehouse Key)
Now, the SQL Engine knows that the logical key, which consists of Store,
Customer, and Date, does not represent all of the attribute columns in the
actual fact table. Therefore, even though all of the members of the logical key are
present on the report template, the SQL Engine still aggregates the Sale_Amt
fact and groups by the attributes on the report. The SQL for this report looks
like the following:
select a11.[Store_ID] AS Store_ID,
max(a13.[Store_Desc]) AS Store_Desc,
a11.[Customer_ID] AS Customer_ID,
max(a12.[Customer_Name]) AS Customer_Name,
a11.[Date_ID] AS Date_ID,
sum(a11.[Sale_Amt]) AS WJXBFS1
from [FACT_SALES] a11,
[LU_CUSTOMER] a12,
[LU_STORE] a13
where a11.[Customer_ID] = a12.[Customer_ID]
    and a11.[Store_ID] = a13.[Store_ID]
group by a11.[Store_ID],
a11.[Customer_ID],
a11.[Date_ID]

Because the SQL statement aggregates the Sale_Amt fact and groups by the
attributes on the report, the two records for the items that Ian Rey purchased
are aggregated and grouped together in the result set. Therefore, the report
displays a single, aggregated row for Ian Rey.
Lesson Summary

In this lesson, you learned:

• The two primary types of schemas are snowflake and star schemas.

• A denormalized schema stores data redundantly, while a normalized


schema does not have any data redundancy.

• A completely normalized snowflake schema does not store any data
redundantly. It contains many relatively small tables. The child tables store
only the ID of the immediate parent. However, a completely normalized
snowflake schema requires more joins when querying fact data in
conjunction with higher-level lookup tables.

• A moderately denormalized snowflake schema stores some data
redundantly. It contains many relatively small tables. The child tables store
the ID of the immediate parent and all higher-level attributes. However, a
moderately denormalized snowflake schema requires fewer joins when
querying fact data in conjunction with higher-level lookup tables.

• A completely denormalized snowflake schema stores data redundantly to
the greatest degree possible. It contains many significantly larger tables.
The child tables store the IDs and descriptions of the immediate parent and
all higher-level attributes. The completely denormalized snowflake schema
requires only a single join when querying fact data in conjunction with
higher-level lookup tables.

• A star schema contains only one lookup table for each hierarchy in the data
model instead of having separate lookup tables for each attribute. It
contains fewer but very large tables. It stores the IDs and descriptions of all
the attributes in a hierarchy in a single table. It requires only a single join
when querying fact data regardless of the attribute level at which you are
querying data.

• Using star schemas in conjunction with aggregate fact tables is not
recommended. Aggregate fact tables must join to a distinct list of attributes
to aggregate fact data correctly and avoid multiple counting. Therefore, you
need individual lookup tables for higher-level attributes if you plan to use
aggregate fact tables.
• When designing your data warehouse schema, you should consider the
following factors to achieve optimal query performance: lookup table
volume, fact table volume, data type of ID columns, and fact table keys.

• Generally, the recommended optimal schema design for use with
MicroStrategy is to completely denormalize the lookup tables.

• Lookup table volume is determined by the cardinality of the attributes and
the amount of descriptive information you store about those attributes.

• You often want to partially or completely denormalize high-volume lookup
tables to reduce the number of joins that queries require and increase
performance.

• You can often normalize low-volume lookup tables because the performance
difference between using extra joins or increasing the size of the tables is
not significant.

• Fact table volume is determined by the number of records stored and the
amount of information (attribute IDs) you store for each record.

• In some cases, you may want to denormalize fact tables by including
characteristic attributes to reduce the number of joins between large fact
and lookup tables.

• You should always create ID columns using integer, number, or date data
types. You should not use text data types as it takes longer to qualify or join
on text fields, and you cannot take advantage of indexes as efficiently.

• Rather than defining a primary key index on fact tables, MicroStrategy
recommends that you define individual indexes on attribute ID columns on
which users typically filter reports. This strategy enables you to use fact
table indexes most effectively.

• The logical key consists of any columns in a table for which you have
defined attributes in a project. The logical key enables the MicroStrategy
SQL Engine to optimize the SQL it generates whenever you run a report in
which the template contains all of the logical keys for a given fact table.

• If you have undefined columns in a fact table, you may need to disable the
logical key setting for that table to ensure that the MicroStrategy SQL
Engine correctly aggregates and groups its fact data in reports.
3
LOGICAL VIEWS

Lesson Description

This lesson provides an introduction to logical views, a feature in MicroStrategy


Architect that enables you to create more complex attribute and fact expressions
and also provides an alternative solution for many of the data warehousing
challenges covered in this course.

In this lesson, you will learn how to create logical views in MicroStrategy


Architect and how to use them to resolve a variety of business scenarios.
Lesson Objectives

After completing this lesson, you will be able to:

Describe the concept of logical views in MicroStrategy Architect and
create logical views. Use logical views to define complex attribute and fact
expressions and create distinct lists of elements for attributes in a star schema.

After completing the topics in this lesson, you will be able to:

• Describe the purpose of logical views in MicroStrategy Architect.

• Create logical views in MicroStrategy Architect. Use logical views to
define complex attribute and fact expressions that require data from
multiple tables.

• Use logical views to create a distinct list of elements for attributes in
star schemas.
Logical Views in MicroStrategy Architect
Review of Logical Tables
Table Aliases

Introduction to Logical Views


Logical View SQL
Logical Views vs. Database Views
Examples of Logical Views Usage

After completing this topic, you will be able to:

Describe the purpose of logical views in MicroStrategy Architect.

Review of Logical Tables


Before learning about logical views, it is important to review the concept of
logical tables in MicroStrategy Architect. When you add a data warehouse table
to a MicroStrategy project, you automatically create a logical table in the
metadata that corresponds to it. For example, if you add a LU_CUSTOMER
table to the project, you see a logical table called LU_CUSTOMER in the project
schema. Each logical table in a project maps to a physical table in the data
warehouse database:

Logical Tables
The physical table contains columns that store the actual data, and it resides
only in the data warehouse. The logical table resides in the metadata, and it
maps directly to the physical table. Rather than storing actual data, it shows
which attributes and facts map to columns in the physical table. It also stores
information about the structure of the physical table. The Logical Table Editor
in MicroStrategy Architect enables you to view information about a logical table:

Logical Table Editor—Logical Table


The Logical View tab displays the attributes and facts that are mapped to the
logical table, while the Physical View tab displays the structure of the physical
table to which it points, including the name and data type of each column.

Table Aliases
You can also create a logical table in MicroStrategy Architect by creating an
explicit table alias. A table alias enables you to create multiple logical tables
that all point to the same physical table in the data warehouse. For example,
you have a LU_DATE table in your data warehouse. When you add this table to a
project, you automatically create a LU_DATE logical table. However, you have

multiple attributes such as Date, Ship Date, and Order Date that all use this
same lookup table. You can map the Date attribute to the original logical table.
Then, you can create table aliases that point to the LU_DATE table to map the
Ship Date and Order Date attributes:

Table Aliases

In this example, the LU_DATE logical table maps to the physical LU_DATE
table in the data warehouse. The two table aliases, ALIAS_O_DATE and
ALIAS_S_DATE, also point to the LU_DATE table. All three of these logical
tables use the same physical table to support different attributes in the project.

You will learn how to use table aliases later in the course.
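As a sketch of why table aliases matter, the following example (SQLite via Python; the table names, columns, and data are hypothetical, loosely based on the LU_DATE scenario) references the same physical date table twice under two aliases so it can play both the order-date and ship-date roles in one query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LU_DATE (Date_ID INTEGER PRIMARY KEY, Date_Desc TEXT);
CREATE TABLE FACT_ORDERS (Order_ID INTEGER, Order_Date_ID INTEGER,
                          Ship_Date_ID INTEGER);
INSERT INTO LU_DATE VALUES (1, '2013-09-01'), (2, '2013-09-03');
INSERT INTO FACT_ORDERS VALUES (100, 1, 2);
""")

# One physical LU_DATE table, two roles: 'od' resolves the order date,
# 'sd' resolves the ship date.
row = conn.execute("""
    SELECT o.Order_ID, od.Date_Desc AS Order_Date, sd.Date_Desc AS Ship_Date
    FROM FACT_ORDERS o
    JOIN LU_DATE od ON o.Order_Date_ID = od.Date_ID
    JOIN LU_DATE sd ON o.Ship_Date_ID = sd.Date_ID
""").fetchone()
print(row)  # (100, '2013-09-01', '2013-09-03')
```

The SQL-level alias here is what the MicroStrategy table alias ultimately produces in report SQL: two logical names resolving to one physical table.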

As with any other logical table, you can use the Logical Table Editor to view the
attributes or facts that are associated with a table alias, as well as the structure
of the physical table to which it is mapped.

Introduction to Logical Views


You can also use the Logical Table Editor to create a third type of logical table
called a logical view. Rather than mapping to a physical table in the data
warehouse, a logical view is a SQL query that you create and then execute
against tables in the data warehouse. For example, you store order and shipping
information in different tables in the data warehouse, LU_ORDER and
LU_SHIPMENT. You want to calculate the processing time between the order
dates in the LU_ORDER table and the ship dates in the LU_SHIPMENT table
to analyze the processing time for each order. You can create a logical view that
performs these functions:

LogicalViews

In this example, the LVW_PROCESSING logical table is a logical view that is
created by running a SQL query against the LU_ORDER and LU_SHIPMENT
tables in the data warehouse. It does not map directly to any physical table.

When you define the logical view, you write a SQL statement that selects the
pertinent information from the LU_ORDER and LU_SHIPMENT tables and
then calculates the difference between the order and ship dates to determine the
processing time. The following image shows an example of what this logical
view might look like in the Logical Table Editor:

Logical Table Editor—Logical View

Notice that the logical view contains a SQL statement and definitions for each
column that is created as part of the logical view. You can then map attributes
and facts to these columns. You will learn more about creating logical views
later in this lesson.

Logical View SQL

Logical views perform a similar function to database views, but rather than
existing as views in the data warehouse, they exist only within the application as
schema objects in the metadata.

When you execute a report that contains objects that map to a logical view, the
SQL for the logical view is inserted into the report SQL where the table name
would normally occur. The SQL is inserted as either a derived table expression
or a common table expression, depending on your data warehouse database
platform.
The SQL Engine generates logical views as either derived or common table
expressions based on the default value for the Intermediate Table Type
VLDB property. In general, the SQL Engine inserts the logical view SQL as
a common table expression for some versions of IBM DB2®. For other
databases, it inserts it as a derived table expression.

Derived Table Expressions

A derived table expression is a SELECT statement that appears in the FROM


clause of a query. The syntax for a derived table expression looks like the
following:
select ColA, ColB, ColD
from (select ColA, ColB, SUM(ColC) as ColD
      from TableA
      group by ColA, ColB)
where ColD > 1000

In this example, notice that the outer FROM clause contains a SELECT


statement that extracts information from a table. This result is then used in the
outer query.

Common Table Expressions

A common table expression is a SELECT statement that appears in the WITH


clause of a query. The syntax for a common table expression looks like the
following:
with CTE1 as (select ColA, ColB, SUM(ColC) as ColD
              from TableA
              group by ColA, ColB)
select ColA, ColB, ColD
from CTE1
where ColD > 1000

In this example, notice that the WITH clause contains a SELECT statement that
extracts information from a table. This result is then used in the outer query.
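The two forms can be sketched side by side. The following example runs both the derived table expression and the common table expression against the same sample data (SQLite via Python; SQLite happens to accept both syntaxes, and the TableA data is hypothetical) and shows they return the same result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ColA INTEGER, ColB INTEGER, ColC INTEGER);
INSERT INTO TableA VALUES (1, 1, 600), (1, 1, 700), (2, 2, 50);
""")

# Derived table expression: the SELECT appears in the FROM clause.
derived = conn.execute("""
    SELECT ColA, ColB, ColD
    FROM (SELECT ColA, ColB, SUM(ColC) AS ColD
          FROM TableA
          GROUP BY ColA, ColB)
    WHERE ColD > 1000
""").fetchall()

# Common table expression: the SELECT appears in the WITH clause.
cte = conn.execute("""
    WITH CTE1 AS (SELECT ColA, ColB, SUM(ColC) AS ColD
                  FROM TableA
                  GROUP BY ColA, ColB)
    SELECT ColA, ColB, ColD
    FROM CTE1
    WHERE ColD > 1000
""").fetchall()

print(derived, cte)  # both [(1, 1, 1300)]
```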

Differences Between Derived and Common Table


Expressions

Besides the obvious syntax differences, derived and common table expressions
support different features.

Common table expressions enable you to reference the same expression
multiple times within a query. For example, notice that the following query
references the same common table expression two different times:
with CTE1 as (select ColA, ColB, SUM(ColC) as ColD
              from TableA
              group by ColA, ColB)
select ColA, ColB, ColD
from CTE1
where ColD > 1000

select ColA, ColB, ColD
from CTE1
where ColA = 32

In this example, the FROM clauses of both the SELECT statements that follow
the WITH clause (indicated in bold) reference the common table expression
contained in the WITH clause.

If you construct this same query using a derived table expression, you have to
insert the expression each time you want to use it:
select ColA, ColB, ColD
from (select ColA, ColB, SUM(ColC) as ColD
      from TableA
      group by ColA, ColB)
where ColD > 1000

select ColA, ColB, ColD
from (select ColA, ColB, SUM(ColC) as ColD
      from TableA
      group by ColA, ColB)
where ColA = 32

When you use a derived table expression, you must repeat the entire expression
for each of the SELECT statements.

As you can see, an advantage of common table expressions is that they provide
the flexibility to reference the same expression multiple times in a SQL query.

Conversely, derived table expressions offer the ability to nest one expression
inside another. In other words, one expression can actually contain another
expression. You cannot nest a common table expression inside another one,
meaning that the WITH clause for one expression cannot contain another
WITH clause.

Using Derived and Common Table Expressions in Logical
Views

Because you can type any SQL syntax when defining a logical view, you can
actually use derived and common table expressions in the logical view SQL
itself. However, you may run into problems if your logical view SQL contains a
common table expression.

If your project runs against a data warehouse platform for which the SQL
Engine generates logical views as common table expressions in the report SQL,
the query results in invalid SQL syntax because you cannot nest one common
table expression (the one that defines the logical view) inside another common
table expression (the one generated to use the logical view in the report SQL).

Even if your project runs against a data warehouse platform for which the SQL
Engine generates logical views as derived table expressions in the report SQL,
you may still encounter problems. Some databases also do not allow common
table expressions to be nested inside of derived table expressions. Again, in this
case, the query results in invalid SQL syntax because you cannot nest the
common table expression (the one that defines the logical view) inside a derived
table expression (the one generated to use the logical view in the report SQL).

Logical Views vs. Database Views

Logical views are simply an alternative to using database views. Most functions
that you can perform using database views, you can also accomplish using
logical views. The primary difference between them is that you create and
maintain logical views at the application level rather than the database level.

Logical views are stored as logical tables in the MicroStrategy metadata. They
do not store results of reports, only the SQL statement and column definitions
used to retrieve the data for the logical view and display it in reports.

For information on the individual components of logical views,
see “Creating Logical Views”.

Given the similarity in characteristics between logical and database views, there
is little or no difference between them in terms of performance. Generally,
logical views carry the same advantages and limitations as a database view.

One primary difference between logical and database views is that you
cannot build indexes on logical view columns. Of course, logical views do
take advantage of indexes on the underlying data warehouse tables that
they access.

When you are considering how to resolve a modeling or schema issue, logical
views offer yet another possible solution that you can explore. Depending on the
nature of your data and the characteristics of your reporting environment, you
have to weigh whether an application-level solution (such as logical views) or a
database-level solution (such as changing the structure of physical tables)
makes the most sense and delivers the best performance.

Examples of Logical Views Usage

• Create complex attributes and facts—mapping attribute form expressions
and fact expressions that span multiple tables.

• Distinct lookup tables for star schemas—effective use of aggregate fact
tables and efficient element browsing, for example, when populating an
element list prompt.

• Slowly changing dimensions (SCDs)—track and analyze attribute
relationships that change over time, for example, in a sales organization
where sales representatives switch districts over time.

• Time trend analysis—transformations that you need to perform
calculations such as “month-to-date” or “year-to-date”. If you define the
month-to-date and year-to-date transformations using tables in the
database, you map each date to all the previous dates that make up
“month-to-date” or “year-to-date”. If you do not already have such a table
in the warehouse and you cannot add additional tables to the database,
then you can use logical views as long as you already have a lookup table
for the Day attribute.

• Outer joins between attributes—ability to generate an outer join between
attribute lookup tables for a report that contains only attributes (that is, no
metrics).

• Recursive hierarchies—split the recursive hierarchy table into several
logical views, one for each level in the hierarchy.

You will learn more about star schemas, SCDs, and recursive hierarchies
later in the course.
Creating Logical Views
Using Logical Views to Create Complex Attribute and Fact Expressions
Creating a New Logical Table
Defining the SQL Statement
Defining the Columns
Mapping Logical View Columns to Attributes and Facts

After completing this topic, you will be able to:

Create logical views in MicroStrategy Architect. Use logical views to define
complex attribute and fact expressions that require data from multiple tables.

Now that you understand the purpose of logical views and how they work, you
are ready to learn how to create a logical view in MicroStrategy Architect.

Using Logical Views to Create Complex Attribute


and Fact Expressions
One of the many applications for logical views is the ability to create complex
attribute and fact expressions using multiple tables. In the MicroStrategy
Architect: Project Design Essentials course, you learned how to create attribute
form expressions and fact expressions that use multiple columns, but all of the
columns had to come from the same table. Because logical views enable you to
define a SQL statement, you can use them to create attribute form expressions
and fact expressions that require columns from different tables.

The example provided earlier in this lesson to explain logical views involves a
scenario in which order and shipping information is stored in different tables.
The following image shows the schema and logical data model for this example,
as well as the desired report display for the information:

Logical Views—Complex Fact Expression
In this example, information about when an order is placed and when it is
shipped is stored in two different tables—LU_ORDER and LU_SHIPMENT.
The schema also contains customer and shipper information relatedtoeach
order. Looking at the logical data model, you can see that all of the attributesare
related to the Order attribute.

The desired report display is what causes the complexity. This report contains
not only the attributes from the data model, but it also contains a Processing
Time metric. Processing time is the number of days in between the order date
and the ship date for an order. Therefore, the Processing Time metric requires a
fact expression that calculates the difference between these two dates. However,
these dates are in two different tables. The Order_Date column is in the
LU_ORDER table, and the Ship_Date column is in the LU_SHIPMENT table.
Creating this fact requires that you combine two
columns from different tables.

You cannot create this fact through the Fact Editor, but you can calculate the
processing time and create a Processing_Time column as part of a logical view.
Then you can map the fact to this column. You will learn how to create a
logical view using this business scenario.

There are four steps involved in creating the logical view:


1 Create a new logical table.

2 Define the SQL statement for the logical view.

3 Define the columns in the logical view.

4 Map the columns in the logical view to the appropriate attributes and facts.

You will learn about other examples for using logical views later in this
lesson and in the course.
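The whole scenario can be sketched end to end before walking through the editor steps. In the following example (SQLite via Python), a database view stands in for the application-level logical view, the sample rows are hypothetical, and julianday() stands in for the date arithmetic, which in practice is database-specific:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LU_ORDER (Order_ID INTEGER, Order_Date TEXT, Customer_ID INTEGER);
CREATE TABLE LU_SHIPMENT (Order_ID INTEGER, Ship_Date TEXT, Shipper_ID INTEGER);
INSERT INTO LU_ORDER VALUES (100, '2013-09-01', 1);
INSERT INTO LU_SHIPMENT VALUES (100, '2013-09-04', 7);

-- The logical-view SQL: join orders to shipments and derive the
-- Processing_Time column that exists in neither physical table.
CREATE VIEW LVW_PROCESSING AS
SELECT a.Order_ID, a.Order_Date, a.Customer_ID,
       b.Ship_Date, b.Shipper_ID,
       CAST(julianday(b.Ship_Date) - julianday(a.Order_Date) AS INTEGER)
           AS Processing_Time
FROM LU_ORDER a, LU_SHIPMENT b
WHERE a.Order_ID = b.Order_ID;
""")

row = conn.execute(
    "SELECT Order_ID, Processing_Time FROM LVW_PROCESSING"
).fetchone()
print(row)  # (100, 3) -- three days between order and shipment
```

A Processing Time fact mapped to this derived column is what makes the metric on the report possible.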

Creating a New Logical Table


When you create a logical view, you manually create a new logical table in your
project.

To create a new logical table:

1 Expand the Schema Objects folder.

2 In the Schema Objects folder, select the Tables folder.

3 In the Tables folder, right-click a blank area of the Object Viewer, point to
New, and select Logical Table.

OR

On the File menu, point to New and select Logical Table.


This action opens a blank instance of the Logical Table Editor so that you can
create a logical view. The following image shows the Logical Table Editor:

Logical Table Editor


The Logical Table Editor automatically opens to display the Physical View tab.
Using this tab, you can define the logical view. This tab includes the following
components:

• SQL Statement Pane—You use this pane to define the SQL for the logical
view.

• Mapping Pane—You use this pane to define each of the columns contained
in the logical view.

• Object Browser—You use this browser to find table and columns names
that you want to include in the logical view SQL or that you want to define
as column objects in the logical view.

In addition, the Physical View tab enables you to choose whether the logical
view’s SQL expression should be enclosed in parentheses in the report’s FROM
clause by selecting the Enclose SQL expression in parentheses check box.
This option is enabled by default.

You will learn how to use each of these components as you learn how to define a
logical view.

Defining the SQL Statement


The second step in creating the logical view is to define the SQL statement that
selects the information you want to include in the view. A logical view can only
contain a single SQL statement. The SQL statement can have multiple levels
(such as nested derived table expressions), but you cannot create multiple SQL
statements.

To define the SQL statement for a logical view:

1 In the Logical Table Editor, on the Physical View tab, in the SQL Statement
pane, type the SQL statement that you want to use to define the logical view.

When you type the SQL statement, you can press CTRL+TAB to indent
lines of SQL as needed. This format makes the finished SQL statement
easier to read.
You can use the Object Browser to the left of the SQL Statement pane to add
table or column names to your SQL statement. By default, the Object Browser
displays all of the tables that are in the project warehouse catalog. You can then
expand individual tables to view the columns in each table.

To add tables or column names to the SQL statement using the Object
Browser:

1 In the Logical Table Editor, on the Physical View tab, in the SQL Statement
pane, position the cursor at the point where you want to insert a table or
column name.

2 If you want to insert a table name, in the Object Browser, select the table
and drag it to the SQL Statement pane.

OR

If you want to insert a column name, in the Object Browser, expand the


table that contains the column.

Select the column name and drag it to the SQL Statement pane.

You still need to type necessary SQL commands, table aliases, and operators
into your statement. Using the Object Browser keeps you from having to type
the column and table names, which comprise the bulk of the statement.

The following image shows the SQL statement for the logical view:

Logical View SQL Statement


This SQL statement does the following:

• Selects the order, order date, and customer information from the
LU_ORDER table

• Selects the ship date and shipper information from the LU_SHIPMENT
table

• Calculates the difference between the ship dates in the LU_SHIPMENT


table and the order dates in the LU_ORDER table

• Joins the tables in the query based on the Order attribute

This SQL statement creates a logical view that has all of the necessary order and
shipping information as well as the processing time.
Defining the Columns
The next step in creating the logical view is to define each of the columns that
are part of the logical view. You must define a column object for every column
referenced in the SELECT clause of the SQL statement. The column object
names must exactly match the column names used in the SQL, but you can
define the columns in any order. You do not need to follow the order in which
you reference them in the SQL statement.

You can define column objects in one of two ways. If your logical view contains a
column that already exists in other logical tables, you can simply locate that
column in the Object Browser and drag it to the logical view definition. If your
logical view creates a column that does not exist in other logical tables, you have
to manually add the column to the logical view definition.

To define columns for a logical view using existing columns:

1 In the Logical Table Editor, on the Physical View tab, in the Object Browser,
expand the table that contains the column that you want to use.

2 Select the column and drag it to the Mapping pane.

This step automatically populates the grid with the selected column and its
properties (data type, precision, and scale). You can change the properties
for a column object, although doing so changes the properties for that
column for every logical table in which it occurs.

To define columns for a logical view by manually adding columns:

1 In the Logical Table Editor, on the Physical View tab, beside the Mapping
pane, click Add.

This action adds a new row in the Mapping pane.

2 In the Column Object box, type the name of the column.

The column name must match the name used in the SQL statement.

3 In the Data Type drop-down list, select the data type for the column.
4 In the Precision/Length box, type the length for the column.

The Precision/Length property does not apply to all data types.

5 In the Scale box, type the scale for the column.

The Scale property does not apply to all data types.

The following image shows the columns defined for the logical view:

Logical View Columns

Notice that there is a column in the Mapping pane for each column referenced
in the SQL statement. All of the columns, with the exception of
Processing_Time, are existing columns that you can drag and drop from other
logical tables. You have to add Processing_Time manually as this column exists
only in this logical view.

With the SQL and columns defined, you have finished creating the logical view.
You can now save the logical view. When you do, it is added as a new logical
table in the Tables folder of your project.

Mapping Logical View Columns to Attributes and Facts
After you have created the logical view, the last step in configuring it for use is to
map the columns of the logical view to the appropriate attributes and facts. The
logical view that you created has the following structure:

Logical View Structure

Even though the logical view is not a physical table, this image shows its
column structure for the purpose of understanding which attributes and
facts you need to map to the logical view.

LVW_PROCESSING, the logical table that represents the logical view, contains
columns that you need to map to the Order, Order Date, Customer, Ship Date,
and Shipper attributes. All of these attributes already exist because they also
map to columns in other logical tables. You also need to create a Processing
Time fact to map to the Processing_Time column, which exists only in the
LVW_PROCESSING table.
Mapping the Attribute Columns

In this example, because all of the attribute columns in the logical view are
defined using existing columns from other logical tables, mapping the attributes
is very simple.

In the Logical Table Editor, when you define a logical view column by dragging
an existing column from another logical table into the Mapping pane, the logical
view automatically appears as a potential source table for the attribute or fact
that maps to that existing column.

The following image illustrates this behavior for the Order attribute:

Mapping Attribute Columns

When you create thelogical view, if you define the Order_ID column using the
existing Order_ID column in the LU_ORDER table, LVW_PROCESSING
automatically appears as a potential source table for the ID form of the Order
attribute. All you need to do to map the Order ID form to the logical view is to
select LVW_PROCESSING as a source table.

If you have a logical view column that maps to an existing attribute or fact
but uses a different column name than those already defined for the object,
you have to create a new expression to map the logical view column just as
you would for any heterogeneous column expression.

To map the attributes to the LVW_PROCESSING table, you need to perform the
following steps:

1 Open the project in Architect.

2 Edit the ID form of each attribute so that LVW_PROCESSING is selected as
a source table for the attribute form expression.

Mapping the Fact Column

In this example, the Processing_Time fact column is unique to the logical view.


Therefore, Processing Time does not exist as a fact in the project.

To map the fact column in the LVW_PROCESSING table, you need to perform
the following steps:

1 Create a new fact called Processing Time.

2 Map its fact expression to the Processing_Time column in the
LVW_PROCESSING table.

The following image shows the definition for the Processing Time fact:

Processing Time Fact
After mapping the attribute and fact columns, you can then update the schema.
You can create a Processing Time metric that aggregates the Processing Time
fact and then design a report to display the desired information from the logical
view. The result set looks like the following:

Processing Time Report


Notice that the Processing Time metric displays a number for each order that is
equal to the difference between the order and ship dates.

The SQL for this report looks like the following:


select a11.[Customer_ID] AS Customer_ID,
    a12.[Customer_Name] AS Customer_Name,
    a11.[Order_ID] AS Order_ID,
    a11.[Order_Date] AS Order_Date,
    a11.[Ship_Date] AS Ship_Date,
    a11.[Shipper_ID] AS Shipper_ID,
    a13.[Shipper_Name] AS Shipper_Name,
    a11.[Processing_Time] AS WJXBFS1
from (select a.Order_ID, a.Order_Date,
        a.Customer_ID, b.Ship_Date, b.Shipper_ID,
        b.Ship_Date - a.Order_Date as Processing_Time
    from LU_ORDER a, LU_SHIPMENT b
    where a.Order_ID = b.Order_ID) a11,
    [LU_CUSTOMER] a12,
    [LU_SHIPPER] a13
where a11.[Customer_ID] = a12.[Customer_ID]
and a11.[Shipper_ID] = a13.[Shipper_ID]

The FROM clause contains a derived table expression where you would
normally see a physical table name. This expression is the SQL statement for the
logical view that selects the order and shipping information and calculates the
processing time.
Logical Views and Star Schemas
Creating the Logical View for District
Creating the Logical View for Region
Mapping the Attributes to the Logical Views

After completing this topic, you will be able to:

Use logical views to create a distinct list of elements for attributes in star
schemas.

In the Advanced Schema Design lesson, you learned that star schemas, which
contain a single lookup table for each dimension, can be problematic. Joining
aggregate fact tables to dimension lookup tables at any level other than the
lowest level results in multiple counting. If you use a star schema, you need to
include the higher-level lookup tables to use aggregate fact tables.

Logical views provide an alternative to creating physical tables for the higher-
level lookup tables. Instead, you can define logical views that select the
information for particular attribute levels from the dimension table. These
logical views can then function as the lookup tables for higher-level attributes.

For example, you have a single lookup table in your star schema that stores all
of the geography data as follows:

Dimension Table

In this example, the LU_GEOGRAPHY table stores the data for all levels of the
Geography dimension—Store, District, and Region. Store_ID is the primary key,
so the table already contains a unique list of stores. However, it does not contain
a distinct list of districts or regions.
You have an aggregate fact table that you want to use in conjunction with this
dimension lookup table:

Aggregate Fact Table

The FACT_REGION_SALES table stores sales at the region and date level.
However, when you join this aggregate fact table to the LU_GEOGRAPHY table,
the join occurs on the Region_ID column. This column does not contain a
distinct list of regions since Region_ID is not part of the key for the
LU_GEOGRAPHY table. If you run a report that joins these two tables, the
result set looks like the following:

Result Set Using Dimension Table

If you look at the sample data for the FACT_REGION_SALES table, notice that
it does not match the values displayed for the Sales metric in the report.
Because the FACT_REGION_SALES table does not join to a distinct list of
regions, the sales values are counted multiple times—once for each occurrence
of a corresponding region ID in the LU_GEOGRAPHY table.

The SQL for this report looks like the following:


select a11.[Region_ID] AS Region_ID,
    max(a12.[Region_Desc]) AS Region_Desc,
    sum(a11.[Sales]) AS WJXBFS1
from [FACT_REGION_SALES] a11,
    [LU_GEOGRAPHY] a12
where a11.[Region_ID] = a12.[Region_ID]
group by a11.[Region_ID]

The FROM clause contains the FACT_REGION_SALES and LU_GEOGRAPHY
tables. The WHERE clause joins these two tables based on the Region_ID
column. Because this column is not unique in the LU_GEOGRAPHY table, this
join results in multiple counting.
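The multiple-counting effect can be sketched in a few lines (SQLite via Python; the sample rows are hypothetical). With two stores in the same region, the single aggregate fact row is counted once per matching geography row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LU_GEOGRAPHY (Store_ID INTEGER PRIMARY KEY,
                           District_ID INTEGER, District_Desc TEXT,
                           Region_ID INTEGER, Region_Desc TEXT);
CREATE TABLE FACT_REGION_SALES (Region_ID INTEGER, Sales INTEGER);
-- Two stores, both in region 100 'East'.
INSERT INTO LU_GEOGRAPHY VALUES (1, 10, 'North', 100, 'East'),
                                (2, 11, 'South', 100, 'East');
INSERT INTO FACT_REGION_SALES VALUES (100, 500);
""")

# Join the region-level aggregate fact to the store-grained dimension:
# Region_ID is not unique in LU_GEOGRAPHY, so the 500 is double-counted.
rows = conn.execute("""
    SELECT a11.Region_ID, SUM(a11.Sales)
    FROM FACT_REGION_SALES a11, LU_GEOGRAPHY a12
    WHERE a11.Region_ID = a12.Region_ID
    GROUP BY a11.Region_ID
""").fetchall()
print(rows)  # [(100, 1000)] -- inflated from the true total of 500
```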

This same behavior would also occur if you tried to use an aggregate fact table at
the district level. With only the dimension table in place, aggregate fact tables at
the region and district levels are useless because they need to join to a distinct
list of region and district elements.

One method of creating a distinct list of elements for the Region and District
attributes is to build two logical views (one for each attribute) to serve as lookup
tables. These logical views are based on the LU_GEOGRAPHY table.

Depending on your reporting requirements, you may also choose to build a
logical view for the Store attribute so that you can join store data to district
data without using the dimension lookup table.

To create distinct lookup tables for the District and Region attributes using
logical views, you need to do the following:

1 Create a logical view that selects the necessary district information from the
LU_GEOGRAPHY table.

2 Create a logical view that selects the necessary region information from the
LU_GEOGRAPHY table.
3 Map the ID and DESC forms for the District and Region attributes to the
appropriate columns in the logical views.

Creating the Logical View for District


The logical view for the District attribute has the following structure:

LogicalView—District

LVW_DISTRICT contains the District_ID, District_Desc, and Region_ID
columns from the LU_GEOGRAPHY table. The District_ID and District_Desc
columns provide a distinct list of district elements. The Region_ID column
maps the relationship between regions and districts, enabling you to join
district data to region data using the logical view.

The SQL statement and column definitions for LVW_DISTRICT look like the
following:

Logical View SQL and Column Definitions—District


The SQL statement selects a distinct list of the district IDs and descriptions and
the region IDs from the LU_GEOGRAPHY table. You define the three columns
for the logical view using the existing columns from the LU_GEOGRAPHY
table.
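The figure with the logical view SQL is not reproduced here, but the idea can be sketched as an ordinary SQL view. The column layout and sample rows below are assumptions for illustration, and MicroStrategy defines logical views in the metadata rather than in the database, but the SELECT DISTINCT logic is the same.

```python
import sqlite3

# Sketch of LVW_DISTRICT as a plain SQL view over an assumed store-level
# LU_GEOGRAPHY layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LU_GEOGRAPHY (Store_ID INTEGER PRIMARY KEY, Store_Desc TEXT,
                           District_ID INTEGER, District_Desc TEXT,
                           Region_ID INTEGER, Region_Desc TEXT);
INSERT INTO LU_GEOGRAPHY VALUES
    (1, 'Store A', 10, 'Downtown', 1, 'North'),
    (2, 'Store B', 10, 'Downtown', 1, 'North'),
    (3, 'Store C', 20, 'Suburbs',  2, 'South');
-- The logical view: a distinct list of districts, plus Region_ID so the
-- district-to-region relationship is preserved.
CREATE VIEW LVW_DISTRICT AS
    SELECT DISTINCT District_ID, District_Desc, Region_ID
    FROM LU_GEOGRAPHY;
""")

districts = conn.execute(
    "SELECT * FROM LVW_DISTRICT ORDER BY District_ID").fetchall()
print(districts)  # each district appears exactly once
```

Even though the Downtown district owns two stores, it appears once in the view, which is what makes it safe to join to a district-level aggregate fact table.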

Creating the Logical View for Region


The logical view for the Region attribute has the following structure:

Logical View—Region
LVW_REGION contains the Region_ID and Region_Desc columns from the
LU_GEOGRAPHY table. The Region_ID and Region_Desc columns provide a
distinct list of region elements.

The SQL statement and column definitions for LVW_REGION look like the
following:

Logical View SQL and Column Definitions—Region


The SQL statement selects a distinct list of the region IDs and descriptions from
the LU_GEOGRAPHY table. You define the two columns for the logical view
using the existing columns from the LU_GEOGRAPHY table.

Mapping the Attributes to the Logical Views


After you create both logical views, you need to map the Region and District
attributes to the appropriate columns in the logical views.

To map the District attribute, you need to perform the following steps:

1 Open the District attribute in the Attribute Editor.

2 Modify the ID form so that LVW_DISTRICT is selected as a source table for
the attribute form expression. Set LVW_DISTRICT as the primary lookup
table.

3 Modify the DESC form so that LVW_DISTRICT is selected as a source table
for the attribute form expression. Set LVW_DISTRICT as the primary
lookup table.
To map the Region attribute, you need to perform the following steps:

1 Open the Region attribute in the Attribute Editor.


2 Modify the ID form so that LVW_DISTRICT and LVW_REGION are
selected as source tables for the attribute form expression. Set
LVW_REGION as the primary lookup table.

3 Modify the DESC form so that LVW_REGION is selected as a source table
for the attribute form expression. Set LVW_REGION as the primary lookup
table.

4 On the Children tab, select LVW_DISTRICT as the relationship table that is
used to relate the Region attribute to the District attribute.
If you leave LU_GEOGRAPHY as the relationship table for the Region and
District attributes, multiple counting occurs if you use both attributes on a
report in conjunction with an aggregate fact table.

After you map the District and Region attributes to the logical views, you can
update the schema and then use these logical views to join to aggregate fact
tables. Now, if you run the same report against the FACT_REGION_SALES
table, the result set looks like the following:

Result Set Using Logical View

If you look at the sample data for the FACT_REGION_SALES table, notice that
it now matches the values displayed for the Sales metric in the report. Because
the FACT_REGION_SALES table joins to a distinct list of regions in
LVW_REGION, the sales values are accurately aggregated.

The SQL for this report looks like the following:


select a11.[Region_ID] AS Region_ID,
max(a12.[Region_Desc]) AS Region_Desc,
sum(a11.[Sales]) AS WJXBFS1
from [FACT_REGION_SALES] a11,
(Select distinct Region_ID,
Region_Desc
From LU_GEOGRAPHY) a12
where a11.[Region_ID] = a12.[Region_ID]
group by a11.[Region_ID]

The FROM clause contains a derived table expression. This expression is the
SQL statement for LVW_REGION. The WHERE clause joins the
FACT_REGION_SALES table to the region information in LVW_REGION.
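A runnable sketch of the derived-table fix, again using sqlite3 with assumed sample data (two stores in region 1, one in region 2); the inline `(SELECT DISTINCT ...)` plays the role of LVW_REGION.

```python
import sqlite3

# Assumed sample data: region 1 repeats in the store-level lookup table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LU_GEOGRAPHY (Store_ID INTEGER PRIMARY KEY,
                           Region_ID INTEGER, Region_Desc TEXT);
CREATE TABLE FACT_REGION_SALES (Region_ID INTEGER, Sales REAL);
INSERT INTO LU_GEOGRAPHY VALUES (1, 1, 'North'), (2, 1, 'North'), (3, 2, 'South');
INSERT INTO FACT_REGION_SALES VALUES (1, 100.0), (2, 50.0);
""")

# The derived table deduplicates the regions before the join, so each fact
# row matches exactly one region row and the sums are no longer inflated.
rows = conn.execute("""
    SELECT f.Region_ID, MAX(g.Region_Desc), SUM(f.Sales)
    FROM FACT_REGION_SALES f,
         (SELECT DISTINCT Region_ID, Region_Desc FROM LU_GEOGRAPHY) g
    WHERE f.Region_ID = g.Region_ID
    GROUP BY f.Region_ID
""").fetchall()
print(rows)  # [(1, 'North', 100.0), (2, 'South', 50.0)]
```

Compare this with the earlier direct join against LU_GEOGRAPHY, which would have returned 200 for region 1.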
Lesson Summary

In this lesson, you learned:

• When you add a data warehouse table to a MicroStrategy project, you


automatically create a corresponding logical table in the metadata.

• A physical table stores the actual data and resides in the data warehouse.

• A logical table resides in the metadata and stores information about the
structure of the physical table. It also stores information about the
attributes and facts that map to columns in its associated physical table.

• A table alias enables you to create multiple logical tables that all point to the
same physical table in the data warehouse.

• A logical view is a SQL query that you create and then execute against tables
in the data warehouse. It contains a SQL statement and definitions for each
column that is created as part of the logical view.

• When you execute a report that contains objects that map to a logical view,
the SQL for the logical view is inserted into the report SQL where the table
name would normally occur. The SQL is inserted as either a derived table
expression or a common table expression, depending on your data
warehouse database platform.

• A derived table expression is a SELECT statement that appears in the
FROM clause of a query.

• A common table expression is a SELECT statement that appears in the
WITH clause of a query.

• You can reference the same expression multiple times in a query when you
use a common table expression. You do not have this flexibility with derived
table expressions.

• You can nest a derived table expression inside another derived table
expression in a query. You do not have this flexibility with common table
expressions.
• The primary difference between logical views and database views is that you
create and maintain logical views at the application level rather than the
database level.

• You can use logical views to create complex attribute form expressions and
fact expressions using multiple tables.

• A logical view can only contain a single SQL statement. The SQL statement
can have multiple levels (such as nested expressions).

• When you create a logical view, you must define a column object for every
column referenced in the SELECT clause of the SQL statement. The column
names must exactly match the column names used in the SQL statement,
but you can define the columns in any order.

• You can also use logical views to address SCDs, create lookup tables to
handle star schemas, remove recursive hierarchies, and a variety of other
reporting issues.
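The contrast between the two expression types summarized above can be seen in a small sqlite3 sketch (SQLite supports both forms); the `sales` table and its rows are assumptions. The CTE `region_totals` is defined once in the WITH clause and referenced twice, which with derived tables would require writing the same SELECT out twice.

```python
import sqlite3

# Assumed sample data for demonstrating a reusable common table expression.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 100.0), ("North", 50.0), ("South", 30.0)])

# region_totals is named once in the WITH clause and used twice below:
# once in the FROM clause and once in the subquery that finds the maximum.
rows = conn.execute("""
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region
    )
    SELECT t.region, t.total
    FROM region_totals t
    WHERE t.total = (SELECT MAX(total) FROM region_totals)
""").fetchall()
print(rows)  # [('North', 150.0)]
```

Rewritten with derived tables, both references would become separate `(SELECT region, SUM(amount) ... GROUP BY region)` blocks, duplicating the aggregation logic.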
4
MANY-TO-MANY RELATIONSHIPS

Lesson Description

This lesson describes the concept of many-to-many relationships.

In this lesson, you will learn about many-to-many relationships and their
impact on report analysis. You will learn how to design the data warehouse
model and schema to best support them in the MicroStrategy reporting
environment.
Lesson Objectives

After completing this lesson, you will be able to:

Describe advanced data modeling concepts and explain how to design the
data warehouse model and schema to support them in a MicroStrategy project.

After completing the topics in this lesson, you will be able to:


• Describe the challenges caused by many-to-many relationships and explain
the four methods for implementing them in a MicroStrategy project.
Many-to-Many Relationships

Review of Attribute Relationships

Examples of Many-to-Many Relationships

Challenges of Many-to-Many Relationships

Creating a Separate Relationship Table

Creating a Compound Child Attribute

Creating a Hidden Common Compound Child Attribute

Creating a Common Child Attribute

After completing this topic, you will be able to:

Describe the challenges caused by many-to-many relationships and explain the
four methods for implementing them in a MicroStrategy project.

Review of Attribute Relationships


Attribute relationships are essential to the structure of a logical data model as
they are the key to joining data from different attributes and aggregating fact
data to different levels. Attribute relationships describe logical associations
between attributes that exist based on business rules.

Attributes are related to each other in one of the following ways:

• Direct—A parent-child relationship exists between two or more attributes.
In the data warehouse, direct relationships are explicitly defined using
lookup tables or distinct relationship tables.

• Indirect—Two or more attributes are related only through a fact or set of
facts. In the data warehouse, indirect relationships are stored only in fact
tables.

The following logical data model shows a variety of direct and indirect attribute
relationships:

Direct andIndirect Attribute Relationships


For example, Month and Date are attributes that have a direct relationship.
Each month is directly related to specific dates. However, Customer and Date
are attributes that have an indirect relationship. Without facts, these attributes
are not inherently related, but if customers make purchases on specific dates,
you can relate these two attributes through the facts that capture those sales. In
this logical data model, you can relate Customer and Date using the Revenue,
Quantity Sold, or Profit facts.

Directly related attributes have one of the following types of relationships:

• One-to-one

• One-to-many

• Many-to-many

The following illustration shows examples of these three relationship types:

Three Types of Direct Attribute Relationships


The ways in which you model direct attribute relationships affect both the
design of hierarchies in the logical data model and the structure of the data
warehouse schema.

Attributes that have one-to-one or one-to-many relationships are very
straightforward. You can easily relate these types of parent-child
relationships using only lookup tables. However, attributes with
many-to-many relationships are more complex. They require a separate
relationship table to map the attributes.

This table relates the lookup tables of the respective attributes.

Examplesof Many­to­Many Relationships


• Student and Teacher—a student will have one or more teachers. A specific
teacher will have many students.

• Doctor and Patient—in a hospital, a doctor will be assigned to any number of
patients. A specific patient may be assigned to one or more doctors.

• Course and Student—a college offers many courses and a specific course can
be taken by one or more students.

• Product and Invoice—a product can appear on many invoices and an invoice
can have many products.

Challenges of Many­to­Many Relationships


Because many-to-many relationships require distinct relationship tables, you
have to design the logical data model and data warehouse schema in such a way
that you can accurately analyze the relationship in regard to any relevant fact
data.

If the structure of your logical data model and data warehouse schema does not
adequately address the complexities of querying attribute data that contains
many-to-many relationships, you can have the following problems:

• Lost analytical capability


• Multiple counting

Lost Analytical Capability

When you have attributes with many-to-many relationships, you need to design
the logical data model and data warehouse schema to ensure that users can
answer any relevant questions about the data. Otherwise, users may lose the
ability to analyze certain business scenarios.

In the following example, the Color and Item attributes have a many-to-many
relationship:

Attributes with a Many­to­Many Relationship


The illustration above shows the primary key for each lookup and
relationship table in bold type. For any illustrations in this course that
show table structure, if a primary key exists, it is always indicated in bold
type.

An item can come in multiple colors, and the same color can apply to multiple
items. The LU_COLOR and LU_ITEM tables store a distinct list of all colors
and items respectively. The REL_ITEM_COLOR table enables you to join data
in the two lookup tables, mapping the relationships between items and colors.
The FACT_ITEM_SALES table stores sales by item and date.
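The four table roles just described can be sketched as DDL. This is a sqlite3 sketch; the column types and the sample relationship rows are assumptions for illustration, not the course's exact schema.

```python
import sqlite3

# Sketch of the four tables: distinct lookups, a relationship table for the
# many-to-many pairs, and a fact table keyed by item (and date) only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LU_COLOR (Color_ID INTEGER PRIMARY KEY, Color_Desc TEXT);
CREATE TABLE LU_ITEM  (Item_ID  INTEGER PRIMARY KEY, Item_Desc  TEXT);
CREATE TABLE REL_ITEM_COLOR (Item_ID INTEGER, Color_ID INTEGER,
                             PRIMARY KEY (Item_ID, Color_ID));
CREATE TABLE FACT_ITEM_SALES (Item_ID INTEGER, Sale_Date TEXT,
                              Sales_Amount REAL);
""")

# The relationship table answers "what colors does an item come in":
# assumed pairs say item 1 comes in colors 1 and 2, item 2 in color 1.
conn.executemany("INSERT INTO REL_ITEM_COLOR VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])
colors_of_item_1 = conn.execute(
    "SELECT COUNT(*) FROM REL_ITEM_COLOR WHERE Item_ID = 1").fetchone()[0]
print(colors_of_item_1)  # 2
```

Note that FACT_ITEM_SALES deliberately carries no Color_ID column here; that omission is exactly what causes the problems discussed next.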

In analyzing this relationship, users may want to know answers to the following
questions:

1 In what colors are the various items available?

2 How much of a particular item and color combination was sold?

Answering the first question requires a table that contains a list of all possible
item and color combinations. With many-to-many relationships, the distinct
relationship table is what you use to map the relationship between two
attributes. Therefore, the REL_ITEM_COLOR table contains a row for every
possible item and color combination. The information in this table is sufficient
to answer this first question. You can determine the colors in which an item is
available.

Answering the second question requires a table that has sales information
along with both item and color information. The REL_ITEM_COLOR
relationship table is not sufficient to answer the second question. This
relationship table only contains the item and color combinations that are
available, not which item and color combinations are actually sold. The
FACT_ITEM_SALES fact table lets you analyze which items sell, not in which
colors the items sold, because it only stores the Item_ID column, not Color_ID.
Therefore, because the color information is not part of the fact table, you
cannot answer this second business question.

If you want to be able to analyze the sales of item and color combinations, you
have to capture both the item and color data for sales in your source system.
Then you have to modify the FACT_ITEM_SALES table to include both item
and color information. The following illustration shows the same scenario, but
the FACT_ITEM_SALES table now contains both the Item_ID and Color_ID
columns:

Many­to­ManyRelationship—Both Attributes inFact Table

The fact table alone is not sufficient to answer the first business question.
You can only retrieve item and color combinations that actually sold from
the FACT_ITEM_SALES table. If you have item and color combinations
that are available but have never sold, those item and color combinations
are not present in the fact table.

To ensure that you do not lose analytical flexibility when dealing with many-to-
many attribute relationships, you need the following tables in your data
warehouse schema:

• A table that relates the attributes, identifying all the possible combinations
of elements

• A fact table with columns that enable you to accurately join to both the
parent and child attributes

This same structure has to be in place for any fact table that contains facts
you want to analyze with respect to this particular attribute relationship.
Multiple Counting

Lost analytical capability is not the only issue that can arise when dealing with
many-to-many relationships. You can also experience problems with multiple
counting in certain scenarios. If you try to aggregate data to the level of the
parent attribute in the many-to-many relationship (or any attribute level above
the parent), multiple counting can occur when the relationship exists in a
distinct relationship table but not in the fact table. The following illustration
shows the Color and Item scenario with the original table structure, where only
the Item attribute is present in the fact table:

Color and Item—Original Table Structure

To understand how multiple counting can occur, consider the following sample
data:

Color and Item—Sample Data


In this example, there are three items—hats, dresses, and socks. There are also
three colors—red, blue, and green. Based on the data in the
REL_ITEM_COLOR table, hats and dresses are available in all three colors,
while socks are available only in blue and green.

The illustration above shows the attribute descriptions in parentheses
beside the IDs in the Color_ID and Item_ID columns of the tables. These
descriptions are not actually part of the ID columns in the physical tables.
For any of the illustrations in this course, when descriptions are displayed
in parentheses inside an ID column, these descriptions are included to
make it easier to understand the data in the table. It is not intended to
indicate that the ID columns in the physical table include the description
data.

Multiple counting occurs when you run any report that requests sales
information in conjunction with item colors. The difficulty lies in the fact that
color is not in the FACT_ITEM_SALES table. Within the fact table, there is no
way to relate the sale of an item to the color of that particular item. You can
only determine which items sold, not the colors of the items that sold.

For example, you may want to run a report that aggregates the total sales for
hats. Based on the sample data, the result set looks like the following:

Report Result Set with Total Sales for Hats


The total sales for hats in the report matches the actual data in the
FACT_ITEM_SALES table. The SQL Engine can correctly aggregate the total
sales for hats based on the information in the fact table. The SQL for this report
looks like the following:
select a11.[Item_ID] AS Item_ID,
max(a12.[Item_Desc]) AS Item_Desc,
sum(a11.[Sales_Amount]) AS WJXBFS1
from [FACT_ITEM_SALES] a11,
[LU_ITEM] a12
where a11.[Item_ID] = a12.[Item_ID]
and a11.[Item_ID] in (1)
group by a11.[Item_ID]

Notice that only the LU_ITEM and FACT_ITEM_SALES tables appear in the
FROM clause. Because the query only asks for hat sales, not hat sales in specific
colors, there is no need to include the REL_ITEM_COLOR table in the query.

However, what if you want to know the colors of items that sold? You may want
to run a report that aggregates the total sales for all red items. Based on the
sample data, the result set looks like the following:

Report Result Set with Total Sales for Red Items


There is no way to accurately determine the total sales for all red items since the
Color attribute is not represented in the FACT_ITEM_SALES table. However,
the report shows that you sold $85 of red items. This amount is the total sales
for all hats and dresses because those are the two items that are available in red.
This number could be correct if all the hats and dresses that were sold were red,
but it is more likely incorrect. However, given the available data, this total is
what you obtain when the SQL Engine aggregates the item sales data from the
Item level to the Color level. The SQL for this report looks like the following:

select a12.[Color_ID] AS Color_ID,
max(a13.[Color_Desc]) AS Color_Desc,
sum(a11.[Sales_Amount]) AS WJXBFS1
from [FACT_ITEM_SALES] a11,
[REL_ITEM_COLOR] a12,
[LU_COLOR] a13
where a11.[Item_ID] = a12.[Item_ID] and
a12.[Color_ID] = a13.[Color_ID]
and a12.[Color_ID] in (1)
group by a12.[Color_ID]

Notice that the REL_ITEM_COLOR table is in the FROM clause along with the
LU_COLOR and FACT_ITEM_SALES tables. In the WHERE clause, the SQL
Engine joins the FACT_ITEM_SALES and REL_ITEM_COLOR tables using
only the Item_ID column since Color_ID is not part of the fact table. However,
the SQL Engine has to join the REL_ITEM_COLOR and LU_COLOR tables
based on the Color_ID column. There is also a filtering condition in the
WHERE clause to retrieve only items whose Color_ID value is 1 (red). As a
result, the SQL Engine retrieves the sales for all items that are available in red,
even though there is no way of knowing whether the items sold were actually
red.
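This inflated "red" total can be reproduced directly. The sqlite3 sketch below uses assumed sales amounts chosen to be consistent with the $85 figure quoted above (hats total $35, dresses total $50, both available in red; socks are not red).

```python
import sqlite3

# Assumed sample data consistent with the report totals in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LU_COLOR (Color_ID INTEGER PRIMARY KEY, Color_Desc TEXT);
CREATE TABLE REL_ITEM_COLOR (Item_ID INTEGER, Color_ID INTEGER);
CREATE TABLE FACT_ITEM_SALES (Item_ID INTEGER, Sales_Amount REAL);
INSERT INTO LU_COLOR VALUES (1, 'Red'), (2, 'Blue'), (3, 'Green');
-- Hats (1) and dresses (2) are available in red; socks (3) are not.
INSERT INTO REL_ITEM_COLOR VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO FACT_ITEM_SALES VALUES (1, 35.0), (2, 50.0), (3, 10.0);
""")

# Aggregating item-level sales to the Color level counts every sale of an
# item that is merely AVAILABLE in red, whatever color actually sold.
red_total = conn.execute("""
    SELECT SUM(f.Sales_Amount)
    FROM FACT_ITEM_SALES f, REL_ITEM_COLOR r, LU_COLOR c
    WHERE f.Item_ID = r.Item_ID AND r.Color_ID = c.Color_ID
      AND r.Color_ID = 1
""").fetchone()[0]
print(red_total)  # 85.0 -- all hat and dress sales, regardless of color sold
```

The query is syntactically correct; the $85 is wrong only because the fact table cannot say which color of hat or dress was sold.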

Finally, what if you want to know how much you sold of an item in a particular
color? You may want to run a report that aggregates the total sales for all red
dresses. Based on the sample data, the result set looks like the following:
all red

Report Result Set with Total Sales for Red Dresses

Again, there is no way to accurately determine the total sales for all red dresses
since the Color attribute is not represented in the FACT_ITEM_SALES table.
Nevertheless, the report shows that you sold $50 of red dresses. This amount is
the total sales for all dresses. This number could be correct if all the dresses that
were sold were red, but it is more likely incorrect. Then again, given the
available data, this total is what you obtain when the SQL Engine aggregates the
dress sales data from the Item level to the Color level. The SQL for this report
looks like the following:
select a12.[Color_ID] AS Color_ID,
max(a13.[Color_Desc]) AS Color_Desc,
a11.[Item_ID] AS Item_ID,
max(a14.[Item_Desc]) AS Item_Desc,
sum(a11.[Sales_Amount]) AS WJXBFS1
from [FACT_ITEM_SALES] a11,
[REL_ITEM_COLOR] a12,
[LU_COLOR] a13,
[LU_ITEM] a14
where a11.[Item_ID] = a12.[Item_ID] and
a12.[Color_ID] = a13.[Color_ID] and
a11.[Item_ID] = a14.[Item_ID]
and (a12.[Color_ID] in (1)
and a11.[Item_ID] in (2))
group by a12.[Color_ID],
a11.[Item_ID]

Notice that the REL_ITEM_COLOR table is in the FROM clause, along with the
LU_COLOR, LU_ITEM, and FACT_ITEM_SALES tables. In the WHERE
clause, the SQL Engine joins the FACT_ITEM_SALES and REL_ITEM_COLOR
tables using only the Item_ID column since Color_ID is not part of the fact
table. However, the SQL Engine has to join the REL_ITEM_COLOR and
LU_COLOR tables based on the Color_ID column. There are also two filtering
conditions in the WHERE clause—one to retrieve only items whose Color_ID
value is 1 (red) and one to retrieve only items whose Item_ID value is 2 (dress).
As a result, the SQL Engine retrieves the sales for all dresses that are sold, even
though there is no way of knowing whether the dresses sold were actually red.

In both of the last two report examples, multiple counting occurs precisely
because the Color_ID column is not present in the FACT_ITEM_SALES table.

Resolving Many-to-Many Relationships

Just as you need a relationship table and a fact table that enables you to join to
both the parent and child attributes to ensure that you do not lose analytical
capability, these same requirements are also necessary to prevent multiple
counting when you aggregate data above the level of the child attribute.
As you can see, working with attributes that have many-to-many relationships
is more complex than other types of direct relationships. You need to design the
logical data model and data warehouse schema to account for the difficulties
they can pose.

You can implement many-to-many relationships using one of the following
methods:

• Creating a separate relationship table

• Creating a compound child attribute

• Creating a hidden common compound child attribute

• Creating a common child attribute

Each of these methods involves a different design for the logical data model and
data warehouse schema. However, each method retains the two fundamental
components that you need to adequately support many-to-many relationships:

• A table that defines the direct attribute relationship

• A fact table structure that enables you to accurately join fact data to both the
parent and child attributes

This same structure has to be in place for any fact table that contains facts
you want to analyze with respect to this particular attribute relationship.

In the following topics, you will learn about each of these methods for resolving
many-to-many relationships using the following sample data:

Sample Data
Creating a Separate Relationship Table
Creating a separate relationship table is the most straightforward way in which
to effectively manage many-to-many relationships. This method keeps the
many-to-many relationship intact and structures the data warehouse schema to
resolve any analysis or aggregation issues. You have already seen this method in
the examples provided earlier.

You retain the many-to-many relationship between the two attributes and
create a separate relationship table that stores all of the possible attribute
element combinations. You also add both the parent and child attribute IDs to
any fact tables that contain facts you want to analyze with respect to this
attribute relationship. The following illustration shows the structure of the
logical data model and schema for the Color and Item scenario if you use this
method:

Logical Data Model and Schema—Separate Relationship Table

You map the Color attribute to the Color_ID column in the LU_COLOR,
REL_ITEM_COLOR, and FACT_ITEM_SALES tables. You map the Item
attribute to the Item_ID column in the LU_ITEM, REL_ITEM_COLOR, and
FACT_ITEM_SALES tables.

You can then configure a many-to-many relationship between the Color and
Item attributes using REL_ITEM_COLOR as the relationship table. In this
relationship, Color is the parent attribute, and Item is its child.

If you want to view a list of all the possible item and color combinations, you
can run a report that just contains the Item and Color attributes.

Report Result Set with All Item and Color Combinations


This report contains only the two attributes with no metrics. When you have
both attributes on a report without a metric, the SQL Engine uses the
relationship table to retrieve the list of all item and color combinations. The
SQL for this report looks like the following:
select a12.[Item_ID] AS Item_ID,
a13.[Item_Desc] AS Item_Desc,
a11.[Color_ID] AS Color_ID,
a11.[Color_Desc] AS Color_Desc
from [LU_COLOR] a11,
[REL_ITEM_COLOR] a12,
[LU_ITEM] a13
where a11.[Color_ID] = a12.[Color_ID] and
a12.[Item_ID] = a13.[Item_ID]

Notice that the FROM clause contains the REL_ITEM_COLOR table along
with the lookup tables for each attribute.
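The attribute-only join path can be sketched in sqlite3. The sample rows are assumptions; the point is that with no metric involved, the relationship table supplies every available combination, including ones that never sold.

```python
import sqlite3

# Assumed sample data: hats come in red and blue, socks only in blue.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LU_COLOR (Color_ID INTEGER PRIMARY KEY, Color_Desc TEXT);
CREATE TABLE LU_ITEM  (Item_ID  INTEGER PRIMARY KEY, Item_Desc  TEXT);
CREATE TABLE REL_ITEM_COLOR (Item_ID INTEGER, Color_ID INTEGER);
INSERT INTO LU_COLOR VALUES (1, 'Red'), (2, 'Blue');
INSERT INTO LU_ITEM  VALUES (1, 'Hat'), (2, 'Socks');
INSERT INTO REL_ITEM_COLOR VALUES (1, 1), (1, 2), (2, 2);
""")

# The relationship table bridges the two lookups, mirroring the report SQL
# that joins LU_COLOR -> REL_ITEM_COLOR -> LU_ITEM.
combos = conn.execute("""
    SELECT i.Item_Desc, c.Color_Desc
    FROM LU_COLOR c, REL_ITEM_COLOR r, LU_ITEM i
    WHERE c.Color_ID = r.Color_ID AND r.Item_ID = i.Item_ID
    ORDER BY i.Item_ID, c.Color_ID
""").fetchall()
print(combos)  # [('Hat', 'Red'), ('Hat', 'Blue'), ('Socks', 'Blue')]
```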

If you want to view the item and color combinations that have sold, you can run
a report that contains the Item and Color attributes along with a Sales metric.

Report Result Set with Sales for Item and Color Combinations
This report contains the two attributes and a metric. When you have both
attributes on a report with a metric, the SQL Engine uses the fact table to
retrieve the list of all item and color combinations that have sold. The SQL for
this report looks like the following:
select a11.[Item_ID] AS Item_ID,
max(a13.[Item_Desc]) AS Item_Desc,
a11.[Color_ID] AS Color_ID,
max(a12.[Color_Desc]) AS Color_Desc,
sum(a11.[Sales_Amount]) AS WJXBFS1
from [FACT_ITEM_SALES] a11,
[LU_COLOR] a12,
[LU_ITEM] a13
where a11.[Color_ID] = a12.[Color_ID]
and a11.[Item_ID] = a13.[Item_ID]
group by a11.[Item_ID],
a11.[Color_ID]

Notice that the FROM clause does not use the relationship table. Instead, it
contains the FACT_ITEM_SALES table along with the lookup tables for each
attribute.
Creating a Compound Child Attribute
Another method for resolving a many-to-many relationship is to eliminate the
many-to-many relationship by converting it into a compound attribute
relationship. This method changes both the logical data model and the data
warehouse schema.

You create a compound key for the lower-level attribute in the relationship,
which eliminates the need for a separate relationship table. This compound key
consists of the IDs of the child and parent attributes. With the compound key in
place, the attributes essentially have a one-to-many relationship, so you can
relate them using the lookup table of the child attribute. You also add the
compound attribute ID, which includes both the parent and child attribute IDs,
to any fact tables that contain facts you want to analyze with respect to this
attribute relationship. The following illustration shows the structure of the
logical data model and schema for the Color and Item scenario if you use this
method:

Logical Data Model and Schema—Compound Child Attribute

Since Item is the lower-level attribute, you create a compound key for this
attribute that consists of the Item_ID and Color_ID columns.

You map the Color attribute to the Color_ID column in the LU_COLOR
table. You map the Item attribute to the combination of the Item_ID and
Color_ID columns in the LU_ITEM and FACT_ITEM_SALES tables.

You can then configure a one-to-many relationship between the Color and Item
attributes using the LU_ITEM table to relate the two attributes. In this
relationship, Color is the parent attribute, and Item is its child.

While this method eliminates the many-to-many relationship and the need for
a separate relationship table, it also has disadvantages. You have to store an
extra column in the LU_ITEM table. Also, joins between the LU_ITEM and
FACT_ITEM_SALES tables are more complex because you have to join on a
compound key rather than a simple key.

Because this method uses a compound attribute, it also presents some
challenges in the reporting environment. With Item defined as a compound
attribute, you lose the ability to view items independent of their colors. Each
item and color combination is treated as a separate record. Therefore, you
cannot merge row headers for items with the same description but different
colors. If you run a report that lists all item and color combinations, the result
set looks like the following:

Report Result Set with All Item and Color Combinations


If you use this method to resolve a many-to-many relationship, you could
change item descriptions to better reflect their underlying ID structure to
avoid confusing users. For example, you could use item descriptions such
as “Red Dress,” “Blue Socks,” and so forth.

The description for each item is repeated for every color in which it is available
since you cannot merge the row headers. The SQL for this report looks like the
following:
select a11.[Item_ID] AS Item_ID,
a11.[Color_ID] AS Color_ID,
a11.[Item_Desc] AS Item_Desc,
a11.[Color_ID] AS Color_ID0,
a12.[Color_Desc] AS Color_Desc
from [LU_ITEM] a11,
[LU_COLOR] a12
where a11.[Color_ID] = a12.[Color_ID]

In the WHERE clause, notice that the SQL Engine relates the two attributes
using the Color_ID columns in the LU_ITEM and LU_COLOR tables. A
separate relationship table is no longer necessary.
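The compound-key design can be sketched in sqlite3. The rows are assumptions; the point is that LU_ITEM, keyed by (Item_ID, Color_ID), now doubles as the relationship table and joins straight to LU_COLOR, at the cost of repeating each item description once per color.

```python
import sqlite3

# Assumed sample data: LU_ITEM carries a compound primary key, so the same
# Item_ID appears once per color it comes in.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LU_COLOR (Color_ID INTEGER PRIMARY KEY, Color_Desc TEXT);
CREATE TABLE LU_ITEM (Item_ID INTEGER, Color_ID INTEGER, Item_Desc TEXT,
                      PRIMARY KEY (Item_ID, Color_ID));
INSERT INTO LU_COLOR VALUES (1, 'Red'), (2, 'Blue');
INSERT INTO LU_ITEM VALUES (1, 1, 'Hat'), (1, 2, 'Hat'), (2, 2, 'Socks');
""")

# No separate relationship table: the Color_ID column inside LU_ITEM
# carries the relationship, so a two-table join lists every combination.
rows = conn.execute("""
    SELECT i.Item_Desc, c.Color_Desc
    FROM LU_ITEM i, LU_COLOR c
    WHERE i.Color_ID = c.Color_ID
    ORDER BY i.Item_ID, i.Color_ID
""").fetchall()
print(rows)  # the item description repeats for every color
```

Notice that 'Hat' appears twice, which is exactly the merged-row-header limitation described next.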

Having Item defined as a compound attribute also prevents you from
aggregating the sales data for an item, regardless of color. If you run a report
that shows the sales for all item and color combinations that have sold and
include subtotals and grand total on the report, the result set looks like the
following:

Report Result Set with Sales for Item and Color Combinations

The grand total on the report correctly displays the total sales for all items.
However, because each item and color combination is treated as a separate
item, the item-level subtotal cannot show the total sales for each item for all
colors that sold. Instead, you have a separate subtotal for every item and color
combination. The SQL for this report looks like the following:

select a11.[Item_ID] AS Item_ID,
a11.[Color_ID] AS Color_ID,
max(a12.[Item_Desc]) AS Item_Desc,
a11.[Color_ID] AS Color_ID0,
max(a13.[Color_Desc]) AS Color_Desc,
sum(a11.[Sales_Amount]) AS WJXBFS1
from [FACT_ITEM_SALES] a11,
[LU_ITEM] a12,
[LU_COLOR] a13
where a11.[Color_ID] = a12.[Color_ID] and
a11.[Item_ID] = a12.[Item_ID] and
a11.[Color_ID] = a13.[Color_ID]
group by a11.[Item_ID],
a11.[Color_ID],
a11.[Color_ID]

In the WHERE clause, notice that the SQL Engine joins the sales data to the
item and color data using the Item_ID and Color_ID columns in the
FACT_ITEM_SALES, LU_ITEM, and LU_COLOR tables. Also, the Color_ID
column is listed twice in the SELECT and GROUP BY clauses—once as part of
the Item attribute and once for the Color attribute.

Creating a Hidden Common Compound Child Attribute

Another alternative for resolving a many-to-many relationship is to remove the
direct relationship between the two attributes and create a common child
attribute that relates them. This child attribute is hidden because it will not be
used for display on a report but only to ensure that the report results are
accurate.

With this method, you create a new child attribute that is a concatenation of the
original attributes. This attribute is a child of each of the original attributes. It
has a one-to-many relationship to both parent attributes. You still need to
include the IDs of both the parent attributes in any fact tables that contain facts
you want to analyze with respect to these attribute relationships. The following
illustration shows the structure of the logical data model and schema for the
Color and Item scenario if you use this method:

LogicalData Modeland Schema—Hidden Common Compound Child


Attribute

For any illustrations in this course that show logical data models, hidden
attributes are always indicated by dotted lines.

The ItemColor attribute represents all the item and color combinations. It is a
compound attribute and it relates to both the Color and Item attributes in the
LU_ITEM_COLOR table. The FACT_ITEM_SALES table is still keyed with the
Item and Color attribute IDs.

The ItemColor attribute is a real attribute that exists in the logical data model.
However, because it does not carry any logical meaning for users in the
reporting environment, you should not include it in the user hierarchy for users
to view or browse. Its primary purpose is to relate the Color and Item attributes
so that you can consolidate the join path to the lookup tables. Because this
attribute is used only in the background, you should make it a hidden attribute.

You can then configure a one-to-many relationship between the Color and
ItemColor and Item and ItemColor attributes, using the LU_ITEM_COLOR
table to relate the ItemColor attribute to both Color and Item. In this
relationship, Color and Item are the parent attributes, and ItemColor is a child
of both attributes. The Color and Item attributes are no longer directly related
to each other.
This method produces the same result sets as if you created a separate
relationship table, but it uses the ItemColor attribute to translate each item and
color combination into a compound value. You can use the ItemColor attribute
to join the lookup tables without using it on the report template.
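A minimal SQLite sketch (with hypothetical rows) shows the LU_ITEM_COLOR table bridging the two lookup tables while the ItemColor attribute itself never appears in the SELECT list:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_ITEM  (Item_ID INTEGER, Item_Desc TEXT);
CREATE TABLE LU_COLOR (Color_ID INTEGER, Color_Desc TEXT);
CREATE TABLE LU_ITEM_COLOR (Item_ID INTEGER, Color_ID INTEGER);
INSERT INTO LU_ITEM  VALUES (1, 'Polo Shirt'), (2, 'Cap');
INSERT INTO LU_COLOR VALUES (10, 'Red'), (20, 'Blue');
-- The cap is made only in red, so (2, 20) is deliberately absent.
INSERT INTO LU_ITEM_COLOR VALUES (1, 10), (1, 20), (2, 10);
""")

# LU_ITEM_COLOR consolidates the join path between the two lookup tables.
combos = con.execute("""
    SELECT DISTINCT a12.Item_ID, a13.Item_Desc, a11.Color_ID, a11.Color_Desc
    FROM LU_COLOR a11, LU_ITEM_COLOR a12, LU_ITEM a13
    WHERE a11.Color_ID = a12.Color_ID
      AND a12.Item_ID  = a13.Item_ID
""").fetchall()
print(sorted(combos))
```

Only the three valid combinations come back; the invalid Blue Cap pairing is never produced because the relationship table does not contain it.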

For example, if you want to view a list of all the possible item and color
combinations, you can run a report that contains just the Item and Color
attributes:

Report Result Set with All Item and Color Combinations
This report correctly displays the various item and color combinations.
Although the ItemColor attribute is not on the template, the SQL Engine uses it
to join the item and color data. The SQL for this report looks like the following:
select distinct a12.[Item_ID] AS Item_ID,
a13.[Item_Desc] AS Item_Desc,
a11.[Color_ID] AS Color_ID,
a11.[Color_Desc] AS Color_Desc
from [LU_COLOR] a11,
[LU_ITEM_COLOR] a12,
[LU_ITEM] a13
where a11.[Color_ID] = a12.[Color_ID] and
a12.[Item_ID] = a13.[Item_ID]

Notice that the FROM clause includes the LU_ITEM_COLOR table. In the
WHERE clause, the SQL Engine uses the Color_ID and Item_ID columns to
join the LU_ITEM_COLOR table to the LU_COLOR and LU_ITEM tables.

If you want to view the item and color combinations that have sold, you can run
a report that contains the Item and Color attributes along with a Sales metric:
Report Result Set with Sales for Item and Color Combinations

The SQL for this report looks like the following:


select a11.[Item_ID] AS Item_ID,
max(a13.[Item_Desc]) AS Item_Desc,
a11.[Color_ID] AS Color_ID,
max(a12.[Color_Desc]) AS Color_Desc,
sum(a11.[Sales_Amount]) AS WJXBFS1
from [FACT_ITEM_SALES] a11,
[LU_COLOR] a12,
[LU_ITEM] a13
where a11.[Color_ID] = a12.[Color_ID] and
a11.[Item_ID] = a13.[Item_ID]
group by a11.[Item_ID],
a11.[Color_ID]

Notice that the FROM clause does not use the LU_ITEM_COLOR table.
Instead, it contains the FACT_ITEM_SALES table along with the lookup tables
for each attribute.

Creating a Common Child Attribute


The fourth option for resolving a many-to-many relationship is to implement a
variation of the previous method. It too eliminates the many-to-many
relationship and the need for a separate relationship table. However, it uses a
simple attribute rather than a compound attribute, and it requires only one
attribute column in fact tables rather than two.

With this method, you create a new attribute that is a concatenation of the
original attributes. This attribute is a child of each of the original attributes. It
has a one-to-many relationship to both parent attributes. You include the ID of
this new attribute in any fact tables that contain facts you want to analyze with
respect to these attribute relationships. The following illustration shows the
structure of the logical data model and schema for the Color and Item scenario
if you use this method:

Logical Data Model and Schema—Common Child Attribute
The SKU attribute represents all the item and color combinations. It has its own
ID column, but it relates to both the Color and Item attributes in the LU_SKU
table. You key the FACT_ITEM_SALES table using the SKU attribute ID rather
than the Item and Color attribute IDs.

You map the SKU attribute to the SKU_ID columns in the LU_SKU and
FACT_ITEM_SALES tables. You map the Item attribute to the Item_ID column
in the LU_SKU and LU_ITEM tables, and you map the Color attribute to the
Color_ID column in the LU_SKU and LU_COLOR tables.

You can then configure a one-to-many relationship between the Color and
SKU and Item and SKU attributes, using the LU_SKU table to relate the SKU
attribute to both Color and Item. In this relationship, Color and Item are the
parent attributes, and SKU is a child of both attributes. The Color and Item
attributes are no longer directly related to each other.

This method produces the same result sets as if you created a separate
relationship table, but it uses the SKU attribute to translate each item and color
combination into a single value. You can use the SKU attribute to join fact table
and lookup table data without using it on the report template.

For example, if you want to view a list of all the possible item and color
combinations, you can run a report that contains just the Item and Color
attributes:
Report Result Set with All Item and Color Combinations

This report correctly displays the various item and color combinations.
Although the SKU attribute is not on the template, the SQL Engine uses it to
join the item and color data. The SQL for this report looks like the following:

select distinct a12.[Item_ID] AS Item_ID,
a13.[Item_Desc] AS Item_Desc,
a11.[Color_ID] AS Color_ID,
a11.[Color_Desc] AS Color_Desc
from [LU_COLOR] a11,
[LU_SKU] a12,
[LU_ITEM] a13
where a11.[Color_ID] = a12.[Color_ID] and
a12.[Item_ID] = a13.[Item_ID]

Notice that the FROM clause includes the LU_SKU table. In the WHERE
clause, the SQL Engine uses the Color_ID and Item_ID columns to join the
LU_SKU table to the LU_COLOR and LU_ITEM tables.

If you want to view the item and color combinations that have sold, you can run
a report that contains the Item and Color attributes along with a Sales metric:
Report Result Set with Sales for Item and Color Combinations

This report correctly displays the sales for each item and color combination as
well as the appropriate subtotals and the grand total. Although the SKU
attribute is not on the template, the SQL Engine uses it to join the sales data to
the item and color data. The SQL for this report looks like the following:
select a12.[Item_ID] AS Item_ID,
max(a14.[Item_Desc]) AS Item_Desc,
a12.[Color_ID] AS Color_ID,
max(a13.[Color_Desc]) AS Color_Desc,
sum(a11.[Sales_Amount]) AS WJXBFS1
from [FACT_ITEM_SALES] a11,
[LU_SKU] a12,
[LU_COLOR] a13,
[LU_ITEM] a14
where a11.[SKU_ID] = a12.[SKU_ID] and
a12.[Color_ID] = a13.[Color_ID] and
a12.[Item_ID] = a14.[Item_ID]
group by a12.[Item_ID],
a12.[Color_ID]
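The single-key variant can be sketched the same way in SQLite. The SKU values below are hypothetical: the fact table carries only SKU_ID, and LU_SKU translates each SKU back into its Item and Color parents:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_SKU   (SKU_ID INTEGER, Item_ID INTEGER, Color_ID INTEGER);
CREATE TABLE LU_ITEM  (Item_ID INTEGER, Item_Desc TEXT);
CREATE TABLE LU_COLOR (Color_ID INTEGER, Color_Desc TEXT);
CREATE TABLE FACT_ITEM_SALES (SKU_ID INTEGER, Sales_Amount REAL);
INSERT INTO LU_SKU   VALUES (100, 1, 10), (101, 1, 20), (102, 2, 10);
INSERT INTO LU_ITEM  VALUES (1, 'Polo Shirt'), (2, 'Cap');
INSERT INTO LU_COLOR VALUES (10, 'Red'), (20, 'Blue');
INSERT INTO FACT_ITEM_SALES VALUES (100, 100.0), (100, 25.0), (101, 50.0);
""")

# The fact table never mentions Item_ID or Color_ID; LU_SKU recovers both.
rows = con.execute("""
    SELECT a12.Item_ID, MAX(a14.Item_Desc), a12.Color_ID,
           MAX(a13.Color_Desc), SUM(a11.Sales_Amount)
    FROM FACT_ITEM_SALES a11, LU_SKU a12, LU_COLOR a13, LU_ITEM a14
    WHERE a11.SKU_ID   = a12.SKU_ID
      AND a12.Color_ID = a13.Color_ID
      AND a12.Item_ID  = a14.Item_ID
    GROUP BY a12.Item_ID, a12.Color_ID
""").fetchall()
print(rows)
```

SKU 102 has no facts, so it simply does not appear in the aggregated output.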

This method provides several advantages. Using a simple key in the fact table is
more efficient, and it reduces the number of columns in the table, which can be
significant in a large fact table.

The only disadvantage of this method lies in the changes you have to make to
the data warehouse schema to support creating the new attribute. You have to
create the lookup table for the new attribute and key fact tables using the ID of
the new attribute, which can add complexity to the ETL process.


Lesson Summary

In this lesson, you learned:

• If the structure of your logical data model and data warehouse schema does
not adequately address the complexities of querying attribute data that
contains many-to-many relationships, you can lose analytical capability and
have problems with multiple counting.

• When you have attributes with a many-to-many relationship, you need a
table that defines the direct attribute relationship and a fact table structure
that enables you to accurately join fact data to both the parent and child
attributes. Having these tables in place ensures you do not lose analytical
capability and prevents multiple counting.

Multiple counting occurs when you aggregate data to the level of the parent
attribute or any attribute level above the parent.

• You can resolve many-to-many relationships by using a separate
relationship table, creating a compound child attribute, creating a hidden
compound child attribute, or creating a common child attribute.
5
ATTRIBUTE ROLES

Lesson Description

This lesson describes the concept of attribute roles.

In this lesson, you will learn about the impact of attribute roles on report
analysis. You will learn how to design the data warehouse model and schema to
best support them in the MicroStrategy reporting environment.
Lesson Objectives

After completing this lesson, you will be able to:

Describe attribute roles and explain the four methods for implementing themin
a MicroStrategy project.

After completing the topics in this lesson, you will be able to:

• Describe role attributes and explain the four methods for
implementing them in a MicroStrategy project.

Attribute Roles

What Is a Role Attribute?
Solution 1: Creating Explicit Table Aliases
Solution 2: Enabling Automatic Attribute Role Recognition
Solution 3: Creating Table Views
Solution 4: Creating Logical Views

After completing this topic, you will be able to:

Describe role attributes and explain the four methods for implementing them in
a MicroStrategy project.

What Is a Role Attribute?

A role attribute refers to any time you have a column in a single lookup table


that is used to define more than one attribute. In such cases, one set of data
plays multiple roles in the reporting environment. For example, a data
warehouse may contain one lookup table for cities. However, on reports, you
may choose to view the city in which a store is located (Store City), the city in
which a customer is located (Customer City), or the city from which an order is
shipped (Ship City). All of these attributes are roles that reference the same
underlying table.

A single attribute in a dimension may be a role attribute, or all of the attributes
in a dimension or hierarchy may be role attributes. If an entire dimension
consists of role attributes, it may be referred to as a role-playing dimension.
For example, you could have Ship Time and Order Time dimensions where all
levels of time function as role attributes and reference the same set of lookup
tables.

Translating attribute roles to SQL, the join type is referred to as a self-join. To
remove ambiguity when writing the SQL statement, table aliases are used so
that both tables involved in the join are treated as different tables.


You can easily map two attributes to the same ID and description columns. The
problem arises when you need to join from fact tables to the lookup table to
retrieve attribute descriptions to display on a report. For example, you have the
following lookup table in your data warehouse:

Lookup Table for City

In your reporting environment, you analyze information by both Store City and
Customer City. These are two separate attributes in the reporting environment.
Therefore, you create two attributes in the project, but you map both attributes
to the City_ID and City_Desc columns in the LU_CITY table. The logical data
model for these two attributes looks like the following:

Logical Data Model for Customer and Store Hierarchies


You also have the following fact table in your data warehouse that stores sales
information by both Store City and Customer City:

Sales Fact Table

The ID forms for the Store City and Customer City attributes each have two
attribute form expressions. Both attributes have an attribute form expression
that maps them to the City_ID column in the LU_CITY table, which functions
as the primary lookup table. In addition, the Store City attribute also maps to
the St_City_ID column in the FACT_CUST_SALES table, and the Customer
City attribute maps to the Cust_City_ID column in the FACT_CUST_SALES
table.

You could run a report with the following template:

Template with Role Attributes


This report generates the following SQL:
select a11.[St_City_ID] AS City_ID,
max(a12.[City_Desc]) AS City_Desc,
a11.[Cust_City_ID] AS City_ID0,
max(a12.[City_Desc]) AS City_Desc0,
sum(a11.[Revenue]) AS WJXBFS1
from [FACT_CUST_SALES] a11,
[LU_CITY] a12
where a11.[Cust_City_ID] = a12.[City_ID] and
a11.[St_City_ID] = a12.[City_ID]
group by a11.[St_City_ID],
a11.[Cust_City_ID]

In the SELECT clause, notice that the SQL Engine attempts to retrieve the
descriptions for Store City and Customer City from the same table. Similarly, in
the WHERE clause, the SQL Engine tries to join from the fact table to the
lookup table at the same time for both attributes. This join retrieves data only
for records where the store city and customer city are the same. If no records
exist where the store and customer cities are identical, the query does not return
any data. For example, the FACT_CUST_SALES table contains the following
records:

Sales Fact Table Data


If you run a report with both the Store City and Customer City attributes on the
template, the result set looks like the following:

Report Result Set with Role Attributes

Because the join only finds the rows where the store city and customer city are
alike, the only record that is returned in the result set is the last row in the
table, in which Herndon is the value for both the Store City and Customer City
attributes. The other records, in which the values for the two attributes are
different, are not included in the result set. If the record for the Herndon store
were removed from the FACT_CUST_SALES table, the report would not return
any data.

To obtain an accurate result set, the SQL Engine must be able to alias the table
so that it can instantiate the table twice in the same query—once to get the
descriptions for the Store City attribute and once to get the descriptions for the
Customer City attribute. For the SQL Engine to be able to alias a table multiple
times, you must configure your environment so that you can use role attributes.
You can enable attribute roles using the following four methods:

• Creating explicit table aliases

• Enabling automatic attribute role recognition

• Creating table views in the data warehouse

• Creating logical views in MicroStrategy
Solution 1:Creating Explicit Table Aliases
One possiblewaytodeal withroleattributesis tomanuallydefine explicit table
aliases foranylookup tablesthatreferencerole attributes.

To create an explicit table alias:

1 Expand the Schema Objects folder.

2 In the Schema Objects folder, select the Tables folder.

3 In the Tables folder, right-click the table you want to alias and select Create
Table Alias.

This action creates a logical table alias for the table in the Tables folder. By
default, the alias is named <Table Name>(1). You can right-click the table
alias and select Rename if you want to modify the name of the table alias.

4 Repeat steps 1 to 3 for each table alias you want to create for the table.

You create one table alias for each role attribute that you want to map to
the lookup table.

The following image shows the option for creating explicit table aliases:

Explicit Table Aliasing


You could create an explicit table alias for the LU_CITY table and define the
Store City and Customer City attributes as follows:

1 Map the Store City attribute to the ID and description columns in the
original LU_CITY table.

2 Ensure that the LU_CITY table alias is not selected as a source table for the
ID or DESC forms of the Store City attribute.

3 Map the Customer City attribute to the ID and description columns in the
table alias created for the LU_CITY table.

4 Select the table alias created for the LU_CITY table as the primary lookup
table for the Customer City attribute.

5 Ensure that the LU_CITY table is not selected as a source table for the ID or
DESC forms of the Customer City attribute.
6 Update the project schema.

Now, if you run the same report, the SQL looks like the following:
select a11.[St_City_ID] AS City_ID,
max(a13.[City_Desc]) AS City_Desc,
a11.[Cust_City_ID] AS City_ID0,
max(a12.[City_Desc]) AS City_Desc0,
sum(a11.[Revenue]) AS WJXBFS1
from [FACT_CUST_SALES] a11,
[LU_CITY] a12,
[LU_CITY] a13
where a11.[Cust_City_ID] = a12.[City_ID] and
a11.[St_City_ID] = a13.[City_ID]
group by a11.[St_City_ID],
a11.[Cust_City_ID]

The SQL Engine uses the LU_CITY table to obtain the descriptions for the Store
City attribute and the LU_CITY table alias to obtain the descriptions for the
Customer City attribute.

When you use explicit table aliasing, the SQL references both the original
lookup table and the table alias by the physical name of the lookup table.
For example, the SQL shown on the previous page references both the
LU_CITY table and the LU_CITY table alias as LU_CITY in the FROM
clause.

Because explicit table aliasing enables you to control exactly how a role attribute
is defined and does not have the limitations of automatic attribute role
recognition, which is described next, MicroStrategy recommends explicit table
aliasing as a more robust method of supporting attribute roles.

Solution 2: Enabling Automatic Attribute Role Recognition

Another method for dealing with role attributes is to enable automatic attribute
role recognition, which you can accomplish using a VLDB property setting.
VLDB properties are settings that enable you to change the behavior of the
MicroStrategy SQL Engine or modify query-related properties. When you
enable the attribute role property, any attributes that are mapped to the same
column of a lookup table are candidates for automatic attribute role
recognition. The SQL Engine automatically aliases the table name for each
attribute in a SQL query that is mapped to it. By default, automatic attribute
role recognition is disabled.

To enable automatic attribute role recognition:

1 In the Database Instances manager, right-click the project database
instance and select VLDB Properties.

2 In the VLDB Properties window, on the Tools menu, select Show


Advanced Settings.

This option provides access to more advanced VLDB properties. It may
already be selected if other advanced VLDB properties are in use for the
database instance.

3 In the VLDB Settings list, expand the Query Optimizations folder.

4 In the Query Optimizations folder, select Engine Attribute Role Options.

5 Clear the Use default inherited value - (Default Settings) check box.

6 Click EnableEngine Attribute Role Feature.

7 Click Save and Close.

You can either update the project schema or disconnect from and
reconnect to the two-tier project source for the change in the setting to take
effect. If you are connected to a three-tier project source, then the
Intelligence Server needs to be restarted in order for the setting to take
effect.

The following image shows the VLDB property for automatic attribute role
recognition:

Attribute Role VLDB Property

Now, if you run the same report, the SQL looks like the following:

select a11.[St_City_ID] AS City_ID,
max(a13.[City_Desc]) AS City_Desc,
a11.[Cust_City_ID] AS City_ID0,
max(a12.[City_Desc]) AS City_Desc0,
sum(a11.[Revenue]) AS WJXBFS1
from [FACT_CUST_SALES] a11,
[LU_CITY] a12,
[LU_CITY] a13
where a11.[Cust_City_ID] = a12.[City_ID] and
a11.[St_City_ID] = a13.[City_ID]
group by a11.[St_City_ID],
a11.[Cust_City_ID]

The SQL Engine automatically aliases the original lookup table twice—once to
obtain the descriptions for the Customer City attribute and once to obtain the
descriptions for the Store City attribute.

Automatic attribute role recognition does not work if the role attributes
exist in the same hierarchy and share a child attribute.

When you enable automatic attribute role recognition, the SQL Engine
basically creates an alias for the lookup table in memory for each table alias
that it must create. Because MicroStrategy Intelligence Server has to
allocate memory for this task, there is a limit on the number of role
attributes you can map in a project if you are using the automatic
recognition option. You can create up to 100 role attributes in any given
project.

Solution 3: Creating Table Views

The third way to resolve the dilemma of role attributes is to create a view of the
lookup table in the data warehouse. For example, you could create a view of the
LU_CITY table as follows:

Lookup Table for City and View of Lookup Table


You could map the Customer City attribute to the ID and description columns in
the original LU_CITY table and the Store City attribute to the columns in the
LU_CITY_ST view.

Now, if you run the same report, the SQL looks like the following:
select a11.[St_City_ID] AS St_City_ID,
max(a13.[St_City_Desc]) AS St_City_Desc,
a11.[Cust_City_ID] AS City_ID,
max(a12.[City_Desc]) AS City_Desc,
sum(a11.[Revenue]) AS WJXBFS1
from [FACT_CUST_SALES] a11,
[LU_CITY] a12,
[LU_CITY_ST] a13
where a11.[Cust_City_ID] = a12.[City_ID] and
a11.[St_City_ID] = a13.[St_City_ID]
group by a11.[St_City_ID],
a11.[Cust_City_ID]

The SQL Engine aliases the original lookup table to obtain the descriptions for
the CustomerCity attribute, and it aliases the view to obtain the descriptions for
the Store City attribute.
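The view-based approach can be sketched in SQLite with hypothetical data; the renamed columns in the LU_CITY_ST view let the Store City role join without any aliasing in the query itself:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_CITY (City_ID INTEGER, City_Desc TEXT);
CREATE TABLE FACT_CUST_SALES (St_City_ID INTEGER, Cust_City_ID INTEGER, Revenue REAL);
INSERT INTO LU_CITY VALUES (1, 'Herndon'), (2, 'Vienna');
INSERT INTO FACT_CUST_SALES VALUES (1, 2, 100.0), (2, 1, 75.0);
-- A renamed view of LU_CITY serves as the lookup for the Store City role.
CREATE VIEW LU_CITY_ST AS
    SELECT City_ID AS St_City_ID, City_Desc AS St_City_Desc FROM LU_CITY;
""")

rows = con.execute("""
    SELECT a11.St_City_ID, MAX(a13.St_City_Desc),
           a11.Cust_City_ID, MAX(a12.City_Desc), SUM(a11.Revenue)
    FROM FACT_CUST_SALES a11, LU_CITY a12, LU_CITY_ST a13
    WHERE a11.Cust_City_ID = a12.City_ID AND a11.St_City_ID = a13.St_City_ID
    GROUP BY a11.St_City_ID, a11.Cust_City_ID
""").fetchall()
print(sorted(rows))
```

Both fact rows come back with the correct description for each role, because each role joins to its own relation.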
Although you can use table views to enable attribute roles, there is some
overhead associated with the creation and maintenance of these views,
especially as the number of roles increases for any single attribute. If you do not
already have views in place, explicit table aliasing is a better solution in terms
of maintenance. You could also take advantage of the SQL Engine attribute role
recognition functionality in MicroStrategy using the VLDB property.

Solution 4: Creating Logical Views

Finally, you could resolve the issue of role attributes by creating logical views in
MicroStrategy. For more information, see the "Creating Logical Views" lesson.

You could map the Customer City attribute to the ID and description columns in
the original LU_CITY table and the Store City attribute to the columns in the
logical view.

The SQL statement and column definitions for the LVW_STORE_CITY look
like the following:

Logical View SQL and Column Definitions


Now, if you run the same report, the SQL looks like the following:
select a11.[St_City_ID] AS St_City_ID,
max(a13.[St_City_Desc]) AS St_City_Desc,
a11.[Cust_City_ID] AS City_ID0,
max(a12.[City_Desc]) AS City_Desc0,
sum(a11.[Revenue]) AS WJXBFS1
from [FACT_CUST_SALES] a11,
[LU_CITY] a12,
(Select City_ID as St_City_ID, City_Desc as
St_City_Desc, State_ID as St_State_ID
From LU_CITY) a13
where a11.[Cust_City_ID] = a12.[City_ID] and
a11.[St_City_ID] = a13.[St_City_ID]
group by a11.[St_City_ID], a11.[Cust_City_ID]

Notice that the FROM clause contains a derived table expression. This
expression is the SQL statement for the LVW_STORE_CITY. The SQL Engine
aliases the original lookup table to obtain the descriptions for the Customer City
attribute, and it uses the view to obtain the descriptions for the Store City
attribute.
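The derived-table pattern can be reproduced in SQLite with hypothetical data; the subquery in the FROM clause plays the part of the logical view:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_CITY (City_ID INTEGER, City_Desc TEXT, State_ID INTEGER);
CREATE TABLE FACT_CUST_SALES (St_City_ID INTEGER, Cust_City_ID INTEGER, Revenue REAL);
INSERT INTO LU_CITY VALUES (1, 'Herndon', 51), (2, 'Vienna', 51);
INSERT INTO FACT_CUST_SALES VALUES (1, 2, 100.0), (2, 1, 75.0);
""")

# The logical view is inlined as a derived table (a13) in the FROM clause.
rows = con.execute("""
    SELECT a11.St_City_ID, MAX(a13.St_City_Desc),
           a11.Cust_City_ID, MAX(a12.City_Desc), SUM(a11.Revenue)
    FROM FACT_CUST_SALES a11,
         LU_CITY a12,
         (SELECT City_ID AS St_City_ID, City_Desc AS St_City_Desc,
                 State_ID AS St_State_ID
          FROM LU_CITY) a13
    WHERE a11.Cust_City_ID = a12.City_ID AND a11.St_City_ID = a13.St_City_ID
    GROUP BY a11.St_City_ID, a11.Cust_City_ID
""").fetchall()
print(sorted(rows))
```

The result is identical to the database-view approach; the only difference is where the view definition lives.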

Given the similarity in characteristics between logical views and database
views, logical views carry the same advantages and limitations as a database
view. There is little or no difference between them in terms of performance.
Finally, because explicit table aliasing enables you to control exactly how a role
attribute is defined, MicroStrategy recommends explicit table aliasing as a more
robust method of supporting attribute roles.

Here is another example of role attributes where you have two attributes,
Origin Airport and Destination Airport, that have the same definition but play
different roles in the reporting environment. In this example, Origin Airport
and Destination Airport are defined using the same lookup table and columns
(Airport_ID, Airport_Code, and Airport_Desc) as shown below:

Lookup Table for Airport
You also have the following fact table in your data warehouse that stores daily
flight details including the number of flights between Origin Airport and
Destination Airport:

Flight Fact Table

Both Origin Airport and Destination Airport share the same lookup
table; however, in the fact table, a separate column exists for each
of their roles (Origin_ID and Destination_ID).

If you run a report with both the attributes on a report to obtain
the number of flights on October 1, 2012, for the various origin and
destination airports, an empty result set is returned.
Template with Role Attributes

This report generates the following SQL:


select a11.[Origin_ID] AS AIRPORT_ID,
max(a12.[AIRPORT_CODE]) AS AIRPORT_CODE,
max(a12.[AIRPORT_DESC]) AS AIRPORT_DESC,
a11.[Destination_ID] AS AIRPORT_ID0,
max(a12.[AIRPORT_CODE]) AS AIRPORT_CODE0,
max(a12.[AIRPORT_DESC]) AS AIRPORT_DESC0,
sum(a11.[NUM_FLIGHTS]) AS WJXBFS1
from [FACT_AIRPORT] a11, [LU_AIRPORT] a12
where a11.[Destination_ID] = a12.[AIRPORT_ID] and
a11.[Origin_ID] = a12.[AIRPORT_ID] and a11.[Date_ID] in
(#2012-10-01 00:00:00#)
group by a11.[Origin_ID], a11.[Destination_ID]
The SQL statement tries to obtain the descriptions of airports
using the one lookup table. However, an origin airport cannot
be the destination airport at the same time (for example, origin
airport = "LAX" and destination airport = "JFK"), so no data is
returned.

To obtain an accurate result set, you can use one of the four
previously described methods. Since the recommended way to
model attribute roles in MicroStrategy is to use explicit table
aliasing, this method is explained below:

1 Create an explicit table alias for the LU_AIRPORT table.

2 Map the Destination Airport attribute to the ID and all descriptive columns
in the original LU_AIRPORT table.

3 Ensure that the LU_AIRPORT table alias is not selected as a source table
for the ID or any of the descriptive forms of the Destination Airport
attribute.

4 Map the Origin Airport attribute to the ID and all descriptive columns in
the table alias created for the LU_AIRPORT table.

5 Select the table alias created for the LU_AIRPORT table as the primary
lookup table for the Origin Airport attribute.

6 Ensure that the LU_AIRPORT table is not selected as a source table for the
ID or any of the descriptive forms of the Origin Airport attribute.

7 Update the project schema.

Now, if you run the same report, the SQL looks like the following:
select a11.[Origin_ID] AS AIRPORT_ID,
max(a13.[AIRPORT_CODE]) AS AIRPORT_CODE,
max(a13.[AIRPORT_DESC]) AS AIRPORT_DESC,
a11.[Destination_ID] AS AIRPORT_ID0,
max(a12.[AIRPORT_CODE]) AS AIRPORT_CODE0,
max(a12.[AIRPORT_DESC]) AS AIRPORT_DESC0,
sum(a11.[NUM_FLIGHTS]) AS WJXBFS1
from [FACT_AIRPORT] a11,
[LU_AIRPORT] a12,
[LU_AIRPORT] a13
where a11.[Destination_ID] = a12.[AIRPORT_ID] and
a11.[Origin_ID] = a13.[AIRPORT_ID] and a11.[Date_ID] in
(#2012-10-01 00:00:00#)
group by a11.[Origin_ID], a11.[Destination_ID]

Notice that the SQL references both the original lookup table and the table alias
by the physical name of the lookup table.
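The airport example follows the same mechanics, sketched here in SQLite with hypothetical rows: the unaliased query returns nothing because no flight has the same origin and destination, while the aliased query resolves each role independently:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_AIRPORT (Airport_ID INTEGER, Airport_Code TEXT, Airport_Desc TEXT);
CREATE TABLE FACT_AIRPORT (Date_ID TEXT, Origin_ID INTEGER,
                           Destination_ID INTEGER, Num_Flights INTEGER);
INSERT INTO LU_AIRPORT VALUES (1, 'LAX', 'Los Angeles'),
                              (2, 'JFK', 'New York (John F. Kennedy)');
INSERT INTO FACT_AIRPORT VALUES ('2012-10-01', 1, 2, 20), ('2012-10-01', 2, 1, 7);
""")

# A single LU_AIRPORT instance implicitly demands Origin_ID = Destination_ID.
unaliased = con.execute("""
    SELECT a11.Origin_ID, a11.Destination_ID, SUM(a11.Num_Flights)
    FROM FACT_AIRPORT a11, LU_AIRPORT a12
    WHERE a11.Destination_ID = a12.Airport_ID
      AND a11.Origin_ID = a12.Airport_ID
      AND a11.Date_ID = '2012-10-01'
    GROUP BY a11.Origin_ID, a11.Destination_ID
""").fetchall()

# With two aliases, each role joins to its own copy of the lookup.
aliased = con.execute("""
    SELECT MAX(a13.Airport_Code), MAX(a12.Airport_Code), SUM(a11.Num_Flights)
    FROM FACT_AIRPORT a11, LU_AIRPORT a12, LU_AIRPORT a13
    WHERE a11.Destination_ID = a12.Airport_ID
      AND a11.Origin_ID = a13.Airport_ID
      AND a11.Date_ID = '2012-10-01'
    GROUP BY a11.Origin_ID, a11.Destination_ID
""").fetchall()
print(unaliased, sorted(aliased))
```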

Report ResultSetwith RoleAttributes with Table Aliasing


[Screenshot: the Daily Number of Flights report in MicroStrategy Desktop. With
the report filter Day = 10/1/2012, the grid returns 12 rows pairing each Origin
Airport with its Destination Airports and the Number of Flights metric.]
Lesson Summary

In this lesson, you learned:

• A role attribute refers to any time you have a column in a single lookup table
that is used to define more than one attribute. One set of data plays multiple
roles inthe reporting environment.

• Mapping two attributes to the same ID and description columns in the data
warehouse causes problems when you need to join from fact tables to the
lookup table to retrieve attribute descriptions to display on a report.

• You can enable attribute roles by creating explicit table aliases, enabling
automatic attribute role recognition, creating table views in the data
warehouse, or creating logical views in MicroStrategy.
6
HIERARCHIES

Lesson Description

This lesson describes advanced data modeling concepts of complex hierarchy


structures, such as ragged, recursive, and split hierarchies.

In this lesson, you will learn about each of these concepts and their impact on
report analysis. You will learn how to design the data warehouse model and
schema to best support them in the MicroStrategy reporting environment.
Lesson Objectives

After completing this lesson, you will be able to:

Describe advanced data modeling concepts and explain how to design the
data warehouse model and schema to support them in a MicroStrategy project.

After completing the topics in this lesson, you will be able to:

• Describe ragged hierarchies and explain two methods for
implementing them in a MicroStrategy project.

• Describe split hierarchies and explain how to implement them
in a MicroStrategy project.

• Describe recursive hierarchies and explain how to implement
them in a MicroStrategy project.
Ragged Hierarchies

What Is a Ragged Hierarchy?
Revising the Data Model
Populating Null Attribute Values

After completing this topic, you will be able to:

Describe ragged hierarchies and explain two methods for implementing them in
a MicroStrategy project.

What Is a Ragged Hierarchy?


A ragged hierarchy is one in which the organizational structure varies such that
the depth of the hierarchy is not uniform. In other words, for every child
attribute element, there does not always exist a corresponding parent attribute
element. Instead, the child attribute element may have a direct relationship only
with a grandparent attribute element.

For example, an advertising company has its sales organization represented as
follows:

Logical Model for Sales Hierarchy


In this model, the company is divided into regions, which are then split into
markets by advertising segments. Each market has dedicated account executives
who are responsible for specific clients. However, that general structure may not
hold true for all clients. For example, you could have some clients that do not
directly correspond to a market. Therefore, account executives for these clients
report directly to the region level. A look at some of the actual data reveals a
ragged structure to the hierarchy:

Ragged Structure of Sales Data

There are four levels of data in the warehouse for the Sales hierarchy. They map
to the logical data model as follows:

• Level 1 maps to the Region attribute.

• Level 2 maps to the Market attribute.

• Level 3 maps to the Account Executive attribute.

• Level 4 maps to the Client attribute.

In the second level of data, notice that there are only three markets—Fashion,
Food, and Transportation. One of the elements that ties directly into the East
region is an account executive, Sara Kaplan. The physical data at the second
level is ragged. The Market attribute is not represented between the Region and
Account Executive levels for all of the data in the warehouse. In the case of Sara
Kaplan, she is responsible for a single client that is not associated with a
particular market, so she falls directly under the East region in the sales
structure rather than reporting through a market. For this particular element,
the Market attribute carries no meaning, making the hierarchy ragged.

Attributes from ragged hierarchies are problematic when you place them in
reports. For example, if you want to create a report that displays all of the
attributes in a ragged hierarchy, the missing values can cause issues since data
does not exist uniformly at every level to populate each cell in the report.
Ragged hierarchies also pose a challenge when drilling. If a report contains an
attribute from a ragged hierarchy and you need to drill to other levels in the
hierarchy, values may not exist for every row of data in the original report.

In a normalized schema, ragged hierarchies can also cause problems when
you aggregate data that is stored at a lower level in the hierarchy to a
higher-level attribute that exists at or above the ragged level in the hierarchy.
In such cases, joining from the fact table to the higher-level lookup table
requires using the lookup table in which the gaps exist. As a result, this join
excludes some of the records from the fact table being aggregated.

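The aggregation gap described above is easy to reproduce. The sketch below uses Python's sqlite3 as a stand-in warehouse; the table names echo the course example, but the IDs and revenue figures are invented for illustration.

```python
import sqlite3

# Miniature, hypothetical version of the ragged Sales schema: account
# executive 101 (think Sara Kaplan) has no market, so Market_ID is NULL.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_MARKET (Market_ID INTEGER, Region_ID INTEGER);
CREATE TABLE LU_ACCT_EXEC (Acct_Exec_ID INTEGER, Market_ID INTEGER);
CREATE TABLE FACT_SALES (Acct_Exec_ID INTEGER, Revenue INTEGER);
INSERT INTO LU_MARKET VALUES (1, 10), (2, 10);
INSERT INTO LU_ACCT_EXEC VALUES (100, 1), (101, NULL);
INSERT INTO FACT_SALES VALUES (100, 500), (101, 200);
""")

# Rolling up to Region through the ragged Market level: the inner join on
# Market_ID silently drops the NULL-market fact row, losing revenue 200.
through_market = con.execute("""
    SELECT SUM(f.Revenue)
    FROM FACT_SALES f
    JOIN LU_ACCT_EXEC ae ON f.Acct_Exec_ID = ae.Acct_Exec_ID
    JOIN LU_MARKET m ON ae.Market_ID = m.Market_ID
""").fetchone()[0]
print(through_market)  # 500, not the true total of 700
```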
Denormalizing the schema can prevent aggregation issues and resolve specific drill paths. However, even in a denormalized schema, drills or other queries that join through the lookup table in which the gaps exist will continue to pose a challenge, as some data will be “left out” of result sets because of the gaps. For more information on denormalization, see the related lesson.

You can resolve issues with ragged hierarchies in one of two ways:

• Model the attribute relationships for an attribute so that skipped levels are eliminated

• Populate the gaps with values from its parent or child attribute or with system-generated values
Revising the Data Model

One method for resolving ragged hierarchies is to revise the data model so that gaps do not exist. For example, you could revise the Sales hierarchy to model the attribute relationships as follows:

Revised Logical Model for Sales Hierarchy

Changing the data model to directly relate the Region and Account Executive
attributes also means modifying the underlying structure of the
LU_ACCT_EXEC table. You need to add the ID column for Region to the
LU_ACCT_EXEC table to map the relationship between the two attributes:

Modified Lookup Table for Account Executive
In the illustration above, both Sara Kaplan and Dave Williams are not assigned to markets.

By adding the Region_ID column to the LU_ACCT_EXEC table and making
Account Executive a child of Region, you establish a means of relating account
executives directly to their corresponding regions. You can now drill directly
from Region to Account Executive without having to join through the lookup table for Market.

Although changing the data model provides a drill path to avoid the gaps in the ragged structure of the hierarchy, it does not resolve issues with displaying data from a ragged hierarchy on a report. If a report contains all levels of the
hierarchy, the SQL joins include the lookup table for Market in which the gaps exist. As a result, account executives who are not assigned to markets still do not display on the report. Therefore, this option is not a complete solution to resolving issues with ragged hierarchies. Revising the data model resolves the problems posed by ragged hierarchies only for very specific drill paths.

Resolving ragged hierarchies using this method can lead to even bigger issues when it comes to rolling up data from the lowest level to higher levels in a hierarchy. In this case, the roll-up logic for the Account Executive attribute differs depending on whether the data is aggregated by joining through the LU_MARKET table or joining directly to the LU_REGION table. Aggregation that occurs through the LU_MARKET table does not include revenue for Sara Kaplan or Dave Williams, while aggregation that occurs through the LU_REGION table does include their revenue.
Populating Null Attribute Values

A better method for resolving ragged hierarchies is to populate the null values with attribute elements of either the child or parent attribute or with system-generated values. Inserting values effectively eliminates gaps in a ragged hierarchy.
For example, the following illustration shows the original data in the
LU_ACCT_EXEC table:

Lookup Table for Account Executive with Null Values


Sara Kaplan and Dave Williams are not assigned to markets, so the market ID
for both of them is null. You could run a report with the following template in
which all three attribute levels are present:

Template with Attributes from Sales Hierarchy

However, if you run this report, Sara Kaplan and Dave Williams are not
included in the result set since their respective market IDs are null:

Report Result with Null Values


To ensure that all account executives are included in the report display, you can
populate the empty values in the LU_ACCT_EXEC table by inserting the values
of the parent (Region) or child (Account Executive) attributes into the
Market_ID column of the LU_ACCT_EXEC table or by generating your own values to replace the nulls.

If you populate the Market_ID column with the parent attribute values, the LU_ACCT_EXEC table looks like the following:

Lookup Table for Account Executive with Parent Values

Now, if you run the same report, the result set looks like the following:

Report Result with Parent Values

Alternately, you could populate the empty cells for the Market_ID column with the values for the Account Executive attribute. Then, the LU_ACCT_EXEC table looks like the following:

Lookup Table for Account Executive with Child Values

Now, if you run the same report, the result set looks like the following:

Report Result with Child Values

Whether you choose to populate empty cells with the parent or child attribute values depends entirely on which action provides the most business value to users as they view reports.
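As a sketch of the parent-value approach, the UPDATE below fills each NULL Market_ID with the executive's Region_ID. It assumes a Region_ID column is available in LU_ACCT_EXEC (as in the modified lookup table shown earlier); all IDs are hypothetical, and in a real warehouse you would also have to guarantee that the borrowed region IDs cannot collide with genuine market IDs.

```python
import sqlite3

# Hypothetical LU_ACCT_EXEC rows; executives 101 and 102 have no market.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE LU_ACCT_EXEC "
            "(Acct_Exec_ID INTEGER, Market_ID INTEGER, Region_ID INTEGER)")
con.executemany("INSERT INTO LU_ACCT_EXEC VALUES (?, ?, ?)",
                [(100, 1, 10), (101, None, 10), (102, None, 20)])

# Populate the gaps with the parent (Region) values so that no Market_ID
# remains NULL and inner joins stop dropping these executives.
con.execute("UPDATE LU_ACCT_EXEC SET Market_ID = Region_ID "
            "WHERE Market_ID IS NULL")

rows = con.execute("SELECT Acct_Exec_ID, Market_ID FROM LU_ACCT_EXEC "
                   "ORDER BY Acct_Exec_ID").fetchall()
print(rows)  # [(100, 1), (101, 10), (102, 20)]
```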

If inserting parent or child attribute values does not make sense in your
business environment, you can also populate the empty cells with system-
generated IDs that map to descriptions that indicate that a value does not exist.
For example, you could generate market IDs for account executives who are not
assigned to a market. In the lookup table for the Market attribute, these IDs
map to description columns that indicate that no market is assigned. Then, the
LU_ACCT_EXEC table looks like the following:

Lookup Table for Account Executive with Generated Values

Now, if you run the same report, the result set looks like the following:

Report Result with Generated Values
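The system-generated variant can be sketched the same way: add a surrogate market element whose description signals the gap, then point the NULL rows at it. The table contents and the -1 surrogate key are illustrative assumptions, not values from the course data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_MARKET (Market_ID INTEGER, Market_Name TEXT);
CREATE TABLE LU_ACCT_EXEC (Acct_Exec_ID INTEGER, Market_ID INTEGER);
INSERT INTO LU_MARKET VALUES (1, 'Fashion'), (2, 'Food'), (3, 'Transportation');
INSERT INTO LU_ACCT_EXEC VALUES (100, 1), (101, NULL);
""")

# Generate a surrogate market whose description indicates "no market",
# then rewrite the NULL foreign keys to reference it.
con.execute("INSERT INTO LU_MARKET VALUES (-1, 'No market assigned')")
con.execute("UPDATE LU_ACCT_EXEC SET Market_ID = -1 WHERE Market_ID IS NULL")

# A report joining through LU_MARKET now keeps every account executive.
rows = con.execute("""
    SELECT ae.Acct_Exec_ID, m.Market_Name
    FROM LU_ACCT_EXEC ae JOIN LU_MARKET m ON ae.Market_ID = m.Market_ID
    ORDER BY ae.Acct_Exec_ID
""").fetchall()
print(rows)  # [(100, 'Fashion'), (101, 'No market assigned')]
```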
Split Hierarchies

What Is a Split Hierarchy?
Creating a Joint Child

After completing this topic, you will be able to:

Describe split hierarchies and explain how to implement them in a


MicroStrategy project.

What Is a Split Hierarchy?


A split hierarchy is one in which there is a split in the primary hierarchy such
that more than one child attribute exists at some level in the hierarchy. Most
hierarchies follow a linear progression from higher-level to lower-level
attributes. While characteristic attributes may branch off the primary hierarchy
at various points, the primary hierarchy itself generally follows a single path to
the lowest-level attribute and any related fact tables. With split hierarchies,
somewhere along the primary hierarchy, a split occurs. For example, a
pharmaceutical company has its Prescriber hierarchy organized as follows:

Logical Data Modelfor Prescriber Hierarchy


The prescriber relates at the lowest level to both the drug that is being
prescribed and the patient to whom the prescription belongs. In this example,
the complexity is compounded by the many-to-many relationships between the
parent and child attributes. A prescriber can prescribe multiple drugs, and multiple prescribers can prescribe the same drug. A prescriber has multiple patients, and a patient can go to multiple prescribers (doctors) for different ailments.

Split hierarchies can be present without many-to-many relationships between the parent and child attributes.

The problem with a split hierarchy is that it provides two paths that you can use to join to fact tables. The lookup tables (and relationship tables in the case of many-to-many relationships) for each parent-child attribute form separate, distinct join paths. You can use either path to join to fact tables for metrics that are contained in a report. Nonetheless, the SQL Engine optimizes the path to fact tables, so it is forced to make a choice.

A split hierarchy may not pose join issues if each child attribute in the split joins to a different set of fact tables and you never use the attributes to join to the same fact table. For split hierarchies in which there are one-to-one or one-to-many relationships between the parent and children, the split results in the SQL Engine consistently choosing one join path over the other, even though that path may not be the most efficient way of joining from the fact tables to the parent attribute in all queries. This same problem also arises when you have split hierarchies in which there are many-to-many relationships. However, the many-to-many relationship further compounds the issue. Because the SQL Engine chooses one join path over the other, the join may not occur through the proper relationship table, which can lead to an inaccurate result set.

For example, the Prescriber hierarchy contains the following tables:

Tables in Prescriber Hierarchy

There are lookup tables for the Prescriber, Drug, and Patient attributes as well
as two separate relationship tables—one to map the relationship between
Prescriber and Drug and one to map the relationship between Prescriber and
Patient. These tables contain the following data:

Table Data
If you run a report to view the drugs that prescribers have prescribed, the result
set looks like the following:

Result Set for Prescriber-Drug Information

The result set correctly displays each prescriber along with the drugs they have
prescribed. The SQL for this report looks like the following:
select a12.[Prescriber_ID] AS Prescriber_ID,
a13.[Prescriber_Name] AS [Prescriber_Name],
a11.[Drug_ID] AS Drug_ID,
a11.[Drug_Name] AS Drug_Name
from [LU_DRUG] a11,
[REL_DRUG_PRESCRIBER] a12,
[LU_PRESCRIBER] a13
where a11.[Drug_ID]=a12.[Drug_ID] and
a12.[Prescriber_ID] = a13.[Prescriber_ID]

The result set is correct because it is obtained by querying the REL_DRUG_PRESCRIBER table, which maps the relationships between prescribers and drugs.

You could also run a report to view the patients for each prescriber. The result set looks like the following:

Result Set for Prescriber-Patient Information

The result set correctly displays each prescriber along with the patients for whom they have prescribed drugs.

The SQL for this report looks like the following:


select a12.[Prescriber_ID] AS Prescriber_ID,
a13.[Prescriber_Name] AS [Prescriber_Name],
a11.[Patient_ID] AS Patient_ID,
a11.[Patient_Name] AS Patient_Name
from [LU_PATIENT] a11,
[REL_PATIENT_PRESCRIBER] a12,
[LU_PRESCRIBER] a13
where a11.[Patient_ID] = a12.[Patient_ID] and
a12.[Prescriber_ID] = a13.[Prescriber_ID]

The result set is correct because it is obtained by querying the


REL_PATIENT_PRESCRIBER table, which maps the relationships between
prescribers and patients.

You could also run a report that shows prescriber and patient information along with the amount of prescriptions for each patient. The result set looks like the following:

Result Set for Prescriber-Patient Information with a Metric

With the Prescription Amount metric as part of the report, this result set does not correctly display the prescriber and patient relationships. Instead of relating patients to prescribers who have prescribed drugs for them, the result set relates patients to any prescriber that prescribes drugs they have taken, regardless of whether they actually obtained their prescription from that particular prescriber. The SQL for this report looks like the following:
select a12.[Prescriber_ID] AS Prescriber_ID,
max(a14.[Prescriber_Name]) AS Prescriber_Name,
a11.[Patient_ID] AS Patient_ID,
max(a13.[Patient_Name]) AS Patient_Name,
sum(a11.[Presc_Amt]) AS WJXBFS1
from [FACT_PRESCRIPTIONS] a11,
[REL_DRUG_PRESCRIBER] a12,
[LU_PATIENT] a13,
[LU_PRESCRIBER] a14
where a11.[Drug_ID] = a12.[Drug_ID] and
a11.[Patient_ID] = a13.[Patient_ID] and
a12.[Prescriber_ID] = a14.[Prescriber_ID]
group by a12.[Prescriber_ID],
a11.[Patient_ID]

In this case, the result set is incorrect because the SQL Engine chooses to join
by a12.[Prescriber_ID], because theSQLEnginechoosestojoin

from the FACT_PRESCRIPTIONS table to the LU_PRESCRIBER table through


the Drug attribute, rather than the Patient attribute. Therefore, it uses the relationship table between Drug and Prescriber to obtain the result set. As a result, the query finds the drugs that each prescriber prescribed and then just joins to each patient who took those drugs, regardless of whether or not they have a relationship with a particular prescriber. To determine relationships between patients and prescribers (the information that you really want in this report), the query must access the REL_PATIENT_PRESCRIBER table. Because the relationship table is not included in the query, the SQL Engine chooses the join path provided by the Drug attribute.

A preliminary step to generating the SQL is that the SQL Engine has to determine the most efficient join path from the fact tables to lookup tables. To do so, the SQL Engine analyzes the following:

• Logical table size

• Order of the attributes in the system hierarchy

MicroStrategy Architect automatically calculates the logical table size and


assigns a numeric value to each table relative to the attributes that are contained
logical table and their position in their respective hierarchies. Usually, a smaller logical table size equates to a smaller physical table size. In cases like this one, where the SQL Engine finds two join paths to the LU_PRESCRIBER table, it checks the logical size of both the LU_DRUG and LU_PATIENT tables. In this example, the Drug and Patient attributes have the same weight. Therefore, the tables have the same logical size. The SQL Engine cannot differentiate between the two paths based on logical table size.

At this point, if the logical table size is equal, the SQL Engine cannot distinguish
which path is the most efficient. Therefore, it simply picks the lookup table
based on the order of the attributes (Drug and Patient) in the system hierarchy.
Because Drug is first in the system hierarchy (it was created before Patient), the SQL Engine chooses to join through the LU_DRUG table. Essentially, in a split hierarchy such as this one, the SQL Engine has no way of differentiating between the join paths when both choices seem to be equally efficient. In actuality, this creates a situation in which, depending on which attributes you have on the report, sometimes you need to join through the LU_DRUG table and sometimes through the LU_PATIENT table.

The best way to ensure that the SQL Engine always selects the most efficient join path and uses tables that provide the desired result set is to remove the split from the hierarchy. You can resolve split hierarchies by creating joint child relationships.

Creating a Joint Child

The joint child relates each of the original parent and child attributes that are involved in the split. It also provides a single path for joins. In this example, you need to relate Prescriber, Drug, and Patient. To set up the joint child, you need to do the following:

1 Create a relationship table that includes the parent attribute and both child
attributes.

2 Create a joint child relationship between the parent and the two child
attributes using this relationship table.

Creating the Relationship Table

First, you need to create a relationship table that maps the relationship between
the parent attribute and both child attributes. The relationship table looks like
the following:

Relationship Table with All Three Attributes

The REL_PRESCRIBER_DRUG_PATIENT table provides a means of relating


all three attributes to one another. In this way, you can view prescribers in relationship to both the drugs they prescribe and the patients to whom they prescribe them at the same time. The following image shows the modified relationship between the three attributes:

Modified Tables in the Prescriber Hierarchy


Creating the Joint Child

After creating the relationship table, you need to perform the following steps:

1 Add the REL_PRESCRIBER_DRUG_PATIENT table to the project.

2 Map the ID forms of the Prescriber, Drug and Patient attributes to the
respective ID columns in the REL_PRESCRIBER_DRUG_PATIENT table.

3 Unmap the ID forms of the Prescriber, Drug, and Patient attributes from the REL_PATIENT_PRESCRIBER and REL_DRUG_PRESCRIBER tables.

4 Remove the REL_PATIENT_PRESCRIBER and REL_DRUG_PRESCRIBER tables from the project.

5 Add the Drug and Patient attributes as children of the Prescriber attribute.

6 Make sure to select the joint child check box.

The following image shows the option for setting the joint child relationship:

Creating Joint Child Relationship


7 Update the project schema.

In the illustration above, when you make the Patient and Drug attributes joint
children of Prescriber, you create a structure where the two separate children
attributes involved in the original split are now jointly related to each other
through the parent attribute. Essentially, you modify the logical data model to
look like the following:

Modified Logical Data Model for Prescriber Hierarchy


Now, if you run the same report to obtain prescriber and patient information
along with the prescription amounts for each patient, the result set looks like
the following:

Result Set Obtained by Joint Child Relationship


Because of the joint child, the report now contains the correct result set since
the information from the FACT_PRESCRIPTIONS table is joined to the
prescriber information through the REL_PRESCRIBER_DRUG_PATIENT
table. The SQL looks like the following:
select a12.[Prescriber_ID] AS Prescriber_ID,
max(a14.[Prescriber_Name]) AS Prescriber_Name,
a12.[Patient_ID] AS Patient_ID,
max(a13.[Patient_Name]) AS Patient_Name,
sum(a11.[Presc_Amt]) AS WJXBFS1
from [FACT_PRESCRIPTIONS] a11,
[REL_PRESCRIBER_DRUG_PATIENT] a12,
[LU_PATIENT] a13,
[LU_PRESCRIBER] a14
where a11.[Drug_ID] = a12.[Drug_ID] and
a11.[Patient_ID] = a12.[Patient_ID] and
a11.[Patient_ID] = a13.[Patient_ID] and
a12.[Prescriber_ID] = a14.[Prescriber_ID]
group by a12.[Prescriber_ID],
a12.[Patient_ID]

Notice that the relationship table appears in the FROM clause. The WHERE clause uses the same relationship table to join to the fact table. As a result, the SQL Engine can now efficiently join to the fact table for either the Drug or Patient attribute and deliver a valid result set that correctly portrays the relationships between prescribers and drugs or prescribers and patients.
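The difference between the two join paths can be reproduced with a minimal sqlite3 sketch (all IDs and amounts are hypothetical): joining the fact table through the drug relationship alone invents prescriber-patient pairs, while the joint-child relationship table constrains drug and patient simultaneously.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE REL_DRUG_PRESCRIBER (Drug_ID INTEGER, Prescriber_ID INTEGER);
CREATE TABLE REL_PRESCRIBER_DRUG_PATIENT
    (Prescriber_ID INTEGER, Drug_ID INTEGER, Patient_ID INTEGER);
CREATE TABLE FACT_PRESCRIPTIONS
    (Drug_ID INTEGER, Patient_ID INTEGER, Presc_Amt INTEGER);
-- Prescribers 1 and 2 both prescribe drug 7, but patient 50 obtained
-- the drug only from prescriber 1.
INSERT INTO REL_DRUG_PRESCRIBER VALUES (7, 1), (7, 2);
INSERT INTO REL_PRESCRIBER_DRUG_PATIENT VALUES (1, 7, 50);
INSERT INTO FACT_PRESCRIPTIONS VALUES (7, 50, 3);
""")

# Wrong path: joining on Drug_ID alone relates patient 50 to BOTH
# prescribers of drug 7.
wrong = sorted(con.execute("""
    SELECT r.Prescriber_ID, f.Patient_ID, SUM(f.Presc_Amt)
    FROM FACT_PRESCRIPTIONS f
    JOIN REL_DRUG_PRESCRIBER r ON f.Drug_ID = r.Drug_ID
    GROUP BY r.Prescriber_ID, f.Patient_ID
""").fetchall())

# Joint child: constraining Drug AND Patient keeps only the true pair.
right = con.execute("""
    SELECT r.Prescriber_ID, f.Patient_ID, SUM(f.Presc_Amt)
    FROM FACT_PRESCRIPTIONS f
    JOIN REL_PRESCRIBER_DRUG_PATIENT r
      ON f.Drug_ID = r.Drug_ID AND f.Patient_ID = r.Patient_ID
    GROUP BY r.Prescriber_ID, f.Patient_ID
""").fetchall()

print(wrong)  # [(1, 50, 3), (2, 50, 3)]
print(right)  # [(1, 50, 3)]
```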
Recursive Hierarchies

What Is a Recursive Hierarchy?
Flattening a Recursive Hierarchy
Handling Complexities in Recursive Hierarchies

After completing this topic, you will be able to:

Describe recursive hierarchies and explain how to implement them in a


MicroStrategy project.

What Is a Recursive Hierarchy?


A recursive hierarchy is one in which elements of an attribute have a parent-
child relationship with other elements of the same attribute. All of the attributes
in a hierarchy may be recursive, or a hierarchy may have only a single attribute
that is recursive. For example, a company’s organizational structure looks like
the following:

Logical Data Model for Geography Hierarchy


At first, this hierarchy seems very simple and straightforward. However, within
the Employee attribute, there are two levels of management along with the
lowest-level employees who do not manage anyone. An organization chart for
the company looks like the following:

Employee Organizational Structure


In the database, the employee data is stored in a single table in a recursive
fashion. This table looks like the following:

Lookup Table for Employee


The LU_EMPLOYEE table stores not only the ID and name of each employee, but it also has a Manager_ID column, which references the employee ID of each employee’s manager. If you just want to view a list of all the employees, you can create an Employee attribute and map its ID and DESC forms to the Employee_ID and Employee_Name columns in the LU_EMPLOYEE table.

Generally, when you have a recursive attribute like Employee, you want to be
able to run reports that show managers and their corresponding employees. In
the database, the data for all three levels of employees comes from the same columns in the same table. However, on a report, they are logically different attributes. For example, you could run a report that looks like the following:

Report with Manager and Employee Attributes
The Level 1 Manager and Level 2 Manager attributes represent the two levels of management, and the Employee attribute represents the lowest-level employees who do not manage anyone. Because all three attributes map to the same columns in the same lookup table, you need to be able to alias the table three times in the SQL to retrieve the employee name for each of the three attributes on the template. By default, the SQL Engine aliases a table only once.

To resolve this issue with recursive hierarchies, you need to flatten the recursive attribute, creating separate lookup tables or views for each level of recursion.

You cannot use explicit table aliasing to support recursive hierarchies.


Although you could create two logical table aliases for the LU_EMPLOYEE table and map each of the manager levels to one of the table aliases, this solution does not work because all of the employee records are contained in the lookup table that is aliased. When you run a report with any one of the three attributes (Level 1 Manager, Level 2 Manager, or Employee), it displays every employee in the table for each attribute.

Sometimes, tables with a recursive structure also contain a level column, which indicates the level of an element in the recursive hierarchy. For example, an Employee would be “3,” a Level 2 Manager would be “2,” and a Level 1 Manager would be “1.” Using a level column like this inside a table does not resolve SQL issues with recursive hierarchies in MicroStrategy because the SQL Engine does not look at the specific data elements contained in the rows of a table.

Flattening a Recursive Hierarchy

To flatten the recursive LU_EMPLOYEE table, you need to create three separate lookup tables or views. The three tables or views look like the following:

Flattened Lookup Tables

After flattening the LU_EMPLOYEE table, you map the ID and DESC forms of
the three attributes as follows:

• Level 1 Manager—Maps to the Lev1_Mgr_ID and Lev1_Mgr_Name


columns in the LU_LEVEL1MANAGER table and the Lev1_Mgr_ID column
in the LU_LEVEL2MANAGER table

• Level 2 Manager—Maps to the Lev2_Mgr_ID and Lev2_Mgr_Name


columns in the LU_LEVEL2MANAGER table and the Lev2_Mgr_ID
column in the LU_EMPLOYEE table

• Employee—Maps to the Employee_ID and Employee_Name columns in the


LU_EMPLOYEE table

You can then change the data model to reflect the relationships between the three attributes as follows:

Revised Geography Logical Data Model


Now, if you run the report, the result set looks like the following:

Report Result with Recursive Relationships


Since the Level 1 Manager, Level 2 Manager, and Employee attributes map to
different tables or views, they each are aliased in the SQL, which enables the
query to display the employees with respect to the managerial relationships that
exist. The SQL for this report looks like the following:
select a12.[Level1_Mgr_ID] AS Level1_Mgr_ID,
a13.[Level1_Mgr_Name] AS Level1_Mgr_Name,
a11.[Level2_Mgr_ID] AS Level2_Mgr_ID,
a12.[Level2_Mgr_Name] AS Level2_Mgr_Name,
a11.[Employee_ID] AS Employee_ID,
a11.[Employee_Name] AS Employee_Name
from [LU_EMPLOYEE] a11,
[LU_LEVEL2MANAGER] a12,
[LU_LEVEL1MANAGER] a13
where a11.[Level2_Mgr_ID] =
a12.[Level2_Mgr_ID] and
a12.[Level1_Mgr_ID] = a13.[Level1_Mgr_ID]

In the FROM clause, the SQL Engine uses the LU_LEVEL1MANAGER,


LU_LEVEL2MANAGER, and LU_EMPLOYEE tables that comprise the
flattened schema to retrieve the data for the result set.
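One lightweight way to produce the flattened lookups is with views over the recursive table, so no data is copied. The sketch below assumes a three-level org like the one in the text (names reused for readability, IDs invented) and identifies level 1 managers as the rows with a NULL Manager_ID.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_EMPLOYEE
    (Employee_ID INTEGER, Employee_Name TEXT, Manager_ID INTEGER);
INSERT INTO LU_EMPLOYEE VALUES
    (1, 'Matt Wilson', NULL),   -- level 1 manager
    (2, 'Joseph Duke', 1),      -- level 2 manager
    (3, 'Paul Smith', 2);       -- lowest-level employee

-- One view per level of recursion replaces the flattened lookup tables.
CREATE VIEW LU_LEVEL1MANAGER AS
    SELECT Employee_ID AS Lev1_Mgr_ID, Employee_Name AS Lev1_Mgr_Name
    FROM LU_EMPLOYEE WHERE Manager_ID IS NULL;
CREATE VIEW LU_LEVEL2MANAGER AS
    SELECT e.Employee_ID AS Lev2_Mgr_ID, e.Employee_Name AS Lev2_Mgr_Name,
           e.Manager_ID AS Lev1_Mgr_ID
    FROM LU_EMPLOYEE e
    JOIN LU_LEVEL1MANAGER l1 ON e.Manager_ID = l1.Lev1_Mgr_ID;
""")

# Each level now has its own "table", so the same base table is
# effectively referenced three times in one query.
rows = con.execute("""
    SELECT l1.Lev1_Mgr_Name, l2.Lev2_Mgr_Name, e.Employee_Name
    FROM LU_EMPLOYEE e
    JOIN LU_LEVEL2MANAGER l2 ON e.Manager_ID = l2.Lev2_Mgr_ID
    JOIN LU_LEVEL1MANAGER l1 ON l2.Lev1_Mgr_ID = l1.Lev1_Mgr_ID
""").fetchall()
print(rows)  # [('Matt Wilson', 'Joseph Duke', 'Paul Smith')]
```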

In the above example, if you also have fact tables in your data warehouse that
store data at the Employee level, then you could have resolved this issue using a
completely denormalized lookup table as shown below:

Completely Denormalized Employee Lookup Table


Handling Complexities in Recursive Hierarchies
In the previous example, flattening the recursive table is a good solution since
the number of levels is relatively small and fixed. Furthermore, there was an
added assumption that the fact tables stored data at the employee level only.
This then allows you to drill up and down the hierarchy.

However, there could be situations where the fact tables contain data from higher levels. For instance, let us assume the above example represents a service organization. Both the level 1 and 2 managers, along with the employees who report to them, perform billable work. The billable hours are recorded in one fact table. So if Joseph Duke performed 20 hours of billable work for a customer, in order to create an accurate monthly billing report, he would have to be a level 2 manager and an employee reporting to himself. Similarly, if Matt Wilson also performed billable work, then he would have to be listed as a manager and an employee reporting to Matt Wilson, and in both cases he would be an employee in the employee table.

Additional challenges posed by recursive hierarchies are that they could be ragged. Also, some recursive hierarchies could have no predefined limit to the number of levels.

Alternative Solution - Using a Relationship Table

There are different solutions possible based on whether there is a need to see three separate attributes. In the example discussed so far, it made sense to model three separate attributes so that you could look at business facts from an organizational hierarchy perspective. If such a requirement is not needed, then the entire hierarchy could be modeled through employee and manager attributes. Basically, you create a separate relationship table that links any employee to her direct and indirect managers. In addition, in the relationship table, there is another attribute that represents the distance that any employee is from the top of the hierarchy.

These relationship tables are also referred to as bridge, helper, and explosion tables.

Essentially, the relationship table captures information in such a way that it effectively represents the parent-child relation in the hierarchy. Here is a sample organization structure showing the employees and their relationship to each other:

Employee Organization Structure
In our example, for Paul Smith the relationship table would contain multiple
entries, capturing all of Paul’s managers.

Cheryl Green is Paul’s direct manager (distance 1)

Joseph Duke is Paul’s indirect manager (distance 2)

Matt Wilson is Paul’s indirect manager (distance 3)

So for Paul, there would be three records in the relationship table, each indicating the overall relationship in the hierarchy. It may also be useful to add another row indicating a relationship from Paul to himself with a distance of zero. This will prove useful when the fact tables include data for both managers and employees. The following image shows the relationship table structure and the number of records for Paul:

Hierarchy Relationship Table


The relationship table only contains the IDs necessary to represent
relationships between the attributes.

Caution: if you are working with large and deep hierarchies, the relationship table may become really big. Note that it essentially captures all paths from any employee to the top-level manager.
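The following is a sketch of how such a REL_HIERARCHY table could be generated from the recursive LU_EMPLOYEE table. The employee IDs are hypothetical but chosen to match the chain described above (Paul → Cheryl → Joseph → Matt), and the distance-0 self row is included as recommended.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LU_EMPLOYEE
    (Employee_ID INTEGER, Employee_Name TEXT, Manager_ID INTEGER);
INSERT INTO LU_EMPLOYEE VALUES
    (1, 'Matt Wilson', NULL), (2, 'Joseph Duke', 1),
    (3, 'Cheryl Green', 2), (12, 'Paul Smith', 3);
CREATE TABLE REL_HIERARCHY
    (Employee_ID INTEGER, Manager_ID INTEGER, Distance INTEGER);
""")

# Walk up the Manager_ID chain for every employee, emitting one row per
# ancestor plus the distance-0 self row.
managers = dict(con.execute("SELECT Employee_ID, Manager_ID FROM LU_EMPLOYEE"))
rows = []
for emp in managers:
    node, dist = emp, 0
    while node is not None:
        rows.append((emp, node, dist))
        node, dist = managers.get(node), dist + 1
con.executemany("INSERT INTO REL_HIERARCHY VALUES (?, ?, ?)", rows)

# Paul (ID 12) gets a self row plus one row per direct/indirect manager.
paul = con.execute("""
    SELECT Manager_ID, Distance FROM REL_HIERARCHY
    WHERE Employee_ID = 12 ORDER BY Distance
""").fetchall()
print(paul)  # [(12, 0), (3, 1), (2, 2), (1, 3)]
```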

After creating the REL_HIERARCHY table above, you need to perform the following steps:

1 Add the REL_HIERARCHY table to the project.

2 Create the Employee attribute by mapping it to the Employee_ID column in the LU_EMPLOYEE and REL_HIERARCHY tables.

3 Create the description form for the Employee attribute using the Employee_Name column in the LU_EMPLOYEE table.

4 Use explicit table aliasing to create the LU_EMPLOYEE_ALIAS table.

You could also use the other options discussed in the Attribute Roles
lesson.

5 Create the Manager attribute by mapping it to the Manager_ID column in the REL_HIERARCHY table.

6 Use heterogeneous mapping to map the Manager attribute to the Employee_ID column in the LU_EMPLOYEE_ALIAS table.
inthe
7 Create the description form for the Manager attribute using the
Employee_Name column in the LU_EMPLOYEE_ALIAS table.

8 Create the Distance attribute by mapping it to the Distance_ID column in the REL_HIERARCHY table.

9 Make Manager the parent and Employee the child of the Distance attribute
using the REL_HIERARCHY table.

10 Update the project schema.

The following image shows the revised data model for the Geography hierarchy:

Revised Geography Logical Data Model

Now, if you wanted to see all the employees who report to Joseph Duke, you can
create the following report:

Report for all Employees of a Specific Manager


When you run the report the following results are displayed:

Report Result of all Employees Reporting to a Specific Manager

It is easy to exclude Joseph Duke from the result set by creating a report
filter on the Distance attribute.

The SQL for the above report looks like the following:
select a12.[Manager_ID] AS Manager_ID,
a13.[Employee_Name] AS Employee_Name,
a11.[Employee_ID] AS Employee_ID,
a11.[Employee_Name] AS Employee_Name0
from [LU_EMPLOYEE] a11,
[REL_HIERARCHY] a12,
[LU_EMPLOYEE] a13
where a11.[Employee_ID] = a12.[Employee_ID] and
a12.[Manager_ID] = a13.[Employee_ID]
and (a12.[Distance] > 0
and a12.[Manager_ID] in (2))

Notice that the relationship table is used to retrieve all employees and managers who have a relationship with a distance greater than zero.

Sometimes you may not want to query a specific branch of the hierarchy, but rather for any given employee you want to find the chain of managers. The relationship table makes this possible, and if you include the Distance attribute on the report template, you can then sort the result set to see an employee’s entire reporting structure. For example, the following image shows the report template to determine all the managers for Paul Smith:

Report for all Managers for a Specific Employee

When you run the report, the following results are displayed:

Report Result Showing Reporting Structure for a Specific Employee

By sorting on the Distance attribute, it is easy to see who Paul’s immediate manager is and all of his indirect managers up to the highest level.

The SQL for the above report looks like the following:
select distinct a12.[Manager_ID] AS Manager_ID,
a13.[Employee_Name] AS Employee_Name,
a12.[Distance] AS Distance
from [REL_HIERARCHY] a12,
[LU_EMPLOYEE] a13
where a12.[Manager_ID] = a13.[Employee_ID]
and (a12.[Employee_ID] in (12)
and a12.[Distance] > 0)

Notice that the relationship table is used to retrieve all the managers and their
distances for Paul Smith.

Finally, what if you wanted to include in your report business fact data, such as
the hours billed by each employee. The following image shows the fact table:

Fact Table Structure


1 Add the FACT_EMPLOYEE_BILLING_HOURS table to the project.

2 Create the Billed Hours fact by mapping it to the Billed_Hours column.

3 Edit the Employee attribute and make sure that the Employee_ID attribute form is mapped to the FACT_EMPLOYEE_BILLING_HOURS table.

4 Update the project schema.

5 Create the Billed Hours metric.

Then, you could run a report that shows the total number of hours billed by all

employees who are part of Joseph Duke’s reporting chain. The result set looks
like the following:

Result Set for Employee Information with a Metric


Notice that the report correctly displays all the employees who are under Joseph Duke’s chain of command. The report also shows the hours billed by Joseph himself. To exclude him from the report result, an additional report filter using the Distance attribute (Distance > 0) can be applied to this report. The SQL for this report looks like the following:
select a11.[Employee_ID] AS Employee_ID,
max(a13.[Employee_Name]) AS Employee_Name,
sum(a11.[Billed_Hours]) AS WJXBFS1
from [FACT_EMPLOYEE_BILLING_HOURS] a11,
[REL_HIERARCHY] a12,
[LU_EMPLOYEE] a13
where a11.[Employee_ID] = a12.[Employee_ID] and
a11.[Employee_ID] = a13.[Employee_ID]
and a12.[Manager_ID] in (2)
group by a11.[Employee_ID]

Notice how the relationship table is used to join to the fact table to calculate the employees’ billed hours. The relationship table is also used to filter the report for only one manager, in this case Joseph Duke.
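The final report's join pattern can be checked with a few hypothetical rows: the relationship table both scopes the fact rows to one manager's chain and, via the Distance column, lets you keep or drop the manager's own hours.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE REL_HIERARCHY
    (Employee_ID INTEGER, Manager_ID INTEGER, Distance INTEGER);
CREATE TABLE FACT_EMPLOYEE_BILLING_HOURS
    (Employee_ID INTEGER, Billed_Hours INTEGER);
-- Hypothetical chain under Joseph Duke (ID 2), including his self row.
INSERT INTO REL_HIERARCHY VALUES (2, 2, 0), (3, 2, 1), (12, 2, 2);
INSERT INTO FACT_EMPLOYEE_BILLING_HOURS VALUES (2, 20), (3, 15), (12, 40);
""")

# Filter the fact table to one manager's whole reporting chain via the
# relationship table; this total includes Joseph's own 20 hours.
total = con.execute("""
    SELECT SUM(f.Billed_Hours)
    FROM FACT_EMPLOYEE_BILLING_HOURS f
    JOIN REL_HIERARCHY r ON f.Employee_ID = r.Employee_ID
    WHERE r.Manager_ID = 2
""").fetchone()[0]
print(total)  # 75

# Adding Distance > 0 drops the distance-0 self row, excluding the manager.
without_self = con.execute("""
    SELECT SUM(f.Billed_Hours)
    FROM FACT_EMPLOYEE_BILLING_HOURS f
    JOIN REL_HIERARCHY r ON f.Employee_ID = r.Employee_ID
    WHERE r.Manager_ID = 2 AND r.Distance > 0
""").fetchone()[0]
print(without_self)  # 55
```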
Lesson Summary

In this lesson, you learned:

• In a ragged hierarchy, every child attribute element does not always have a
corresponding parent attribute element. Instead, the child attribute element
may have a direct relationship only with a grandparent attribute element.

• You can resolve ragged hierarchies by either revising the attribute relationships in the data model so that skipped levels are eliminated or populating the gaps for an attribute with parent, child, or system-generated values.

• A split hierarchy is one in which there is a split in the primary hierarchy
such that more than one child attribute exists at some level in the hierarchy.
The problem with a split hierarchy is that it provides two paths that you can
use to join to fact tables.

• You can resolve split hierarchies by creating a joint child relationship that
relates each of the original parent and child attributes involved in the split
and provides a single join path to fact tables.

• A recursive hierarchy is one in which elements of an attribute have a parent-
child relationship with other elements of the same attribute. All attributes
in a hierarchy may be recursive, or a hierarchy may have only a
single attribute that is recursive.

• You can resolve issues with reporting on recursive lookup hierarchies by
flattening the recursive attributes by creating separate tables or views for
each level of recursion.

• Another solution for resolving recursive hierarchies is to create a
relationship table that maintains a link for every possible combination of
parent-child relationship and a distance attribute which indicates the depth
of a specific entity from the topmost level.


7
SLOWLY CHANGING DIMENSIONS

Lesson Description

This lesson describes the concept of slowly changing dimensions.

In this lesson, you will learn about the impact of slowly changing dimensions on
report analysis. You will learn how to design the data warehouse model and
schema to best support them in the MicroStrategy reporting environment.
Lesson Objectives

After completing this lesson, you will be able to:

Describe slowly changing dimensions and the three methods for
implementing them in a MicroStrategy project.

After completing the topics in this lesson, you will be able to:

• Describe slowly changing dimensions and explain the three
methods for implementing them in a MicroStrategy project.
Slowly Changing Dimensions
Slowly Changing Dimensions (SCDs)

As Is vs. As Is (Type I SCDs)

As Is vs. As Was (Type II SCDs)

Like vs. Like

As Was vs. As Was

Summary of Four Types of SCDs

Creating a Life Stamp

Using a Hidden Attribute for SCDs

Denormalizing Fact Tables

After completing this topic, you will be able to:

Describe slowly changing dimensions and explain the three methods for
implementing them in a MicroStrategy project.

Slowly Changing Dimensions (SCDs)


Slowly Changing Dimensions (SCDs) refers to the process of tracking and
analyzing attribute relationships that change over time. For example, a retail
company has numerous stores that are assigned to various managers. The sales
organization hierarchy in their data model is structured as follows:

Logical Data Model for Sales Organization Hierarchy


Although the relationships between districts, regions and managers do not
change much over time, the managers assigned to stores change as they are
reassigned or as they enter or leave the company.

Running a report to view the stores to which a manager is assigned is a standard


query that looks at the current state of the data. However, if the company wants
to view reports that show past stores to which a manager has been assigned, this
type of query requires a historical view of the data. So in this example, the sales
organization changes slowly over time as the managers are reorganized, that is,
managers switch stores over time.

For example, the store lookup table and a fact table for the sales for each store
contain the following data:

Lookup Table and Fact Table Data


In November 2012, there was an organizational change as new managers were
hired and existing ones were reassigned to different stores. The sales fact table
records the daily sales for each store, but if you run a report like the sample
report shown above, how are the October 2012 sales to be calculated for
managers who changed stores? Are data for new managers to be included in the
report? The answer depends on whether you are interested in analyzing changes
in data relationships across time.

SCDs is also referred to as versioning in MicroStrategy.

Although SCDs are well documented in data warehousing literature, there are
different terminologies used to distinguish the different types of SCDs. In this
lesson, we will discuss four types of SCDs; they are:

• As Is vs. As Is (also referred to as Type I SCDs)

• As Is vs. As Was (also referred to as Type II SCDs)
are

• Like vs. Like


• As Was vs. As Was

As Is vs. As Is (Type I SCDs)


As Is vs. As Is (Type I) involves analyzing all data in accordance with the
attribute relationships as they exist currently. Regardless of how relationships
have changed over time, you aggregate and qualify all data (current and
historical) based on the current values in the lookup and relationship tables. If
aggregate tables exist, you either have to modify how the values roll up to reflect
the current attribute relationships, or you have to ignore the tables when you
perform this type of analysis.

For example, using the sample store data, the LU_STORE table would be
structured as follows:

Lookup Table for As Is vs. As Is (Type I SCDs)

The LU_STORE table preserves only the current relationships. This schema
reflects the stores to which managers are currently assigned. As a result, when
you run a report to aggregate the amount of sales for each manager, the fact
table rolls up to the store to which the manager is currently assigned:

Report Result for As Is vs. As Is (Type I SCDs)

The illustration above displays part of the LU_STORE table showing the
stores associated with a manager. The Manager_ID column is not in the
FACT_STORE_SALES table, but this column is shown in several of the
SCD illustrations in this lesson to make it easier to understand how store
sales are rolled up to managers.

Notice that for Missy and Jim, the October sales are only for the stores
they manage as of November. Even though Liz did not start as an employee
until November, she has sales for October.
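This Type I rollup can be demonstrated with a small SQLite sketch. The stores, managers, and sales figures below are hypothetical; the point is that a lookup table holding only the current assignments makes all sales, October included, aggregate to the November managers.

```python
import sqlite3

# Hypothetical data: the November reorg overwrote history, so LU_STORE
# keeps only the current manager of each store.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE LU_STORE (Store_ID INTEGER, Store_Name TEXT, Manager_Name TEXT);
CREATE TABLE FACT_STORE_SALES (Store_ID INTEGER, Month_ID INTEGER, Sales REAL);

INSERT INTO LU_STORE VALUES (1, 'Metro South', 'Liz'), (2, 'Metro West', 'Missy');
INSERT INTO FACT_STORE_SALES VALUES (1, 201210, 100), (2, 201210, 200),
                                    (1, 201211, 150), (2, 201211, 250);
""")

# October sales roll up to Liz even though she only started in November.
cur.execute("""
SELECT a12.Manager_Name, SUM(a11.Sales)
FROM FACT_STORE_SALES a11, LU_STORE a12
WHERE a11.Store_ID = a12.Store_ID
GROUP BY a12.Manager_Name
ORDER BY a12.Manager_Name
""")
rows = cur.fetchall()
print(rows)  # [('Liz', 250.0), ('Missy', 450.0)]
```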

As Is vs. As Was (Type II SCDs)


As Is vs. As Was (Type II) involves analyzing all data in accordance with the
attribute relationships as they exist currently and as they existed historically.
You aggregate and qualify data based on the values in the lookup and
relationship tables that correspond to the desired timeframe. If aggregate tables
exist, the logic behind how the values roll up may differ based on the time
period that you query.

For example, using the same sample data, the LU_STORE table would be
structured as follows:

Lookup Table for As Is vs. As Was (Type II SCDs)

As the manager-store relationships change over time, there will be multiple
records not only to map managers to their current stores, but also to map them
to every store to which they were previously assigned. You can store these
relationships in a single lookup table along with date range values and flags that
indicate the time period when a particular relationship existed.


This schema preserves both the historical and current relationships. As a result,
when you run a report to aggregate the total sales for each manager, each store
record in the fact table rolls up to the manager who was assigned that store at
the time that the sales occurred:

Report Result for As Is vs. As Was (Type II SCDs)
As Isvs. As Was(TypeII SCDs)

Since Liz did not start as a manager until November, she does not have any sales
numbers for October.
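A SQLite sketch of the Type II rollup follows. The data is hypothetical, and the Start_Month/End_Month validity columns are an assumed simplification of the date-range values described above: each sale joins only the lookup row that was valid in its month, so October sales stay with the October managers.

```python
import sqlite3

# Hypothetical data: one lookup row per manager-store relationship with the
# months it was valid (209912 is an arbitrarily large "still current" bound).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE LU_STORE (Store_ID INTEGER, Manager_Name TEXT,
                       Start_Month INTEGER, End_Month INTEGER);
CREATE TABLE FACT_STORE_SALES (Store_ID INTEGER, Month_ID INTEGER, Sales REAL);

-- Metro South: Missy through October, Liz from November
-- Metro West: Jim through October, Missy from November
INSERT INTO LU_STORE VALUES (1, 'Missy', 201201, 201210), (1, 'Liz', 201211, 209912),
                            (2, 'Jim', 201201, 201210), (2, 'Missy', 201211, 209912);
INSERT INTO FACT_STORE_SALES VALUES (1, 201210, 100), (2, 201210, 200),
                                    (1, 201211, 150), (2, 201211, 250);
""")

# Each sale rolls up to the manager assigned at the time of the sale.
cur.execute("""
SELECT a12.Manager_Name, SUM(a11.Sales)
FROM FACT_STORE_SALES a11, LU_STORE a12
WHERE a11.Store_ID = a12.Store_ID
  AND a11.Month_ID BETWEEN a12.Start_Month AND a12.End_Month
GROUP BY a12.Manager_Name
ORDER BY a12.Manager_Name
""")
rows = cur.fetchall()
print(rows)  # [('Jim', 200.0), ('Liz', 150.0), ('Missy', 350.0)]
```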

Like vs. Like


Like vs. Like, also referred to as comparable analysis, involves analyzing only
data records that exist and are identical for the querying time period. In other
words, only data relationships that have not changed over time are included in
the result set.

Similar to the As Is vs. As Was or Type II SCDs, the schema preserves both the
historical and current relationships. As a result, when you run a report to

aggregate the total sales for each manager, only data that exists unchanged are
part of the final result set:

Report Result for Like vs. Like

Missy and Jim were reassigned stores in November, and Liz came on as a new
manager in November. Only Jena stayed identical in both time periods. Thus,
the report contains sales from those stores that had the same managers in both
time periods.
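One way to sketch the Like vs. Like restriction in SQLite is shown below. The data is hypothetical, and the HAVING subquery is just one possible way to express "the relationship did not change": only stores with a single distinct manager across the queried periods survive into the result.

```python
import sqlite3

# Hypothetical data: one lookup row per store/month with the manager in charge.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE LU_STORE (Store_ID INTEGER, Manager_Name TEXT, Month_ID INTEGER);
CREATE TABLE FACT_STORE_SALES (Store_ID INTEGER, Month_ID INTEGER, Sales REAL);

INSERT INTO LU_STORE VALUES (1, 'Missy', 201210), (1, 'Liz', 201211),
                            (2, 'Jim', 201210), (2, 'Missy', 201211),
                            (3, 'Jena', 201210), (3, 'Jena', 201211);
INSERT INTO FACT_STORE_SALES VALUES (1, 201210, 100), (1, 201211, 150),
                                    (2, 201210, 200), (2, 201211, 250),
                                    (3, 201210, 50), (3, 201211, 60);
""")

# Keep only stores whose manager is identical in every queried period.
cur.execute("""
SELECT a12.Manager_Name, SUM(a11.Sales)
FROM FACT_STORE_SALES a11, LU_STORE a12
WHERE a11.Store_ID = a12.Store_ID AND a11.Month_ID = a12.Month_ID
  AND a11.Store_ID IN (SELECT Store_ID FROM LU_STORE
                       GROUP BY Store_ID
                       HAVING COUNT(DISTINCT Manager_Name) = 1)
GROUP BY a12.Manager_Name
""")
rows = cur.fetchall()
print(rows)  # [('Jena', 110.0)]
```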
As Was vs. As Was
As Was vs. As Was involves analyzing data only in accordance with the attribute
relationships as they existed historically. Similar to the As Is vs. As Was or Type
II SCDs, the schema preserves both the historical and current relationships.
With this type of analysis, however, you reference only historical relationships in
queries. As a result, when you run a report to aggregate the sales for each
manager, the fact table rolls up the sales for the stores to which the manager was
historically assigned, not the stores to which they are currently assigned:

Report Result for As Was vs. As Was

Missy’s sales for both months roll up into the stores that she was initially
assigned to even though in November she no longer manages Metro South and
is instead responsible for Metro West. Similarly, Jim’s sales for both months roll
up into the three stores he was initially assigned to even though he no longer
manages Metro West in November. Since Liz did not start as an employee until
November, she is not included in the report. To include her November sales for
the Metro South store, you must query for data based on current relationships.
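The As Was vs. As Was behavior can be sketched in SQLite the same way as the earlier types. The data is hypothetical: the lookup keeps only the pre-reorg assignments, so all sales roll up to the initial managers and Liz drops out entirely.

```python
import sqlite3

# Hypothetical data: historical relationships only (the pre-November assignments).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE LU_STORE_HIST (Store_ID INTEGER, Manager_Name TEXT);
CREATE TABLE FACT_STORE_SALES (Store_ID INTEGER, Month_ID INTEGER, Sales REAL);

INSERT INTO LU_STORE_HIST VALUES (1, 'Missy'), (2, 'Jim');
INSERT INTO FACT_STORE_SALES VALUES (1, 201210, 100), (1, 201211, 150),
                                    (2, 201210, 200), (2, 201211, 250);
""")

# Both months roll up to the historical managers; Liz has no historical
# assignment, so she is absent from the result.
cur.execute("""
SELECT a12.Manager_Name, SUM(a11.Sales)
FROM FACT_STORE_SALES a11, LU_STORE_HIST a12
WHERE a11.Store_ID = a12.Store_ID
GROUP BY a12.Manager_Name
ORDER BY a12.Manager_Name
""")
rows = cur.fetchall()
print(rows)  # [('Jim', 450.0), ('Missy', 250.0)]
```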

Summary of Four Types of SCDs


In summary, there are four types of SCDs:

• As Is vs. As Is (Type I)

• As Is vs. As Was (Type II)

• Like vs. Like

• As Was vs. As Was

Each of these types of analysis returns a different result set when querying the
same data:

Report Results for Each Type of SCDs


As Is vs. As Is (Type I) analysis is the most common type of query performed
since you do not preserve any history of attribute relationships. If data is time
independent and users do not require historical comparisons, this type of
analysis is sufficient to support a variety of reporting requirements.

When users do want to track changes in attribute relationships across time, As


Is vs. As Was (Type II) and Like vs. Like analysis are the SCDs that are most
frequently used. If data is time dependent and users are interested in analyzing
the changes in relationships (As Is vs. As Was or Type II) or in determining
what relationships have remained constant (Like vs. Like), these types of
analysis are required to fully support reporting requirements. Both of these
types of analysis require a more complex data warehouse structure since you
must do one of the following:

• Create columns in the affected lookup table to denote which values are
current and which are historical

• Create multiple versions of lookup tables to store both current and
historical values
• Modify the fact table structure to include the versioned relationship

In addition to having to maintain a more complex database design, the SQL


generated is more complex and may take longer to process.
As Was vs. As Was analysis, in which users are interested only in tracking
historical values, is rarely seen as a requirement. This type of analysis also
requires the same changes to the data warehouse structure as As Is vs. As Was
(Type II) or Like vs. Like analysis.

If reporting requirements include the need for SCDs, you can implement them
using one of the following methods:

• Creating a life stamp (uses a single lookup table)

• Creating a hidden attribute that relates to both current and historical values
(uses multiple lookup tables)

• Denormalizing the fact table (changes the fact table structure to
accommodate SCDs)

Creating a Life Stamp

If a report requires time dependent analysis, one way to ensure that the query
retrieves the appropriate data is to include the time period for which you want
to view data in the SQL itself. You can then aggregate records according to the
attribute relationships as they existed at that point in time. You can implement
this solution by creating a life stamp.

A life stamp consists of a start date and end date that indicate the time period
for which specific records are valid. For example, in the sample scenario, you
could modify the LU_STORE table as follows:

Lookup Table for Stores with Life Stamp


Using this method, the LU_STORE table functions as a single lookup table that
contains records not only to map managers to their current stores, but also to
map them to every store to which they have been previously assigned. By using
start and end dates, you can determine the validity of any particular record from

the record itself. For records that represent the current store assignments, the
end date is set arbitrarily large (in this example, 12/31/2099).

If you implement versioning using life stamps, you need to do the following:

1 Modify the lookup table to include start date and end date columns.

The modified table includes duplicate store IDs with different start dates.
As a result, the store ID column is no longer sufficient to uniquely identify
each row. You need to create a compound primary key for the table that
consists of the store ID and start date columns.

2 Create a Start Date attribute and map it to the start date column in the
lookup table.

3 Create an End Date attribute and map it to the end date column in the
lookup table.
These two attributes are logically different from the Date attribute that
maps to the date column in the fact table.

4 Make the Start Date and End Date both parents of the Store attribute with a
one-to-many relationship.

5 Include the desired date range in the report filter.

Based on the date range that you include in the report filter, the report
aggregates data according to the stores that the managers were responsible for
during the specified time period. Essentially, the life stamp determines which
version of the manager-store relationship is used for aggregation.

For example, in the sample scenario, you could use the following filter to
achieve As Is vs. As Was (Type II) analysis:

Report Result Using a Life Stamp


Using the date range in the filter, the SQL Engine joins the start and end dates
to the dates in the fact table to retrieve the records for the requested time
period. The report aggregates the sales based on the stores to which managers
were assigned during that timeframe. The SQL for the report looks like the
following:

select a12.[Manager_ID] AS Manager_ID,
max(a12.[Manager_Name]) AS Manager_Name,
a13.[Month_ID] AS Month_ID,
max(a13.[Month_Desc]) AS Month_Desc,
sum(a11.[Sales]) AS WJXBFS1
from [FACT_STORE_SALES] a11,
[LU_STORE] a12,
[LU_DATE] a13
where a11.[STORE_ID] = a12.[STORE_ID]
and a11.[Date_ID] = a13.[Date_ID]
and (a11.[Date_ID] >= a12.[Start_Date]
and a11.[Date_ID] < a12.[End_Date]
and a13.[Month_ID] in (201210, 201211))
group by a12.[Manager_ID], a13.[Month_ID]
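A runnable SQLite approximation of this life-stamp query is sketched below. All rows are hypothetical sample data; dates are stored as YYYYMMDD integers, the lookup table carries the compound primary key on store ID and start date, and the current row uses the arbitrarily large end date 12/31/2099.

```python
import sqlite3

# Hypothetical life-stamped lookup data for one store (Metro South).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE LU_STORE (Store_ID INTEGER, Manager_ID INTEGER, Manager_Name TEXT,
                       Start_Date INTEGER, End_Date INTEGER,
                       PRIMARY KEY (Store_ID, Start_Date));
CREATE TABLE LU_DATE (Date_ID INTEGER, Month_ID INTEGER, Month_Desc TEXT);
CREATE TABLE FACT_STORE_SALES (Store_ID INTEGER, Date_ID INTEGER, Sales REAL);

INSERT INTO LU_STORE VALUES (1, 10, 'Missy', 20120101, 20121101),
                            (1, 30, 'Liz',   20121101, 20991231);
INSERT INTO LU_DATE VALUES (20121015, 201210, 'Oct 2012'),
                           (20121115, 201211, 'Nov 2012');
INSERT INTO FACT_STORE_SALES VALUES (1, 20121015, 100), (1, 20121115, 150);
""")

# The date range joins each sale to the lookup row that was valid at the time.
cur.execute("""
SELECT a12.Manager_ID, MAX(a12.Manager_Name), a13.Month_ID, SUM(a11.Sales)
FROM FACT_STORE_SALES a11, LU_STORE a12, LU_DATE a13
WHERE a11.Store_ID = a12.Store_ID
  AND a11.Date_ID = a13.Date_ID
  AND a11.Date_ID >= a12.Start_Date AND a11.Date_ID < a12.End_Date
  AND a13.Month_ID IN (201210, 201211)
GROUP BY a12.Manager_ID, a13.Month_ID
ORDER BY a13.Month_ID
""")
rows = cur.fetchall()
print(rows)  # [(10, 'Missy', 201210, 100.0), (30, 'Liz', 201211, 150.0)]
```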

When you use life stamps to implement SCDs, over time, a single lookup table
can store many different historical versions of the data, along with the current
definition. For queries in which you simply want to view the current data,
filtering using the life stamp can be a tedious, time-consuming method of
retrieving the most current data from the lookup table.

You can make it easier to retrieve the rows that reflect the current attribute
relationships by using a status flag to indicate which records in a table are
historical and which are current. For example, in the sample scenario, you could
modify the LU_STORE table as follows:

Lookup Table for Stores with Current Flag


The Current_Flag column in the LU_STORE table indicates whether each
record in the table contains a current value (Y) or historical value (N). You can
easily perform As Is vs. As Is (Type I) analysis on this table by simply filtering
on the Current Flag being set to “Y.”
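A minimal SQLite sketch of the flag shortcut, with hypothetical rows: the 'Y'/'N' status column turns the As Is vs. As Is query into a simple equality filter instead of a date-range join.

```python
import sqlite3

# Hypothetical life-stamped lookup rows with a current/historical status flag.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE LU_STORE (Store_ID INTEGER, Manager_Name TEXT,
                       Start_Date INTEGER, End_Date INTEGER, Current_Flag TEXT);
INSERT INTO LU_STORE VALUES (1, 'Missy', 20120101, 20121101, 'N'),
                            (1, 'Liz',   20121101, 20991231, 'Y'),
                            (2, 'Jim',   20120101, 20121101, 'N'),
                            (2, 'Missy', 20121101, 20991231, 'Y');
""")

# Filtering on the flag retrieves only the current assignments.
cur.execute("""
SELECT Store_ID, Manager_Name FROM LU_STORE
WHERE Current_Flag = 'Y'
ORDER BY Store_ID
""")
rows = cur.fetchall()
print(rows)  # [(1, 'Liz'), (2, 'Missy')]
```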

If you have all of your current records set to the same arbitrary end date (in
this example, 12/31/2099), you could also retrieve current data from the
table by filtering on the End Date attribute. Only the current records would
be associated with the arbitrary end date.

Implementing SCDs using a life stamp requires a time filter, so you either have
to create all of the appropriate filters and prompt users on which one to
select for a particular query, or users have to understand the data well
enough to create their own filters.

Using a Hidden Attribute for SCDs


Now, you will learn how to use hidden attributes to implement SCDs.

You will have two separate tables—one to store just the current information and
one to store current and historical information. You can then create a third
lookup table that stores a record for every manager for every store that they
were assigned. You use this table to join the values in fact tables to either the
current or historical versions of the lookup tables as needed. You then create a
hidden attribute to reference this lookup table.

If you implement SCDs using a hidden attribute, you need to do the following:

1 Modify the data model to include current and historical attributes that


represent the different “versions” of the data.

2 Create separate lookup tables for the versioned attributes to store current
and historical values.

3 Create attributes to map to the current and historical versions of the lookup


tables.

4 Create a separate lookup table that contains a record for every manager for
every store to which they were assigned.

5 Create a hidden attribute that maps to this lookup table and relate it to both
the current and historical attributes.

6 Key affected fact tables based on the ID of the hidden attribute.
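The steps above can be sketched as SQLite DDL. This is a schematic, hypothetical version of the design, not the lesson's exact warehouse: a current lookup, a current-plus-historical lookup with a surrogate key, the bridge table behind the hidden Store attribute, and a fact table keyed on the surrogate.

```python
import sqlite3

# Hypothetical rows for one store (Metro West) with a past and a current manager.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE LU_CURR_STORE (Curr_Store_ID INTEGER PRIMARY KEY,
                            Curr_Store_Name TEXT, Curr_Manager_Name TEXT);
CREATE TABLE LU_HIST_STORE (Hist_Store_SPK_ID INTEGER PRIMARY KEY,
                            Hist_Store_ID INTEGER, Hist_Store_Name TEXT,
                            Hist_Manager_Name TEXT, MRR_Flag TEXT);
CREATE TABLE LU_STORE (Store_SPK_ID INTEGER PRIMARY KEY,
                       Curr_Store_ID INTEGER, Hist_Store_SPK_ID INTEGER);
CREATE TABLE FACT_STORE_SALES (Store_SPK_ID INTEGER, Sales REAL);

INSERT INTO LU_CURR_STORE VALUES (2, 'Metro West', 'Missy');
INSERT INTO LU_HIST_STORE VALUES (101, 2, 'Metro West', 'Jim', 'N'),
                                 (102, 2, 'Metro West', 'Missy', 'Y');
INSERT INTO LU_STORE VALUES (101, 2, 101), (102, 2, 102);
INSERT INTO FACT_STORE_SALES VALUES (101, 200), (102, 250);
""")

# The bridge table joins facts to either branch; here, the historical branch.
cur.execute("""
SELECT h.Hist_Manager_Name, SUM(f.Sales)
FROM FACT_STORE_SALES f, LU_STORE s, LU_HIST_STORE h
WHERE f.Store_SPK_ID = s.Store_SPK_ID
  AND s.Hist_Store_SPK_ID = h.Hist_Store_SPK_ID
GROUP BY h.Hist_Manager_Name
ORDER BY h.Hist_Manager_Name
""")
rows = cur.fetchall()
print(rows)  # [('Jim', 200.0), ('Missy', 250.0)]
```

Because the fact table is keyed on the surrogate Store_SPK_ID, swapping `LU_HIST_STORE` for `LU_CURR_STORE` in the join gives the current-only view from the same fact rows.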

Modifying the Logical Data Model

In the sample scenario, the original data model for the Sales Organization
hierarchy looks like the following:

Logical Data Model for Sales Organization Hierarchy


Supporting different versions of an attribute entails logically separating the
attributes that reference current and historical information. Therefore, you need
to modify the Sales Organization hierarchy to include both current and
historical attributes as follows:

Modified Logical Data Model for Sales Organization Hierarchy

The modified Sales Organization hierarchy contains a branch for the current
version of each attribute and a branch for the historical version of each
attribute. Right now, the data model does not show a join between the two
branches. Later, you will use the hidden attribute to relate the two branches,
enabling you to join fact table data to either current or historical manager
information.

Creating Current and Historical Versions of the Lookup Tables

After modifying the data model for the Sales Organization hierarchy, you need
to create current and historical versions of the lookup tables in the data
warehouse so that you have tables to which you can map the current and
historical attributes.

The lookup table for the Current Manager attribute looks like the following:

Lookup Table for Current Manager Attribute

The LU_CURR_STORE table stores only the most current information for
managers. In this table, each manager is related only to the stores to which they
are currently assigned.

The lookup table for the Historical Store attribute looks like the following:

Lookup Table for Historical Store Attribute


The LU_HIST_STORE table stores both the current and historical information
for managers. In this table, each manager is related to every store to which they
were previously assigned. The Hist_Store_SPK_ID column serves as a
surrogate primary key. The Hist_Store_ID is not sufficient as the primary key
since the same store will have multiple records in the table because different
managers are assigned over time. Thus, the surrogate key is necessary to
uniquely identify each row in the table. For example, the Metro West store has
two records, one for Jim who was the previous manager and one for Missy who
is the current manager.

The LU_HIST_STORE table must contain both current and historical
records for a manager to enable As Is vs. As Was (Type II) and Like vs. Like
analysis. The MRR_Flag column is also present in this table to enable you to
isolate the current records from the historical records in a query. You will
learn more about both of these topics later in this lesson.

The illustrations in this lesson show the lookup tables that will eventually map to
the Current Store, Current Manager, Historical Store, and Historical Manager
attributes. The higher-level attributes in both branches of the Sales
Organization hierarchy (for example, Current Region and Historical Region)
also require tables to which you can map their attribute definitions.
If you have a distinct list of current versus historical regions, you need to build
separate lookup tables. At each attribute level, you would have both historical
and current versions of the lookup tables. However, if Region does not change
over time, then the Current Region and Historical Region attributes can pull
from the same list of regions, and you do not need separate lookup tables. You
can treat them as the same attribute in different roles and map them using table
views, automatic attribute role recognition, or explicit table aliasing.

Creating Current and Historical Attributes

After you create the necessary lookup tables, you are ready to create the current
and historical attributes from the modified data model.

When you create the Current Store attribute, its ID form maps to the
Curr_Store_ID column and its DESC form to the Curr_Store_Name column in
the LU_CURR_STORE table. When you create the Current Manager attribute,
its ID form maps to the Curr_Manager_ID column and its DESC form to the
Curr_Manager_Name column in the LU_CURR_STORE table:

Mapping of Current Store and Manager Attributes


When you create the Historical Store attribute, its ID form maps to the
Hist_Store_SPK_ID column and its DESC form to the Hist_Store_Name
column in the LU_HIST_STORE table. When you create the Historical
Manager attribute, its ID form maps to the Hist_Manager_ID column and its
DESC form to the Hist_Manager_Name column in the LU_HIST_STORE table:

Mapping of Historical Store and Manager Attributes

You also need to create the other higher-level attributes in both branches and
map them to their respective lookup tables. After creating the higher-level
attributes, you need to define the parent-child relationships for both branches.

Creating the Hidden Attribute

After creating the current and historical lookup tables and defining the current
and historical attributes, you are ready to create the hidden attribute that you
will use to create a join path from either branch of the hierarchy to the fact
tables.

To set up the hidden attribute, you first need to modify the logical data model
for the Sales Organization hierarchy to look like the following:

Modified Logical Data Model for Sales Organization Hierarchy with the
Hidden Attribute

The Store attribute ties together the two branches of the hierarchy and enables
you to join either branch to fact table data. It exists only to provide a
consolidated join path to the fact tables. It is not logically relevant to users, so it
should not be visible to them. In reports, users will see the Current Store and
Historical Store attributes, depending on the type of analysis they want to
perform. These two objects comprise the logical representation of stores that
users see. In the background, the Store attribute, which is a hidden attribute,
will join elements from either the LU_CURR_STORE or LU_HIST_STORE
table to the relevant fact data.

To set up the hidden attribute, you need to create a third lookup table that looks
like the following:

Lookup Table for the Store Hidden Attribute


The LU_STORE table contains a record for every manager who has been
assigned one or more stores throughout their employment with the
organization. Since a store appears more than once in this table if the store has
been managed by more than one manager, the table contains a surrogate
primary key, Store_SPK_ID. The Curr_Store_ID column relates records in this
table to the LU_CURR_STORE table, and the Hist_Store_SPK_ID column
relates them to the LU_HIST_STORE table. Because it contains a foreign key to
both the historical and current versions of store information, this table provides
a join path between fact tables and either version of the lookup table.

After creating this LU_STORE table, you need to do the following:

1 Create the Store attribute, mapping it to the Store_SPK_ID column in the
LU_STORE table.

2 Make the Store attribute a child of both the Current Store and Historical
Store attributes (one-to-many relationship to both attributes).

3 Make the Store attribute a hidden attribute.

To create a hidden attribute:


1 Expand the Schema Objects folder.

2 In the Schema Objects folder, select the Attributes folder.

3 In the Attributes folder, right-click the attribute you want to hide and select
Properties.

4 In the Properties window, in the Categories list, under the General category,
select the Hidden checkbox.

5 Click OK.

The following image shows the option for making an object hidden:

Hidden Object Option


Creating the Store attribute as a hidden attribute makes it available to use in the
join path, while hiding it from the view of users to eliminate confusion. Users do
not need to directly access the Store attribute since it is never used as a display
attribute on reports.

Keying Fact Tables Based on the Hidden Attribute

The last step in implementing a hidden attribute to handle SCDs is to key fact
tables based on the ID column of the hidden attribute. For any facts you want to
analyze for current or historical store information, you need to ensure that
the fact tables are keyed using the Store_SPK_ID column, the surrogate
primary key in the LU_STORE table. Doing so ensures that joins from the fact
tables to either the historical or current data always occur through the Store
attribute, which references both branches of the hierarchy.

The rekeyed FACT_STORE_SALES table looks like the following:

Fact Table Keyed on Hidden Attribute
Because the FACT_STORE_SALES table is now keyed based on the Store
attribute, which relates to both the Current Store and Historical Store
attributes, you can join from the fact table to either “version” of store
information.

You can join the Hist_Store_SPK_ID column in the LU_STORE table to the
same column in the LU_HIST_STORE table to retrieve current or historical
records for stores and relate them to facts. You can join the Curr_Store_ID
column in the LU_STORE table to the same column in the LU_CURR_STORE
table to retrieve current records for stores and relate them to facts.

Analysis Using a Hidden Attribute

With the hidden attribute solution in place, you can perform the various types of
analyses.

For example, if you want to view sales just for stores to which managers are
currently assigned, you could run the following report:

Report Result Set with Current Manager-Store Information

Since you want to view current information, the template contains the Current
Manager and Store attributes, which ensures that the result set is retrieved from
the LU_CURR_STORE table. The SQL for this report looks like the following:
select a13.[Curr_Manager_ID] AS Curr_Manager_ID,
max(a13.[Curr_Manager_Name]) AS Curr_Manager_Name,
a12.[Curr_Store_ID] AS Curr_Store_ID,
max(a13.[Curr_Store_Name]) AS Curr_Store_Name,
sum(a11.[Sales]) AS WJXBFS1
from [FACT_STORE_SALES] a11,
[LU_STORE] a12,
[LU_CURR_STORE] a13
where a11.[Store_SPK_ID] = a12.[Store_SPK_ID] and
a12.[Curr_Store_ID] = a13.[Curr_Store_ID]
group by a13.[Curr_Manager_ID],
a12.[Curr_Store_ID]

Notice that the LU_STORE table is in the FROM clause. In the WHERE clause,
it joins to the fact table based on the Store_SPK_ID column, which maps to the
hidden Store attribute. The SQL Engine then joins to the current version of the
lookup table to retrieve the managers and their associated stores.
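A runnable SQLite approximation of this current-information report is sketched below with hypothetical rows: the fact table joins through the hidden Store lookup (the bridge table) to LU_CURR_STORE.

```python
import sqlite3

# Hypothetical bridge and current-lookup rows; surrogate keys 101/102/201
# are invented for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE LU_CURR_STORE (Curr_Store_ID INTEGER, Curr_Store_Name TEXT,
                            Curr_Manager_ID INTEGER, Curr_Manager_Name TEXT);
CREATE TABLE LU_STORE (Store_SPK_ID INTEGER, Curr_Store_ID INTEGER);
CREATE TABLE FACT_STORE_SALES (Store_SPK_ID INTEGER, Sales REAL);

INSERT INTO LU_CURR_STORE VALUES (1, 'Metro South', 30, 'Liz'),
                                 (2, 'Metro West', 10, 'Missy');
INSERT INTO LU_STORE VALUES (101, 2), (102, 2), (201, 1);
INSERT INTO FACT_STORE_SALES VALUES (101, 200), (102, 250), (201, 100);
""")

# Fact -> bridge on the surrogate key, bridge -> current lookup on store ID.
cur.execute("""
SELECT a13.Curr_Manager_ID, MAX(a13.Curr_Manager_Name),
       a12.Curr_Store_ID, MAX(a13.Curr_Store_Name), SUM(a11.Sales)
FROM FACT_STORE_SALES a11, LU_STORE a12, LU_CURR_STORE a13
WHERE a11.Store_SPK_ID = a12.Store_SPK_ID
  AND a12.Curr_Store_ID = a13.Curr_Store_ID
GROUP BY a13.Curr_Manager_ID, a12.Curr_Store_ID
ORDER BY a13.Curr_Manager_ID
""")
rows = cur.fetchall()
print(rows)  # [(10, 'Missy', 2, 'Metro West', 450.0), (30, 'Liz', 1, 'Metro South', 100.0)]
```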

If you want to view just historical information for managers who have
previously managed other stores, you could run the following report:

Report Result Set with Historical Manager-Store Information

Since you want to view historical information, the template contains the
Historical Manager and Store attributes, which ensures that the result set is
retrieved from the LU_HIST_STORE table. However, the LU_HIST_STORE
table contains both current and historical information (for performing
current-historical comparisons), so you need to ensure that you only retrieve records
from this table that are historical store assignments for managers. You limit the
result set to the historical records by filtering on the MRR_Flag column in the
LU_HIST_STORE table:

Lookup Table for Historical Store Information with MRR Flag

The MRR_Flag column (MRR stands for most recent record) exists in the
LU_HIST_STORE table to denote which records are current store assignments.
Current assignments have a value of “Y,” while past assignments have a value of
“N.”

If you need to differentiate between multiple historical records, you can


expand the MRR Flag to include more than two values.

To filter on this column, you need to create a MRR Flag attribute that maps to
the MRR_Flag column in the LU_HIST_STORE table. This attribute is a parent
of the Historical Store attribute. When you create the MRR Flag attribute, you
include it in the logical data model as follows:

Logical Data Model for Sales Organization Hierarchy with MRR Flag
After you have created this attribute, you can use it in the report filter to include
only records where the MRR Flag is set to “N.” This filter limits the result set to
historical records. The SQL for this report looks like the following:
select a13.[Hist_Manager_ID] AS Hist_Manager_ID,
max(a13.[Hist_Manager_Name]) AS Hist_Manager_Name,
a12.[Hist_Store_SPK_ID] AS Hist_Store_SPK_ID,
max(a13.[Hist_Store_Name]) AS Hist_Store_Name,
sum(a11.[Sales]) AS WJXBFS1
from [FACT_STORE_SALES] a11,
[LU_STORE] a12,
[LU_HIST_STORE] a13
where a11.[Store_SPK_ID] = a12.[Store_SPK_ID] and
a12.[Hist_Store_SPK_ID] = a13.[Hist_Store_SPK_ID] and
a13.[MRR_Flag] in ('N')
group by a13.[Hist_Manager_ID],
a12.[Hist_Store_SPK_ID]
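A runnable SQLite approximation of this historical report follows, using hypothetical rows: filtering on MRR_Flag = 'N' keeps only the past store assignments, so the current manager's row drops out.

```python
import sqlite3

# Hypothetical rows: Metro West was managed by Jim (historical, 'N') and is
# now managed by Missy (most recent record, 'Y').
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE LU_HIST_STORE (Hist_Store_SPK_ID INTEGER, Hist_Store_Name TEXT,
                            Hist_Manager_ID INTEGER, Hist_Manager_Name TEXT,
                            MRR_Flag TEXT);
CREATE TABLE LU_STORE (Store_SPK_ID INTEGER, Hist_Store_SPK_ID INTEGER);
CREATE TABLE FACT_STORE_SALES (Store_SPK_ID INTEGER, Sales REAL);

INSERT INTO LU_HIST_STORE VALUES (101, 'Metro West', 20, 'Jim', 'N'),
                                 (102, 'Metro West', 10, 'Missy', 'Y');
INSERT INTO LU_STORE VALUES (101, 101), (102, 102);
INSERT INTO FACT_STORE_SALES VALUES (101, 200), (102, 250);
""")

# The MRR flag restricts the result to historical assignments only.
cur.execute("""
SELECT a13.Hist_Manager_ID, MAX(a13.Hist_Manager_Name),
       a12.Hist_Store_SPK_ID, MAX(a13.Hist_Store_Name), SUM(a11.Sales)
FROM FACT_STORE_SALES a11, LU_STORE a12, LU_HIST_STORE a13
WHERE a11.Store_SPK_ID = a12.Store_SPK_ID
  AND a12.Hist_Store_SPK_ID = a13.Hist_Store_SPK_ID
  AND a13.MRR_Flag IN ('N')
GROUP BY a13.Hist_Manager_ID, a12.Hist_Store_SPK_ID
""")
rows = cur.fetchall()
print(rows)  # [(20, 'Jim', 101, 'Metro West', 200.0)]
```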

Notice that the LU_STORE table is in the FROM clause. In the WHERE clause,
it joins to the fact table based on the Store_SPK_ID column, which maps to the
hidden Store attribute. The SQL Engine then joins to the historical version
of the lookup table to retrieve the stores and their associated managers. The
WHERE clause also includes a condition that filters on the MRR_Flag column to
retrieve only records for which the MRR_Flag value is “N.” Because of this
filter, the SQL Engine retrieves only historical records from the table.

If you want to view both current and historical information for managers, you
could run the following report:

Report Result Set with Current and Historical Manager-Store Information

Since you want to view current and historical information, the query needs to
access the LU_HIST_STORE table, which contains both current and historical
store assignments for each manager. Therefore, the template contains the
Historical Manager and Store attributes. If the MRR Flag is also used on the
template, then you can further distinguish between the current and historical
records. The SQL for this report looks like the following:
select a13.[Hist_Manager_ID] AS Hist_Manager_ID,
max(a13.[Hist_Manager_Name]) AS Hist_Manager_Name,
a13.[MRR_Flag] AS MRR_Flag,
a12.[Hist_Store_SPK_ID] AS Hist_Store_SPK_ID,
max(a13.[Hist_Store_Name]) AS Hist_Store_Name,
sum(a11.[Sales]) AS WJXBFS1
from [FACT_STORE_SALES] a11,
[LU_STORE] a12,
[LU_HIST_STORE] a13
where a11.[Store_SPK_ID] = a12.[Store_SPK_ID] and
a12.[Hist_Store_SPK_ID] = a13.[Hist_Store_SPK_ID]
group by a13.[Hist_Manager_ID],
a12.[Hist_Store_SPK_ID],
a13.[MRR_Flag]

Notice that the LU_STORE table is in the FROM clause. In the WHERE clause,
it joins to the fact table based on the Store_SPK_ID column, which maps to the
hidden Store attribute. The SQL Engine then joins to the historical version
of the lookup table to retrieve the managers and their associated stores. Because
no filter exists, the SQL Engine retrieves all of the manager records from the
table, both current and historical store assignments.

You should use the LU_HIST_STORE table to retrieve current information
only when you also want to retrieve historical information. If you want to
view only current information, you should use the LU_CURR_STORE
table as it is a smaller table.

These reports provide a few examples of SCDs analysis using a hidden attribute.
Using more advanced reporting functionality, like conditional metrics, you can
build even more complex reports to compare historical sales to current sales.

For more information on conditional metrics and other types of advanced


metric functionality, see the MicroStrategy Desktop: Advanced Reporting
course.
Analysis of All Store Sales

The hidden attribute solution provides a logical and physical separation between current and historical data that makes it easy for users to understand. By defining both historical and current attributes in the Sales Organization hierarchy, users can easily place specific attributes on a report to view the sales generated by each store by the various managers for their current and previous stores. What if your users also want to be able to run a report that displays the total sales regardless of the current or previous manager for a store?

The LU_HIST_STORE table contains records that associate stores with all of their managers, both current and historical. However, if you run a report that displays only the Historical Store and Sales, the result looks like the following:

Report Result Set with Historical Store Attribute

The report displays multiple records for Metro, South, and West in the result set because multiple records exist for them in the LU_HIST_STORE table. The LU_HIST_STORE table stores a record for each store that has been managed by one or more managers. As a result, given the structure of this table, the Historical Store attribute cannot satisfy this requirement. The same table structure that enables the "versions" to appear in the report display prevents a simple grouping of the sales by store.

The LU_STORE table (the lookup table for the hidden store attribute) also cannot accommodate this requirement. First, the attribute is hidden, which means that users cannot access it for reports. Even if you make this attribute available to users and modify the LU_STORE table to include the store names along with the IDs, you still get a result set that does not group the records into a single row for each store. Because the LU_STORE table relates all sales records (historical and current) to the fact table, it contains multiple records for each store.

If you want to display the sales by each store regardless of the manager, you can modify the structure of the Sales Organization hierarchy and underlying tables to achieve such a report. To resolve this requirement, you need to do the following:

1 Modify the Sales Organization hierarchy to include a new attribute that


relates to all of the records for a store, historical or current.

2 Modifythe Sales Organization schema to accommodatethe newattribute.

3 Create the new attribute.

First, you can modify the Sales Organization hierarchy to look like the following:

Modified Sales Organization Hierarchy


Since the LU_STORE table contains all of the records for each store, you can
use the Store attribute to join to the fact table and get the total sales for any
store regardless of the manager. The All Store attribute relates all of the records
for a single store, so that you can group them together in a report result set.

After modifying the hierarchy, you need to create a lookup table for the All Store
attribute and modify the LU_STORE table so that you can relate the All Store
and Store attributes. The LU_ALL_STORE and LU_STORE tables look like the
following:
Lookup Table for All Store Attribute

The LU_ALL_STORE table contains a single record for each store. Therefore, you can use it as the lookup table from which to pull store information if you want to group sales by each store, regardless of manager. You modify the LU_STORE table to include the ID column from the LU_ALL_STORE table. This foreign key relates the two tables.

After modifying the hierarchy and schema, you can create the All Store attribute
and relate it to the hidden Store attribute as follows:

Mapping of All Store Attribute

The All Store attribute maps to the All_Store_ID and All_Store_Name columns in the LU_ALL_STORE table as well as the All_Store_ID column in the LU_STORE table. You need to add the hidden Store attribute as a child of the All Store attribute with a one-to-many relationship.
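The schema changes described above can be pictured in DDL. This is a minimal sketch with illustrative column types; the actual LU_STORE table includes additional columns (such as Hist_Store_SPK_ID) that are omitted here.

```sql
-- LU_ALL_STORE holds one row per store; LU_STORE carries the All_Store_ID
-- foreign key that relates the hidden Store attribute to All Store.
create table LU_ALL_STORE (
    All_Store_ID   integer primary key,
    All_Store_Name varchar(50)
);

create table LU_STORE (
    Store_SPK_ID   integer primary key,
    All_Store_ID   integer references LU_ALL_STORE (All_Store_ID)
    -- ...other columns, such as Hist_Store_SPK_ID, omitted for brevity
);
```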

Now, if you want to view the total sales for each store regardless of the
managers, you can build a report that looks like the following:

Report Result Set with Total Sales by Store


The report displays the total sales by each store regardless of which managers
were responsible for the store. The SQL for this report looks like the following:
select a12.[All_Store_ID] AS All_Store_ID,
max(a13.[All_Store_Name]) AS All_Store_Name,
sum(a11.[Sales]) AS WJXBFS1
from [FACT_STORE_SALES] a11,
[LU_STORE] a12,
[LU_ALL_STORE] a13
where a11.[Store_SPK_ID] = a12.[Store_SPK_ID] and
a12.[All_Store_ID] = a13.[All_Store_ID]
group by a12.[All_Store_ID]

Notice that the LU_STORE and LU_ALL_STORE tables are in the FROM clause. In the WHERE clause, the query joins to the fact table based on the Store_SPK_ID column, which maps to the hidden Store attribute. Then, it joins the Store attribute to the All Store attribute based on the All_Store_ID column that relates the two attributes.

Denormalizing Fact Tables

A final alternative for implementing SCDs is to denormalize the fact tables that contain the versioned attribute. Denormalization means introducing redundancy into how data is stored.

For example, the original structure of the FACT_STORE_SALES table looks like the following:

Original Fact Table Structure

The FACT_STORE_SALES table stores only the store and date IDs, so the sales data is available only by store and date. Performing "version" analysis using this table is difficult because it does not contain manager information. Since a manager can be assigned to different stores at different times, you have no way of knowing which store is associated with which manager on a given date.

You can denormalize the fact table to include not only the lowest level attribute (in this case, Store), but also the higher-level attribute (in this case, Manager), as follows:

Denormalized Fact Table Structure


With both the manager and store IDs present in the fact table, you have moved the relationship in which changes occur into the fact table itself. You are now capturing the sales by the manager, store, and date. Storing the fact at the manager and store levels removes the need to incorporate SCDs into the lookup tables themselves.

The fact table is larger if manager information is stored in it, but this denormalization does provide a less complex answer to SCDs. Depending on the volume of data in your fact tables and the way in which data is captured in the source system, this option may or may not be viable in your own environment.

If you want users to see different attributes on reports depending on which "version" they are viewing, you can further denormalize fact tables to include separate ID columns for an attribute's current and historical values. For example, in the scenario above, you could add Curr_Manager_ID and Hist_Manager_ID columns to the table instead of a single Manager_ID column.
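As a sketch, the further-denormalized table might be defined as follows; the column types are illustrative, not taken from the course schema.

```sql
-- Hypothetical denormalized fact table with separate current and
-- historical manager ID columns instead of a single Manager_ID.
create table FACT_STORE_SALES (
    Curr_Manager_ID integer,
    Hist_Manager_ID integer,
    Store_ID        integer,
    Date_ID         integer,
    Sales           decimal(10,2)
);
```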
Lesson Summary

In this lesson, you learned:

• SCDs or versioning refers to the process of tracking and analyzing attribute


relationships that change over time.

• As Is vs. As Is (Type I) is a type of SCDs that involves analyzing all data in accordance with the attribute relationships as they exist currently.

• As Is vs. As Was (Type II) is a type of SCDs that involves analyzing all data in accordance with the attribute relationships as they exist currently and as they existed historically.

• Like vs. Like is a type of SCDs that involves analyzing only data records that exist and are identical in all versions of the lookup and relationship tables, both historical and current. It is also referred to as comparable analysis.

• As Was vs. As Was is a type of SCDs that involves analyzing data only in accordance with the attribute relationships as they existed historically.

• You can implement SCDs by creating a life stamp, creating a hidden attribute that relates to both current and historical values, or by denormalizing the associated fact tables.

• With a life stamp solution, you include a start date and end date in the lookup table that indicate the time period for which specific records are valid. When you run reports, you can then use filters with the appropriate date ranges to determine how data is aggregated.

• With a hidden attribute solution, you have multiple lookup tables that store current and historical values, and you use the hidden attribute to facilitate joins between the lookup tables and any fact tables.

• A final alternative for implementing SCDs is to denormalize the fact tables that contain the "versioned" attribute. You can denormalize the fact table to include the lowest level attribute as well as the higher-level attribute, thus moving the relationship in which the versioning occurs into the fact table itself.
DATA WAREHOUSE OPTIMIZATION

Appendix Description

This appendix describes various database-level and application-level


optimization strategies, including aggregation, partitioning, and indexing.

In this appendix, you will learn about each of these concepts and their impact on report analysis. You will learn about recommendations for implementing aggregation, partitioning, and indexing in your data warehouse to optimally support your MicroStrategy reporting environment.
Review of Aggregation Concepts

After completing this topic, you will be able to:

Define base and aggregate fact tables and describe the purpose of pre-
aggregating information in the data warehouse.

Aggregation, the summarization of fact data, always takes place in a data


warehouse. You can aggregate data on the fly when you run a query, or you can aggregate data ahead of time, which is known as pre-aggregation. When you pre-aggregate data, the result is stored at a summarized level, so you can retrieve it later when you run a query. When you determine an aggregation strategy for your data warehouse, you are deciding when to pre-aggregate data.

A base fact table stores fact data at the lowest levels at which a source system records transactions. For example, the following base fact table shows data aggregated to the lowest levels of the Customer, Time, and Location hierarchies:

Sales Base Fact Table
The Sale_Amt and Txn_Qty fact data are stored at the level of Store, Customer,
and Date, which are the lowest-level attributes in each of the respective
hierarchies. Depending on the type of query you run, the database can either
select the records directly from the FACT_SALES table, or it has to aggregate the records on the fly. For example, you could run two different reports with the following templates:

Querying Against the Sales Base Fact Table

The first report requests data at the same level at which it is stored in the FACT_SALES table. To retrieve the result set for this report, the query only needs to select the desired records from the fact table.

The second report contains the State attribute, so it requests data at a higher
level than it is stored in the FACT_SALES table. To retrieve the result set for
this report, the query must aggregate all of the store records in the
FACT_SALES table into their corresponding states. This calculation is done on
the fly since the data is not stored at the state level in the fact table.

If you often run reports like the second one in your environment, the database always has to do this calculation on the fly. Performing this aggregation at run time results in a more complex query, longer processing time, and more database resources allocated to processing the query. All of these factors can degrade performance.
To optimize performance, you could build an aggregate fact table, which is
simply a fact table where the data is pre-aggregated and stored at a higher level
for one or more hierarchies. This type of table is also called a summary table.
To illustrate this, you could build an aggregate fact table that could be used for the second report:

Querying Against the Sales Aggregate Fact Table

The first report still selects records from the original base fact table, FACT_SALES, but the second report no longer has to aggregate the records in the FACT_SALES table. Instead, it can select records from the aggregate fact table, FACT_SALES_STATE. The data in this table is pre-aggregated to the State level.
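One way to build such a summary table is with a simple aggregation query in the warehouse. The sketch below assumes a LU_STORE lookup table that maps each Store_ID to its State_ID; the exact table and column names in your schema may differ.

```sql
-- Populate a State-level aggregate (summary) fact table from the base table.
create table FACT_SALES_STATE as
select s.State_ID,
       f.Cust_ID,
       f.Date_ID,
       sum(f.Sale_Amt) as Sale_Amt,
       sum(f.Txn_Qty)  as Txn_Qty
from FACT_SALES f
join LU_STORE s
  on f.Store_ID = s.Store_ID
group by s.State_ID, f.Cust_ID, f.Date_ID;
```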

You build aggregate fact tables in your data warehouse. To use them in a
MicroStrategy project, you add them to your project’s warehouse catalog. The
SQL Engine references the logical table size of each table in your project to
know when it is appropriate to use an aggregate fact table.

For more information on adding aggregate fact tables to a MicroStrategy project and calculating logical table size, see the MicroStrategy Architect: Advanced Project Design course.

Adding aggregate fact tables to a project is easy. The harder question is knowing when to build aggregate fact tables in the first place.
Aggregation Guidelines
Query Profile
Attribute Relationship Volatility
Compression Ratios

After completing this topic, you will be able to:

Define guidelines for building effective aggregate fact tables in a data warehouse.

When you judiciously build aggregate fact tables in your data warehouse, pre-aggregating data increases query performance. However, building aggregate fact tables that are not used at all, are used infrequently, or require a lot of maintenance can negate the benefits of pre-aggregation. In such cases, aggregate fact tables can be ineffective or time consuming to maintain.
Therefore, it is important to devise an aggregation strategy that takes into
account critical factors for determining whether an aggregate fact table is
necessary. You should consider the following factors:

• Query profile
• Attribute relationship volatility
• Compression ratios

Query Profile
It only makes sense to build, store, and maintain aggregate fact tables that users will frequently query when running reports. You may have a data warehouse comprised of 15 hierarchies, all of which contain multiple levels of attributes. However, if you have certain hierarchies where users rarely query higher-level attributes, the cost of occasionally aggregating data for those hierarchies is less than storing and maintaining aggregate fact tables that are seldom used.
In some cases, determining that queries rarely access certain tables is very
straightforward. For example, you have the following data model:

Query Profile for Sales Data for Customer and Time Hierarchies

In the image above, users run many reports that contain the Year attribute, so
data is often aggregated to the highest level of the Time hierarchy. They seldom
run reports where they view customer information above the Customer State
level, so data is rarely aggregated to the highest level of the Customer hierarchy.

Given this query profile, you can easily conclude that building a FACT_SALES
table aggregated to the Year level is worthwhile because users frequently query
data at this level. Though, you do not need to pre-aggregate data for the
Customer hierarchy beyond the level of Customer State. Users seldom run
queries at the level of Customer Region. When they do need to query customer region data, you can easily aggregate data on the fly from the Customer State level to the Customer Region level without a significant decline in query performance.

Sometimes, recognizing that users never access data at a particular level is not so obvious. For example, using the same data model, users may also frequently query data at the State level:
Query Profile for Sales Data for Location Hierarchy

In this example, you need to take a closer look at the actual reports that users
are running and the tables these reports use. Specifically, users are interested in
analyzing sales data at the State level, but they often just want to see sales data
for new stores or old stores. This type of query requires qualification on the New
attribute that is related to the Store attribute. You can access the New attribute
only through the LU_STORE table.

Even though users want to view the data aggregated to the State level, accessing
this information from a FACT_SALES table pre-aggregated to the State level is
possible only if they want to see all store sales. If users run reports that qualify on new stores or old stores, those queries have to access the store data in the base fact table to satisfy the filter conditions. Since reports that qualify on old or new stores make up the bulk of the analysis, an aggregate fact table at the State level would just store the aggregate sales information for reports that request all store sales at the State level. Then reports would largely ignore the aggregate fact table.

Attribute Relationship Volatility


Another factor that should influence decisions on when to build aggregate fact
tables is the volatility of parent-child relationships between attributes in a given
hierarchy. At one end of the spectrum, you have hierarchies like Calendar Time
in which the relationships between attributes remain the same over time. The month of January is always part of the first quarter in any given year. The date of October 4 is always part of the month of October for any given year. The relationships between the parent and child attributes remain static.

You can also have hierarchies where changes in relationships occur, but they are infrequent. For example, in a Fiscal Time hierarchy, the relationships between attributes such as Quarter and Month or Month and Date are dependent on how a company organizes its fiscal year. These relationships could change if the company decides to shift from a fiscal year starting at the beginning of June and running until the end of May to a fiscal year starting at the beginning of October and running until the end of September. Such a shift affects how data rolls up from child to parent attributes in the Fiscal Time hierarchy. However, changes to the company's fiscal calendar probably happen very infrequently, so the relationships are still relatively static.

Other hierarchies are more dynamic by nature. In particular, hierarchies that involve organizational and employee structures, attributes related to products and services, geographic divisions, or customer demographics tend to be volatile, and the relationships change frequently. For example, you could have a Location hierarchy that looks like the following:

District-Store Relationships
The Northern Virginia district contains stores in Arlington and Fairfax, while the Western Virginia district consists of stores in Winchester and Manassas. If
you build an aggregate fact table at the District level, sales for Arlington and
Fairfax stores roll up into the Northern Virginia district, while sales for the
Manassas and Winchester stores roll up into the Western Virginia district:

Aggregation of Sales Data Based on District-Store Relationships

In the illustration above, only the columns that are used in the example are included in the sample data for the FACT_SALES and FACT_SALES_DISTRICT tables. The actual tables would contain all of the columns referenced in the schema. For all of the aggregation illustrations in this lesson that use the District-Store relationship, the same subset of columns is shown in the sample data.

The configuration of stores into districts can change frequently as new stores
open, old stores close, and some stores shift to new districts. For example, the
previous district-store structure is in place through November 2012. In December 2012, the company changes their district-store structure as follows:

Revised District-Store Relationships
The Arlington store has closed, so it no longer exists as part of the Northern Virginia district. A new store has opened in Harrisonburg, so it has been added
to the Western Virginia district. Also, the Manassas store has been moved from
the Western Virginia to the Northern Virginia district. If you look at December
2012 data for the aggregate fact table built at the District level, sales for Fairfax
and Manassas stores now roll up into the Northern Virginia district, while sales
for the Winchester and Harrisonburg stores roll up into the Western Virginia
district.

Aggregation of Sales Data Based on Revised District-Store Relationships

For attributes like District and Store, where the relationship is very dynamic, every time the configuration of districts and stores changes, you have to rekey the information and recalculate how the store-level data rolls up into districts in

the aggregate fact table. When these changes occur frequently, they consume
time and resources, and they significantly add to batch processing time and
complicate batch processing routines. Therefore, attributes with volatile parent-
child relationships are often not the best choices for inclusion in aggregate fact
tables since the maintenance overhead can outweigh any performance benefits,
especially if the tables involved are large or the length of your batch processing window is an issue in your database environment.

Using Materialized Views for Aggregate Fact Tables

Although attribute volatility is definitely an important factor to consider when determining which aggregate fact tables you should build in your data warehouse, frequent changes are an issue precisely because you often build aggregate fact tables as physical tables in the data warehouse. Because these tables are separate from the base fact table, the database administrator

manually defines and maintains the procedures that determine how data from a
lower level is pre-aggregated to a higher level. Volatility is a problem because of
the time and effort involved in updating these procedures to rebuild aggregate
fact tables when changes occur.

Depending on the database platform you use for your data warehouse, some database vendors (for example, Oracle®, DB2, Sybase®) provide database-level functionality that enables you to take advantage of the benefits of pre-aggregation even for volatile attributes. This functionality is referred to as materialized views, although the various database vendors may use different terminology. Instead of building separate aggregate fact tables, you build materialized views, which work just like a regular view except that they are populated at the same time as the base fact table.

The database administrator can define the logic that is used to roll up data from the primary level of the base fact table to the level of the materialized view, which then acts as an aggregate fact table. Besides not having to maintain separate aggregate fact tables, one of the benefits of materialized views is how they handle changes in the aggregation. As relationships between attributes change, the logic for rolling up data from the base fact table to the materialized view automatically changes as well. Therefore, you can use materialized views to remove the maintenance

overhead generally associated with creating aggregate fact tables for volatile
attributes.
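As an illustration, in Oracle a District-level materialized view might be defined roughly as follows. The object and column names are hypothetical, and options such as the refresh mode depend on your environment.

```sql
-- Materialized view that plays the role of a District-level aggregate
-- fact table; the database refreshes it from the base table.
create materialized view FACT_SALES_DISTRICT_MV
    build immediate
    refresh complete on demand
    enable query rewrite
as
select s.District_ID,
       f.Date_ID,
       sum(f.Sale_Amt) as Sale_Amt
from FACT_SALES f, LU_STORE s
where f.Store_ID = s.Store_ID
group by s.District_ID, f.Date_ID;
```

With ENABLE QUERY REWRITE, the optimizer can transparently answer qualifying District-level queries from the view instead of the base table.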

Compression Ratios
When you build aggregate fact tables, you aggregate a set of child records in the base fact table to produce a single parent record. For example, in the following two fact tables, the two records for the Manassas store are rolled up into a single record for the Northern Virginia district.

Aggregation of Child Records Into a Parent Record


Both of the December 7 transactions for Sam Elsin that occurred at the Manassas store comprise only a single record in the FACT_SALES_DISTRICT table.

When you pre-aggregate data, the average number of child records you combine
to create a single parent record is the compression ratio between the two
attributes involved. The size of the compression ratio provides an effective way
to measure how much an aggregate fact table reduces the number of records
that must be read to satisfy queries that access the table. The primary reason for building aggregate fact tables is to reduce the number of records a query has to

access, thereby decreasing query time. Therefore, pre-aggregating data is only


cost effective if the compression ratio is significant.

You calculate compression ratios using the cardinality, or number of elements, that exist for each attribute. For example, the cardinalities of the attributes in the Location hierarchy are as follows:

Attribute Cardinalities for Location Hierarchy
There are 5 regions, 20 states, 30 districts, and 3000 stores. In an environment
where users routinely query data at each level in this hierarchy, you need to
consider building aggregate fact tables. Given their respective cardinalities, the
compression ratios between attributes are as follows:

Compression Ratios Between Attributes in Location Hierarchy

The compression ratios between Region and Store and District and Store are
both very large. However, the ratio between District and State is small. An
aggregate fact table at the Region level would average 1 record for every 600
records in the base fact table. An aggregate fact table at the District level would
average 1 record for every 100 records in the base fact table. Though, an
aggregate fact table at the State level would average 2 records for every 3
records in an aggregate fact table at the District level—not a significant
difference in table size. The State-level aggregate fact table would be almost as
big as the District-level aggregate fact table, so it would not significantly reduce
the number of records being queried. Plus, you would have the additional
storage space and maintenance on the State-level aggregate fact table.
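The ratio arithmetic above can be checked with a few lines of code, using the cardinalities from this example:

```python
# Cardinalities of the attributes in the Location hierarchy example.
cardinality = {"Region": 5, "State": 20, "District": 30, "Store": 3000}

def compression_ratio(parent: str, child: str) -> float:
    """Average number of child-level records rolled up into one parent record."""
    return cardinality[child] / cardinality[parent]

print(compression_ratio("Region", "Store"))    # 600.0 -> large, worth pre-aggregating
print(compression_ratio("District", "Store"))  # 100.0 -> large, worth pre-aggregating
print(compression_ratio("State", "District"))  # 1.5   -> small, little benefit
```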

In this example, given that users frequently query data at higher levels in the Location hierarchy, the most effective aggregation strategy would be to build aggregate fact tables at the Region and District level. You could easily aggregate queries at the State level on the fly using the District-level table. This table still provides a significant increase in performance over aggregating data on the fly from the base fact table, yet it avoids the maintenance and overhead of an aggregate fact table at the State level.
Review of Partitioning Concepts

After completing this topic, you will be able to:

Describe the purpose of partitioning tables in a data warehouse and define


server-level and application-level partitioning.

Partitioning is the division of a larger table into smaller tables. You often
implement partitioning in a data warehouse to increase query performance by reducing the number of records that queries must scan to retrieve a result set.
You can also use partitioning to decrease the amount of time necessary to load
data into warehouse tables and perform batch processing.

There are two basic types of partitioning:

• Server level
• Application level

Two Types of Partitioning


Server-level partitioning involves dividing one physical table into logical
partitions in the database environment. The database software handles this type
of partitioning completely, so these partitions are effectively transparent to
MicroStrategy software. Since only one physical table exists, the SQL Engine only has to write SQL against a single table, and the database manages which logical partitions are used to resolve the query.
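For example, a server-level range partition in Oracle syntax might look like the following sketch; the table, column, and partition names are illustrative.

```sql
-- One physical table with database-managed logical partitions by month.
create table FACT_SALES (
    Date_ID  date,
    Store_ID integer,
    Cust_ID  integer,
    Sale_Amt number
)
partition by range (Date_ID) (
    partition SALES_MAR values less than (date '2012-04-01'),
    partition SALES_APR values less than (date '2012-05-01')
);
```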

Application-level partitioning involves dividing one large table into several separate, smaller physical tables called partition base tables. You split the table into smaller tables in the database itself. Then, the application that is running queries against the database (in this case, MicroStrategy) manages which partitions are used for any given query. Since multiple physical tables exist, the SQL Engine has to write SQL against different tables, depending on which tables are needed to retrieve the result set for a query. MicroStrategy supports application-level partitioning for fact tables through one of two methods—warehouse partition mapping or metadata partition mapping.
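In contrast, application-level partitioning results in physically separate tables, which could be created along these lines. The staging table and Month_ID column here are hypothetical, shown only to illustrate the split.

```sql
-- Each partition base table is a standalone physical table.
create table FACT_SALES_MAR as
select * from FACT_SALES_STAGE where Month_ID = 201203;

create table FACT_SALES_APR as
select * from FACT_SALES_STAGE where Month_ID = 201204;
```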

For more information on the differences between warehouse and metadata partition mapping, see the MicroStrategy Architect: Advanced Project Design course.
Advantages of Application-Level Partitioning

Partitioning by Multiple Hierarchies
Differences in Partitioning Logic
Reduced Time to Read Partitions
Implementing Application-Level Partitioning

After completing this topic, you will be able to:

Describe the advantages of application-level partitioning.

Because server-level partitioning is supported by database vendors to varying


degrees and can be configured and maintained completely external to
MicroStrategy, many companies that require partitioned fact tables choose to
use only server-level partitioning to support those requirements. There are
some limitations that exist when using server-level partitioning that may make application-level partitioning more effective to implement in certain circumstances. Specifically, application-level partitioning can provide the following advantages over server-level partitioning:

• Ability to partition a single physical fact table by multiple hierarchies or dimensions
• Logic to determine which partitions must be accessed for a given query without requiring a filter on a specific attribute element
• Less time involved to scan physical partitions rather than logical partitions

Partitioning by Multiple Hierarchies


Some database platforms enable you to partition a fact table only by a single hierarchy or dimension. For example, you have the following fact table:

Sales Fact Table with Three Hierarchies Represented


The FACT_SALES table contains columns that represent three
hierarchies—Time (Date_ID), Location (Store_ID), and Customer (Cust_ID).
In this example, you could choose to partition the FACT_SALES table by Month
(a Time attribute) or by Customer State (a Customer attribute). However, you
may not be able to partition the fact table by both attributes when using server-
level partitioning since they are from different hierarchies. Application-level partitioning would enable you to do so.

Differences in Partitioning Logic

A second possible drawback of server-level partitioning is how the database


scans the logical partitions. With MicroStrategy’s application-level partitioning,
depending on the type of partition mapping you use, the SQL Engine checks
either the warehouse partition mapping table or references the metadata
partition mapping object to determine which partition tables it must access for
any given query. Some database platforms do not have an equivalent type of logic built into their partitioning functionality. Therefore, they require that reports be filtered by specific elements of the partitioned attribute to be able to determine which logical partitions to access for the query. Without such a filter,
a full scan of all the partitions occurs as if the logical partitions never existed in the first place.

For example, you have a FACT_SALES table partitioned by the Month attribute, and you run a report like the following:

Report Without Partitioning Attribute in Filter
This report contains a filter for customers whose last name begins with “W” and
for sales occurring between March 1 and March 15, 2012. Since the FACT_SALES table is partitioned by the Month attribute, which comes from the Time hierarchy, the filter on the Date attribute is key in this example.

With application-level partitioning, the MicroStrategy SQL Engine would be


able to determine based on attribute relationships that the Date attribute is a
child of Month. From that relationship, the SQL Engine would be able to
determine that the dates included in the filter are part of March 2012. The SQL Engine would then either send a prequery to the warehouse partition mapping table (if warehouse partition mapping were implemented) or reference the metadata partition mapping object (if metadata partition mapping were implemented) to determine which partition contains the data for March 2012.
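A warehouse partition mapping prequery of this kind can be pictured as follows. The mapping table and column names here are purely illustrative, not MicroStrategy's actual ones.

```sql
-- Prequery: find which partition base table holds March 2012 data.
select distinct PBTNAME
from PARTITION_MAP
where Month_ID = 201203;
```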

The SQL Engine would easily be able to determine that all of the information it
needs for the query is in the FACT_SALES_MAR partition. The SQL Engine would only generate the SQL for the report against this table, and it would be the only partition accessed by the query.

With server-level partitioning, depending on the database platform, a very different scenario may occur. If the partitioning functionality of the database platform does not contain an equivalent logic for determining the partition to access for this query, the database cannot necessarily determine ahead of time the month to which the dates in the report filter
belong. The database can only “know” that the FACT_SALES_MAR partition is
the correct table if you include a specific month in the report filter, such as
March 2012.

Since the filter in this example is based on dates rather than a specific month
ID, the database would have no way of determining that the desired data is in
the FACT_SALES_MAR partition. As a result, the database would scan all of
the logical partitions to resolve this query. This action is the same as doing a full
table scan on an unpartitioned table, so it negates any benefits to having the
partitions.
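The application-level inference described here (deriving the affected months from a date range) can be sketched as follows. The FACT_SALES_YYYYMM naming is a hypothetical variant, not the course's FACT_SALES_MAR convention:

```python
# Sketch of the application-level reasoning described above: given a
# date-range filter, derive which month partitions the query needs.
# A database doing server-level partitioning by month ID cannot make
# this inference from raw dates without equivalent built-in logic.
from datetime import date

def partitions_for_range(start, end):
    """Return the month partition names covering [start, end]."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"FACT_SALES_{year}{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return names

# The sample report filters on March 1-15, 2012: one partition suffices.
print(partitions_for_range(date(2012, 3, 1), date(2012, 3, 15)))
# ['FACT_SALES_201203']
```

A range that spans a month boundary would simply resolve to two partition names instead of one.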

You can observe the same behavior with server-level partitioning if the filter is
based on some element of time other than a specific month ID, such as when
the filter is set to something like Current Month or Current Week. When this
occurs, the database scans all partitions.

If you modify the report to filter on March 2012, both application-level
partitioning and server-level partitioning would yield the same result—a scan of
one partition.

This behavior in the partitioning logic has been observed in several
different versions of Oracle. The logic is supposed to have been fixed in
Oracle 9i®, Release 2.

Reduced Time to Read Partitions


A final advantage that application-level partitioning affords over server-level
partitioning is a reduction in the amount of time it takes to read partition tables
when running a query. By its very nature, application-level partitioning means
that each table in the partition is a physically separate table, so the SQL Engine
writes SQL that reads a much smaller slice of data from a partition table rather
than the original fact table. With server-level partitioning, the partitions are by
definition logical, not physical. Only one table exists in the data warehouse, but
you define logical partitions within the table. For a query to access one of those
logical partitions, there is always the additional overhead of the translation that
is required to find where the data is stored in the table. Simply put, it requires
less time to read a physical partition than a logical one.
Implementing Application-Level Partitioning
If you choose to use application-level partitioning, you build each of the
partition base tables in your data warehouse. To use them in a MicroStrategy
project, you either add the partition base tables (if you are using metadata
partition mapping) or the partition mapping table (if you are using warehouse
partition mapping) to your project’s warehouse catalog and complete the
appropriate partition mappings. At that point, MicroStrategy Architect has the
necessary information for the SQL Engine to determine which partitions it
needs to access to complete any given query.

For more information on configuring warehouse or metadata partition
mapping, see the MicroStrategy Architect: Advanced Project Design
course.
Partitioning Guidelines
Attribute Relationship Volatility

Distribution of Data

Table Size and Number of Partitions

Impact on ETL and Batch Process

After completing this topic, you will be able to:

Define guidelines for building an effective partitioning strategy in a data
warehouse.

When you appropriately partition fact tables in a data warehouse, it reduces the
time it takes to run queries that access the partitioned tables, and it can make
the loading of data more efficient as well. If you have a poorly designed
partitioning strategy, partitioning can actually increase query times and
maintenance overhead.
Therefore, it is important to devise a partitioning
strategy that takes into account critical factors for determining the best way to
implement partitioning. You should consider the following factors:

• Attribute relationship volatility

• Distribution of data among partitions

• Partition table size and the number of partition tables accessed by queries

• Impact of the partitioning strategy on the ETL and batch processes

Attribute Relationship Volatility


You learned earlier that attribute relationship volatility is a consideration when
building aggregate fact tables. For the same reasons, it can pose problems when
determining attributes that are good candidates for partitioning fact tables.

Generally, if relationships between attributes in a given hierarchy are likely to


change, that hierarchy is probably not a wise choice by which to partition. As
attribute relationships change, you have to redefine the partitions to account for
these changes. As a result, maintaining partitions that are based on a volatile
attribute can become very time consuming to the point that maintenance
requirements outweigh any performance benefits.

Often, you partition fact tables either by some unit of time or by an
organizational attribute.

Partitioning by Time Versus Organization

By definition, relationships between time attributes are static, and therefore,


volatility is not an issue. However, organizational attributes may be static or
dynamic. They can be based on various types of organizational components like
customer location, product categories, geographic location, or company
hierarchy. Partitioning by an organizational attribute can be very beneficial
since it reflects the company’s business structure and generally encapsulates
some of the elements users may be most interested in analyzing. If relationships
between organizational attributes change infrequently, they can be excellent
candidates for partitioning. Though, if you have organizational attributes that
are dynamic, they can be problematic to use for partitioning.

For example, the Location hierarchy that defines a company’s organizational
structure may change frequently in certain types of business.

Partitioning by Volatile Organizational Attributes

In this example, each month, new stores open, old ones close, and existing ones
switch to new districts. Even though this hierarchy reflects the company

structure, because that structure is subject to frequent changes, maintaining


partitions based on this hierarchy would be time consuming. So while
organizational attributes can be an obvious choice for partitioning, make sure
that the ones you select are not volatile, or the “headache” of maintaining the
partitions may outweigh the benefits.

Distribution of Data

Another important factor in determining your partitioning strategy is the
distribution of data across the various partitioned tables. Devising partitions
with an even amount of data in each table aids parallel processing by ensuring
that the load time for each of the different partitions is fairly equal. Depending on the


attributes you select to partition tables, a natural division may exist that
equalizes the amount of data in each table of the partition, or you may need to
engineer your partitions to force an even distribution of data.
For example, you may choose to partition a fact table either by the Month or
Region attributes.

Even and Uneven Data Distribution Among Partitions

If you partition by the Month attribute, each of the partitions contains the sales
data for a single month. Because each month contains a nearly equal number of
days, the size of each of the 12 partitions is naturally pretty even. Your company
may experience some variations in sales (for example, promotions that occur in
particular months). However, overall, the amount of data is fairly evenly
distributed among each of the partitions.

If you partition by the Region attribute, each of the partitions contains the sales
data for a single region. In this example, the company has more stores in some
regions than in others. Because of the variation in the number of stores that
exist within each region, the Northeast and Southeast regions have significantly
more sales data than the Central, West, and Pacific regions. Because of the
geographic location of the stores, the data distribution among these five
partitions is naturally uneven. As a result, the Northeast and Southeast
partitions are much larger than the partitions for the other regions.

The other problem with an uneven data distribution is that load times can be much
longer for larger partitions than smaller ones. For example, in this scenario,
loading data into the partition for the Northeast region takes considerably
longer than loading data into the Pacific region, which is much smaller. The
disparity in load times hinders parallel processing.

If a particular attribute you want to use to partition fact tables naturally results
in an uneven distribution of data, you can combine multiple attribute elements
into the same partition to address variations in partition size.

Creating an Even Data Distribution Among Partitions

In this example, although the FACT_SALES table is still partitioned by the


Region attribute, rather than each partition containing a single region element,
the smaller Central, West, and Pacific regions are combined into a single
partition. Storing the data from these three smaller regions in a single partition
forces a much more even distribution of data among the region partitions. This
combined partition is now more equivalent in size to the much larger Northeast
and Southeast partitions.
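The arithmetic behind this combined partition can be sketched quickly. The row counts below are invented for illustration, since the course figure does not give exact volumes:

```python
# Sketch of evening out partition sizes by combining small attribute
# elements, as described above. Row counts are invented for illustration.
rows_per_region = {
    "Northeast": 500_000, "Southeast": 450_000,
    "Central": 150_000, "West": 180_000, "Pacific": 120_000,
}

# Combine the three smaller regions into one partition.
partitions = {
    "FACT_SALES_NE":  ["Northeast"],
    "FACT_SALES_SE":  ["Southeast"],
    "FACT_SALES_CWP": ["Central", "West", "Pacific"],
}

# Total the rows each proposed partition would hold.
sizes = {name: sum(rows_per_region[r] for r in regions)
         for name, regions in partitions.items()}
print(sizes)
# {'FACT_SALES_NE': 500000, 'FACT_SALES_SE': 450000, 'FACT_SALES_CWP': 450000}
```

Checking totals like this before creating the physical tables is an easy way to confirm that a proposed grouping actually evens out the load.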

Table Size and Number of Partitions

Another factor to account for when determining your partitioning strategy is
the size of partitions in comparison to the number of tables that are accessed to
resolve a query. Ideally, you want to choose a partitioning strategy that
minimizes table size while also reducing the number of partition tables you
must access to resolve queries.
For example, you may choose to partition a fact table by both the Month and
Region attributes as follows:

Report Query Accesses Multiple Partitions

The illustration above displays the partitions only for the first 3 months of
the year for a single region since those are the only partitions needed for
the sample report. However, 12 partitions (one for each month of the year)
would exist for each region.

In this scenario, you have 12 partitions for each region—one for each month.
Partitioning by both the Region and the Month attributes makes each of the
tables much smaller than the original fact table. Also, partitioning by a time
attribute automatically limits the size of each table since you no longer add data
to the table after that particular time period has passed. While reducing table
size is desirable, partitioning by an attribute that decreases the table size can
also create partitions that are so small that most queries have to cross partitions
and join data to retrieve result sets.

In the sample report, users want to view sales information for Q1 2012 for the
Southeast region. Because each partition contains only 1 month of data, a query
has to access all three tables to retrieve the result set. Although the smaller
table size reduces the number of records a query has to scan, if users most
frequently query at levels of time higher than month (like quarter), queries
have to access multiple tables and then join the data from each one
to produce the result set. When queries cross tables, the series of joins required
to compile the result set often override any performance benefits that come
from accessing smaller tables.

In this example, a better strategy is to partition by the Quarter attribute:

Report Query Accesses a Single Partition

The size of each partition is still smaller than the original FACT_SALES table.
Though, with the table partitioned by the Quarter attribute, you can now
retrieve the result set for the same report by accessing a single table, eliminating
the unnecessary and time-consuming joins that occur if you partition the tables
by the Month attribute.

Generally, you want to ensure that you consider not just the table size but also
your query profile when determining which attributes to use for a partition. If
you partition a fact table at a lower level than what is requested by most queries,
having to frequently scan multiple partitions and join data for report results can
negate any performance benefits that may come from reducing the table size.

You can use the MicroStrategy Enterprise Manager™ to learn more about
how users query the data in your warehouse. This application enables you
to view reports that display statistics about user actions, object
information, and report processing.
For more information on Enterprise
Manager, see the MicroStrategy Administration: Application Management
course.

Impact on ETL and Batch Process


A final point to consider when determining your partitioning strategy is to
ensure that partitions do not produce a significant, negative impact on the ETL
and batch process that is used to load source data into the data warehouse.

For example, you do not want to establish partitions that are so complex and so
far removed from how the data is structured in the source system that you end
up adding hours to the length of the ETL or batch process. You also do not want
to set up partitions that are tedious to back up or that unnecessarily complicate
the backup routine. Again, you want to find a balance between configuring
partitions so that you achieve a performance gain, while ensuring that you do
not make the loading, maintenance, or backup of data more complex and time
consuming.
Overview of Indexing

After completing this topic, you will be able to:

Define indexing and describe the purpose of creating indexes in a data
warehouse.

Pre-aggregation and partitioning are both ways to increase query performance
by limiting the amount of data that queries access. Indexing is another method
for making queries more efficient, but it increases performance by enabling a
database administrator to order rows in a table so that the database can more
easily find the particular rows that are needed to resolve a given query.

Indexes are database objects that enable quick access to the data in a table.
You can base indexes on a single column or multiple columns. They operate by
storing pointers (index values) to the data values that are stored in table
columns. The indexes are based on key values (a primary key or a foreign key).
The database administrator specifies the sort order of these pointers when
creating the index.
Just as you might search an index in a book to find a page number where
information on a certain topic is contained, when you run a query against an
indexed table, the database searches the index to find the requested values and
then follows the pointers to the rows in the table where those values are stored.
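This lookup can be observed directly in SQLite, whose EXPLAIN QUERY PLAN output reports whether a query searches an index or scans the whole table. The table mirrors the LU_CUSTOMER example used later in this lesson; the exact plan wording varies by SQLite version:

```python
# Sketch: comparing a filtered query before and after creating an index.
# SQLite's EXPLAIN QUERY PLAN shows whether the optimizer uses the index.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE LU_CUSTOMER (Cust_ID INTEGER, Cust_Name TEXT)")
con.executemany("INSERT INTO LU_CUSTOMER VALUES (?, ?)",
                [(17, "Henry Miller"), (12, "Sara Wilder"),
                 (25, "Natalie Evans"), (15, "Todd Elliott"),
                 (19, "Mark Stevens")])

query = "SELECT Cust_Name FROM LU_CUSTOMER WHERE Cust_ID = 19"

# Without an index, the plan is a full table scan.
plan = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]
print(plan)          # e.g. "SCAN LU_CUSTOMER"

# After creating the index, the plan searches the index instead.
con.execute("CREATE INDEX idx_cust_id ON LU_CUSTOMER (Cust_ID)")
plan = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]
print(plan)          # e.g. "SEARCH LU_CUSTOMER USING INDEX idx_cust_id ..."
```

On a five-row table the difference is invisible, but the plan change is exactly what makes large-table lookups fast.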

Using an index to find a particular row is more efficient than scanning the
entire base table without an index, but indexes do consume disk space on the
database server. Because you have to update the index of pointers as data is
added, modified, or deleted, they also increase the time it takes to insert a new
record or update or delete an existing record. Provided you use indexes
judiciously based on the query profile, the benefits of querying tables with an
index far outweigh the requirements of building and maintaining the index
itself.
Index Types and Guidelines
B-Tree Index

Bitmap Index

Index-Organized Tables

Indexing Guidelines

After completing this topic, you will be able to:

Describe various typesof indexes,identify the best uses of each type of index,
and define guidelines for building effective indexes.

The simplest type of index is a primary key index. It is created automatically


when you define the primary key for a table. For example, you have the
following LU_STATE table:

Primary Key Index

The State_ID column is defined as the primary key for the LU_STATE table. As
a result, an index is automatically created based on the values in the State_ID
column. Depending on the volume and degree of normalization of a lookup

table, you may choose to define indexes on additional columns beyond the
primary key. If the table volume is high or if the table is highly denormalized,
you may want additional indexes so that values in the tables are easier to find
when joining it to another table.

For fact tables, MicroStrategy recommends that you do not define the
foreign keys that reference the related hierarchies as primary keys.
Therefore, you need to individually define indexes on fact table columns
that you frequently query.

When you need to define individual indexes beyond the primary key, it is
important to understand the various types of indexes, how they work, and when
it is best to implement a particular type. Depending on the type of index you are
using, you can create multiple indexes on lookup and fact tables to more
efficiently access data. The names for these index types vary from one database
vendor to another, but the conditions for which each index type is best suited
are fairly constant, regardless of the database platform you use. The following
are some of the common types of indexes:

• B-tree

• Bitmap

• Index-organized tables

B­Tree Index
A B-tree index has a structure somewhat like a family tree. The index begins
with a root. The root can be any record within the table, but it is determined by
a database algorithm to ensure a balance in the number of records on both sides
of the tree. Each row in the table is compared to the root value. Values greater
than the root value are placed to the right in the B-tree structure, and values less
than the root value are placed to the left.

For example, you have the following LU_CUSTOMER table:

Lookup Table for Customer


In the illustration above, only the Cust_ID and Cust_Name columns are
included in the sample data for the LU_CUSTOMER table since they relate
to the index example. However, the actual table would contain all of the
columns referenced in the schema. For all of the B-tree and bitmap index
illustrations in this lesson, only the Cust_ID and Cust_Name columns are
shown in the sample data.

Using a B-tree index on the Cust_ID column for these five records results in the
following structure:

B-Tree Index
The database selects Henry Miller as the root, and it assigns an index value to
his record. From there, the database compares each row in the table to the root
first. It looks at Sara Wilder. Her ID value of 12 is smaller than Henry Miller’s,
which is 17, so it places her record to the left of the root. Next comes Natalie
Evans. Her ID value of 25 is greater than the value of the root (17), so the
database places her record to the right of the root. The next row is Todd Elliott,
who has an ID value of 15. This value is less than the value of the root (17), but
greater than the value of Sara Wilder (12). Accordingly, the database places his
record to the right of Sara Wilder. Finally, the value for Mark Stevens is 19,
which is greater than the root value (17) but less than the value for Natalie
Evans (25). Working along the tree, the database places his record to the left of
Natalie Evans. This process continues for each row in the table. As the database
compares each row, it determines its place in the structure of the B-tree and
assigns an index value to the corresponding record.

It is most appropriate to use B-tree indexes for higher-cardinality attributes
like Customer. For example, consider the cardinalities of the attributes in the
Customer hierarchy:

Attribute Cardinalities for Customer Hierarchy
B-tree indexes would be useful for the lookup tables for the Customer City and
Customer attributes, which both have high cardinalities. Although the tree
structure for an attribute like Customer would be very complex due to the
volume of customers, B-tree indexes are generally lower maintenance and do
not require too much reorganizing or rebuilding unless a significant number of
updates or inserts occur in the table. This lower maintenance cost makes them
more suitable for high-cardinality attributes than other types of indexes. You
should remember that due to the size of these indexes, they require a lot of disk
space.
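The placement rules just described can be mirrored with a plain binary search tree. This is only a conceptual sketch: a real B-tree node holds many keys and keeps itself balanced, which is part of what makes it practical for high-cardinality attributes:

```python
# Minimal sketch of the left/right placement described above, using the
# five sample customer records. A plain binary search tree stands in
# for the B-tree; the comparison logic is the same.

class Node:
    def __init__(self, cust_id, name):
        self.cust_id, self.name = cust_id, name
        self.left = self.right = None

def insert(root, cust_id, name):
    if root is None:
        return Node(cust_id, name)
    if cust_id < root.cust_id:      # smaller keys go to the left
        root.left = insert(root.left, cust_id, name)
    else:                           # larger keys go to the right
        root.right = insert(root.right, cust_id, name)
    return root

def in_order(root):
    """Walking the tree left-to-right yields the rows in key order."""
    if root is None:
        return []
    return in_order(root.left) + [root.cust_id] + in_order(root.right)

# Henry Miller (17) is the root; the other rows are compared against it.
rows = [(17, "Henry Miller"), (12, "Sara Wilder"), (25, "Natalie Evans"),
        (15, "Todd Elliott"), (19, "Mark Stevens")]
tree = None
for cust_id, name in rows:
    tree = insert(tree, cust_id, name)

print(in_order(tree))  # [12, 15, 17, 19, 25]
```

Todd Elliott (15) lands to the right of Sara Wilder and Mark Stevens (19) to the left of Natalie Evans, exactly as the walkthrough describes.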

Bitmap Index
A bitmap index orders the rows in a table using binary strings that are
generated and assigned by the database. The database uses an algorithm to
create the binary strings used for the index. Each string is the same byte length,
but the pattern of each string varies. The differences in the pattern denote the
order of binary indexes. The database determines which patterns are considered
smaller or larger. The smallest (or lowest) binary string references the row with
the smallest (or lowest) value. Accordingly, rows in the table are ordered by
their corresponding binary string from lowest (smallest) to highest (largest).
For example, using a bitmap index on the sample customer data orders the
values as follows:

Bitmap Index

Since Sara Wilder has the lowest ID value, the database orders this row first in
the table index. From there, the database assigns indexes to each row in the
table working up to the customer with the highest ID value.

Because bitmap indexes require the database to generate and assign a binary
string to each row value, they are best reserved for use on lookup tables for low
cardinality attributes, such as some higher-level attributes, flags, and status and
type indicators. For example, consider the cardinalities of the attributes in the
Customer hierarchy.

Attribute Cardinalities for Customer Hierarchy

You should use bitmap indexes only on lookup tables for the higher-level
attributes like Customer Region and Customer State. The cardinality of the
Customer City and Customer attributes is so high that the resources involved in
building and maintaining the bitmap indexes would impose a burden on the
database server that outweighs any performance benefits you could derive from
the indexes.
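Vendors implement bitmap indexes in different ways; a common textbook representation, somewhat different from the ordering description above, keeps one bit-vector per distinct column value. Sketching it makes the cardinality argument concrete: five rows of a Region column need only three bitmaps, while a Customer column would need one bitmap per customer:

```python
# Sketch of a common bitmap-index representation (one bit-vector per
# distinct value). This is a conceptual model only, not any specific
# database's on-disk format; the sample rows are invented.

def build_bitmap_index(rows, column):
    """Map each distinct column value to a bit-vector over row positions."""
    bitmaps = {}
    for pos, row in enumerate(rows):
        value = row[column]
        bitmaps.setdefault(value, 0)
        bitmaps[value] |= 1 << pos      # set the bit for this row
    return bitmaps

customers = [
    {"cust_id": 17, "region": "Northeast"},
    {"cust_id": 12, "region": "Southeast"},
    {"cust_id": 25, "region": "Northeast"},
    {"cust_id": 15, "region": "Central"},
    {"cust_id": 19, "region": "Southeast"},
]

index = build_bitmap_index(customers, "region")

# Resolving "region = 'Northeast'" is a walk over the set bits.
northeast = index["Northeast"]
matches = [pos for pos in range(len(customers)) if northeast >> pos & 1]
print(matches)  # rows 0 and 2
```

With a high-cardinality column such as Cust_ID, nearly every row would get its own bitmap, which is exactly the maintenance burden the text warns about.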

Index-Organized Tables

Another indexing alternative is to create an index-organized table, one in
which the data is actually stored in the physical table in index order. This
method is very useful for fact tables or lookup tables for high-cardinality
attributes. Using the FACT_SALES table as an example, you could set it up as
an index-organized table as follows:

Index-Organized Sales Fact Table


All three foreign keys—Store_ID, Date_ID, and Cust_ID—are included in the
index order for the table along with the fact columns. The order of the index is
Store_ID, then Date_ID, and then Cust_ID. Essentially, the index structure and
table structure are one and the same in the database.
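SQLite offers a convenient way to experiment with this layout: a WITHOUT ROWID table is clustered on its primary key, so the index and the table are one structure (Oracle expresses the same idea with ORGANIZATION INDEX; syntax varies by vendor). The column names follow the course example and the sample values are invented:

```python
# Sketch: an index-organized FACT_SALES in SQLite. A WITHOUT ROWID
# table stores rows physically in primary-key order, so the index and
# the table are one structure.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE FACT_SALES (
        Store_ID INTEGER,
        Date_ID  INTEGER,
        Cust_ID  INTEGER,
        Sales    REAL,
        PRIMARY KEY (Store_ID, Date_ID, Cust_ID)
    ) WITHOUT ROWID
""")

# Insert rows out of order, as a source system might deliver them.
con.executemany(
    "INSERT INTO FACT_SALES VALUES (?, ?, ?, ?)",
    [(2, 20120301, 17, 150.0),
     (1, 20120302, 12, 75.0),
     (1, 20120301, 25, 200.0)],
)

# A full scan walks the clustered b-tree, so in practice the rows come
# back in key order rather than insert order (SQL only guarantees an
# order when you use ORDER BY).
for row in con.execute("SELECT Store_ID, Date_ID, Cust_ID FROM FACT_SALES"):
    print(row)
```

The rows come back sorted by Store_ID, then Date_ID, then Cust_ID, mirroring the physical ordering the text describes.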

The following illustration shows sample source data for the FACT_SALES table
and how the same data is stored in an index-organized table in the data
warehouse:

Index-Organized Sales Fact Data and Sample Report

In the illustration above, only the columns in the FACT_SALES table that
are used in the report are included in the source and index-organized
sample data. However, the actual table would contain all of the columns
referenced in the schema.

Notice the order of the records in the source system table. In the data
warehouse, the FACT_SALES table is index organized. The physical table
actually stores the data with each record ordered by Store_ID, Date_ID, and
Cust_ID, since that is how the index organization is defined. If you run a report
for sales data from the FACT_SALES table, the data is retrieved and displayed
in the indexed order. Defining an index-organized table is an excellent solution
for facilitating data retrieval from large tables without placing an undue burden
on the database server.

Indexing Guidelines

Various types of indexes can be effective depending on the size of a table, and
choosing the most appropriate type of index is certainly an important step in
setting up an index that you can efficiently use and maintain. Regardless of the
type of index you select for a given table, the following are some general points
to consider when you build indexes on any table:

• Table joins

• Degreeof denormalization

• Frequently filtered elements

• Number of indexes on a single table

• Disk storage configuration

Table Joins

If you are building an index for a table, you should pay close attention to how
that table is joined to other tables when users run reports. If you examine the
SQL being generated for queries and you notice that certain columns in a table
are frequently used to join that table to other tables, you may want to consider
building an index on that column. This way, the database can more quickly
locate the data used in joins.
Degree of Denormalization

If you have a very large table that is highly denormalized, it contains not only a
primary key but various foreign keys as well. Since you denormalize tables to
reduce the number of joins to higher-level lookup tables, you want to ensure
that queries can access higher-level attribute information from the lowest-level
lookup tables as quickly as possible. Therefore, when tables are denormalized,
you may want to consider indexing some of the more frequently queried foreign
keys.

Index

A

aggregate fact table


aggregation

attribute relationship volatility


compression ratios
guidelines
query profile

analytical capability

many-to-many relationships

application­level partitioning

implementation

attribute

hidden

attribute expressions

logical views

attribute relationships

direct
indirect
review
attribute roles
automatic attribute role recognition

base fact table


bitmap index
B­tree index

cardinality
common child attribute

many­to­many relationships 1, 2

common table expression


common table expressions
completely denormalized snowflake
completely normalized snowflake 1, 2
complex attribute and fact expressions

logical views

compound child attribute

compression ratio

many-to-many relationships

creating a new logical table


creating logical views

data type
ID columns

database partitioning
database view
denormalization

versioning

denormalized fact table
derived table expression
derived table expressions
differences

derived and common table expressions

dimension tables

logical views

direct attribute relationship types


direct attribute relationships
distinct lookup tables

logical views

examples of logical views usage


examples of many­to­many relationships
explicit table alias

fact expressions

logical views
fact table

logical key setting

fact table volume

hidden attribute 1, 2


hierarchy

ragged
recursive
split

index

bitmap
B­tree
types

indexing

denormalization
guidelines
table joins

index­organized table
indirect attribute relationships

joint child
L

logical key

fact table

logical table

creating

logical views
tables
view

additional examples
complex attribute and fact expressions
creating
defining the columns
defining the SQL statement
mapping columns to attributes and facts
performance
SQL
star schemas
summary
using derived and common table expressions

logical views and star schemas


lost analytical capability

many­to­many relationships

many­to­many relationships

creating a common child attribute 1, 2


creating a compound child attribute
creating a separate relationship table
lost analytical capability
methods for resolving
multiple counting

mapping logical view columns to attributes and facts


materialized views
moderately denormalized snowflake
multiple counting

many­to­many relationships

normalization
normalized fact table

partition base tables


partitioning

application­level
attribute relationship volatility
batch process
data distribution
database­level
guidelines
ETL

logic
multiple hierarchies
server­level
table size
partitions

reading

physical schema
pre­aggregation
primary key

fact table

ragged hierarchy
recursive hierarchy
resolving many­to­many relationships
review

attribute relationships

role attributes
role­playing dimension

schema

snowflake
star

separate relationship table

many­to­many relationships

server­level partitioning
slowly changing dimensions
As Is vs. As Is (Type I)
As Is vs. As Was (Type II)
As Was vs. As Was
Like vs. Like

snowflake schema
split hierarchy
SQL

logical views

star schema

aggregation

star schemas

logical views

summary table

table alias 1, 2
table expressions

common
derived

table view 1, 2
table volume

lookup tables
fact tables

U
using derived and common table expressions

versioning 1, 2
view
VLDB property
volatility, attribute
