Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 98

Data Warehouse Fundamentals

Chapter 15
OLAP IN THE DATA
WAREHOUSE
Slides Complied from: Paul Chen

Summary of Topics
1.
2.
3.

OLAP Definition, Key features and Benefits


How OLAP differs from OATP?
Multidimensional data ? What? Why? How To Use?

4.

OLAP Server Query, Features and Applications.

5.

Category of OLAP tools -Multi-Dimensional OLAP (MOLAP),


Relational OLAP (ROLAP), and Managed Query Environment
(MQE).
OLAP extensions to SQL.
The Microsoft Data Warehousing Framework.

6.
7.

Data Warehousing and End-User Access Tools

Accompanying growth in data warehouses is increasing


demands for more powerful access tools providing
advanced analytical capabilities.

Key developments include:


Online analytical processing (OLAP)
SQL extensions for complex data analysis
Data mining tools.

Limitations of Other Analysis Methods

SQL has been the accepted interface for retrieving and manipulating data
from relational databases. These methods are used in OLTP systems and in
data warehousing environments (referring to the environments with simple
queries and routine reports).
Now consider information retrieval and manipulation in these
environments- reports writers and spreadsheets.
Report writers: two features- the ability to point and click for generating
and issuing SQL calls, and the capability to format the output reports.
However, report writers do not support multidimensionality. With basic
report writers, you cannot drill down to lower levels in the dimensions. You
cannot rotate the results by switching rows and columns. The report writers
do not provide aggregate navigation. Once the report is formatted and run,
you cannot alter the presentation of the result data sets. Spreadsheets are
still cumbersome for showing all the aggregate levels and multidimensional
views, let alone doing calculations for roll-up and drill-down.

Topic 1: OLAP Definition


On-Line Analytical Processing (OLAP) is a category of software
technology that enables analysts, managers and executives to gain
insight into data through fast, consistent, interactive access to a
wide variety of possible views of information that has been
transformed from raw data to reflect the real dimensionality of
the enterprise as understood by the user
-- DBMS Magazine, April, 1995

Key Features of OLAP

Supports analysis, dynamic synthesis and


consolidation of large volumes of multi-dimensional
data. Types of analysis ranges from basic navigation
and browsing (slicing and dicing) to calculations, to
more complex analyses such as time series and complex
modeling.

Is able to drill down or roll up with each dimension.

Is capable of applying mathematical formulas and


calculations to measures.
6

Key Features of OLAP

Can easily answer who? and what?.


Ability to answer what if? and why? type questions.
Distinguishes OLAP from general-purpose query
tools.

Enables users to gain a deeper understanding and knowledge


about various aspects of their corporate data through fast,
consistent, interactive access to a wide variety of possible views
of the data.

Can be implemented on the web.

OLAP Benefits

Increasedproductivityofendusers.
ReducedbacklogofapplicationsdevelopmentforIT
staff.
Retentionoforganizationalcontrolovertheintegrity
ofcorporatedata.
ReducedquerydragandnetworktrafficonOLTP
systemsoronthedatawarehouse.
Improvedpotentialrevenueandprofitability.

Topic 2: OLAP VS OLTP (ON-LINE


TRANSACTION PROCESSING)

OLTP (RELATIONAL)

ATOMIZED
PRESENT
RECORD-AT-A-TIME

PROCESS ORIENTED

OLAP
(MULTIDIMENSIONAL)
SUMMARIZED
HISTORICAL
MANY RECORDS-AT-ATIME
SUBJECT ORIENTED

OLAP VS OLTP
WHILE OLTP APPLICATIONS GENERALLY DO
NOT REQUIRE HISTORICAL DATA, NEARLY
EVERY OLAP APPLICATION IS CONCERNED
WITH VIEWING TRENDS AND THEREFORE
REQUIRES HISTORICAL DATA. OLTP
APPLICATIONS AND DATABASE TEND TO BE
ORGANIZED AROUND SPECIFIC PROCESSES
(SUCH AS ORDER ENTRY), OLAP
APPLICATIONS TEND TO BE SUBJECTORIENTED ANSWERING SUCH QUESTIONS AS
WHAT PRODUCTS ARE SELLING WELL OR
WHAT ARE MY WEAKEST SALES OFFICES?

10

Topic 3: Multi-dimensional Database

In a multidimensional database, data is


stored as Facts and Dimensions instead
of rows and columns

Multi-dimensional structures are best


visualized as cubes of data, and cubes
within cubes of data. Each side of a cube
is a dimension.

11

Sample Star Schema


Sales

Fact Table

Time
Time key
Date
Month
quarter
year

Product Key
Store Key
Time Key
Fixed Cost
Variable cost
Profit margin
YTD_Sales_dollars_by_store
YTD_Sales_dollar_by_category
YTD_Sales_By_department

Store
Store key
Store name
region

Store

Product

Month

Product

Product Key
Product Name

Category

Product line

12

Kinds of Queries

Displaythetotalsalesofallproductsforpastfiveyears
inallstores.
Comparetotalsalesforallstores,productbyproduct,
betweenyears2000and1999
Showcomparisonoftotalsalesforallstores,product
byproduct,betweenyears2000and1999onlyfor
thoseproductsforreducedsales.

13

Cube

Users often view and analyze data multidimensionality,


using hierarchical segmentation along each dimension.
Thus a user may analyze sales along the time
dimension (such as months within quarters with years),
along geographical dimension (cities with regions
within countries), along the organizational dimension
(sales persons within branches within territories). We
can conceptualize the approach as a cube.

14

Lattice of Cuboids
all
time

time,item

0-D(apex) cuboid

item

time,location

location

item,location

time,supplier
time,item,location

supplier

location,supplier

item,supplier

time,location,supplier

time,item,supplier

1-D cuboids

2-D cuboids
3-D cuboids

item,location,supplier

4-D(base) cuboid
time, item, location, supplier

15

Cube
Fact Table View

Multi-Dimensional Cube

Property sale
sale p

Branc
h
p1 c1
p2 c2
p3 c3
p4 c1

wee pric
k
e
1
1
2
2
2
4
1
3

P = property #

week 2
week
1

P1
P4

c
p2 1
p3

c2

2 c3
4

C1 C2

1
3

Cells: roughly equivalent to records in a relational


database.

C3

16

WHAT IS MULTIDIMENSIONAL DATA?


RELATIONAL DATABASES ARE ORGANIZED
AROUND A LIST OF RECORDS. EACH RECORD
CONTAINS RELATED INFORMATION WHICH ARE
ORGANIZED INTO FIELDS.
CUS NAME
JACK
WALTER
HOOVER
BELLRED

CUSTOMER # TELEPHONE
10001
345-4444
10002
10003

ADDRESS
40 MAIN

345-6666
345-8588

30 ELM
6

THIS TABLE HAS ONLY ONE DIMENSION.

17

WHAT IS MULTIDIMENSIONAL DATA?


LOOKING AT CUSTOMER BY TELEPHONE # OR
TELEPHONE # BY CUSTOMER ONLY PRODUCES
A ONE-FOR-ONE CORRESPONDENCE.

18

WHAT IS MULTIDIMENSIONAL DATA?


Lets take a look at an example of a relational table
where
there is more than one-to-one correspondence
between
the fields.
In the following table, we have sales data for each
product
In each region-- four products sold in three regions.
The
Sales data is a two-dimensional matrix (Product and
Region).

19

product/region/sales table
Product
Nuts
Nuts
Nuts
Screws
Screws
Screws
Bolts
Bolts
Bolts
Washers
Washers
Washers

Region
East
West
Central
East
West
Central
East
West
Central
East
West
Central

Sales
50
40
30
60
50
60
100
120
80
90
100
40

20

A Much Better Way To Represent the Data:


Two-Dimensional Matrix
PRODUCT
NUTS
SCREWS
BOLTS
WASHERS

EAST WEST CENTRAL


50
40
30
60
50
60
100 120
80
90
100
40

21

QUERY ON MULTIDIMENSIONAL
DATA
QUESTIONS LIKE WHAT WERE TOTAL SALES OF
NUTS? OR WHAT WERE TOTAL SALES FOR THE
EAST? TO FIND THE ANSWER IN THE TWO
DIMENSIONAL TABLE, JUST FIND THE CELL
CALLED EAST AND ADD UP ALL THE NUMBERS IN
THE COLUMN.

22

AGGREGATION: THE KEY TO


CONSISTENTLY FAST RESPONSE
PRODUCT
NUTS
120
SCREWS
BOLTS 100
WASHERS
TOTAL

EAST
50
60
120
90
300

WEST

50
80
100
310

CENTRAL
TOTAL
40
30
60
40
210

170
300
230
820

23

Multiple Reads & Database Writes


In the above example, computing the totals involves 28 (4*4
+3*4) database reads and eight database writes. A typical
relational database can read about 200 records per second
and writes perhaps 20 records per second. So consolidating
this tiny database would take less than one second.
However, for some larger tables, computing for totals could
take days or even weeks to consolidate.

24

Relational Database vs. Multidimensional


OLAP Server

With a multidimensional OLAP server, we can perform the same


consolidation with row and column arithmetic. Whereas a
relational database can access a few hundred records per second,
a good OLP server should be capable of consolidating 20,000 to
30,000 cells (equivalent to records in the relational table) per
second, including the time to write the totals to the database. The
multidimensional OLAP database will take up less space since the
names of the regions and products are not repeated in the
multidimensional database as they are in the relational database.

25

Multiple Hierarchies And Classes


Within Dimensions

The single biggest factor in determining how many


dimensions for a database is the existence of
multiple hierarchies and classes within dimensions.
Classes are typically attributes such as size,
color and other characteristics that define a
subset of the members of a dimension.
For example, a database of shampoo sales might
want to roll up product sales by size (6 oz, 15 oz), by
type(dry hair, oily hair) and possibly by other
attributes such as scented/unscented, brand name,
and so on.
26

TERMINOLOGY

Dimension: roughly equivalent to Fields in a relational database.


In the multidimensional data, Product and Region are both
dimensions.

Cells: roughly equivalent to Records in a relational


database.

27

Simple Hierarchies (Roll up) Within


Dimensions

The preceding table can be represented graphically as follows:


Region Total

East

Central

West
Product Total

nuts

Bolts

Screws

Individual products roll up into a Product Total

Washers

28

Multiple Levels of Hierarchies


Region Total

West

EAST

Calif

Central

Oregon

Washington

Seattle

Bellevue

29

OLAP Operations

30

Slicing

31

Dicing (Sub-cube)

32

Roll-Up

33

Drill-Down

34

Other OLAP Operations


o Drill-Across: Queries involving more than one
fact table
o Drill-Through: Makes use of SQL to drill through
the bottom level of a data cube down to its backend relational tables
o Pivot (rotate): Pivot (also called "rotate") is a
visualization operation which rotates the data axes
in
view in order to provide an alternative presentation
of
the data. Other examples include rotating the axes
in a
3-D cube, or transforming a 3-D cube into a series
35
of 2-

Practical Limitations on Database Size

In general, as the number of dimensions increases, the number of


cells in the database increases exponentially.
for ex., a two-dimensional database with 100 products and 100
regions would have 10,000 cells. If we add a third dimension for
time with 52 weeks, we now have 520,000 cells.
Most commercial OLAP servers hit the cell limit long before they
run out of dimensions.

36

Topic 4: A list of some important features


supported by some OLAP Servers

Special time-series data types


Special dimensions for variables (complex mathematical
relationships, such as computed averages, and simultaneous
equations)
Multiple hierarchies within a dimension
Classes with a dimension
Virtual variables (computed on the fly at run time, such as gross
margin derived from revenues and expenses.

37

Examples of OLAP applications in various


functional areas

38

OLAP Applications

AlthoughOLAPapplicationsarefoundinwidely
divergentfunctionalareas,allhavefollowingkey
features:
multidimensionalviewsofdata
supportforcomplexcalculations
time intelligence.

39

OLAP Applications - multidimensional


viewsofdata

Corerequirementofbuildingarealisticbusiness
model.

Providesbasisforanalyticalprocessingthrough
flexibleaccesstocorporatedata.

Theunderlyingdatabasedesignthatprovidesthe
multidimensionalviewofdatashouldtreatall
dimensionsequally.

40

OLAP Applications - support for complex


calculations

Mustprovidearangeofpowerfulcomputational
methodssuchasthatrequiredbysalesforecasting,
whichusestrendalgorithmssuchasmovingaverages
andpercentagegrowth.

Mechanismsforimplementingcomputationalmethods
shouldbeclearandnonprocedural.

41

OLAP Applications time intelligence

Keyfeatureofalmostanyanalyticalapplicationas
performanceisalmostalwaysjudgedovertime.

Timehierarchyisnotalwaysusedinsamemanneras
otherhierarchies.

Conceptssuchasyeartodateandperiodoverperiod
comparisonsshouldbeeasilydefined.

42

Representing Multi-Dimensional Data

Example of two-dimensional query.


What is the total revenue generated by property
sales in each city, in each quarter of 1997?

Choice of representation is based on types of queries


end-user may ask.

Compare representation - three-field relational table


versus two-dimensional matrix.

43

Multi-dimensional Data as Three-field table


versus Two-dimensional Matrix

44

Representing Multi-Dimensional Data

Example of three-dimensional query.


What is the total revenue generated by property
sales for each type of property (Flat or House) in
each city, in each quarter of 1997?

Compare representation - four-field relational table


versus three-dimensional cube.

45

Multi-dimensional Data as Four-field Table


versus Three-dimensional Cube

46

Representing Multi-Dimensional Data

Cube represents data as cells in an array.

Relational table only represents multi-dimensional


data in two dimensions.

47

Multi-Dimensional OLAP Servers

Use multi-dimensional structures to store data and


relationships between data.

Multi-dimensional structures are best visualized as


cubes of data, and cubes within cubes of data. Each
side of cube is a dimension.

A cube can be expanded to include other dimensions.

48

Multi-Dimensional OLAP Servers

A cube supports matrix arithmetic.

Multi-dimensional query response time depends on


how many cells have to be added on the fly.

As number of dimensions increases, number of the


cubes cells increases exponentially.

49

Multi-Dimensional OLAP Servers

However, majority of multi-dimensional queries use


summarized, high-level data.

Solution is to pre-aggregate (consolidate) all logical


subtotals and totals along all dimensions.

Pre-aggregation is valuable, as typical dimensions are


hierarchical in nature.
(e.g. Time dimension hierarchy - years, quarters,
months, weeks, and days)
50

Multi-Dimensional OLAP Servers

Predefined hierarchy allows logical pre-aggregation


and, conversely, allows for a logical drill-down.

Supports common analytical operations


Consolidation
Drill-down
Slicing and dicing.

51

Multi-Dimensional OLAP Servers

Consolidation - aggregation of data such as simple


roll-ups or complex expressions involving interrelated data.

Drill-Down - is reverse of consolidation and involves


displaying the detailed data that comprises the
consolidated data.

Slicing and Dicing - (also called pivoting) refers to the


ability to look at the data from different viewpoints.

52

Multi-Dimensional OLAP servers

Can store data in a compressed form by dynamically


selecting physical storage organizations and
compression techniques that maximize space
utilization.

Dense data (ie., data that exists for high percentage of


cells) can be stored separately from sparse data (ie.,
significant percentage of cells are empty).

53

Multi-Dimensional OLAP Servers

Ability to omit empty or repetitive cells can greatly


reduce the size of the cube and the amount of
processing.

Allows analysis of exceptionally large amounts of data.

54

Multi-Dimensional OLAP Servers

In summary, pre-aggregation, dimensional hierarchy,


and sparse data management can significantly reduce
the size of the cube and the need to calculate values
on-the-fly.

Removes need for multi-table joins and provides quick


and direct access to arrays of data, thus significantly
speeding up execution of multi-dimensional queries.

55

Codds Rules for OLAP Systems

In 1993, E.F. Codd formulated twelve rules as the basis


for selecting OLAP tools.
Multi-dimensional conceptual view
Transparency
Accessibility
Consistent reporting performance
Client-server architecture
Generic dimensionality

56

Codds rules for OLAP Systems

Dynamic sparse matrix handling


Multi-user support
Unrestricted cross-dimensional operations
Intuitive data manipulation
Flexible reporting
Unlimited dimensions and aggregation levels.

57

Codds Rules for OLAP Systems

Thereareproposalstoredefinedorextendedthe
rules.Forexample,toalsoinclude:

Comprehensivedatabasemanagementtools
Abilitytodrilldowntodetail(sourcerecord)level
Incrementaldatabaserefresh
SQLinterfacetotheexistingenterpriseenvironment

58

Categories of OLAP Tools

OLAP tools are categorized according to the


architecture of the underlying database.

Three main categories of OLAP tools include


Multi-dimensional OLAP (MOLAP or MD-OLAP)
Relational OLAP (ROLAP), also called multirelational OLAP
Managed query environment (MQE)

59

Topic 5: Multi-Dimensional OLAP


(MOLAP)

Use specialized data structures and multi-dimensional


Database Management Systems (MDDBMSs) to
organize, navigate, and analyze data.

Data is typically aggregated and stored according to


predicted usage to enhance query performance.

60

Multi-Dimensional OLAP (MOLAP)

Use array technology and efficient storage techniques


that minimize the disk space requirements through
sparse data management.

Provides excellent performance when data is used as


designed, and the focus is on data for a specific
decision-support application.

61

Multi-Dimensional OLAP (MOLAP)

Traditionally, require a tight coupling with the


application layer and presentation layer.

Recent trends segregate the OLAP from the data


structures through the use of published application
programming interfaces (APIs).

62

Typical Architecture for MOLAP Tools

63

MOLAP Tools - Development Issues

Underlying data structures are limited in their ability


to support multiple subject areas and to provide access
to detailed data.

Navigation and analysis of data is limited because the


data is designed according to previously determined
requirements.

64

MOLAP Tools - Development Issues

MOLAP products require a different set of skills and


tools to build and maintain the database, thus
increasing the cost and complexity of support.

65

Relational OLAP (ROLAP)

Fastest-growing style of OLAP technology.

Supports RDBMS products using a metadata layer avoids need to create a static multi-dimensional data
structure - facilitates the creation of multiple multidimensional views of the two-dimensional relation.

66

Relational OLAP (ROLAP)

To improve performance, some products use SQL


engines to support complexity of multi-dimensional
analysis, while others recommend, or require, the use
of highly denormalized database designs such as the
star schema.

67

Typical Architecture for ROLAP Tools

68

ROLAP Tools - Development Issues

Middleware to facilitate the development of multidimensional applications. (Software that converts the
two-dimensional relation into a multi-dimensional
structure).

Development of an option to create persistent, multidimensional structures with facilities to assist in the
administration of these structures.

69

Managed Query Environment (MQE)

Relatively new development.

Provide limited analysis capability, either directly


against RDBMS products, or by using an intermediate
MOLAP server.

70

Managed Query Environment (MQE)

Deliver selected data directly from DBMS or via a


MOLAP server to desktop (or local server) in form of a
datacube, where it is stored, analyzed, and maintained
locally.

Promoted as being relatively simple to install and


administer with reduced cost and maintenance.

71

Typical Architecture for MQE Tools

72

MQE Tools - Development Issues

Architecture results in significant data redundancy and


may cause problems for networks that support many
users.

Ability of each user to build a custom datacube may


cause a lack of data consistency among users.

Only a limited amount of data can be efficiently


maintained.

73

Topic 6: OLAP Extensions to SQL

SQL promoted as easy-to-learn, nonprocedural, freeformat, DBMS-independent, and international


standard.

However, major disadvantage has been inability to


represent many of the questions most commonly asked
by business analysts.

IBM and Oracle jointly proposed OLAP extensions to


SQL early in 1999, adopted as an amendment to SQL.

74

OLAP Extensions to SQL

Many database vendors including IBM, Oracle,


Informix, and Red Brick Systems have already
implemented portions of specifications in their DBMSs.

Red Brick Systems was first to implement many


essential OLAP functions (as Red Brick Intelligent
SQL (RISQL)), albeit in advance of the standard.

75

OLAP Extensions to SQL - RISQL

Designed for business analysts.

Set of extensions that augments SQL with a variety of


powerful operations appropriate to data analysis and
decision-support applications such as ranking, moving
averages, comparisons, market share, this year versus
last year.

76

Use of the RISQL CUME Function

Show the quarterly sales for branch office B003, along


with the monthly year-to-date figures.
SELECTquarter,quarterlySales,CUME(quarterlySales)
ASYeartoDate
FROMBranchSales
WHEREbranchNo='B003';

77

Use of the RISQL MOVINGAVG / MOVINGSUM


Function

Show the first six monthly sales for branch office B003
without the effect of seasonality.

SELECTmonth,monthlySales,
MOVINGAVG(monthlySales)
MonthMovingAvg,
MOVINGSUM(monthlySales)
MonthMovingSum
FROMBranchSales
WHEREbranchNo='B003';

AS

AS

78

Topic 7: The Microsoft Data Warehousing


Framework

The Microsoft Data Warehousing Framework is an


open architecture that is easily integrated with
existing systems. The Microsoft SQL Server DTS tool
is used to import, export, and repair or transform
data (where it is necessary). The Framework
contains a rich object-oriented programming
interface for customized data warehousing
implementations. There is also a user interface, the
Microsoft SQL Server Analysis Services Manager that
can be used to configure the data warehouse and to
populate or update the content in a cube. It can be
used to schedule tasks, monitor performance, and
perform queries on the data warehouse.

79

Microsoft Data Warehouse


Product

Data warehousing software has been included with


Microsoft SQL Server™ since the release of
version 7.0 in 1998.

80

The Cube as a Model for the Data


Warehouse

The cube is an imperfect yet satisfactory name for a


data warehouse repository. How does a data
warehouse cube differ from a geometrical cube? There
are a few important differences. A data warehouse cube
is defined by any number of dimensions (it is not
limited to three, and sometimes a data-warehousing
cube may have fewer than three dimensions).
Dimensions describe a data-warehousing cube just as
width, height, and depth describe a geometrical cube.
Where it is appropriate, dimensions can be organized
into any number of levels.

81

The Cube as a Model for the Data


Warehouse

The relationship between two dimensions can be


modeled using a grid. Dimensions are like the labels
of along the axes of the grid. The cells are facts. Facts
correspond to the cross product of each dimension of
the cube. The data in the cell is a measure. Measures
are the whole reason for the cube. If the cube is about
the number of items sold, the measure is a count of the
number of items sold. To repeat the grid example, the
measure is the number that you would find in the grid
cell.

82

Dimensions and Levels

Levels are used to organize dimensions into smaller units where


necessary. Levels may also contain other levels, depending on
how they are configured in the cube. For example, consider that
there is a region dimension. Perhaps this grocery store operates
in three states and uses the state boundaries as territorial
boundaries. Let's say that the region dimension contains three
levels: California, Oregon, and Washington. If the business has
additional sub-regions such as Seattle, Olympia, Yakima, and
Spokane in the state of Washington, these levels can be added as
levels to the Washington region, even if such detail is not needed
for the California and Oregon region. Levels are just a
convenient way to organize facts for a dimension.

83

Facts and Measures

A fact is about the combination of the various dimensions.


Locating a fact is like using a coordinate system. Just like a
position in a mathematical cube such as the origin, which might
be represented by (x=0,y=0,z=0), a fact would be represented by
specific combination of dimensions such as
(Product=broccoli, Region=Seattle, Time=Wednesday) yielding a
specific fact about broccoli being sold in Seattle on Wednesday.
Depending on the way that the cube is being used, the fact may
show a measure of something like 580 units sold or perhaps a
different measure like $860.00 in sales. The meaning of the
measure depends on how the cube is defined.

84

Aggregations

The mathematical operations of count and sum are an essential


part of the reason data warehouses are useful. These are
aggregations. Once dimensions are organized and a cube is being
processed, the aggregations are calculated. Generally,
aggregations are calculated immediately after the cube is initially
populated or there is a change to the content of the cube.

85

Using a Data Warehouse to Make Decisions

Consider a grocery store. Let's say that a


promotion has been running for a few
days and the grocer needs to decide
whether or not to run the promotion
again. A question that the grocer might
pose would be something like, "Has
more of the product been sold during the
promotion period compared to prior
periods?"

86

Using a Data Warehouse to Make Decisions


A grocery store inventory system may record prices, products, sales, and
promotions in a transactional database using normalized structures.
Inventory systems are optimized for inserting and updating records and
perhaps for simple procedural selections such as retrieving the cost of an
item. It is much less likely that the system is organized in such a way that it
would be just as efficient to produce a report that details on a day-by-day
and product-by-product basis the effectiveness of a sale. In fact, there is
usually a contradiction between systems that are designed for
transactional
efficiency and those that are designed for efficient queries. This is where a
data warehouse should be used. The data warehouse is a separate
repository that uses the relevant data from existing sources in a structure
that has been optimized for selection.

87

Using a Data Warehouse to Make Decisions

In this case, it is easy to use a data


warehouse to answer the grocer's
question. The sum of the fact records
that measure the number of items sold
using the cube dimensions of products,
promotion, and time can produce the
needed results.

88

Using a Data Warehouse to Make Decisions

To contrast this technique, the information in


the other systems may not even be in the
same database. The inventory data source may
not be the same data source as the customer
data source or the employee data source. Even
if the systems are in the same database, it
would still be a chore to build a system of
queries that will combine and aggregate the
results in a way that will produce the correct
answer. In fact, this effort of combining data
sources and aggregating the results is just what
the data warehousing software does best.

89

Viewing a Slice and the


Programming Interface to A
Cube

Although fashioning a data warehouse in multiple


dimensions may be a simple design choice, and
performing queries that produce results that span
several dimensions may not be a significant chore for
the processor, the very constitution of multidimensional output can make it difficult to display.
Charts, graphs, and tables are almost always presented
in two dimensions. There are some good threedimensional charting tools, but beyond that, the chart
becomes more of a puzzle than a visual aid. A common
technique of viewing multi-dimensional output is to
view the output one two-dimensional "slice" of a cube
at a time. This is the way that the Microsoft SQL Server
Analysis Services Tool displays output.

90

Using DSO (Decision Support


Object)

Fortunately, output is not restricted to


two dimensions. Microsoft SQL Server
Analysis Services provides a
programming interface to multidimensional data warehouse output: the
DSO (Decision Support Object). DSO can
be used programmatically to access the
various dimensions.

91

Using MDX

MDX (multidimensional extensions) is a


syntax designed for querying
multidimensional objects and data. For such
systems, it is more efficient and intuitive to
use than SQL, which was designed for an
entirely different set of objects. The
grammar of an MDX query has a similar feel
to the grammar of an SQL query. Observe
the MDX query below which would select the
sales figures for the broccoli sold in Seattle
on Wednesday that I used in the example
above:

92

Using MDX

Microsoft SQL Server OLAP Services provides


an architecture for access to multidimensional
data. This data is summarized, organized, and
stored in multidimensional structures for
rapid response to user queries. Through OLE
DB for OLAP, a PivotTable Service provides
client access to this multidimensional online
analytical processing
(OLAP) data. For expressing queries to this
data, OLE DB for OLAP employs a full-fledged,
highly functional expression syntax:
multidimensional expressions (MDX).

93

Using MDX

OLE DB for OLAP is a set of Component


Object Model (COM) interfaces designed to
extend OLE DB for efficient access to
multidimensional data. ADO has been
extended with new objects, collections, and
methods that are designed to take
advantage of OLE DB for OLAP. These
extensions are collectively known as ADO MD
(multidimensional) and are designed to
provide a simple, high-level object model for
accessing tabular and OLAP data.

94

MDX expression

SELECT axis specification ON COLUMNS,


axis specification ON ROWS
FROM cube_name
WHERE slicer_specification

95

SELECT

[Measures].[Sales] ON COLUMNS
[Time].[Wednesday] ON ROWS
FROM MySalesCube
WHERE [Region].[Washington].[Seattle]
AND [Product].[Vegetable].[Broccholi]

The output of this query would be a


column labeled Sales, a row labeled
Wednesday and a single grid cell at the
intersection with the sales figure
$860.00.

96

SELECT

The SQL Server Analysis Server Manager


has an interface that accepts MDX
queries. Alternatively, MDX queries can
be incorporated into programs that
employ the DSO.

97

The Microsoft Data Warehousing


Framework

There is more information about how to build a


cube using the Microsoft SQL Server Analysis
Services Manager at MSDN Online, as well as
guidelines
to consider for the design and configuration of
your data warehouse. See the "How to" article at
this link:

http://msdn.microsoft.com/library/psdk/sql/aghtin
tro_2vov.htm.

98

You might also like