Lec 4n5 - Warehousing N DM (Compatibility Mode)

DATA WAREHOUSING & DATA MINING
LECTURE 4 & 5
Online Analytical Processing (OLAP)
By Mbogo Njoroge

Topic Objectives
• The purpose of Online Analytical Processing (OLAP).
• The relationship between OLAP and data warehousing.
• The key features of OLAP applications.
• How to represent multi‐dimensional data.
• The rules for OLAP tools.
• The main categories of OLAP tools.
• OLAP extensions to the SQL standard.
• How Oracle supports OLAP.

Business Intelligence Technologies
• Accompanying the growth in data warehousing is an ever‐increasing demand by users for more powerful access tools that provide advanced analytical capabilities.
• There are two main types of access tools available to meet this demand, namely Online Analytical Processing (OLAP) and data mining.
• OLAP and Data Mining differ in what they offer the user and because of this they are complementary technologies.
• An environment that includes a data warehouse (or more commonly one or more data marts) together with tools such as OLAP and/or data mining is collectively referred to as Business Intelligence (BI) technologies.

Online Analytical Processing (OLAP)
• Original definition ‐ The dynamic synthesis, analysis, and consolidation of large volumes of multi‐dimensional data, Codd (1993).
• Describes a technology that is designed to optimize the storing and querying of large volumes of multi‐dimensional data that is aggregated (summarized) to various levels of detail to support the analysis of this data.
• Enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data.
• Allows users to view corporate data in such a way that it is a better model of the true dimensionality of the enterprise.
• Can easily answer ‘who?’ and ‘what?’ questions; however, it is the ability to answer ‘why?’ type questions that distinguishes OLAP from general‐purpose query tools.
• Types of analysis range from basic navigation and browsing (slicing and dicing), to calculations, to more complex analyses such as time series and complex modeling.

OLAP Applications
• JIT (Just In Time) information is computed data that usually reflects complex relationships and is often calculated on the fly. Also, as data relationships may not be known in advance, the data model must be flexible.
• Many OLAP and Data Mining applications involve sophisticated analysis methods from the fields of mathematics, statistical analysis, and artificial intelligence.
• Although OLAP applications are found in widely divergent functional areas, they all have the following key features:
– multi‐dimensional views of data
– support for complex calculations
– time intelligence

Examples of OLAP applications in various functional areas (table not reproduced)

OLAP Applications ‐ multi‐dimensional views of data
• Core requirement of building a ‘realistic’ business model.
• Provides basis for analytical processing through flexible access to corporate data.
• The underlying database design that provides the multi‐dimensional view of data should treat all dimensions equally.

OLAP Applications ‐ support for complex calculations
• Must provide a range of powerful computational methods such as those required by sales forecasting, which uses trend algorithms such as moving averages and percentage growth.
• Mechanisms for implementing computational methods should be clear and non‐procedural.

OLAP Applications – time intelligence
• Key feature of almost any analytical application as performance is almost always judged over time.
• Time hierarchy is not always used in the same manner as other hierarchies.
• Concepts such as year‐to‐date and period‐over‐period comparisons should be easily defined.

Multi‐dimensional Data and OLAP cubes
• Multi‐dimensional data is facts (numeric measurements) such as property sales revenue data and the association of this data with dimensions such as location (of the property) and time (of the property sale).
• Which is the best representation of multi‐dimensional data: relational table, matrix or data cube?

Multi‐dimensional Data as 3‐field Table versus 2‐D Matrix (figure)
Multi‐dimensional Data as 4‐field Table versus 3‐D Cube (figure)
Multi‐dimensional Data as series of 3‐D Cubes (figure)

Multi‐dimensional data and OLAP cubes
• The OLAP cube is an n‐dimensional structure (with sides that need not be equal).
• An alternative representation for n‐dimensional data is to consider a data cube as a lattice of cuboids. Each cuboid represents a subset of the given dimensions.

Multi‐dimensional data and OLAP cubes
Lattice of cuboids for the dimensions time, location, type, and office (figure):
0‐D cuboid (highest‐level): ALL
1‐D cuboids: time; location; type; office
2‐D cuboids: time, location; time, type; time, office; location, type; location, office; type, office
3‐D cuboids: time, location, type; time, location, office; time, type, office; location, type, office
4‐D cuboid (lowest‐level): time, location, type, office

Dimensionality Hierarchy
• The lattice of cuboids does not show the hierarchies that are commonly associated with dimensions.
• A dimensional hierarchy defines mappings from a set of lower‐level concepts to higher level concepts.
Example hierarchies (figure), listed from lowest to highest level:
– location: zipCode, area, city, region, country
– time: day, week, month, quarter or season, year
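The lattice of cuboids can be enumerated mechanically, since each cuboid is just a subset of the dimensions. The following is a minimal sketch, not part of the original slides; the function name `cuboid_lattice` is an illustrative choice.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Return {k: [cuboids with k dimensions]} for every subset of `dimensions`."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }

dims = ("time", "location", "type", "office")
lattice = cuboid_lattice(dims)

# Print each level of the lattice; the empty subset is the 0-D cuboid "ALL".
for k, cuboids in lattice.items():
    print(f"{k}-D:", ["ALL" if not c else ", ".join(c) for c in cuboids])

# n dimensions give 2**n cuboids in total.
total = sum(len(c) for c in lattice.values())
```

With four dimensions the lattice holds 1 + 4 + 6 + 4 + 1 = 16 cuboids, matching the levels listed in the figure.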

Dimensional Operations
• The analytical operations that can be performed on data cubes include:
– Roll‐up
– Drill‐down
– Slice and Dice
– Pivot
• Roll‐up performs aggregations on the data by moving up the dimensional hierarchy or by dimensional reduction e.g. 4‐D sales data to 3‐D sales data.
• Drill‐down is the reverse of roll‐up and involves revealing the detailed data that forms the aggregated data. Drill‐down can be performed by moving down the dimensional hierarchy or by dimensional introduction e.g. 3‐D sales data to 4‐D sales data.
• Slice and dice ‐ ability to look at data from different viewpoints. The slice operation performs a selection on one dimension of the data whereas dice uses two or more dimensions. For example a slice of sales revenue (type = ‘Flat’) and a dice (type = ‘Flat’ and time = ‘Q1’).
• Pivot ‐ ability to rotate the data to provide an alternative view of the same data e.g. sales revenue data displayed using the location (city) as the x‐axis against time (quarter) as the y‐axis can be rotated so that time (quarter) is the x‐axis against location (city) as the y‐axis.
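The four operations can be illustrated on a tiny in-memory cube. This is a rough sketch under stated assumptions: the fact rows and revenue figures are made up for illustration, and `roll_up` is a hypothetical helper, not an API of any OLAP tool.

```python
from collections import defaultdict

# Hypothetical facts: (type, quarter, city, revenue).
facts = [
    ("Flat", "Q1", "Glasgow", 100),
    ("Flat", "Q2", "Glasgow", 150),
    ("House", "Q1", "Glasgow", 300),
    ("Flat", "Q1", "Aberdeen", 80),
]

def roll_up(rows, keep):
    """Sum revenue, keeping only the dimension positions listed in `keep`."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in keep)] += row[3]
    return dict(totals)

# Slice: selection on one dimension (type = 'Flat').
flat_slice = [r for r in facts if r[0] == "Flat"]

# Dice: selection on two dimensions (type = 'Flat' and time = 'Q1').
flat_q1_dice = [r for r in facts if r[0] == "Flat" and r[1] == "Q1"]

# Roll-up by dimensional reduction: 3-D (type, quarter, city) to 2-D (type, city).
by_type_city = roll_up(facts, keep=(0, 2))

# Pivot: the same 2-D data with the axes swapped (city first, then type).
pivoted = {(city, ptype): v for (ptype, city), v in by_type_city.items()}
```

Drill-down is the reverse direction: going from `by_type_city` back to `facts` reintroduces the quarter dimension, which is only possible because the detailed rows were kept.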

Fact Tables
• Many OLAP applications are based on a fact table
• For example, a supermarket application might be based on a table
    Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
• The table can be viewed as multidimensional
– Market_Id, Product_Id, Time_Id are the dimensions that represent specific supermarkets, products, and time intervals
– Sales_Amt is a function of the other three

A Data Cube
• Fact tables can be viewed as an N‐dimensional data cube (3‐dimensional in our example)
– The entries in the cube are the values for Sales_Amt

Dimension Tables
• The dimensions of the fact table are further described with dimension tables
• Fact table:
    Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
• Dimension Tables:
    Market (Market_Id, City, State, Region)
    Product (Product_Id, Name, Category, Price)
    Time (Time_Id, Week, Month, Quarter)

Star Schema
• The fact and dimension relations can be displayed in an E‐R diagram, which looks like a star and is called a star schema

Aggregation
• Many OLAP queries involve aggregation of the data in the fact table
• For example, to find the total sales (over time) of each product in each market, we might use
    SELECT S.Market_Id, S.Product_Id, SUM (S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Market_Id, S.Product_Id
• The aggregation is over the entire time dimension and thus produces a two‐dimensional view of the data

Aggregation over Time
• The output of the previous query, SUM(Sales_Amt) by Product_Id (rows) and Market_Id (columns):

    Product_Id   M1     M2     M3   M4
    P1           3003   1503   …
    P2           6003   2402   …
    P3           4503   3      …
    P4           7503   7000   …
    P5           …      …      …
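The aggregation query can be tried out against an in-memory SQLite database. The sketch below assumes a cut-down Sales fact table and a handful of made-up rows; only the query itself comes from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Market_Id TEXT, Product_Id TEXT, Time_Id INTEGER, Sales_Amt INTEGER)"
)
# Hypothetical fact rows: product P1 is sold in market M1 in two time periods.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("M1", "P1", 1, 1000),
        ("M1", "P1", 2, 2003),
        ("M2", "P1", 1, 1503),
        ("M1", "P2", 1, 6003),
    ],
)

# Aggregate over the whole time dimension: one row per (market, product) pair.
rows = conn.execute(
    """
    SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Market_Id, S.Product_Id
    ORDER BY S.Market_Id, S.Product_Id
    """
).fetchall()
```

Time_Id does not appear in the GROUP BY list, so the two M1/P1 rows collapse into a single cell of the two-dimensional view.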

Drilling Down and Rolling Up
• Some dimension tables form an aggregation hierarchy
    Market_Id → City → State → Region
• Executing a series of queries that moves down a hierarchy (e.g., from aggregation over regions to that over states) is called drilling down
– Requires the use of the fact table or information more specific than the requested aggregation (e.g., cities)
• Executing a series of queries that moves up the hierarchy (e.g., from states to regions) is called rolling up
– Note: In a rollup, coarser aggregations can be computed using prior queries for finer aggregations

Drilling Down
• Drilling down on market: from Region to State
    Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
    Market (Market_Id, City, State, Region)

1.  SELECT S.Product_Id, M.Region, SUM (S.Sales_Amt)
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.Region

2.  SELECT S.Product_Id, M.State, SUM (S.Sales_Amt)
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.State

Rolling Up
• Rolling up on market, from State to Region
– If we have already created a table, State_Sales, using

1.  SELECT S.Product_Id, M.State, SUM (S.Sales_Amt) AS Sales_Amt
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.State

then we can roll up from there to:

2.  SELECT T.Product_Id, M.Region, SUM (T.Sales_Amt)
    FROM State_Sales T, (SELECT DISTINCT State, Region FROM Market) M
    WHERE M.State = T.State
    GROUP BY T.Product_Id, M.Region

– Note: the join uses the distinct (State, Region) pairs because Market holds one row per market; joining State_Sales to Market directly would count a state's total once for every market in that state.

Pivoting
• When we view the data as a multi‐dimensional cube and group on a subset of the axes, we are said to be performing a pivot on those axes
– Pivoting on dimensions D1,…,Dk in a data cube D1,…,Dk,Dk+1,…,Dn means that we use GROUP BY A1,…,Ak and aggregate over Ak+1,…,An, where Ai is an attribute of the dimension Di
– Example: Pivoting on Product and Time corresponds to grouping on Product_Id and Quarter and aggregating Sales_Amt over Market_Id:
    SELECT S.Product_Id, T.Quarter, SUM (S.Sales_Amt)
    FROM Sales S, Time T
    WHERE T.Time_Id = S.Time_Id
    GROUP BY S.Product_Id, T.Quarter     -- pivot
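That a coarser aggregation can be computed from a saved finer one can be checked on a small example. The sketch below uses an in-memory SQLite database with made-up rows. It joins State_Sales against the distinct (State, Region) pairs of Market, an assumption made here so that a state with several markets is counted only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Market (Market_Id TEXT, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Sales  (Market_Id TEXT, Product_Id TEXT, Time_Id INTEGER, Sales_Amt INTEGER);
-- Hypothetical rows: markets M1 and M2 share state S1; both states lie in region R1.
INSERT INTO Market VALUES ('M1','C1','S1','R1'), ('M2','C2','S1','R1'), ('M3','C3','S2','R1');
INSERT INTO Sales  VALUES ('M1','P1',1,10), ('M2','P1',1,20), ('M3','P1',1,5);
-- Finer aggregation (per state), saved as a table.
CREATE TABLE State_Sales AS
    SELECT S.Product_Id, M.State, SUM(S.Sales_Amt) AS Sales_Amt
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.State;
""")

# Roll up from the saved state totals to region totals.
rolled_up = conn.execute("""
    SELECT T.Product_Id, M.Region, SUM(T.Sales_Amt)
    FROM State_Sales T, (SELECT DISTINCT State, Region FROM Market) M
    WHERE M.State = T.State
    GROUP BY T.Product_Id, M.Region
""").fetchall()

# The same region totals computed directly from the fact table.
direct = conn.execute("""
    SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.Region
""").fetchall()
```

Both queries return the same region total, but the roll-up reads only the two rows of State_Sales rather than the whole fact table.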

Time Hierarchy as a Lattice
• Not all aggregation hierarchies are linear
– The time hierarchy is a lattice
  • Weeks are not contained in months
  • We can roll up days into weeks or months, but we can only roll up weeks into quarters

Slicing‐and‐Dicing
• When we use WHERE to specify a particular value for an axis (or several axes), we are performing a slice
– Slicing the data cube in the Time dimension (choosing sales only in week 12) then pivoting to Product_Id (aggregating over Market_Id)
    SELECT S.Product_Id, SUM (S.Sales_Amt)
    FROM Sales S, Time T
    WHERE T.Time_Id = S.Time_Id AND T.Week = 'Wk-12'     -- slice
    GROUP BY S.Product_Id                                -- pivot

Slicing‐and‐Dicing
• Typically slicing and dicing involves several queries to find the “right slice.”
For instance, change the slice and the axes:
• Slicing on Time and Market dimensions then pivoting to Product_Id and Week (in the time dimension)
    SELECT S.Product_Id, T.Week, SUM (S.Sales_Amt)
    FROM Sales S, Time T
    WHERE T.Time_Id = S.Time_Id
    AND T.Quarter = 4                  -- slice
    AND S.Market_Id = 12345            -- slice
    GROUP BY S.Product_Id, T.Week      -- pivot

The CUBE Operator
• To construct the following table would take 3 queries (shown next). SUM(Sales_Amt) by Product_Id (rows) and Market_Id (columns):

    Product_Id   M1     M2     M3   Total
    P1           3003   1503   …    …
    P2           6003   2402   …    …
    P3           4503   3      …    …
    P4           7503   7000   …    …
    Total        …      …      …    …

The Three Queries
• For the table entries, without the totals (aggregation on time)
    SELECT S.Market_Id, S.Product_Id, SUM (S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Market_Id, S.Product_Id
• For the row totals (aggregation on time and supermarkets)
    SELECT S.Product_Id, SUM (S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Product_Id
• For the column totals (aggregation on time and products)
    SELECT S.Market_Id, SUM (S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Market_Id

Definition of the CUBE Operator
• Doing these three queries is wasteful
– The first does much of the work of the other two: if we could save that result and aggregate over Market_Id and Product_Id, we could compute the other queries more efficiently
• The CUBE clause is part of SQL:1999
– GROUP BY CUBE (v1, v2, …, vn)
– Equivalent to a collection of GROUP BYs, one for each of the 2^n subsets of v1, v2, …, vn
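The "one GROUP BY per subset" reading of CUBE can be emulated in a few lines. The following is a minimal sketch, not how a database engine implements CUBE: the `cube` function and the sample rows are illustrative assumptions, and `None` stands in for the NULL that SQL places in columns that have been aggregated away.

```python
from collections import defaultdict
from itertools import combinations

def cube(rows, dims, measure):
    """Emulate GROUP BY CUBE(dims): one aggregation per subset of `dims`."""
    result = {}
    for k in range(len(dims), -1, -1):
        for subset in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                # Keep the value of grouped columns, blank out the others.
                key = tuple(row[d] if d in subset else None for d in dims)
                totals[key] += row[measure]
            result.update(totals)
    return result

# Hypothetical fact rows, already aggregated over time.
sales = [
    {"Market_Id": "M1", "Product_Id": "P1", "Sales_Amt": 3003},
    {"Market_Id": "M2", "Product_Id": "P1", "Sales_Amt": 1503},
    {"Market_Id": "M1", "Product_Id": "P2", "Sales_Amt": 6003},
]
cells = cube(sales, ("Market_Id", "Product_Id"), "Sales_Amt")
```

For two grouping columns this performs 2^2 = 4 aggregations: the table entries, the totals per market, the totals per product, and the grand total at key `(None, None)`.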

Example of CUBE Operator
• The following query returns all the information needed to make the previous products/markets table:
    SELECT S.Market_Id, S.Product_Id, SUM (S.Sales_Amt)
    FROM Sales S
    GROUP BY CUBE (S.Market_Id, S.Product_Id)

ROLLUP
• ROLLUP is similar to CUBE except that instead of aggregating over all subsets of the arguments, it creates subsets moving from right to left
• GROUP BY ROLLUP (A1,A2,…,An) is a series of these aggregations:
– GROUP BY A1 ,…, An‐1 ,An
– GROUP BY A1 ,…, An‐1
– … … …
– GROUP BY A1, A2
– GROUP BY A1
– No GROUP BY
• ROLLUP is also in SQL:1999

Example of ROLLUP Operator
    SELECT S.Market_Id, S.Product_Id, SUM (S.Sales_Amt)
    FROM Sales S
    GROUP BY ROLLUP (S.Market_Id, S.Product_Id)
– first aggregates with the finest granularity:
    GROUP BY S.Market_Id, S.Product_Id
– then with the next level of granularity:
    GROUP BY S.Market_Id
– then the grand total is computed with no GROUP BY clause

ROLLUP vs. CUBE
• The same query with CUBE:
– first aggregates with the finest granularity:
    GROUP BY S.Market_Id, S.Product_Id
– then with the next level of granularity:
    GROUP BY S.Market_Id
  and
    GROUP BY S.Product_Id
– then the grand total with no GROUP BY
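The difference between the two operators comes down to which grouping sets they generate. A small sketch with illustrative helper names (not part of the SQL standard or the slides) makes the comparison concrete:

```python
from itertools import combinations

def rollup_grouping_sets(cols):
    """Prefixes of `cols`, longest first: n + 1 grouping sets."""
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]

def cube_grouping_sets(cols):
    """Every subset of `cols`, largest first: 2**n grouping sets."""
    return [s for k in range(len(cols), -1, -1) for s in combinations(cols, k)]

cols = ("Market_Id", "Product_Id")
rollup_sets = rollup_grouping_sets(cols)
cube_sets = cube_grouping_sets(cols)

# Grouping sets that CUBE computes but ROLLUP does not.
only_in_cube = [s for s in cube_sets if s not in rollup_sets]
```

For these two columns, ROLLUP yields three grouping sets and CUBE four; the one extra set is `("Product_Id",)`, the per-product totals. The empty tuple represents the grand total with no GROUP BY.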

OLAP Tools
• There are many varieties of OLAP tools available in the marketplace.
• This choice has resulted in some confusion with much debate regarding what OLAP actually means to a potential buyer and in particular what are the available architectures for OLAP tools.

Codd’s Rules for OLAP Systems
In 1993, E.F. Codd formulated twelve rules as the basis for selecting OLAP tools.
1. Multi‐dimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client‐server architecture
6. Generic dimensionality
7. Dynamic sparse matrix handling
8. Multi‐user support
9. Unrestricted cross‐dimensional operations
10. Intuitive data manipulation
11. Flexible reporting
12. Unlimited dimensions and aggregation levels

Codd’s Rules for OLAP Systems
• There are proposals to redefine or extend the rules, for example to also include:
– Comprehensive database management tools
– Ability to drill down to detail (source record) level
– Incremental database refresh
– SQL interface to the existing enterprise environment

Categories of OLAP Tools
• OLAP tools are categorized according to the architecture used to store and process multi‐dimensional data.
• There are three main categories:
i. Multi‐dimensional OLAP (MOLAP)
ii. Relational OLAP (ROLAP)
iii. Hybrid OLAP (HOLAP)

Multi‐dimensional OLAP (MOLAP)
• Use specialized data structures and multi‐dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze data.
• Data is typically aggregated and stored according to predicted usage to enhance query performance.
• Use array technology and efficient storage techniques that minimize the disk space requirements through sparse data management.
• Provides excellent performance when data is used as designed, and the focus is on data for a specific decision‐support application.
• Traditionally, require a tight coupling with the application layer and presentation layer.
• Recent trends segregate the OLAP from the data structures through the use of published application programming interfaces (APIs).

Typical Architecture for MOLAP Tools
• MOLAP Architecture includes the following three major components:
– Database Server (Relational Database and/or Legacy systems)
– MOLAP Server
– Front‐end user access tools
Considering the given MOLAP Architecture:
1. The user requests reports through the interface
2. The application logic layer of the MDDB retrieves the stored data from the Database
3. The application logic layer forwards the result to the client/user.

Advantages of MOLAP
Below are the advantages of MOLAP:
i. MOLAP can manage, analyze and store considerable amounts of multidimensional data.
ii. Fast query performance due to optimized storage, indexing, and caching.
iii. Smaller sizes of data as compared to the relational database.
iv. Automated computation of higher‐level aggregates of the data.
v. Helps users to analyze larger, less‐defined data.
vi. MOLAP is easier for the user, which makes it a suitable model for inexperienced users.
vii. MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.
viii. All calculations are pre‐generated when the cube is created.

Disadvantages of MOLAP
Following are the disadvantages of MOLAP:
i. One major weakness of MOLAP is that it is less scalable than ROLAP as it handles only a limited amount of data.
ii. MOLAP also introduces data redundancy and is resource intensive.
iii. MOLAP solutions may be lengthy, particularly on large data volumes.
iv. MOLAP products may face issues while updating and querying models when dimensions are more than ten.
v. MOLAP is not capable of containing detailed data.
vi. The storage utilization can be low if the data set is highly scattered.
vii. It can handle only a limited amount of data; therefore, it is impossible to include a large amount of data in the cube itself.

MOLAP Tools
Below are some of the popular MOLAP Tools:
• Essbase – Tools from Oracle that has a multidimensional database.
– It drives smarter decisions with the ability to easily test and model complex business assumptions in the cloud or on‐premises and thereby gives organizations the power to rapidly generate insights from multidimensional data sets using what‐if analysis, and data visualization tools.
• Express Server – Web‐based environment that runs on Oracle database.
– Note: Oracle Database Express Edition (XE), a community supported edition of the Oracle Database family, is a separate product.
• Yellowfin – Business analytics tools for creating reports and dashboards.
– It is a business intelligence (BI) tool and ‘end‐to‐end’ analytics platform that combines visualization, machine learning, and collaboration. It provides the ability to easily filter through tons of data with intuitive filtering.
• Clear Analytics – Clear Analytics is an Excel‐based business solution.
– Clear Analytics aggregates data from a variety of sources, then leverages Microsoft’s Power BI features to enable the user to wrangle, filter, model and visualize his/her insights. It can also publish datasets directly to the Power BI portal.
• SAP Business Intelligence – Business analytics solutions from SAP
– SAP BusinessObjects BI (SAP BO) is a centralized platform suite of reporting and analytics tools for business intelligence (BI) platforms. ... It consists of a number of reporting applications that allow users to discover data, perform analysis to derive insights and create reports that visualize the insights.
NB: Business intelligence (BI) software is a type of application software designed to retrieve, analyze, transform and report data for business intelligence. The applications generally read data that has been previously stored, often ‐ though not necessarily ‐ in a data warehouse or data mart.

MOLAP Tools ‐ Development Issues
• Underlying data structures are limited in their ability to support multiple subject areas and to provide access to detailed data.
• Navigation and analysis of data is limited because the data is designed according to previously determined requirements.
• MOLAP products require a different set of skills and tools to build and maintain the database, thus increasing the cost and complexity of support.

Relational OLAP (ROLAP)
• Fastest‐growing style of OLAP technology due to requirements to analyze ever‐increasing amounts of data and the realization that users cannot store all the data they require in MOLAP databases.
• Supports RDBMS products using a metadata layer ‐ avoids need to create a static multi‐dimensional data structure ‐ facilitates the creation of multiple multi‐dimensional views of the two‐dimensional relation.
• To improve performance, some products use SQL engines to support the complexity of multi‐dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.

Typical Architecture for ROLAP Tools
• ROLAP Architecture includes the following three major components:
– Database Server
– ROLAP Server
– Front‐end user tools

Advantages of ROLAP
i. Can be easily used with the existing RDBMS.
ii. Data can be stored efficiently since zero‐valued facts need not be stored.
iii. ROLAP tools do not use pre‐calculated data cubes.
iv. The DSS Server of MicroStrategy adopts the ROLAP approach.

Disadvantages of ROLAP
i. Poor query performance.
ii. Some limitations of scalability depending on the technology architecture that is utilized.

ROLAP Tools ‐ Development Issues
• Performance problems associated with the processing of complex queries that require multiple passes through the relational data.
• Middleware to facilitate the development of multi‐dimensional applications. (Software that converts the two‐dimensional relation into a multi‐dimensional structure).
• Development of an option to create persistent, multi‐dimensional structures with facilities to assist in the administration of these structures.

MOLAP vs ROLAP
No. | MOLAP | ROLAP
1 | Information retrieval is fast. | Information retrieval is comparatively slow.
2 | It uses the sparse array to store the data sets. | It uses relational tables.
3 | Best suited for inexperienced users since it is very easy to use. | Best suited for experienced users.
4 | Needs a separate database for the data cube. | May not require space other than that available in the data warehouse.
5 | DBMS facility is weak. | DBMS facility is strong.

Hybrid OLAP (HOLAP)
• Provide limited analysis capability, either directly against RDBMS products, or by using an intermediate MOLAP server.
• Deliver selected data directly from the DBMS or via a MOLAP server to the desktop (or local server) in the form of a datacube, where it is stored, analyzed, and maintained locally.
• Promoted as being relatively simple to install and administer with reduced cost and maintenance.
Typical Architecture for HOLAP Tools (figure)

HOLAP Tools ‐ Development Issues
• Architecture results in significant data redundancy and may cause problems for networks that support many users.
• Ability of each user to build a custom data‐cube may cause a lack of data consistency among users.
• Only a limited amount of data can be efficiently maintained.

Desktop OLAP (DOLAP)
• Store the OLAP data in client‐based files and support multi‐dimensional processing using a client multi‐dimensional engine.
• Requires that relatively small extracts of data are held on client machines. They may be distributed in advance, or created on demand (possibly through the Web).

OLAP Extensions to SQL
• Advantages of SQL include that it is easy to learn, non‐procedural, free‐format, DBMS‐independent, and that it is a recognized international standard.
• However, a major limitation of SQL is the inability to answer routinely asked business queries such as computing the percentage change in values between this month and a year ago or to compute moving averages, cumulative sums, and other statistical functions.
• In answer, ANSI adopted a set of OLAP functions as an extension to SQL to enable these calculations as well as many others that used to be impractical or even impossible within SQL.
• IBM and Oracle jointly proposed these extensions early in 1999 and they now form part of the SQL standard, for example SQL:2008.
• The extensions are collectively referred to as the ‘OLAP package’ and are described as follows:
– Feature T431, ‘Extended Grouping capabilities’
– Feature T611, ‘Extended OLAP operators’

Extended Grouping Capabilities
• Aggregation is a fundamental part of OLAP. To improve aggregation capabilities the SQL standard provides extensions to the GROUP BY clause such as the ROLLUP and CUBE functions.
• ROLLUP supports calculations using aggregations such as SUM, COUNT, MAX, MIN, and AVG at increasing levels of aggregation, from the most detailed up to a grand total.
• CUBE is similar to ROLLUP, enabling a single statement to calculate all possible combinations of aggregations. CUBE can generate the information needed in cross‐tabulation reports with a single query.
• ROLLUP and CUBE extensions specify exactly the groupings of interest in the GROUP BY clause and produce a single result set that is equivalent to a UNION ALL of differently grouped rows.

Extended Grouping Capabilities
• ROLLUP Extension to GROUP BY
– enables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions. ROLLUP appears in the GROUP BY clause in a SELECT statement using the following format:
    SELECT ... GROUP BY ROLLUP(columnList)
– ROLLUP creates subtotals that roll up from the most detailed level to a grand total, following a column list specified in the ROLLUP clause.
– ROLLUP first calculates the standard aggregate values specified in the GROUP BY clause and then creates progressively higher level subtotals, moving from right to left through the column list until finally completing with a grand total.
– ROLLUP creates subtotals at n + 1 levels, where n is the number of grouping columns. For instance, if a query specifies ROLLUP on grouping columns of propertyType, yearMonth, and city (n = 3), the result set will include rows at 4 aggregation levels.

Example ‐ Using the ROLLUP Group Function
• Show the totals for sales of flats or houses by branch offices located in Nairobi, Mombasa, or Nakuru for the months of August and September of 2020.
    SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales
    FROM Branch, PropertyForSale, PropertySale
    WHERE Branch.branchNo = PropertySale.branchNo
    AND PropertyForSale.propertyNo = PropertySale.propertyNo
    AND PropertySale.yearMonth IN ('2020-08', '2020-09')
    AND Branch.city IN ('Nairobi', 'Mombasa', 'Nakuru')
    GROUP BY ROLLUP(propertyType, yearMonth, city);

Extended Grouping Capabilities
• CUBE Extension to GROUP BY
– CUBE takes a specified set of grouping columns and creates subtotals for all of the possible combinations. CUBE appears in the GROUP BY clause in a SELECT statement using the following format:
    SELECT ... GROUP BY CUBE(columnList)
– CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions.
– CUBE can be used in any situation requiring cross‐tabular reports. The data needed for cross‐tabular reports can be generated with a single SELECT using CUBE. Like ROLLUP, CUBE can be helpful in generating summary tables.
• CUBE is typically most suitable in queries that use columns from multiple dimensions rather than columns representing different levels of a single dimension.

Example ‐ Using the CUBE Group Function
• Show all possible subtotals for sales of properties by branch offices in Aberdeen, Edinburgh, and Glasgow for the months of August and September of 2008.

Example ‐ Using the CUBE Group Function
    SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales
    FROM Branch, PropertyForSale, PropertySale
    WHERE Branch.branchNo = PropertySale.branchNo
    AND PropertyForSale.propertyNo = PropertySale.propertyNo
    AND PropertySale.yearMonth IN ('2008-08', '2008-09')
    AND Branch.city IN ('Aberdeen', 'Edinburgh', 'Glasgow')
    GROUP BY CUBE(propertyType, yearMonth, city);

Elementary OLAP Operators
• Supports a variety of operations such as rankings and window calculations.
• Ranking functions include cumulative distributions, percent rank, and N‐tiles.
• Windowing allows the calculation of cumulative and moving aggregations using functions such as SUM, AVG, MIN, and COUNT.

Elementary OLAP Operators
• Ranking Functions
– Computes the rank of a record compared to other records in the dataset based on the values of a set of measures. There are various types of ranking functions, including RANK and DENSE_RANK. The syntax for each ranking function is:
    RANK( ) OVER (ORDER BY columnList)
    DENSE_RANK( ) OVER (ORDER BY columnList)
• The difference between RANK and DENSE_RANK is that DENSE_RANK leaves no gaps in the sequential ranking sequence when there are ties for a ranking.

Example ‐ Using the RANK and DENSE_RANK Functions
• Rank the total sales of properties for branch offices in Edinburgh.
    SELECT PropertySale.branchNo, SUM(saleAmount) AS sales,
    RANK( ) OVER (ORDER BY SUM(saleAmount) DESC) AS ranking,
    DENSE_RANK( ) OVER (ORDER BY SUM(saleAmount) DESC) AS dense_ranking
    FROM Branch, PropertySale
    WHERE Branch.branchNo = PropertySale.branchNo
    AND Branch.city = 'Edinburgh'
    GROUP BY PropertySale.branchNo;
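The gap behaviour of RANK versus DENSE_RANK is easiest to see with a tie. The sketch below runs a simplified version of the ranking query in an in-memory SQLite database; it assumes SQLite 3.25 or later (the first release with window functions), drops the join to Branch for brevity, and uses made-up sale amounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PropertySale (branchNo TEXT, saleAmount INTEGER);
-- Hypothetical sales: B003 and B009 tie on total sales (300 each).
INSERT INTO PropertySale VALUES
    ('B003', 300), ('B009', 100), ('B009', 200), ('B005', 400), ('B007', 50);
""")

rows = conn.execute("""
    SELECT branchNo,
           SUM(saleAmount) AS sales,
           RANK()       OVER (ORDER BY SUM(saleAmount) DESC) AS ranking,
           DENSE_RANK() OVER (ORDER BY SUM(saleAmount) DESC) AS dense_ranking
    FROM PropertySale
    GROUP BY branchNo
    ORDER BY sales DESC, branchNo
""").fetchall()

for branch, sales, ranking, dense_ranking in rows:
    print(branch, sales, ranking, dense_ranking)
```

The two tied branches both receive rank 2. The next branch is ranked 4 by RANK, which skips 3, but 3 by DENSE_RANK, which leaves no gap.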

Elementary OLAP Operators
• Windowing Calculations
– Can be used to compute cumulative, moving, and centered aggregates. They return a value for each row in the table, which depends on other rows in the corresponding window.
– These aggregate functions provide access to more than one row of a table without a self‐join and can be used only in the SELECT and ORDER BY clauses of the query.

Example ‐ Using Windowing Calculations
• Show the monthly figures and three‐month moving averages and sums for property sales at branch office B003 for the first six months of 2008.
    SELECT yearMonth, SUM(saleAmount) AS monthlySales,
    AVG(SUM(saleAmount)) OVER (ORDER BY yearMonth ROWS 2 PRECEDING) AS "3-month moving avg",
    SUM(SUM(saleAmount)) OVER (ORDER BY yearMonth ROWS 2 PRECEDING) AS "3-month moving sum"
    FROM PropertySale
    WHERE branchNo = 'B003'
    AND yearMonth BETWEEN '2008-01' AND '2008-06'
    GROUP BY yearMonth
    ORDER BY yearMonth;
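A windowed moving average can be checked by hand on a few rows. The sketch below runs the query in an in-memory SQLite database, again assuming SQLite 3.25 or later. The sale amounts are made up, and the column aliases `movingAvg` and `movingSum` are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PropertySale (branchNo TEXT, yearMonth TEXT, saleAmount INTEGER);
-- Hypothetical sales; the B005 row must be filtered out by the WHERE clause.
INSERT INTO PropertySale VALUES
    ('B003', '2008-01', 100), ('B003', '2008-01', 20),
    ('B003', '2008-02', 60),  ('B003', '2008-03', 90),
    ('B003', '2008-04', 30),  ('B005', '2008-01', 999);
""")

# ROWS 2 PRECEDING: the window is the current month plus the two months before it.
rows = conn.execute("""
    SELECT yearMonth,
           SUM(saleAmount) AS monthlySales,
           AVG(SUM(saleAmount)) OVER (ORDER BY yearMonth ROWS 2 PRECEDING) AS movingAvg,
           SUM(SUM(saleAmount)) OVER (ORDER BY yearMonth ROWS 2 PRECEDING) AS movingSum
    FROM PropertySale
    WHERE branchNo = 'B003'
      AND yearMonth BETWEEN '2008-01' AND '2008-06'
    GROUP BY yearMonth
    ORDER BY yearMonth
""").fetchall()

monthly = [r[1] for r in rows]
moving_avg = [r[2] for r in rows]
moving_sum = [r[3] for r in rows]
```

The inner SUM is the ordinary GROUP BY aggregate per month, and the outer AVG and SUM are window functions applied to those monthly totals. For the first two months the window holds fewer than three rows, so the average is taken over the rows available.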