
Data Warehousing and OLAP

4/15/12

Motivation
In most organizations, data about specific parts of the business is there: lots and lots of data, somewhere, in some form. Data is available, but not information, and not the right information at the right time.

A data warehouse is meant to:

bring together information from multiple sources so as to provide a consistent database source for decision support queries;

off-load decision support applications from the on-line transaction system.

Warehousing

Growing industry: $8 billion in 1998.

Ranges from desktop systems to huge warehouses, e.g. Walmart: 900-CPU, 2,700-disk, 23 TB Teradata system.

Lots of new terms: ROLAP, MOLAP, HOLAP; roll-up, drill-down, slice & dice.
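The OLAP operations named above can be illustrated on a tiny in-memory cube. This is a minimal sketch, assuming a hypothetical fact table of (region, city, category, quarter, sales) rows and hypothetical helper names `aggregate`, `slice_` and `dice`; real ROLAP, MOLAP and HOLAP servers implement these operations very differently.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, city, category, quarter, sales).
DIMS = ("region", "city", "category", "quarter")
FACTS = [
    ("East", "Boston", "Computers", "Q1", 100),
    ("East", "Boston", "Phones", "Q1", 40),
    ("East", "NYC", "Computers", "Q2", 150),
    ("West", "Seattle", "Computers", "Q1", 80),
    ("West", "Seattle", "Phones", "Q2", 60),
    ("West", "LA", "Phones", "Q1", 70),
]


def aggregate(rows, group_by):
    """Sum the sales measure over rows, grouped by the named dimensions."""
    idx = [DIMS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value (trailing _ avoids the builtin)."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose values lie in an allowed set for each named dimension."""
    return [
        r for r in rows
        if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())
    ]


# Drill-down: view sales at the finer city level.
by_city = aggregate(FACTS, ["region", "city"])
# Roll-up: aggregate to the coarser region level of the location hierarchy.
by_region = aggregate(FACTS, ["region"])
# Slice on quarter = Q1, then summarize by category.
q1_by_category = aggregate(slice_(FACTS, "quarter", "Q1"), ["category"])
# Dice: restrict to the West region and the Phones category.
west_phones = aggregate(dice(FACTS, region={"West"}, category={"Phones"}), ["region"])
```

For example, print(by_region) gives {('East',): 290, ('West',): 210}. Roll-up and drill-down are the same grouping at different levels of a dimension hierarchy; slice and dice only filter the cube before grouping.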



The Architecture of Data

Layers of data, from top to bottom:

Business rules: what has been learned from the data
Metadata (database schema): logical model and physical layout of data
Summary data: summaries by who, what, when, where, ...
Operational data: who, what, when, where

Decision Support and OLAP

DSS: information technology to help knowledge workers (executives, managers, analysts) make faster and better decisions, e.g.:

What were the sales volumes by region and by product category in the last year?

How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?

Will a 10% discount increase sales volume sufficiently?

OLAP is an element of decision support systems.
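The first question above is a typical aggregate query. A minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical sales(sale_date, region, category, qty) table and reading "the last year" as one calendar year:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, region TEXT, category TEXT, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2011-03-02", "East", "Computers", 5),
        ("2011-07-19", "East", "Computers", 3),
        ("2011-05-11", "West", "Phones", 7),
        ("2010-12-30", "West", "Phones", 9),  # previous year: excluded below
    ],
)


def sales_by_region_category(conn, year):
    """Total quantity per (region, category) for one calendar year."""
    # ISO-8601 date strings sort chronologically, so a string range works.
    return conn.execute(
        """
        SELECT region, category, SUM(qty)
        FROM sales
        WHERE sale_date >= ? AND sale_date < ?
        GROUP BY region, category
        ORDER BY region, category
        """,
        (f"{year}-01-01", f"{year + 1}-01-01"),
    ).fetchall()


result = sales_by_region_category(conn, 2011)
```

Here result is [('East', 'Computers', 8), ('West', 'Phones', 7)]. Unlike an OLTP transaction, the query reads every matching row and returns a summary rather than touching a few records by key.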

Data Processing Models

There are two basic data processing models:

OLTP: the main aim of OLTP is reliable and efficient processing of a large number of transactions and ensuring data consistency.

OLAP: the main aim of OLAP is efficient multidimensional processing of large data volumes.

Traditional OLTP
Traditionally, DBMSs have been used for on-line transaction processing (OLTP), e.g.:

order entry: pull up order xx-yy-zz and update its status field
banking: transfer $100 from account X to account Y

Characteristics:

clerical data processing tasks
detailed, up-to-date data
structured, repetitive tasks
short transactions are the unit of work
each transaction reads and/or updates a few records
isolation, recovery, and integrity are critical
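The banking example is the classic short OLTP transaction: a few records, updated atomically. A minimal sketch with Python's built-in sqlite3 module, assuming a hypothetical account(id, balance) table whose CHECK constraint stands in for an integrity rule:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 500), ("Y", 20)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one short, atomic transaction."""
    # The connection's context manager commits if the block succeeds and
    # rolls back if an exception escapes it.
    with conn:
        # Credit first, so a failing debit shows the credit being undone.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


transfer(conn, "X", "Y", 100)
print(balances(conn))  # {'X': 400, 'Y': 120}

try:
    transfer(conn, "Y", "X", 1000)  # debit would make Y negative: CHECK fails
except sqlite3.IntegrityError:
    pass
print(balances(conn))  # {'X': 400, 'Y': 120}: the credit to X was rolled back
```

The failed transfer leaves no partial effect, which is the isolation/recovery/integrity guarantee the slide calls critical for OLTP.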


OLTP vs. OLAP

OLTP (On-Line Transaction Processing): describes processing at operational sites.

OLAP (On-Line Analytical Processing): describes processing at the warehouse.

OLTP vs. OLAP

                     OLTP                            OLAP
users                clerk, IT professional          knowledge worker
function             day-to-day operations           decision support
DB design            application-oriented            subject-oriented
data                 current, up-to-date,            historical, summarized,
                     detailed, flat relational,      multidimensional,
                     isolated                        integrated, consolidated
usage                repetitive                      ad-hoc
access               read/write, index/hash          lots of scans
                     on primary key
unit of work         short, simple transaction       complex query
# records accessed   tens                            millions
# users              thousands                       hundreds

What is a Data Warehouse?

A decision support database that is maintained separately from the organization's operational database.

A collection of data that is used primarily in organizational decision making.

"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)

Data Warehouse - Subject Oriented

Subject oriented: oriented to the major subject areas of the corporation that have been defined in the data model.

E.g. for an insurance company: customer, product, transaction or activity, policy, claim, account, and so on.

Operational DBs and applications may be organized differently, e.g. by type of insurance.

Data Warehouse - Integrated

Data comes from heterogeneous sources, and there is no consistency in encoding, naming conventions, ... among the different sources. When data is moved to the warehouse, it is converted.
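The conversion step can be sketched as a per-source mapping onto one warehouse encoding. This is a minimal illustration, assuming three hypothetical source applications that name and encode the same attribute (customer gender) differently; the source names, field names and codes are all invented for the example.

```python
# Hypothetical per-source field names for the same attribute.
FIELD_NAMES = {"appl_a": "sex", "appl_b": "gender_cd", "appl_c": "gender"}

# Hypothetical per-source encodings, each mapped to the warehouse's "M"/"F".
SOURCE_CODES = {
    "appl_a": {"m": "M", "f": "F"},
    "appl_b": {"1": "M", "0": "F"},
    "appl_c": {"male": "M", "female": "F"},
}


def to_warehouse(source, record):
    """Convert one source record to the warehouse's single naming and encoding."""
    raw = str(record[FIELD_NAMES[source]]).strip().lower()
    try:
        gender = SOURCE_CODES[source][raw]
    except KeyError:
        # Unknown codes are rejected rather than loaded inconsistently.
        raise ValueError(f"unknown code {raw!r} from {source}") from None
    return {"customer_id": record["id"], "gender": gender}


rows = [
    to_warehouse("appl_a", {"id": 1, "sex": "f"}),
    to_warehouse("appl_b", {"id": 2, "gender_cd": 1}),
    to_warehouse("appl_c", {"id": 3, "gender": "Male"}),
]
```

After conversion every row uses the same field name and code set, e.g. rows[2] is {'customer_id': 3, 'gender': 'M'}, so queries over the warehouse need not know which source a row came from.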


Data Warehouse - Nonvolatile

Operational data is regularly accessed and manipulated one record at a time, and updates are applied to data in the operational environment. Warehouse data is loaded and accessed; updates of data do not occur in the data warehouse environment.
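The contrast can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `WarehouseTable` class that exposes only load and scan operations; Python cannot truly forbid writes to the internal list, so "no update" is enforced here by the interface, not by the language.

```python
from datetime import date

# Operational store: one current record per account, updated in place.
operational = {"acct-1": {"balance": 500}}
operational["acct-1"]["balance"] = 400  # the old value 500 is lost


class WarehouseTable:
    """Hypothetical load-and-read table: there is no update or delete method."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Loading appends a batch; rows already loaded are never modified.
        self._rows.extend(dict(r) for r in rows)

    def scan(self, predicate=lambda r: True):
        # Reads return copies, so callers cannot alter stored rows.
        return [dict(r) for r in self._rows if predicate(r)]


wh = WarehouseTable()
wh.load([{"load_date": date(2012, 4, 1), "account": "acct-1", "balance": 500}])
wh.load([{"load_date": date(2012, 4, 2), "account": "acct-1", "balance": 400}])
history = [r["balance"] for r in wh.scan(lambda r: r["account"] == "acct-1")]
```

The operational dictionary keeps only the current balance, while the warehouse table still holds both loaded values ([500, 400]).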


Data Warehouse - Time Variance

The time horizon for the data warehouse is significantly longer than that of operational systems.

Operational database: current-value data. Data warehouse data: nothing more than a sophisticated series of snapshots, each taken at some moment in time.

The key structure of operational data may or may not contain some element of time. The key structure of the data warehouse always contains some element of time.
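A key that contains time makes "as of" questions possible. A minimal sketch, assuming a hypothetical snapshot table keyed by (account, snapshot_date) and a hypothetical `balance_as_of` helper; an operational table would key on the account alone and hold only the current value.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical snapshot table: every key includes the snapshot date.
snapshots = {
    ("acct-1", date(2012, 1, 31)): 500,
    ("acct-1", date(2012, 2, 29)): 450,
    ("acct-1", date(2012, 3, 31)): 400,
}


def balance_as_of(table, account, when):
    """Return the value from the latest snapshot taken on or before `when`."""
    dates = sorted(d for (acct, d) in table if acct == account)
    i = bisect_right(dates, when)  # number of snapshots taken on or before `when`
    if i == 0:
        return None  # no snapshot existed yet at that time
    return table[(account, dates[i - 1])]
```

For example, balance_as_of(snapshots, "acct-1", date(2012, 3, 15)) returns 450, the value from the February snapshot.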

Why Separate Data Warehouse?

Performance

Special data organization, access methods, and implementation methods are needed to support the multidimensional views and operations typical of OLAP.

Complex OLAP queries would degrade performance for operational transactions.

The concurrency control and recovery modes of OLTP are not compatible with OLAP analysis.

Why Separate Data Warehouse?

Function

Missing data: decision support requires historical data, which operational DBs do not typically maintain.

Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources: operational DBs, external sources.

Data quality: different sources typically use inconsistent data representations, codes, and formats.
