
Introduction to Data Warehousing
The Architecture of Data
[Figure: layers in the architecture of data. Recoverable labels: rules (what has been learned from data); logical model; database schema (physical layout of data); summary data (summaries by who, what, when, where, ...); operational data (who, what, when, where, ...).]
Why All the Excitement?
 The ability to perform a complete financial analysis of business
processes will allow organizations to make decisions based on an
understanding of the entire system rather than on rough
estimates based on incomplete data.
 The ability to rationalize and automate the process of building an
integrated enterprise-wide information store rather than developing
many individual DSSs and the corresponding infrastructure.
 The hardware, software and storage costs related to the development,
deployment and maintenance of large informational data stores
continue to decline.
 The benefits of data warehousing can be easily extended to
strategic decision-making, which can yield very large and tangible
The Need for Data Warehousing
 Data warehousing is an architectural construct of an
information system that provides users with current and
historical decision support information that is hard to
access in traditional operational data stores.
 Decisions need to be made quickly and correctly, using
all available data.
 Users are business domain experts, not computer
 The amount of data doubles every 18 months, which
affects the response time and the sheer ability to comprehend
its contents.
Computing Paradigm Shift
 The traditional view of computing is that of a user
who accesses a computer via a communication network.
   ** There is a need to access a known computer
   paradigm that resides on a known remote system.
 The new way of computing is that users use
computers to solve problems and to request services
distributed across a network.
   ** Distributed client/server computing is being
   positioned in the enterprise.
Business Paradigm Shift
 The traditional business enterprise moved from the manual
back and front office to an automated back office for
online transaction processing.
 These innovations improved the processing speed
and throughput while having the same business
 Users are demanding more value for the money
and are well aware of the competitive offerings.
The Role of IT
[Figure: the role of IT across stages of the enterprise. Recoverable labels: New Enterprises (Business Process Reengineering, Paperless Office, Expert System); C/S, Distributed Computing; Open System; Multimedia, Object Orientation; Automated Enterprises; Back Office Automation; Proprietary System, Mainframe; Manual Back Office; Manual Front Office.]
 Operational data focuses on transactional functions such
as bank card withdrawals and deposits.
   * This data is part of the corporate infrastructure;
   it is detailed, non-redundant and updateable, and it
   reflects current values.
 Informational data is organized around subjects such as
customer, vendor and product.
 It focuses on providing answers to problems posed by
decision makers.
 It is often summarized, redundant (to support varying data
views) and non-updateable.
 Informational data is obtained from operational data.
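As a minimal sketch of how informational data can be derived from operational data, the Python below summarizes detailed transaction records by subject (here, customer). The record layout, the customer codes and the `summarize_by_customer` helper are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Operational data: detailed, one record per transaction (hypothetical layout).
transactions = [
    {"customer": "C001", "type": "deposit", "amount": 500.0},
    {"customer": "C001", "type": "withdrawal", "amount": 120.0},
    {"customer": "C002", "type": "deposit", "amount": 75.0},
    {"customer": "C001", "type": "deposit", "amount": 30.0},
]


def summarize_by_customer(records):
    """Derive subject-oriented summary rows from detailed transactions."""
    summary = defaultdict(lambda: {"deposits": 0.0, "withdrawals": 0.0, "count": 0})
    for rec in records:
        row = summary[rec["customer"]]
        if rec["type"] == "deposit":
            row["deposits"] += rec["amount"]
        else:
            row["withdrawals"] += rec["amount"]
        row["count"] += 1
    return dict(summary)


# Informational data: one summarized row per customer.
informational = summarize_by_customer(transactions)
for customer, row in sorted(informational.items()):
    print(customer, row)
```

The summary rows are redundant with the detail records by design: they exist to answer questions by subject without rescanning every transaction.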
Operational Systems
 Are organized by application.
 Are update-intensive.
 Use current data.
 Are optimized for high performance.
 Access few records per transaction, often by direct
access through the primary key.
 Support a large number of short transactions.
 Support a large number of concurrent users.
Differences Between OD vs ID
                    Operational Data (OD)        Informational Data (ID)
Data Access         Current value                Summarized, derived
Data Organization   By application               By subject
Data Stability      Dynamic                      Static until refresh
Data Structure      Optimized for transactions   Optimized for complex queries
Access Frequency    High                         Medium to low
Access Type         Read/update/delete           Read/aggregate
Usage               Predictable, repetitive      Ad hoc, unstructured
Response Time       Subsecond (<1 s) to 2-3 s    Several seconds to minutes
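The two access patterns in the table can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the `account_txn` table, its columns and the sample rows are hypothetical, and a real warehouse would hold the informational data in a separate store rather than in the operational table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_txn ("
    " txn_id INTEGER PRIMARY KEY,"
    " account TEXT, txn_month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO account_txn (txn_id, account, txn_month, amount) VALUES (?, ?, ?, ?)",
    [
        (1, "A-1", "2024-01", 100.0),
        (2, "A-2", "2024-01", 40.0),
        (3, "A-1", "2024-02", -25.0),
        (4, "A-2", "2024-02", 10.0),
    ],
)

# Operational style: direct access to one record by primary key, then update it.
row = conn.execute("SELECT amount FROM account_txn WHERE txn_id = ?", (3,)).fetchone()
conn.execute("UPDATE account_txn SET amount = ? WHERE txn_id = ?", (-30.0, 3))

# Informational style: read-only aggregate that scans many records.
monthly_totals = conn.execute(
    "SELECT txn_month, SUM(amount) FROM account_txn"
    " GROUP BY txn_month ORDER BY txn_month"
).fetchall()
print(row, monthly_totals)
```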
Operational Data Store
 An ODS is an architectural concept to support day-to-
day operational decision support; it contains current
value data propagated from operational applications.
 The data maintained in the ODS is subject to
frequent changes as the corresponding data in the
operational systems changes.
 An ODS provides an alternative to operational DSS applications.
   * Operational DSS applications access data directly from the OLTP system.
   * That access has a performance impact on OLTP.
Data Warehouse
A data warehouse can be viewed as an information
system with the following attributes:
 It is a database designed for analytical tasks, using
data from multiple applications.
 Its usage is read-intensive.
 Its contents are periodically updated (mostly by additions).
 It contains current and historical data to provide a
historical perspective of information.
 It contains a few large tables.
 Each query frequently returns a large result set
and involves frequent full table scans and multiple
What is a Data Warehouse
 “A Data Warehouse is a Subject-oriented,
Integrated, Time-variant, and Nonvolatile
collection of data in support of
Management’s Decision-making Process.”
--- W. H. Inmon
 Collection of data that is used primarily in
organizational decision making
 A decision support database that is maintained
separately from the organization’s operational database
Data Warehouse - Subject
 Subject Oriented: oriented to the major subject
areas of the corporation that have been defined
in the data model.
E.g. for an insurance company: customer,
product, transaction or activity, policy, claim,
account, etc.
 Operational DB and applications may be
organized differently.
E.g. based on type of insurance: auto, life,
medical, fire, ...
Data Warehouse - Integrated
 There is no consistency in encoding, naming
conventions, …, among different data sources.
 Heterogeneous data sources.
 When data is moved to the warehouse, it is
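A minimal sketch of the integration problem described above: two source systems name and encode the same attributes differently, and a mapping brings both to a single warehouse convention on load. The source system names, field names and codes are assumptions made up for illustration; they do not come from the slides.

```python
# Hypothetical encodings of the same attribute in two source systems.
POLICY_TYPE_MAPS = {
    "claims_system": {"A": "auto", "L": "life", "M": "medical"},
    "billing_system": {"1": "auto", "2": "life", "3": "medical"},
}

# Hypothetical source-specific names for the customer identifier field.
CUSTOMER_FIELD = {"claims_system": "cust_no", "billing_system": "customer_id"}


def to_warehouse_row(source, record):
    """Convert one source record to the warehouse's naming and encoding."""
    return {
        "customer_id": str(record[CUSTOMER_FIELD[source]]),
        "policy_type": POLICY_TYPE_MAPS[source][str(record["type"])],
        "source": source,
    }


rows = [
    to_warehouse_row("claims_system", {"cust_no": 17, "type": "A"}),
    to_warehouse_row("billing_system", {"customer_id": "17", "type": 1}),
]
print(rows)
```

After conversion, both rows describe customer "17" with policy type "auto", so queries in the warehouse do not need to know which source a row came from.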

Data Warehouse - Non-Volatile

 Operational data is regularly accessed and
manipulated a record at a time, and update is done
to data in the operational environment.
 Warehouse data is loaded and accessed. Update of
data does not occur in the data warehouse.
Data Warehouse - Time
 The time horizon for the data warehouse is
significantly longer than that of operational
   Operational database: current value data.
   Data warehouse data: nothing more than a
   sophisticated series of snapshots, taken as of some
   moment in time.
 The key structure of operational data may or may
not contain some element of time. The key
structure of the data warehouse always contains
some element of time.
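The difference in key structure can be sketched as follows. This is an illustrative model, not a real warehouse loader: the operational store is keyed by account only and is overwritten in place, while the warehouse store is keyed by (account, snapshot date) and only ever receives new entries. The account name, balances and dates are hypothetical.

```python
from datetime import date

# Operational store: keyed by account only, holds the current value.
operational = {}

# Warehouse store: keyed by (account, snapshot date), one entry per snapshot.
warehouse = {}


def update_operational(account, balance):
    """Overwrite the current value in place."""
    operational[account] = balance


def take_snapshot(as_of):
    """Copy current values into the warehouse under a time-stamped key."""
    for account, balance in operational.items():
        warehouse[(account, as_of)] = balance


update_operational("A-1", 100.0)
take_snapshot(date(2024, 1, 31))
update_operational("A-1", 250.0)
take_snapshot(date(2024, 2, 29))

# The operational store knows only the current value; the warehouse keeps history.
history = sorted((key[1], value) for key, value in warehouse.items() if key[0] == "A-1")
print(operational["A-1"], history)
```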
 Current Detail Data: Data that is acquired directly from
the operational databases and often represents an entire
 Old Detail Data: This represents the history of the subject
areas; this data makes trend analysis possible.
 Data Marts: A subset of a Data Warehouse that supports
the requirements of a particular department or business
function.
 Summarized Data: Data that is aggregated along the lines
required for executive-level reporting, trend analysis and
enterprise-wide decision making.
 Drill Down: The ability of a knowledge worker to perform
business analysis in a top-down fashion, traversing the
summarization levels from highly summarized data to the
underlying current or old detail.
 Meta Data: Data about data, containing
    The location and description of warehouse system
    Names, definitions, structure and content of the data
   warehouse and end-user views.
    Identification of authoritative data sources.
    A history of warehouse updates.
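The Summarized Data and Drill Down definitions above can be sketched in a few lines of Python. The year/quarter/month hierarchy, the sample sales figures and the `summarize` and `drill_down` helpers are hypothetical; the point is only that each drill-down step moves one summarization level closer to the detail rows.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, quarter, month, sales amount).
detail = [
    (2024, "Q1", "Jan", 100),
    (2024, "Q1", "Feb", 150),
    (2024, "Q2", "Apr", 200),
    (2024, "Q2", "May", 50),
]


def summarize(rows, depth):
    """Aggregate sales by the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for *keys, amount in rows:
        totals[tuple(keys[:depth])] += amount
    return dict(totals)


def drill_down(rows, path):
    """Show the next, more detailed level beneath one summarized cell."""
    matching = [r for r in rows if tuple(r[:len(path)]) == tuple(path)]
    return summarize(matching, len(path) + 1)


print(summarize(detail, 1))              # highly summarized: by year
print(drill_down(detail, (2024,)))       # one level down: by quarter
print(drill_down(detail, (2024, "Q1")))  # down to the monthly detail
```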
Advantages of Warehousing
• High query performance
• Queries not visible outside warehouse
• Local processing at sources unaffected
• Can operate when sources unavailable
• Can query data not stored in a DBMS
• Extra information at warehouse
– Modify, summarize (store aggregates)
– Add historical information
2-Tier Data Warehouse Arch.
[Figure: two-tier data warehouse architecture. Client functions: query specification, data analysis, report formatting, summarizing, data access. Warehouse server functions: data logic, data services, metadata, file services.]
Multitiered Data Warehouse Arch.
[Figure: multitiered data warehouse architecture. Client: GUI/presentation, query specification, data analysis, report formatting, summarizing, data access. Multidimensional data server: filtering, summarizing, metadata, multidimension. Warehouse server: data logic, data services, metadata, file services. Inputs: external data and operational databases, via data extract.]
The Architecture of Data Warehouse
[Figure: data warehouse architecture. Recoverable labels: 1. Admin, Clean Up; Meta Data; 3. Data Warehouse; 4. Data Marts; 6. Mgmt Environment; 7. Information Delivery; Tools: OLAP tools, data mining tools, report, query and EIS tools. Notes: ** Multi Relational Database, ** Multi Dimensional Database.]

Data Marts
A subset of a Data Warehouse that supports the requirements of a
particular department or business function.
Data Marts

Differences From Data Warehouses
– Focus only on the requirements of users
associated with one business function
– Usually don’t contain detailed operational
– Are more easily understood and navigated
Types of Data Marts



Multidimensional Data Marts
 Contain numeric data
 Contain sparsely populated matrices
 Maintain structure as data enters the framework
 Only changed cells are updated
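One way to picture a sparsely populated matrix in which only changed cells are updated is a dictionary keyed by cell coordinates. The `SparseCube` class below is a hypothetical sketch with assumed dimensions (product, region, month); it is not how any particular multidimensional database stores its data.

```python
class SparseCube:
    """Stores only the populated cells of a (product, region, month) matrix."""

    def __init__(self):
        self.cells = {}

    def set(self, product, region, month, value):
        # Only the addressed cell changes; all other cells are untouched.
        self.cells[(product, region, month)] = value

    def get(self, product, region, month):
        # Unpopulated cells read as zero without being stored.
        return self.cells.get((product, region, month), 0)

    def total(self, product=None, region=None, month=None):
        """Sum all populated cells that match the given coordinates."""
        wanted = (product, region, month)
        return sum(
            value
            for coords, value in self.cells.items()
            if all(w is None or w == c for w, c in zip(wanted, coords))
        )


cube = SparseCube()
cube.set("widget", "east", "Jan", 10)
cube.set("widget", "west", "Jan", 5)
cube.set("gadget", "east", "Feb", 7)
cube.set("widget", "east", "Jan", 12)  # update a single changed cell

print(len(cube.cells), cube.total(product="widget"), cube.get("gadget", "west", "Jan"))
```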

Relational Data Marts
 Contain numeric and textual data
 Maintain structured data
 Employ indexes
 Support star schemas
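A small star schema can be sketched with Python's built-in `sqlite3` module: one fact table of numeric measures surrounded by dimension tables of descriptive text, with an index on a fact-table key. The table names, columns and rows are hypothetical and kept deliberately small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold descriptive (textual) attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_period  (period_id  INTEGER PRIMARY KEY, month TEXT);

    -- The fact table holds numeric measures keyed by the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        period_id  INTEGER REFERENCES dim_period(period_id),
        amount     REAL
    );
    CREATE INDEX idx_sales_product ON fact_sales(product_id);

    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO dim_period  VALUES (1, '2024-01'), (2, '2024-02');
    INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 15.0), (2, 1, 7.0);
    """
)

# A typical star join: aggregate the facts, label them from a dimension.
sales_by_product = conn.execute(
    """
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()
print(sales_by_product)
```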

Networked Data Marts
 Distributed data marts can be accessed
from a single workstation.
 Data needs to be cleansed and
 A standard metadata directory must exist.

The client/server computing model implies
cooperative processing of requests
submitted by a client to the server, which
processes the requests and returns the
results to the client.

[Figure: host-based processing. Labels: Host CPU, Corporate Database.]

 A host-based processing environment does not have
any distributed application capabilities.
 Host-based application processing is performed on
one computer system with attached unintelligent
“dumb” terminals.


[Figure: master-slave processing. Labels: Host CPU (master), Slave CPU (three instances).]

 Next-higher level of distributed application
 Slave computers are attached to a master computer and
perform application-processing-related functions only
as directed by their master.
 Application processing in a master-slave
environment is somewhat distributed but
unidirectional, from master to slave.
Ist Generation C/S Processing




 Found in LANs; also known as a shared-device LAN.
 PCs are attached to a system device that allows
these PCs to share a common resource,
e.g. a file or a printer.
 Such shared devices are called servers,
e.g. a file server or a printer server.
 All application processing is performed on individual
PCs, and only certain functions are distributed.
C/S Processing Model




 This model is an extension of shared-device processing.
 In this model, application processing is divided between
the client and the server.
 The processing is actually initiated and partially
controlled by the client, but not in master/slave fashion.
 Both the client and the server cooperate to successfully
execute the application.
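The division of work described above can be sketched in-process with a thread and queues standing in for the network. This is a simplified illustration, not a networked implementation: the `DATA` dictionary, the region names and the `server` and `client_request` functions are hypothetical. The client initiates each request and formats the reply; the server owns the data and does the aggregation.

```python
import queue
import threading

DATA = {"east": [10, 20, 30], "west": [5, 15]}  # hypothetical server-side data

request_queue = queue.Queue()


def server():
    """Server side: owns the data and does the data-intensive part."""
    while True:
        item = request_queue.get()
        if item is None:  # shutdown signal
            break
        region, reply = item
        values = DATA.get(region, [])
        reply.put({"region": region, "total": sum(values), "count": len(values)})


def client_request(region):
    """Client side: initiates the request and formats the result."""
    reply = queue.Queue()
    request_queue.put((region, reply))
    result = reply.get(timeout=5)
    return f"{result['region']}: total={result['total']} over {result['count']} rows"


worker = threading.Thread(target=server)
worker.start()
report = [client_request("east"), client_request("west")]
request_queue.put(None)
worker.join()
print(report)
```

Unlike the master-slave arrangement, the server here is not directed step by step: it receives a request, decides how to satisfy it and returns only the result.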
IInd Generation C/S Processing
 It deals with servers dedicated to applications, data,
transactions, management, etc.
 Data structures supported by this model range from
relational to multidimensional to structure to
 The IInd generation is characterized by a multitiered
architecture, which promotes migration of the application
logic from the client to the application server in a three-
tiered environment.
Server Function
The following functions are required of a server by users:
1.File Sharing
2.Printer Sharing
3.Database Access
A server must satisfy these requirements:

1.Multi user support

3.Performance and throughput
4.Storage capacity
6.Networking and Communication
