
Data Integration Across Sources

Source        Integration problem
Savings       Same data, different name
Loans         Different data, same name
Trust         Data found here, nowhere else
Credit card   Different keys, same data
DATA WAREHOUSING AND ITS APPLICATIONS IN MODERN BUSINESS
GROUP 10
• Priyanka Jaiswal (90)
• Payal Bhattacharya (87)
• Sayan Chatterjee (96)
• Shubha Shankar (98)
• Vivek Gupta (106)
Course Overview
 Introduction
 What Data Warehouse means
 What is Data Warehousing
 When Data Warehouse Evolved
 About the Data Warehouse Architecture
 Benefits of Data Warehousing
 How Data Warehousing helps in Modern Business
A producer wants to know…

What is the most effective distribution channel?
Data, Data everywhere, yet ...
 I can’t find the data I need
    • data is scattered over the network
    • many versions, subtle differences
 I can’t get the data I need
    • need an expert to get the data
 I can’t understand the data I found
    • available data poorly documented
 I can’t use the data I found
    • results are unexpected
    • data needs to be transformed from one form to another
What are the users saying...
 Data should be integrated across the enterprise
 Summary data has a real value to the organization
 Historical data holds the key to understanding data over time
 What-if capabilities are required
What is a Data Warehouse?

A single, complete and consistent store of data obtained from a variety of different sources made available to end users in an understandable format.

[Barry Devlin]
Data Warehouse
 A data warehouse is a
 subject-oriented
 integrated
 time-varying
 non-volatile
collection of data that is used primarily in
organizational decision making.
-- Bill Inmon, 1996
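
Two of the properties above, time-varying and non-volatile, can be illustrated with a minimal Python sketch. It assumes a hypothetical `BalanceSnapshot` record and an in-memory `Warehouse` class, neither of which comes from the slides: rows are only ever appended with a date, so history is kept and can be read back in time order.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BalanceSnapshot:
    """One dated, read-only fact about a customer (hypothetical schema)."""
    customer_id: int
    as_of: date
    balance: float


class Warehouse:
    """Append-only store: rows are added with a date, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: list[BalanceSnapshot] = []

    def load(self, snapshot: BalanceSnapshot) -> None:
        # Non-volatile: loading only appends, so earlier rows are preserved.
        self._rows.append(snapshot)

    def history(self, customer_id: int) -> list[BalanceSnapshot]:
        # Time-varying: each value is tied to the date it was valid for.
        rows = [r for r in self._rows if r.customer_id == customer_id]
        return sorted(rows, key=lambda r: r.as_of)


warehouse = Warehouse()
warehouse.load(BalanceSnapshot(101, date(2009, 2, 28), 650.0))
warehouse.load(BalanceSnapshot(101, date(2009, 1, 31), 500.0))
warehouse.load(BalanceSnapshot(102, date(2009, 2, 28), 90.0))

# Balance trend for customer 101, oldest first.
trend = [(s.as_of.isoformat(), s.balance) for s in warehouse.history(101)]
print(trend)
```

An operational system would typically overwrite the balance in place; keeping every dated row is what makes analysis over time possible.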

What is Data Warehousing?

A process of transforming data into information and making it available to users in a timely enough manner to make a difference.

[Forrester Research, April 1996]

[Figure: Data → Information]
Data Warehousing -- It is a process
 A technique for assembling and managing data from various sources for the purpose of answering business questions, thus enabling decisions that were not previously possible
 A decision support database maintained separately from the organization’s operational database
History of Data Warehousing

• The concept dates back to the late 1980s.
• It was introduced by IBM researchers Barry Devlin and Paul Murphy.
• It was intended to provide an architectural model for the flow of data from operational systems to decision support environments.
Key Developments in Early Years of Data Warehousing

YEAR    DEVELOPER                           DEVELOPMENT
1960s   General Mills & Dartmouth College   Terms “dimensions & facts”
1970s   ACNielsen & IRI                     Dimensional data marts for retail sales
1983    Teradata                            DBMS designed for decision support
1988    Barry Devlin & Paul Murphy of IBM   Introduced the term “business data warehouse”
1990    Red Brick Systems                   RBS, a database for data warehousing
1991    Prism Solutions                     PWM, software for developing a data warehouse
Explorers, Farmers and Tourists

Tourists: Browse information harvested by farmers
Farmers: Harvest information from known access paths
Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data
Data Warehouse gets its data from:

• Relational Databases: match data using common characteristics found within the data set.
• ERP Systems: intended to manage all the information and functions of a business or company from shared data stores.
• Purchased Data: digital or hard-copy information that has been purchased.
• Legacy Systems: can refer to a database system that a team inherited from previous project owners.
Data Warehouse Architecture

[Figure: Relational Databases, ERP Systems, Purchased Data and Legacy Data pass through Extraction and Cleansing and an Optimized Loader into the Data Warehouse Engine, which serves Analyze and Query]
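
The stages named in the architecture (extraction, cleansing, loading, query) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source record layouts, the `FIELD_MAPS` dictionary and all function names are hypothetical, and a plain list stands in for the warehouse engine.

```python
# Hypothetical source records: the same facts under different field names.
erp_rows = [{"customer_id": 101, "name": "p.kumar", "order_total": 100.0}]
legacy_rows = [{"CUSTNO": "101", "CUSTNAME": " P.Kumar ", "AMT": "15"}]

# Assumed mapping from each source's field names to one warehouse schema.
FIELD_MAPS = {
    "erp": {"customer_id": "cust_id", "name": "cust_name", "order_total": "amount"},
    "legacy": {"CUSTNO": "cust_id", "CUSTNAME": "cust_name", "AMT": "amount"},
}


def extract(source, rows):
    """Extraction: rename source-specific fields to the shared names."""
    mapping = FIELD_MAPS[source]
    for row in rows:
        renamed = {mapping[key]: value for key, value in row.items()}
        renamed["source"] = source
        yield renamed


def cleanse(row):
    """Cleansing: enforce one type and one spelling convention per field."""
    return {
        "cust_id": int(row["cust_id"]),
        "cust_name": str(row["cust_name"]).strip().lower(),
        "amount": float(row["amount"]),
        "source": row["source"],
    }


def load(warehouse, rows):
    """Loader: append cleansed rows to the warehouse table (a list here)."""
    warehouse.extend(rows)


def query_total_by_customer(warehouse):
    """Query: total amount per customer across all sources."""
    totals = {}
    for row in warehouse:
        totals[row["cust_id"]] = totals.get(row["cust_id"], 0.0) + row["amount"]
    return totals


warehouse = []
for source, rows in (("erp", erp_rows), ("legacy", legacy_rows)):
    load(warehouse, [cleanse(r) for r in extract(source, rows)])

totals = query_total_by_customer(warehouse)
print(totals)
```

Once both sources share one key type and one naming convention, a single query can answer a question that neither source could answer alone.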
Data Warehouse Designs

• TOP-DOWN DESIGN
• BOTTOM-UP DESIGN
• HYBRID DESIGN
TOP-DOWN APPROACH

[Figure: hierarchy with NDIM at the top; second level: PLACEMENT, ADMISSION; lower levels: PACKAGE, INTERNSHIPS, FEES, FACULTY, COMPANIES, APPROVAL]
BOTTOM-UP DESIGN

[Figure: hierarchy with NDIM at the top; second level: CLASS, FACULTY, LIBRARY; lower levels: A, B, C, D, ECO, QT, IT, BOOKS, STAFF]
HOW TO STORE DATA...

• DIMENSIONAL APPROACH
• NORMALIZED APPROACH
Dimensional Approach

[Figure: SALES TRANSACTIONS at the center, surrounded by CUSTOMER NAME, PRICE PAID FOR PRODUCTS, ORDER DATE, NO OF PRODUCTS ORDERED and PRODUCT NUMBER]
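
One possible reading of the dimensional picture is a fact table of sales transactions surrounded by dimension tables. The sketch below uses Python's built-in `sqlite3` module; the table names (`fact_sales`, `dim_customer`, `dim_product`), the surrogate keys and the sample rows are assumptions made for illustration, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive context for each sale.
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_number TEXT);

    -- Fact table: one row per sales transaction, holding the numeric measures.
    CREATE TABLE fact_sales (
        customer_key     INTEGER REFERENCES dim_customer(customer_key),
        product_key      INTEGER REFERENCES dim_product(product_key),
        order_date       TEXT,
        products_ordered INTEGER,
        price_paid       REAL
    );
""")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "p.kumar"), (2, "s.shah")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "1001A"), (2, "1002B")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "2009-01-05", 4, 100.0),
    (2, 2, "2009-01-05", 9, 120.0),
    (1, 2, "2009-01-06", 1, 15.0),
])

# Slice the measures by one dimension: totals per product number.
by_product = conn.execute("""
    SELECT p.product_number, SUM(f.products_ordered), SUM(f.price_paid)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_number
    ORDER BY p.product_number
""").fetchall()
print(by_product)
```

Swapping `dim_product` for `dim_customer` in the join gives the same measures per customer, which is the main convenience of this layout for reporting.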
Normalized Approach

SALES TABLE
cust_id  cust_name  prod_id  qty  cost
101      p.kumar    1001A    4    100
102      s.shah     1002B    9    120
103      j.dubey    1001A    1    15
104      t.jain     1001B    10   200
105      g.thakur   1031T    5    70
106      m.mathur   1006M    3    40

CUSTOMER TABLE
cust_id  cust_name
101      p.kumar
102      s.shah
103      j.dubey
104      t.jain
105      g.thakur
106      m.mathur

PRODUCT TABLE
prod_id  prod_name  price/kg
1001A    surf       52
1002B    tea        60
1001A    rice       120
1001B    wheat      100
1031T    flour      140
1006M    sugar      35
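
The split from one wide sales table into separate tables can be sketched with Python's built-in `sqlite3` module. The example below takes three rows in the shape of the SALES TABLE, stores each customer name once in a `customer` table, and shows that a join rebuilds the flat view. The column constraints and the choice to leave out the product table are assumptions for brevity.

```python
import sqlite3

# Rows in the shape of the SALES TABLE: (cust_id, cust_name, prod_id, qty, cost).
flat_sales = [
    (101, "p.kumar", "1001A", 4, 100),
    (102, "s.shah", "1002B", 9, 120),
    (103, "j.dubey", "1001A", 1, 15),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT NOT NULL);
    CREATE TABLE sales (
        cust_id INTEGER NOT NULL REFERENCES customer(cust_id),
        prod_id TEXT NOT NULL,
        qty     INTEGER NOT NULL,
        cost    INTEGER NOT NULL
    );
""")

# Store each customer name once; sales rows keep only the cust_id key.
conn.executemany("INSERT OR IGNORE INTO customer VALUES (?, ?)",
                 [(cid, name) for cid, name, _, _, _ in flat_sales])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                 [(cid, pid, qty, cost) for cid, _, pid, qty, cost in flat_sales])

# A join rebuilds the original flat view when a report needs it.
rebuilt = conn.execute("""
    SELECT s.cust_id, c.cust_name, s.prod_id, s.qty, s.cost
    FROM sales AS s
    JOIN customer AS c ON c.cust_id = s.cust_id
    ORDER BY s.cust_id
""").fetchall()
print(rebuilt == flat_sales)
```

The trade-off is the usual one: the normalized layout avoids repeating names in every sales row, at the cost of a join whenever the combined view is needed.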
Types of Data Warehousing Applications
 Personal Productivity: such as spreadsheets, statistical packages and graphics tools.
 Query and Reporting: deliver warehouse-wide data access.
 Planning and Analysis: address budgeting, forecasting, product line and customer profitability, sales analysis.
Application Areas

Industry                 Application
Finance                  Credit Card Analysis
Insurance                Claims, Fraud Analysis
Telecommunication        Call record analysis
Transport                Logistics management
Consumer goods           Promotion Analysis
Data Service providers   Value added data
Utilities                Power usage analysis
Benefits of Data Warehousing
1. Has a subject area orientation
2. Integrates data from multiple, diverse sources
3. Allows for analysis of data over time
4. Adds ad hoc reporting and enquiry
5. Provides analysis capabilities to decision makers
6. Relieves the development burden on IT
7. Provides improved performance for complex analytical queries
8. Relieves processing burden on transaction-oriented databases
9. Allows for a continuous planning process
Disadvantages of Data Warehousing

1. Not the optimal environment for unstructured data.
2. Maintenance costs are high.
3. Gets outdated relatively quickly.
4. Duplicate, expensive functionality may be developed.
Future of Data Warehousing
 By 2012, business units will control at least 40 percent of the total budget for business intelligence.
 By 2010, 20 percent of organizations will have an industry-specific analytic application delivered.
 In 2009, collaborative decision making will emerge as a new product that combines social software with business intelligence platform capabilities.
 By 2012, one-third of analytic applications applied to business processes will be delivered through coarse-grained application mashups.