DATA WAREHOUSING

PRESENTED BY:
OUTLINE:
• WHAT IS A DATA WAREHOUSE?
• DATA WAREHOUSING DEFINITION
• HISTORY OF DATA WAREHOUSING
• FACTS ABOUT DATA WAREHOUSING
• CHARACTERISTICS OF DATA WAREHOUSING
• USAGE AND TRENDS
• ARCHITECTURE OF DATA WAREHOUSE
• DBMS VS DATA WAREHOUSE
What Is a Data Warehouse?
A data warehouse is a powerful database model that significantly enhances the user's ability to quickly analyze large, multidimensional data sets. It cleanses and organizes data so that users can make business decisions based on facts. Hence, the data in the data warehouse must have …
Data Warehousing Definition:
• Data warehousing is the process of gathering data from multiple sources into a central repository, called a data warehouse.
• According to William H. Inmon, a leading architect in the construction of data warehouse systems, "A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process."
• "A data warehouse is simply a single, complete, and consistent store of data obtained from a …
Data Warehouses:
• Data are spread across several databases, physically located at numerous sites.
• A data warehouse is a repository of multiple databases under a single schema; it resides at a single site.
• Data warehousing processes:
1. Data cleaning
2. Data integration
3. Data transformation
4. Data loading
5. Periodic data refreshing
[Figure: Data warehouse diagram. Data sources in Vancouver, New York and Chicago are cleaned, transformed, integrated and loaded into the data warehouse; clients access it through query and analysis tools.]
• Data cleaning: filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies.
• Data integration: integrating multiple databases, data cubes, or files.
• Data transformation: converting data from legacy or host format to warehouse format.
• Load: sort, summarize and consolidate; compute views; check integrity; build indices and partitions.
• Refresh: propagate updates from the data sources to the warehouse.
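The five processes above can be illustrated with a minimal Python sketch. It is an illustration only, not a production ETL design: the two sources, their field names (`prod`, `product_code`, `amount`, `sale`) and all functions are hypothetical. Cleaning fills missing amounts with the mean and drops duplicate rows; integration and transformation map both sources onto one schema and normalize city names; loading sorts and summarizes by (product, city); refresh pushes only the new rows through the same steps.

```python
from statistics import mean

# Hypothetical raw records from two operational sources with different schemas.
SOURCE_A = [
    {"prod": "P-1", "amount": "120.0", "city": "vancouver"},
    {"prod": "P-2", "amount": None, "city": "vancouver"},  # missing value
]
SOURCE_B = [
    {"product_code": "1", "sale": 80.0, "city": "New York"},
    {"product_code": "1", "sale": 80.0, "city": "New York"},  # duplicate row
]


def clean(rows, amount_key):
    """Data cleaning: fill missing amounts with the mean, drop exact duplicates."""
    known = [float(r[amount_key]) for r in rows if r[amount_key] is not None]
    fill = mean(known) if known else 0.0
    seen, out = set(), []
    for row in rows:
        row = dict(row)
        row[amount_key] = fill if row[amount_key] is None else float(row[amount_key])
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


# Integration + transformation: map each source onto one warehouse schema
# and resolve inconsistent city spellings.
def transform_a(row):
    return {"product_id": int(row["prod"].split("-")[1]),
            "amount": row["amount"], "city": row["city"].title()}


def transform_b(row):
    return {"product_id": int(row["product_code"]),
            "amount": row["sale"], "city": row["city"].title()}


def load(warehouse, rows):
    """Load: sort, then summarize amounts per (product_id, city)."""
    for row in sorted(rows, key=lambda r: (r["product_id"], r["city"])):
        key = (row["product_id"], row["city"])
        warehouse[key] = warehouse.get(key, 0.0) + row["amount"]


def run_pipeline(warehouse, rows_a, rows_b):
    integrated = ([transform_a(r) for r in clean(rows_a, "amount")]
                  + [transform_b(r) for r in clean(rows_b, "sale")])
    load(warehouse, integrated)
    return warehouse


warehouse = run_pipeline({}, SOURCE_A, SOURCE_B)
# Refresh: propagate only the rows that arrived at a source since the last load.
run_pipeline(warehouse, [{"prod": "P-1", "amount": "30.0", "city": "Vancouver"}], [])
print(warehouse)
# {(1, 'New York'): 80.0, (1, 'Vancouver'): 150.0, (2, 'Vancouver'): 120.0}
```

Refreshing with only the delta rows matters here because `load` adds to existing totals; re-loading the full sources would double-count.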
• Data in a data warehouse are organized around major subjects.
• Data provide information from a historical perspective, summarized on a periodic dimension.
o E.g. sales of an item for a region in a period.
• Data warehouse model: a multidimensional database structure, or data cube.
o Dimensions: attributes or sets of attributes
o Facts: aggregated measures
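The dimension/fact split above can be sketched as a tiny data cube in Python. The items, regions, quarters and sales figures are invented for illustration, and using `"*"` to mean "all values of this dimension" is an assumed convention, not something the slides specify.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (item, region, quarter, sales).
# item, region and quarter are dimensions; sales is the measure.
FACTS = [
    ("TV", "East", "Q1", 100),
    ("TV", "East", "Q1", 50),
    ("TV", "West", "Q2", 70),
    ("Radio", "East", "Q2", 30),
]


def build_cube(facts):
    """Return {(item, region, quarter): total_sales}; '*' aggregates over a dimension."""
    cube = defaultdict(int)
    for item, region, quarter, sales in facts:
        # Each fact contributes to every mix of "specific value" and "all" ('*').
        for i, r, q in product((item, "*"), (region, "*"), (quarter, "*")):
            cube[(i, r, q)] += sales
    return dict(cube)


cube = build_cube(FACTS)
print(cube[("TV", "East", "Q1")])  # 150: sales of an item for a region in a period
print(cube[("TV", "*", "*")])      # 220: all TV sales
print(cube[("*", "*", "*")])       # 250: grand total
```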
History of Data Warehousing
• The concept of data warehousing dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse".
• In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments.
Facts about Data Warehousing:
• Issues involved in warehousing include techniques for dealing with errors and techniques for efficient storage and indexing of large volumes of data.
• A data warehouse is used for reporting and data analysis.
• It usually contains historical data derived from transaction data.
• Data warehousing is not meant for current "live" data.
Components of a Data Warehouse
• Sources: data source interaction
• Data transformation
• Data warehouse (data storage)
• Reporting (data presentation)
• Metadata
[Figure: Components of a data warehouse. Data sources and operational systems feed the data warehouse, which supports business intelligence, executive dashboards, and interactive query and reporting.]
Data Warehouse Advantages
Complete control over the four main areas of data management systems:
• Clean data
• Query processing: multiple options
• Indexes: multiple types
• Security: data and access
Data Warehousing Disadvantages
• Adding new data sources takes time and incurs high cost.
• Data owners lose control over their data, raising ownership, security and privacy issues.
• Long initial implementation time and associated high cost.
• Difficult to accommodate changes in data types and ranges, data source schemas, indexes and queries.
Characteristics of Data Warehousing:
• Subject-oriented: A data warehouse can be used to analyze a particular subject area.
For example, "sales" can be a particular subject.
• Integrated: A data warehouse integrates data from multiple data sources.
For example, source A and source B may have different ways of identifying a product, but in a data warehouse there will be only a single way of identifying a product.
• Time-variant: Historical data is kept in a data warehouse.
For example, one can retrieve data from the last 3 months, 6 months, 12 months, or even older data from a data warehouse.
• Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.
• It must be optimized for access to very large amounts of data.
• It is based on a client-server architecture.
• It is capable of handling dynamic matrices.
• It maintains transparency.
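A minimal Python sketch of the integrated, time-variant and non-volatile characteristics, under assumed data: the SKU strings, numeric codes and the mapping tables between them are hypothetical. Both sources are mapped onto one product identifier, rows keep their dates so history can be queried over the last 3, 6 or 12 months, and the loaded rows are stored in a tuple so the program cannot modify them afterwards.

```python
from datetime import date

# Hypothetical source data: source A identifies products by SKU string,
# source B by numeric code. Rows are (product key, sale date, quantity).
A_ROWS = [("SKU-TV", date(2024, 1, 15), 5), ("SKU-RADIO", date(2024, 5, 3), 2)]
B_ROWS = [(101, date(2024, 5, 20), 3), (101, date(2023, 9, 1), 4)]

# Integrated: one warehouse-wide identifier per product (mapping is assumed).
A_TO_PRODUCT = {"SKU-TV": "TV", "SKU-RADIO": "RADIO"}
B_TO_PRODUCT = {101: "TV", 102: "RADIO"}


def integrate(a_rows, b_rows):
    rows = [(A_TO_PRODUCT[sku], day, qty) for sku, day, qty in a_rows]
    rows += [(B_TO_PRODUCT[code], day, qty) for code, day, qty in b_rows]
    return rows


# Non-volatile: the loaded history is stored as an immutable tuple of tuples.
WAREHOUSE = tuple(integrate(A_ROWS, B_ROWS))


def months_between(earlier, later):
    """Whole calendar months between two dates (ignores the day of month)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def sales_in_last(rows, product, months, today):
    """Time-variant query: total quantity of `product` over the last `months` months."""
    return sum(qty for prod, day, qty in rows
               if prod == product and months_between(day, today) < months)


TODAY = date(2024, 6, 30)
for months in (3, 6, 12):
    print(months, sales_in_last(WAREHOUSE, "TV", months, TODAY))
# 3 3
# 6 8
# 12 12
```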
DATA WAREHOUSE USAGE:
• Three kinds of data warehouse applications:
1) Information processing: supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs.
2) Analytical processing:
• Multidimensional analysis of data warehouse data
• Supports basic OLAP operations: slice and dice, drilling, pivoting
3) Data mining:
• Knowledge discovery from hidden patterns
• Supports finding associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools.
• Differences among the three tasks
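The basic OLAP operations listed under analytical processing can be sketched over a handful of in-memory facts. The facts, the city → country concept hierarchy and the function names are all hypothetical; real OLAP servers run these operations over much larger cubes. Roll-up is shown as aggregating from city up to country, and drill-down as returning to the finer (country, city) level.

```python
from collections import defaultdict

# Hypothetical facts: (city, country, quarter, item, sales).
FACTS = [
    ("Vancouver", "Canada", "Q1", "TV", 100),
    ("Toronto", "Canada", "Q1", "TV", 60),
    ("Vancouver", "Canada", "Q2", "Radio", 40),
    ("New York", "USA", "Q1", "TV", 90),
    ("Chicago", "USA", "Q2", "Radio", 20),
]
FIELDS = ("city", "country", "quarter", "item", "sales")
rows = [dict(zip(FIELDS, fact)) for fact in FACTS]


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose dimension values fall in the given sets."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def aggregate(rows, *dims):
    """Sum sales grouped by `dims`: coarser dims = roll-up, finer dims = drill-down."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)


def pivot(rows, row_dim, col_dim):
    """Pivot: rotate the view into a row_dim x col_dim table of sales totals."""
    table = defaultdict(dict)
    for (row_val, col_val), total in aggregate(rows, row_dim, col_dim).items():
        table[row_val][col_val] = total
    return dict(table)


print(aggregate(rows, "country"))            # roll-up: {('Canada',): 200, ('USA',): 110}
print(aggregate(rows, "country", "city"))    # drill-down to city level
print(len(slice_(rows, "quarter", "Q1")))   # 3
print(len(dice(rows, country={"Canada"}, item={"TV"})))  # 2
print(pivot(rows, "item", "quarter"))        # {'TV': {'Q1': 250}, 'Radio': {'Q2': 60}}
```

The trailing underscore in `slice_` only avoids shadowing Python's built-in `slice`.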
TRENDS IN DATA WAREHOUSING
• In the next few years, data warehousing is expected to make big strides in software, especially for optimizing queries:
o indexing very large tables
o enhancing SQL
o improving data compression
o expanding dimensional modeling
• Real-Time Data Warehousing
• Multiple Data Types
• Adding Unstructured Data
• Searching Unstructured Data
• Spatial Data
• Data Visualization
• Major Visualization Trends
• Visualization Types
• Advanced Visualization Techniques
• Chart Manipulation
• Drill Down
• Advanced Interaction
Architecture of Data Warehouse
[Fig: A three-tier data warehousing architecture. Top tier: front-end tools (query/report, analysis, data mining). Middle tier: OLAP server (OLAP engine). Bottom tier: data warehouse server, fed by back-end tools that perform data cleaning, data integration, transform, load and refresh, drawing on operational databases and external sources.]
1) Bottom tier: The bottom tier is a warehouse database server that is almost always a relational database system.
• Back-end tools and utilities are used to feed data into the bottom tier from operational databases or other external sources. These tools and utilities perform data extraction, cleaning and transformation, as well as load and refresh functions to update the data warehouse.
• The data are extracted using application program interfaces known as gateways.
• Examples of gateways are ODBC (Open Database Connectivity) and OLE DB (Object Linking and Embedding, Database) by Microsoft, and JDBC (Java Database Connectivity).
• This tier also contains a metadata repository, which stores information about the data warehouse and its contents.
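Python's built-in `sqlite3` module follows the DB-API 2.0 interface (PEP 249), which plays a role similar to the ODBC/JDBC gateways mentioned above: client code sends SQL through a standard interface. The sketch below uses two in-memory SQLite databases as stand-ins for an operational database and the bottom-tier warehouse server; the table names and the `load_metadata` table (a toy metadata repository) are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Stand-in operational database (in-memory; table and columns are hypothetical).
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, qty INTEGER)")
oltp.executemany("INSERT INTO orders (product, qty) VALUES (?, ?)",
                 [("tv", 2), ("radio", 1), ("tv", 3)])

# Stand-in warehouse with a fact table and a tiny metadata repository.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales_fact (product TEXT, total_qty INTEGER)")
dw.execute("CREATE TABLE load_metadata "
           "(source TEXT, target TEXT, row_count INTEGER, loaded_at TEXT)")


def extract_transform_load(src, dst):
    """Extract through the DB-API interface, summarize, load, and record metadata."""
    rows = src.execute(
        "SELECT UPPER(product), SUM(qty) FROM orders "
        "GROUP BY UPPER(product) ORDER BY 1"
    ).fetchall()
    with dst:  # commits the fact rows and the metadata row together
        dst.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
        dst.execute("INSERT INTO load_metadata VALUES (?, ?, ?, ?)",
                    ("orders", "sales_fact", len(rows),
                     datetime.now(timezone.utc).isoformat()))
    return len(rows)


extract_transform_load(oltp, dw)
print(dw.execute("SELECT * FROM sales_fact").fetchall())  # [('RADIO', 1), ('TV', 5)]
```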
2) Middle tier: The middle tier is an OLAP server that is typically implemented using either:
a) A relational OLAP (ROLAP) model, that is, an extended relational DBMS that maps operations on multidimensional data to standard relational operations. It acts as an intermediate server between the relational back-end server and the client front-end tools.
b) A multidimensional OLAP (MOLAP) model, that is, a special-purpose server that directly implements multidimensional data and operations. It supports multidimensional views.
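To contrast the two middle-tier options, the following sketch answers the same "sales by item and quarter" question in a ROLAP style (facts stay in a relational table and the request is translated into SQL aggregation, here using SQLite) and in a MOLAP style (facts pre-aggregated into a dense array indexed by dimension position). The data and names are hypothetical, and real ROLAP/MOLAP servers are far more elaborate.

```python
import sqlite3

# Hypothetical dimension members and fact rows (item, quarter, amount).
ITEMS = ["TV", "Radio"]
QUARTERS = ["Q1", "Q2"]
FACTS = [("TV", "Q1", 100), ("TV", "Q2", 70), ("Radio", "Q1", 30), ("TV", "Q1", 20)]

# ROLAP style: facts stay relational; each cell request becomes an SQL aggregate.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (item TEXT, quarter TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", FACTS)


def rolap_total(item, quarter):
    return db.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE item = ? AND quarter = ?",
        (item, quarter),
    ).fetchone()[0]


# MOLAP style: pre-aggregate into a dense 2-D array; a cell lookup is array access.
cube = [[0] * len(QUARTERS) for _ in ITEMS]
for item, quarter, amount in FACTS:
    cube[ITEMS.index(item)][QUARTERS.index(quarter)] += amount


def molap_total(item, quarter):
    return cube[ITEMS.index(item)][QUARTERS.index(quarter)]


# Both styles must agree on every cell of the cube.
for item in ITEMS:
    for quarter in QUARTERS:
        assert rolap_total(item, quarter) == molap_total(item, quarter)
print(molap_total("TV", "Q1"))  # 120
```

The dense array is fast to read but stores a cell for every combination, even empty ones such as (Radio, Q2); the relational table stores only the facts that exist.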

3) Top tier: The top tier is a front-end client layer, which contains query and reporting tools, analysis tools, and/or data mining tools.
• Note:
OLAP - Online Analytical Processing:
► This is the major task of a data warehousing system.
► Useful for complex data analysis and decision making.
► Market-oriented: used by managers, executives and data analysts.
DBMS VS Data Warehousing
In today's corporate world, every business enterprise, no matter how big or small, requires a database. The more the business grows, the more urgent the requirement for a database becomes. The database is required to keep track of the growth of a business over a specific period.
DBMS: A DBMS is sometimes called the database manager, although the abbreviation stands for database management system.
• It is basically a suite of computer programs devoted to managing an organization's database.
• It is a complete and comprehensive methodology used for specific purposes, such as the overall management of digital databases, creation and maintenance of data, searching, and other operations relating to the database.
DATA WAREHOUSE: A data warehouse is usually a place where various types of databases are stored, mainly for the purposes of security, archival, analysis and storage.
• The data warehouse consists of either one or several computer systems that are networked together to form a single computer system.
• The data warehouse is a database of a different kind: an OLAP (online analytical processing) database. A data warehouse exists as a layer on top of another database or databases (usually OLTP databases).
• In a DBMS, OLTP (online transaction processing) is used. Analysis is difficult here because the data change day by day.
• In data warehousing, OLAP (online analytical processing) is used. It maintains historical data, collects data from different databases such as Oracle, and is used for analysis and report generation.
• The key difference between a DBMS and a data warehouse is that a data warehouse can be treated as a type of database, or a kind of database which provides special facilities for analysis and reporting, while a DBMS is the overall system which manages a certain …

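The OLTP/OLAP contrast above can be sketched with SQLite: an OLTP-style table keeps only the current balance and is updated in place, while a warehouse-style history table appends a dated snapshot each day and is never updated. The table names, dates and balances are hypothetical. Only the history table can answer a question such as "what was the highest balance?", because the OLTP row has already been overwritten.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# OLTP-style table: one current row per account, updated in place.
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
# Warehouse-style table: a dated snapshot is appended each day, never updated.
db.execute("CREATE TABLE account_history (id INTEGER, day TEXT, balance INTEGER)")
db.execute("INSERT INTO account VALUES (1, 100)")


def end_of_day(day, new_balance):
    """Apply the day's transactions (OLTP update), then snapshot into history."""
    with db:  # commit both statements together
        db.execute("UPDATE account SET balance = ? WHERE id = 1", (new_balance,))
        db.execute("INSERT INTO account_history "
                   "SELECT id, ?, balance FROM account WHERE id = 1", (day,))


end_of_day("2024-01-01", 100)
end_of_day("2024-01-02", 150)
end_of_day("2024-01-03", 90)

print(db.execute("SELECT balance FROM account").fetchall())              # [(90,)]
print(db.execute("SELECT MAX(balance) FROM account_history").fetchone())  # (150,)
```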
Thank you for your attention!
