
Data Warehousing

Data warehouse Architecture

Three Tier Data Warehouse Architecture
• Generally, a data warehouse adopts a three-tier architecture.

• Bottom Tier - The bottom tier of the architecture is the
data warehouse database server, which is almost always a
relational database system. Back-end tools and utilities
feed data into the bottom tier by performing the extract,
clean, load, and refresh functions.
• Middle Tier - The middle tier is an OLAP server, which
can be implemented in either of the following ways.
– By a Relational OLAP (ROLAP) model, which is an extended
relational database management system that maps operations on
multidimensional data to standard relational operations.
– By a Multidimensional OLAP (MOLAP) model, which directly
implements multidimensional data and operations.
• Top Tier - This tier is the front-end client layer. It
holds the query and reporting tools, analysis tools,
and data mining tools.
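
The three tiers can be illustrated with a minimal Python sketch. It assumes an in-memory SQLite database as a stand-in for the bottom-tier server; the `sales` table, the `total_by` function, and the `report` function are hypothetical names chosen for this example, and the middle tier is only a ROLAP-style approximation that maps a dimensional request onto a relational GROUP BY.

```python
import sqlite3

# Bottom tier: a relational database holding warehouse data (in-memory here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Widget", 100.0), ("North", "Gadget", 50.0), ("South", "Widget", 70.0)],
)

# Middle tier: a ROLAP-style function that turns a dimensional request
# into a relational GROUP BY query.
ALLOWED_DIMENSIONS = {"region", "product"}

def total_by(dimension):
    # Only whitelisted column names are interpolated into the SQL text.
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    sql = (
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(sql).fetchall()

# Top tier: a front-end reporting tool that formats the result for the user.
def report(dimension):
    return [f"{key}: {total:.2f}" for key, total in total_by(dimension)]

print("\n".join(report("region")))
```

A MOLAP server would instead keep the data in a multidimensional array structure, which this sketch does not attempt to show.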

Typical architecture of a data warehouse
Data Warehouse Components
• Operational data store
• Load Manager
• Warehouse Manager
• Query Manager
• End User Access Tool

Operational Data Stores (ODS)
• The data in a data warehouse comes from operational systems of
the organization as well as from other external sources. These
are collectively referred to as source systems.
• Data for the data warehouse is supplied from:
– Operational data held in network databases.
– Departmental data held in proprietary file systems such as
VSAM (Virtual Storage Access Method).
– Private data held on workstations and private servers.
– External systems such as the Internet and commercially available
databases.
– Databases associated with an organization’s suppliers or
customers.

Data Warehouse Components
• Load Manager
– This component performs the operations
required by the extract and load process.
– The size and complexity of the load manager
vary from one data warehouse solution to
another.

Load Manager Architecture

• The load manager extracts data from different sources, performs
simple transformations into a structure similar to the one in the
data warehouse, and loads the result into a temporary data store.
• The load manager performs the following functions:
– Identification of data.
– Validation of data for accuracy.
– Extraction of data from the original source.
– Cleansing of data by eliminating meaningless values and
making it usable.
– Data formatting.
– Data standardization by bringing values into a consistent form.
– Data merging by taking data from different sources and
consolidating it in one place.
– Establishing referential integrity.
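
Several of the functions listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real load manager: the two source extracts, their field names, the set of "meaningless" values, and the validation rule (a numeric customer id) are all invented for the example.

```python
# Hypothetical source extracts; field names and values are illustrative only.
crm_rows = [
    {"cust_id": "1", "name": " alice ", "country": "uk"},
    {"cust_id": "2", "name": "N/A", "country": "US"},
]
billing_rows = [
    {"cust_id": "3", "name": "Carol", "country": "us"},
    {"cust_id": "x", "name": "Dave", "country": "UK"},
]

MEANINGLESS = {"", "N/A", "NULL"}

def validate(row):
    # Validation: the customer id must be numeric.
    return row["cust_id"].isdigit()

def cleanse(row):
    # Cleansing: trim whitespace and replace meaningless values with None.
    return {k: (None if v.strip() in MEANINGLESS else v.strip()) for k, v in row.items()}

def standardize(row):
    # Formatting and standardization: typed id, title-case names,
    # upper-case country codes.
    return {
        "cust_id": int(row["cust_id"]),
        "name": row["name"].title() if row["name"] else None,
        "country": row["country"].upper() if row["country"] else None,
    }

def load(*sources):
    # Merging: consolidate all sources into one temporary (staging) store,
    # skipping rows that fail validation.
    staging = []
    for source in sources:
        for row in source:
            if validate(row):
                staging.append(standardize(cleanse(row)))
    return staging

staging = load(crm_rows, billing_rows)
```

Here `staging` plays the role of the temporary data store; the row with the non-numeric id is rejected during validation rather than loaded.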

Data Warehouse Components
• The warehouse manager is the centre of the data
warehousing system. Data within the warehouse is
organized so that it is easy to find and use, and so
that it can be refreshed frequently from its sources.
– A warehouse manager is responsible for the warehouse
management process.
– It consists of third-party system software, C
programs, and data management tools.
– The size and complexity of warehouse managers vary
between specific solutions.

Warehouse Manager Architecture

• Operations Performed by Warehouse
Manager
– Analysis of data to ensure consistency
– Transformation and merging of source data
from temporary storage into data warehouse
tables
– Creation of indexes and views on base tables
– Generation of de-normalizations (if necessary)
– Generation of aggregations (if necessary)
– Backing-up and archiving data.
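
The operations above can be sketched with Python's built-in sqlite3 module. The schema (`staging_sales`, `fact_sales`, `agg_daily_sales`) and the sample rows are assumptions made for this example; a production warehouse manager would run equivalent statements against its own RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Temporary store populated by the load manager (hypothetical schema).
    CREATE TABLE staging_sales (sale_date TEXT, store TEXT, amount REAL);
    INSERT INTO staging_sales VALUES
        ('2024-01-01', 'S1', 10.0),
        ('2024-01-01', 'S2', 20.0),
        ('2024-01-02', 'S1', 5.0);

    -- Transform and merge staged rows into a warehouse base table.
    CREATE TABLE fact_sales (sale_date TEXT, store TEXT, amount REAL);
    INSERT INTO fact_sales SELECT sale_date, store, amount FROM staging_sales;

    -- Create an index and a view on the base table.
    CREATE INDEX idx_fact_sales_date ON fact_sales (sale_date);
    CREATE VIEW v_store_s1 AS SELECT * FROM fact_sales WHERE store = 'S1';

    -- Generate an aggregation (summary table).
    CREATE TABLE agg_daily_sales AS
        SELECT sale_date, SUM(amount) AS total
        FROM fact_sales GROUP BY sale_date;
""")

daily = conn.execute(
    "SELECT sale_date, total FROM agg_daily_sales ORDER BY sale_date"
).fetchall()
print(daily)
```

Consistency analysis and backup are left out of the sketch; they depend heavily on the chosen database product.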

Query Manager
• The query manager component provides
end-users with access to the stored
warehouse information through the use of
specialized end-user tools. These access
tools fall into various categories, such as
query and reporting, on-line analytical
processing (OLAP), statistics, data discovery,
and graphical and geographical information
systems.
Query Manager
• Query Manager
– Query manager is responsible for directing the
queries to the suitable tables.
• By directing the queries to appropriate tables, the
speed of querying and response generation can be
increased.
– Query manager is responsible for scheduling
the execution of the queries posed by the user.
– This component is typically constructed using
end-user data access tools and data warehouse
monitoring tools.
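
Directing a query to a suitable table can be sketched as a lookup over a catalogue of tables. The catalogue below, its table names, and the `route` function are hypothetical; the sketch assumes tables are listed from the smallest (most aggregated) to the largest, so the first match is also the cheapest to scan.

```python
# Hypothetical catalogue: each table lists the columns it can answer
# queries on, ordered from the smallest (most aggregated) table to the largest.
TABLES = [
    ("agg_monthly_sales", {"month", "total"}),
    ("agg_daily_sales", {"sale_date", "month", "total"}),
    ("fact_sales", {"sale_date", "month", "store", "product", "total"}),
]

def route(requested_columns):
    """Return the first (smallest) table that covers all requested columns."""
    needed = set(requested_columns)
    for table, columns in TABLES:
        if needed <= columns:
            return table
    raise LookupError(f"no table provides {sorted(needed)}")

print(route(["month", "total"]))   # answered from the monthly summary
print(route(["store", "total"]))   # needs the detailed fact table
```

A query that only needs monthly totals never touches the detailed fact table, which is where the speed-up described above comes from.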
Query Manager Architecture

• The query manager includes the following:
– Query redirection via C tool or RDBMS
– Stored procedures
– Query management tool
– Query scheduling via C tool or RDBMS
– Query scheduling via third-party software
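
Query scheduling can be illustrated with a small priority queue. This is only a sketch of the idea, written in Python rather than C or RDBMS facilities; the `QueryScheduler` class, the priority convention (lower number runs first), and the sample queries are assumptions for the example.

```python
import heapq
import itertools

class QueryScheduler:
    """Run queued queries in priority order; a lower number runs first."""

    def __init__(self):
        self._queue = []
        # Tie-breaker so that equal priorities keep submission order.
        self._counter = itertools.count()

    def submit(self, sql, priority=10):
        heapq.heappush(self._queue, (priority, next(self._counter), sql))

    def run_all(self, execute):
        # Pop queries in priority order and pass each to the executor.
        results = []
        while self._queue:
            _, _, sql = heapq.heappop(self._queue)
            results.append(execute(sql))
        return results

scheduler = QueryScheduler()
scheduler.submit("SELECT * FROM fact_sales", priority=20)
scheduler.submit("SELECT total FROM agg_monthly_sales", priority=1)
scheduler.submit("SELECT total FROM agg_daily_sales", priority=1)

# The executor here just echoes the SQL text to show the execution order.
order = scheduler.run_all(lambda sql: sql)
```

The expensive full-table query is deferred until the cheap summary queries have run.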

Metadata
• This area of the warehouse stores all the metadata
(data about data).
• Metadata is used for a variety of purposes
including:
– the extraction and loading processes – metadata is
used to map data sources to a common view of the data
within the warehouse
– the warehouse management process – metadata is
used to automate the production of summary tables
– as part of the query management process – metadata
is used to direct a query to the most appropriate data
source.
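
The first use of metadata, mapping data sources to a common view, can be sketched as follows. The `METADATA` dictionary, the source names, and the field names are hypothetical; real metadata repositories hold far more (types, lineage, refresh schedules).

```python
# Hypothetical metadata: per-source mapping from source field names
# to the warehouse's common column names.
METADATA = {
    "crm": {"CustNo": "customer_id", "FullName": "customer_name"},
    "billing": {"customer": "customer_id", "name": "customer_name"},
}

def to_common_view(source, record):
    """Rename a source record's fields using the metadata mapping.

    Fields with no mapping are dropped from the common view.
    """
    mapping = METADATA[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}

rows = [
    to_common_view("crm", {"CustNo": 7, "FullName": "Alice", "Fax": "n/a"}),
    to_common_view("billing", {"customer": 8, "name": "Bob"}),
]
print(rows)
```

Because the mapping lives in data rather than code, adding a new source only requires a new metadata entry.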
End-User Access Tools
• The principal purpose of data warehousing
is to provide information to business users
for strategic decision-making.
• These users interact with the warehouse
using end-user access tools, which include:
– Reporting and query tools
– Online analytical processing (OLAP) tools
– Data mining tools

Data Warehouse Data Flows

• Inflow: Extraction, cleansing, and loading of the
source data.
• Upflow: Adding value to the data in the
warehouse through summarizing, packaging, and
distribution of the data.
• Downflow: Archiving and backing up the data
in the warehouse.
• Outflow: Making the data available to end-users.
• Metaflow: Managing the metadata.

Data Warehouse Models
• Virtual Warehouse
– A set of views over operational databases is
known as a virtual warehouse.
– A virtual warehouse is easy to build, but it requires
excess capacity on operational database servers.
• Data mart
• Enterprise Warehouse
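
The virtual warehouse described above can be sketched as a SQL view over an operational table, using Python's built-in sqlite3 module. The `orders` table and the `vw_customer_totals` view are hypothetical names for this example.

```python
import sqlite3

# An operational database (hypothetical schema), in-memory for illustration.
ops = sqlite3.connect(":memory:")
ops.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'Alice', 30.0), (2, 'Bob', 20.0), (3, 'Alice', 5.0);

    -- The "virtual warehouse": a view, so no data is copied or stored separately.
    CREATE VIEW vw_customer_totals AS
        SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer;
""")

# New operational rows are visible through the view immediately,
# but every analytical query runs on the operational server itself.
ops.execute("INSERT INTO orders VALUES (4, 'Bob', 10.0)")
totals = ops.execute(
    "SELECT customer, total FROM vw_customer_totals ORDER BY customer"
).fetchall()
print(totals)
```

Since each query against the view is recomputed from the operational tables, the analytical workload competes with transaction processing, which is why excess server capacity is needed.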

Data Warehouse Models
• Virtual Warehouse
• Data mart
– Data mart contains a subset of organization-wide data
– Data marts are small in size.
– Data marts are customized by department.
– The source of a data mart is a departmentally
structured data warehouse.
– Data marts are flexible.
• Enterprise Warehouse
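
A data mart as a departmental subset of warehouse data can be sketched like this. The `warehouse_facts` table, its rows, and the `build_data_mart` helper are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Organization-wide data (hypothetical schema).
    CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL);
    INSERT INTO warehouse_facts VALUES
        ('sales', 'revenue', 100.0),
        ('sales', 'returns', 4.0),
        ('hr', 'headcount', 12.0);
""")

def build_data_mart(department):
    """Copy one department's rows into its own small table."""
    # The department becomes part of a table name, so restrict it
    # to a plain identifier before interpolating it into SQL.
    if not department.isidentifier():
        raise ValueError("department must be a simple identifier")
    table = f"mart_{department}"
    conn.execute(f"CREATE TABLE {table} (metric TEXT, value REAL)")
    conn.execute(
        f"INSERT INTO {table} "
        "SELECT metric, value FROM warehouse_facts WHERE department = ?",
        (department,),
    )
    return table

sales_mart = build_data_mart("sales")
mart_rows = conn.execute(
    f"SELECT metric, value FROM {sales_mart} ORDER BY metric"
).fetchall()
print(mart_rows)
```

The resulting table holds only the sales department's rows, which keeps it small and lets it be customized independently of the rest of the warehouse.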

Data Warehouse Models
• Virtual Warehouse
• Data mart
• Enterprise Warehouse
– An enterprise warehouse collects all the information
and the subjects spanning an entire organization.
– It provides enterprise-wide data integration.
– The data is integrated from operational systems and
external information providers.
– Its size can vary from a few gigabytes to
hundreds of gigabytes, terabytes, or beyond.
Data Warehousing Tools
• Amazon Redshift
• Teradata
• Oracle
• Informatica 
• IBM InfoSphere
• Ab Initio Software
• ParAccel 
• Cloudera 
• Analytix DS
• MarkLogic 
Assignment and Quiz # 2
• Give brief introduction of five data
warehousing tools
– Handwritten
– Deadline: Next lecture
• Quiz 2 next lecture

Project
• Submit the title and a brief introduction of the
business for which you want to build a data
warehouse or data mart.
• Also highlight the subject(s) you want to
analyze.
