
Data Warehouse

Data Warehouse Defined

A physical repository where relational data are specially organized to provide
enterprise-wide, cleansed data in a standardized format

“The data warehouse is a collection of integrated, subject-oriented databases
designed to support DSS functions, where each unit of data is non-volatile and
relevant to some moment in time”
Characteristics of Data Warehouse (DW)

 Subject oriented
 Integrated
 Time-variant (time series)
 Nonvolatile
 Metadata
 Client/Server architecture
 Relational/Multidimensional
 Real time
Metadata

 Syntactic Metadata
 Structural Metadata
 Semantic Metadata
Data Warehouse and Data
Warehousing
Data Warehousing

 Data warehousing is the entire process or a discipline that results in
applications that provide decision support capability
 Allows ready access to business
information
 Creates business insight
Types of Data Warehouses
Operational data stores (ODS)
• A type of database often used as an interim area
for a data warehouse

Enterprise data warehouse (EDW)


• A data warehouse for the enterprise.

Data Marts
• A subset of a data warehouse, consisting of a single
subject area.
Data Mart

A departmental data warehouse that stores only relevant data.

A data mart focuses on a particular subject or department
(e.g. marketing, operations)

 Dependent data mart
A subset that is created directly from a data warehouse

 Independent data mart
A small data warehouse designed for a strategic business
unit or a department
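
The idea of a dependent data mart can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module; the table names (warehouse_facts, marketing_mart) and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical warehouse table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("marketing", "campaign_cost", 1200.0),
        ("marketing", "leads", 85.0),
        ("operations", "downtime_hours", 4.5),
    ],
)

# A dependent data mart: a subset created directly from the warehouse.
conn.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT metric, value FROM warehouse_facts WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT metric, value FROM marketing_mart ORDER BY metric"
).fetchall()
print(mart_rows)
```

An independent data mart would instead be loaded from its own sources, without going through the warehouse table.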
Example Case: First American
Corporation
 Changed its corporate strategy from a traditional banking
approach to one that was centered on CRM
 Went from a $60 million loss in 1990 to being an innovative
financial services leader a decade later
 VISION data warehouse
 Identification of top 20 percent profitable customers
 Retention strategies
 Lower cost distribution channels
 Redesigned information flows
Data Warehouse concept
A Conceptual Framework for DW

[Figure: conceptual framework for a data warehouse. Data sources (ERP, legacy, POS, other OLTP/Web, external data) feed an ETL process (select, extract, transform, integrate, load) into the enterprise data warehouse with its metadata. Data are replicated to data marts (marketing, engineering, finance, ...), or a no-data-marts option is used. API/middleware provides access for applications (visualization): routine business reporting, data/text mining, OLAP, dashboard, Web, and custom-built applications.]
Major components of a data warehouse

 Data sources. Data are sourced from operational systems and possibly from
external data sources.
 Data extraction. Data are extracted using custom-written or commercial
software called ETL.
 Data loading. Data are loaded into a staging area, where they are transformed
and cleansed. The data are then ready to load into the data warehouse.
 Comprehensive database. This is the EDW that supports decision analysis by
providing relevant summarized and detailed information.
 Metadata. Metadata are maintained for access by IT personnel and users.
Metadata include rules for organizing data summaries that are easy to index
and search.
 Middleware tools. Middleware tools enable access to the data warehouse from
a variety of front-end applications.
Top 5 data warehouses on the
market today
 Teradata
 Oracle
 Amazon Web Services
 Cloudera
 MarkLogic

Source: https://www.monitis.com/blog/top-5-data-warehouses-on-the-market-today/
Lecture 9
Data Integration and
the Extraction, Transformation and
Load (ETL) Processes
Data Integration

 Data integration comprises three major processes that, when correctly
implemented, permit data to be accessed and made accessible to an
array of ETL and analysis tools and to the data warehousing environment.
Data Integration

 Data Access: the ability to access and extract data
from any data source.
 Data Federation: the integration of business
views across multiple data stores
 Change Capture: based on the identification,
capture and delivery of the changes made to
enterprise data sources
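
As a rough illustration of change capture, the sketch below compares two snapshots of a source table and reports what was inserted, updated, and deleted. Snapshot comparison is a simplification chosen here for brevity, not a method described in the slides; the function name and the record layout are hypothetical.

```python
def capture_changes(previous, current):
    """Return inserted, updated, and deleted records between two snapshots.

    Each snapshot is a dict mapping a primary key to a record
    (a hypothetical layout used only for this sketch).
    """
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


yesterday = {1: {"name": "Ann", "city": "Oslo"}, 2: {"name": "Bo", "city": "Rome"}}
today = {1: {"name": "Ann", "city": "Bergen"}, 3: {"name": "Cy", "city": "Lima"}}

changes = capture_changes(yesterday, today)
print(changes)
```

Only the captured changes, rather than the full source table, would then be delivered to the warehouse.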
Example

 Bank of America implemented a data warehouse for data integration (DI)
 Teradata warehouse is the platform for its integrated EDW
 SAS Institute: customer data integration tools
 Oracle Business Intelligence Suite for integrating data
Integration Technologies

 Enterprise application integration (EAI)

 Service-oriented architecture (SOA)

 Enterprise information integration (EII)

 Extraction, Transformation and Load (ETL)


Enterprise Application Integration

 Enterprise application integration (EAI) is the use of technologies and
services across an enterprise to enable the integration of software applications
and hardware systems.
 It involves integrating application functionality and is focused on sharing
functionality (rather than data) across systems, thereby enabling flexibility
and reuse.
Advantages of EAI:

 Ensures consistent information across different systems
 Streamlines processes
 Provides more efficient access to information
 Transfers data and information across multiple platforms
 Gives users a consistent software application access interface, so they
are not required to learn new or different applications
Enterprise Information
Integration (EII)
 Real-time data integration from a variety of sources, such as
relational databases, Web services, etc.
 Uses predefined metadata to populate views that make
integrated data appear relational to end users
 XML allows data to be tagged either at creation time or later
 These tags can be extended and modified to
accommodate almost any area of knowledge
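
The points about XML tagging can be illustrated with Python's standard xml.etree.ElementTree module. This is a minimal sketch: the element names (customer, name, city, segment) are hypothetical, and flattening one record into a dictionary only hints at how tagged data can be made to look relational.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer record; element names are illustrative only.
record = ET.Element("customer")
ET.SubElement(record, "name").text = "Ann"
ET.SubElement(record, "city").text = "Oslo"

# Tags can be added later, after the record was created.
ET.SubElement(record, "segment").text = "retail"

# Flatten the tagged record into a row so that it appears relational.
row = {child.tag: child.text for child in record}
xml_text = ET.tostring(record, encoding="unicode")

print(xml_text)
print(row)
```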
ETL Process

[Figure: ETL process. Elements shown: packaged application, legacy system, other internal applications, transient data source; Extract, Transform, Cleanse, Load; data warehouse, data mart.]
ETL Process

 Extraction: selecting data from one or more
sources and reading the selected data.
 Transformation: converting data from their
original form to whatever form the DW needs.
This step often also includes cleansing of the data
to remove as many errors as possible.
 Loading: putting the converted (transformed)
data into the DW.
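
The three steps above can be sketched as a small pipeline. This is an illustrative example only, using Python's standard csv and sqlite3 modules; the source data, the column names, and the sales table are hypothetical, and a real ETL tool would handle far more cases.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; column names and rows are made up for illustration.
SOURCE_CSV = """customer_id,name,amount
1, alice ,10.50
2,BOB,not_a_number
3,Carol,7
"""


def extract(text):
    """Extraction: select and read rows from a source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: convert rows to the warehouse's form and cleanse them."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleansing: skip rows whose amount is not numeric
        clean.append((int(row["customer_id"]), row["name"].strip().title(), amount))
    return clean


def load(conn, rows):
    """Loading: put the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT customer_id, name, amount FROM sales ORDER BY customer_id"
).fetchall()
print(loaded)
```

The row with a non-numeric amount is dropped during transformation, which stands in for the cleansing step.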
ETL
Issues affecting the purchase of an ETL tool
• Data transformation tools are expensive
• Data transformation tools may have a long learning curve

Important criteria in selecting an ETL tool


• Ability to read from and write to an unlimited number of
data sources/architectures
• Automatic capturing and delivery of metadata
• A history of conforming to open standards
• An easy-to-use interface for the developer and the
functional user
Benefits of DW
Direct benefits of a data warehouse
• Allows end users to perform extensive analysis
• Allows a consolidated view of corporate data
• Better and more timely information
• Enhanced system performance
• Simplification of data access

Indirect benefits of data warehouse


• Enhance business knowledge
• Present competitive advantage
• Enhance customer service and satisfaction
• Facilitate decision making
• Help in reforming business processes
Thank You
Data Warehouse Development

 Things Go Better with Coke’s Data Warehouse
 In the face of competitive pressures and consumer demand,
how does a successful bottling company ensure that its
vending machines are profitable?
 Hokuriku Coca-Cola Bottling Company, Japan
 Developed a data warehouse and analytical software,
implemented by Teradata Corp.
 Collects near-real-time data, such as the time and date of each sale
and machine malfunctions, along with historical data.
 Total sales immediately increased by 10 percent; overtime and
other costs were reduced by 46 percent.
Data Warehouse Design
Approaches
 Data warehouse design is one of the key techniques in
building the data warehouse.
 Choosing the right data warehouse design can save the project
time and cost.
 Two data warehouse design approaches are popular.
Data Warehouse Development

 Data warehouse development approaches


 Inmon Model: EDW approach (top-down)
 Kimball Model: Data mart approach
(bottom-up)
Top Down Approach

 Inmon’s approach starts with an enterprise data warehouse, creating data
marts as subsets if appropriate.

 It is most effective when there is a recognized need for an EDW, an
executive “champion” of the project, and a willingness to invest in a data
warehousing infrastructure before it shows results.

 In the top-down design approach, the data warehouse is built first. The
data marts are then created from the data warehouse.
Advantages of top-down design are:

 Provides consistent dimensional views of data across data marts, as all
data marts are loaded from the data warehouse.
 This approach is robust against business changes. Creating a new data
mart from the data warehouse is very easy.

Disadvantages of top-down design are:

 This methodology is inflexible to changing departmental needs during
the implementation phase.
 It represents a very large project and the cost of implementing the
project is significant.
Bottom Up Approach

 Kimball’s approach starts with data marts, consolidating them into an
EDW later if appropriate.
 It is most effective when it is desired to provide a “proof of concept”
implementation before embarking on a full-scale EDW project or
when a well-defined area with the greatest benefits can be identified.
 In the bottom-up design approach, the data marts are created first to
provide reporting capability.
 A data mart addresses a single business area, such as sales or
finance.
 These data marts are then integrated to build a complete data
warehouse.
Advantages of bottom-up design are:

 This model contains consistent data marts and these data marts can be
delivered quickly.
 As the data marts are created first, reports can be generated quickly.
 The data warehouse can be extended easily to accommodate new
business units. It is just creating new data marts and then integrating
with other data marts.

Disadvantages of bottom-up design are:

 The positions of the data warehouse and the data marts are reversed in
the bottom-up approach design.
Comparison between Data warehouse development
approaches
Data Warehouse Structure: The
Star Schema
 A star schema is one in which a central fact table is
surrounded by denormalized dimension tables.
 A star schema can be simple or complex.
 A simple star schema consists of one fact table, whereas a
complex star schema has more than one fact table.
 The data warehouse design is based on the concept of
dimensional modelling.
 Dimensional modelling is a retrieval-based system that
supports high-volume query access.
Data Warehouse Structure:
The Star Schema
 The fact table contains a large number of rows that
correspond to observed business facts.
 The fact table contains the attributes needed to perform
decision analysis, descriptive attributes used for query
reporting, and foreign keys to link to dimension tables.
 The dimension tables contain classification and aggregation
information about the central fact rows.
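
A simple star schema of the kind described above might look like the following sketch, written with Python's built-in sqlite3 module. The table and column names (fact_sales, dim_product, dim_date) and all rows are hypothetical, chosen only to show one fact table linked to dimension tables by foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold classification and aggregation information.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- The central fact table holds measures plus foreign keys to the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES (1, 'Cola', 'Drinks'), (2, 'Chips', 'Snacks');
INSERT INTO dim_date VALUES (10, 2023, 1), (11, 2023, 2);
INSERT INTO fact_sales VALUES (1, 10, 5, 10.0), (1, 11, 3, 6.0), (2, 10, 2, 3.0);
""")

# A typical query joins the fact table to a dimension and aggregates a measure.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```

Keeping the dimensions denormalized means most analytical queries need only one join per dimension.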
Star Schema
Example: Football Club
Football Club Dimensional Model
DW Development Approaches
Similarities and differences between the Inmon and
Kimball data warehouse development approaches

 Similarities: Both methods can produce an
enterprise data warehouse and subset data marts.

 Differences: Inmon’s approach starts with an
enterprise data warehouse, creating data marts as
subsets of that EDW if appropriate. The focus is
on proven, traditional methods and technologies.
Kimball’s approach starts with data marts, consolidating
them into an EDW later if appropriate. It focuses
on creating a useful end-user capability quickly.
Real-time DW (a.k.a. Active Data Warehousing)
 It is the process of loading and providing data via the data
warehouse as they become available.
 The active traits of RDW/ADW supplement and expand
traditional data warehouse functions into the realm of tactical
decision making.
 The use of real-time data updates for immediate analysis and
decision making is growing rapidly
Brobst’s 5 Stage Model of Active Data warehouse

1. Reporting
2. Analysis
3. Prediction
4. Operationalizing
5. Active warehousing
Data Warehouse Administration

 Due to its huge size and its intrinsic nature, a
DW requires especially strong monitoring in
order to sustain its efficiency, productivity and
security.
 The successful administration and management
of a data warehouse entails skills and
proficiency that go beyond what is required of a
traditional database administrator.
 Requires expertise in high-performance software,
hardware, and networking technologies
DW Scalability and Security

 Scalability
 The main issues pertaining to scalability:
 The amount of data in the warehouse
 How quickly the warehouse is expected to grow
 The number of concurrent users
 The complexity of user queries
 Good scalability means that queries and other data-access
functions will grow linearly with the size of the warehouse
 Security
 Emphasis on security and privacy
Security concerns involved in building a data
warehouse.

 1. Laws and regulations, in the U.S. and elsewhere, require
certain safeguards on databases that contain the type of
information typically found in a DW.

 2. The large amount of valuable corporate data in a data
warehouse can make it an attractive target.

 3. The need to allow a wide variety of unplanned queries in a
DW makes it impractical to restrict end-user access to specific,
carefully constrained screens, which would otherwise be one way
to limit potential violations.
Effective security in a data warehouse should focus on four main areas:

 Step 1. Establishing effective corporate and security policies
and procedures. An effective security policy should start at the
top and be communicated to everyone in the organization.
 Step 2. Implementing logical security procedures and
techniques to restrict access. This includes user authentication,
access controls, and encryption.
 Step 3. Limiting physical access to the data center
environment.
 Step 4. Establishing an effective internal control review
process for security and privacy.
THANK YOU
