Lecture 10 - 11 - Data Warehousing
[Figure: characteristics of a data warehouse: subject oriented, integrated, time-variant (time series), nonvolatile, relational/multidimensional, client/server architecture, real time, metadata]
Metadata
Syntactic Metadata
Structural Metadata
Semantic Metadata
Data Warehouse and Data Warehousing
Data Warehousing
Data Marts
• A subset of a data warehouse, consisting of a single subject area.
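As a rough illustration of the definition above, a data mart can be thought of as the rows of the warehouse that belong to one subject area. The sketch below is only a toy model: the `warehouse` rows, the `subject` field, and the `build_data_mart` helper are hypothetical names made up for this example, not part of the lecture.

```python
# Hypothetical warehouse rows; each row is tagged with a subject area.
warehouse = [
    {"subject": "finance", "account": "A-100", "amount": 250.0},
    {"subject": "engineering", "part": "P-7", "defects": 3},
    {"subject": "finance", "account": "A-200", "amount": 75.5},
]


def build_data_mart(rows, subject):
    """Return the subset of warehouse rows that belong to one subject area."""
    return [row for row in rows if row["subject"] == subject]


# A finance data mart holds only the finance rows of the warehouse.
finance_mart = build_data_mart(warehouse, "finance")
print(len(finance_mart), "of", len(warehouse), "rows belong to the finance mart")
```

In a real system the subset would more likely be a set of tables or views loaded from the warehouse, but the idea of restricting to a single subject area is the same.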
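[Figure: data warehouse framework. Data sources (POS, other OLTP/Web, external data) pass through ETL (extract, transform, integrate, load) into the enterprise data warehouse, which is replicated into data marts (Engineering, Finance, ...). An API/middleware layer serves the front-end applications: data mining, OLAP, dashboard, Web, and custom-built applications.]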
Major components of a data warehouse
Data sources. Data are sourced from operational systems and possibly from
external data sources.
Data extraction. Data are extracted using custom-written or commercial
software called ETL.
Data loading. Data are loaded into a staging area, where they are transformed
and cleansed. The data are then ready to load into the data warehouse.
Comprehensive database. This is the EDW that supports decision analysis by
providing relevant summarized and detailed information.
Metadata. Metadata are maintained for access by IT personnel and users.
Metadata include rules for organizing data summaries that are easy to index
and search.
Middleware tools. Middleware tools enable access to the data warehouse from
a variety of front-end applications.
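The flow described by these components (source, extraction, staging with transformation and cleansing, load into the warehouse) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a real ETL tool: the `SOURCE_ROWS` records, the `orders` table, and the cleansing rules (drop unparseable amounts, drop duplicate orders, tidy customer names) are all invented for illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical operational records standing in for a data source.
SOURCE_ROWS = [
    {"order_id": "1", "customer": " Alice ", "amount": "19.99"},
    {"order_id": "2", "customer": "BOB", "amount": "5.00"},
    {"order_id": "2", "customer": "BOB", "amount": "5.00"},   # duplicate
    {"order_id": "3", "customer": "Carol", "amount": "n/a"},  # bad amount
]


def extract(source):
    """Extraction: copy raw rows out of the source into a staging list."""
    return [dict(row) for row in source]


def transform(staged):
    """Transformation and cleansing of the staged rows."""
    cleaned, seen = [], set()
    for row in staged:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # drop duplicate orders
        seen.add(order_id)
        cleaned.append((order_id, row["customer"].strip().title(), amount))
    return cleaned


def load(conn, rows):
    """Loading: write the cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
load(conn, transform(extract(SOURCE_ROWS)))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping extract, transform, and load as separate functions mirrors the separation between the source systems, the staging area, and the warehouse itself.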
Top 5 data warehouses on the market today
Teradata
Oracle
Amazon Web Services
Cloudera
MarkLogic
Source: https://www.monitis.com/blog/top-5-data-warehouses-on-the-market-today/
Lecture 9
Data Integration and the Extraction, Transformation and Load (ETL) Processes
Data Integration
[Figure: data integration. Elements shown: packaged application, legacy system, other internal applications, transient data source; extract, transform, cleanse, load; data warehouse, data mart.]
ETL Process
In the top-down design approach, the data warehouse is built first. The
data marts are then created from the data warehouse.
In the bottom-up design approach, the positions of the data warehouse and
the data marts are reversed: the data marts are built first.
Advantages of bottom-up design are:
The data marts are consistent and can be delivered quickly.
Because the data marts are created first, reports can be generated quickly.
The data warehouse can be extended easily to accommodate new
business units: new data marts are created and then integrated
with the existing data marts.
Comparison between data warehouse development approaches
Data Warehouse Structure: The Star Schema
A star schema is one in which a central fact table is
surrounded by denormalized dimension tables.
A star schema can be simple or complex.
A simple star schema consists of one fact table, whereas a
complex star schema has more than one fact table.
The data warehouse design is based on the concept of
dimensional modelling.
Dimensional modelling is a retrieval-based system that
supports high-volume query access.
Data Warehouse Structure: The Star Schema
The fact table contains a large number of rows that
correspond to observed business facts.
The fact table contains the attributes needed to perform
decision analysis, descriptive attributes used for query
reporting, and foreign keys that link to the dimension tables.
The dimension tables contain classification and aggregation
information about the central fact rows.
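To make the fact/dimension split concrete, here is a small sketch of a simple star schema (one fact table, two dimension tables) built in an in-memory SQLite database. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and all rows are hypothetical and chosen only for illustration; they are not the lecture's football club model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: classification and aggregation attributes.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, name TEXT, category TEXT
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER
);
-- Fact table: measures plus foreign keys to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER,
    revenue REAL
);
INSERT INTO dim_product VALUES
    (1, 'Scarf', 'Merchandise'),
    (2, 'Shirt', 'Merchandise'),
    (3, 'Season pass', 'Tickets');
INSERT INTO dim_date VALUES (10, 2023, 1), (11, 2023, 2);
INSERT INTO fact_sales VALUES
    (1, 10, 4, 40.0), (2, 10, 1, 60.0), (3, 11, 2, 500.0), (1, 11, 3, 30.0);
""")

# A typical star-schema query: join the fact table to its dimensions,
# then aggregate the measures by dimension attributes.
query = """
SELECT p.category, d.year, SUM(f.units), SUM(f.revenue)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_date AS d ON d.date_id = f.date_id
GROUP BY p.category, d.year
ORDER BY p.category
"""
totals = conn.execute(query).fetchall()
for category, year, units, revenue in totals:
    print(category, year, units, revenue)
```

The dimension tables supply the grouping attributes (`category`, `year`), while the fact table supplies the measures being summed; the foreign keys are what tie each fact row to its dimensions.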
Star Schema
Example: Football Club
Football Club Dimensional Model
DW Development Approaches
Similarities and differences between the Inmon and Kimball data warehouse development approaches
Reporting
Operationalizing Prediction
Data Warehouse Administration
Scalability
The main issues pertaining to scalability:
The amount of data in the warehouse
How quickly the warehouse is expected to grow
The number of concurrent users
The complexity of user queries
Good scalability means that the time and resources needed by queries
and other data-access functions grow (ideally) linearly with the size
of the warehouse.
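One informal way to read "linear growth" is that the query cost per row stays roughly constant as the warehouse grows. The sketch below checks that on a handful of (rows, seconds) measurements. The measurements, the `grows_linearly` helper, and the 25% tolerance are all assumptions made up for this illustration, not figures from the lecture.

```python
def cost_per_row(measurements):
    """Return query time divided by warehouse size for each measurement."""
    return [seconds / rows for rows, seconds in measurements]


def grows_linearly(measurements, tolerance=0.25):
    """True if cost per row stays within `tolerance` of the first measurement."""
    ratios = cost_per_row(measurements)
    baseline = ratios[0]
    return all(abs(r - baseline) <= tolerance * baseline for r in ratios)


# Hypothetical (rows in warehouse, query seconds) measurements.
linear = [(1_000_000, 2.0), (2_000_000, 4.1), (4_000_000, 8.3)]
quadratic = [(1_000_000, 2.0), (2_000_000, 8.0), (4_000_000, 32.0)]

print(grows_linearly(linear))
print(grows_linearly(quadratic))
```

In the second series the time quadruples each time the data doubles, so the cost per row keeps rising, which is the kind of growth that signals a scalability problem.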
Security
Emphasis on security and privacy
Security concerns involved in building a data warehouse.