
21IS503 - Data Mining

Data Warehouse Design



The goal of the design is to determine how records from multiple
data sources should be extracted, transformed, and loaded (ETL)
so that they are organized in a database that serves as the data
warehouse.
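
The extract, transform, and load steps can be sketched in a few lines of Python. This is only an illustration: the two source systems, their record formats, and the `sales` table are made-up assumptions, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Two hypothetical source systems with inconsistent formats (made-up data).
crm_rows = [
    {"customer": " Alice ", "amount": "120.50", "date": "2024-01-05"},
    {"customer": "BOB", "amount": "80", "date": "2024-01-06"},
]
shop_rows = [
    ("carol", 42.0, "06/01/2024"),  # date written as day/month/year
]


def extract():
    """Extract: yield each raw record tagged with its source system."""
    for row in crm_rows:
        yield "crm", row
    for row in shop_rows:
        yield "shop", row


def transform(source, row):
    """Transform: map every source format onto one common schema."""
    if source == "crm":
        name, amount, date = row["customer"], float(row["amount"]), row["date"]
    else:
        name, amount, raw_date = row
        day, month, year = raw_date.split("/")
        date = f"{year}-{month}-{day}"
    return name.strip().title(), amount, date


def load(conn, records):
    """Load: write the cleaned records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse database
load(conn, [transform(source, row) for source, row in extract()])
warehouse_rows = conn.execute(
    "SELECT customer, amount, sale_date FROM sales ORDER BY customer"
).fetchall()
print(warehouse_rows)
```

After the load, all three records share one format (trimmed, title-cased names, numeric amounts, ISO dates) regardless of which source they came from.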

There are two approaches:

• "Top-down" approach

• "Bottom-up" approach
Data Warehouse Design – Top Down Approach

In the "Top-Down" design approach, a data warehouse is a data
repository for the entire enterprise that is

• Subject-oriented
• Time-variant
• Non-volatile
• Integrated

Data from different sources are validated, reformatted, and saved
in a normalized (up to 3NF) database as the data warehouse.
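
A minimal sketch of this idea, assuming a toy retail schema: the normalized tables play the role of the enterprise warehouse, and a departmental data mart is then loaded from them. The table names, columns, and rows are hypothetical, and in-memory SQLite is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (3NF-style) warehouse tables: each fact is stored once.
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT,
                      dept_id INTEGER REFERENCES department(dept_id));
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY,
                   product_id INTEGER REFERENCES product(product_id),
                   sale_date TEXT, amount REAL);

INSERT INTO department VALUES (1, 'Grocery'), (2, 'Electronics');
INSERT INTO product VALUES (10, 'Rice', 1), (11, 'Tea', 1), (20, 'Phone', 2);
INSERT INTO sale VALUES (1, 10, '2024-01-05', 30.0),
                        (2, 11, '2024-01-05', 12.5),
                        (3, 20, '2024-01-06', 400.0),
                        (4, 10, '2024-02-01', 45.0);

-- Data mart for one department, loaded from the warehouse tables.
CREATE TABLE grocery_mart AS
SELECT p.product_name,
       substr(s.sale_date, 1, 7) AS month,
       SUM(s.amount) AS revenue
FROM sale s
JOIN product p ON p.product_id = s.product_id
JOIN department d ON d.dept_id = p.dept_id
WHERE d.dept_name = 'Grocery'
GROUP BY p.product_name, substr(s.sale_date, 1, 7);
""")

mart_rows = conn.execute(
    "SELECT product_name, month, revenue FROM grocery_mart "
    "ORDER BY product_name, month"
).fetchall()
print(mart_rows)
```

Because the mart is just a query over the central tables, a mart for another department would only need a different `WHERE` clause.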

Advantages of top-down design

• Data marts are loaded from the data warehouse.

• Developing a new data mart from the data warehouse is very
easy.

Disadvantages of top-down design

• This technique is inflexible to changing departmental needs.

• The cost of implementing the project is high.
Data Warehouse Design – Bottom Up Approach

In this approach, a data mart is created first to provide the
necessary reporting and analytical capabilities for particular
business processes (or subjects).

Instead of a normalized database for the data warehouse, a
denormalized dimensional database is adopted to meet the data
delivery requirements of data warehouses.
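
One common form of a denormalized dimensional database is a star schema: a fact table of measures surrounded by wide dimension tables. The sketch below assumes a small sales mart; all table names, columns, and rows are hypothetical, and in-memory SQLite is used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables are denormalized: descriptive attributes such as
-- category or month live in the same wide table instead of separate ones.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                          product_name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY,
                       full_date TEXT, month TEXT, year INTEGER);

-- The fact table stores the measures plus keys into each dimension.
CREATE TABLE fact_sales (product_key INTEGER, date_key INTEGER,
                         quantity INTEGER, amount REAL);

INSERT INTO dim_product VALUES (1, 'Rice', 'Grocery'),
                               (2, 'Phone', 'Electronics');
INSERT INTO dim_date VALUES (20240105, '2024-01-05', '2024-01', 2024),
                            (20240201, '2024-02-01', '2024-02', 2024);
INSERT INTO fact_sales VALUES (1, 20240105, 3, 30.0),
                              (2, 20240105, 1, 400.0),
                              (1, 20240201, 4, 45.0);
""")

# A typical reporting query: one join per dimension, then aggregate.
report = conn.execute("""
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(report)
```

The design trades some redundancy in the dimension tables for reporting queries that need only one join per dimension.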
Advantages of bottom-up design

• Documents can be generated quickly.

• The data warehouse can be extended to accommodate new
business units.
• Extending it only requires developing new data marts and
integrating them with the existing data marts.

Disadvantages of bottom-up design

• The positions of the data warehouse and the data marts are
reversed compared with the top-down approach: the data marts
are built first.
Top-Down Design Approach                                   | Bottom-Up Design Approach
-----------------------------------------------------------+-----------------------------------------------------------
Breaks the vast problem into smaller subproblems.          | Solves the essential low-level problems and integrates them into a higher one.
Inherently architected; not a union of several data marts. | Inherently incremental; can schedule essential data marts first.
Single, central storage of information about the content.  | Departmental information stored.
Centralized rules and control.                             | Departmental rules and control.
It includes redundant information.                         | Redundancy can be removed.
It may see quick results if implemented with repetitions.  | Less risk of failure, favorable return on investment, and proof of techniques.
Data Warehouse Implementation

Data warehouse implementation is the series of activities needed
to create a fully functioning data warehouse. It follows the
classification, analysis, and design of the data warehouse
according to the requirements provided by the client.
Phases of Data Warehouse Implementation
• Planning
• Data Gathering
• Data Analysis
• Business Actions

Components for Data Warehouse Design

• Data Marts
• OLTP/OLAP
• ETL
• Metadata
Advantages of Data Warehouse Implementation
Data Warehouse Tools
• QuerySurge
• CloverDX
• Teradata
• Dundas
• SAS
• Sisense
• Tableau
• BigQuery
• PostgreSQL
• Pentaho
• Solver BI360
Data Warehouse Tools - Tableau
• Tableau is a tool for data visualization.
• It helps to analyze complex data in a simple format.
• Visualizations created with Tableau take the form of dashboards
and worksheets.
• These visualizations can be understood by people at any level of
an organization.

Features of Tableau
• Imports data of all sizes and ranges.
• Manages metadata.
• Lets users build "no-code" data queries.
Data Warehouse Tools - BigQuery
• BigQuery is an enterprise-level, cloud-based data warehouse tool
offered by Google.
• The platform is built to save time when storing and querying big
datasets: SQL queries against multi-terabyte datasets run in seconds,
giving users real-time insight into their data.
• Google BigQuery offers automatic data transfer and full data access
control.

Features of BigQuery
• Large volumes of data can be analyzed very quickly.
• Coding skills are required to use the BigQuery API.
• Non-IT users face a learning curve.
• Pay-as-you-go pricing keeps costs low.
Data Warehouse Tools – PostgreSQL

PostgreSQL is a powerful, open-source object-relational database
system with more than 30 years of active development, which has
earned it a strong reputation for reliability, robustness, and
efficiency.

Features of PostgreSQL
• PostgreSQL can serve as the database backend for applications.
• PostgreSQL is developed by its community and is not controlled
by a single vendor.
• In addition to being free and open source, PostgreSQL is highly
extensible.
Summary
• Data Warehouse Design
   - Top-Down Approach
   - Bottom-Up Approach

• Data Warehouse Implementation
   - Planning
   - Data Gathering
   - Data Analysis
   - Business Actions

• Data Warehouse Tools


THANK YOU
