
The Goals of a Data Warehouse

The data warehouse is a place where people can access their data. The goals of a data warehouse are as follows:
Access: warehouse retrievals must be fast.
The data in a data warehouse is consistent.
Users must be able to slice and dice the data.
A warehouse must offer easy-to-use browsing tools.
The data warehouse is the place where we publish used data.
The quality of the data in the data warehouse is a driver of business reengineering.

4/14/2012

Two Different Worlds


On-line transaction processing (OLTP) is profoundly different from dimensional data warehousing (DDW).
The users, the data content, the data structures, the hardware, the software, the administration, the management, and the daily rhythms are all different.
OLTP design techniques and methods are inappropriate for, and even destructive to, information warehousing.

Consistency
Both OLTP and data warehouse systems are greatly concerned with data consistency.
OLTP consistency is microscopic. The point of transaction processing is to process a very large number of tiny, atomic transactions without losing any of them.
In a data warehouse, consistency is measured globally. We don't care about individual transactions, but we care enormously that the current load of new data is a full and consistent set of data.

What is a Transaction
A serious OLTP system processes thousands or even millions of transactions per day.

A serious data warehouse will often process only one transaction per day, but this transaction contains millions of records. It is called a production data load. What we care about is the consistent state of the system we started with before the production data load.
If we are forced to stop the production data load before it is complete, we will not roll back the inserted records. Instead, we will overwrite the entire system with a snapshot of the system taken before the production data load.
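The snapshot-and-overwrite recovery described above can be sketched as follows. This is a minimal illustration, assuming a single-file SQLite database stands in for the warehouse; the table `sales_fact`, the helper names, and the `fail_after` switch (used only to simulate an interrupted load) are hypothetical and not part of the original slides.

```python
import os
import shutil
import sqlite3
import tempfile


def make_warehouse():
    """Create an empty, hypothetical one-table warehouse in a temp directory."""
    path = os.path.join(tempfile.mkdtemp(), "warehouse.db")
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE sales_fact (state TEXT, product_id INTEGER, units INTEGER)"
    )
    conn.commit()
    conn.close()
    return path


def row_count(path):
    """Return the number of fact rows currently stored."""
    conn = sqlite3.connect(path)
    (count,) = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()
    conn.close()
    return count


def production_load(path, rows, fail_after=None):
    """Run one production data load; restore the snapshot if it is interrupted."""
    snapshot = path + ".snapshot"
    shutil.copyfile(path, snapshot)  # snapshot taken before the load starts
    conn = sqlite3.connect(path)
    try:
        for i, row in enumerate(rows):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated interruption")
            conn.execute("INSERT INTO sales_fact VALUES (?, ?, ?)", row)
            conn.commit()  # each row is committed, so there is nothing to roll back
    except Exception:
        conn.close()
        shutil.copyfile(snapshot, path)  # overwrite the system with the snapshot
        return False
    conn.close()
    return True
```

A load that completes leaves its rows in place; a load that stops partway leaves the warehouse exactly as it was before that load began.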

Users and Managers


The users of an OLTP system turn the wheels of the organization, whereas the users of a data warehouse watch the wheels of the organization.
Users of an OLTP system almost always deal with one account at a time.
OLTP users perform the same tasks many, many times.
Performance is the absolute king of an OLTP system. No optional activity is allowed to slow down an OLTP system.

Dimensions in Data Analysis


In the world of data warehousing, a summarizable numerical value that you use to monitor your business is called a measure.
When looking for numeric information, your first question will be: what measure do you want to see?

You could look at, let's say, sales units, sales dollars, defects, etc.
Suppose that you ask to see a report of your company's Units Sold. Here's what you get: 113

Fact Table
A fact table is a table in the relational data warehouse that stores the detailed values for measures, or facts. For example, a fact table that stores Dollars and Units by State, by Product, and by Month has five columns:
State Product Month Units Dollars

The first three columns are key columns; the remaining two are measure values.
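As a rough sketch of the five-column fact table described above, the following uses Python's built-in `sqlite3` module. The table name `sales_fact`, the helper `units_by`, and all of the row values are made up for illustration; only the five column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales_fact (
        state   TEXT,     -- key column
        product TEXT,     -- key column
        month   TEXT,     -- key column
        units   INTEGER,  -- measure
        dollars REAL      -- measure
    )
    """
)

# Illustrative rows only; these are not figures from the presentation.
rows = [
    ("WA", "Road Bike", "2012-01", 3, 1500.0),
    ("WA", "Helmet", "2012-01", 5, 250.0),
    ("OR", "Road Bike", "2012-02", 2, 1000.0),
]
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", rows)


def units_by(key_column):
    """Summarize the Units measure by one of the three key columns."""
    if key_column not in ("state", "product", "month"):
        raise ValueError("not a key column: " + key_column)
    query = (
        f"SELECT {key_column}, SUM(units) FROM sales_fact "
        f"GROUP BY {key_column} ORDER BY {key_column}"
    )
    return conn.execute(query).fetchall()


# A single grand total, like the one-number Units Sold report.
(total_units,) = conn.execute("SELECT SUM(units) FROM sales_fact").fetchone()
```

Calling `units_by("state")` or `units_by("month")` slices the same measure along a different key column without changing the stored rows.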


Fact Table
Each column in the fact table should be either a key or a measure.

The fact table must contain a column for each measure.


The fact table must contain rows at the lowest level of detail you might want to retrieve for a measure.

A fact table almost always uses an integer key for each member rather than a descriptive name.

The key column for a date dimension might be either an integer key or a date.


Dimension Tables
A dimension table contains one row for each leaf-level member of the dimension. For example, a product dimension table with 3 products will have 3 rows.
In most cases a dimension table also contains a numeric key column that uniquely identifies each member.
This column is the primary key of the dimension table, and the matching foreign key column in the fact table references it.
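The key relationship between a dimension table and a fact table can be sketched with `sqlite3` as below. The table names `product_dim` and `sales_fact`, the helper functions, and the unit counts are assumptions made for illustration; the three product IDs and names are the ones used in the product example later in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# One row per leaf-level member, identified by an integer primary key.
conn.execute(
    "CREATE TABLE product_dim (prod_id INTEGER PRIMARY KEY, prod_name TEXT NOT NULL)"
)
# The fact table stores the integer key, not the descriptive name.
conn.execute(
    "CREATE TABLE sales_fact ("
    " prod_id INTEGER NOT NULL REFERENCES product_dim (prod_id),"
    " units INTEGER NOT NULL)"
)

conn.executemany(
    "INSERT INTO product_dim VALUES (?, ?)",
    [(589, "Sweet Muffins"), (592, "Coconut Muffins"), (1218, "Salt Bread")],
)
# Hypothetical unit counts.
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", [(589, 4), (589, 1), (1218, 6)])


def units_by_product():
    """Join facts to the dimension to report units under descriptive names."""
    return conn.execute(
        """
        SELECT p.prod_name, SUM(f.units)
        FROM sales_fact AS f
        JOIN product_dim AS p ON p.prod_id = f.prod_id
        GROUP BY p.prod_name
        ORDER BY p.prod_name
        """
    ).fetchall()


def insert_fact(prod_id, units):
    """Return False when the key does not exist in the dimension table."""
    try:
        conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (prod_id, units))
        return True
    except sqlite3.IntegrityError:
        return False
```

With the constraint enabled, a fact row whose key has no matching dimension member is rejected instead of silently loaded.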

CHRIS

Dimension Tables
If the dimension is involved in a balanced hierarchy, it will have an additional column that gives the parent for each member. For example, if you have 3 products in a dimension table that each belong to a particular product subcategory, your table will look like this:
PROD_ID  Prod_Name        SubCategory
589      Sweet Muffins    Muffins
592      Coconut Muffins  Muffins
1218     Salt Bread       Bread
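To show how the parent column supports rolling a measure up the hierarchy, here is a small plain-Python sketch. The dimension rows are the ones in the table above; the fact rows and the function name `rollup_units` are hypothetical.

```python
from collections import defaultdict

# Product dimension rows from the table above: (PROD_ID, Prod_Name, SubCategory).
PRODUCT_DIM = [
    (589, "Sweet Muffins", "Muffins"),
    (592, "Coconut Muffins", "Muffins"),
    (1218, "Salt Bread", "Bread"),
]

# Hypothetical fact rows: (PROD_ID, units).
SALES_FACT = [(589, 4), (592, 3), (1218, 6), (589, 1)]


def rollup_units(dimension, facts):
    """Total units per SubCategory by following each member's parent column."""
    parent_of = {prod_id: sub_category for prod_id, _name, sub_category in dimension}
    totals = defaultdict(int)
    for prod_id, units in facts:
        totals[parent_of[prod_id]] += units
    return dict(totals)
```

Because every leaf member carries its parent, the same fact rows can be reported at either the product level or the subcategory level.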


Star Schema
When each dimension is stored in a single table, the database's organization is called a star schema design.
When a database's dimensions are stored in a chain of tables, the design is called a snowflake design.
A relational database must perform time-consuming joins each time a report executes, and a star design for a dimension requires fewer joins than a snowflake design.
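The difference in join count can be illustrated with the product dimension stored both ways. This is a sketch under assumed names: `product_star`, `product_snow`, `subcategory`, the `subcat_id` values, and the unit counts are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_fact (prod_id INTEGER, units INTEGER);
    INSERT INTO sales_fact VALUES (589, 4), (592, 3), (1218, 6);

    -- Star: the whole dimension lives in a single table.
    CREATE TABLE product_star (
        prod_id INTEGER PRIMARY KEY, prod_name TEXT, sub_category TEXT);
    INSERT INTO product_star VALUES
        (589, 'Sweet Muffins', 'Muffins'),
        (592, 'Coconut Muffins', 'Muffins'),
        (1218, 'Salt Bread', 'Bread');

    -- Snowflake: the same dimension split into a chain of two tables.
    CREATE TABLE subcategory (subcat_id INTEGER PRIMARY KEY, subcat_name TEXT);
    INSERT INTO subcategory VALUES (1, 'Muffins'), (2, 'Bread');
    CREATE TABLE product_snow (
        prod_id INTEGER PRIMARY KEY, prod_name TEXT, subcat_id INTEGER);
    INSERT INTO product_snow VALUES
        (589, 'Sweet Muffins', 1),
        (592, 'Coconut Muffins', 1),
        (1218, 'Salt Bread', 2);
    """
)

# One join reaches the subcategory name in the star design.
STAR_QUERY = """
    SELECT p.sub_category, SUM(f.units)
    FROM sales_fact AS f
    JOIN product_star AS p ON p.prod_id = f.prod_id
    GROUP BY p.sub_category ORDER BY p.sub_category
"""

# Two joins are needed to reach the same name in the snowflake design.
SNOWFLAKE_QUERY = """
    SELECT s.subcat_name, SUM(f.units)
    FROM sales_fact AS f
    JOIN product_snow AS p ON p.prod_id = f.prod_id
    JOIN subcategory AS s ON s.subcat_id = p.subcat_id
    GROUP BY s.subcat_name ORDER BY s.subcat_name
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
```

Both queries return the same report; the snowflake version simply pays for one extra join per level in the chain.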

Basic Elements - Data Warehouse


Source System - An operational system of record whose function is to capture the transactions of the business.
Data Staging Area - A storage area and set of processes that clean, transform, combine, deduplicate, household, archive, and prepare source data for use in the data warehouse.
Presentation Server - The target physical machine on which the data warehouse data is organized and stored for direct querying by end users, report writers, and other applications.

Basic Elements - Data Warehouse


Dimensional Model - A specific discipline for modeling data that is an alternative to entity-relationship (E/R) modeling.
Business Process - A coherent set of business activities that make sense to the business users of our data warehouses.
Data Mart - A logical subset of the complete data warehouse.
Data Warehouse - The queryable source of data in the enterprise.

Basic Elements - Data Warehouse


Operational Data Store (ODS) - A term that has taken on too many definitions to be useful to the data warehouse.
OLAP (On-line Analytic Processing) - The general activity of querying and presenting text and number data from data warehouses, as well as a specifically dimensional style of querying and presenting that is exemplified by a number of OLAP vendors.

Basic Elements - Data Warehouse


ROLAP (Relational OLAP) - A storage option or set of user interfaces and applications that give a relational database a dimensional flavor.
MOLAP (Multidimensional OLAP) - A storage option or set of user interfaces, applications, and proprietary database technology that have a strongly dimensional flavor.
HOLAP (Hybrid OLAP) - A storage option that combines relational and proprietary structures.

Basic Elements - Data Warehouse


End User Application - A collection of tools that query, analyze, and present information targeted to support a business need.
End User Data Access Tool - A client of the data warehouse.
Ad Hoc Query Tool - A specific kind of end user data access tool that invites users to form their own queries by directly manipulating relational tables and their joins.

Basic Elements - Data Warehouse


Modeling Applications - A sophisticated kind of data warehouse client with analytic capabilities that transform or digest the output from the data warehouse. Modeling applications include:
Forecasting models
Behavior scoring models
Allocation models
Data mining tools

Metadata - All the information in the data warehouse environment that is not the actual data itself.

Basic Processes - Data Warehouse


Extracting - The first step of getting data into the data warehouse.
Transformation - Once data is extracted into the data staging area, there are many possible transformation steps, including cleaning the data, correcting misspellings, purging selected fields, creating surrogate keys for each dimension, and building aggregates.

Loading and Indexing - Loading the prepared data into the data warehouse and indexing it.
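The extract, transform, and load steps above might be sketched as follows. Everything specific here is an assumption for illustration: the source rows, the `sku` natural key, the cleaning rule in `clean_name`, and the table and index names. Only a few of the listed transformations (cleaning, surrogate keys) are shown.

```python
import sqlite3

# Hypothetical extract from a source system: natural keys and messy labels.
SOURCE_ROWS = [
    {"sku": "MUF-01", "name": "  sweet muffins ", "units": "4"},
    {"sku": "BRD-07", "name": "Salt  Bread", "units": "6"},
    {"sku": "MUF-01", "name": "Sweet Muffins", "units": "1"},
]


def clean_name(raw):
    """Cleaning step: collapse stray whitespace and normalize capitalization."""
    return " ".join(raw.split()).title()


def transform(rows):
    """Assign an integer surrogate key to each distinct source key."""
    surrogate = {}
    dim_rows, fact_rows = [], []
    for row in rows:
        sku = row["sku"]
        if sku not in surrogate:
            surrogate[sku] = len(surrogate) + 1
            dim_rows.append((surrogate[sku], sku, clean_name(row["name"])))
        fact_rows.append((surrogate[sku], int(row["units"])))
    return dim_rows, fact_rows


def load(dim_rows, fact_rows):
    """Load the staged rows into warehouse tables, then index the fact key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_dim "
        "(product_key INTEGER PRIMARY KEY, sku TEXT, prod_name TEXT)"
    )
    conn.execute("CREATE TABLE sales_fact (product_key INTEGER, units INTEGER)")
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", dim_rows)
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", fact_rows)
    conn.execute("CREATE INDEX idx_sales_product ON sales_fact (product_key)")
    conn.commit()
    return conn


dim_rows, fact_rows = transform(SOURCE_ROWS)
warehouse = load(dim_rows, fact_rows)
```

The surrogate key decouples the warehouse from the source system's identifiers, so the fact table only ever stores the compact integer.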



Basic Processes - Data Warehouse


Quality Assurance Checking - Quality can be checked by running a comprehensive exception report over the entire set of newly loaded data.
Release/Publishing - The user community must be notified that the new data is ready.
Updating - Modern data marts may well be updated, sometimes frequently, to reflect changes in labels, hierarchies, status, and corporate ownership.
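One possible shape for the exception report mentioned above is a function that scans every newly loaded fact row and lists the rows that break a rule. The three rules, the row layout `(product_key, units)`, and the function names are assumptions chosen for illustration; a real report would check whatever rules the business defines.

```python
def exception_report(fact_rows, valid_product_keys):
    """Return (row number, problem) pairs for every rule a loaded row breaks."""
    exceptions = []
    for line_no, (product_key, units) in enumerate(fact_rows, start=1):
        if product_key not in valid_product_keys:
            exceptions.append((line_no, "unknown product key"))
        if units is None:
            exceptions.append((line_no, "missing units"))
        elif units < 0:
            exceptions.append((line_no, "negative units"))
    return exceptions


def ready_to_publish(fact_rows, valid_product_keys):
    """Gate the release step: publish only when the report is empty."""
    return not exception_report(fact_rows, valid_product_keys)
```

For example, `exception_report([(1, 4), (2, None), (9, 3), (1, -2)], {1, 2})` flags rows 2, 3, and 4, so that load would not be announced to users until the problems are resolved.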

Basic Processes - Data Warehouse


Querying - Querying is a broad term that encompasses all the activities of requesting data from a data mart.
Data Feedback/Feeding in Reverse - Data can also flow in the opposite direction, uphill from the traditional flow we have discussed.
Auditing - At times it is critically important to know where the data came from and what calculations were performed on it. For this you can create special audit records.

Basic Processes - Data Warehouse


Securing - Every data warehouse has an exquisite dilemma: publishing the data as widely as possible, to as many users as possible, with the easiest of user interfaces, while at the same time protecting the data from misuse and snoopers.

Backing Up and Recovering - Since data warehouse data flows from the legacy system through to the data marts and eventually onto the users' desktops, a real question arises about where to take the necessary snapshots.

Steps in the Design Process


Choose a business process to model

Choose the grain of the business process


Choose the dimensions that will apply for each business process and the attributes/members for each dimension

Choose the measured facts that will populate each fact table record.
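The four design choices above can be recorded as a small data structure and turned into a fact table definition. This is only a sketch: the class `FactTableDesign`, the `_key` and `_fact` naming conventions, and the example values are hypothetical, and the grain is kept as a plain description rather than enforced.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class FactTableDesign:
    business_process: str  # step 1: the business process to model
    grain: str             # step 2: what one fact row represents
    dimensions: list = field(default_factory=list)  # step 3: dimensions that apply
    facts: list = field(default_factory=list)       # step 4: measured facts

    def create_table_sql(self):
        """Build a CREATE TABLE statement: one key per dimension, one column per fact."""
        columns = [f"{name}_key INTEGER NOT NULL" for name in self.dimensions]
        columns += [f"{name} REAL" for name in self.facts]
        body = ",\n    ".join(columns)
        return f"CREATE TABLE {self.business_process}_fact (\n    {body}\n)"


def column_names(design):
    """Create the table in a scratch database and read back its column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(design.create_table_sql())
    info = conn.execute(
        f"PRAGMA table_info({design.business_process}_fact)"
    ).fetchall()
    conn.close()
    return [row[1] for row in info]


design = FactTableDesign(
    business_process="retail_sales",
    grain="one row per state, per product, per month",
    dimensions=["state", "product", "month"],
    facts=["units", "dollars"],
)
```

Writing the choices down in this order makes it harder to pick facts before the grain is settled, since every fact column has to make sense for a single row at the stated grain.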
