Imran Introduction To DWH-1
Data Warehouse-1
Instructor:
Muhammad Imran Saeed
What is Data, Database,
Database Management System?
Ordinary operational databases (OLTP) are not suited to analysis or decision making: they are designed for
daily, routine transactions and do not hold the historical data that analysis needs.
Online Transaction Processing
OLTP
OLTP (Online Transaction Processing) is a category of data processing focused on transaction-oriented
tasks. It typically involves inserting, updating, or deleting small amounts of data in a database, and it
handles large numbers of short transactions issued by many concurrent users.
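To make the idea of a small, transaction-oriented operation concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `stock` and `orders` tables and the `place_order` helper are hypothetical names chosen for illustration; they do not come from the lecture.

```python
import sqlite3

# Hypothetical order-entry schema, held in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (product TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('beach ball', 10)")
conn.commit()

def place_order(conn, product, qty):
    """Record one order and decrement stock as a single atomic transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE stock SET qty = qty - ? WHERE product = ? AND qty >= ?",
            (qty, product, qty),
        )
        if cur.rowcount != 1:
            raise ValueError("insufficient stock")
        conn.execute(
            "INSERT INTO orders (product, qty) VALUES (?, ?)", (product, qty)
        )

place_order(conn, "beach ball", 3)
remaining = conn.execute(
    "SELECT qty FROM stock WHERE product = 'beach ball'"
).fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(remaining, order_count)
```

Each call touches only one or two rows, which is the typical OLTP access pattern: many such short transactions run concurrently rather than a few long scans.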
Online Analytical Processing
OLAP
OLAP (online analytical processing) is a computing method that enables users to
easily and selectively extract and query data in order to analyze it from different
points of view. OLAP business intelligence queries often aid in trend analysis,
financial reporting, sales forecasting, budgeting and other planning purposes.
Analysts frequently need to group, aggregate and join data, and these operations
are resource intensive in relational databases. With OLAP, data can be pre-calculated
and pre-aggregated, which makes analysis faster.
For example, a user can request that data be analyzed to display a spreadsheet
showing all of a company's beach ball products sold in Florida in the month of July,
compare revenue figures with those for the same products in September and then
see a comparison of other product sales in Florida in the same time period.
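The beach ball example can be sketched as follows, assuming a hypothetical `sales` table with made-up revenue figures. The `sales_summary` table stands in for the pre-aggregation that an OLAP system would perform; a real OLAP engine would manage this internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, state TEXT, month TEXT, revenue REAL)"
)
# Illustrative rows only; the figures are invented for the demo.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("beach ball", "Florida", "July", 120.0),
        ("beach ball", "Florida", "July", 80.0),
        ("beach ball", "Florida", "September", 50.0),
        ("sunscreen", "Florida", "July", 300.0),
        ("beach ball", "Texas", "July", 40.0),
    ],
)

# Pre-aggregate once, so later questions read a small summary table
# instead of re-scanning every individual sale.
conn.execute(
    """CREATE TABLE sales_summary AS
       SELECT product, state, month, SUM(revenue) AS revenue
       FROM sales
       GROUP BY product, state, month"""
)

def revenue(product, state, month):
    """Look up one pre-aggregated cell; 0.0 if nothing was sold."""
    row = conn.execute(
        "SELECT revenue FROM sales_summary "
        "WHERE product = ? AND state = ? AND month = ?",
        (product, state, month),
    ).fetchone()
    return row[0] if row else 0.0

july = revenue("beach ball", "Florida", "July")
september = revenue("beach ball", "Florida", "September")
# Other products sold in Florida in the same period (July).
others = conn.execute(
    "SELECT product, revenue FROM sales_summary "
    "WHERE state = 'Florida' AND month = 'July' AND product <> 'beach ball' "
    "ORDER BY product"
).fetchall()
print(july, september, others)
```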
OLAP Cube Interpretation
OLAP Cube Example
OLAP Cube Operations
Many business applications and other data operations rely on an OLAP cube.
There are five primary types of analytical operations in OLAP:
1) Roll-up
2) Drill-down
3) Slice
4) Dice
5) Pivot
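The five operations can be illustrated on a tiny in-memory "cube". This is only a sketch in plain Python: the fact rows, the dimension names and the `aggregate` and `select` helpers are all hypothetical and are not taken from any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, city, country, product, units_sold)
FACTS = [
    ("Q1", "Lahore", "Pakistan", "phone", 10),
    ("Q1", "Karachi", "Pakistan", "phone", 20),
    ("Q1", "Delhi", "India", "laptop", 5),
    ("Q2", "Lahore", "Pakistan", "laptop", 7),
    ("Q2", "Delhi", "India", "phone", 8),
]
FIELDS = ("quarter", "city", "country", "product")

def aggregate(rows, dims):
    """Sum units_sold grouped by the named dimensions."""
    idx = [FIELDS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[4]
    return dict(totals)

def select(rows, **allowed):
    """Keep rows whose value for each named dimension is in the allowed set."""
    return [
        r for r in rows
        if all(r[FIELDS.index(d)] in vals for d, vals in allowed.items())
    ]

# Drill-down: view the data at the finer city level.
drill_down = aggregate(FACTS, ["city", "quarter"])
# Roll-up: climb the location hierarchy from city to country.
roll_up = aggregate(FACTS, ["country", "quarter"])
# Slice: fix a single value on one dimension.
slice_q1 = select(FACTS, quarter={"Q1"})
# Dice: restrict several dimensions at once to get a sub-cube.
dice = select(FACTS, quarter={"Q1"}, country={"Pakistan"},
              product={"phone", "laptop"})
# Pivot: rotate the axes, here (country, quarter) -> (quarter, country).
pivot = {(q, c): v for (c, q), v in roll_up.items()}

print(roll_up[("Pakistan", "Q1")], pivot[("Q1", "Pakistan")])
```

Roll-up and drill-down are the same aggregation at different levels of a hierarchy, while slice and dice are filters; pivot changes only how the result is laid out, not its values.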
Three widely used types of OLAP system are MOLAP (multidimensional OLAP), ROLAP (relational OLAP), and
Hybrid OLAP (HOLAP).
OLAP (Online Analytical Processing): as the 'analytical' in the name suggests, this kind
of system is built to analyze data efficiently and effectively. It is an approach to
answering multi-dimensional queries over large amounts of data. The effectiveness of
an OLAP system is measured mainly by its response time, whereas an OLTP system is
judged primarily on accuracy.
OLTP vs OLAP
Why OLTP & OLAP
Don’t Mix
A data warehouse is a relational database that is designed for query and analysis rather than for transaction
processing. It usually contains historical data derived from transaction data, but it can include data from other
sources. It separates analysis workload from transaction workload and enables an organization to consolidate
data from several sources.
Integrated
Integration is closely related to subject orientation. Data warehouses must put data from disparate
sources into a consistent format. They must resolve such problems as naming conflicts and
inconsistencies among units of measure. When they achieve this goal, they are said to be integrated.
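As a small illustration of integration, the sketch below maps records from two hypothetical source systems onto one warehouse format, resolving a naming conflict (`cust_id` versus `customer_number`) and a unit conflict (pounds versus kilograms). All field names and values are assumptions made for the example.

```python
# Hypothetical extracts from two source systems.
crm_rows = [{"cust_id": 1, "weight_lb": 2.0}]
erp_rows = [{"customer_number": 2, "weight_kg": 1.5}]

LB_TO_KG = 0.45359237  # exact definition of the international pound

def from_crm(row):
    """Rename cust_id and convert pounds to kilograms."""
    return {
        "customer_id": row["cust_id"],
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3),
    }

def from_erp(row):
    """Rename customer_number; the weight is already in kilograms."""
    return {
        "customer_id": row["customer_number"],
        "weight_kg": row["weight_kg"],
    }

# After mapping, every record shares the same field names and units.
integrated = [from_crm(r) for r in crm_rows] + [from_erp(r) for r in erp_rows]
print(integrated)
```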
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because
the purpose of a warehouse is to enable you to analyze what has occurred.
Characteristics of
Data Warehouse
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very much in contrast
to online transaction processing (OLTP) systems, where performance requirements demand that historical
data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term
time variant.
Typically, data flows from one or more online transaction processing (OLTP) databases into a data
warehouse on a monthly, weekly, or daily basis. The data is normally processed in a staging file before
being added to the data warehouse. Data warehouses commonly range in size from tens of gigabytes to a
few terabytes. Usually, the vast majority of the data is stored in a few very large fact tables.
Differences Between
DWH and OLTP Systems
Workload:
Data warehouses are designed to accommodate ad hoc queries. You might not know the workload of your
data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of
possible query operations.
OLTP systems support only predefined operations. Your applications might be specifically tuned or
designed to support only these operations.
Data Modifications:
A data warehouse is updated on a regular basis by the ETL (extract, transform, load) process, run nightly or
weekly, using bulk data modification techniques. End users do not update the data warehouse directly.
In OLTP systems, end users routinely issue individual data modification statements to the database. The
OLTP database is always up to date, and reflects the current state of each business transaction.
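A bulk load of this kind might look like the following sketch, which stages extracted rows, applies a simple cleaning step and appends the result to a fact table in one transaction. The table names, the `nightly_load` function and the cleaning rules are hypothetical; real ETL tools are considerably more involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (sale_date TEXT, product TEXT, amount REAL)")
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, product TEXT, amount REAL)")

def nightly_load(conn, extracted_rows):
    """Stage the extracted rows, clean them, and append them to the fact table."""
    with conn:  # the whole batch commits or rolls back together
        conn.execute("DELETE FROM staging_sales")
        conn.executemany(
            "INSERT INTO staging_sales VALUES (?, ?, ?)", extracted_rows
        )
        # Transform step: drop rows with missing or non-positive amounts and
        # normalize product names before the bulk insert.
        conn.execute(
            """INSERT INTO fact_sales
               SELECT sale_date, LOWER(TRIM(product)), amount
               FROM staging_sales
               WHERE amount IS NOT NULL AND amount > 0"""
        )
    return conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]

first = nightly_load(conn, [
    ("2024-01-01", " Beach Ball ", 20.0),
    ("2024-01-01", "Sunscreen", None),   # rejected: missing amount
    ("2024-01-01", "Towel", 15.0),
])
second = nightly_load(conn, [("2024-01-02", "Towel", 12.0)])
print(first, second)
```

Existing fact rows are never modified; each run only appends a new batch, which matches the way end users read from the warehouse but do not write to it.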
Schema Design:
Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to
optimize query performance.
OLTP systems often use fully normalized schemas to optimize update/insert/delete performance, and to
guarantee data consistency.
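A star schema can be sketched with one fact table surrounded by dimension tables. In the hypothetical example below, `dim_product` stores the category name directly on each product row (a denormalized choice), so an analytical query needs only one join per dimension. Table and column names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name TEXT,
    category TEXT              -- kept inline instead of in its own table
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year INTEGER,
    month INTEGER
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'beach ball', 'toys'), (2, 'sunscreen', 'health');
INSERT INTO dim_date VALUES (20240701, 2024, 7), (20240901, 2024, 9);
INSERT INTO fact_sales VALUES
    (1, 20240701, 200.0), (2, 20240701, 300.0), (1, 20240901, 50.0);
""")

# Total sales by category and month: the fact table joins to each dimension once.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(rows)
```

In a fully normalized OLTP design, category would typically live in its own table, which avoids repeating the name on every product row but adds another join to every analytical query.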
Typical Operations:
A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for
all customers last month."
A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for
this customer."
Historical Data:
Data warehouses usually store many months or years of data. This is to support historical analysis.
OLTP systems usually store data from only a few weeks or months. An OLTP system keeps only as much
historical data as it needs to meet the requirements of the current transaction.
Data Warehouse
Architecture
Data warehouses and their architectures vary depending upon the specifics of an organization's situation.
Three common architectures are:
Data Warehouse Architecture (Basic)
Ralph Kimball's data warehouse design starts with the most important business processes. In this approach,
an organization creates data marts that aggregate relevant data around subject-specific areas. The data
warehouse is the combination of the organization’s individual data marts.
Data Warehouse Models
Inmon vs Kimball
Two data warehouse pioneers, Bill Inmon and Ralph Kimball, differ in their views on how data warehouses
should be designed from the organization's perspective.
Bill Inmon's approach favors a top-down design in which the data warehouse is the centralized data repository
and is built first. Dimensional data marts for specific business lines can then be created from the data
warehouse when they are needed.
Home Assignment
Assignment 1:
b) Why is an OLTP database kept normalized? Why is a normalized schema not well suited to a data
warehouse or OLAP?
c) Describe how the data warehouse design and development approaches of Bill Inmon and
Ralph Kimball differ, and state the point of view behind each approach.