
ISLAMIC REPUBLIC OF AFGHANISTAN

MINISTRY OF HIGHER EDUCATION


HERAT UNIVERSITY
COMPUTER SCIENCE FACULTY
Advanced Studies in
Database Systems 1 (Semester 7)

(Data Warehousing and Data Mining)


Lecture 2 Intro to DW

LECTURER: HAMED AMIRY

hamed.amir.2015@gmail.com

DATAWAREHOUSING && DATAMINING - LECTURE 02 1


Outline of Today’s Class
 What is a Data Warehouse?
 Data Assets
 ETL
 Data Warehouse Definition by W.H. Inmon Explained
 Need for Data Warehouse
 Data Warehousing: A Working Definition



Motivation
 “The modern organization is drowning in data but starving
for information.”

 Operational processing (transaction processing) captures,
stores and manipulates data to support daily operations.
 Information processing is the analysis of data or other
forms of information to support decision making.



What is a data warehouse?
 The Data Warehouse: A Place for Your Data Assets
 Data that originates from company applications/software or other sources
 E.g., software that the company uses to fill customer orders for its products
 E.g., a public database that contains sales information

 Three groups of data in an enterprise
 Run-the-business data: Produced by corporate applications; the raw materials
for a data warehouse.
 Integrate-the-business data: Built to improve the quality of and synchronize
two or more corporate applications, such as a master list of customers.
 Monitor-the-business data: Presented to end users for reporting and decision
support, such as your financial dashboard.



Data asset
 A data asset is the result of taking the raw material from
the run-the-business data and producing higher-quality
data end products to integrate the business and monitor
the business.
 The data warehouse team should have the mission of
providing high-quality data assets for enterprise use.



Manufacturing data assets
 Most organizations follow these steps:
1. The data warehousing team selects a focus area, such as tracking
sales activity
2. Together, the data warehousing team and subject-matter experts
compile a list of different types of information that can enable them
to use the data warehouse to help track sales activity
3. The group then goes through the list of information (data assets),
item by item, and figures out where the data warehouse can obtain
that particular piece of data (raw material).
4. The data warehousing team creates extraction programs.



ETL
 Extraction programs collect data, copy certain data to a
staging area (a work area outside the data warehouse),
cleanse the data to ensure that the data has no errors,
and then copy the higher-quality data (data assets) into
the data warehouse.
 Extraction programs are custom-coded or specialized data
warehousing products — ETL (extract, transform, and
load) tools.
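
The extract, stage, cleanse, and load flow described above can be sketched in a few lines of Python. This is only a minimal illustration, not a real ETL tool: the table names (orders, dw_sales), the columns, and the cleansing rules are assumptions made for the example, and both databases are in-memory SQLite.

```python
import sqlite3

def extract(source):
    """Collect raw rows from the (hypothetical) operational orders table."""
    return source.execute(
        "SELECT order_id, customer, amount FROM orders").fetchall()

def cleanse(rows):
    """Staging-area step: tidy names, drop rows that fail the quality rules."""
    clean = []
    for order_id, customer, amount in rows:
        if customer is None or amount is None or amount < 0:
            continue  # assumed rule: reject missing names and negative amounts
        clean.append((order_id, customer.strip().title(), round(amount, 2)))
    return clean

def load(warehouse, rows):
    """Copy the higher-quality rows into the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS dw_sales "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    warehouse.executemany("INSERT INTO dw_sales VALUES (?, ?, ?)", rows)
    warehouse.commit()

# Hypothetical operational source with some dirty rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "  ahmad ahmadi ", 120.0),
    (2, None, 50.0),
    (3, "sara karimi", -10.0),
    (4, "sara karimi", 75.5),
])

warehouse = sqlite3.connect(":memory:")
staged = cleanse(extract(source))   # the list plays the role of the staging area
load(warehouse, staged)
loaded = warehouse.execute(
    "SELECT order_id, customer, amount FROM dw_sales ORDER BY order_id").fetchall()
print(loaded)
```

Only the two rows that pass the cleansing rules reach the warehouse; the source table itself is left untouched.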



DW Definition
 Data Warehouse: (W.H. Inmon)
 A subject-oriented, integrated, time-variant, non-updatable
collection of data used in support of management decision-
making processes

 Data Warehousing:
 The process of constructing and using a data warehouse



Data Warehouse - Subject-Oriented
 Organized around major subjects, such as customer,
product, sales.
 Focuses on the modeling and analysis of data for
decision makers, not on daily operations or transaction
processing.
 Provides a simple and concise view around particular
subject issues by excluding data that are not useful in the
decision support process.



Data Warehouse - Integrated
 Constructed by integrating multiple, heterogeneous
data sources
 relational databases, flat files, on-line transaction records
 Data cleaning and data integration techniques are
applied.
 Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
 E.g., Hotel price: currency, tax, breakfast covered, etc.
 When data is moved to the warehouse, it is converted.
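
The hotel price example can be made concrete with a small conversion routine. The sketch below is an illustration under stated assumptions: the exchange rates, the flat breakfast price, and the warehouse convention (USD, tax included, breakfast included) are all invented for the example.

```python
from dataclasses import dataclass

# Hypothetical fixed exchange rates to USD, for illustration only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.1}
BREAKFAST_USD = 10.0  # assumed flat price of breakfast

@dataclass
class SourcePrice:
    amount: float
    currency: str
    tax_included: bool
    tax_rate: float
    breakfast_included: bool

def to_warehouse_price(p):
    """Convert a source price to the assumed warehouse convention:
    USD, tax included, breakfast included."""
    usd = p.amount * RATES_TO_USD[p.currency]
    if not p.tax_included:
        usd *= 1 + p.tax_rate
    if not p.breakfast_included:
        usd += BREAKFAST_USD
    return round(usd, 2)

# Two sources quoting the "same" kind of price in different ways.
a = SourcePrice(100.0, "USD", tax_included=False, tax_rate=0.10,
                breakfast_included=False)
b = SourcePrice(100.0, "EUR", tax_included=True, tax_rate=0.10,
                breakfast_included=True)
print(to_warehouse_price(a), to_warehouse_price(b))
```

Both sources quote "100", but only after conversion can the two prices be compared in the warehouse.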



Data Warehouse - Time Variant
 The time horizon for the data warehouse is
significantly longer than that of operational systems.
 Operational database: current value data.
 Data warehouse data: provide information from a historical
perspective (e.g., past 5-10 years)
 Every key structure in the data warehouse
 Contains an element of time, explicitly or implicitly.
 The key of operational data, by contrast, may or may not contain
a time element.
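
The difference in key structure can be shown with two small tables. This is a minimal sketch using in-memory SQLite; the table names (op_customer, dw_customer) and the snapshot_date column are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Operational table: key has no time element, so only the current value is kept.
db.execute("CREATE TABLE op_customer (customer_id INTEGER PRIMARY KEY, city TEXT)")

# Warehouse table: the key explicitly contains a time element.
db.execute("""CREATE TABLE dw_customer (
    customer_id INTEGER,
    snapshot_date TEXT,
    city TEXT,
    PRIMARY KEY (customer_id, snapshot_date))""")

def record(customer_id, city, date):
    # The operational system overwrites the current value ...
    db.execute("INSERT OR REPLACE INTO op_customer VALUES (?, ?)",
               (customer_id, city))
    # ... while the warehouse adds a new dated row and keeps the old one.
    db.execute("INSERT INTO dw_customer VALUES (?, ?, ?)",
               (customer_id, date, city))

record(1, "Herat", "2020-01-01")
record(1, "Kabul", "2023-06-01")

current = db.execute(
    "SELECT city FROM op_customer WHERE customer_id = 1").fetchall()
history = db.execute(
    "SELECT snapshot_date, city FROM dw_customer "
    "WHERE customer_id = 1 ORDER BY snapshot_date").fetchall()
print(current)
print(history)
```

The operational table answers "where does the customer live now?", while the warehouse table can also answer "where did the customer live in 2021?".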



Data Warehouse - Non Updatable
 A physically separate store of data transformed
from the operational environment.
 Operational update of data does not occur in
the data warehouse environment.
 Does not require transaction processing, recovery,
and concurrency control mechanisms.
 Requires only two operations in data accessing:
 initial loading of data and access of data.
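
The two permitted operations, loading and access, can be sketched as a tiny Python class. This is a toy model under assumptions, not how a warehouse server is built: the class name and methods are hypothetical, and the store is an in-memory list.

```python
class ReadOnlyWarehouse:
    """Toy store that supports only loading and reading."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append a batch of rows; existing rows are never modified."""
        self._rows.extend(tuple(r) for r in rows)

    def query(self, predicate=lambda row: True):
        """Return all rows that satisfy the predicate."""
        return [r for r in self._rows if predicate(r)]

    def update(self, *args, **kwargs):
        # Operational updates do not occur in the warehouse environment.
        raise NotImplementedError("the warehouse is non-updatable")

dw = ReadOnlyWarehouse()
dw.load([("2024-01-05", "Tea", 5.0), ("2024-01-17", "Rice", 20.0)])
tea_rows = dw.query(lambda row: row[1] == "Tea")
print(tea_rows)
```

Because rows are only appended and read, this store needs no locking or rollback logic, which mirrors the point about transaction processing and concurrency control above.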



Need for Data Warehousing
 Integrated, company-wide view of high-quality
information (from disparate databases)
 Separation of operational and informational systems and
data (for improved performance)



Need to separate operational and information systems
Three primary factors:
 A data warehouse centralizes data that are scattered
throughout disparate operational systems and makes
them available for decision support (DS).
 A well-designed data warehouse adds value to data by
improving their quality and consistency.
 A separate data warehouse eliminates much of the
contention for resources that results when information
applications are mixed with operational processing.



Reason-1: Why a Data Warehouse?
 The size of data sets is going up.
 The cost of data storage is coming down.
 The amount of data the average business collects and stores is
doubling every year.

 Total hardware and software cost to store and manage 1
Mbyte of data
 1990: ~ $15
 2002: ~ ¢15 (Down 100 times)
 By 2007: < ¢1 (down more than 1,500 times from 1990)



A Warehouse of Data is NOT a Data Warehouse

Size is NOT Everything



Reason-2: Why a Data Warehouse?
 Businesses demand business intelligence (BI).
 Complex questions from integrated data.
 “Intelligent Enterprise”



Reason-2: Why a Data Warehouse?
DBMS Approach

 List of all items that were sold last month?

 List of all items purchased by Ahmad Ahmadi?

 The total sales of the last month grouped by branch?

 How many sales transactions occurred during the
month of January?
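
Each of the four questions above maps directly onto a single SQL query against an operational table. The sketch below runs them with Python's sqlite3 module; the sales table, its columns, the sample rows, and the fixed month "2024-01" (standing in for "last month" and "January") are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY, customer TEXT, item TEXT,
    branch TEXT, amount REAL, sale_date TEXT)""")
db.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Ahmad Ahmadi", "Tea",   "Herat", 5.0,  "2024-01-05"),
    (2, "Sara Karimi",  "Rice",  "Kabul", 20.0, "2024-01-17"),
    (3, "Ahmad Ahmadi", "Sugar", "Herat", 8.0,  "2024-02-02"),
    (4, "Sara Karimi",  "Tea",   "Herat", 5.0,  "2024-01-20"),
])

MONTH = "2024-01"  # assumed value of "last month" / "January"
in_month = "strftime('%Y-%m', sale_date) = ?"

# 1. All items that were sold last month.
items_sold = [r[0] for r in db.execute(
    f"SELECT DISTINCT item FROM sales WHERE {in_month} ORDER BY item", (MONTH,))]

# 2. All items purchased by Ahmad Ahmadi.
ahmad_items = [r[0] for r in db.execute(
    "SELECT DISTINCT item FROM sales WHERE customer = ? ORDER BY item",
    ("Ahmad Ahmadi",))]

# 3. Total sales of last month grouped by branch.
totals = db.execute(
    f"SELECT branch, SUM(amount) FROM sales WHERE {in_month} "
    "GROUP BY branch ORDER BY branch", (MONTH,)).fetchall()

# 4. Number of sales transactions in the month.
january_count = db.execute(
    f"SELECT COUNT(*) FROM sales WHERE {in_month}", (MONTH,)).fetchone()[0]

print(items_sold, ahmad_items, totals, january_count)
```

These are lookups and simple aggregates over data the application already stores, which is why a conventional DBMS handles them well.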



Reason-2: Why a Data Warehouse?
Intelligent Enterprise

 Which items sell together? Which items to stock?

 Where and how to place the items? What discounts to offer?

 How best to target customers to increase sales at a branch?

 Which customers are most likely to respond to my next
promotional campaign, and why?
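
The first question, "Which items sell together?", gives a feel for how these differ from simple lookups. The sketch below counts how often each pair of items appears in the same basket. It is a plain co-occurrence count over invented sample baskets, not a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: the set of items bought in each transaction.
baskets = [
    {"tea", "sugar", "bread"},
    {"tea", "sugar"},
    {"bread", "milk"},
    {"tea", "sugar", "milk"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair one canonical order, e.g. ("sugar", "tea")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

Answering the question requires scanning every transaction and comparing items across them, rather than retrieving rows that match a condition.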



Reason-3: Why a Data Warehouse?
Businesses want much more…

 What happened?
 Why did it happen?
 What will happen?
 What is happening?
 What do you want to happen?
Is a Bigger Data Warehouse a Better Data Warehouse?
 NO!
 The size of a data warehouse is a characteristic, almost
a by-product, of a data warehouse; it’s not an objective.
 It’s a Data Warehouse, Not a Data Dump
 What’s the average number of room-service vegetarian meals ordered by
passengers who were on their third cruise with Captain Grumby in command and in
which a half-day stop was made in Grand Cayman when its temperature was
between 75 and 80 degrees?
 Asking this type of question doesn’t have any real
business value



Data Warehousing: A Working Definition
According to the author of Data Warehousing For Dummies:
Data warehousing is the coordinated, architected, and
periodic copying of data from various sources, both inside
and outside the enterprise, into an environment optimized
for analytical and informational processing.

 The information that you use to formulate decisions typically is based on
data gathered from previous experiences: what works and what doesn’t.
 Data warehouses capture similar data, allowing business leaders to make
informed decisions



Putting the pieces together

[Figure: multi-tier data warehouse architecture]
 Tier 0, Data: data sources (operational databases, semistructured sources, www data, archived data), brought in through Extract, Transform, Load (ETL)
 Tier 1, Data Warehouse Server: data warehouse, metadata, data marts
 Tier 2, OLAP Servers: MOLAP, ROLAP
 Tier 3, Clients: query/reporting, analysis, and data mining tools
 Users: IT users, business users



Any questions?

