Kabul University: Computer Science Faculty
Contents:
Class Policies and Organizational Issues
Course Information
Organizational Issues
Course Contents
Literature
Introduction to Topic
Course Information
• Course Name:
o Data Warehousing & BI
• Prerequisites for participation
o Fundamentals of Databases.
• Type of Course:
o Lecture with supporting lab manuals and weekly exercises to review and
apply the lecture contents.
• Slides and Extra Notes:
o Piazza
Lecture Issues
Lecture Times per Week
Saturday
Wednesday
Office hours
Tuesday 10:00am – 11:45am
Private appointment
Contact me through email.
Mayar.ku.cs@gmail.com
Examination and Grading
Exams
Final-term Exam: 70%
Others
Homework and Class Activity: 30%
Class Rules
• Full attendance
• Please come on time
• Turn off your mobile.
Literature
• Paulraj Ponniah, Data Warehousing Fundamentals, John Wiley
& Sons Inc., NY.
• Ralph Kimball, 2013. The Data Warehouse Toolkit: The Definitive
Guide to Dimensional Modeling. 3rd Edition. Wiley.
• W. H. Inmon, Building the Data Warehouse
(Second Edition), John Wiley & Sons Inc., N
Summary of course
• Introduction & Background
• Dimensional modeling
• Extract – Transform – Load (ETL)
• Data Quality Management (DQM)
• On Line Analytical Processing (OLAP)
• Designing reports
• Need for speed (Parallelism, Join and Indexing techniques)
• DWH Implementation steps
• Complete implementation case study
• Lab and tool usage
• Others
Why this course?
• The world is changing (in fact, it has already changed): either change or be
left behind.
The need
Why a Data Warehouse (DWH)?
• Data recording and storage are growing.
Reason-1: Why a Data Warehouse?
• Data Sets are growing.
• The size of data sets is going up.
• The cost of data storage is coming down.
• The amount of data the average business collects and stores is doubling every
year.
• Total hardware and software cost to store and manage 1 Mbyte of data:
• 1990: ~ $15
• 2002: ~ 15¢ (down 100 times)
• By 2007: < 1¢ (down more than 1,500 times from 1990)
• A Few Examples
• WalMart: 24 TB
• France Telecom: ~ 100 TB
• CERN: Up to 20 PB by 2006
• Stanford Linear Accelerator Center (SLAC): 500 TB
Caution!
A Warehouse of Data is NOT a Data Warehouse
Size is NOT Everything
Reason-2: Why a Data Warehouse?
DBMS Approach
List of all items that were sold last month?
Intelligent Enterprise
Which items sell together? Which items to stock?
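The contrast between the two kinds of question can be sketched with a small in-memory database. This is only an illustration under assumed names: the `sales` table, its columns, and the sample rows are hypothetical and not part of the lecture. The first query is a plain lookup ("what was sold last month?"); the second uses a self-join to count how often two items appear in the same basket ("which items sell together?").

```python
import sqlite3

# Hypothetical sales data: (basket_id, item, sale_month). Not from the lecture.
rows = [
    (1, "bread", "2024-05"), (1, "butter", "2024-05"),
    (2, "bread", "2024-05"), (2, "butter", "2024-05"), (2, "milk", "2024-05"),
    (3, "milk", "2024-05"),
    (4, "bread", "2024-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (basket_id INTEGER, item TEXT, sale_month TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# DBMS approach: list the items that were sold last month (here "2024-05").
items_last_month = [r[0] for r in conn.execute(
    "SELECT DISTINCT item FROM sales WHERE sale_month = ? ORDER BY item",
    ("2024-05",))]

# Intelligent-enterprise question: which items sell together?
# The self-join pairs up distinct items that share a basket.
pairs = conn.execute("""
    SELECT a.item, b.item, COUNT(*) AS baskets
    FROM sales a JOIN sales b
      ON a.basket_id = b.basket_id AND a.item < b.item
    GROUP BY a.item, b.item
    ORDER BY baskets DESC, a.item, b.item
""").fetchall()

print(items_last_month)  # ['bread', 'butter', 'milk']
print(pairs)             # [('bread', 'butter', 2), ('bread', 'milk', 1), ('butter', 'milk', 1)]
```

The second query is the kind of analytical question that becomes expensive on a live transaction system as the data grows.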
Reason-3: Why a Data Warehouse?
• Businesses want much more…
Stages of Data Warehouse:
• What happened?
• Why did it happen?
• What will happen?
• What is happening?
• What do you want to happen?
What is a Data Warehouse?
Complete repository
History
Transaction System
Ad-Hoc access
Knowledge workers
Transaction System
• Management Information System (MIS)
• Could be typed sheets (NOT a transaction system)
Ad-Hoc access
• Does not have a fixed access pattern.
• Queries not known in advance.
• Difficult to write SQL in advance.
Knowledge workers
• Typically NOT IT literate (Executives, Analysts, Managers).
• NOT clerical workers.
• Decision makers.
Another View of a DWH
• Subject-Oriented
• Integrated
• Time-Variant
• Non-Volatile
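Two of these properties, time-variant and non-volatile, can be illustrated with a minimal sketch. The table names `op_customer` and `dwh_customer`, the `load_snapshot` helper, and the sample data are assumptions made for illustration only. The operational table is updated in place, while the warehouse table only receives dated snapshots and is never updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Operational table: one row per customer, overwritten on change (volatile).
conn.execute("CREATE TABLE op_customer (cust_id INTEGER PRIMARY KEY, city TEXT)")
# Warehouse table: one row per customer per load date, insert-only
# (non-volatile), with the load date recording time (time-variant).
conn.execute("CREATE TABLE dwh_customer (cust_id INTEGER, city TEXT, load_date TEXT)")

def load_snapshot(load_date):
    """Append the current operational state to the warehouse with a date stamp."""
    conn.execute(
        "INSERT INTO dwh_customer SELECT cust_id, city, ? FROM op_customer",
        (load_date,))

conn.execute("INSERT INTO op_customer VALUES (1, 'Kabul')")
load_snapshot("2024-01-31")
conn.execute("UPDATE op_customer SET city = 'Herat' WHERE cust_id = 1")
load_snapshot("2024-02-29")

current = conn.execute(
    "SELECT city FROM op_customer WHERE cust_id = 1").fetchall()
history = conn.execute(
    "SELECT load_date, city FROM dwh_customer WHERE cust_id = 1 "
    "ORDER BY load_date").fetchall()

print(current)  # [('Herat',)]
print(history)  # [('2024-01-31', 'Kabul'), ('2024-02-29', 'Herat')]
```

The operational system only knows the current city; the warehouse keeps both states, which is what makes historical analysis possible.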
What is a Data Warehouse?
It is a blend of many technologies, the basic concept being:
How is it Different?
Fundamentally different
[Figure: the information request cycle. Business user needs info; user requests IT people; IT people do system analysis and design; IT people create reports; IT people send reports to business user; business user may get answers; answers result in more questions.]
• Different patterns of hardware utilization
[Figure: hardware utilization (0% to 100%) for Operational vs. DWH systems]
How much history?
• Depends on:
• Industry.
• Cost of storing historical data.
• Industries and history
• Telecom calls far outnumber bank transactions: 18 months of history.
Data Warehouse Vs. OLTP
DWH
SELECT balance, age, sal, gender
FROM customer_table, tx_table
WHERE age BETWEEN 30 AND 40
AND Education = 'graduate'
AND customer_table.CustID = tx_table.Customer_ID;
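A DWH-style query of this kind can be tried in an in-memory SQLite database, written in valid SQL syntax. The slide does not give the table definitions, so the schema below (which columns live in `customer_table` and which in `tx_table`) and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the lecture names the tables and columns but not their layout.
conn.executescript("""
CREATE TABLE customer_table (
    CustID INTEGER PRIMARY KEY, age INTEGER, sal REAL,
    gender TEXT, Education TEXT);
CREATE TABLE tx_table (Customer_ID INTEGER, balance REAL);
INSERT INTO customer_table VALUES
    (1, 35, 900.0, 'F', 'graduate'),
    (2, 45, 1200.0, 'M', 'graduate'),
    (3, 32, 700.0, 'M', 'undergraduate');
INSERT INTO tx_table VALUES (1, 150.0), (2, 80.0), (3, 40.0);
""")

# Analytical query: scans and joins whole tables, filtering on attributes
# rather than looking up a single record by key.
query = """
SELECT balance, age, sal, gender
FROM customer_table
JOIN tx_table ON customer_table.CustID = tx_table.Customer_ID
WHERE age BETWEEN 30 AND 40
  AND Education = 'graduate'
"""
result = conn.execute(query).fetchall()
print(result)  # [(150.0, 35, 900.0, 'F')]
```

Only customer 1 satisfies both the age range and the education filter, so a single row comes back.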
OLTP: Online Transaction Processing (MIS or Database System)
• Complex query scripts and large list selections can generally be executed in a small number of minutes.
Putting the pieces together
[Figure: data warehouse architecture, from data sources to clients]
• Data sources (Tier 0): semistructured sources, www data, archived data, operational databases
• Data Warehouse Server (Tier 1): Extract, Transform, Load (ETL) into the warehouse; metadata; data marts
• OLAP Servers (Tier 2): MOLAP, ROLAP
• Clients (Tier 3): tools for query/reporting, data analysis, and data mining, used by IT users and business users
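The Extract, Transform, Load step between the data sources and the warehouse can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the source rows, the `fact_sales` table, and the `extract`, `transform`, and `load` functions are all hypothetical names chosen for the example.

```python
import sqlite3

# Hypothetical operational source rows with typical quality problems:
# stray whitespace, inconsistent casing, and a non-numeric amount.
source_rows = [
    {"cust": " Ali ", "amount": "120.50", "date": "2024-05-01"},
    {"cust": "Sara", "amount": "n/a", "date": "2024-05-01"},
    {"cust": "ali", "amount": "30", "date": "2024-05-02"},
]

def extract():
    """Extract: read the rows from the (here in-memory) source."""
    return list(source_rows)

def transform(rows):
    """Transform: normalize names and drop rows with non-numeric amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((row["cust"].strip().title(), amount, row["date"]))
    return clean

def load(conn, rows):
    """Load: append the cleaned rows to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(customer TEXT, amount REAL, sale_date TEXT)")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))

totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM fact_sales "
    "GROUP BY customer ORDER BY customer").fetchall()
print(totals)  # [('Ali', 150.5)]
```

After the transform step, " Ali " and "ali" are recognized as the same customer, which is the kind of integration the warehouse tier is meant to provide to the OLAP servers and client tools.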