Kabul University: Computer Science Faculty

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 36

KABUL UNIVERSITY

Computer Science Faculty

Data Warehousing & BI


Lecture 1: Course Overview
Lecturer: Ahmad Javid Mayar

1
Contents:
 Class Policies and Organizational Issues
 Course Information
 Organizational Issues
 Course Contents
 Literatures
 Introduction to Topic

2
Course Information
• Course Name:
o Data Warehousing & BI
• Pre-Requirements for the participation
o Fundamental of Database.
• Type of Course:
o Lecture with supporting Lab Manuals and weekly exercises to repeat and
adapt the lecture contents.
• Slides and Extra Notes:
o Piazza

3
Lecture Issues
 Lecture Times per Week
 Saturday
 Wednesday
 Office hours
 Tuesday 10:00am – 11:45am
 Private appointment
 Contact me through email.
 Mayar.ku.cs@gmail.com

4
Examination and Grading
Exams
Final-term Exam: 70%
Others
Homework and Class Activity: 30%

5
Class Rules
• Full attendance
• Please come on time
• Turn off your mobile.

Don’t disturb your classmate !!!!

6
Literatures
• Paulraj Ponniah, Data Warehousing Fundamentals, John Wiley
& Sons Inc., NY.
• Ralph Kimball, 2013. The Data Warehouse Toolkit: The Definitive
Guide to Dimensional Modeling. 3 Edition. Wiley.
• W. H. Inmon, Building the Data Warehouse
(Second Edition), John Wiley & Sons Inc., N

7
Summary of course
• Introduction & Background
• Dimensional modeling
• Extract – Transform – Load (ETL)
• Data Quality Management (DQM)
• On Line Analytical Processing (OLAP)
• Designing reports
• Need for speed (Parallelism, Join and Indexing techniques)
• DWH Implementation steps
• Complete implementation case study
• Lab and tool usage
• Others

8
Why this course?
• The world is changing (actually changed), either change or be left
behind.

• Missing the opportunities or going in the wrong direction has


prevented us from growing.

• What is the right direction?


• Harnessing the data, in a knowledge driven economy.

9
The need

“Drowning in data and starving


for information”
Knowledge is power, Intelligence
is absolute power!

10
Why a Data Warehouse (DWH)?
• Data recording and storage is growing.

• History is excellent predictor of the future.

• Gives total view of the organization.

• Intelligent decision-support is required for decision-making.

11
Reason-1: Why a Data Warehouse?
• Data Sets are growing.

How Much Data is that?


1 MB 220 or 106 bytes Small novel – 31/2 Disk
Paper rims that could fill the back of a
1 GB 230 or 109 bytes
pickup van
50,000 trees chopped and converted
1 TB 240 or 1012 bytes
into paper and printed
Academic research libraries across
2 PB 1 PB = 250 or 1015 bytes
the U.S.
All words ever spoken by human
5 EB 1 EB = 260 or 1018 bytes
beings

12
Reason-1: Why a Data Warehouse?
• Size of Data Sets are going up .
• Cost of data storage is coming down .
• The amount of data average business collects and stores is doubling every
year

• Total hardware and software cost to store and manage 1 Mbyte of data
• 1990: ~ $15
• 2002: ~ ¢15 (Down 100 times)
• By 2007: < ¢1 (Down 150 times)

13
Reason-1: Why a Data Warehouse?
• A Few Examples
• WalMart: 24 TB
• France Telecom: ~ 100 TB
• CERN: Up to 20 PB by 2006
• Stanford Linear Accelerator Center (SLAC): 500TB

14
Caution!

A Warehouse of Data
is NOT a
Data Warehouse

15
Caution!

Size
is NOT
Everything

16
Reason-2: Why a Data Warehouse?

• Businesses demand Intelligence (BI).


• Complex questions from integrated data.
• “Intelligent Enterprise”

17
Reason-2: Why a Data Warehouse?
DBMS Approach
List of all items that were sold last month?

List of all items purchased by Ahmad?

The total sales of the last month grouped by branch?

How many sales transactions occurred during the month of January?

18
Reason-2: Why a Data Warehouse?
Intelligent Enterprise
Which items sell together? Which items to stock?

Where and how to place the items? What discounts to offer?

How best to target customers to increase sales at a branch?

Which customers are most likely to respond to my next promotional


campaign, and why?

19
Reason-3: Why a Data Warehouse?
•Businesses want much more…

• What happened?
• Why it happened? Stages of
• What will happen? Data
Warehouse
• What is happening?
• What do you want to happen?

20
What is a Data Warehouse?

A complete repository of historical


corporate data extracted from
transaction systems that is available
for ad-hoc access by knowledge
workers.

21
What is a Data Warehouse?
Complete repository
History
Transaction System
Ad-Hoc access
Knowledge workers

22
What is a Data Warehouse?
Transaction System
• Management Information System (MIS)
• Could be typed sheets (NOT transaction system)

Ad-Hoc access
• Dose not have a certain access pattern.
• Queries not known in advance.
• Difficult to write SQL in advance.

Knowledge workers
• Typically NOT IT literate (Executives, Analysts, Managers).
• NOT clerical workers.
• Decision makers.

23
Another View of a DWH

Subject
Oriented

Integrated

Time
Variant

Non
Volatile

24
What is a Data Warehouse ?
It is a blend of many technologies, the basic concept being:

 Take all data from different operational systems.


 If necessary, add relevant data from industry.

 Transform all data and bring into a uniform format.

 Integrate all data as a single entity.

25
What is a Data Warehouse ? (Cont…)

It is a blend of many technologies, the basic concept being:

Store data in a format supporting easy access for decision support.


 Create performance enhancing indices.

 Implement performance enhancement joins.

 Run ad-hoc queries with low selectivity.

26
How is it Different?
 Fundamentally different
Business user
needs info

Answers result
User requests
in more questions
IT people

?
Business user
may get answers  IT people do
system analysis
and design

IT people
send reports to IT people
business user create reports

27
How is it Different?
• Different patterns of hardware utilization

100%

0%

Operational DWH

Bus Service vs. Train

28
How much history?

• Depends on:
• Industry.
• Cost of storing historical data.

• Economic value of historical data.

29
How much history?
• Industries and history
• Telecomm calls are much much more as compared to bank transactions- 18
months.

• Retailers interested in analyzing yearly seasonal patterns- 65 weeks.

• Insurance companies want to do actuary analysis, use the historical data in


order to predict risk- 7 years.

30
How is it Different?

 Rate of update depends on:


 volume of data,
 nature of business,
 cost of keeping historical data,
 benefit of keeping historical data.

31
Data Warehouse Vs. OLTP

OLTP (On Line Transaction Processing)


Select tx_date, balance from tx_table
Where account_ID = 23876;

32
Data Warehouse Vs. OLTP

DWH
Select balance, age, sal, gender from customer_table, tx_table
Where age between (30 and 40) and
Education = ‘graduate’ and
CustID.customer_table = Customer_ID.tx_table;

33
Data Warehouse Vs. OLTP
OLTP: OnLine Transaction Processing (MIS or Database System)

Data Warehouse OLTP


Scope * Single source of “truth” * Multiple databases with repetition
* Evolves over time * Off the shelf application
* How to improve business * Runs the business

Data * Historical, detailed data * Operational data


Perspective * Some summary * No summary
* Lightly denormalized * Fully normalized

Queries * Number of results * Number of results returned in


returned in thousands hundreds

Time factor * Minutes to hours * Sub seconds to seconds


* Typical availability 6x12 * Typical availability 24x7
34
Comparison of Response Times
• On-line analytical processing (OLAP) queries must be executed in a
small number of seconds.
• Often requires denormalization and/or sampling.

• Complex query scripts and large list selections can generally be executed
in a small number of minutes.

• Sophisticated clustering algorithms (e.g., data mining) can generally be


executed in a small number of hours (even for hundreds of thousands of
customers).

35
Putting the pieces together
Data Data Warehouse Server OLAP Servers Clients
(Tier 0) (Tier 1) (Tier 2) (Tier 3)


Semistructured MOLAP
Sources Query/Reporting

www data
Meta
Data 
 Extract
Data 

Analysis





IT


 Archived
data
Transform
Load
(ETL)
Warehouse
ROLAP
 
Business
Users
Data Mining
Users
Operational
Data Bases 

Data sources Data Marts  Tools
Business Users

36

You might also like