
Data Warehousing & Data Mining
IT480

TBS 2020-2021

Olfa Dridi & Afef Ben Brahim


Goals

The learning objectives of this course are:

▪ Introduce Business Intelligence and its importance in the decision-making process

▪ Understand the data warehousing and data mining concepts and their application to business intelligence

▪ Learn how to extract knowledge from large data warehouses

▪ Learn data pre-processing techniques

▪ Learn main supervised and unsupervised data mining algorithms


Outline

Chapter 1: Introduction to Business Intelligence

Chapter 2: Data Warehousing

Chapter 3: Data Pre-processing

Chapter 4: Data Mining

Chapter 5: Classification

Chapter 6: Clustering

Chapter 7: Association Rules

Chapter 8: Introduction to Python for Data Mining – A Case Study


Grading & Schedule

❖ The course has the following grading components:

1. Final Exam (40%)
2. Midterm (30%)
3. Project (30%)

❖ The course meets 4.5 hours per week.


Chapter 1: Introduction to Business Intelligence

Introduction & Motivation
▪ Information systems are combinations of hardware, software, and telecommunications networks that people build and use to collect, create, and distribute useful data, typically in organizational settings.
▪ Good management of information systems leads to good business decisions, just as poor management leads to poor decisions.
▪ Many organizations try to provide the right information to the right people at the right time, to help them make the right decisions.

Introduction & Motivation
▪ People at all levels in an organization need access to critical business information, and the ability to analyze and share that information with suppliers, partners, and customers.
▪ With aggressive competitors and highly dynamic markets, “gut feelings” and “trial and error” are not effective for managing an enterprise.
▪ Business users throughout many organizations need Business Intelligence (BI) for quick and easy access to information, so that they can make timely and accurate decisions.

Business understanding
▪ Nowadays every individual and organization (business, family or institution) has access to a large quantity of data and information about itself and its environment.
▪ This data has the potential to predict the evolution of variables of interest or trends in the outside environment.
▪ However, that potential has not been fully exploited in the business field.

Business understanding
What can we do with a large quantity of data?
I can’t find the data I need
– data is scattered over the network
– many versions, subtle differences
I can’t get the data I need
– need an expert to get the data
I can’t understand the data I found
– available data poorly documented
I can’t use the data I found
– results are unexpected
– data needs to be transformed from one form to another

Business Questions
▪ Which are our lowest/highest margin customers?
▪ Who are my customers and what products are they buying?
▪ What is the most effective distribution channel?
▪ What product promotions have the biggest impact on revenue?
▪ Which customers are most likely to go to the competition?
▪ What impact will new products/services have on revenue and margins?

Business requirements
Input: A large quantity of data
Output: The information required for business analysis and strategic decision making
The general challenge is how to exploit this large, distributed data.
The main focus is on learning how to answer today’s business questions, such as:
▪ Which customers are contributing to our profitability and which ones are not?
▪ etc …

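As an illustration of the profitability question above, here is a minimal Python sketch. The order records, customer names and the rule "profit = revenue - cost" are assumptions made up for this example; they are not course data, and a real analysis would read from the organization's own systems.

```python
from collections import defaultdict

# Hypothetical order records: (customer, revenue, cost). Illustrative values only.
ORDERS = [
    ("Alice", 1200.0, 700.0),
    ("Bob", 300.0, 350.0),
    ("Alice", 400.0, 250.0),
    ("Chloe", 900.0, 880.0),
    ("Bob", 150.0, 200.0),
]


def profit_by_customer(orders):
    """Sum (revenue - cost) over all orders of each customer."""
    totals = defaultdict(float)
    for customer, revenue, cost in orders:
        totals[customer] += revenue - cost
    return dict(totals)


def split_by_profitability(orders):
    """Return (profitable, unprofitable) lists of (customer, profit),
    each ordered from highest to lowest profit."""
    profits = profit_by_customer(orders)
    ranked = sorted(profits.items(), key=lambda item: item[1], reverse=True)
    profitable = [(c, p) for c, p in ranked if p > 0]
    unprofitable = [(c, p) for c, p in ranked if p <= 0]
    return profitable, unprofitable


profitable, unprofitable = split_by_profitability(ORDERS)
print("Contributing to profit:", profitable)
print("Not contributing:", unprofitable)
```

With these made-up records, Alice and Chloe end up on the profitable side and Bob on the unprofitable side. The same grouping idea scales to much larger data once it is consolidated in one place.
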
Business requirements
▪ The business requirement is the production of efficient and relevant data syntheses.
▪ Business intelligence guides the development team in making the biggest strategic choices.
▪ There are two main problems:
– Information is scattered across different archive systems that are not connected with one another, which produces an inefficient organization of data
– There is a lack of awareness about statistical tools and their potential for information elaboration

What is Business Intelligence?
▪ Zeng et al. (2006) define BI as the process of collection, treatment and diffusion of information with the objective of reducing uncertainty in all strategic decision making.
▪ Stackowiak et al. (2007) define BI as the process of taking large amounts of data, analyzing that data, and presenting a high-level set of reports that condense the essence of that data into the basis of business actions, enabling management to make fundamental daily business decisions.

What is Business Intelligence?
▪ To learn from the past and forecast the future, many companies are adopting BI tools and systems.
▪ In such a rough and competitive environment, strategic decision making is extremely complex and often requires the consideration of several objectives while satisfying hard constraints.
As a result, the main concern of managers is to deliver sophisticated solution strategies in an attempt to reach the predefined objectives and fulfill all system constraints.

Business intelligence
▪ Institutions and firms operate in an open system (one that is affected by external variables)
▪ There is a strong need for business analysis and strategic decision making
▪ The aim is to find a match between the opportunities in the environment and the strengths and weaknesses of the firm

Decision-making support systems
▪ Decision-making support systems are information systems (IS) designed to interactively support all phases of an end user’s decision-making process in organizations.
▪ Many organizations are turning to decision support systems to improve decision making.
▪ Turning a challenge into a learning curve

Decision Making Process
1. Define the problem: the manager must identify the problem.
2. Establish decision criteria and goals: obtain the necessary information and data.
3. Analyze data: formulate a model relating the goals to the important variables (for example, a physical model such as a scale model of a building).
The approaches to decision making include:
▪ the use of models,
▪ the use of quantitative methods,
▪ the analysis of trade-offs,
▪ establishing priorities,
▪ the systems approach.

Decision Making Process
Models are a key tool for decision makers. A model is an abstraction of reality, a simplified representation of something. For example, a child’s toy car is a model of a real automobile.
4. Identify and evaluate various alternatives or solutions.
5. Select the best alternative (decision theory is an analytical approach to selecting the best alternative; it is closely related to the field of game theory).
6. Implement your decision.

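Steps 4 and 5 can be sketched in Python. The example below uses the expected-value criterion, which is one common decision-theory rule and not necessarily the one this course adopts. The alternatives, scenarios, payoffs and probabilities are all hypothetical values chosen for illustration.

```python
# Hypothetical payoff table: alternative -> payoff under each scenario.
PAYOFFS = {
    "expand plant": {"high demand": 200, "low demand": -120},
    "subcontract": {"high demand": 110, "low demand": -20},
    "do nothing": {"high demand": 0, "low demand": 0},
}

# Assumed scenario probabilities (they must sum to 1).
PROBABILITIES = {"high demand": 0.5, "low demand": 0.5}


def expected_value(payoffs, probabilities):
    """Probability-weighted average payoff of one alternative."""
    return sum(probabilities[s] * payoffs[s] for s in probabilities)


def best_alternative(table, probabilities):
    """Step 4: score every alternative. Step 5: pick the highest score."""
    scores = {alt: expected_value(p, probabilities) for alt, p in table.items()}
    best = max(scores, key=scores.get)
    return best, scores


best, scores = best_alternative(PAYOFFS, PROBABILITIES)
print(scores)
print("Selected alternative:", best)
```

Under these assumed numbers, "subcontract" has the highest expected payoff even though "expand plant" has the highest payoff in the best case. Other criteria, such as maximizing the worst-case payoff, could select a different alternative from the same table.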

Decision Making
The difference between decision making and problem solving:
▪ Problem solving involves defining a problem and creating solutions for it.
▪ Decision making is selecting a course of action from among available alternatives.

Problem solving (Steps 1 to 6) always involves decision making (Step 5). However, not all decision making involves solving a problem. For example, a supervisor may have to make decisions about employees, resources, workload, etc., without having a problem to solve.

Decision Making process
1. Define the problem
2. Obtain data
3. Analyze data
4. Create alternatives
5. Select the best alternative
6. Implement the decision

Data analysis
Information processing is the analysis of a large quantity of data or other forms of information to support decision making and to discover knowledge in data.
This is indeed the biggest challenge posed by big and often unstructured data: how to analyze it in a useful way.
Objectives of data analysis:
• Increase the effectiveness of the manager’s decision-making process,
• Support the manager in the decision-making process without replacing the manager,
• Improve the direction of decision making.

Data analysis
Requirements
▪ Data analysis requires that the data is organized into an ordered database.
▪ The way data is analyzed depends greatly on how the data is organized within the database (DB).
▪ The data warehousing step is crucial to a successful data analysis outcome.
▪ It has become strategic for all medium and large companies to have a unified information system called a data warehouse.

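To make the link between organization and analysis concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-table schema (`product`, `sale`) and all of its rows are hypothetical and far simpler than a real data warehouse. The point is only that once the data sits in one ordered database, a business question becomes a short query.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (
            product_id INTEGER PRIMARY KEY,
            category   TEXT
        );
        CREATE TABLE sale (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES product(product_id),
            amount     REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?)",
        [(1, "books"), (2, "games"), (3, "books")],
    )
    conn.executemany(
        "INSERT INTO sale (product_id, amount) VALUES (?, ?)",
        [(1, 20.0), (2, 60.0), (3, 15.0), (2, 40.0)],
    )
    conn.commit()
    return conn


def revenue_by_category(conn):
    """Total sales amount per product category, in alphabetical order."""
    query = """
        SELECT p.category, SUM(s.amount)
        FROM sale AS s
        JOIN product AS p ON p.product_id = s.product_id
        GROUP BY p.category
        ORDER BY p.category
    """
    return conn.execute(query).fetchall()


conn = build_demo_db()
print(revenue_by_category(conn))
conn.close()
```

If the same sales were scattered over unconnected files with inconsistent product codes, the join in `revenue_by_category` would not be possible without first cleaning and integrating the data.
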
Conclusion
Why do you need BI?
▪ A goal of every business is to make better business decisions than its competitors.
▪ BI turns the massive amount of data from operational systems into a format that is easy to understand and correct, so that decisions can be based on the data.
▪ The most important ingredient of a BI solution is a data warehouse.

Summary

“A real-time enterprise without real-time business intelligence is a real fast, dumb organization.”

• Stephen Brobst
• Chief Technology Officer
• Teradata