
AN INTRODUCTION TO DATA MINING
“Education is not the piling on of learning, information, data, facts, skills,
or abilities – that’s training or instruction – but is rather making visible
what is hidden as a seed.”—Thomas More
Motivation
 Recently, the computer manufacturer Dell was interested in improving
the productivity of its sales workforce. It therefore turned to data
mining and predictive analytics to analyze its database of potential
customers, in order to identify the most likely respondents.
 Researching the social network activity of potential leads, using
LinkedIn and other sites, provided a richer amount of information
about the potential customers, thereby allowing Dell to develop more
personalized sales pitches to their clients.
 This is an example of mining customer data to help identify the type of
marketing approach for a particular customer, based on the customer's
individual profile.
 What is the bottom line?

7/3/2020 DATA MINING AND PREDICTIVE MODELING 2


Motivation
 The number of prospects that needed to be contacted was cut by
50%, leaving only the most promising prospects, leading to a near
doubling of the productivity and efficiency of the sales workforce, with
a similar increase in revenue for Dell.



Motivation
 Forbes magazine reports that the use of data mining and predictive
analytics has helped to identify the patients at greatest risk of
developing congestive heart failure.
 IBM collected 3 years of data pertaining to 350,000 patients, with
measurements on over 200 factors such as blood pressure, weight, and
drugs prescribed.
 Using predictive analytics, IBM was able to identify the 8500 patients
most at risk of dying of congestive heart failure within 1 year.



Motivation
■ The MIT Technology Review reports that it was the Obama campaign's
effective use of data mining that helped President Obama win the
2012 presidential election over Mitt Romney.
■ They first identified likely Obama voters using a data mining model,
and then made sure that these voters actually got to the polls. The
campaign also used a separate data mining model to predict the
polling outcomes county by county.
■ In the important swing county of Hamilton County, Ohio, the model
predicted that Obama would receive 56.4% of the vote; the Obama
share of the actual vote was 56.6%, so the prediction was off by
only 0.2 percentage points.
■ Such precise predictive power allowed the campaign staff to allocate
scarce resources more efficiently.



Motivation
 So, what is data mining? What is predictive analytics?
 While waiting in line at a large supermarket, have you ever just closed
your eyes and listened? You might hear the beep, beep, beep of the
supermarket scanners, reading the bar codes on the grocery items,
ringing up on the register, and storing the data on company servers.
 Each beep indicates a new row in the database, a new “observation”
in the information being collected about the shopping habits of your
family, and the other families who are checking out.
 Clearly, a lot of data is being collected. However, what is being learned
from all this data? What knowledge are we gaining from all this
information? Probably not as much as you might think, because there
is a serious shortage of skilled data analysts.



What is data mining?
 ‘Data mining is a collection of techniques for efficient automated
discovery of previously unknown, valid, novel, useful and
understandable patterns in large databases. The patterns must be
actionable so they may be used in an enterprise’s decision making.’
 From this definition, the important takeaways are:
 Data mining is a process of automated discovery of previously
unknown patterns in large volumes of data.
 This large volume of data is usually the historical data of an
organization, stored in what is known as a data warehouse.



What is data mining?
 Data mining deals with large volumes of data, typically gigabytes or
terabytes and sometimes as much as zettabytes (in the case of big
data).
 Patterns must be valid, novel, useful and understandable.
 Data mining allows businesses to determine historical patterns to
predict future behavior.
 Although data mining is possible with smaller amounts of data, more
data generally improves the accuracy of prediction, provided the data
is relevant and of good quality.
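
The idea of using historical patterns to predict future behavior can be sketched with a simple least-squares line fit. This is only an illustrative example, not a method prescribed by these slides: the function name fit_line and the monthly sales figures are made up for the purpose.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical history: month number -> units sold (made-up values).
months = [1, 2, 3, 4, 5, 6]
units = [110, 120, 130, 140, 150, 160]

# Learn the historical pattern, then extrapolate one month ahead.
slope, intercept = fit_line(months, units)
forecast_month_7 = slope * 7 + intercept
print(round(forecast_month_7, 1))
```

Real data is rarely this clean, which is one reason larger and more representative samples tend to give more reliable fits.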



Data Mining Techniques
 Predictive modeling
 Database segmentation
 Link analysis
   Associations discovery
   Sequential pattern discovery
   Similar time sequence discovery
 Deviation detection
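
As one concrete illustration of the techniques above, deviation detection can be approximated with a basic statistical rule: flag values that lie far from the mean in units of standard deviation. The sketch below is a minimal example under that assumption; the function name find_deviations, the threshold of 2.0, and the transaction amounts are all hypothetical.

```python
from statistics import mean, pstdev


def find_deviations(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts with one unusually large purchase.
amounts = [20, 22, 19, 21, 23, 20, 22, 250]
print(find_deviations(amounts))
```

A single extreme value inflates both the mean and the standard deviation, so in practice more robust measures such as the median are often preferred.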



Data Mining Process or The Cross-Industry Standard Process for Data
Mining: CRISP-DM



The Cross-Industry Standard Process for
Data Mining: CRISP-DM
 According to CRISP-DM, a given data mining project has a life cycle
consisting of six phases. Note that the phase-sequence is adaptive.
That is, the next phase in the sequence often depends on the
outcomes associated with the previous phase.
 The most significant dependencies between phases are indicated by
the arrows.
 For example, suppose we are in the modeling phase. Depending on
the behavior and characteristics of the model, we may have to return
to the data preparation phase for further refinement before moving
forward to the model evaluation phase.



Data Mining Process
 Problem definition phase
 Data understanding phase
 Data preparation phase
 Modeling phase
 Evaluation phase
 Deployment phase
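
The six phases and their adaptive ordering can be sketched as a small state machine. This is a simplified illustration, not part of the CRISP-DM standard: the rule "step back one phase when the outcome is unsatisfactory" and the names Phase, next_phase and run are assumptions made for the example, and the real process allows other jumps between phases.

```python
from enum import Enum


class Phase(Enum):
    PROBLEM_DEFINITION = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current, outcome_ok):
    """Advance on a good outcome; otherwise return to the previous phase."""
    if outcome_ok:
        if current is Phase.DEPLOYMENT:
            return None  # project finished
        return Phase(current.value + 1)
    # Assumed fallback rule: go back one phase, but never before the first.
    return Phase(max(current.value - 1, 1))


def run(outcomes):
    """Trace the phases visited for a list of hypothetical outcomes."""
    phase = Phase.PROBLEM_DEFINITION
    visited = [phase]
    for ok in outcomes:
        phase = next_phase(phase, ok)
        if phase is None:
            break
        visited.append(phase)
    return visited


# The fourth outcome (modeling) is unsatisfactory, so the project
# returns to data preparation before modeling again.
trace = run([True, True, True, False, True, True, True, True])
print([p.name for p in trace])
```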



Data Mining Applications
 Loan/Credit card approvals
 Market segmentation
 Fraud detection
 Better marketing
 Trend analysis
 Market basket analysis
 Customer churn
 Website design
 Corporate analysis and risk management
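
Of the applications listed, market basket analysis is easy to illustrate in a few lines. It typically relies on two measures: support (how often a set of items appears together) and confidence (how often the consequent appears when the antecedent does). The sketch below assumes a tiny, made-up set of baskets; the function names support and confidence are illustrative.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base if base else 0.0


# Hypothetical supermarket baskets (made-up data).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

# How often bread and milk are bought together, and how often
# a basket with bread also contains milk.
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```

Rules with both high support and high confidence are the usual candidates for actions such as shelf placement or cross-promotion.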



Difference between Data Mining and
Machine Learning
Basis for comparison | Data mining | Machine learning
Meaning | It involves extracting useful knowledge from a large amount of data. | It builds algorithms that learn from data as well as past experience.
History | It was initially known as knowledge discovery in databases (KDD), a term coined in 1989. | The term was introduced in 1959.
Responsibility | Data mining is used to examine patterns in existing data. These patterns can then be used to set rules. | Machine learning teaches the computer to learn and understand the given rules.
Nature | It requires human involvement and intervention. | It is automated; once designed, it is self-implementing and requires little or no human effort.
Test Me
 What is the full form of KDD? _________
 Which learning approach is used by Database Segmentation?
(a) Supervised Learning (b) Unsupervised Learning
 Establishing links between individual records, or sets of records, in a
database is called ___________.
 Discuss the need for human direction of data mining. Describe the
possible consequences of relying on completely automatic data
analysis tools.
 CRISP-DM is not the only standard process for data mining. Research
an alternative methodology (Hint: Sample, Explore, Modify, Model and
Assess (SEMMA), from the SAS Institute). Discuss the similarities and
differences with CRISP-DM.



Test Me
 What are the two types of predictive modeling?
 Deviation detection can be performed by using ______ and ______ techniques.
 Predictive modeling is developed using a supervised learning approach.
(a) True (b) False
 Value prediction uses the traditional statistical techniques of ______
and ______.



References
 How Dell Predicts Which Customers Are Most Likely to Buy, by Rachael King, CIO
Journal, Wall Street Journal, December 5, 2012.
 IBM and Epic Apply Predictive Analytics to Electronic Health Records, by Zina
Moukheiber, Forbes magazine, February 19, 2014.
 How President Obama's campaign used big data to rally individual voters, by Sasha
Issenberg, MIT Technology Review, December 19, 2012.
 https://doi.org/10.1017/9781108635592.003
 https://codesachin.wordpress.com/2015/09/13/interesting-take-aways-from-data-science-for-business/
 Data Mining and Predictive Analytics, by Larose and Larose, 2015.
 Data Mining and Data Warehousing: Principles and Practical Techniques.

