CH 1 Intro To Data Mining

AN INTRODUCTION TO DATA MINING
“Education is not the piling on of learning, information, data, facts, skills,
or abilities – that’s training or instruction – but is rather making visible
what is hidden as a seed.”—Thomas More
Motivation
 Recently, the computer manufacturer Dell was interested in improving the
productivity of its sales workforce. It therefore turned to data mining and
predictive analytics to analyze its database of potential customers, in
order to identify the most likely respondents.
 Researching the social network activity of potential leads, using LinkedIn
and other sites, provided a richer amount of information about the
potential customers, thereby allowing Dell to develop more personalized
sales pitches to their clients.
 This is an example of mining customer data to help identify the type of
marketing approach for a particular customer, based on the customer's
individual profile.
 What is the bottom line?

7/3/2020 DATA MINING AND PREDICTIVE 2


Motivation
 The number of prospects that needed to be contacted was cut by 50%,
leaving only the most promising prospects, leading to a near doubling of
the productivity and efficiency of the sales workforce, with a similar
increase in revenue for Dell.


Motivation
 Forbes magazine reports that the use of data mining and predictive
analytics has helped to identify patients at the greatest risk of
developing congestive heart failure.
 IBM collected 3 years of data pertaining to 350,000 patients, including
measurements on over 200 factors, such as blood pressure, weight, and
drugs prescribed.
 Using predictive analytics, IBM was able to identify the 8500 patients
most at risk of dying of congestive heart failure within 1 year.


Motivation
■ The MIT Technology Review reports that it was the Obama campaign's
effective use of data mining that helped President Obama win the 2012
presidential election over Mitt Romney.
■ They first identified likely Obama voters using a data mining model, and
then made sure that these voters actually got to the polls. The campaign
also used a separate data mining model to predict the polling outcomes
county by county.
■ In the important swing county of Hamilton County, Ohio, the model
predicted that Obama would receive 56.4% of the vote; the Obama share
of the actual vote was 56.6%, so the prediction was off by only
0.2 percentage points.
■ Such precise predictive power allowed the campaign staff to allocate
scarce resources more efficiently.


Motivation
 So, what is data mining? What is predictive analytics?

 While waiting in line at a large supermarket, have you ever just closed
your eyes and listened? You might hear the beep, beep, beep of the
supermarket scanners, reading the bar codes on the grocery items,
ringing up on the register, and storing the data on company servers.
 Each beep indicates a new row in the database, a new “observation” in
the information being collected about the shopping habits of your family,
and the other families who are checking out.
 Clearly, a lot of data is being collected. However, what is being learned
from all this data? What knowledge are we gaining from all this
information? Probably not as much as you might think, because there is a
serious shortage of skilled data analysts.


What is data mining?
 ‘Data mining is a collection of techniques for efficient automated
discovery of previously unknown, valid, novel, useful and understandable
patterns in large databases. The patterns must be actionable so they may
be used in an enterprise’s decision making.’
 From this definition, the important takeaways are:
 Data mining is a process of automated discovery of previously
unknown patterns in large volumes of data.
 This large volume of data is usually the historical data of an
organization, stored in what is known as a data warehouse.


What is data mining?
 Data mining deals with large volumes of data: gigabytes or terabytes,
and sometimes as much as zettabytes (in the case of big data).
 Patterns must be valid, novel, useful and understandable.
 Data mining allows businesses to determine historical patterns to
predict future behavior.
 Although data mining is possible with smaller amounts of data, larger
datasets generally allow more accurate predictions.


Data Mining Techniques
 Predictive modeling
 Database segmentation
 Link analysis
   Associations discovery
   Sequential pattern discovery
   Similar time sequence discovery
 Deviation detection


Data Mining Process, or the Cross-Industry
Standard Process for Data Mining: CRISP-DM


The Cross-Industry Standard Process for
Data Mining: CRISP-DM
 According to CRISP-DM, a given data mining project has a life cycle
consisting of six phases. Note that the phase sequence is adaptive. That
is, the next phase in the sequence often depends on the outcomes
associated with the previous phase.
 The most significant dependencies between phases are indicated by
the arrows in the CRISP-DM life cycle diagram.
 For example, suppose we are in the modeling phase. Depending on the
behavior and characteristics of the model, we may have to return to the
data preparation phase for further refinement before moving forward to
the model evaluation phase.


Data Mining Process
 Problem definition phase
 Data understanding phase
 Data preparation phase
 Modeling phase
 Evaluation phase
 Deployment phase
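
The adaptive phase sequence can be illustrated with a minimal Python sketch. It covers only the one dependency the slides mention, the return from modeling to data preparation. The names Phase, next_phase, run_project and the needs_rework flag are hypothetical, chosen for illustration; they are not part of CRISP-DM itself.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    """The six life cycle phases, named as on the slide."""
    PROBLEM_DEFINITION = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current: Phase, needs_rework: bool = False) -> Phase:
    """Return the phase that follows `current`.

    `needs_rework` is a hypothetical flag standing in for "the behavior and
    characteristics of the model call for further data refinement".
    """
    if current is Phase.MODELING and needs_rework:
        return Phase.DATA_PREPARATION  # loop back instead of moving forward
    if current is Phase.DEPLOYMENT:
        return Phase.DEPLOYMENT  # last phase: nothing follows it
    return Phase(current.value + 1)


def run_project(rework_rounds: int = 1) -> List[Phase]:
    """Walk the life cycle, returning from modeling `rework_rounds` times."""
    visited = [Phase.PROBLEM_DEFINITION]
    while visited[-1] is not Phase.DEPLOYMENT:
        current = visited[-1]
        rework = current is Phase.MODELING and rework_rounds > 0
        if rework:
            rework_rounds -= 1
        visited.append(next_phase(current, needs_rework=rework))
    return visited


if __name__ == "__main__":
    for phase in run_project(rework_rounds=1):
        print(phase.name)
```

With one rework round, the walk visits data preparation and modeling twice before reaching evaluation. A real project may also loop back from other phases, which this sketch leaves out.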


Data Mining Applications
 Loan/Credit card approvals
 Market segmentation
 Fraud detection
 Better marketing
 Trend analysis
 Market basket analysis
 Customer churn
 Website design
 Corporate analysis and risk management
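
To make one of these applications concrete, here is a minimal Python sketch of market basket analysis. It assumes the two standard association-rule measures, support and confidence, which the slides do not define. The baskets are invented for illustration, and the names BASKETS, support, and confidence are hypothetical.

```python
from typing import FrozenSet, List

# Hypothetical shopping baskets, invented for illustration only.
BASKETS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "eggs"}),
]


def support(itemset: FrozenSet[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Among baskets containing `antecedent`, the fraction that also
    contain `consequent`."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(antecedent | consequent, baskets) / base


if __name__ == "__main__":
    bread, butter = frozenset({"bread"}), frozenset({"butter"})
    print("support(bread, butter):", support(bread | butter, BASKETS))
    print("confidence(bread -> butter):", confidence(bread, butter, BASKETS))
```

In this toy data, bread and butter appear together in two of the four baskets, and two of the three baskets containing bread also contain butter. Real retail data would involve far more baskets and a systematic search over candidate item sets.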


Difference between Data Mining and
Machine Learning
Basis for comparison
Meaning
 Data mining: It involves extracting useful knowledge from a large
amount of data.
 Machine learning: It introduces new algorithms that learn from data as
well as past experience.
History
 Data mining: It emerged in the late 1980s and was initially called
knowledge discovery in databases (KDD).
 Machine learning: The term was introduced in 1959.
Responsibility
 Data mining: It is used to examine patterns in existing data. These
patterns can then be used to set rules.
 Machine learning: It teaches the computer to learn and understand the
given rules.
Nature
 Data mining: It involves human involvement and intervention.
 Machine learning: It is automated; once designed, it is
self-implementing and requires little or no human effort.


Test Me
 What is the full form of KDD?
 Which learning approach is used by database segmentation?
(a) Supervised Learning (b) Unsupervised Learning
 Establishing links between individual records, or sets of records, in a
database is called ______.
 Discuss the need for human direction of data mining. Describe the
possible consequences of relying on completely automatic data analysis
tools.
 CRISP-DM is not the only standard process for data mining. Research an
alternative methodology (Hint: Sample, Explore, Modify, Model and
Assess (SEMMA), from the SAS Institute). Discuss the similarities and
differences with CRISP-DM.


Test Me
 What are the two types of predictive modeling?
 Deviation detection can be performed by using ______ and ______
techniques.
 Predictive modeling is developed using a supervised learning approach.
(a) True (b) False
 Value prediction uses the traditional statistical techniques of
______ and ______.


References
 How Dell Predicts Which Customers Are Most Likely to Buy, by Rachael King,
CIO Journal, Wall Street Journal, December 5, 2012.
 IBM and Epic Apply Predictive Analytics to Electronic Health Records, by
Zina Moukheiber, Forbes magazine, February 19, 2014.
 How President Obama's campaign used big data to rally individual voters, by
Sasha Issenberg, MIT Technology Review, December 19, 2012.
 https://doi.org/10.1017/9781108635592.003
 https://codesachin.wordpress.com/2015/09/13/interesting-take-aways-from-data-science-for-business/
 Data Mining and Predictive Analytics, by Larose and Larose, 2015.
 Data Mining and Data Warehousing: Principles and Practical Techniques
