
Data Mining

Business Analytics 27/08/2010


Data Mining
• Data mining is the process of sampling, exploring,
modifying, modeling, and assessing (SEMMA)
large amounts of data to uncover previously
unknown patterns that can be used for
business advantage.
The data

              Experimental          Opportunistic
Purpose       Research              Operational
Value         Scientific            Commercial
Generation    Actively controlled   Passively observed
Size          Small                 Massive
Hygiene       Clean                 Dirty
State         Static                Dynamic
Business Decision Support
• The owners of the data and sponsors of the analyses are
typically not researchers. The objectives are usually to
support critical business decisions.
• Database Marketing
– Target Marketing
– CRM
• Credit Risk Management
– Credit Scoring
• Fraud Detection
• Healthcare Informatics
– Clinical Decision Support
Steps in Data Mining Analysis
1. Specific Objectives
- In terms of subject matter
2. Translation into analytical method
3. Data examination
- Data capacity
- Preliminary results
4. Refinement and reformulation
1. Specific Objectives
• Problem formulation is central to successful data mining. The
following are examples of objectives that are inadequately
specified:
– Understand our customer base
– Reengineer our customer retention strategy
– Detect actionable patterns
Objectives such as these leave many essential questions unanswered.
E.g., what specific actions will result from the analytical effort? The
answer, of course, may depend on the result, but an inability to
speculate indicates inadequate problem formulation.
Unless the purpose is to write a research paper, understanding is
probably not the ultimate goal.
• A related pitfall is to specify the objectives in
terms of analytical methods:
– Implement neural networks
– Apply visualization tools
– Cluster the database
The same analytical tools may be applied to many
different problems. The choice of the most
appropriate analytical tool often depends upon
subtle differences in the objectives. The objectives
must eventually be translated into analytical
methods, but only after they have been
specified in ordinary language.
2. Problem Translation
• The problem translation step involves determining
which analytical methods are relevant to the
objectives. This requires knowledge of a wide array of
methodologies and the sorts of problems they
solve.
• Problem translation
– Predictive modeling (supervised classification)
– Cluster analysis
– Association rules
– Something else
Types of targets
• Supervised classification
– Event/No event (binary target)
– Class label (multiclass problem)

• Regression
– Continuous outcome

• Survival analysis
– Time to event (possibly censored)
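As a rough illustration of how the target determines the problem type, the following Python sketch guesses the task from a list of target values. The helper name `describe_target` and its rules are assumptions made for this example, not part of any tool named in these slides. In particular, representing a survival target as `(time, event_observed)` pairs is only a convention chosen here, and numeric class codes would be misread as a continuous outcome.

```python
from numbers import Real


def describe_target(values):
    """Guess the modeling task from target values (a heuristic, not a rule)."""
    if not values:
        raise ValueError("at least one target value is required")

    # Assumed convention: survival targets are (time, event_observed) pairs,
    # where event_observed is False for censored records.
    if all(isinstance(v, tuple) and len(v) == 2 for v in values):
        return "survival analysis (time to event, possibly censored)"

    # Exactly two distinct values: event / no event.
    if len(set(values)) == 2:
        return "supervised classification (binary target)"

    # Numeric (non-boolean) values with more than two levels.
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        return "regression (continuous outcome)"

    # Anything else is treated as a set of class labels.
    return "supervised classification (multiclass problem)"


if __name__ == "__main__":
    print(describe_target([0, 1, 1, 0]))
    print(describe_target(["gold", "silver", "bronze"]))
    print(describe_target([12.5, 80.0, 43.2]))
    print(describe_target([(14, True), (30, False)]))
```

In practice the analyst states the target type from the business objective; a check like this is only useful as a sanity test on the data.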
• The tools in SAS Enterprise Miner are arranged according to the SAS
process for data mining, SEMMA:
• Sample the data by creating one or more data tables. The sample
should be large enough to contain the significant information, yet
small enough to process.
• Explore the data by searching for anticipated relationships,
unanticipated trends, and anomalies in order to gain understanding
and ideas.
• Modify the data by creating, selecting, and transforming the variables to
focus the model selection process.
• Model the data by using the analytical tools to search for a
combination of the data that reliably predicts a desired outcome.
• Assess the results by comparing competing predictive models (build charts to
evaluate the usefulness and reliability of the findings from the data
mining process).
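
The five SEMMA steps can be sketched outside SAS Enterprise Miner with plain Python. This is a minimal sketch on synthetic data, not the SAS implementation: the variable names (`spend`, `responded`), the sample sizes, and the single-cutoff "model" are all assumptions chosen to keep the example short.

```python
import random
import statistics

random.seed(0)

# Synthetic stand-in for a large table: (spend, responded) pairs.
population = []
for _ in range(10_000):
    spend = random.uniform(0, 100)
    responded = 1 if spend + random.gauss(0, 15) > 60 else 0
    population.append((spend, responded))

# Sample: large enough to carry the signal, small enough to process.
sample = random.sample(population, 1_000)
train, holdout = sample[:700], sample[700:]

# Explore: summarize the input by outcome to look for a relationship.
mean_spend_by_class = {
    label: statistics.mean(s for s, r in train if r == label)
    for label in (0, 1)
}


# Modify: derive a candidate input, here a high-spend flag at a cutoff.
def high_spend(spend, cutoff):
    return 1 if spend > cutoff else 0


def accuracy(rows, cutoff):
    """Share of rows where the high-spend flag matches the outcome."""
    return sum(high_spend(s, cutoff) == r for s, r in rows) / len(rows)


# Model: choose the cutoff that best predicts the outcome on training rows.
best_cutoff = max(range(0, 101, 5), key=lambda c: accuracy(train, c))

# Assess: compare the fitted rule with a majority-class baseline
# on rows that were not used for fitting.
event_rate = statistics.mean(r for _, r in holdout)
baseline_accuracy = max(event_rate, 1 - event_rate)
model_accuracy = accuracy(holdout, best_cutoff)

print("mean spend by class:", mean_spend_by_class)
print("chosen cutoff:", best_cutoff)
print("holdout accuracy:", round(model_accuracy, 3),
      "baseline:", round(baseline_accuracy, 3))
```

Holding out part of the sample for the Assess step is the design choice that matters here: a rule judged only on the rows it was fitted to would look better than it is.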
