Crispslides

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

CRISP-DM

Methodology for Data Mining Project


Development
CRoss-Industry Standard Process
for Data Mining

1
References
• Pete Chapman (NCR), Julian Clinton (SPSS), Randy
Kerber (NCR), Thomas Khabaza (SPSS), Thomas
Reinartz, (DaimlerChrysler), Colin Shearer (SPSS) and
Rüdiger Wirth (DaimlerChrysler) “CRISP-DM 1.0 - Step-
by-step data mining guide”

• P. Gonzalez-Aranda, E.Menasalvas, S.Millan, F.


Segovia “Towards a Methodology for Data Mining
Project Development: The Importance of Abstraction”

2
References
• Websites

http://www.crisp-dm.org/
http://www.crisp-dm.org/CRISPWP-800.pdf
http://www.spss.com/
http://www.kdnuggets.com/

3
Overview

• Introduction to CRISP-DM

• Phases and T asks

• Summary

4
CRISP-DM

CRoss-Industry Standard Process


for Data Mining

5
Why Should There be a Standard Process?

The data mining process must be reliable


and repeatable by people with little data
mining background.

6
Why Should There be a Standard Process?

• Framework for recording experience


– Allows projects to be replicated
• Aid to project planning and management

7
Standardization
DM - Consortium 1999
Describe the phases to follow, the task to execute
and the results expected

8
CRISP-DM
• Non-proprietary
• Application/Industry neutral
• Tool neutral
• Focus on business issues
– As well as technical
analysis
• Framework for guidance
– Templates for Analysis

9
CRISP-DM: Overview

• Data Mining methodology


• Process Model
• For anyone
• Provides a complete blueprint
• Life cycle: 6 phases

10
CRISP-DM: Phases
• Business Understanding
Project objectives and requirements understanding, Data mining problem
definition
• Data Understanding
Initial data collection and familiarization, Data quality problems identification
• Data Preparation
Table, record and attribute selection, Data transformation and cleaning
• Modeling
Modeling techniques selection and application, Parameters calibration
• Evaluation
Business objectives & issues achievement evaluation
• Deployment
Result model deployment, Repeatable data mining process implementation

11
Phases and Tasks
Business Data Data
Modeling Evaluation Deployment
Understanding Understanding Preparation

Determine Collect Select


Select Evaluate Plan
Business Initial Modeling
Data Results Deployment
Objectives Data Technique

Plan Monitering
Assess Describe Clean Generate Review
&
Situation Data Data Test Design Process
Maintenance

Determine Produce
Explore Construct Build Determine
Data Mining Final
Data Data Model Next Steps
Goals Report

Verify
Produce Integrate Assess Review
Data
Project Plan Data Model Project
Quality

Format
Data

12
Phase 1. Business Understanding
• Business Objectives
• Assessement of the situation
• Data Mining Objectives
• Project Plan
Focuses on understanding the project objectives and
requirements from a business perspective, then converting
this knowledge into a data mining problem definition and a
preliminary plan designed to achieve the objectives

13
Phase 2. Data Understanding

• Collect initial data


• Describe the Data (formato, cantidad)
• Explore the Data (distribución de valores, relaciones)
• Verify the Quality
Starts with an initial data collection and proceeds with
activities in order to get familiar with the data, to identify data
quality problems, to discover first insights into the data or to
detect interesting subsets to form hypotheses for hidden
information.

14
Phase 3. Data Preparation
• Takes usually over 90% of the time
• Data selection
• Clean Data
• Construct Data
(nuevos atributos nuevos registros)
• Integrate Data
• Format Data
Covers all activities to construct the final dataset from the
initial raw data. Data preparation tasks are likely to be
performed multiple times and not in any prescribed order. Tasks
include table, record and attribute selection as well as
transformation and cleaning of data for modeling tools. 15
Phase 4. Modeling
• Select the modeling technique
(based upon the data mining objective)
• Test Design
• Build model
(Parameter settings)
• Assess Model (rank the models)
Various modeling techniques are selected and applied and their
parameters are calibrated to optimal values. Some techniques have
specific requirements on the form of data. Therefore, stepping back to
the data preparation phase is often necessary.

16
Phase 5. Evaluation
• Evaluation of results
- how well it performed on test data
• Review Process
- depend on model type
• Determine next steps
- Thoroughly evaluate the model and review the steps executed to construct
the model to be certain it properly achieves the business objectives. A key
objective is to determine if there is some important business issue that has not
been sufficiently considered. At the end of this phase, a decision on the use of
the data mining results should be reached

17
Phase 6. Deployment

• Plan deployment
• Plan monitoring and maintenance
• Produce Final Report
• Review Project

18
Summary
• Why CRISP-DM?
The data mining process must be reliable and repeatable by people
with little data mining skills

CRISP-DM provides a uniform framework for


- guidelines
- experience documentation
CRISP-DM is flexible to account for differences
- Different business/agency problems
- Different data

19
Questions & Discussion

20

You might also like