
Data Mining

Worapoj Kreesuradej, Ph.D.
Associate Professor

Data Mining & Data Exploration Laboratory (DME Lab),


Faculty of Information Technology,
King Mongkut's Institute of Technology Ladkrabang,
Web: www.it.kmitl.ac.th/dme
Email: worapoj@it.kimitl.ac.th, dme@it.kmitl.ac.th.
Textbook
• Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.
References
• Christopher Westphal and Teresa Blaxton, Data Mining Solutions, John Wiley & Sons Inc., 1998.
• Pieter Adriaans and Dolf Zantinge, Data Mining, Addison Wesley, 1996.
• Michael J.A. Berry and Gordon Linoff, Data Mining Techniques, John Wiley & Sons Inc., 1997.
• Alex Berson and Stephen J. Smith, Data Warehousing, Data Mining & OLAP, McGraw Hill, 1997.
• Robert Groth, Data Mining, Prentice Hall PTR, 1997.
• Sholom M. Weiss and Nitin Indurkhya, Predictive Data Mining, Morgan Kaufmann Publishers Inc., 1998.
• Ryszard S. Michalski, Ivan Bratko and Miroslav Kubat, Machine Learning and Data Mining, John Wiley & Sons Inc., 1999.
• Vladimir Cherkassky and Filip Mulier, Learning From Data, John Wiley & Sons Inc., 1998.
• Vasant Dhar and Roger Stein, Seven Methods for Transforming Corporate Data Into Business Intelligence, Prentice Hall, 1997.
• Peter Cabena et al., Discovering Data Mining, Prentice Hall, 1997.
• Dorian Pyle, Business Modeling and Data Mining, Morgan Kaufmann, 1st edition, April 2003.
• Tamraparni Dasu and Theodore Johnson, Exploratory Data Mining and Data Cleaning, John Wiley & Sons, 1st edition, May 2003.
• Alex Berson, Kurt Thearling and Stephen J. Smith, Building Data Mining Applications for CRM, McGraw-Hill Osborne Media, December 1999.
• Bruce Ratner, Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, Chapman & Hall, May 2003.
• Ian H. Witten and Eibe Frank, Data Mining, Morgan Kaufmann, 2005.
What is Data Mining?
• Definition: Data mining is the process of extracting previously unknown, valid, and actionable information from large databases and then using that information to make crucial business decisions.
• Alternative name: Knowledge Discovery in Databases (KDD)
Evolution of Database Technology
• 1960s: Data collection, database creation, and network DBMS
• 1970s: Relational data model, relational DBMS implementation
• 1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s to 2000s: Data mining and data warehousing, multimedia databases, and Web databases
Data Mining
[Figure: system architecture. Recoverable labels: Client, Data Warehouse, Custom Application, ERP, Packaged Application, Query & Reporting tool, OLAP, Intelligence Enterprise.]
Business Impact
[Figure: layered pyramid; business impact increases from the bottom layer to the top layer. Layers, top to bottom:]
• Data Mining: information discovery
• Data Exploration: OLAP, statistical analysis, querying and reporting
• Data Warehouses / Data Marts
• Data Sources: paper, files, information providers, database systems, OLTP
Potential Applications of Data Mining
• Market analysis and management
  - purchasing patterns over time
  - cross-selling
  - customer profiling
  - direct mail campaigns
  - market segmentation
• Risk analysis and management
  - forecasting
  - credit scoring for loan application processing
  - profile of attrition (churn management)
• Fraud detection and management
  - money laundering: detecting suspicious money transactions
  - detecting inappropriate medical treatments
• Web mining
  - Web usage mining
  - Web content mining
    - automatic classification of Web documents
  - Web structure mining
• Text mining
  - dividing documents into groups
  - document feature extraction
[Figure: structured data vs. unstructured data]
Data mining process
[Figure: process flow. Business Objective -> Databases -> Selection -> Target Data -> Data Preparation -> Preprocessed Data -> Data Mining -> Pattern Evaluation]
Data mining process
• Business Objectives Determination
  - Identify the business problem or opportunity.
• Data Selection
  - Identify all internal or external sources of information and select which subset of the data is needed for the data mining application.
• Data Preprocessing
  - The goal of data preprocessing is to ensure the quality of the selected data.
  - Typical tasks: checking that the data set is current, sampling data, unit conversion, reconciling representation formats, and detecting missing values.
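
Two of the preprocessing tasks listed above, detecting missing values and unit conversion, can be sketched in a few lines of plain Python. This is a minimal illustration, not a prescribed procedure: the records, the field names (`age`, `income_thb`, `income_k_thb`) and the helper functions are all hypothetical.

```python
# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"id": 1, "age": 34, "income_thb": 420000.0},
    {"id": 2, "age": None, "income_thb": 310000.0},
    {"id": 3, "age": 51, "income_thb": None},
]


def find_missing(rows):
    """Return {field: [ids of rows where that field is missing]}."""
    missing = {}
    for row in rows:
        for field, value in row.items():
            if value is None:
                missing.setdefault(field, []).append(row["id"])
    return missing


def to_thousands(rows, field, new_field):
    """Unit conversion: copy each row and add `field` expressed in thousands."""
    converted_rows = []
    for row in rows:
        new_row = dict(row)
        value = row[field]
        # Missing values stay missing; they are reported, not silently filled.
        new_row[new_field] = None if value is None else value / 1000.0
        converted_rows.append(new_row)
    return converted_rows


missing = find_missing(records)
converted = to_thousands(records, "income_thb", "income_k_thb")
print(missing)
print(converted[0]["income_k_thb"])
```

How missing values are then handled (dropping the record, imputing a value, or flagging it) depends on the application and is left open here.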
• Data Transformation
  - The goal is to transform data to suit the intended analysis and the data formats required by the data mining algorithms, many of which have particular requirements.
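
As an illustration of such requirements: many algorithms expect numeric inputs on comparable scales, so two common transformations are min-max scaling of numeric fields and one-hot encoding of categorical fields. The sketch below assumes these two transformations as examples; the slides do not name specific ones, and the sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode categorical values as 0/1 vectors, one position per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Hypothetical sample data.
ages = [20, 30, 40, 60]
scaled_ages = min_max_scale(ages)
categories, encoded = one_hot(["single", "married", "single"])

print(scaled_ages)
print(categories, encoded)
```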
• Data Mining
  - Select modeling technique
  - Data mining operations
    - Predictive modeling
    - Database segmentation
    - Link analysis
    - Visualization
Predictive Modeling
• Finding models that describe and distinguish classes or concepts for future prediction
  - Model: decision tree, neural network
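
To make the idea concrete, the sketch below trains a decision stump, that is, a decision tree with a single split on one numeric attribute. It is a deliberately simplified stand-in for a full decision-tree learner, and the loan data (`incomes`, `approved`) is hypothetical.

```python
def train_stump(xs, ys):
    """Fit a one-split decision tree on a single numeric feature.

    Labels must be 0 or 1. Returns (threshold, label_above): predict
    `label_above` when x > threshold, and the other label otherwise.
    """
    values = sorted(set(xs))
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        for above in (0, 1):
            errors = sum(
                1 for x, y in zip(xs, ys)
                if (above if x > t else 1 - above) != y
            )
            if best is None or errors < best[0]:
                best = (errors, t, above)
    _, threshold, label_above = best
    return threshold, label_above


def predict(stump, x):
    """Classify a new value with a trained stump."""
    threshold, label_above = stump
    return label_above if x > threshold else 1 - label_above


# Hypothetical training data: monthly income (thousands) and loan approval.
incomes = [15, 22, 28, 35, 48, 60]
approved = [0, 0, 0, 1, 1, 1]

stump = train_stump(incomes, approved)
print(stump)
print(predict(stump, 40), predict(stump, 20))
```

A real decision tree repeats this split search recursively over many attributes; the stump only shows the core step of choosing a split that best separates the classes.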
Database Segmentation (clustering)
• Partitioning a database into segments of similar records, that is, records that share a number of properties.
  - Model: K-means, Kohonen neural networks
[Figure: database segmentation; records plotted by Age (horizontal axis) and Annual Income (vertical axis)]
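
A minimal K-means sketch on (age, annual income) records follows. It assumes squared Euclidean distance and, to keep the result reproducible, uses the first k points as initial centroids; practical implementations usually choose initial centroids randomly or with a smarter seeding scheme. The customer data is hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Basic K-means: returns (centroids, cluster index for each point)."""
    # Assumption for reproducibility: the first k points seed the centroids.
    centroids = [points[i] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, assignment


# Hypothetical customers: (age in years, annual income in thousands).
customers = [(25, 20), (27, 22), (30, 25), (52, 80), (55, 85), (58, 90)]

centroids, assignment = kmeans(customers, k=2)
print(assignment)
print(centroids)
```

Because the distance mixes age and income, attributes on very different scales should normally be rescaled first so that one attribute does not dominate the segmentation.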
Link Analysis
• Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories
  - Model: Apriori algorithm
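
The frequent-itemset part of the Apriori algorithm can be sketched as below: count candidate itemsets level by level, keep those that meet a minimum support count, and build larger candidates only from smaller frequent ones. The market-basket data and the `min_support` value are hypothetical, and rule generation (confidence) is omitted.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```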
Visualization
[Figure: visualization of link analysis software]
[Figure: visualization of a decision tree in MineSet 3.0]
Data mining process
• Analysis of results
  - Interpret and evaluate the output from data mining.
  - Have we found something that is interesting, valid, and actionable?
• Assimilation of knowledge
  - The objective is to act on the new, valid, and actionable information produced by the previous process steps.
Effort Required for Each
Data Mining Process Step
Methodology for data mining
• CRISP-DM
  - Cross-Industry Standard Process for Data Mining (CRISP-DM)
• Consortium of data miners from various industries: manufacturing, marketing, and government
Examples of Data Mining Systems
• IBM Intelligent Miner
  - A wide range of data mining algorithms
  - Scalable mining algorithms
  - Toolkits: neural network algorithms, statistical methods, data preparation, and data visualization tools
  - Tight integration with IBM's DB2 relational database system
• SAS Enterprise Miner
  - A variety of statistical analysis tools
  - Data warehouse tools and multiple data mining algorithms
• Clementine (from SPSS)
  - Multiple data mining algorithms and advanced statistics
• SQL Server 2005
  - Multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering
  - Tight integration with SQL Server relational database system
• Oracle Data Miner
  - Multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering
• DBMiner (DBMiner Technology Inc.)
  - Multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering
  - Efficient association and sequential-pattern mining functions, and a visual classification tool
  - Mining both relational databases and data warehouses
Trends in Data Mining
• Application exploration
  - Development of application-specific data mining systems
  - Invisible data mining (mining as a built-in function)
• Scalable data mining methods
  - Constraint-based mining: use of constraints to guide data mining systems in their search for interesting patterns
• Integration of data mining with database systems, data warehouse systems, and Web database systems
• Standardization of data mining language
  - A standard will facilitate systematic development, improve interoperability, and promote the education and use of data mining systems in industry and society
• Web mining
Data Mining: Confluence of Multiple Disciplines
[Figure: data mining at the center, surrounded by the contributing disciplines:]
• Database technology
• Statistics
• Machine learning
• Visualization
• Information science
• Other disciplines
Thank you !!!
