III-IT-Data mining Unit 1-session 1-part1
Data Mining
Unit I – INTRODUCTION
• Introduction – Different Kinds of Data
• Patterns Mined – Applications
• Attribute Types
• Data Preprocessing: Data Cleaning
• Data Integration
• Data Reduction
• Data Transformation
• Data Discretization
• Data Visualization
Why Data Mining?
• Explosive Growth of Data
• Data collection and data availability
• Automated data collection tools, database systems, Web,
computerized society
• Major sources of abundant data
• Business: Web, e-commerce, transactions, stocks, …
• Science: Remote sensing, bioinformatics, scientific
simulation, …
• Society and everyone: news, digital cameras, YouTube
• Too much data, too little knowledge!
What is Data Mining?
• Definition
• Data mining (knowledge discovery from data):
extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns, knowledge, or
hidden information from huge amounts of data.
• Alternative names
• Knowledge discovery (mining) in databases (KDD),
knowledge extraction, data/pattern analysis, data
archeology, data dredging, information harvesting,
etc.
Where to Apply Data Mining?
• Business
• Education
• Sports
• Customer Segmentation
Who Uses Data Mining?
• Business Owners
• To gain profit in business
KDD Process – Simple Overview
Knowledge Discovery in Databases (KDD) Process
[Figure: KDD process flow. Databases → Data Cleaning and Data Integration → Task-relevant Data → Data Mining → Pattern Evaluation]
Steps in KDD Process
• Data Cleaning
• Remove noise and inconsistent data
• Data Integration
• Combine multiple data sources
• Data Selection
• Data relevant to the analysis task are retrieved from the database
• Data Transformation
• Transform data into appropriate form for mining (summary, aggregation, etc.)
• Data Mining
• Extract data patterns
• Pattern Evaluation
• Identify truly interesting patterns
• Knowledge Representation
• Use visualization and knowledge representation tools to present the mined
knowledge to the user
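The seven steps above can be sketched as a small pipeline. This is only an illustrative sketch, not a prescribed implementation: the purchase records, the function names, the "items of interest" set, and the 0.5 minimum-support threshold are all assumptions made up for the example, and counting co-occurring item pairs stands in for a real mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (customer, item) purchase records from two sources.
store_a = [("c1", "bread"), ("c1", "milk"), ("c1", "carrier bag"),
           ("c2", "bread"), ("c2", None)]        # None = missing value (noise)
store_b = [("c3", "Bread "), ("c3", "milk"), ("c4", "milk"),
           ("c4", "eggs"), ("c2", "MILK")]       # inconsistent spelling


def clean(records):
    """Data cleaning: drop missing items, normalize inconsistent names."""
    return [(cust, item.strip().lower()) for cust, item in records if item]


def integrate(*sources):
    """Data integration: combine multiple sources into one record list."""
    return [rec for src in sources for rec in src]


def select(records, items_of_interest):
    """Data selection: keep only records relevant to the analysis task."""
    return [rec for rec in records if rec[1] in items_of_interest]


def transform(records):
    """Data transformation: aggregate records into one basket per customer."""
    baskets = {}
    for cust, item in records:
        baskets.setdefault(cust, set()).add(item)
    return baskets


def mine(baskets):
    """Data mining: count how often each pair of items is bought together."""
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return counts


def evaluate(pair_counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    return {pair: count / n_baskets
            for pair, count in pair_counts.items()
            if count / n_baskets >= min_support}


def present(patterns):
    """Knowledge representation: format the patterns for the user."""
    return [f"{a} & {b}: support {s:.2f}"
            for (a, b), s in sorted(patterns.items())]


records = integrate(clean(store_a), clean(store_b))
relevant = select(records, {"bread", "milk", "eggs"})
baskets = transform(relevant)
pair_counts = mine(baskets)
patterns = evaluate(pair_counts, len(baskets), min_support=0.5)
report = present(patterns)
print("\n".join(report))  # prints: bread & milk: support 0.75
```

Each function corresponds to one KDD step, so the output of one step is the input of the next. In this toy data, bread and milk appear together in 3 of the 4 customer baskets, so that pair is the only one that survives pattern evaluation.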
Summary
• Data Mining
• Why
• Where
• Who
• How
• KDD Process
Thank you