
Data Mining

Unit 1-Session 1-Part 1


CO1: Identify the types of data to be pre-processed for the given dataset using preprocessing techniques.
LO1.1: Describe data mining and its functionalities
SO1.1.1: Explain the knowledge discovery process and the importance of data mining

Unit I – INTRODUCTION
• Introduction – Different Kinds of Data
• Patterns Mined – Applications
• Attribute Types
• Data Preprocessing: Data Cleaning
• Data Integration
• Data Reduction
• Data Transformation
• Data Discretization
• Data Visualization

Why Data Mining?
• Explosive growth of data
  • Data collection and data availability
    • Automated data collection tools, database systems, the Web, a computerized society
  • Major sources of abundant data
    • Business: Web, e-commerce, transactions, stocks, …
    • Science: remote sensing, bioinformatics, scientific simulation, …
    • Society and everyone: news, digital cameras, YouTube
• Too much data, too little knowledge!

What is Data Mining?
• Definition
  • Data mining (knowledge discovery from data): the extraction of interesting patterns, knowledge, or hidden information (non-trivial, implicit, previously unknown, and potentially useful) from huge amounts of data.
• Alternative names
  • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, etc.

Where to Apply Data Mining?
• Business
• Education
• Sports
• Customer Segmentation

Who Uses Data Mining?
• Business Owners
• To gain profit in business

KDD Process – Simple Overview

Data Collection → Data Pre-Processing → Data Mining → Knowledge Extraction → Presentation to End User

• Data Pre-Processing
  • Data Cleaning
  • Data Integration
  • Data Selection
  • Data Transformation
• Data Mining
  • Association & correlation
  • Classification
  • Clustering
  • Outlier analysis
  • …
• Knowledge Extraction
  • Knowledge / pattern / hidden information
• Presentation to End User
  • Evaluation
  • Selection
  • Interpretation
  • Visualization

Knowledge Discovery in Databases (KDD) Process

Figure (flow runs bottom to top in the original diagram): Databases → Data Cleaning and Data Integration → Data Warehouse → Selection and Transformation → Task-relevant Data → Data Mining → Pattern Evaluation
Steps in KDD Process
• Data Cleaning
  • Remove noise and inconsistent data
• Data Integration
  • Combine multiple data sources
• Data Selection
  • Retrieve the data relevant to the analysis task from the database
• Data Transformation
  • Transform data into a form appropriate for mining (summary, aggregation, etc.)
• Data Mining
  • Extract data patterns
• Pattern Evaluation
  • Identify the truly interesting patterns
• Knowledge Representation
  • Use visualization and knowledge representation tools to present the mined knowledge to the user
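
The seven steps above can be sketched as a chain of small functions. This is a minimal illustration under stated assumptions, not a reference implementation: the two toy sources (`store_a`, `store_b`), the function names, the choice of item-pair co-occurrence counting as the mining step, and the `min_support` threshold of 0.5 are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: purchase records from two sources (invented).
store_a = [
    {"id": 1, "items": "bread, milk"},
    {"id": 2, "items": "bread, butter, milk"},
    {"id": 3, "items": None},             # noisy record: items missing
]
store_b = [
    {"id": 4, "items": "Milk, Bread "},   # inconsistent case and spacing
    {"id": 5, "items": "butter, jam"},
]

def clean(records):
    """Data cleaning: drop records whose item field is missing."""
    return [r for r in records if r["items"]]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the field relevant to the task."""
    return [r["items"] for r in records]

def transform(item_strings):
    """Data transformation: turn each string into a set of
    lower-cased, whitespace-trimmed item names."""
    return [{i.strip().lower() for i in s.split(",")} for s in item_strings]

def mine(transactions):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return counts

def evaluate(pair_counts, n_transactions, min_support=0.5):
    """Pattern evaluation: keep pairs whose support (fraction of
    transactions containing the pair) meets the threshold."""
    return {pair: count / n_transactions
            for pair, count in pair_counts.items()
            if count / n_transactions >= min_support}

def present(patterns):
    """Knowledge presentation: format the patterns for the user."""
    return [f"{a} + {b}: support {s:.2f}"
            for (a, b), s in sorted(patterns.items())]

transactions = transform(select(integrate(clean(store_a), clean(store_b))))
patterns = evaluate(mine(transactions), len(transactions))
report = present(patterns)
print("\n".join(report))
```

On this toy data, four transactions survive cleaning and only the pair bread and milk reaches the support threshold. Here each source is cleaned before integration; in practice the steps often overlap or repeat.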

Summary
• Data Mining
• Why
• Where
• Who
• How
• KDD Process

References
1. Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining: Concepts and Techniques”, 3rd Edition, Elsevier, 2014.
2. Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, “Mining of Massive Datasets”, 2nd Edition, Cambridge University Press, 2014.
3. Ian H. Witten, Eibe Frank, Mark A. Hall, “Data Mining: Practical Machine Learning Tools and Techniques”, 3rd Edition, Elsevier, 2011.

Thank you
