Professional Documents
Culture Documents
Introduction To Data Mining-Week1
Introduction To Data Mining-Week1
CSE5317
(Elective)
4th Year Semester-1
Week-1 Lecture PPT
By
Dr.T.GopiKrishna
Assistant Professor
Dept of Computer Sciene and Engineering
What is Data Mining?
Extracting and ‘Mining’ knowledge from large amounts of data.
(Or)
“Gold Mining from rock or sand” is same as “Knowledge mining from data”
Data Mining
Database
systems
Warehouse (OLAP)
Online analytical process
Mostly reads
Queries are long and complex
Gb - Tb of data
History
Lots of scans
Summarized, reconciled data
Hundreds of users (e.g., decision-
makers, analysts)
Machine Learning
Machine learning is a field of artificial intelligence that uses
statistical techniques to give computer systems the ability to "learn"
Prediction Methods
Use some variables to predict unknown or
future values of other variables.
Description Methods
Find human-interpretable patterns that
describe the data.
Data Mining Tasks
Classification [Predictive]
Clustering [Descriptive]
Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]
Data Mining Goals
Data mining uncovers this in-depth
business intelligence by using
advanced analytical and modelling
techniques.
Transactional Databases:
Consists of a file with records where each record is a transaction.
Each transaction has a unique transaction ID and list of items that make
up transactions.
Object-Relational Databases:
Temporal Databases, Sequence Databases and Time-Series Databases
Spatial Databases and Spatiotemporal Databases:
Text Databases and Multimedia Databases:
Heterogeneous Databases and Legacy Databases:
Stages of Data Mining Process
TRUE or FALSE?
KDD Process
TRUE or FALSE?
Brief explanation of data mining
stages
Data Cleaning – Remove noisy and inconsistent
Data Integration – Multiple data sources combined
Data Selection – Data relevant to analysis retrieved
Data Transformation – Transform into form suitable
Data Mining (Summarized / Aggregated) Data Mining –
Extract data patterns using intelligent methods
Pattern Evaluation – Identify interesting patterns
Knowledge Presentation – Visualization / Knowledge
Representation – Presenting mined knowledge to the
user
Data Mining Techniques
There are several major data mining
techniques have been developing and using
in data mining projects recently including
association,
classification,
clustering,
prediction,
sequential patterns and
decision tree.
Data Mining Techniques(Association)
Geometric projection visualization technique
i. Scatter-plot matrices
It consists of scatter plots of all possible pairs of variables in a dataset.
For example: The face width, the length of the mouth and the length of
nose, etc. as shown in the following diagram.
Visualization techniques
Rely on domain experts for decision making - using their knowledge intuition
o Time consuming, costly, error prone, biased
So the solution is to use Data Mining tools
– performs data analysis,
- finds data patterns
Architecture of Typical Data Mining
Systems
Architecture of a typical Data Mining
System – Major Components:
Knowledge Base:
Domain knowledge is used to guide search – used to evaluate
interestingness of patterns.
Includes concept hierarchies, user benefits, thresholds, metadata
Database / Data warehouse Server:
Responsible for fetching relevant data based on data mining
request.
Data Mining Engine:
Consists of modules for characterization, association, correlation analysis,
classification, cluster analysis, prediction, outlier analysis and evolution
analysis.
Pattern Evaluation Module:
Interacts with data mining modules. Focuses the search
towards interesting patterns.
Pattern evaluation module may be integrated with mining module
to confine the search.
User Interface:
Communicates between users and data mining system
Specifies data mining query – to focus search
Uses intermediate data mining results to perform exploratory data
Major Issues in Data Mining:
Mining Methodology Issues:
o Mining different kinds of knowledge in databases.
o Incorporation of background knowledge
o Handling noisy or incomplete data
o Pattern Evaluation – Interestingness Problem
User Interaction Issues:
o Interactive mining of knowledge at multiple levels of abstraction
o Data mining query languages and ad-hoc data mining.
o Presentation and visualization of data mining results.
Performance Issues:
o Efficiency and Scalability of Data Mining Algorithms.
o Parallel, distributed and incremental mining algorithms.
Issues related to diversity of data types:
o Handling of relational and complex types of data.
o Mining information from heterogeneous databases and global I
nformation systems.
Review Questions