
Data Mining

IS314

Dr. Ayman Alhelbawy 9 March 2022


Knowledge Discovery Process

[Figure: Input Data → Data Pre-Processing → Data Mining → Post-Processing; stage labels: Data, Information, Pattern, Knowledge]

• Data Pre-Processing: data integration, normalization, feature selection, dimension reduction
• Data Mining: pattern discovery, association & correlation, classification, clustering, outlier analysis, …
• Post-Processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization

■ This is the typical view from the machine learning and statistics communities
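
Normalization, one of the pre-processing steps named above, can be sketched as follows. This is a minimal illustration using min-max scaling, which is only one of several normalization schemes; the function name `min_max_normalize` and the sample values are assumptions made for the example, not part of the lecture.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical attribute values (e.g. incomes in thousands).
incomes = [20, 35, 50, 80]
normalized = min_max_normalize(incomes)
print(normalized)
```

After scaling, the smallest value maps to 0 and the largest to 1, so attributes measured on different scales become comparable.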

Knowledge Discovery Process (cont.)

• Data Cleaning: remove noise, irrelevant data, and outliers

• Data Integration: combine multiple data sources (sometimes merged with data cleaning into a single preprocessing step)

• Data Selection: retrieve the data relevant to the analysis task from the database
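
The three steps above can be sketched on a toy example. The two record lists, their field names, and the rule "drop rows with a missing age" are all hypothetical choices for illustration; a real project would pick cleaning rules that suit its data.

```python
# Two hypothetical sources describing the same customers.
crm_rows = [
    {"id": 1, "name": "Ali", "age": 34},
    {"id": 2, "name": "Mona", "age": None},  # missing value
    {"id": 3, "name": "Omar", "age": 29},
]
sales_rows = [
    {"id": 1, "total": 120.0},
    {"id": 3, "total": 75.5},
    {"id": 4, "total": 10.0},  # no matching customer record
]

# Data cleaning: drop records with a missing age.
clean = [r for r in crm_rows if r["age"] is not None]

# Data integration: join the two sources on the shared "id" key.
totals = {r["id"]: r["total"] for r in sales_rows}
integrated = [{**r, "total": totals[r["id"]]} for r in clean if r["id"] in totals]

# Data selection: keep only the attributes relevant to the analysis task.
selected = [{"age": r["age"], "total": r["total"]} for r in integrated]
print(selected)
```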

Knowledge Discovery Process (cont.)

• Data Transformation: transform data into forms appropriate for mining, for example by aggregation or summarization (this step is sometimes performed before data selection)

• Data Mining: apply intelligent methods to extract data patterns

• Pattern Evaluation: identify the interesting patterns that represent knowledge, based on interestingness measures

• Knowledge Representation: use visualization and knowledge representation techniques to present the extracted patterns
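
The data transformation step listed above mentions aggregation. A minimal sketch, assuming a hypothetical list of (customer, amount) purchase records, is to sum the amounts per customer so that each customer becomes one row for mining:

```python
from collections import defaultdict

# Hypothetical raw records: (customer id, purchase amount).
transactions = [("c1", 20.0), ("c2", 5.0), ("c1", 30.0), ("c2", 15.0), ("c3", 7.5)]


def aggregate_totals(rows):
    """Aggregate purchase amounts into one total per customer."""
    totals = defaultdict(float)
    for customer, amount in rows:
        totals[customer] += amount
    return dict(totals)


summary = aggregate_totals(transactions)
print(summary)
```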

Data Mining System Architecture


Kinds of Data That Can Be Mined

• Database-oriented data sets: relational databases, data warehouses, transactional databases
• Data streams and sensor data
• Time-series data, temporal data, sequence data (incl. bio-sequences)
• Structured data, graphs, social networks and multi-linked data
• Object-relational databases
• Heterogeneous databases and legacy databases
• Spatial data and spatiotemporal data
• Multimedia databases
• Text databases
• The World-Wide Web

Data Mining Tasks

1. Predictive Tasks

Use some variables to predict unknown or future values of other variables.

2. Descriptive Tasks

Find human-interpretable patterns that describe the data.

Data Mining Tasks …


Data Mining Tasks

1. Classification (predictive)

2. Association rule mining (descriptive)

3. Clustering (descriptive)

4. Sequential pattern mining (descriptive)

5. Deviation detection / outlier analysis (predictive)

6. Regression (predictive)
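
As one illustration of a predictive task from the list above, regression can be sketched with ordinary least squares on a single input variable. The function name `fit_line` and the data points are assumptions chosen for the example; the slides do not prescribe a particular regression method.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # assumes the x values are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data lying on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Use the fitted model to predict the value for an unseen input.
prediction = slope * 5 + intercept
print(slope, intercept, prediction)
```

The fitted line is then used to predict a numeric value for new inputs, which is what distinguishes regression from classification, where the predicted value is a class label.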

Data Mining Tasks (cont.)


Classification 1

1. Mining patterns that can classify future data into known classes

Data Mining Tasks (cont.)

Classification 2

Data Mining Tasks (cont.)

Classification (cont.)

• Classification and label prediction
  • Construct models (functions) based on some training examples
  • Describe and distinguish classes or concepts for future prediction
    E.g., classify countries based on climate, or classify cars based on gas mileage
  • Predict some unknown class labels
• Typical methods
  • Decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, logistic regression, …
• Typical applications
  • Credit card fraud detection, direct marketing, classifying stars, diseases, web pages, …
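
The idea of constructing a model from training examples and then predicting unknown labels can be sketched with a decision stump, that is, a decision tree with a single split. This is a deliberately simplified stand-in for the methods listed above. The mileage figures, the labels "low" and "high", and the function names are hypothetical.

```python
def train_stump(examples, low_label, high_label):
    """Learn a threshold t: predict low_label if value < t, else high_label.

    examples is a list of (value, label) pairs with at least two distinct values.
    """
    values = sorted({v for v, _ in examples})
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        # Number of training examples the threshold t would misclassify.
        return sum((low_label if v < t else high_label) != y for v, y in examples)

    return min(candidates, key=errors)


def predict(threshold, value, low_label, high_label):
    """Classify a new value with the learned threshold."""
    return low_label if value < threshold else high_label


# Hypothetical training set: (gas mileage in mpg, mileage class).
training = [(12, "low"), (15, "low"), (18, "low"),
            (28, "high"), (33, "high"), (40, "high")]
threshold = train_stump(training, "low", "high")
print(threshold, predict(threshold, 30, "low", "high"))
```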

Data Mining Tasks (cont.)


Association Rule Mining

1. Mining any rule of the form X → Y, where X and Y are sets of data items.

Data Mining Tasks (cont.)

Association and Correlation Analysis

• Frequent patterns (or frequent itemsets)
  • What items are frequently purchased together in a Carrefour store?
• Association, correlation vs. causality
  • A typical association rule
    • Diaper → Beer [0.5%, 75%] (support, confidence)
  • Are strongly associated items also strongly correlated?
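
The two numbers attached to the rule above can be computed as follows. For a rule X → Y, support is the fraction of transactions that contain both X and Y, and confidence is the fraction of transactions containing X that also contain Y. The sketch below uses a hypothetical set of five baskets, so its values differ from the [0.5%, 75%] on the slide. It also computes lift, one common correlation measure that is not named on the slide, as a possible way to approach the last question.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(transactions, itemset):
    """Fraction of all transactions that contain itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Fraction of transactions containing lhs that also contain rhs."""
    return count(transactions, set(lhs) | set(rhs)) / count(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Confidence divided by the support of rhs; a value above 1 suggests positive correlation."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Hypothetical market baskets.
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"diaper", "beer", "bread"},
]
rule_support = support(baskets, {"diaper", "beer"})           # 3 of 5 baskets
rule_confidence = confidence(baskets, {"diaper"}, {"beer"})   # 3 of 4 diaper baskets
rule_lift = lift(baskets, {"diaper"}, {"beer"})
print(rule_support, rule_confidence, rule_lift)
```

A rule can have high confidence simply because its right-hand side is very common, which is why a measure such as lift is often checked alongside support and confidence.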

Data Mining Tasks (cont.)


Association Rule Mining Applications
Thank You.
Questions?
