Professional Documents
Culture Documents
Data Mining: Dr. Retno Kusumaningrum, S.Si., M.Kom
Data Mining: Dr. Retno Kusumaningrum, S.Si., M.Kom
(Gurin, 2013)
Data
(Gurin, 2013)
Format Data
Semi
Structured Unstructured
Structured
Description Methods
Find human-interpretable patterns that describe the
data.
Data Mining Tasks
Clu
ste Data
ring Tid Refund Marital
Status
Taxable
Income Cheat
n
tio
1 Yes Single 125K No
2 No Married 100K No
ca
3 No Single 70K No
s ifi
as
4 Yes Married 120K No
5 No Divorced 95K Yes
Cl
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
An
10 No Single 90K Yes
De oma
11 No Married 60K No
tion
12 Yes Divorced 220K No
i a tec ly
oc
13 No Single 85K Yes
s 14 No Married 75K No
tio
As s 15 No Single 90K Yes n
le
10
Ru
Milk
Classification
Find a model for class attribute as a function of the values of
other attributes
Employed
Class
# years at No Yes
Level of Credit
Tid Employed present
Education Worthy
address
1 Yes Graduate 5 Yes
No Education
2 Yes High School 2 No
3 No Undergrad 1 No { High school,
Graduate
4 Yes High School 10 Yes Undergrad }
… … … … …
Number of Number of
10
years years
Yes No Yes No
Classification Structure
l l e
r ica ir ca a tiv # years at
ti t Level of Credit
go go an s Tid Employed present
a te a te u as
Education
address
Worthy
c c q cl 1 Yes Undergrad 7 ?
# years at 2 No Graduate 3 ?
Level of Credit
Tid Employed present 3 Yes High School 2 ?
Education Worthy
address
1 Yes Graduate 5 Yes … … … … …
10
Training
Learn
Set Classifier Model
Examples of Classification Task
Classifying credit card transactions
as legitimate or fraudulent
Inter-cluster
Intra-cluster distances are
distances are maximized
minimized
Applications of Cluster Analysis
Understanding
Custom profiling for targeted marketing
Group related documents for browsing
Group genes and proteins that have similar
functionality
Group stocks with similar price fluctuations
Summarization
Reduce the size of large data sets
Use of K-means to
partition Sea Surface
60
Land Cluster 2
0
(NPP) into clusters that
Ic e or No NPP
-30
reflect the Northern and
Sea Cluster 2 Southern Hemispheres.
-60
Sea Cluster 1
-90
-180 -150 -120
29
-90 -60 -30 0 30 60 90 120 150 180
Cluster
longitude
Clustering: Application 1
Market Segmentation:
Goal: subdivide a market into distinct subsets of
customers where any subset may conceivably be selected
as a market target to be reached with a distinct marketing
mix.
Approach:
Collect different attributes of customers based on their
geographical and lifestyle related information.
Find clusters of similar customers.
Measure the clustering quality by observing buying patterns of
customers in same cluster vs. those from different clusters.
Clustering: Application 2
Document Clustering:
Goal: To find groups of documents that are similar to each
other based on the important terms appearing in them.
TID Items
1 Bread, Coke, Milk
Rules
RulesDiscovered:
Discovered:
2 Beer, Bread
{Milk}
{Milk}-->
-->{Coke}
{Coke}
3 Beer, Coke, Diaper, Milk {Diaper,
{Diaper,Milk}
Milk}-->
-->{Beer}
{Beer}
4 Beer, Bread, Diaper, Milk
5 Coke, Diaper, Milk
Association Analysis: Applications
Market-basket analysis
Rules are used for sales promotion, shelf management, and
inventory management
Medical Informatics
Rules are used to find combination of patient symptoms and
test results associated with certain diseases
Association Analysis: Applications
High Dimensionality
Non-traditional Analysis