
Unsupervised Learning

Harsha Vardhan Reddy Burri


Unsupervised Learning
• There is no output, response, or target variable; only the input variables (X) are available
• The main goal is to identify hidden patterns and relationships in the data
• Forming clusters and finding how the data is distributed in the space (density estimation)
• Example: grouping fruits by physical characteristics, such as shape, color, and odor
– Green color: bananas and grapes (big size → bananas, small size → grapes)
– Red color: apples and cherries (big size → apples, small size → cherries)
Real-life examples
• You meet strangers at a party and need to group them without prior knowledge. How? – Based on gender, age, habits, and other behavioural traits
• You find a new instance that differs from the others. How do you detect or classify it?
Challenges
• Harder than supervised learning tasks
• Dealing with a large number of dimensions and a large number of data items can be problematic because of time complexity
• The effectiveness of the method depends on the definition of “distance” (for distance-based clustering)
• The result of a clustering algorithm (which in many cases can itself be arbitrary) can be interpreted in different ways
• How do we know whether the results are meaningful, since no answer labels are available?
– Let an expert look at the results (external evaluation)
– Define an objective function on the clustering (internal evaluation)
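Internal evaluation can be made concrete with one common objective for distance-based clustering: the within-cluster sum of squared errors (SSE), i.e. the sum of squared distances from each point to its cluster mean, where lower means tighter clusters. The sketch below is a minimal illustration assuming 1-D numeric data; the helper name `sse` is hypothetical. It scores two groupings of the customer-expenditure values used in the K-means example.

```python
def sse(clusters: list[list[float]]) -> float:
    """Within-cluster sum of squared errors for 1-D clusters (lower is tighter)."""
    total = 0.0
    for cluster in clusters:
        mean = sum(cluster) / len(cluster)              # cluster centroid
        total += sum((x - mean) ** 2 for x in cluster)  # squared distances to it
    return total

# Two candidate groupings of the same nine values (k = 2 in both).
loose = [[2, 3, 4], [10, 11, 12, 20, 25, 30]]
tight = [[2, 3, 4, 10, 11, 12], [20, 25, 30]]

print(sse(loose))  # 348.0
print(sse(tight))  # 150.0
```

SSE keeps shrinking as the number of clusters grows, so it is only a fair comparison between clusterings with the same k.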
Applications
• Can be applied in many fields
• Market analysis:
– Grouping customers
• Biology:
– Classification of plants and animals given their features
– Analysis of genes and genomes
• Insurance:
– Identifying groups of motor insurance policy holders with a high average claim cost; identifying fraud
• Earthquake studies:
– Clustering observed earthquake epicenters to identify dangerous zones
• World Wide Web:
– Document classification; clustering weblog data to discover groups of similar access patterns
Types of Unsupervised Algorithms
• K-means clustering
• Hierarchical clustering
• Principal Component Analysis
K‐means Clustering 
• Unsupervised learning algorithm
• Works on unlabelled data (no target label)
• The goal is to find patterns and form clusters
Steps in K-means:
• 1. Pick k random points as the initial cluster centers (also called centroids): c1, c2, c3, …, ck
• 2. Assign each data point to the nearest cluster by calculating its distance to each centroid
• 3. Find the new cluster centers by taking the average of the points assigned to each cluster
• 4. Repeat steps 2 and 3 until none of the cluster assignments change
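The four steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `kmeans`, its `seed` and `max_iter` parameters, the use of Euclidean distance, and keeping the old centroid when a cluster becomes empty are all assumptions of this sketch. Points are tuples, so the 1-D customer-expenditure dataset is wrapped as 1-tuples.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Minimal K-means sketch: returns (centroids, labels) for tuple points."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids c1..ck.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Step 4: stop once no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

data = [2, 3, 4, 10, 11, 12, 20, 25, 30]
points = [(float(x),) for x in data]
centroids, labels = kmeans(points, k=2)
clusters = sorted(sorted(x for x, lab in zip(data, labels) if lab == j) for j in range(2))
print(clusters)  # [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
```

Because step 1 is random, different starting centroids can in general converge to different clusterings; a common remedy is to run K-means from several starts and keep the result with the lowest within-cluster error (scikit-learn's `KMeans`, for example, exposes an `n_init` parameter for this).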
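Worked example (k = 2)
Dataset = [2, 3, 4, 10, 11, 12, 20, 25, 30]  # monthly expenditure of customers (in thousands)
• Iteration 1: {2, 3, 4} (mean = 3) and {10, 11, 12, 20, 25, 30} (mean = 18)
• Iteration 2: 10 is closer to 3 than to 18, so it moves: {2, 3, 4, 10} (mean = 4.75 ≈ 5) and {11, 12, 20, 25, 30} (mean = 19.6 ≈ 20)
• Iteration 3: 11 and 12 are both closer to 4.75 than to 19.6, so both move: {2, 3, 4, 10, 11, 12} (mean = 7) and {20, 25, 30} (mean = 25)
• Iteration 4: with centroids 7 and 25, no assignment changes, so the algorithm stops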
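The iterations of this example can be replayed with a short 1-D sketch that starts from the first partition, {2, 3, 4} and {10, 11, 12, 20, 25, 30}, and records each iteration's clusters with exact (unrounded) means. The helper name `replay` and passing a starting partition instead of starting centroids are assumptions made for illustration; the sketch also assumes no cluster ever becomes empty, which holds for this data.

```python
def replay(data, clusters):
    """Re-run 1-D K-means from a starting partition, recording (clusters, means)."""
    history = []
    while True:
        means = [sum(c) / len(c) for c in clusters]  # Step 3: cluster averages
        history.append((clusters, means))
        # Step 2: reassign every value to its nearest mean.
        new_clusters = [[] for _ in means]
        for x in data:
            nearest = min(range(len(means)), key=lambda j: abs(x - means[j]))
            new_clusters[nearest].append(x)
        if new_clusters == clusters:  # Step 4: nothing moved, so stop
            return history
        clusters = new_clusters

data = [2, 3, 4, 10, 11, 12, 20, 25, 30]
start = [[2, 3, 4], [10, 11, 12, 20, 25, 30]]
for i, (clusters, means) in enumerate(replay(data, start), start=1):
    print(f"Iteration {i}: {clusters} means={means}")
# Iteration 1: [[2, 3, 4], [10, 11, 12, 20, 25, 30]] means=[3.0, 18.0]
# Iteration 2: [[2, 3, 4, 10], [11, 12, 20, 25, 30]] means=[4.75, 19.6]
# Iteration 3: [[2, 3, 4, 10, 11, 12], [20, 25, 30]] means=[7.0, 25.0]
```

Because every point is reassigned in each pass, 11 and 12 move in the same iteration; after the third partition no assignment changes, so the loop ends.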

Applications: 
1. Image segmentation
2. Clustering genome data – gene segments
3. Data mining segmentation
4. Anomaly detection
5. Instance classification
6. Customer classification
