
Clustering

AN UNSUPERVISED MACHINE LEARNING TASK
Why use clustering?
Utilizes a broad set of techniques to find subgroups of observations within a data set.
Simplifies extremely large datasets by grouping features with similar values.
Uses simple, non-statistical principles.
Very flexible and adaptable set of methods.
Wide set of real-world applications, e.g., segmenting consumers into groups with similar demographics or buying patterns.
Types of Clustering
Good Clustering
Good clustering will produce clusters with:
High intra-class similarity
Low inter-class similarity
Similarity is a measure of “alikeness” of instances; it is sometimes expressed as a distance function.
K-Means Clustering
Simplest and most commonly used clustering method for splitting a dataset into a set of k groups.
k-Means is very sensitive to the value of k and to the initial randomly chosen cluster centers.
Choosing the Right k
Ideally, the appropriate value for k should be determined based on a priori knowledge or on business requirements.
Rule of thumb: k ≈ √(n/2), where n is the number of observations (this is only a starting point).
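As a quick illustrative sketch (not from the original slides), the rule of thumb can be computed directly; the dataset size n = 200 is an arbitrary assumption.

```python
import math

n = 200                      # assumed number of observations
k = round(math.sqrt(n / 2))  # rule of thumb: k ≈ √(n/2)
print(k)                     # 10 -- use only as a starting point for tuning
```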
Potential Methods of Choosing k
Elbow Method (most common): choose k such that there are diminishing returns beyond that point (see the sketch after this list).
Information Criterion Approach
Silhouette Method
Jump Method
Gap Statistic
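A minimal sketch of the elbow method using scikit-learn's KMeans; the make_blobs toy data and the range of k values are assumptions for illustration, not part of the slides.

```python
# Minimal sketch of the elbow method with scikit-learn (illustrative only).
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)  # assumed toy data

inertias = []
ks = range(1, 11)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squares for this k

# Plot k vs. inertia and look for the "elbow" where gains start to diminish.
plt.plot(list(ks), inertias, marker="o")
plt.xlabel("k")
plt.ylabel("Within-cluster sum of squares (inertia)")
plt.show()
```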
k-Means Clustering Example

Suppose we would like to cluster these instances. Randomly pick k initial cluster centers or centroids.
k-Means Clustering Example
 
Assign each instance to the closest cluster centroid.
The distance between each instance and the centroid is measured by the Euclidean distance:
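For an instance x and a centroid c measured over p features (this notation is assumed here, not taken from the slide), the standard formula is:

dist(x, c) = √( (x₁ − c₁)² + (x₂ − c₂)² + … + (xₚ − cₚ)² )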
k-Means Clustering Example

Move each cluster centroid to the mean of each cluster. Reassign instances closest to a different centroid to the appropriate cluster centroid.
k-Means Clustering Example

Recompute cluster centroid means. Reassign instances to clusters.
If there is no change, then the algorithm has converged (finished)!
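The assign/update loop described across these example slides can be written in a few lines; this is a minimal NumPy sketch, not the slides' own code, and the toy data and k = 3 are assumptions.

```python
# Minimal NumPy sketch of the k-means loop described above (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # assumed toy data: 100 points, 2 features
k = 3                                  # assumed number of clusters

centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
while True:
    # Assignment step: each instance goes to the closest centroid (Euclidean distance).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)

    # Update step: move each centroid to the mean of its cluster
    # (empty clusters are not handled in this sketch).
    new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

    # Convergence: stop when the centroids no longer change.
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids
```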
Hierarchical Clustering
We do not need to know in advance how many clusters we want.
We end up with a tree-like visual representation of the observations, called a dendrogram.

Two Main Types of Hierarchical Clustering:
Agglomerative Clustering
Divisive Clustering

The (dis)similarity of observations is measured by their Euclidean distance, as in k-means clustering.

A linkage criterion then specifies the dissimilarity of sets as a function of the pairwise distances of observations in the sets.
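A minimal sketch of how this could look with SciPy; the Ward linkage choice and the toy data are assumptions, not from the slides.

```python
# Minimal sketch: pairwise Euclidean distances + a linkage criterion -> dendrogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.normal(size=(11, 2))        # assumed toy data: eleven instances, two features

# 'ward' is one possible linkage criterion; 'single', 'complete', and 'average'
# are other common choices. Distances are Euclidean by default.
Z = linkage(X, method="ward")

dendrogram(Z)                       # tree-like representation of the observations
plt.show()
```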
Agglomerative Hierarchical Clustering

Suppose we begin with eleven different instances we would like to cluster.

Here, we take a "bottom-up" approach:
1. Each observation starts in its own cluster.
2. Pairs of clusters are merged together based on similar characteristics as one moves up the hierarchy.
3. At each step of the algorithm, the two clusters that are the most similar are combined into a new, bigger cluster, until there is only one cluster.
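The same bottom-up procedure is available in scikit-learn when a flat clustering is wanted; a minimal sketch, assuming three clusters, Ward linkage, and toy data (none of which come from the slides):

```python
# Minimal sketch of bottom-up (agglomerative) clustering with scikit-learn.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(11, 2))       # assumed toy data: eleven instances

# Observations start in their own clusters and the closest pairs are merged
# until the requested number of clusters remains.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)
print(labels)                      # cluster label for each of the eleven instances
```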
Divisive Hierarchical Clustering

Suppose we begin with the same eleven different instances we would like to cluster. Here, we take a "top-down" approach:
1. All observations start in one cluster.
2. Splits are performed recursively as one moves down the hierarchy based on similarities between the observations.
3. At each step of the iteration, the most heterogeneous cluster is divided into two until all observations are in their own cluster.
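Divisive clustering has no direct implementation in the common Python libraries; one common approximation (an assumption here, not the slides' method) is to repeatedly bisect the most heterogeneous cluster, i.e., the one with the largest within-cluster sum of squares, using 2-means. A rough sketch under that assumption:

```python
# Rough sketch of a divisive ("top-down") strategy via repeated 2-means splits.
import numpy as np
from sklearn.cluster import KMeans

def divisive_clusters(X, n_clusters):
    clusters = [np.arange(len(X))]          # all observations start in one cluster
    while len(clusters) < n_clusters:
        # Pick the most heterogeneous cluster: largest within-cluster sum of squares.
        sse = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum() for idx in clusters]
        target = clusters.pop(int(np.argmax(sse)))

        # Split it into two with 2-means and keep both halves.
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[target])
        clusters.append(target[labels == 0])
        clusters.append(target[labels == 1])
    return clusters

rng = np.random.default_rng(0)
X = rng.normal(size=(11, 2))                # assumed toy data: eleven instances
print(divisive_clusters(X, 3))              # three index arrays, one per cluster
```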
Works Cited
James, Gareth, et al. An Introduction to Statistical Learning: with Applications in R. Springer, 2017.
