Welcome to Scribd!

Skip carousel

Chapter 2

Uploaded by

Anto

0% found this document useful (0 votes)

38 views39 pages

Original Title

chapter2 (5)

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

0% found this document useful (0 votes)

38 views39 pages

Chapter 2

Uploaded by

Anto

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

Jump to Page

You are on page 1of 39

Search inside document

Introduction to

hierarchical
clustering
UN SUP ERVISED L EARN IN G IN R

Hank Roark
Senior Data Scientist at Boeing
Hierarchical clustering
Number of clusters is not known ahead of time

Two kinds: bo om-up and top-down, this course bo om-up

UNSUPERVISED LEARNING IN R
Simple example

UNSUPERVISED LEARNING IN R
Five clusters

UNSUPERVISED LEARNING IN R
Four clusters

UNSUPERVISED LEARNING IN R
Three clusters

UNSUPERVISED LEARNING IN R
Two clusters

UNSUPERVISED LEARNING IN R
One cluster

UNSUPERVISED LEARNING IN R
Hierarchical clustering in R
# Calculates similarity as Euclidean distance
# between observations
dist_matrix <- dist(x)
# Returns hierarchical clustering model
hclust(d = dist_matrix)

Call:
hclust(d = s)

Cluster method : complete

Distance : euclidean
Number of objects: 50

UNSUPERVISED LEARNING IN R
Let's practice!
UN SUP ERVISED L EARN IN G IN R
Selecting number of
clusters
UN SUP ERVISED L EARN IN G IN R

Hank Roark
Senior Data Scientist at Boeing
Interpreting results
# Create hierarchical cluster model: hclust.out
hclust.out <- hclust(dist(x))
# Inspect the result
summary(hclust.out)

Length Class Mode

merge 98 -none- numeric
height 49 -none- numeric
order 50 -none- numeric
labels 0 -none- NULL
method 1 -none- character
call 2 -none- call
dist.method 1 -none- character

UNSUPERVISED LEARNING IN R
Dendrogram
Tree shaped structure used to interpret hierarchical clustering models

UNSUPERVISED LEARNING IN R
Dendrogram plotting in R
# Draws a dendrogram
plot(hclust.out)
abline(h = 6, col = "red")

UNSUPERVISED LEARNING IN R
Tree "cutting" in R
# Cut by height h
cutree(hclust.out, h = 6)

1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3
3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 2 4 2 4 4

# Cut by number of clusters k

cutree(hclust.out, k = 2)

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2
2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1

UNSUPERVISED LEARNING IN R
Let's practice!
UN SUP ERVISED L EARN IN G IN R
Clustering linkage
and practical
matters
UN SUP ERVISED L EARN IN G IN R

Hank Roark
Senior Data Scientist at Boeing
Linking clusters in hierarchical clustering
How is distance between clusters determined? Rules?

Four methods to determine which cluster should be linked

Complete: pairwise similarity between all observations in cluster 1 and cluster 2, and uses
largest of similarities

Single: same as above but uses smallest of similarities

Average: same as above but uses average of similarities

Centroid: nds centroid of cluster 1 and centroid of cluster 2, and uses similarity between
two centroids

UNSUPERVISED LEARNING IN R
Linking methods: complete and average

UNSUPERVISED LEARNING IN R
Linking method: single

UNSUPERVISED LEARNING IN R
Linking method: centroid

UNSUPERVISED LEARNING IN R
Linkage in R
# Fitting hierarchical clustering models using different methods
hclust.complete <- hclust(d, method = "complete")
hclust.average <- hclust(d, method = "average")
hclust.single <- hclust(d, method = "single")

UNSUPERVISED LEARNING IN R
Practical matters
Data on di erent scales can cause undesirable results in clustering methods

Solution is to scale data so that features have same mean and standard deviation
Subtract mean of a feature from all observations

Divide each feature by the standard deviation of the feature

Normalized features have a mean of zero and a standard deviation of one

UNSUPERVISED LEARNING IN R
Practical matters
# Check if scaling is necessary
colMeans(x)

-0.1337828 0.0594019

apply(x, 2, sd)

1.974376 2.112357

UNSUPERVISED LEARNING IN R
Practical matters
# Produce new matrix with columns of mean of 0 and sd of 1
scaled_x <- scale(x)
colMeans(scaled_x)

2.775558e-17 3.330669e-17

apply(scaled_x, 2, sd)

1 1

UNSUPERVISED LEARNING IN R
Let's practice!
UN SUP ERVISED L EARN IN G IN R
Review of
hierarchical
clustering
UN SUP ERVISED L EARN IN G IN R

Hank Roark
Senior Data Scientist at Boeing
Hierarchical clustering review
# Fitting various hierarchical clustering models
hclust.complete <- hclust(d, method = "complete")
hclust.average <- hclust(d, method = "average")
hclust.single <- hclust(d, method = "single")

UNSUPERVISED LEARNING IN R
Linking methods: complete and average

UNSUPERVISED LEARNING IN R
Hierarchical clustering

UNSUPERVISED LEARNING IN R
Iterating

UNSUPERVISED LEARNING IN R
Dendrogram

UNSUPERVISED LEARNING IN R
How k-means and hierarchical clustering differ

UNSUPERVISED LEARNING IN R
Practical matters
# Scale the data
pokemon.scaled <- scale(pokemon)

# Create hierarchical and k-means clustering models

hclust.pokemon <- hclust(dist(pokemon.scaled), method = "complete")
km.pokemon <- kmeans(pokemon.scaled, centers = 3,
nstart = 20, iter.max = 50)

# Compare results of the models

cut.pokemon <- cutree(hclust.pokemon, k = 3)
table(km.pokemon$cluster, cut.pokemon)
cut.pokemon

1 2 3
1 242 1 0
2 342 1 0
3 204 9 1

UNSUPERVISED LEARNING IN R
Let's practice!
UN SUP ERVISED L EARN IN G IN R

Planning Your Microsoft Dynamics GP Upgrade: Brian Murphy & Lance Brigham
Document24 pages
Planning Your Microsoft Dynamics GP Upgrade: Brian Murphy & Lance Brigham
BAY THEJACK
No ratings yet
Painless Pre-Algebra
From Everand
Painless Pre-Algebra
Amy Stahl
Rating: 3 out of 5 stars
3/5 (2)
Ifmis Erp System: Integrated Financial Makagements System & Enterprise Resource Planning System
Document34 pages
Ifmis Erp System: Integrated Financial Makagements System & Enterprise Resource Planning System
Emmanuel Shivina Khisa
No ratings yet
Introduction To The Case Study: Hank Roark
Document25 pages
Introduction To The Case Study: Hank Roark
Anto
No ratings yet
Chapter 1
Document46 pages
Chapter 1
Anto
No ratings yet
Chapter2 Hierarchical Clustering
Document59 pages
Chapter2 Hierarchical Clustering
boogie
No ratings yet
Exp 8
Document5 pages
Exp 8
Soban Maruf
No ratings yet
Lab Manual
Document42 pages
Lab Manual
Shamilie M
No ratings yet
B22CS014 Report
Document11 pages
B22CS014 Report
b22cs014
No ratings yet
A Conceptual Version of The K-Means Algorithm: Pattern Recognition Letters
Document11 pages
A Conceptual Version of The K-Means Algorithm: Pattern Recognition Letters
Miguel Katsuyi Ayala Kanashiro
No ratings yet
Unsupervised Learning: K-Means Clustering
Document23 pages
Unsupervised Learning: K-Means Clustering
ariw200201
No ratings yet
Accelerated Data Science Introduction To Machine Learning Algorithms
Document37 pages
Accelerated Data Science Introduction To Machine Learning Algorithms
sanketjaiswal
No ratings yet
Lecture 6
Document55 pages
Lecture 6
Hassan
No ratings yet
CLOPE: A Fast and Effective Clustering Algorithm For Transactional Data
Document6 pages
CLOPE: A Fast and Effective Clustering Algorithm For Transactional Data
aegr82
No ratings yet
DAA Slides U3
Document240 pages
DAA Slides U3
chekodu.akash
No ratings yet
DB Scan
Document7 pages
DB Scan
UJJAWAL PUGALIA
No ratings yet
Lecture 7
Document64 pages
Lecture 7
Michael Zhang
No ratings yet
Lecture 7 - Integrated Analysis With R
Document79 pages
Lecture 7 - Integrated Analysis With R
Anurag Laddha
No ratings yet
ML4 Unsupervised Learning
Document60 pages
ML4 Unsupervised Learning
andesong88
No ratings yet
Chapter2 Limitations of RNN
Document29 pages
Chapter2 Limitations of RNN
MohamedChiheb BenChaabane
No ratings yet
Large Scale Deep Learning
Document170 pages
Large Scale Deep Learning
pavancreative81
No ratings yet
Introduction To Clustering: Alka Arora Sr. Scientist
Document57 pages
Introduction To Clustering: Alka Arora Sr. Scientist
zilangamba_s4535
No ratings yet
Shahapure 2020
Document2 pages
Shahapure 2020
someone4551
No ratings yet
2 - K-Mean
Document39 pages
2 - K-Mean
abdala sabry
No ratings yet
Survey of Clustering Algorithms
Document37 pages
Survey of Clustering Algorithms
Aniket Roy
No ratings yet
Summary - MachineLearning (Part 2)
Document19 pages
Summary - MachineLearning (Part 2)
aril dan
No ratings yet
Lec10 Clustering
Document19 pages
Lec10 Clustering
mobinakohesta
No ratings yet
Hierarchical Clustering Unit 4 ML
Document14 pages
Hierarchical Clustering Unit 4 ML
Smriti Sharma
No ratings yet
K Means
Document36 pages
K Means
Saurabh Mishra
No ratings yet
R - Packages With Applications From Complete and Censored Samples
Document43 pages
R - Packages With Applications From Complete and Censored Samples
Liban Ali Mohamud
No ratings yet
Learning With Hadoop Based Data Mining: - A Case Study On Mapreduce
Document38 pages
Learning With Hadoop Based Data Mining: - A Case Study On Mapreduce
Wan Na Prommanop
No ratings yet
CSE 326: Data Structures Lecture #21 Multidimensional Search Trees
Document42 pages
CSE 326: Data Structures Lecture #21 Multidimensional Search Trees
Thaddeus Moore
No ratings yet
Clustering - The Data Ensemble
Document4 pages
Clustering - The Data Ensemble
Daniel N Sherine Foo
No ratings yet
Chapter 8 - Clustering
Document42 pages
Chapter 8 - Clustering
FakhrulShahrilEzanie
No ratings yet
R Basic
Document16 pages
R Basic
Taslima Chowdhury
No ratings yet
Lab3 Dsa W23
Document13 pages
Lab3 Dsa W23
Rahhim Adeem
No ratings yet
Enhanced Db-Scan Algorithm
Document5 pages
Enhanced Db-Scan Algorithm
seventhsensegroup
No ratings yet
ML Unit 3
Document83 pages
ML Unit 3
sanju.25qt
No ratings yet
c5 RoughSet
Document67 pages
c5 RoughSet
Syarifah Nazirah
No ratings yet
Fast Polynomial Evaluation and Composition: To Cite This Version
Document5 pages
Fast Polynomial Evaluation and Composition: To Cite This Version
thomasdaquin
No ratings yet
T20 Cricket Score Prediction.
Document17 pages
T20 Cricket Score Prediction.
Avinash
No ratings yet
Maths
Document488 pages
Maths
samsung.galaxy.tab.345c
No ratings yet
Solution of Data Structure ST-2 PDF
Document21 pages
Solution of Data Structure ST-2 PDF
harshit
No ratings yet
Assignment 2 With Program
Document8 pages
Assignment 2 With Program
Palash Saroware
No ratings yet
Lecture 3
Document17 pages
Lecture 3
Mohit Garg
No ratings yet
Combining Recurrent, Convolutional, and Continuous-Time Models With Linear State-Space Layers
Document46 pages
Combining Recurrent, Convolutional, and Continuous-Time Models With Linear State-Space Layers
be.arnab96
No ratings yet
3 UnSupervised Learning
Document53 pages
3 UnSupervised Learning
Zaeem Abbas
No ratings yet
MMW-MODULE6-Statistics Part 3 PDF
Document4 pages
MMW-MODULE6-Statistics Part 3 PDF
Jan Mark Castillo
No ratings yet
Repetitive DNA Detection and Classification: Vijay Krishnan Masters Student Computer Science Department
Document33 pages
Repetitive DNA Detection and Classification: Vijay Krishnan Masters Student Computer Science Department
Prabin Kumar
No ratings yet
Clustering
Document36 pages
Clustering
Arul Kumar Venugopal
No ratings yet
Learning From Data HW 6
Document5 pages
Learning From Data HW 6
Pratik Sachdeva
No ratings yet
Visualization
Document19 pages
Visualization
Deva Hema D
No ratings yet
Fds Answers
Document53 pages
Fds Answers
saranyatvcet
No ratings yet
Introduction-To-Ml-Part-3 Edited
Document73 pages
Introduction-To-Ml-Part-3 Edited
Hana Alhomrani
No ratings yet
5 - 1 - Course Notes Cluster Analysis
Document7 pages
5 - 1 - Course Notes Cluster Analysis
uusepp
No ratings yet
Measures of Relative Motion
Document20 pages
Measures of Relative Motion
Bam Bam
0% (1)
Lecture Week 2 KNN and Model Evaluation PDF
Document53 pages
Lecture Week 2 KNN and Model Evaluation PDF
Hoàng Phạm
100% (1)
Measures of Position PartT 2
Document21 pages
Measures of Position PartT 2
Jessele suarez
No ratings yet
Clustering Techniques - Utkarsh Kulshrestha
Document25 pages
Clustering Techniques - Utkarsh Kulshrestha
N Mahesh
No ratings yet
Agnes
Document25 pages
Agnes
Dyah Septi Andryani
No ratings yet
Chapter 3: Data Preprocessing
Document15 pages
Chapter 3: Data Preprocessing
Asma Batool Naqvi
No ratings yet
Daa Unit II
Document21 pages
Daa Unit II
Nikhil Yasalapu
No ratings yet
Chapter 1
Document25 pages
Chapter 1
Anto
No ratings yet
Chapter 3
Document31 pages
Chapter 3
Anto
No ratings yet
Introduction To The Case Study: Hank Roark
Document25 pages
Introduction To The Case Study: Hank Roark
Anto
No ratings yet
Chapter 1
Document54 pages
Chapter 1
Anto
No ratings yet
Tabel Swasta Apr19 Value
Document215 pages
Tabel Swasta Apr19 Value
Anto
No ratings yet
Cavalry Brochure v1.01
Document2 pages
Cavalry Brochure v1.01
Anto
No ratings yet
Serial Number SMADAV 9.8.1
Document1 page
Serial Number SMADAV 9.8.1
Anto
No ratings yet
Abp
Document1,029 pages
Abp
sarmad
No ratings yet
Course Learning Objectives:: S.No. Course Outcomes Blooms Taxonomy CO1 CO2 CO3
Document29 pages
Course Learning Objectives:: S.No. Course Outcomes Blooms Taxonomy CO1 CO2 CO3
atul211988
No ratings yet
Common Causes and Solutions On ORA-376
Document7 pages
Common Causes and Solutions On ORA-376
elcaso34
No ratings yet
Tso Ispf JCL Vsam Cobol DB2 Cics
Document5 pages
Tso Ispf JCL Vsam Cobol DB2 Cics
ishanlumba
No ratings yet
TYBSc IT Syllabus-1
Document92 pages
TYBSc IT Syllabus-1
Sachin Bhostekar
No ratings yet
SESI 1 2021/2022: Jabatan Teknologi Maklumat Dan Komunikasi Politeknik Ungku Omar
Document13 pages
SESI 1 2021/2022: Jabatan Teknologi Maklumat Dan Komunikasi Politeknik Ungku Omar
Thayalan
No ratings yet
Course Content - Dell Boomi
Document7 pages
Course Content - Dell Boomi
Srikanth Reddy
No ratings yet
PMP Technical Description
Document53 pages
PMP Technical Description
Amarsaikhan Amgalan
No ratings yet
MongoDB Manual Master
Document499 pages
MongoDB Manual Master
vitalitymobile
No ratings yet
LiteSpeed For SQL Server - Configure Log Shipping Guide PDF
Document49 pages
LiteSpeed For SQL Server - Configure Log Shipping Guide PDF
Nestor Depablos
No ratings yet
Python Essentials For AWS Cloud Developers Run and Deploy Cloud-Based Python Applications Using AWS PDF
Document224 pages
Python Essentials For AWS Cloud Developers Run and Deploy Cloud-Based Python Applications Using AWS PDF
Géza Nagy
No ratings yet
Database Design:: Visual Basic
Document5 pages
Database Design:: Visual Basic
Nockie Hipolito Rivera
100% (2)
135+ (FREQUENTLY ASKED) - Ab Initio Interview Questions & Answers
Document19 pages
135+ (FREQUENTLY ASKED) - Ab Initio Interview Questions & Answers
Amit Jankar
No ratings yet
Todo Application (1 Week) : 2.1 Required Tasks
Document3 pages
Todo Application (1 Week) : 2.1 Required Tasks
Жаклин Георгиева
No ratings yet
Internal Tables PDF
Document11 pages
Internal Tables PDF
Chalapathi
No ratings yet
Apps Check v02
Document296 pages
Apps Check v02
mahfuz
No ratings yet
XR APP Basic Overview
Document2 pages
XR APP Basic Overview
Mahesh Kumar
No ratings yet
SQL Preparation
Document4 pages
SQL Preparation
bhanu prakash
No ratings yet
DBAS6211MM
Document173 pages
DBAS6211MM
Fathima
No ratings yet
Microsoft DP-900 Free Practice Exam & Test Training - ITExams
Document16 pages
Microsoft DP-900 Free Practice Exam & Test Training - ITExams
timpu33
No ratings yet
TQC Installation Maintenance Manual - Rev 7 - 09
Document25 pages
TQC Installation Maintenance Manual - Rev 7 - 09
Bello
100% (1)
Core Libraries For Machine Learning
Document5 pages
Core Libraries For Machine Learning
Nandkumar Khachane
No ratings yet
Settings: Files/Opencv Directory. Now, We Have To Configure Devcpp That He Can Take
Document5 pages
Settings: Files/Opencv Directory. Now, We Have To Configure Devcpp That He Can Take
Lesocrate
No ratings yet
Sentinel Visualizer 6 User Guide
Document227 pages
Sentinel Visualizer 6 User Guide
Tania
No ratings yet
Chat GPT
Document4 pages
Chat GPT
saurabh_k2009
No ratings yet
Preface 1 Data Handling in Files: VIII 1
Document179 pages
Preface 1 Data Handling in Files: VIII 1
Hamza Atif
No ratings yet
1) How Many Types of Files Are There in A SQL Server Database?
Document16 pages
1) How Many Types of Files Are There in A SQL Server Database?
Janardhan kengar
No ratings yet
Introduction To Cassandra
Document47 pages
Introduction To Cassandra
Ponnusamy S Pichaimuthu
No ratings yet
A Comparative Study of K-Means, DBSCAN and OPTICS
Document6 pages
A Comparative Study of K-Means, DBSCAN and OPTICS
kripsi ilyas
No ratings yet