TSIM - Clustering

Máster Universitario en Ingeniería Biomédica

Advanced Methods In Medical Signals and Images

Segmentation of Medical Images

Juan Ortuño Fisac

Biomedical Imaging Technology (BIT) www.die.upm.es/im

Departamento de Ingeniería Electrónica, ETSIT-UPM

TSIM (2022-2023) - Segmentation of medical images

• Outline
– Segmentation of medical images
– Classical methods
– Morphological methods
– 1st exercise, image segmentation
– Active contours
– Graph-based methods
– Active shape models (ASM)
– Machine learning: Clustering
– 2nd exercise, clustering segmentation

Machine learning

Supervised learning
– There is a set of labeled training data used as a reference
– Fixed number of classes
• Classification
Identifying to which of a set of classes a new observation belongs
• Regression
Predicting a continuous dependent variable from a number of
independent variables
Unsupervised learning
– There is no labeled training set
– The number of classes can be fixed or chosen automatically by the algorithm
In machine learning, clustering is a type of unsupervised learning
(Supervised clustering = Classification + regression)

What is clustering?

• Clustering:
– The process of grouping a set of objects into clusters of similar objects
• Objects within a cluster should be similar
• Objects from different clusters should be dissimilar

– The most common form of unsupervised learning

• Unsupervised learning
– Learning from raw data, as opposed to supervised learning, where
a classification of the examples is given
– We want to explore the data to find some intrinsic structures
in them

Hierarchical Clustering


• Partitional clustering
Finds all the clusters simultaneously as a partition of the data and does not
impose a hierarchical structure.
• Hierarchical clustering
Produces a set of nested clusters that are organized as a tree

Dendrogram: a diagram that shows the hierarchical relationship between objects
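
The nested merges that a dendrogram draws can be sketched with a small agglomerative procedure. This is only an illustration under assumptions not stated in the slides: single linkage (distance between the closest members) and Euclidean distance; the function name `single_linkage` and the toy data are hypothetical.

```python
import numpy as np

def single_linkage(points):
    """Agglomerative clustering; returns the list of merges (a, b, distance)."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all objects.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> member indices
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single linkage (assumption): distance of the closest pair of members.
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        # The merged cluster gets a new id, as in a dendrogram node.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, float(d)))
        next_id += 1
    return merges

# Hypothetical 1D objects.
merges = single_linkage([[0.0], [1.0], [5.0], [7.0]])
```

Each tuple in `merges` is one node of the tree: the two clusters joined and the height (distance) at which they join.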

• Gene analysis

DOI: 10.1098/rstb.2012.0474
Partitional Clustering

A Spatial Clustering Technique for the Identification of Customizable Ecoregions

Clustering in Image segmentation:

• Grouping pixels into clusters (segments)
• The resulting regions may not be spatially coherent, so post-processing is needed
– Morphological process
– Markov Random Fields (MRF)

E.g.: Histogram segmentation as clustering

Clustering in image segmentation

• Clusters of pixels
• Clusters of atomic regions (superpixels)
From a previous over-segmentation

[Figure: clustering of pixels versus clustering of regions obtained from a watershed transform]

• Based on a definition of distance to the clusters and on a set of
characteristics arranged as a vector of parameters (feature vector)
• Two-step process:
– 1) Selection of the vector of parameters
– 2) Automatic cluster-finding strategy (grouping)
• Parameters (in image segmentation problems):
– Gray level
– N gray levels in multimodal images
– Local characteristics (variance, gradient, etc.)
– Texture
– Etc.
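
As an illustration of step 1, a per-pixel vector of parameters could be assembled as follows. This is a minimal sketch, assuming a 3x3 neighbourhood for the local variance and finite differences for the gradient; the function name `pixel_features` and the synthetic image are hypothetical.

```python
import numpy as np

def pixel_features(image):
    """Return an (n_pixels, 3) array: gray level, local variance, gradient magnitude."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    # Stack the 3x3 neighbourhood of every pixel along a new axis (assumed window size).
    neigh = np.stack([padded[r:r + h, c:c + w] for r in range(3) for c in range(3)])
    local_var = neigh.var(axis=0)
    # Finite-difference gradient along rows (gy) and columns (gx).
    gy, gx = np.gradient(img)
    grad_mag = np.hypot(gx, gy)
    return np.column_stack([img.ravel(), local_var.ravel(), grad_mag.ravel()])

# Synthetic image with a vertical edge between two flat regions.
image = np.zeros((4, 6))
image[:, 3:] = 100.0
features = pixel_features(image)
```

Each row of `features` is one point of the scatter plot; step 2 (grouping) then operates on these rows rather than on pixel positions.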

Stages in clustering

Optimal clustering method: iterate, updating the number of clusters, the
type of distance and the type of parameters

A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: A review”, ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, Sep. 1999.


Xu, R., & Wunsch, D. C. (2010). Clustering algorithms in biomedical research: a review.
IEEE Reviews in Biomedical Engineering, 3, 120-154.

The importance of choosing the type of distance…
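
A small numerical illustration of why the distance matters: the same vector of parameters can be closest to different clusters depending on the metric. The sketch compares the Euclidean distance with the Mahalanobis distance; the two clusters, their covariances and the sample are hypothetical values chosen for the example.

```python
import numpy as np

def euclidean(x, mean):
    return float(np.linalg.norm(x - mean))

def mahalanobis(x, mean, cov):
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Hypothetical clusters: A is elongated along the first feature, B is compact.
mean_a, cov_a = np.array([0.0, 0.0]), np.array([[16.0, 0.0], [0.0, 1.0]])
mean_b, cov_b = np.array([6.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]])
x = np.array([3.5, 0.0])

nearest_euclidean = "A" if euclidean(x, mean_a) < euclidean(x, mean_b) else "B"
nearest_mahalanobis = (
    "A" if mahalanobis(x, mean_a, cov_a) < mahalanobis(x, mean_b, cov_b) else "B"
)
```

With these values the Euclidean distance assigns the sample to B, while the Mahalanobis distance, which accounts for the spread of each cluster, assigns it to A.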


Scatter plot

Scatter plot:
Schematic representation of the vectors of parameters

• Scatter plot: schematic representation for 2D vectors of parameters

– Example: bimodal imaging, two gray level values (T1-weighted and T2-weighted)

– Example: bimodal imaging, two gray level values: PD-MRI (proton density) and T2-weighted MRI
– A cluster in the scatter plot may correspond to a region that is not spatially connected

• 3D vectors of parameters


• Clustering algorithms:
– K-Nearest Neighbor (K-NN)
– K-means
– Fuzzy C-Means (FCM)
– Mean shift
– Expectation-Maximization (EM)
– Spectral clustering

A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: A review”, ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, Sep. 1999.

• K-Nearest neighbors (K-NN)

– The pixel is assigned to the most common class among its k nearest neighbors
– It is a supervised algorithm (it requires a training set)
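
The majority vote described above can be sketched in a few lines. This is an illustrative version assuming Euclidean distance in feature space; the function name `knn_classify`, the (T1, T2) gray-level training samples and the class labels are hypothetical.

```python
import numpy as np

def knn_classify(train_x, train_y, query, k=3):
    """Assign `query` to the most common class among its k nearest training samples."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    # Euclidean distance (assumption) from the query to every labeled sample.
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical labeled pixels: (T1, T2) gray levels, class 0 = CSF, class 1 = GM.
train_x = [[20, 200], [25, 190], [30, 210], [100, 120], [110, 115], [95, 125]]
train_y = [0, 0, 0, 1, 1, 1]

label_a = knn_classify(train_x, train_y, [28, 195], k=3)
label_b = knn_classify(train_x, train_y, [105, 118], k=3)
```

Note that no model is fitted: the training set itself is the reference, which is why K-NN is supervised.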

• The effect of K
– A larger K produces smoother boundaries and reduces the effect of class label noise
– But when K is too large, the majority class/cluster is always predicted

Small K: high variance (overfitting). Large K: high bias (underfitting).

• K-NN in medical imaging

[Figure: 3D-FLAIR, 3D-T1, manual segmentation and K-NN probability map; tissue priors: CSF, GM and WM]

Steenwijk MD et al., Accurate white matter lesion segmentation by k nearest neighbor classification with tissue type priors, NeuroImage: Clinical (2013): 462-469.

Vrooman HA, et al. Multi-spectral brain tissue segmentation using automatically trained k-Nearest-Neighbor classification. Neuroimage 2007, 37(1):71-81.


• K-means clustering
– Unsupervised algorithm
– Minimizes the within-cluster sum of squared distances
– Separates the data into Voronoi cells; implicitly assumes clusters of similar size

$$\underset{C}{\operatorname{argmin}} \sum_{k=1}^{K} \sum_{i \in C_k} \left\| x_i - u_k \right\|^2$$

• K clusters
• u_k: mean value of cluster C_k

• K-means:
– Finding the global optimum is NP-hard (NP: non-deterministic polynomial time)
– Common approach: Search for local minima
– Heuristic solution: Lloyd's algorithm (k-means iterative algorithm)

k-means iterative algorithm:

1. Choose an initial set of mean values u_k
2. Assign each pixel x_i to the cluster with minimum distance
3. Calculate the new means u_k
4. Repeat (2-3) until the total distance is no longer reduced
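
The four steps can be sketched directly in NumPy. This is a minimal illustration, assuming squared Euclidean distance and user-supplied initial means; the function name `kmeans` and the toy data are hypothetical.

```python
import numpy as np

def kmeans(x, init_means, max_iter=100):
    """Lloyd's iteration; returns (labels, means)."""
    x = np.asarray(x, dtype=float)
    means = np.asarray(init_means, dtype=float).copy()  # step 1
    prev_cost = np.inf
    for _ in range(max_iter):
        # Step 2: assign every sample to the cluster with the nearest mean.
        d2 = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        cost = d2[np.arange(len(x)), labels].sum()
        # Step 4: stop when the total distance is no longer reduced.
        if cost >= prev_cost:
            break
        prev_cost = cost
        # Step 3: recompute the means (an empty cluster keeps its previous mean).
        for k in range(len(means)):
            if np.any(labels == k):
                means[k] = x[labels == k].mean(axis=0)
    return labels, means

# Hypothetical 2D vectors of parameters forming two groups.
x = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
labels, means = kmeans(x, init_means=[[0, 0], [10, 10]])
```

Different initial means can lead to different results, because the iteration only guarantees that the total distance does not increase.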

• K-means (iterative method, Lloyd's algorithm)

– Heuristic (Depends on initial seeds)

– Can stop in a local minimum
– For one-dimensional vectors, it is equivalent to automatic
histogram thresholding
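
For one-dimensional gray levels and K = 2, assigning each pixel to the nearest mean is the same as thresholding at the midpoint of the two means, so the iteration reduces to an automatic threshold search. A minimal sketch under that assumption (the function name `two_means_threshold` and the gray levels are hypothetical):

```python
import numpy as np

def two_means_threshold(gray, max_iter=100):
    """Iterate threshold = midpoint of the two class means until it stops changing."""
    gray = np.asarray(gray, dtype=float).ravel()
    threshold = gray.mean()  # initial guess (assumption)
    for _ in range(max_iter):
        low, high = gray[gray <= threshold], gray[gray > threshold]
        if len(low) == 0 or len(high) == 0:
            break
        # Nearest-mean assignment in 1D is a cut at the midpoint of the means.
        new_threshold = 0.5 * (low.mean() + high.mean())
        if new_threshold == threshold:
            break
        threshold = new_threshold
    return threshold

gray = [10, 12, 14, 200, 205, 210]
threshold = two_means_threshold(gray)
```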

The K-means clustering presents problems with:

Different sizes and densities
Non-hyper-spherical regions

K-means and fuzzy C-means

• K-means

$$\underset{C}{\operatorname{argmin}} \sum_{k=1}^{K} \sum_{i \in C_k} \left\| x_i - u_k \right\|^2$$

• K clusters
• u_k: mean value of cluster C_k

• Fuzzy C-means (FCM)

• Each pixel has a degree of membership to each cluster (fuzzy logic)
rather than belonging completely to just one

$$\underset{C}{\operatorname{argmin}} \sum_{k=1}^{K} \sum_{i=1}^{N} w_{ik}^{m} \left\| x_i - u_k \right\|^2, \qquad 1 < m < \infty, \qquad \sum_{k=1}^{K} w_{ik} = 1 \;\; \forall i \in \{1, \dots, N\}$$

• w_ik: weight of x_i in cluster C_k; it gives the degree to which element x_i belongs to cluster C_k

Fuzzy C-means

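• Fuzzy C-means

$$\underset{C}{\operatorname{argmin}} \sum_{k=1}^{K} \sum_{i=1}^{N} w_{ik}^{m} \left\| x_i - u_k \right\|^2, \qquad 1 < m < \infty$$

$$u_k = \frac{\sum_{i=1}^{N} w_{ik}^{m} x_i}{\sum_{i=1}^{N} w_{ik}^{m}}, \qquad w_{ik} = \frac{1}{\sum_{j=1}^{K} \left( \frac{\left\| x_i - u_k \right\|}{\left\| x_i - u_j \right\|} \right)^{\frac{2}{m-1}}}$$

Fuzzy C-means iterative algorithm:
1. Initialize the weight matrix W(n) = [w_ik]
2. Calculate the centroid vector U(n) = [u_k]
3. Update W(n+1) with the new means U(n)
4. If ‖W(n+1) − W(n)‖ < s, stop; otherwise return to (2)
✓ m: level of ‘fuzziness’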
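
A compact sketch of the iteration, written from the standard FCM update rules (weighted centroids, memberships from distance ratios). The random initialization of W, the Frobenius norm used as stopping criterion, and the names `fuzzy_c_means`, `tol` and `seed` are assumptions of this example.

```python
import numpy as np

def fuzzy_c_means(x, n_clusters, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Returns (W, centroids); W is N x K and each row sums to 1."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: random membership matrix W whose rows sum to 1 (assumption).
    w = rng.random((len(x), n_clusters))
    w /= w.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Step 2: centroids as means weighted by w_ik^m.
        wm = w ** m
        centroids = (wm.T @ x) / wm.sum(axis=0)[:, None]
        # Step 3: memberships from the ratios of distances to every centroid.
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        ratio = dist[:, :, None] / dist[:, None, :]  # ratio[i, k, j] = d_ik / d_ij
        new_w = 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(axis=2)
        # Step 4: stop when the memberships barely change (Frobenius norm).
        converged = np.linalg.norm(new_w - w) < tol
        w = new_w
        if converged:
            break
    return w, centroids

# Hypothetical 1D gray levels forming two groups.
x = np.array([[1.0], [2.0], [3.0], [20.0], [21.0], [22.0]])
w, centroids = fuzzy_c_means(x, n_clusters=2, m=2.0)
hard_labels = w.argmax(axis=1)
```

Taking the `argmax` of each row of `w` turns the fuzzy memberships into a crisp segmentation when one is needed.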

Performance depends on initial centroids: For a robust approach:

• Clever algorithm to determine initial centroids
• Run FCM several times each starting with different initial centroids
• In the limit m → 1, the weights w_ik converge to 0 or 1, which implies a crisp partitioning (as in K-means)

K-means in medical imaging
• Clustering in Medical imaging

– Classification of cerebral tissues (GM, WM, CSF)

[Figure: T1-weighted MRI; k-means result; k-means post-processed with Markov random fields (MRF)]

D. L. Pham et al., “Current methods in medical image segmentation”, Annual Review of Biomedical Engineering (2), 2000.

C-means in medical imaging

• Segmentation into grey matter and white matter using a fuzzy C-means algorithm

Grey matter White matter

Miller D H et al. Brain 2002;125:1676-1695

Problems with k-means

• Example where K-means and fuzzy C-means fail

K-means result

Parametric models

• Parametric models
– Data are a mixture of probability distributions
– Each cluster is a model of a probability distribution, characterized by
a set of parameters
– Task: estimate the parameters, fitting the parametric model

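– Model the parameters of statistical distributions
• Density estimator (in contrast to centroid estimators, e.g., k-means, fuzzy C-means)
– Example: Gaussian Mixture Models (GMM)
• 1-dimensional case: mean and variance parameters

$$p(x) = \sum_{k=1}^{K} \pi_k \, p(x \mid \theta_k), \qquad \theta_k = (u_k, \sigma_k)$$

$$p(x \mid u_k, \sigma_k) = \frac{1}{\sigma_k \sqrt{2\pi}} \exp\left( -\frac{1}{2\sigma_k^2} (x - u_k)^2 \right)$$

(u_k: mean, σ_k: standard deviation, π_k: weight of component k)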

• Gaussian Mixture Models

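– N-dimensional case
• Covariance matrix, mean vector

$$p(x) = \sum_{k=1}^{K} \pi_k \, p(x \mid \theta_k)$$

$$p(x \mid u_k, \Sigma_k) = \frac{1}{(2\pi)^{n/2} \, |\Sigma_k|^{1/2}} \exp\left( -\frac{1}{2} (x - u_k)^{T} \Sigma_k^{-1} (x - u_k) \right)$$

where u_k (also written μ_k) is the mean vector and Σ_k the covariance matrix of component k.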

2-dimensional Gaussian
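
The mixture density can be evaluated numerically as a weighted sum of N-dimensional Gaussians. The sketch below is only illustrative; the function names `gaussian_pdf` and `mixture_pdf` and the two-component parameters are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """N-dimensional Gaussian density evaluated at every row of x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mean.size
    diff = x - mean
    # Quadratic form (x - u)^T Sigma^-1 (x - u) for every row of x.
    quad = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def mixture_pdf(x, weights, means, covs):
    """p(x) = sum_k pi_k p(x | u_k, Sigma_k)."""
    return sum(w * gaussian_pdf(x, u, c) for w, u, c in zip(weights, means, covs))

# Hypothetical 2D mixture with two components.
weights = [0.3, 0.7]
means = [[0.0, 0.0], [10.0, 10.0]]
covs = [np.eye(2), np.eye(2)]
density = mixture_pdf([[0.0, 0.0]], weights, means, covs)
```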

• Gaussian Mixture Models

[Figure: mixture of two Gaussians, π_a N(μ_a, Σ_a) and π_b N(μ_b, Σ_b)]

• Gaussian Mixture Models

– Bayesian estimation

Example of prior probability maps (from atlas)

– Solved with EM algorithm (expectation-maximization)
• EM clustering

Bayes' theorem:

$$P(C_k \mid x_i) = \frac{P(x_i \mid C_k) \, P(C_k)}{P(x_i)}$$

• P(x_i | C_k): conditional probability (probability of pixel i, given cluster C_k)
• P(C_k): a priori probability (probability of cluster C_k)
• P(C_k | x_i): a posteriori probability (probability of cluster C_k, given pixel i)
• P(x_i): evidence (normalizer)
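
A short numerical illustration of Bayes' theorem for a single pixel. The likelihoods and priors are hypothetical values for three tissue classes; in practice the priors could come from an atlas and the likelihoods from the fitted distributions.

```python
import numpy as np

def posterior(likelihoods, priors):
    """P(C_k | x_i) for all clusters k, given P(x_i | C_k) and P(C_k)."""
    likelihoods = np.asarray(likelihoods, dtype=float)
    priors = np.asarray(priors, dtype=float)
    joint = likelihoods * priors      # P(x_i | C_k) P(C_k)
    return joint / joint.sum()        # divide by the evidence P(x_i)

# Hypothetical values for one pixel and three classes (CSF, GM, WM).
likelihoods = [0.02, 0.10, 0.04]
priors = [0.2, 0.5, 0.3]
post = posterior(likelihoods, priors)
```

The evidence is simply the sum of the numerators over all clusters, which is why the posteriors add up to 1.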

• Data must fit the model

– In this example, two or more Gaussian distributions
• Unknowns:
– Parameters of the probability distributions, θ_k = (u_k, Σ_k), and weights π_k
• Iterative algorithm:
– Maximum likelihood estimation with the EM algorithm

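E-step (p(x_i | C_k, u_k, Σ_k) is an N-dimensional Gaussian distribution):

$$p(C_k \mid x_i, \theta^{(t)}) = \frac{p(x_i \mid C_k, u_k^{(t)}, \Sigma_k^{(t)}) \, \pi_k^{(t)}}{\sum_{j=1}^{K} p(x_i \mid C_j, u_j^{(t)}, \Sigma_j^{(t)}) \, \pi_j^{(t)}}$$

M-step:

$$u_k^{(t+1)} = \frac{\sum_i p(C_k \mid x_i, \theta^{(t)}) \, x_i}{\sum_i p(C_k \mid x_i, \theta^{(t)})}$$

$$\Sigma_k^{(t+1)} = \frac{\sum_i p(C_k \mid x_i, \theta^{(t)}) \, (x_i - u_k^{(t+1)}) (x_i - u_k^{(t+1)})^{T}}{\sum_i p(C_k \mid x_i, \theta^{(t)})}$$

$$\pi_k^{(t+1)} = \frac{1}{N} \sum_i p(C_k \mid x_i, \theta^{(t)})$$

where θ^(t) denotes the current estimates of all parameters and N is the number of pixels.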
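
The E-step and M-step can be sketched as follows. This is a minimal illustration, not a production implementation: the initialization (user-supplied means, shared global covariance, equal weights), the fixed number of iterations, the small regularization `reg` and the synthetic data are all assumptions of this example.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    n = x.shape[1]
    diff = x - mean
    quad = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(cov))

def em_gmm(x, init_means, n_iter=50, reg=1e-6):
    """EM for a Gaussian mixture; returns (weights, means, covs, responsibilities)."""
    x = np.asarray(x, dtype=float)
    n_samples, n_dim = x.shape
    means = np.asarray(init_means, dtype=float).copy()
    k = len(means)
    # Assumed initialization: global covariance for every component, equal weights.
    global_cov = np.cov(x.T).reshape(n_dim, n_dim) + reg * np.eye(n_dim)
    covs = np.array([global_cov] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior p(C_k | x_i) from Bayes' theorem.
        joint = np.column_stack(
            [weights[j] * gaussian_pdf(x, means[j], covs[j]) for j in range(k)]
        )
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M-step: posterior-weighted means, covariances and mixture weights.
        nk = resp.sum(axis=0)
        means = (resp.T @ x) / nk[:, None]
        for j in range(k):
            diff = x - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(n_dim)
        weights = nk / n_samples
    return weights, means, covs, resp

# Synthetic data: two Gaussian groups of different size.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal([0.0, 0.0], 1.0, (100, 2)),
               rng.normal([8.0, 8.0], 1.0, (50, 2))])
weights, means, covs, resp = em_gmm(x, init_means=[[1.0, 1.0], [7.0, 7.0]])
```

Unlike k-means, each pixel keeps a probability for every cluster (`resp`), and clusters of different size and shape are handled through the weights and covariances.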

• EM algorithm to solve a mixture of Gaussians model

• Mixture of Gaussians in medical imaging

Mean shift

• Mean shift clustering

– Find local maxima of the probability density
– Non-parametric technique
– Does not constrain the shape of the clusters
– Does not require the number of clusters to be known in advance
1. Select a kernel/window K
2. Compute the kernel-weighted mean of the samples x_i in the window centered at u(n)
3. Let this mean, u(n+1), be the new center of the window K
4. Iterate until convergence

$$u^{(n+1)} = \frac{\sum_i x_i \, K(x_i - u^{(n)})}{\sum_i K(x_i - u^{(n)})}, \qquad K(x_i - u^{(n)}) = e^{-\left\| x_i - u^{(n)} \right\|}$$
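
The window update can be sketched as a loop that moves one starting point to a local maximum of the density. Note the assumptions: this example uses a Gaussian kernel with a bandwidth parameter, which is a common choice but not necessarily the kernel of the slide; the function name `mean_shift_mode` and the data are hypothetical.

```python
import numpy as np

def mean_shift_mode(x, start, bandwidth=1.0, tol=1e-6, max_iter=500):
    """Follow the mean-shift updates from `start` until the window stops moving."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Kernel weight of every sample relative to the current window center
        # (Gaussian kernel with the given bandwidth: an assumption of this sketch).
        k = np.exp(-((x - u) ** 2).sum(axis=1) / (2.0 * bandwidth ** 2))
        new_u = (k[:, None] * x).sum(axis=0) / k.sum()
        if np.linalg.norm(new_u - u) < tol:
            return new_u
        u = new_u
    return u

# Hypothetical 1D samples with two density maxima.
x = np.array([[0.8], [1.0], [1.2], [4.8], [5.0], [5.2]])
mode_low = mean_shift_mode(x, start=[0.5], bandwidth=0.5)
mode_high = mean_shift_mode(x, start=[5.5], bandwidth=0.5)
```

Running the update from every sample and grouping the samples that end at the same mode gives the clusters, without fixing their number beforehand.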

Fukunaga and Hostetler, "The Estimation of the Gradient of a Density Function, with Applications
in Pattern Recognition", IEEE Transactions on Information Theory, vol. 21 , pp 32-40 ,1975

Mean shift in medical imaging

Simulated MRI Segmentation

A. Mayer and H. Greenspan, "An adaptive mean-shift framework for MRI brain
segmentation," IEEE Trans. on Medical Imaging, vol. 28, (8), pp. 1238-50, 2009.

ISODATA algorithm

– Iterative Self-Organizing Data Analysis Technique
– Like the k-means algorithm, but it allows the number of clusters to change,
while k-means assumes that the number of clusters is known a priori
– 1) Perform k-means clustering
– 2) Split any cluster whose samples are sufficiently dissimilar
– 3) Merge any two clusters that are sufficiently close
– 4) Go to step 1
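
A simplified sketch of the split-and-merge loop. The concrete criteria are assumptions of this example, since the slides do not specify them: a cluster is split when its largest per-feature standard deviation exceeds `split_std`, two centers are merged when they are closer than `merge_dist`, and a fixed number of rounds is run. The function names and data are hypothetical.

```python
import numpy as np

def kmeans(x, means, n_iter=20):
    for _ in range(n_iter):
        labels = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        means = np.array([x[labels == k].mean(axis=0) if np.any(labels == k) else means[k]
                          for k in range(len(means))])
    return labels, means

def isodata(x, init_means, split_std=2.0, merge_dist=1.0, n_rounds=5):
    x = np.asarray(x, dtype=float)
    means = np.asarray(init_means, dtype=float)
    for _ in range(n_rounds):
        # 1) k-means with the current number of clusters.
        labels, means = kmeans(x, means)
        # 2) Split clusters whose samples are too spread out (assumed criterion).
        split = []
        for k, mean in enumerate(means):
            members = x[labels == k]
            if len(members) == 0:
                continue  # drop empty clusters
            std = members.std(axis=0)
            if len(members) > 1 and std.max() > split_std:
                offset = np.zeros_like(mean)
                offset[std.argmax()] = std.max()
                split += [mean - offset, mean + offset]
            else:
                split.append(mean)
        # 3) Merge centers that are too close (assumed criterion).
        merged = []
        for mean in split:
            for j, other in enumerate(merged):
                if np.linalg.norm(mean - other) < merge_dist:
                    merged[j] = 0.5 * (mean + other)
                    break
            else:
                merged.append(mean)
        means = np.array(merged)  # 4) back to step 1
    return kmeans(x, means)

# Three hypothetical groups, starting from a single cluster.
x = np.array([[c + d, 0.0] for c in (0.0, 10.0, 30.0) for d in (-0.5, 0.0, 0.5)])
labels, means = isodata(x, init_means=[[10.0, 0.0]])
```

Starting from one cluster, the splits raise the number of clusters until every cluster is compact enough under the chosen threshold.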

Spectral Clustering

• Spectral clustering
– The goal is to cluster data that is connected but not necessarily
compact or clustered within convex boundaries
– Uses the spectrum (eigenvalues) of the affinity matrix to perform
dimensionality reduction before clustering in fewer dimensions
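
A minimal two-cluster sketch of the idea. Several choices here are assumptions of the example rather than the method of the slides: a Gaussian affinity with scale `sigma`, the unnormalized graph Laplacian built from the affinity matrix, and a sign threshold on a single eigenvector instead of running k-means in the reduced space. The function name and the ring data are hypothetical.

```python
import numpy as np

def spectral_bipartition(x, sigma=0.5):
    """Split the samples into two groups using one eigenvector of the graph Laplacian."""
    x = np.asarray(x, dtype=float)
    # Affinity matrix from a Gaussian kernel on the pairwise distances.
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    affinity = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    # Unnormalized graph Laplacian L = D - A.
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    # Eigenvector of the second-smallest eigenvalue: a 1D embedding of the data.
    embedding = eigvecs[:, 1]
    return (embedding > 0).astype(int)

# Two concentric rings: connected but not compact clusters.
angles = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
inner = np.column_stack([np.cos(angles), np.sin(angles)])
outer = 3.0 * inner
labels = spectral_bipartition(np.vstack([inner, outer]))
```

K-means applied directly to these coordinates cannot separate the rings, because both share the same center; in the one-dimensional embedding they become two well-separated groups.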

[Figure: compact clusters (k-means, GMM, ...) versus clusters defined by connectivity]

