Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 19

DISTANCE BASED MODELS

Topics
• Meaning of distance based model

• Distance Computation methods

• Concepts of different clustering algorithms

• Application of clustering algorithms to solve


problems
ALGEBRIC MODEL
• In Algebric models, features could be described as
points in two dimensions (x- and y-axis) or a
three- dimensional space (x, y, and z).

• when features are not intrinsically geometric, they


could be modelled in a geometric manner

• Example: temperature as a function of time can


be modelled in two axes.
Cont…
• In algebraic models, there are two ways we could
impose similarity.
1. Linear models: Use geometric concepts like lines or
planes to segment (classify) the instance space.

2. Distance-based models: Use the geometric notion of


distance to represent similarity. In this case, if two
points are close together, they have similar values
for features and thus can be classed as similar.
Distance Calculation Methods
• Euclidean distance

• Manhattan distance:
NEIGHBOURS AND EXEMPLARS
• In the Distance based models, distance is applied
through the concept of neighbours and exemplars.
• Neighbours are points in proximity with respect to the
distance measure expressed through exemplars.
• Exemplars are either centroids that find a centre of
mass according to a chosen distance metric or medoids
that find the most centrally located data point.
• The most commonly used centroid is the arithmetic
mean, which minimises squared Euclidean distance to
all other points.
Cont…
• The centroid represents the geometric centre of a
plane figure, i.e., the arithmetic mean position of all the
points in the figure from the centroid point.
• This definition extends to any object in n-dimensional
space: its centroid is the mean position of all the points.
• Medoids are similar in concept to means or centroids.
Medoids are most commonly used on data when a
mean or centroid cannot be defined. They are used in
contexts where the centroid is not representative of the
dataset, such as in image data.
NEAREST NEIGHBOURS
CLASSIFICATION
K-Nearest Neighbour
• K-NN algorithm assumes the similarity between the new
case/data and available cases and put the new case into
the category that is most similar to the available categories.
• K-NN algorithm stores all the available data and classifies a
new data point based on the similarity.
• K-nearest neighbors (KNN) algorithm is a type of
supervised ML algorithm which can be used for both
classification as well as regression predictive problems.
• it is mainly used for classification predictive problems in
industry.
Cont…
• The following two properties would define KNN well

1. Lazy learning algorithm: KNN is a lazy learning
algorithm because it does not have a specialized
training phase and uses all the data for training
while classification.
2. Non-parametric learning algorithm: KNN is also a
non-parametric learning algorithm because it
doesn’t assume anything about the underlying data.
Cont…
Working of KNN Algorithm:
Exapmle
K- MEANS CLUSTERING
• unsupervised learning algorithms.
• Given data set is classified assuming some prior
number of clusters
• In k- means clustering for each cluster one
centroid is defined.
• Total there are k centroids.
• The centroids should be defined in a tricky way
because result differs based on the location of
centroids
Cont…
• To get the better results we need to place the
centroids far away from each other as much as
possible
• each point from the given data set is stored in a
group with closest centroid.
• This process is repeated for all the points.
• The first step is finished when all points are
grouped.
• In the next step new k centroids are calculated
Cont…
• After finding these new k centroids, a new
grouping is done for the data points and closest
new centroids.
• This process is done iteratively.
• The process is repeated unless and until no data
point moves from one group to another
• The aim of this algorithm is to minimize an
objective function such as sum of a squared error
function.
• The objective function is defined as follows:

• Here shows the selected distance


measure between a data point x ji and the
cluster centre Cj.
• It is a representation of the distance of the n
data points from their respective cluster centers
Working of K-means Algorithm:
Examples ofK-means Algorithm:

You might also like