Distance Based Models
Topics
• Meaning of distance-based models
• Manhattan distance
NEIGHBOURS AND EXEMPLARS
• In distance-based models, distance is applied
through the concepts of neighbours and exemplars.
• Neighbours are points that lie close to an exemplar
under the chosen distance measure.
• Exemplars are either centroids that find a centre of
mass according to a chosen distance metric or medoids
that find the most centrally located data point.
• The most commonly used centroid is the arithmetic
mean, which minimises the sum of squared Euclidean
distances to all the points.
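A short derivation may help show why the arithmetic mean plays this role. The symbols are chosen here for illustration and are not from the slides: x_1, ..., x_n are the data points and c is a candidate centre.

```latex
\begin{aligned}
f(c) &= \sum_{i=1}^{n} \lVert x_i - c \rVert^2 \\
\nabla_c f(c) &= -2 \sum_{i=1}^{n} (x_i - c) = 0 \\
\Rightarrow \quad c &= \frac{1}{n} \sum_{i=1}^{n} x_i
\end{aligned}
```

Because f is convex in c, this stationary point is the minimiser, so the centre that minimises the sum of squared Euclidean distances is the mean.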
Cont…
• The centroid represents the geometric centre of a
plane figure, i.e., the arithmetic mean position of all the
points in the figure.
• This definition extends to any object in n-dimensional
space: its centroid is the mean position of all the points.
• Medoids are similar in concept to means or centroids,
but a medoid is always one of the data points.
• Medoids are most commonly used when a mean or
centroid cannot be defined, or where the centroid is not
representative of the dataset, such as in image data.
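The difference between the two kinds of exemplar can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean distance and points given as tuples; the function names and the sample points are hypothetical, not from the slides.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    """Arithmetic mean position; need not coincide with any data point."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def medoid(points, dist=euclidean):
    """Data point with the smallest total distance to all the points."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))

# Illustrative data: four points close together and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (14.0, 0.0)]
c = centroid(points)  # pulled towards the distant point
m = medoid(points)    # stays on an actual data point
```

With these sample points the centroid is (4.0, 0.0), which is not in the data set, while the medoid is the data point (2.0, 0.0).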
NEAREST NEIGHBOURS CLASSIFICATION
K-Nearest Neighbour
• The K-NN algorithm assumes that similar cases belong to
the same category: it compares a new case with the available
cases and puts it into the category it is most similar to.
• The K-NN algorithm stores all the available data and
classifies a new data point based on its similarity to the
stored points.
• The K-nearest neighbours (KNN) algorithm is a
supervised ML algorithm that can be used for both
classification and regression problems.
• In industry it is mainly used for classification
problems.
Cont…
• The following two properties characterise KNN well:
1. Lazy learning algorithm: KNN is a lazy learning
algorithm because it has no specialised training
phase; it stores all the training data and uses it
at classification time.
2. Non-parametric learning algorithm: KNN is also a
non-parametric learning algorithm because it does not
assume anything about the underlying data distribution.
Cont…
Working of KNN Algorithm:
Example
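A minimal sketch of how KNN classification is commonly carried out: compute the distance from the new point to every stored point, keep the k closest, and take a majority vote over their labels. Euclidean distance, the value k = 3, the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(train, query, k=3, dist=euclidean):
    """Classify `query` by majority vote among its k nearest neighbours.

    `train` is a list of (point, label) pairs. There is no training step:
    the stored data is used directly at classification time (lazy learning).
    """
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Illustrative data: two well-separated categories, "A" and "B".
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]
label_near_a = knn_classify(train, (2.0, 2.0), k=3)
label_near_b = knn_classify(train, (6.0, 7.0), k=3)
```

An odd k with two categories avoids tied votes; with more categories or an even k, a tie-breaking rule would have to be chosen.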
K-MEANS CLUSTERING
• K-means is an unsupervised learning algorithm.
• The given data set is partitioned into a number of
clusters, k, that is fixed in advance.
• In k-means clustering, one centroid is defined for
each cluster.
• In total there are k centroids.
• The initial centroids should be chosen carefully,
because the result differs based on the location of
the centroids.
Cont…
• To get better results, the centroids should be
placed as far away from each other as
possible.
• Each point in the given data set is assigned to the
group with the closest centroid.
• This process is repeated for all the points.
• The first step is finished when all points are
grouped.
• In the next step, k new centroids are calculated
as the means of the groups formed in the first step.
Cont…
• After finding these k new centroids, the data
points are regrouped, each with its closest
new centroid.
• This process is done iteratively.
• The process is repeated until no data
point moves from one group to another.
• The aim of this algorithm is to minimise an
objective function, such as a sum-of-squared-error
function.
• The objective function is defined as follows:
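One standard form of the sum-of-squared-error objective, given here as an assumption about what the slide intends, is

J = Σ_{j=1..k} Σ_{x ∈ C_j} ‖x − c_j‖²

where C_j is the set of points assigned to cluster j and c_j is its centroid. The steps above can be sketched in Python as follows. The function names, the choice to pass the initial centroids explicitly, the handling of empty clusters, and the sample data are illustrative assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Arithmetic mean position of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, init_centroids, max_iter=100):
    """Alternate assignment and centroid update until no point changes group."""
    centroids = list(init_centroids)
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign every point to the group with the closest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point moved, so the grouping is stable
        assignment = new_assignment
        # Step 2: recompute each centroid as the mean of its group.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty group keeps its old centroid
                centroids[j] = mean(members)
    # Objective J: sum of squared distances to the assigned centroids.
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))
    return centroids, assignment, sse

# Illustrative data: two compact groups, with k = 2 initial centroids
# placed far apart (one in each group).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignment, sse = kmeans(points, [(1.0, 1.0), (9.0, 8.0)])
```

On this sample data the grouping stabilises after the first update, with the first three points in one cluster and the last three in the other. Different initial centroids can lead to a different final grouping, which is why their placement matters.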