
Different distances used in K-NN algorithm

Below are the commonly used distance metrics:

Minkowski Distance:

When we think about distance, we usually imagine distances between cities. That is the most
intuitive understanding of the distance concept. Fortunately, this example is perfect for
explaining the constraints of Minkowski distances. We can calculate Minkowski distance only
in a normed vector space, which is a fancy way of saying: “in a space where distances can be
represented as a vector that has a length.”

Let’s start by checking that a map behaves like a vector space. On a map, we can draw a vector
that connects two cities, and we can combine multiple vectors to create a route that connects
more than two cities. Now, the adjective “normed.” It means that every vector has a length and
no vector has a negative length. That constraint is met too, because if we draw a line between
two cities on the map, we can measure its length.

Formally, a normed vector space is a vector space on which a norm is defined. Suppose X is a
vector space; then a norm on X is a real-valued function ||x|| that satisfies the conditions below:

1. Zero Vector- The zero vector has zero length, and every other vector has a positive
length. On a map, this is obvious. The distance from a city to the same city is zero because
we don’t need to travel at all. The distance from a city to any other city is positive because
we can’t travel a negative distance such as -20.

2. Scalar Factor- When you multiply a vector by a positive number, its direction doesn’t
change, but its length is multiplied by that number: ||ax|| = |a| ||x||. Example: We traveled
50 km North. If we travel 50 km more in the same direction, we will end up 100 km North. The
direction does not change.

3. Triangle Inequality- The direct vector between two points is never longer than a route
through a third point: ||x + y|| ≤ ||x|| + ||y||. On a map, the straight line between two
cities is never longer than a detour through another city.

You might be wondering why we need a normed vector space. Can we not just use a simple
metric? A norm has the above properties, which make the metric induced by the norm
homogeneous and translation invariant.
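
The three conditions can be spot-checked numerically. The sketch below is only an illustration on sample vectors, not a proof; the helper names `p_norm` and `check_norm_properties` are made up for this example, and the p-norm is used as one assumed choice of norm.

```python
import math

def p_norm(v, p):
    """Length of vector v under the p-norm (assumes p >= 1)."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def check_norm_properties(x, y, a, p):
    """Return one boolean per norm property for the given sample inputs."""
    zero = [0.0] * len(x)
    # 1. Zero vector: zero length, and no vector has negative length.
    zero_vector = p_norm(zero, p) == 0.0 and p_norm(x, p) >= 0.0
    # 2. Scalar factor: scaling by a scales the length by |a|.
    scaled = [a * c for c in x]
    scalar_factor = math.isclose(p_norm(scaled, p), abs(a) * p_norm(x, p))
    # 3. Triangle inequality: x + y is never longer than x followed by y.
    total = [cx + cy for cx, cy in zip(x, y)]
    triangle = p_norm(total, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
    return zero_vector, scalar_factor, triangle

print(check_norm_properties([3.0, 4.0], [1.0, -2.0], 2.0, p=2))
```

A small tolerance is added in the triangle check only to absorb floating-point rounding.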

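The distance between two points x = (x1, x2, …, xn) and y = (y1, y2, …, yn) can be calculated
using the formula below, where p ≥ 1:

$$ d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} $$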

Minkowski distance is the generalized distance metric. Here generalized means that we can
manipulate the above formula to calculate the distance between two data points in different
ways.

As mentioned above, we can manipulate the value of p and calculate the distance in three
different ways:

p = 1, Manhattan Distance

p = 2, Euclidean Distance

p = ∞, Chebyshev Distance (won’t be discussed in this article)
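
As a rough illustration of how one formula covers all three cases, here is a minimal Python sketch. The function name `minkowski_distance` and the sample points are assumptions made for this example; the p = ∞ case is handled as the limit of the formula, which is the largest coordinate difference.

```python
import math

def minkowski_distance(x, y, p):
    """Minkowski distance between two equal-length sequences (p >= 1)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Limit p -> infinity: the largest coordinate difference (Chebyshev).
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

x, y = (0, 0), (3, 4)
print(minkowski_distance(x, y, 1))         # Manhattan: 3 + 4
print(minkowski_distance(x, y, 2))         # Euclidean: sqrt(9 + 16)
print(minkowski_distance(x, y, math.inf))  # Chebyshev: max(3, 4)
```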

Manhattan Distance:

We use Manhattan Distance if we need to calculate the distance between two data points in a
grid-like path. As mentioned above, we use the Minkowski distance formula to find
Manhattan distance by setting p’s value as 1.

Let’s say we want to calculate the distance, d, between two data points x and y.
Distance d is calculated as the sum of the absolute differences between their Cartesian
coordinates:

$$ d = \sum_{i=1}^{n} |x_i - y_i| $$

where n is the number of variables, and xi and yi are the components of the vectors x and y
respectively in the n-dimensional vector space, i.e. x = (x1, x2, x3…) and y = (y1, y2, y3…).

Written out, the distance d is:

|x1 - y1| + |x2 - y2| + |x3 - y3| + … + |xn - yn|.

If you try to visualize the distance calculation, it looks like a path that follows only the
horizontal and vertical lines of a grid.

Manhattan distance is also known as Taxicab Geometry, City Block Distance, etc.
When we use a map of a city, we can give directions by telling people that they should
walk/drive two city blocks North, then turn left and travel another three city blocks. In total,
they will travel five city blocks, which is the Manhattan distance between the starting point
and their destination.
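
The city block example can be reproduced in a few lines of Python. This is a sketch under an assumed coordinate system: the start is placed at (0, 0), North is the positive second coordinate, and turning left from North is taken to mean heading West.

```python
def manhattan_distance(x, y):
    """Sum of absolute coordinate differences between x and y."""
    return sum(abs(a - b) for a, b in zip(x, y))

start = (0, 0)
destination = (-3, 2)  # assumed grid: 2 blocks North, then 3 blocks West
print(manhattan_distance(start, destination))  # 5
```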

L1 Norm:
Also known as Manhattan Distance or Taxicab norm. The L1 norm is the sum of
the magnitudes of the components of a vector. It is the most natural way of
measuring the distance between vectors, that is, the sum of the absolute
differences of their components. In this norm, all the components of the
vector are weighted equally.

For example, take the vector X = [3,4]. Its L1 norm is calculated by

$$ \|X\|_1 = |3| + |4| = 7 $$

The L1 norm is the distance you have to travel from the origin (0,0) to the
destination (3,4), in a way that resembles how a taxicab drives between city
blocks to arrive at its destination.
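
A short sketch of the same calculation in Python follows. The helper name `l1_norm` and the second pair of vectors are illustrative assumptions; the last lines show that the Manhattan distance between two points is the L1 norm of their difference.

```python
def l1_norm(v):
    """Sum of the absolute values of the components of v."""
    return sum(abs(c) for c in v)

X = [3, 4]
print(l1_norm(X))  # 7

# The Manhattan distance between a and b is the L1 norm of a - b.
a, b = [5, 1], [2, 5]
print(l1_norm([ai - bi for ai, bi in zip(a, b)]))  # |3| + |-4| = 7
```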

Euclidean Distance:

Euclidean distance is one of the most used distance metrics. It is calculated using
the Minkowski Distance formula by setting p’s value to 2. This updates the
distance d formula as below:

$$ d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} $$

The Euclidean distance formula gives the straight-line distance between two data points.
If we look again at the city block example used to explain the Manhattan distance, we see that
the traveled path consists of two straight lines. When we draw another straight line that
connects the starting point and the destination, we end up with a triangle. In this case, the
distance between the points can be calculated using the Pythagorean theorem.
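
To make the triangle concrete, the sketch below reuses the city block trip with assumed coordinates: start at (0, 0), two blocks North and three blocks West, so the destination is (-3, 2). The function name `euclidean_distance` is chosen for this example.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

start, destination = (0, 0), (-3, 2)  # assumed grid coordinates
direct = euclidean_distance(start, destination)
print(direct)           # sqrt(2**2 + 3**2) = sqrt(13), about 3.606
print(direct <= 2 + 3)  # the direct line is not longer than the block route
```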

L2 norm:

It is the most popular norm, also known as the Euclidean norm. It is the shortest distance to go
from one point to another.
Using the same example, X = [3,4], the L2 norm is calculated by

$$ \|X\|_2 = \sqrt{3^2 + 4^2} = 5 $$

The L2 norm is the most direct route: the straight line from the origin (0,0) to (3,4).

There is one consideration to take with the L2 norm, and it is that each component of the
vector is squared, and that means that the outliers have more weighting, so it can skew results.
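
Both the L2 norm of the example vector [3, 4] and the effect of squaring on outliers can be checked with a small sketch. The vector `v` with one large component is a made-up example, and "share" here simply means the fraction of the total sum contributed by that component.

```python
import math

def l2_norm(v):
    """Square root of the sum of squared components of v."""
    return math.sqrt(sum(c * c for c in v))

print(l2_norm([3, 4]))  # 5.0

# Hypothetical vector with one outlying component (the last one).
v = [1, 1, 1, 10]
share_l1 = abs(v[-1]) / sum(abs(c) for c in v)  # 10 / 13
share_l2 = v[-1] ** 2 / sum(c * c for c in v)   # 100 / 103
print(round(share_l1, 2), round(share_l2, 2))   # 0.77 0.97
```

The outlier accounts for a larger fraction of the squared sum than of the absolute sum, which is the skew the text warns about.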

Hamming Distance:

The Hamming distance in information technology is the number of positions at which two
corresponding pieces of data of equal length differ. It is often used in various kinds of error
correction or in the evaluation of contrasting strings or pieces of data.

While it may seem complicated and obscure at first glance, the Hamming distance is a very
practical metric for comparing data strings. Computing it involves going through the
corresponding digits or places and counting how many are different. For example,
take the text string “hello world” and contrast it with another text string, “herra poald.” There
are five places along the corresponding strings where the letters are different.
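
The count for this pair of strings can be verified with a minimal sketch. The function name `hamming_distance` is chosen for this example; it compares the strings position by position and assumes they have equal length.

```python
def hamming_distance(a, b):
    """Number of positions at which equal-length sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(ca != cb for ca, cb in zip(a, b))

print(hamming_distance("hello world", "herra poald"))  # 5
```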

Why is this important? One fundamental application of Hamming distance is to correct binary
code toward one result or another. Professionals talk about one-bit errors or two-bit
errors, the idea being that corrupted data can be transformed back into the correct original
result. The problem is that, if there are two valid strings and one corrupted piece of data, one
must ascertain which of the two valid strings the corrupted, third string is closest to. That is
where the Hamming distance comes in. For example, if the Hamming distance between the two
valid strings is four, and the corrupted string is a one-bit error away from one of them, it is
most likely that that string is the correct result. This is just one of the applications that
the Hamming distance can have in code and data string evaluation.
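
The idea of correcting toward the closest valid string can be sketched as follows. The two codewords and the received string are hypothetical values picked so that the valid strings are four bits apart and the corrupted one is a one-bit error; `nearest_codeword` is an illustrative helper, not a standard library function.

```python
def hamming_distance(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(ca != cb for ca, cb in zip(a, b))

def nearest_codeword(received, codewords):
    """Pick the valid string closest to the received one.

    On a tie, min() returns the first of the tied candidates.
    """
    return min(codewords, key=lambda c: hamming_distance(received, c))

codewords = ["0000", "1111"]                # hypothetical valid strings
print(hamming_distance(*codewords))         # 4
print(nearest_codeword("0100", codewords))  # 0000
```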


Suppose there are two strings, 1101 1001 and 1001 1101.

11011001 ⊕ 10011101 = 01000100, where ⊕ is the bitwise XOR. Since this contains two 1s, the
Hamming distance is d(11011001, 10011101) = 2.
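
For binary data, the same XOR-and-count procedure can be written directly on integers. This is a small sketch; the function name `hamming_distance_bits` is made up for this example.

```python
def hamming_distance_bits(a, b):
    """Hamming distance of two non-negative integers: count the 1s in a XOR b."""
    return bin(a ^ b).count("1")

a = int("11011001", 2)
b = int("10011101", 2)
print(format(a ^ b, "08b"))         # 01000100
print(hamming_distance_bits(a, b))  # 2
```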

Minimum Hamming Distance

In a set of strings of equal lengths, the minimum Hamming distance is the smallest Hamming
distance between all possible pairs of strings in that set.
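
A direct way to compute this is to compare every pair. The sketch below does exactly that; the set `codes` is a hypothetical example and `min_hamming_distance` is a name chosen here for illustration.

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(ca != cb for ca, cb in zip(a, b))

def min_hamming_distance(strings):
    """Smallest Hamming distance over all pairs (needs at least two strings)."""
    return min(hamming_distance(a, b) for a, b in combinations(strings, 2))

codes = ["000", "011", "101", "110"]  # hypothetical set of equal-length strings
print(min_hamming_distance(codes))    # 2
```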

Cosine Similarity and Cosine Distance:

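There are two terms: Similarity and Distance. They move in opposite directions,
i.e. if one increases the other one decreases and vice-versa. The formula that relates them is:

$$ \text{cosine distance} = 1 - \text{cosine similarity} $$

The cosine similarity formula can be derived from the equation of dot products:

$$ a \cdot b = \|a\| \, \|b\| \cos\theta \quad\Rightarrow\quad \cos\theta = \frac{a \cdot b}{\|a\| \, \|b\|} $$

Now, you must be wondering which values of the cosine will help find out the similarities.
The cosine of an angle ranges from -1 to 1.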

Now that we have the values which will be considered to measure the similarities, we need to
know what 1, 0 and -1 signify.

Here a cosine value of 1 is for vectors pointing in the same direction, i.e. there are
similarities between the documents/data points. A value of 0 is for orthogonal vectors, i.e. the
documents/data points are unrelated. A value of -1 is for vectors pointing in opposite
directions (no similarity).

To find the cosine distance, we simply have to put the values in the formula and compute.
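
A minimal sketch of that computation follows. It assumes the common definition cosine distance = 1 - cosine similarity, with the similarity taken as the dot product divided by the product of the vector lengths; the function names and sample vectors are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (undefined for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Assumed definition: 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(a, b)

print(cosine_similarity([1, 2], [2, 4]))    # same direction, close to 1
print(cosine_similarity([1, 0], [0, 1]))    # orthogonal, 0.0
print(cosine_similarity([1, 2], [-1, -2]))  # opposite direction, close to -1
print(cosine_distance([1, 0], [0, 1]))      # 1.0
```

Note that with this definition the distance ranges from 0 (same direction) to 2 (opposite directions).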
