
Nearest Neighbor Classifiers

 Basic idea:
 If it walks like a duck, quacks like a duck, then it’s probably a duck


[Diagram: compute the distance from the test record to the training records, then choose the k "nearest" records]
01/31/2024 1
Nearest Neighbor Classifiers

What is the K-Nearest Neighbor (KNN) algorithm?

The K-Nearest Neighbor (KNN) algorithm is a distance-based supervised
learning algorithm that is used for solving classification problems.
It looks at the classes of the k nearest neighbors of a new point and
assigns the point to the class to which the majority of those k
neighbors belong.

To identify the nearest neighbors we use various techniques for
measuring distance, the most common of them being the
‘Euclidean distance’.

01/31/2024 2
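As a concrete illustration (not from the original slides), here is a minimal scikit-learn sketch of this idea; the points and labels are made-up toy values, and scikit-learn is assumed to be available.

from sklearn.neighbors import KNeighborsClassifier

# Made-up 2-D training points and their class labels (illustrative only).
X = [[1.0, 1.0], [1.2, 0.8], [3.0, 3.5], [3.2, 3.0]]
y = ["blue", "blue", "green", "green"]

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3, Euclidean distance by default
knn.fit(X, y)

# The new point is assigned the majority class of its 3 nearest neighbors.
print(knn.predict([[2.9, 3.1]]))            # -> ['green']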
Nearest Neighbor Classifiers

When should you use KNN?


KNN can be used for both classification and regression predictive
problems. However, it is more widely used for classification problems
in industry.

Can I use KNN for both classification and regression
problems?
Yes! We just answered that question above, but let’s expand a bit
more on it. To evaluate any technique we generally look at 3
important aspects:
• Ease of interpreting the output
• Calculation time
• Predictive power

01/31/2024 3
Nearest neighbor method
 Majority vote within the k nearest neighbors

[Illustration: classifying a new point.
K = 1: blue
K = 3: green]

01/31/2024 5
Nearest-Neighbor Classifiers
 Requires three things
– The set of stored records
– A distance metric to compute the distance between records
– The value of k, the number of nearest neighbors to retrieve

 To classify an unknown record:
– Compute its distance to the training records
– Identify the k nearest neighbors
– Use the class labels of the nearest neighbors to determine the
class label of the unknown record (e.g., by taking a majority vote)

01/31/2024 6
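A small from-scratch sketch of these three steps (plain Python, illustrative only; the helper names are my own, not from the slides):

import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(train_records, train_labels, unknown, k=3):
    # 1. Compute the distance from the unknown record to every training record.
    distances = [(euclidean(unknown, rec), label)
                 for rec, label in zip(train_records, train_labels)]
    # 2. Identify the k nearest neighbors.
    neighbors = sorted(distances, key=lambda d: d[0])[:k]
    # 3. Majority vote over the neighbors' class labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]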
Definition of Nearest Neighbor

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a record x]

The k-nearest neighbors of a record x are the data points
that have the k smallest distances to x.

01/31/2024 7
Nearest Neighbor Classification
 Compute distance between two points:
 Euclidean distance

d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}

 Determine the class from the nearest-neighbor list
 take the majority vote of class labels among the k nearest neighbors
 Weigh the vote according to distance
 weight factor: w = 1/d²

01/31/2024 8
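A hedged sketch of the distance-weighted variant (w = 1/d²); the function and variable names are illustrative, and eps is added only to avoid dividing by zero when a neighbor coincides with the query point.

import math
from collections import defaultdict

def knn_classify_weighted(train_records, train_labels, unknown, k=3, eps=1e-9):
    # Sort training records by Euclidean distance to the unknown record.
    nearest = sorted(
        (math.dist(unknown, rec), label)
        for rec, label in zip(train_records, train_labels)
    )[:k]
    # Each neighbor votes with weight w = 1/d^2 (eps guards against d = 0).
    votes = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1.0 / (d * d + eps)
    return max(votes, key=votes.get)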
Nearest Neighbor Classification
Euclidean Distance: Euclidean distance is calculated as the square root of the
sum of the squared differences between a new point (x) and an existing point (y).

Manhattan Distance: This is the distance between real vectors, calculated as the sum of
their absolute differences.

Hamming Distance: It is used for categorical variables. If the value (x) and the
value (y) are the same, the distance D is equal to 0; otherwise D = 1.

01/31/2024 9
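The three measures above as short Python functions (a sketch; the example values are my own):

import math

def euclidean_distance(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def manhattan_distance(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def hamming_distance(x, y):
    # Categorical attributes: 0 where the values match, 1 where they differ.
    return sum(0 if xi == yi else 1 for xi, yi in zip(x, y))

print(euclidean_distance([1.0, 1.0], [5.0, 7.0]))    # 7.21...
print(manhattan_distance([1.0, 1.0], [5.0, 7.0]))    # 10.0
print(hamming_distance(["red", "S"], ["red", "M"]))  # 1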
Nearest Neighbor Classification…
 Choosing the value of k:
 If k is too small, sensitive to noise points
 If k is too large, neighborhood may include points from
other classes

01/31/2024 10
Nearest Neighbor Classification…
 Scaling issues
 Attributes may have to be scaled to prevent distance measures
from being dominated by one of the attributes
 Example:
 height of a person may vary from 1.5 m to 1.8 m
 weight of a person may vary from 90 lb to 300 lb
 income of a person may vary from $10K to $1M

01/31/2024 11
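One common remedy is to rescale every attribute to a comparable range before computing distances. A minimal min-max scaling sketch (the sample rows only echo the ranges above and are otherwise made up):

def min_max_scale(rows):
    # Scale each column to [0, 1] so no single attribute dominates the distance.
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, mins, maxs)]
            for row in rows]

# Columns: height (m), weight (lb), income ($).
people = [[1.5, 90, 10_000], [1.8, 300, 1_000_000], [1.7, 160, 50_000]]
print(min_max_scale(people))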
Nearest Neighbor Classification…
 Problem with Euclidean measure:
 High dimensional data

 curse of dimensionality
 Can produce counter-intuitive results

111111111110 vs 011111111111   →   d = 1.4142
100000000000 vs 000000000001   →   d = 1.4142

 Solution: Normalize the vectors to unit length

01/31/2024 12
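A quick check of this remedy (a sketch using the four vectors above): before normalization both pairs are the same distance apart, but after normalizing to unit length the "mostly ones" pair becomes much closer than the "mostly zeros" pair, which matches intuition.

import math

def unit(v):
    # Divide each component by the vector's Euclidean length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v

a1 = [int(c) for c in "111111111110"]
a2 = [int(c) for c in "011111111111"]
b1 = [int(c) for c in "100000000000"]
b2 = [int(c) for c in "000000000001"]

print(math.dist(a1, a2), math.dist(b1, b2))                          # 1.414... and 1.414...
print(math.dist(unit(a1), unit(a2)), math.dist(unit(b1), unit(b2)))  # ~0.426 and ~1.414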
Nearest Neighbor Classification…
 k-NN classifiers are lazy learners
 They do not build models explicitly
 Unlike eager learners such as decision tree induction and
rule-based systems
 Classifying unknown records is relatively expensive

01/31/2024 13
Nearest Neighbor Regression…
Let us start with a simple example. Consider the following table – it
consists of the height, age and weight (target) value for 10 people. As
you can see, the weight value of ID11 is missing. We need to predict the
weight of this person based on their height and age.

01/31/2024 14
A SIMPLE EXAMPLE TO UNDERSTAND THE USE OF
KNN FOR REGRESSION

01/31/2024 15
A SIMPLE EXAMPLE TO UNDERSTAND THE USE OF
KNN FOR REGRESSION

In the above graph, the y-axis represents the height of a person
(in feet) and the x-axis represents the age (in years). The points
are numbered according to their ID values. The yellow point (ID
11) is our test point.

If I ask you to identify the weight of ID11 based on the plot,
what would be your answer? You would likely say that since ID11
is closer to points 5 and 1, it must have a weight similar to
those IDs, probably between 72 and 77 kg (the weights of ID1 and ID5
from the table). That actually makes sense, but how does the
algorithm predict these values? Let’s find out.

01/31/2024 16
HOW DOES THE KNN ALGORITHM WORK

As we saw above, the KNN algorithm can be used for both
classification and regression problems. The KNN algorithm uses
‘feature similarity’ to predict the values of any new data points.
This means that the new point is assigned a value based on how
closely it resembles the points in the training set. From our
example, we know that ID11 has a height and age similar to ID1
and ID5, so its weight would also be approximately the same.

Had it been a classification problem, we would have taken the
mode as the final prediction. In this case, we have two values of
weight – 72 and 77. Any guesses on how the final value will be
calculated? The average of the values is taken to be the final
prediction.

01/31/2024 17
HOW DOES THE KNN ALGORITHM WORK

First, the distance between the new point and each training point
is calculated.

01/31/2024 18
HOW DOES THE KNN ALGORITHM WORK

The k closest data points are selected (based on the distance). In
this example, points 1, 5 and 6 will be selected if the value of k is 3.

01/31/2024 19
HOW DOES THE KNN ALGORITHM WORK

The k closest data points are selected (based on the
distance). In this example, points 1, 5 and 6 will be selected if
the value of k is 3.

The average of these data points is the final prediction for
the new point. Here, the weight of ID11 =
(77 + 72 + 60)/3 = 69.66 kg.

01/31/2024 20
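In code, the prediction step is just an average over the k nearest targets. A small sketch: only the three neighbor weights (77, 72, 60 kg) come from the slide; the height/age features below are placeholder values, since the full table was shown as an image.

import math

train_X = {1: (5.0, 45), 5: (5.1, 26), 6: (5.5, 30)}   # id -> (height, age); placeholder features
train_y = {1: 77, 5: 72, 6: 60}                         # weights in kg (from the slide)
new_point = (5.1, 38)                                   # ID 11; placeholder features

k = 3
nearest = sorted(train_X, key=lambda i: math.dist(train_X[i], new_point))[:k]
prediction = sum(train_y[i] for i in nearest) / k
print(round(prediction, 2))    # (77 + 72 + 60) / 3 = 69.67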
HOW DOES THE KNN ALGORITHM WORK

The next problem is how to select the value of k. This determines
the number of neighbors we look at when we assign a value to
any new observation.

01/31/2024 21
HOW DOES THE KNN ALGORITHM WORK

For a very low value of k (say k = 1), the model overfits the training data,
which leads to a high error rate on the validation set. On the other hand, for a high
value of k, the model performs poorly on both the training and validation sets. If you
observe closely, the validation error curve reaches a minimum at a value of k = 9.
This value of k is the optimum value for the model (it will vary for different
datasets). This curve is known as an ‘elbow curve‘ (because it has a shape like an
elbow) and is usually used to determine the k value.
from math import sqrt
from sklearn import neighbors
from sklearn.metrics import mean_squared_error

# x_train, y_train, x_test, y_test are assumed to have been prepared earlier.
rmse_val = []  # to store rmse values for different k
for K in range(1, 21):
    model = neighbors.KNeighborsRegressor(n_neighbors=K)
    model.fit(x_train, y_train)                      # fit the model
    pred = model.predict(x_test)                     # make predictions on the test set
    error = sqrt(mean_squared_error(y_test, pred))   # calculate rmse
    rmse_val.append(error)                           # store rmse values

01/31/2024 23
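A short follow-up to the loop above (assuming the same rmse_val list and that matplotlib is available): pick the k with the lowest validation RMSE and, optionally, plot the elbow curve.

best_k = rmse_val.index(min(rmse_val)) + 1      # +1 because k started at 1
print("best k:", best_k, "rmse:", min(rmse_val))

# Optional: visualize the elbow curve described above.
import matplotlib.pyplot as plt
plt.plot(range(1, 21), rmse_val, marker="o")
plt.xlabel("k")
plt.ylabel("validation RMSE")
plt.show()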
Lazy vs. Eager Learning
 Lazy vs. eager learning
 Lazy learning (e.g., instance-based learning): simply stores the
training data (or does only minor processing) and waits until it is
given a test tuple
 Eager learning: given a training set, constructs a classification
model before receiving new (e.g., test) data to classify
 Lazy: less time in training but more time in predicting
 Accuracy
 Lazy methods effectively use a richer hypothesis space, since they
use many local linear functions to form an implicit global
approximation to the target function
 Eager: must commit to a single hypothesis that covers the entire
instance space


01/31/2024 24
ASSIGNMENT……
1. Tune k-NN. Try larger and larger k values to see if you can
improve the performance of the algorithm on the Iris dataset.
2. Implement other distance measures that you can use to find
similar data, such as Hamming distance, Manhattan distance
and Minkowski distance.
3. Regression. Adapt the example and apply it to a regression
predictive modeling problem (e.g. predict a numerical value)
4. Data Preparation. Distance measures are strongly affected by
the scale of the input data. Experiment with standardization
and other data preparation methods in order to improve results.

01/31/2024 25
How Does the K-Means Algorithm Work?

[Flowchart: Start → decide the number of clusters K → determine the centroids →
compute the distance of each object to the centroids → group objects by minimum
distance → if any object changed group, recompute the centroids and repeat;
otherwise, end.]

26
K-Means Clustering Algorithm
 Step 1: Begin with a decision on the value of
k = the number of clusters.
 Step 2: Choose any initial partition that classifies
the data into k clusters. You may assign the
training samples randomly, or systematically
as follows:
1. Take the first k training samples as single-element
clusters.
2. Assign each of the remaining (N - k) samples to the
cluster with the nearest centroid. After each
assignment, re-compute the centroid of the
gaining cluster.
27
K-Means Clustering Algorithm
 Step 3: Take each sample in sequence and compute its
distance from the centroid of each of the clusters. If a
sample is not currently in the cluster with the closest
centroid, switch it to that cluster and update the
centroids of the cluster gaining the new sample and the
cluster losing the sample.
 Step 4: Repeat Step 3 until convergence is achieved, that
is, until a pass through the training samples causes no new
assignments.

28
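A compact sketch of these steps in plain Python (a batch, Lloyd-style variant: it reassigns all samples first and then recomputes the centroids, rather than updating after every single switch as in Step 3; the function and variable names are my own). Applied to the worked example on the following slides, with individuals 1 and 4 as initial centroids, it reproduces the clusters {1, 2} and {3, 4, 5, 6, 7}.

import math

def kmeans(points, initial_centroids, max_iter=100):
    centroids = [list(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment: each point joins the cluster with the nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update: recompute each centroid as the mean of its cluster.
        new_centroids = [[sum(coord) / len(cluster) for coord in zip(*cluster)]
                         if cluster else centroids[i]
                         for i, cluster in enumerate(clusters)]
        if new_centroids == centroids:   # no centroid moved -> converged
            break
        centroids = new_centroids
    return centroids, clusters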
A simple example showing the implementation of the k-means algorithm
(using K = 2)

Individual Variable 1 Variable 2


1 1.0 1.0
2 1.5 2.0
3 3.0 4.0
4 5.0 7.0
5 3.5 5.0
6 4.5 5.0
7 3.5 4.5

29
 Step 1:
 Initialization: we randomly choose the following two
centroids (k = 2) for the two clusters.
 In this case the 2 centroids are:
m1 = (1.0, 1.0) and m2 = (5.0, 7.0).

          Individual   Mean Vector
Group 1   1            (1.0, 1.0)
Group 2   4            (5.0, 7.0)

30
Step 2:
 Compute the Euclidean distance of each individual to the two centroids:

Individual   Distance to Centroid 1   Distance to Centroid 2
1            0                        7.21
2            1.12                     6.10
3            3.61                     3.61
4            7.21                     0
5            4.72                     2.50
6            5.31                     2.06
7            4.30                     2.92

 Thus, we obtain two clusters containing:
{1, 2, 3} and {4, 5, 6, 7}.
 Their new centroids are:
m1 = (1.83, 2.33) and m2 = (4.13, 5.38).
31
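The distance table above can be reproduced in a few lines (a sketch using the same data and the two initial centroids):

import math

data = {1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
        5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5)}
m1, m2 = (1.0, 1.0), (5.0, 7.0)

for i, p in data.items():
    # Distance of individual i to centroid 1 and centroid 2.
    print(i, round(math.dist(p, m1), 2), round(math.dist(p, m2), 2))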
Step 3:
 Now, using these new centroids, we compute the Euclidean
distance of each object to them, as before (the distance table is
not reproduced here).

 Therefore, the new clusters are:
{1, 2} and {3, 4, 5, 6, 7}

 The next centroids are:
m1 = (1.25, 1.5) and m2 = (3.9, 5.1)

32
 Step 4:
The clusters obtained are:
{1, 2} and {3, 4, 5, 6, 7}
 Therefore, there is no change in the clusters.
 Thus, the algorithm comes to a halt here, and the
final result consists of the 2 clusters {1, 2} and
{3, 4, 5, 6, 7}.

33
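As a cross-check (assuming scikit-learn is installed), running k-means on the same seven individuals with individuals 1 and 4 as the initial centroids reproduces this final result:

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
              [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])
init = np.array([[1.0, 1.0], [5.0, 7.0]])   # individuals 1 and 4

km = KMeans(n_clusters=2, init=init, n_init=1).fit(X)
print(km.labels_)            # [0 0 1 1 1 1 1]  -> clusters {1,2} and {3,4,5,6,7}
print(km.cluster_centers_)   # [[1.25 1.5 ] [3.9  5.1 ]]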
 [Plot of the two final clusters]

34
 (With K = 3)

[Plots: Step 1 and Step 2 of the same procedure with K = 3]

35
 [Plot of the K = 3 clustering result]

36
Exercise
 A department store in the Jakarta area wants to make sure that its sales
campaigns are right on target. In order to do that, it has to cluster its
customers based on: age, total spending, and shopping frequency in a year. The
department store has collected all the data. You are asked to help them cluster
these data into 3 clusters.

Customer     Age   Total Spend (in million rupiah)   Shopping Frequency
Customer 1 25 18,000 20
Customer 2 30 15,000 10
Customer 3 20 8,000 8
Customer 4 35 22,000 5
Customer 5 40 40,000 14
Customer 6 50 32,000 15
Customer 7 43 9,000 6
Customer 8 42 12,000 16

37
