
K Nearest Neighbors (KNN)

"Birds of a feather flock together"


KNN – Different names

K-Nearest Neighbors
Memory-Based Reasoning
Example-Based Reasoning
Instance-Based Learning
Lazy Learning
Instance-Based Classifiers
[Figure: a "Set of Stored Cases" table with attributes Atr1 … AtrN and a Class column (labels A, B, C), plus an "Unseen Case" with attributes Atr1 … AtrN and no class label]

• Store the training records
• Use the training records to predict the class label of unseen cases
Basic Idea
If it walks like a duck, quacks like a duck, then it’s probably a duck

• Compute the distance between the test record and the training records
• Choose k of the "nearest" records
What is nearest?

Euclidean distance: only valid for continuous variables


What is nearest?

Hamming distance: for categorical variables
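As a rough sketch of these two notions of distance, here are plain-Python helpers (the function names and example values are my own illustration, not from the slides):

```python
import math

def euclidean_distance(x, y):
    # Only meaningful for continuous (numeric) attributes.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def hamming_distance(x, y):
    # For categorical attributes: count the positions where the values differ.
    return sum(1 for a, b in zip(x, y) if a != b)

print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))    # 5.0
print(hamming_distance(["red", "S"], ["red", "M"]))  # 1
```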


Nearest-Neighbor Classifiers
• Requires three things
  – The set of stored records
  – A distance metric to compute the distance between records
  – The value of k, the number of nearest neighbors to retrieve

• To classify an unknown record:
  – Compute the distance to the other training records
  – Identify the k nearest neighbors
  – Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote), as sketched below
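A minimal sketch of this procedure in plain Python (the function name knn_classify, the use of math.dist for Euclidean distance, and the toy records are illustrative assumptions, not from the slides):

```python
import math
from collections import Counter

def knn_classify(unknown, training_records, k=3):
    # training_records: list of (feature_vector, class_label) pairs.
    # 1. Compute the distance from the unknown record to every training record.
    neighbors = sorted(training_records,
                       key=lambda rec: math.dist(unknown, rec[0]))
    # 2. Identify the k nearest neighbors.
    k_nearest = neighbors[:k]
    # 3. Take a majority vote over their class labels.
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]

records = [([1.0, 1.0], "A"), ([1.2, 0.8], "A"),
           ([5.0, 5.0], "B"), ([5.5, 4.5], "B")]
print(knn_classify([1.1, 0.9], records, k=3))  # "A"
```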
Definition of Nearest Neighbor

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor]

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
1-Nearest Neighbor
The Voronoi diagram defines the classification boundary

• Each region takes the class of the training point it contains (the green point in the figure)
K-Nearest Neighbor Algorithm
1. All instances correspond to points in an n-dimensional feature space.

2. Each instance is represented by a set of numerical attributes.

3. Each training example consists of a feature vector and an associated class label.

4. Classification compares the feature vector of a new example E with those of the training examples.

5. Select the K nearest examples to E in the training set.

6. Assign E to the most common class among its K nearest neighbors (see the usage sketch below).
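For comparison, a short usage sketch with scikit-learn's KNeighborsClassifier (a library choice of mine; the slides do not prescribe an implementation, and the data below is made up):

```python
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5]]  # feature vectors
y_train = ["A", "A", "B", "B"]                              # class labels

knn = KNeighborsClassifier(n_neighbors=3)  # K = 3
knn.fit(X_train, y_train)
print(knn.predict([[1.1, 0.9]]))           # expected: ['A']
```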
3NN: Example
How to choose K?
• If K is too small, the classifier is sensitive to noise points.
• A larger K works well, but if K is too large the neighborhood may include points from other classes.
• Rule of thumb: K < sqrt(n), where n is the number of training examples (see the sketch below).
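A tiny sketch of that rule of thumb (the variable names and the odd-K adjustment are my own; the adjustment is a common convention rather than something stated in the slides):

```python
import math

n = 150                        # number of training examples (made-up value)
k = max(1, int(math.sqrt(n)))  # rule of thumb: keep K below sqrt(n)
if k % 2 == 0:
    k -= 1                     # an odd K avoids ties in two-class problems
print(k)                       # 11 for n = 150
```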


KNN Feature Weighting
• Scale each feature by its importance for classification
• We can use prior knowledge about which features are more important (see the sketch below)
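A possible sketch of such feature weighting as a weighted Euclidean distance (the function name and the weights are illustrative assumptions, not from the slides):

```python
import math

def weighted_euclidean(x, y, weights):
    # Scale each feature's contribution by its importance weight.
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

# Hypothetical weights: the first feature is treated as twice as important.
print(weighted_euclidean([1.0, 2.0], [4.0, 6.0], weights=[2.0, 1.0]))
```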
Feature Normalization

• The distance between neighbors can be dominated by attributes with relatively large values.
  – e.g., the income of customers in our previous example.

• This arises when features are on different scales.

• It is important to normalize those features.
  – Map values to numbers between 0 and 1.

• Min-max normalization linearly transforms x to y = (x - min)/(max - min) (see the sketch below).
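A minimal sketch of that min-max transform applied to a single feature column (the income values are made up):

```python
def min_max_normalize(values):
    # Linearly map each x to (x - min) / (max - min), giving values in [0, 1].
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

incomes = [20_000, 35_000, 50_000, 80_000]  # made-up customer incomes
print(min_max_normalize(incomes))           # [0.0, 0.25, 0.5, 1.0]
```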


Pros and Cons
Strengths:
• No learning time (lazy learner)
• Highly expressive, since it can learn complex decision boundaries
• The choice of k can reduce the effect of noise
• Easy to explain/justify a decision

Weaknesses:
• Relatively long evaluation time
• No model to provide insight
• Very sensitive to irrelevant and redundant features
• Normalization required
