
CS446: Machine Learning

Lecture 20 (ML Models – K Nearest Neighbor)


Instructor:
Dr. Muhammad Kabir
Assistant Professor
muhammad.kabir@umt.edu.pk

School of Systems and Technology


Department of Computer Science
University of Management and Technology, Lahore
Previous Lectures…
 Logistic Regression – intuition

 Logistic Regression – Mathematical understanding

 Gradient Descent for Logistic Regression

 Implementation of Logistic Regression


Today’s Lectures…
 K-Nearest Neighbor – intuition

 K-Nearest Neighbor – distance measures (Euclidean, Manhattan)

 K-Nearest Neighbor – algorithm steps and worked example

 Effect of K


K-nearest Neighbor - Intuition….
 Supervised learning model.
 Used for both classification and regression.
 Can be used for non-linear data.
 Considers the k nearest neighbors of a query point.
 The KNN classifier assigns an unknown data sample S to a pre-defined
class based on previously categorized data samples (the training data).
 Lazy learning – the model does not “learn” until a test example is given.
Whenever we have new data to classify, we find its K nearest neighbors
in the training data.
 The sample is classified by a MAJORITY VOTE among the classes of its
neighbors (see the sketch below).
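
As a minimal illustration of the majority-vote idea (the neighbor labels below are hypothetical, chosen only to show the mechanics):

from collections import Counter

# Hypothetical class labels of the k = 5 nearest neighbors of a query point.
neighbor_labels = ["A", "B", "A", "A", "B"]

# Majority vote: the most common label among the neighbors wins.
predicted_class = Counter(neighbor_labels).most_common(1)[0][0]
print(predicted_class)  # -> "A"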
K-nearest Neighbor - Intuition….
Classification Problem K=5

To measure the distance between the data points (see the sketch below):


• Euclidean Distance
• Manhattan Distance
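
Both measures follow directly from their definitions: Euclidean distance is the square root of the sum of squared coordinate differences, and Manhattan distance is the sum of absolute coordinate differences. A minimal sketch in plain Python (the function names are illustrative, not from the lecture):

import math

def euclidean_distance(p, q):
    # Square root of the sum of squared differences over all features.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan_distance(p, q):
    # Sum of absolute differences over all features.
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

print(euclidean_distance([3, 4], [0, 0]))  # 5.0
print(manhattan_distance([3, 4], [0, 0]))  # 7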
K-nearest Neighbor - Intuition….
Regression Problem K=5

The salary of the person can be estimated as the mean of the salaries of its 5 nearest neighbors (see the sketch below).
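
A minimal sketch of the regression case; the salary values below are hypothetical, used only to show that the prediction is the mean of the k neighbors' targets:

# Hypothetical salaries of the 5 nearest neighbors (in thousands).
neighbor_salaries = [48, 52, 50, 55, 47]

# KNN regression: predict the mean of the neighbors' target values.
predicted_salary = sum(neighbor_salaries) / len(neighbor_salaries)
print(predicted_salary)  # 50.4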


K-nearest Neighbor
Advantages
 Works well on smaller datasets with a small number of features.
 Can be used for both classification and regression.
 Easy to implement for multi-class classification problems.
 Different distance criteria can be used (Euclidean distance,
Manhattan distance, etc.).
Disadvantages
 Choosing the optimal value of K is difficult.
 Less efficient on high-dimensional data.
 Does not perform well on imbalanced datasets.
 Sensitive to outliers.
KNN – Example Euclidean Distance….
KNN – Example Manhattan Distance….

Manhattan distance is often preferred over Euclidean distance when the
data has high dimensionality.
KNN Algorithm Steps
 Step 1: Provide the labelled training features to the K-NN classifier.
 Step 2: Measure the distance between the new observation S and every
training sample using the Euclidean distance formula.
 Step 3: Sort the distance values so that di ≤ di+1 and select the k samples
with the smallest distances.
 Step 4: Apply majority voting (for classification) or the mean of the
neighbors' values (for regression) to calculate the final prediction (see the
sketch below).
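
Putting the four steps together, here is a minimal from-scratch sketch; it is an illustrative implementation, not the lecture's reference code, and the function name knn_predict is my own:

import math
from collections import Counter

def knn_predict(train_X, train_y, s, k=3):
    # Step 1: the training features and labels are simply stored (lazy learning).
    # Step 2: Euclidean distance between the new observation s and every training sample.
    distances = [(math.dist(x, s), label) for x, label in zip(train_X, train_y)]
    # Step 3: sort the distances and keep the k smallest.
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Step 4: majority vote over the labels of the k nearest neighbors.
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Example: knn_predict([(1, 1), (5, 5)], ["A", "B"], (2, 2), k=1) -> "A"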
KNN Algorithm Example
 For example, we have training data with class labels C = {1, 2}, listed in
the table.
 The new observation S = {3, 4} needs to be classified. As shown in the
figure, the k-NN algorithm indicates the nearest neighbors of S for
k = 3 and k = 5.
 The distances are calculated with the Euclidean distance
formula, as shown in the figure above along with the
calculation process.
KNN Algorithm Example

 KNN uses majority voting: with k = 3, the number of neighbors with label 2
exceeds the number with label 1, so the new observation S is classified as 2.
 If k changes from 3 to 5, the number of neighbors with label 1 exceeds the
number with label 2, so the new observation is classified as 1. The voting
process is highlighted in the figure on the previous page (a hypothetical
reconstruction in code is given below).
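
Since the slide's actual training table and figure are not reproduced here, the points below are hypothetical, chosen only so that the vote flips between k = 3 and k = 5 as described:

import math
from collections import Counter

# Hypothetical training data (NOT the slide's table); S = (3, 4) is the new observation.
train_X = [(3, 5), (3, 3), (4, 5), (1, 4), (5, 6)]
train_y = [2, 1, 2, 1, 1]
S = (3, 4)

# Sort training samples by Euclidean distance to S.
by_distance = sorted(zip(train_X, train_y), key=lambda xy: math.dist(xy[0], S))

for k in (3, 5):
    votes = Counter(label for _, label in by_distance[:k])
    print(k, votes.most_common(1)[0][0])  # k=3 -> 2, k=5 -> 1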
Effect of K
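
The slides illustrate the effect of K with figures; roughly, a very small K makes the model sensitive to noise and outliers, while a very large K over-smooths the decision boundary. One common way to choose K is cross-validation; below is a sketch using scikit-learn (the library and the iris dataset are my assumptions, not part of the lecture):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate several candidate values of K with 5-fold cross-validation.
for k in (1, 3, 5, 7, 9, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(f"k={k}: mean accuracy = {score:.3f}")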
Chapter Reading
