
Classification : kNN & NB

REFERENCES
kNN Classifier:
Book: Machine Learning with Python for Everyone (Chapter 3)
NB Classifier:
Book: Machine Learning with Python for Everyone (Chapter 3)
Naive Bayes Classifier in Machine Learning (enjoyalgorithms.com)
Bayes Theorem - Statement, Proof, Formula, Derivation & Examples (byjus.com)
Classification Tasks
• Depending on no. of outcomes
• Binary Classification (Two-class classification)
• {Yes, No}; {Red, Black}; {True, False}
• {-1 +1}; {0, 1}
• Multiclass Classification
• {Cruiser, Destroyer, Frigate, Mine Sweeper, Aircraft Carrier…}
• Depending on steps involved
• Direct outcome in one step
• K Nearest Neighbours
• Two step process
• (1) build a model of how likely the outcomes are and
• (2) pick the most likely outcome
• Naïve Bayes
Sample (& Simple) Classification Dataset
• IRIS Dataset
• Included with sklearn
• Fisher’s Dataset
• Sir Ronald Fisher, mid-20th-century statistician
• First academic paper on classification
• Edgar Anderson
• Gatherer of data!
• Contents
• Each Row: describes one iris flower, in terms of the length and width of that flower’s sepals and petals
• Rows: Examples / samples
• Final Column: Particular species of that iris: setosa, versicolor, or virginica
• Features / Attributes / IVs (independent variables; the initial columns) and Target / Label / DV (dependent variable; the final column)
Training and Test (Data) Sets
• Generalization
• Performance on novel data (general knowledge)
• Evaluation Schemes
• in-sample evaluation or training error
• out-of-sample or test error evaluation
• sklearn’s train_test_split
• training data
• portion of the data that we will use to study and build up our understanding
• testing data
• portion of the data that we will use to test ourselves
• Split randomly
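A minimal sketch of such a random split on the iris data, assuming sklearn's standard API (the test_size and random_state values are illustrative choices, not from the slides):

```python
# Load the iris dataset and split it randomly into training and testing portions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
(iris_train_ftrs, iris_test_ftrs,
 iris_train_tgt, iris_test_tgt) = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42)

# 150 examples split 3:1 -> 112 for training, 38 for testing
print(iris_train_ftrs.shape, iris_test_ftrs.shape)  # (112, 4) (38, 4)
```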
Training and Test (Data) Sets

iris Python variable   Symbol       Phrase
iris                   D_all        (total) dataset
iris.data              D_ftrs       train and test features
iris.target            D_tgt        train and test targets
iris_train_ftrs        D_train      training features
iris_test_ftrs         D_test       testing features
iris_train_tgt         D_train_tgt  training targets
iris_test_tgt          D_test_tgt   testing targets
Evaluation
• Accuracy
• If the answer is true and we predicted true, then we get a point!
• If the answer is false and we predicted true, we don’t get a point!!
• Formula: (#correct answers / #questions)
• sklearn’s metrics.accuracy_score
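The accuracy formula above, checked against sklearn's helper (the labels here are invented for illustration):

```python
# Accuracy = (# correct answers) / (# questions)
from sklearn.metrics import accuracy_score

actual    = [0, 1, 1, 2, 2]
predicted = [0, 1, 2, 2, 2]

# 4 of 5 predictions match the answer key -> 0.8
print(accuracy_score(actual, predicted))  # 0.8
```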
k Nearest Neighbours Classifier
• Simple Classifier
• Single step to make predictions from labelled dataset
• Method
• Find a way to describe the similarity of two different examples.
• When you need to make a prediction on a new, unknown example, simply take the
value from the most similar known example
• Consider more than just the single most similar example:
• Describe similarity between pairs of examples.
• Pick several of the most-similar examples.
• Combine those picks to get a single answer.
k Nearest Neighbours Classifier
• Similarity
• A distance between pairs of examples
• similarity = distance(example_one, example_two)
• Similar things are close - a small distance apart
• Dissimilar things are far away - a large distance apart
• Distance Metrics
• Euclidean Distance
• treat the two examples as points in space
• Hamming Distance
• when examples consist of simple Yes/No or True/False (Boolean) features, we can compare two
examples by counting the number of features on which they differ
• Minkowski Distance etc…
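The first two distance metrics can be sketched directly in plain Python (an illustration only; libraries such as scipy provide their own implementations):

```python
import math

def euclidean(a, b):
    # Treat the two examples as points in space: straight-line distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    # For Boolean features: count the features that are different.
    return sum(x != y for x, y in zip(a, b))

print(euclidean([0, 0], [3, 4]))                          # 5.0
print(hamming([True, False, True], [True, True, True]))   # 1
```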
k Nearest Neighbours Classifier
• k in the k-NN and Answer Combination
• 1 / 3 / 5 / 10 / 20
• Voting method to classify
• Noise problem
• Tie problem
• {cat, dog, dog, zebra, cat}
• Statistic (mean / median) to regress
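The voting idea above can be sketched in a few lines (a simple illustration; for regression you would replace the vote with a mean or median of the neighbours' values):

```python
from collections import Counter

def vote(neighbor_labels):
    # Majority vote among the k nearest neighbours' labels.
    return Counter(neighbor_labels).most_common(1)[0][0]

print(vote(["dog", "dog", "cat"]))                     # clear majority: dog
# The tie problem: 2 cats vs 2 dogs -- this naive vote
# just returns one of them, with no principled tie-break.
print(vote(["cat", "dog", "dog", "zebra", "cat"]))
```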
k Nearest Neighbours Classifier
• We want to use 3-NN - three nearest neighbors - as our model
• We want that model to capture the relationship between the iris
training features and the iris training targets
• We want to use that model to predict - on previously unseen test
examples - the iris target species.
• Finally, we want to evaluate the quality of those predictions, using
accuracy, by comparing predictions against reality. We don’t peek at
these known answers, but we use them as an answer key for the test.
k Nearest Neighbours Classifier
• sklearn’s terminology
• An estimator is fit on some data and then used to predict on some data.
• We fit the estimator on training data and then use the fit-estimator to predict
on the test data.
• In other words:
• Create a 3-NN model,
• Fit that model on the training data,
• Use that model to predict on the test data, and
• Evaluate those predictions using accuracy
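Those four steps, sketched with sklearn (the split parameters are illustrative choices, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
train_ftrs, test_ftrs, train_tgt, test_tgt = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)  # 1. create a 3-NN model
fit_knn = knn.fit(train_ftrs, train_tgt)   # 2. fit that model on the training data
preds = fit_knn.predict(test_ftrs)         # 3. use it to predict on the test data
print(accuracy_score(test_tgt, preds))     # 4. evaluate those predictions with accuracy
```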
k Nearest Neighbours Classifier
• Hyperparameters
• 3 in our 3-nearest-neighbors is not something that we adjust by training
• If we want a 5-NN machine, we have to build a completely different model
• 3 is a hyperparameter
• Hyperparameters are not trained or manipulated by the learning method they
help define
• Hyperparameters are predetermined and fixed before we get a chance to do
anything with them while learning
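A small illustration of that point, assuming sklearn's KNeighborsClassifier: k is supplied at construction time and simply stored, not learned from the data:

```python
from sklearn.neighbors import KNeighborsClassifier

knn3 = KNeighborsClassifier(n_neighbors=3)
knn5 = KNeighborsClassifier(n_neighbors=5)  # a different model, not a retrained one

# The hyperparameter is fixed before any fitting happens.
print(knn3.get_params()["n_neighbors"], knn5.get_params()["n_neighbors"])  # 3 5
```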
Naïve Bayes Classifier
• Example: Football Play with a single feature (Humidity)

• Aim: build an ML model that receives the humidity value and tries to predict
whether the play will happen or not
• Given that Humidity is Normal, let’s find the chances of the play
• p(Play = Yes | Humidity = Normal)
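That conditional probability can be estimated by simple counting. The slide's humidity table is not reproduced here, so the rows below are invented purely for illustration:

```python
# Each row: (Humidity, Play). Estimate p(Play = Yes | Humidity = Normal)
# as the fraction of Normal-humidity rows where the play happened.
data = [
    ("High",   "No"),  ("High",   "No"),  ("Normal", "Yes"),
    ("Normal", "Yes"), ("Normal", "No"),  ("High",   "Yes"),
]

normal_rows = [play for humidity, play in data if humidity == "Normal"]
p_yes_given_normal = normal_rows.count("Yes") / len(normal_rows)
print(p_yes_given_normal)  # 2 of 3 Normal rows are Yes -> about 0.667
```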
