Unit - 1 What Is Machine Learning?
How it works –
First, we need data about the houses: square footage, number of rooms,
whether a house has a garden or not, and so on. We then need to know the prices of
these houses, i.e. the corresponding labels. By leveraging data coming from
thousands of houses, their features and prices, we can now train a supervised
machine learning model to predict a new house’s price based on the examples
observed by the model.
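The workflow above can be sketched in a few lines. This is a minimal illustration, not a production model: it uses a single feature (square footage) with hypothetical prices, and fits a least-squares best-fit line price ≈ w · sqft + b.

```python
# Minimal supervised-learning sketch: fit a line to (feature, label) pairs,
# then predict the price of an unseen house. All numbers are hypothetical.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # closed-form least-squares slope and intercept
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

# training data: square footage -> observed sale price (the labels)
sqft = [1000, 1500, 2000, 2500]
price = [200_000, 300_000, 400_000, 500_000]

w, b = fit_line(sqft, price)
print(w * 1800 + b)  # predicted price of a new 1800 sq ft house -> 360000.0
```

With more features (rooms, garden, …) the same idea extends to multiple regression; the principle of learning from labelled examples is unchanged.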
Image classification is a popular problem in the computer vision field. Here, the goal
is to predict the class an image belongs to. More precisely: is the image of a car
or a plane? A cat or a dog?
● The model learns through observation and finds structures in the data.
Unsupervised learning, as the name suggests, uses no data labels; the
machine looks for patterns on its own.
● Once the model is given a dataset, it automatically finds patterns and
relationships in the dataset by creating clusters in it. What it cannot do is add
labels to the clusters: it cannot say this is a group of apples or of mangoes,
but it will separate all the apples from the mangoes.
● Suppose we present images of apples, bananas and mangoes to the model.
Based on the patterns and relationships it discovers, it creates clusters
and divides the dataset into them. When new data is fed to the model,
it assigns it to one of the created clusters.
● EXAMPLES:
○ Customer segmentation, or understanding different customer
groups around which to build marketing or other business
strategies.
○ Genetics, for example clustering DNA patterns to analyze
evolutionary biology.
○ Recommender systems, which involve grouping together users
with similar viewing patterns in order to recommend similar
content.
○ Anomaly detection, including fraud detection or detecting
defective mechanical parts (i.e., predictive maintenance).
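The clustering behaviour described above can be sketched with a tiny k-means loop. This is a 1-D toy with hypothetical weights and hand-picked starting centers; the point is only that groups emerge without any labels being given.

```python
# Minimal k-means sketch in 1D: alternate between assigning each point to
# its nearest center and moving each center to the mean of its cluster.

def kmeans_1d(data, centers, iters=20):
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in centers]
        for x in data:
            idx = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[idx].append(x)
        # update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# hypothetical fruit weights in grams -- no labels attached
weights = [150, 160, 155, 120, 118, 125, 900, 880, 910]
centers, clusters = kmeans_1d(weights, centers=[120, 150, 900])
print(sorted(len(c) for c in clusters))  # -> [3, 3, 3]: three groups emerge
```

The model still cannot say which cluster is "apples"; it only separates the groups, exactly as described above.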
● Reinforcement learning is the ability of an agent to interact with the
environment and discover the best outcome.
● It follows the concept of the trial-and-error method.
● The agent is rewarded or penalized with a point for a correct or a wrong
answer, and on the basis of the positive reward points gained, the model trains
itself.
● Once trained, it is ready to act on new situations presented to it.
● EXAMPLES: game playing (e.g., chess and Go agents), robotics control,
and self-driving cars.
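The reward-driven trial-and-error loop above can be sketched with tabular Q-learning on a toy problem. Everything here is hypothetical: a 1-D track of 5 cells where the agent is rewarded only for reaching the rightmost cell.

```python
# Minimal Q-learning sketch: the agent explores by trial and error and
# learns, from reward points alone, that moving right is the best policy.
import random

random.seed(0)
N_STATES = 5                       # cells 0..4; cell 4 is the goal
ACTIONS = [-1, +1]                 # move left / move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.3  # learning rate, discount, exploration

for episode in range(300):
    s = 0
    while s != N_STATES - 1:
        # explore sometimes (trial and error), otherwise exploit best known action
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s2 == N_STATES - 1 else 0.0  # reward only at the goal
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        # Q-learning update: nudge Q toward reward + discounted future value
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s2

# the greedy policy learned purely from rewards
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)  # expected: move right (+1) from every non-terminal state
```

Note how nothing labels "right" as correct; the agent infers it from the accumulated reward, which is the defining trait of reinforcement learning.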
Classification algorithms can be further divided into mainly two categories:
● Linear Models
○ Logistic Regression
● Non-linear Models
○ K-Nearest Neighbours
○ Kernel SVM
○ Naïve Bayes
Common use cases of classification include:
● Speech Recognition
● Drug Classification
The linear regression algorithm models a linear relationship between a dependent (y)
variable and one or more independent (x) variables, hence the name linear regression.
Since linear regression shows a linear relationship, it finds how the value of the
dependent variable changes according to the value of the independent variable. It
fits a best-fit line such that the error between predicted values and actual values
is minimized.
Model Performance:
The term "goodness of fit" is taken from statistics, and the goal of a machine
learning model is to achieve a good fit. In statistical modeling, it defines how
closely the predicted values match the true values of the dataset.
The goodness of fit determines how well the line of regression fits the set of
observations. The process of finding the best model out of various models is called
optimization.
1. R-squared method:
● R-squared measures the strength of the relationship between the dependent
and independent variables on a scale of 0 to 100%.
● R-squared = 1 − (residual sum of squares / total sum of squares).
● A high value of R-squared indicates a small difference between the
predicted values and actual values and hence represents a good model.
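A quick sketch of how R-squared is computed (all data values here are hypothetical):

```python
# R-squared = 1 - (residual sum of squares / total sum of squares).
# Values near 1 mean the predictions track the actual values closely.

def r_squared(actual, predicted):
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

actual    = [3.0, 5.0, 7.0, 9.0]
good_pred = [2.9, 5.2, 6.8, 9.1]
mean_pred = [6.0, 6.0, 6.0, 6.0]   # always predicts the mean of `actual`

print(r_squared(actual, good_pred))  # 0.995 -- a good fit
print(r_squared(actual, mean_pred))  # 0.0   -- no better than the mean
```

A model that always predicts the mean scores exactly 0; a model can even score below 0 if it does worse than that.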
Linear Regression vs. Logistic Regression:
● Linear Regression is used for solving regression problems, whereas Logistic
Regression is used for solving classification problems.
● In Linear Regression, we find the best-fit line, by which we can easily
predict the output; in Logistic Regression, we find the S-curve, by which we
can classify the samples.
● The output of Linear Regression must be a continuous value, whereas the
output of Logistic Regression must be a categorical value (e.g., 0 or 1).
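The contrast above can be shown in a few lines: a linear model outputs an unbounded continuous value, while the logistic (sigmoid) function squashes that same value into (0, 1) so it can be read as a class probability. The weights below are hypothetical, not fitted.

```python
# Linear output vs. logistic (S-curve) output for the same inputs.
import math

def linear(x, w=0.8, b=-2.0):
    return w * x + b                                # continuous, unbounded

def logistic(x, w=0.8, b=-2.0):
    return 1 / (1 + math.exp(-linear(x, w, b)))     # squashed into (0, 1)

for x in [0, 2.5, 5]:
    print(x, round(linear(x), 2), round(logistic(x), 3))
# logistic(2.5) = 0.5: the decision boundary, where w*x + b = 0
```

Thresholding the logistic output at 0.5 turns the continuous score into the categorical prediction that classification requires.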
UNDERFITTING:
● If the model is performing poorly over the test and the train set, then we call
that an underfitting model.
● A common remedy is feature engineering: transforming raw data in such a way
that it fits the purpose of the machine learning model. It can be thought of as the
art of selecting the important features and transforming them into refined and
meaningful features that suit the needs of the model.
OVERFITTING:
● This situation where any given model is performing too well on the training
data but the performance drops significantly over the test set is called an
overfitting model.
● When a model gets trained with so much data, it starts learning from the noise
and inaccurate data entries in our data set.
● For example, non-parametric models like decision trees, KNN, and other
tree-based algorithms are very prone to overfitting.
● A solution to avoid overfitting is using a linear algorithm if we have linear data.
Goodness of Fit
The model with a good fit lies between the underfitted and the overfitted model;
ideally, it would make predictions with zero error, but in practice this is difficult
to achieve.
K-Nearest Neighbours (KNN)
● The K-NN algorithm assumes the similarity between the new case/data and
available cases and puts the new case into the category that is most similar
to the available categories.
● It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and, at the time of
classification, performs an action on it.
● The problem statement is to assign the new input data point to one of the two
classes by using the KNN algorithm
● The first step in the KNN algorithm is to define the value of ‘K’. But what does
the ‘K’ in the KNN algorithm stand for?
● ‘K’ stands for the number of Nearest Neighbors and hence the name K
Nearest Neighbors (KNN).
● Suppose we define the value of ‘K’ as 3. This means that the algorithm will
consider the three neighbors that are closest to the new data point in order
to decide the class of this new data point.
● The closeness between the data points is calculated by using measures such
as Euclidean and Manhattan distance, which I’ll be explaining below.
● Say that at ‘K’ = 3 the neighbors include two squares and one triangle. If I
were to classify the new data point based on ‘K’ = 3, it would be assigned to
Class A (squares).
● But what if the ‘K’ value is set to 7? Here, I’m basically telling my algorithm to
look for the seven nearest neighbors and classify the new data point into the
class it is most similar to.
● Suppose that at ‘K’ = 7 the neighbors include three squares and four
triangles. If I were to classify the new data point based on ‘K’ = 7, it would
be assigned to Class B (triangles), since the majority of its neighbors are of
Class B.
In practice, there is more to consider while implementing the KNN algorithm, such as
scaling the features and choosing a good value of ‘K’.
Earlier I mentioned that KNN uses Euclidean distance as a measure to check the
distance between a new data point and its neighbors; let’s see how.
● For two points P1 = (x1, y1) and P2 = (x2, y2), the Euclidean distance is
d(P1, P2) = √((x2 − x1)² + (y2 − y1)²).
It is as simple as that! KNN makes use of simple measures in order to solve complex
problems, this is one of the reasons why KNN is such a commonly used algorithm.
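The whole procedure above fits in a short sketch: compute the Euclidean distance to every labelled point, take the K nearest, and let them vote. The Class A / Class B points below are hypothetical 2-D examples.

```python
# Minimal KNN sketch: distance, K nearest neighbours, majority vote.
from collections import Counter
from math import sqrt

def euclidean(p, q):
    return sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def knn_classify(train, new_point, k):
    # sort labelled points by distance to the new point, keep the k nearest
    nearest = sorted(train, key=lambda item: euclidean(item[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]   # majority vote decides the class

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"), ((7, 7), "B")]

print(knn_classify(train, (2, 2), k=3))  # "A" -- nearest neighbours are all A
print(knn_classify(train, (6, 5), k=3))  # "B"
```

Note that the "training" step is just storing the data, which is exactly why KNN is called a lazy learner.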
What is Naive Bayes algorithm?
● It is a classification technique based on Bayes’ Theorem with an assumption
of independence among predictors. In simple terms, a Naive Bayes classifier
assumes that the presence of a particular feature in a class is unrelated to
the presence of any other feature. For example, a fruit may be considered an
apple if it is red, round, and about 3 inches in diameter. Even if these
features depend on each other or upon the existence of the other features,
all of these properties independently contribute to the probability that this
fruit is an apple, and that is why it is known as ‘Naive’.
Step 1: Convert the data set into a frequency table of weather and play counts.
Step 2: Create a likelihood table by finding the probabilities, e.g., Overcast
probability = 0.29 and probability of playing = 0.64.
Step 3: Now, use the Naive Bayesian equation to calculate the posterior probability
for each class. The class with the highest posterior probability is the outcome of
prediction.
Problem: Players will play if the weather is sunny. Is this statement correct?
Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and
P(Yes) = 9/14 = 0.64.
Now, P(Yes | Sunny) = P(Sunny | Yes) × P(Yes) / P(Sunny) = 0.33 × 0.64 / 0.36 = 0.60,
which is the higher probability, so the statement is correct.
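The posterior calculation above written out in code; the counts come from the classic 14-row weather/play dataset used in the example (9 "Yes" days, 5 Sunny days, 3 of which were "Yes").

```python
# Bayes' theorem applied to the "play if sunny?" question from the text.

p_sunny_given_yes = 3 / 9    # P(Sunny | Yes)
p_yes = 9 / 14               # P(Yes)
p_sunny = 5 / 14             # P(Sunny)

# P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6 -- so "play" is the likely class
```

Carrying exact fractions, (3/9 × 9/14) / (5/14) = 3/5 = 0.6, matching the rounded 0.60 in the text.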
While implementing a decision tree, the main issue is how to select the best
attribute for the root node and for the sub-nodes. To solve this problem there is a
technique called the Attribute Selection Measure (ASM). With this measure, we can
easily select the best attribute for the nodes of the tree. There are two popular
techniques for ASM:
● Information Gain
○ The algorithm calculates the information gain for each split and the split
which is giving the highest value of information gain is selected.
○ In other words, information gain compares the entropy before a split
with the weighted average of the entropies of the branches produced by
that split.
● Gini Index
○ The Gini index measures the impurity of a node; an attribute giving a
lower Gini index is preferred.
● ENTROPY:
○ Purpose of entropy: it measures the impurity or randomness of a node.
○ For a binary split, entropy values range from 0 to 1; the lower the
entropy, the purer the node.
Because Gini impurity does not involve a logarithmic function, it takes less
computational time to calculate than entropy.
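Both ASM measures are simple formulas over the class proportions of a node; a sketch with hypothetical class counts:

```python
# Entropy and Gini impurity for a node given its class counts.
# For a binary node: entropy ranges 0..1, Gini impurity ranges 0..0.5.
from math import log2

def entropy(class_counts):
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    # 0.0 - ... avoids returning -0.0 for a pure node
    return 0.0 - sum(p * log2(p) for p in probs)

def gini(class_counts):
    total = sum(class_counts)
    return 1 - sum((c / total) ** 2 for c in class_counts)

print(entropy([5, 5]))   # 1.0 -- maximally impure 50/50 node
print(entropy([10, 0]))  # 0.0 -- pure node
print(gini([5, 5]))      # 0.5
print(gini([10, 0]))     # 0.0
```

The absence of `log2` in the Gini computation is exactly why it is the cheaper of the two measures.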
What is a Confusion Matrix?
● A Confusion matrix is an N x N matrix used for evaluating the performance of
a classification model, where N is the number of target classes.
● Both precision and recall can be interpreted from the confusion matrix, so
we start there. The confusion matrix is used to display how well a model made
its predictions.
● The purpose of the confusion matrix is to show how…well, how confused the
model is.
● For a binary classification problem, we would have a 2 x 2 matrix as shown
below with 4 values:
● True Positive (TP) = 560; meaning 560 positive class data points were
correctly classified by the model
● True Negative (TN) = 330; meaning 330 negative class data points were
correctly classified by the model
● False Positive (FP) = 60; meaning 60 negative class data points were
incorrectly classified as belonging to the positive class by the model
● False Negative (FN) = 50; meaning 50 positive class data points were
incorrectly classified as belonging to the negative class by the model
Remember the Type 1 and Type 2 errors: a Type 1 error is a False Positive, and a
Type 2 error is a False Negative. Interviewers love to ask the difference between
these two!
Precision vs. Recall
Precision tells us how many of the cases predicted as positive actually turned out
to be positive: Precision = TP / (TP + FP).
Recall tells us how many of the actual positive cases we were able to predict
correctly with our model: Recall = TP / (TP + FN).
F1-Score
In practice, when we try to increase the precision of our model, the recall goes
down, and vice versa. The F1-score captures both trends in a single value:
F1 = 2 × (Precision × Recall) / (Precision + Recall).
ACCURACY:
Accuracy is the fraction of all predictions that are correct:
Accuracy = (TP + TN) / (TP + TN + FP + FN).
Support Vector Machine Algorithm
● Support Vector Machine or SVM is one of the most popular Supervised
Learning algorithms, which is used for Classification as well as Regression
problems. However, primarily, it is used for Classification problems in Machine
Learning.
● The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put
the new data point in the correct category in the future. This best decision
boundary is called a hyperplane.
A 2-D example helps to make sense of all the machine learning jargon. Basically you
have some data points on a grid. You're trying to separate these data points by the
category they should fit in, but you don't want to have any data in the wrong
category. That means you're trying to find the line between the two closest points
that keeps the other data points separated.
So the two closest data points give you the support vectors you'll use to find that line.
That line is called the decision boundary.
(Figure: a linear SVM decision boundary separating two classes.)
The decision boundary doesn't have to be a line. It's also referred to as a hyperplane
because you can find the decision boundary with any number of features, not just
two.
Types of SVM
SVM can be of two types:
● Linear SVM: Linear SVM is used for linearly separable data. If a dataset
can be classified into two classes using a single straight line, it is termed
linearly separable data, and the classifier used is called a Linear SVM
classifier.
● Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If
a dataset cannot be classified using a straight line, it is termed non-linear
data, and the classifier used is called a Non-linear SVM classifier.
Hyperplane: the decision boundary that best separates the classes in n-dimensional
space. In a space with n features, the hyperplane has n − 1 dimensions (a line in
2-D, a plane in 3-D).
Support Vectors:
The data points or vectors that are closest to the hyperplane, and which affect its
position, are termed support vectors. Since these vectors support the hyperplane,
they are called support vectors, and the algorithm is therefore termed a Support
Vector Machine.
Applications of SVM include:
● Face detection
● Text and hypertext categorization
● Classification of images
● Protein fold and remote homology detection
● Handwriting recognition
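A linear SVM can be sketched with sub-gradient descent on the hinge loss. This is a toy illustration, not the usual quadratic-programming solver: the 2-D points below are hypothetical, labels must be −1/+1, and the learning rate, regularization and epoch count are arbitrary choices.

```python
# Toy linear SVM: sub-gradient descent on the hinge loss max(0, 1 - y*(w.x + b)).

def svm_train(points, labels, lr=0.01, lam=0.01, epochs=1000):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) < 1:
                # misclassified or inside the margin: push the boundary away
                w[0] += lr * (y * x1 - 2 * lam * w[0])
                w[1] += lr * (y * x2 - 2 * lam * w[1])
                b += lr * y
            else:
                # correctly classified with margin to spare: only shrink w a little
                w[0] -= lr * 2 * lam * w[0]
                w[1] -= lr * 2 * lam * w[1]
    return w, b

def svm_predict(w, b, point):
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1

# two linearly separable clusters
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = svm_train(points, labels)
print([svm_predict(w, b, p) for p in points])
```

The regularization term `lam` keeps `w` small, which is what maximizes the margin; the points that keep triggering the `< 1` branch are precisely the support vectors described above.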
UNIT 3