
Introduction to ML
• Prof. Amit K. Nerurkar
• Assistant Professor
• Department of Computer Engineering
• Vidyalankar Institute of Technology, Wadala
A Machine Learning process begins by feeding the machine lots of data. Using this data, the machine is trained to detect hidden insights and trends. These insights are then used to build a Machine Learning model with an algorithm in order to solve a problem.



Supervised Learning



Unsupervised Learning



There are three main types of problems that can be solved in Machine Learning:
1. Regression: In this type of problem, the output is a continuous quantity. For example, predicting the speed of a car given the distance travelled is a Regression problem. Regression problems can be solved with Supervised Learning algorithms such as Linear Regression.
2. Classification: In this type, the output is a categorical value. Classifying emails into two classes, spam and non-spam, is a classification problem that can be solved with Supervised Learning classification algorithms such as Support Vector Machines, Naive Bayes, Logistic Regression, K-Nearest Neighbors, etc.
3. Clustering: This type of problem involves grouping the inputs into two or more clusters based on feature similarity. For example, clustering viewers into similar groups based on their interests, age, geography, etc. can be done with Unsupervised Learning algorithms such as K-Means Clustering.
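As a quick illustration (not part of the original slides), here is a minimal scikit-learn sketch with one made-up toy dataset per problem type; the numbers and feature choices are invented purely for demonstration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

# Regression: predict a continuous quantity (e.g., speed from distance).
X = np.array([[10], [20], [30], [40]])        # distance (made-up)
y = np.array([15.0, 28.0, 44.0, 61.0])        # speed (continuous)
print(LinearRegression().fit(X, y).predict([[25]]))

# Classification: predict a categorical value (spam = 1, non-spam = 0).
X = np.array([[0.1], [0.9], [0.2], [0.8]])    # e.g., fraction of spammy words
y = np.array([0, 1, 0, 1])
print(LogisticRegression().fit(X, y).predict([[0.7]]))

# Clustering: group unlabeled points by feature similarity.
X = np.array([[1, 1], [1.2, 0.8], [8, 8], [8.2, 7.9]])
print(KMeans(n_clusters=2, n_init=10).fit_predict(X))
```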



Unsupervised Learning
1. K Means Clustering
2. Hierarchical Clustering
3. Association Rules



Introduction to Clustering
Clustering is defined as dividing data points or a population into several groups such that similar data points end up in the same group.

1. Soft Clustering: Instead of assigning each data point to exactly one cluster, a likelihood or probability of the data point belonging to each cluster is assigned.
2. Hard Clustering: Each data point either entirely belongs to a cluster or not at all.



K-means algorithm
We have 8 points and we want to apply k-means to create clusters for these points.

Step 1: Choose the number of clusters k.

Step 2: Select k random points from the data as centroids. Here, the red and green circles represent the centroids for these clusters.

Step 3: Assign all the points to the closest cluster centroid. Here you can see that the points which are closer to the red point are assigned to the red cluster, whereas the points which are closer to the green point are assigned to the green cluster.

Step 4: Recompute the centroids of the newly formed clusters. Here, the red and green crosses are the new centroids.

Step 5: Repeat steps 3 and 4.
There are essentially three stopping criteria that can be adopted to stop the K-means algorithm:
1. Centroids of newly formed clusters do not change
2. Points remain in the same cluster
3. The maximum number of iterations is reached
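The five steps above can be condensed into a short NumPy sketch. This is a minimal illustration, assuming eight made-up 2-D points in place of the points shown on the slide (and assuming, for simplicity, that no cluster ever becomes empty):

```python
import numpy as np

# Eight made-up 2-D points (stand-ins for the points on the slide).
points = np.array([[1, 1], [1.5, 2], [3, 4], [5, 7],
                   [3.5, 5], [4.5, 5], [3.5, 4.5], [1, 2]])

k = 2                                                          # Step 1: choose k
rng = np.random.default_rng(0)
centroids = points[rng.choice(len(points), k, replace=False)]  # Step 2: random centroids

for _ in range(100):                                           # cap on iterations (criterion 3)
    # Step 3: assign each point to its closest centroid.
    dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
    labels = dists.argmin(axis=1)
    # Step 4: recompute the centroids of the newly formed clusters.
    new_centroids = np.array([points[labels == i].mean(axis=0) for i in range(k)])
    if np.allclose(new_centroids, centroids):                  # criterion 1: centroids unchanged
        break
    centroids = new_centroids                                  # Step 5: repeat steps 3 and 4

print(labels, centroids)
```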
Hierarchical Clustering in Machine Learning

Hierarchical clustering is another unsupervised machine learning algorithm, which is used to group unlabeled datasets into clusters; it is also known as hierarchical cluster analysis (HCA).

In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as the dendrogram.

The hierarchical clustering technique has two approaches:
1. Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm starts by taking all data points as single clusters and merges them until one cluster is left.

2. Divisive: The divisive algorithm is the reverse of the agglomerative algorithm, as it is a top-down approach.



Agglomerative Hierarchical Clustering

Step-1: Create each data point as a single cluster. Let's say there are N data points, so the number of clusters will also be N.

Step-2: Take the two closest data points or clusters and merge them to form one cluster. There will now be N-1 clusters.

Step-3: Again, take the two closest clusters and merge them together to form one cluster. There will be N-2 clusters.

Step-4: Repeat Step 3 until only one cluster is left. So, we will get the following clusters. Consider the images below.

Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram to divide the clusters as per the problem.
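A minimal sketch of this bottom-up procedure using scikit-learn's AgglomerativeClustering; the six 2-D points are made up for illustration:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Made-up 2-D points; scikit-learn is assumed to be installed.
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

# Bottom-up merging as in Steps 1-4; n_clusters tells the algorithm
# where to cut the hierarchy instead of merging all the way to one cluster.
model = AgglomerativeClustering(n_clusters=2)
print(model.fit_predict(X))   # cluster label for each point
```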
Measures for the distance between two clusters
As we have seen, the distance between the two closest clusters is crucial for hierarchical clustering. There are various ways to calculate the distance between two clusters, and these ways decide the rule for clustering. These measures are called linkage methods. Some of the popular linkage methods are given below:

Single Linkage: the shortest distance between the closest points of the two clusters.

Complete Linkage: the farthest distance between two points of two different clusters. It is one of the popular linkage methods, as it forms tighter clusters than single linkage.

Average Linkage: the distances between each pair of points (one from each cluster) are added up and divided by the total number of pairs to give the average distance between the two clusters. It is also one of the most popular linkage methods.

Centroid Linkage: the distance between the centroids of the two clusters.
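To see the linkage methods concretely, here is a small sketch using SciPy's linkage function, whose method argument corresponds to the measures above; the data points are made up:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Made-up points; SciPy is assumed to be installed.
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

# The `method` argument selects the linkage measure described above.
for method in ["single", "complete", "average", "centroid"]:
    Z = linkage(X, method=method)
    # Each row of Z records one merge: (cluster i, cluster j, distance, size).
    print(method, "-> final merge distance:", round(Z[-1, 2], 2))
```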

Prepared by Prof. Amit K. Nerurkar (AKN)


o Firstly, the data points P2 and P3 combine together and form a cluster; correspondingly, a dendrogram is created which connects P2 and P3 with a rectangular shape. The height is decided according to the Euclidean distance between the data points.
o In the next step, P5 and P6 form a cluster, and the corresponding dendrogram is created. It is higher than the previous one, as the Euclidean distance between P5 and P6 is a little greater than that between P2 and P3.
o Again, two new dendrograms are created that combine P1, P2, and P3 in one dendrogram, and P4, P5, and P6 in another.
o At last, the final dendrogram is created that combines all the data points together.
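A sketch of this walk-through with SciPy and Matplotlib; the coordinates for P1-P6 are invented so that P2/P3 and P5/P6 are the closest pairs, mirroring the description above:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical coordinates chosen so that P2/P3 and P5/P6 are the
# closest pairs, as in the walk-through.
X = np.array([[1, 1],      # P1
              [2, 1],      # P2
              [2.2, 1.1],  # P3
              [8, 8],      # P4
              [9, 8],      # P5
              [9.1, 8.3]]) # P6

Z = linkage(X, method="single")   # merge heights = Euclidean distances
dendrogram(Z, labels=["P1", "P2", "P3", "P4", "P5", "P6"])
plt.ylabel("Euclidean distance")
plt.show()
```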



Divisive Hierarchical Clustering Technique

o In simple words, Divisive Hierarchical clustering is exactly the opposite of Agglomerative Hierarchical clustering. In Divisive Hierarchical clustering, we consider all the data points as a single cluster, and in each iteration we separate from the cluster the data points which are not similar. Each data point which is separated is considered an individual cluster. In the end, we are left with n clusters.

o As we are dividing the single cluster into n clusters, it is named Divisive Hierarchical clustering.



Association Rule Learning
Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item on another data item and maps these dependencies so that they can be exploited profitably.

For example, if a customer buys bread, he is also likely to buy butter, eggs, or milk, so these products are stored on the same shelf or mostly nearby.



Association rule learning works on the concept of an If-Then statement, such as: if A then B.

Here the If element is called the antecedent, and the Then statement is called the consequent.

Support
Support is the frequency of A, or how frequently an itemset appears in the dataset. It is defined as the fraction of the transactions T that contain the itemset X:

Support(X) = Freq(X) / |T|

Confidence
Confidence indicates how often the rule has been found to be true, i.e., how often the items X and Y occur together in the dataset when the occurrence of X is already given. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X:

Confidence(X → Y) = Support(X ∪ Y) / Support(X)



Lift
Lift is the strength of a rule. It is the ratio of the observed support to the expected support if X and Y were independent of each other:

Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y))

It has three possible ranges of values:
o Lift = 1: The occurrences of the antecedent and the consequent are independent of each other.
o Lift > 1: It determines the degree to which the two itemsets are dependent on each other.
o Lift < 1: It tells us that one item is a substitute for the other, meaning one item has a negative effect on the other.
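Putting the three measures together, here is a minimal sketch that computes support, confidence, and lift for a hypothetical rule {bread} → {butter} over made-up transactions:

```python
# Made-up transactions, each a set of purchased items.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"bread"}, {"butter"}
conf = support(X | Y) / support(X)                  # Confidence(X -> Y) = 0.75
lift = support(X | Y) / (support(X) * support(Y))   # Lift(X -> Y) = 1.25 (> 1: dependent)
print(support(X), conf, lift)
```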



o Association rule learning can be divided into three algorithms:

o Apriori Algorithm
o This algorithm uses frequent itemsets to generate association rules. It is designed to work on databases that contain transactions. It uses a breadth-first search and a hash tree to count itemsets efficiently.
o It is mainly used for market basket analysis and helps to understand the products that can be bought together. It can also be used in the healthcare field, for example to find adverse drug reactions for patients.

o Eclat Algorithm
o Eclat stands for Equivalence Class Transformation. This algorithm uses a depth-first search technique to find frequent itemsets in a transaction database. It generally executes faster than the Apriori algorithm.

o FP-Growth Algorithm
o FP-Growth stands for Frequent Pattern growth, and it is an improved version of the Apriori algorithm. It represents the database in the form of a tree structure known as a frequent pattern tree (FP-tree). The purpose of this tree is to extract the most frequent patterns.
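As an illustration, here is a sketch using the third-party mlxtend library (an assumption; the slides do not name a library) to mine rules with the Apriori algorithm on made-up transactions:

```python
# Assumes: pip install mlxtend pandas
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["bread", "butter", "milk"],
                ["bread", "butter"],
                ["bread", "eggs"],
                ["milk", "eggs"],
                ["bread", "butter", "eggs"]]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

frequent = apriori(df, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```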



Supervised Learning
1. Logistic Regression
2. Decision Tree
3. Support Vector Machine



Decision Tree Classification Algorithm
Decision Tree is a Supervised Learning technique that can be used for both Classification and Regression problems, but it is mostly preferred for solving Classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.



Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, selected by an attribute selection measure, or ASM). The root node splits further into the next decision node (distance from the office) and one leaf node, based on the corresponding labels. The next decision node further splits into one decision node (Cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the diagram below:
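A minimal sketch of the job-offer example with scikit-learn's DecisionTreeClassifier; the feature encoding and the training rows are invented for illustration, and the criterion parameter plays the role of the ASM:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up features: [salary_in_lakhs, distance_km, has_cab_facility (0/1)]
X = np.array([[12, 5, 1], [12, 30, 1], [12, 30, 0],
              [6, 5, 1],  [6, 10, 0],  [15, 25, 1]])
y = np.array([1, 1, 0, 0, 0, 1])   # 1 = accept offer, 0 = decline

# criterion="gini" is one attribute selection measure (ASM).
tree = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X, y)
print(export_text(tree, feature_names=["salary", "distance", "cab"]))
print(tree.predict([[10, 20, 1]]))  # decision for a new, made-up offer
```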



Support Vector Machine Algorithm

Support Vector Machine, or SVM, is one of the most popular Supervised Learning algorithms, used for Classification as well as Regression problems. However, it is primarily used for Classification problems in Machine Learning.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. Consider the diagram below, in which two different categories are classified using a decision boundary, or hyperplane.

The SVM algorithm can be used for face detection, image classification, text categorization, etc.
SVM can be of two types:
Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.

Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
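A short sketch contrasting the two types with scikit-learn's SVC; both toy datasets are made up:

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable data: one straight line splits the two classes.
X_lin = np.array([[1, 1], [2, 1], [8, 8], [9, 8]])
y_lin = np.array([0, 0, 1, 1])
print(SVC(kernel="linear").fit(X_lin, y_lin).predict([[2, 2], [8, 9]]))

# Non-linearly separable (XOR-like) data: no straight line works,
# so a kernel such as RBF is used instead.
X_xor = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
y_xor = np.array([0, 0, 1, 1])
print(SVC(kernel="rbf").fit(X_xor, y_xor).predict([[0.9, 0.9], [0.1, 0.9]]))
```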



The SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane.
The SVM algorithm finds the points of both classes closest to the line. These points are called support vectors.
The distance between the support vectors and the hyperplane is called the margin.
The goal of SVM is to maximize this margin.
The hyperplane with the maximum margin is called the optimal hyperplane.



Confusion Matrix in Machine Learning



A Confusion matrix is an N x N matrix used for evaluating
the performance of a classification model, where N is the
total number of target classes. The matrix compares the
actual target values with those predicted by the machine
learning model. This gives us a holistic view of how well
our classification model is performing and what kinds of
errors it is making.

Important Terms in a Confusion Matrix

True Positive (TP)
The predicted value matches the actual value: the actual value was positive, and the model predicted a positive value.
True Negative (TN)
The predicted value matches the actual value: the actual value was negative, and the model predicted a negative value.
False Positive (FP) – Type I Error
The prediction is wrong: the actual value was negative, but the model predicted a positive value.
False Negative (FN) – Type II Error
The prediction is wrong: the actual value was positive, but the model predicted a negative value.
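A small sketch with scikit-learn's confusion_matrix; the label vectors are made up so that all four terms appear:

```python
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows = actual class, columns = predicted class; for binary labels {0, 1}:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_actual, y_predicted))   # [[3 1] [1 3]]
```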



• Precision vs. Recall
• Precision tells us how many of the cases predicted as positive actually turned out to be positive.

• This tells us how reliable our model's positive predictions are.
• Recall tells us how many of the actual positive cases we were able to predict correctly with our model.
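In terms of the confusion-matrix counts, Precision = TP / (TP + FP) and Recall = TP / (TP + FN). A minimal sketch reusing the made-up labels from the confusion-matrix example:

```python
from sklearn.metrics import precision_score, recall_score

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]

print(precision_score(y_actual, y_predicted))  # TP/(TP+FP) = 3/(3+1) = 0.75
print(recall_score(y_actual, y_predicted))     # TP/(TP+FN) = 3/(3+1) = 0.75
```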



Thank You
• Name: Amit K. Nerurkar
• Designation: Assistant Professor
• College: Vidyalankar Institute of Technology

