Professional Documents
Culture Documents
fml
fml
Machine Learning is defined as a technology that is used to train machines to perform various
actions such as predictions, recommendations, estimations, etc., based on historical data or past
experience.
Example : In Driverless Car, the training data is fed to Algorithm like how to
Drive Car in Highway, Busy and Narrow Street with factors like speed limit,
parking, stop at signal etc. After that, a Logical and Mathematical model is
created on the basis of that and after that, the car will work according to the
logical model. Also, the more data the data is fed the more efficient output is
produced.
Example: In Spam E-Mail detection,
Task, T: To classify mails into Spam or Not Spam.
Performance measure, P: Total percent of mails being correctly
classified as being “Spam” or “Not Spam”.
Experience, E: Set of Mails with label “Spam”
o Noisy Data- It is responsible for an inaccurate prediction that affects the decision as well
as accuracy in classification tasks.
o Incorrect data- It is also responsible for faulty programming and results obtained in
machine learning models. Hence, incorrect data may affect the accuracy of the results
also.
o Generalizing of output data- Sometimes, it is also found that generalizing output data
becomes complex, which results in comparatively poor future actions.
Further, if we are using non-representative training data in the model, it results in less
accurate predictions. A machine learning model is said to be ideal if it predicts well for
generalized cases and provides accurate decisions. If there is less training data, then
there will be a sampling noise in the model, called the non-representative training set. It
won't be accurate in predictions. To overcome this, it will be biased against one class or
a group.
Hence, we should use representative data in training to protect against being biased
and make accurate predictions without any drift.
Overfitting is one of the most common issues faced by Machine Learning engineers and
data scientists. Whenever a machine learning model is trained with a huge amount of
data, it starts capturing noise and inaccurate data into the training data set. It negatively
affects the performance of the model. Let's understand with a simple example where we
have a few training data sets such as 1000 mangoes, 1000 apples, 1000 bananas, and
5000 papayas. Then there is a considerable probability of identification of an apple as
papaya because we have a massive amount of biased data in the training data set;
hence prediction got negatively affected. The main reason behind overfitting is using
non-linear methods used in machine learning algorithms as they build non-realistic data
models. We can overcome overfitting by using linear and parametric algorithms in the
machine learning models.
Underfitting:
Underfitting is just the opposite of overfitting. Whenever a machine learning model is
trained with fewer amounts of data, and as a result, it provides incomplete and
inaccurate data and destroys the accuracy of the machine learning model.
Underfitting occurs when our model is too simple to understand the base structure of
the data, just like an undersized pant. This generally happens when we have limited data
into the data set, and we try to build a linear model with non-linear data. In such
scenarios, the complexity of the model destroys, and rules of the machine learning
model become too easy to be applied on this data set, and the model starts doing
wrong predictions as well.
8. Customer Segmentation
Customer segmentation is also an important issue while developing a machine learning
algorithm. To identify the customers who paid for the recommendations shown by the
model and who don't even check them. Hence, an algorithm is necessary to recognize
the customer behavior and trigger a relevant recommendation for the user based on
past experience.
Based on the methods and way of learning, machine learning is divided into mainly four
types, which are:
The main goal of the supervised learning technique is to map the input variable(x)
with the output variable(y). Some real-world applications of supervised learning
are Risk Assessment, Fraud Detection, Spam filtering, etc.
ADVERTISEMENT
o Classification
o Regression
a) Classification
Classification algorithms are used to solve the classification problems in which the
output variable is categorical, such as "Yes" or No, Male or Female, Red or Blue, etc.
The classification algorithms predict the categories present in the dataset. Some real-
world examples of classification algorithms are Spam Detection, Email filtering, etc.
Regression algorithms are used to solve regression problems in which there is a linear
relationship between input and output variables. These are used to predict continuous
output variables, such as market trends, weather prediction, etc.
o Since supervised learning work with the labelled dataset so we can have an exact idea
about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.
Disadvantages:
o Image Segmentation:
Supervised Learning algorithms are used in image segmentation. In this process, image
classification is performed on different image data with pre-defined labels.
o Medical Diagnosis:
Supervised algorithms are also used in the medical field for diagnosis purposes. It is
done by using medical images and past labelled data with labels for disease conditions.
With such a process, the machine can identify a disease for the new patients.
o Fraud Detection - Supervised Learning classification algorithms are used for identifying
fraud transactions, fraud customers, etc. It is done by using historic data to identify the
patterns that can lead to possible fraud.
o Spam detection - In spam detection & filtering, classification algorithms are used. These
algorithms classify an email as spam or not spam. The spam emails are sent to the spam
folder.
o Speech Recognition - Supervised learning algorithms are also used in speech
recognition. The algorithm is trained with voice data, and various identifications can be
done using the same, such as voice-activated passwords, voice commands, etc.
In unsupervised learning, the models are trained with the data that is neither classified
nor labelled, and the model acts on that data without any supervision.
The main aim of the unsupervised learning algorithm is to group or categories the
unsorted dataset according to the similarities, patterns, and differences. Machines
are instructed to find the hidden patterns from the input dataset.
So, now the machine will discover its patterns and differences, such as colour difference,
shape difference, and predict the output when it is tested with the test dataset.
Categories of Unsupervised Machine Learning
Unsupervised Learning can be further classified into two types, which are given below:
o Clustering
o Association
1) Clustering
The clustering technique is used when we want to find the inherent groups from the
data. It is a way to group the objects into a cluster such that the objects with the most
similarities remain in one group and have fewer or no similarities with the objects of
other groups. An example of the clustering algorithm is grouping the customers by their
purchasing behaviour.
ADVERTISEMENT
2) Association
Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat, FP-
growth algorithm.
Advantages and Disadvantages of Unsupervised
Learning Algorithm
Advantages:
o These algorithms can be used for complicated tasks compared to the supervised ones
because these algorithms work on the unlabeled dataset.
o Unsupervised algorithms are preferable for various tasks as getting the unlabeled
dataset is easier as compared to the labelled dataset.
Disadvantages:
o The output of an unsupervised algorithm can be less accurate as the dataset is not
labelled, and algorithms are not trained with the exact output in prior.
o Working with Unsupervised learning is more difficult as it works with the unlabelled
dataset that does not map with the output.
3. Semi-Supervised Learning
Semi-Supervised learning is a type of Machine Learning algorithm that lies
between Supervised and Unsupervised machine learning. It represents the
intermediate ground between Supervised (With Labelled training data) and
Unsupervised learning (with no labelled training data) algorithms and uses the
combination of labelled and unlabeled datasets during the training period.
ADVERTISEMENT
Disadvantages:
4. Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent
(A software component) automatically explore its surrounding by hitting & trail,
taking action, learning from experiences, and improving its performance. Agent
gets rewarded for each good action and get punished for each bad action; hence the
goal of reinforcement learning agent is to maximize the rewards.
In reinforcement learning, there is no labelled data like supervised learning, and agents
learn from their experiences only.
ADVERTISEMENT
The reinforcement learning process is similar to a human being; for example, a child
learns various things by experiences in his day-to-day life. An example of reinforcement
learning is to play a game, where the Game is the environment, moves of an agent at
each step define states, and the goal of the agent is to get a high score. Agent receives
feedback in terms of punishment and rewards.
Due to its way of working, reinforcement learning is employed in different fields such
as Game theory, Operation Research, Information theory, multi-agent systems.
Disadvantage
The curse of dimensionality limits reinforcement learning for real physical systems.
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point
in the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called as support vectors, and hence algorithm is termed as Support
Vector Machine. Consider the below diagram in which there are two different categories
that are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we have used in the KNN
classifier. Suppose we see a strange cat that also has some features of dogs, so if we
want a model that can accurately identify whether it is a cat or dog, so such a model can
be created by using the SVM algorithm. We will first train our model with lots of images
of cats and dogs so that it can learn about different features of cats and dogs, and then
we test it with this strange creature. So as support vector creates a decision boundary
between these two data (cat and dog) and choose extreme cases (support vectors), it
will see the extreme case of cat and dog. On the basis of the support vectors, it will
classify it as a cat. Consider the below diagram:
SVM algorithm can be used for Face detection, image classification, text
categorization, etc.
Types of SVM
SVM can be of two types:
ADVERTISEMENT
o Linear SVM: Linear SVM is used for linearly separable data, which means if a dataset can
be classified into two classes by using a single straight line, then such data is termed as
linearly separable data, and classifier is used called as Linear SVM classifier.
o Non-linear SVM: Non-Linear SVM is used for non-linearly separated data, which means
if a dataset cannot be classified by using a straight line, then such data is termed as non-
linear data and classifier used is called as Non-linear SVM classifier.
We always create a hyperplane that has a maximum margin, which means the maximum
distance between the data points.
Support Vectors:
The data points or vectors that are the closest to the hyperplane and which affect the position of
the hyperplane are termed as Support Vector. Since these vectors support the hyperplane,
hence called a Support vector.
Linear SVM:
The working of the SVM algorithm can be understood by using an example. Suppose we
have a dataset that has two tags (green and blue), and the dataset has two features x1
and x2. We want a classifier that can classify the pair(x1, x2) of coordinates in either
green or blue. Consider the below image:
So as it is 2-d space so by just using a straight line, we can easily separate these two
classes. But there can be multiple lines that can separate these classes. Consider the
below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called as a hyperplane. SVM algorithm finds the closest point of
the lines from both the classes. These points are called support vectors. The distance
between the vectors and the hyperplane is called as margin. And the goal of SVM is to
maximize this margin. The hyperplane with maximum margin is called the optimal
hyperplane.
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but for non-
linear data, we cannot draw a single straight line. Consider the below image:
ADVERTISEMENT
ADVERTISEMENT
So to separate these data points, we need to add one more dimension. For linear data,
we have used two dimensions x and y, so for non-linear data, we will add a third
dimension z. It can be calculated as:
z=x2 +y2
By adding the third dimension, the sample space will become as below image:
So now, SVM will divide the datasets into classes in the following way. Consider the
below image:
Since we are in 3-d Space, hence it is looking like a plane parallel to the x-axis. If we
convert it in 2d space with z=1, then it will become as:
Hence we get a circumference of radius 1 in case of non-linear data.
Ensemble means ‘a collection of things’ and in Machine Learning
terminology, Ensemble learning refers to the approach of combining multiple
ML models to produce a more accurate and robust prediction compared to
any individual model. It implements an ensemble of fast algorithms
(classifiers) such as decision trees for learning and allows them to vote.
Bagging Vs Boosting
We all use the Decision Tree Technique on day to day life to make the decision.
Organizations use these supervised machine learning techniques like Decision trees to
make a better decision and to generate more surplus and profit.
Ensemble methods combine different decision trees to deliver better predictive results,
afterward utilizing a single decision tree. The primary principle behind the ensemble
model is that a group of weak learners come together to form an active learner.
There are two techniques given below that are used to perform ensemble decision tree.
Bagging
Bagging is used when our objective is to reduce the variance of a decision tree. Here the
concept is to create a few subsets of data from the training sample, which is chosen
randomly with replacement. Now each collection of subset data is used to prepare their
decision trees thus, we end up with an ensemble of various models. The average of all
the assumptions from numerous tress is used, which is more powerful than a single
decision tree.
Random Forest is an expansion over bagging. It takes one additional step to predict a
random subset of data. It also makes the random selection of features rather than using
all features to develop trees. When we have numerous random trees, it is called the
Random Forest.
These are the following steps which are taken to implement a Random forest:
o Let us consider X observations Y features in the training data set. First, a model from the
training data set is taken randomly with substitution.
o The tree is developed to the largest.
o The given steps are repeated, and prediction is given, which is based on the collection of
predictions from n number of trees.
Since the last prediction depends on the mean predictions from subset trees, it won't
give precise value for the regression model.
Boosting:
Boosting is another ensemble procedure to make a collection of predictors. In other
words, we fit consecutive trees, usually random samples, and at each step, the objective
is to solve net error from the prior trees.
If a given input is misclassified by theory, then its weight is increased so that the
upcoming hypothesis is more likely to classify it correctly by consolidating the entire set
at last converts weak learners into better performing models.
It utilizes a gradient descent algorithm that can optimize any differentiable loss function.
An ensemble of trees is constructed individually, and individual trees are summed
successively. The next tree tries to restore the loss ( It is the difference between actual
and predicted values).
Bagging Boosting
Various training data subsets are randomly drawn Each new subset contains the components that
with replacement from the whole training dataset. were misclassified by previous models.
Bagging attempts to tackle the over-fitting issue. Boosting tries to reduce bias.
If the classifier is unstable (high variance), then we If the classifier is steady and straightforward
need to apply bagging. (high bias), then we need to apply boosting.
Every model receives an equal weight. Models are weighted by their performance.
Objective to decrease variance, not bias. Objective to decrease bias, not variance.
It is the easiest way of connecting predictions that It is a way of connecting predictions that belong
belong to the same type. to the different types.
Every model is constructed independently. New models are affected by the performance of
the previously developed model.
What is Clustering ?
The task of grouping data points based on their similarity with each other is called
Clustering or Cluster Analysis.
This method is defined under the branch of Unsupervised Learning, which aims at
gaining insights from unlabelled data points, that is, unlike supervised learning we
don’t have a target variable.
Clustering aims at forming groups of homogeneous data points from a heterogeneous
dataset. It evaluates the similarity based on a metric like Euclidean distance, Cosine
similarity, Manhattan distance, etc. and then group the points with highest similarity
score together.
For Example, In the graph given below, we can clearly see that there are 3 circular
clusters forming on the basis of distance.
Now it is not necessary that the clusters formed must be circular in shape. The shape
of clusters can be arbitrary. There are many algortihms that work well with detecting
arbitrary shaped clusters.
For example, In the below given graph we can see that the clusters formed are not
circular in shape.
Types of Clustering
Broadly speaking, there are 2 types of clustering that can be performed to group
similar data points:
Hard Clustering: In this type of clustering, each data point belongs to a cluster
completely or not. For example, Let’s say there are 4 data point and we have to
cluster them into 2 clusters. So each data point will either belong to cluster 1 or
cluster 2.
Data
Points Clusters
A C1
B C2
C C2
D C1
Soft Clustering: In this type of clustering, instead of assigning each data point
into a separate cluster, a probability or likelihood of that point being that cluster is
evaluated. For example, Let’s say there are 4 data point and we have to cluster
them into 2 clusters. So we will be evaluating a probability of a data point
belonging to both clusters. This probability is calculated for all data points.
Data
Probability of C1 Probability of C2
Points
A 0.91 0.09
B 0.3 0.7
C 0.17 0.83
D 1 0
Uses of Clustering
Now before we begin with types of clustering algorithms, we will go through the use
cases of Clustering algorithms. Clustering algorithms are majorly used for:
Market Segmentation – Businesses use clustering to group their customers and use
targeted advertisements to attract more audience.
Market Basket Analysis – Shop owners analyze their sales and figure out which
items are majorly bought together by the customers. For example, In USA,
according to a study diapers and beers were usually bought together by fathers.
Social Network Analysis – Social media sites use your data to understand your
browsing behaviour and provide you with targeted friend recommendations or
content recommendations.
Medical Imaging – Doctors use Clustering to find out diseased areas in diagnostic
images like X-rays.
Anomaly Detection – To find outliers in a stream of real-time dataset or
forecasting fraudulent transactions we can use clustering to identify them.
Simplify working with large datasets – Each cluster is given a cluster ID after
clustering is complete. Now, you may reduce a feature set’s whole feature set into
its cluster ID. Clustering is effective when it can represent a complicated case with
a straightforward cluster ID. Using the same principle, clustering data can make
complex datasets simpler.
There are many more use cases for clustering but there are some of the major and
common use cases of clustering. \
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
Partitioning Clustering
It is a type of clustering that divides the data into non-hierarchical groups. It is also
known as the centroid-based method. The most common example of partitioning
clustering is the K-Means Clustering algorithm.
In this type, the dataset is divided into a set of k groups, where K is used to define the
number of pre-defined groups. The cluster center is created in such a way that the
distance between the data points of one cluster is minimum as compared to another
cluster centroid.
Density-Based Clustering
The density-based clustering method connects the highly-dense areas into clusters, and
the arbitrarily shaped distributions are formed as long as the dense region can be
connected. This algorithm does it by identifying different clusters in the dataset and
connects the areas of high densities into clusters. The dense areas in data space are
divided from each other by sparser areas.
These algorithms can face difficulty in clustering the data points if the dataset has
varying densities and high dimensions.
Distribution Model-Based Clustering
In the distribution model-based clustering method, the data is divided based on the
probability of how a dataset belongs to a particular distribution. The grouping is done
by assuming some distributions commonly Gaussian Distribution.
Fuzzy Clustering
Fuzzy clustering is a type of soft method in which a data object may belong to more
than one group or cluster. Each dataset has a set of membership coefficients, which
depend on the degree of membership to be in a cluster. Fuzzy C-means algorithm is
the example of this type of clustering; it is sometimes also known as the Fuzzy k-means
algorithm.
Clustering Algorithms
The Clustering algorithms can be divided based on their models that are explained
above. There are different types of clustering algorithms published, but only a few are
commonly used. The clustering algorithm is based on the kind of data that we are using.
Such as, some algorithms need to guess the number of clusters in the given dataset,
whereas some are required to find the minimum distance between the observation of
the dataset.
Here we are discussing mainly popular Clustering algorithms that are widely used in
machine learning:
1. K-Means algorithm: The k-means algorithm is one of the most popular clustering
algorithms. It classifies the dataset by dividing the samples into different clusters of
equal variances. The number of clusters must be specified in this algorithm. It is fast with
fewer computations required, with the linear complexity of O(n).
2. Mean-shift algorithm: Mean-shift algorithm tries to find the dense areas in the smooth
density of data points. It is an example of a centroid-based model, that works on
updating the candidates for centroid to be the center of the points within a given region.
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications
with Noise. It is an example of a density-based model similar to the mean-shift, but with
some remarkable advantages. In this algorithm, the areas of high density are separated
by the areas of low density. Because of this, the clusters can be found in any arbitrary
shape.
4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an
alternative for the k-means algorithm or for those cases where K-means can be failed. In
GMM, it is assumed that the data points are Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The Agglomerative hierarchical algorithm
performs the bottom-up hierarchical clustering. In this, each data point is treated as a
single cluster at the outset and then successively merged. The cluster hierarchy can be
represented as a tree-structure.
6. Affinity Propagation: It is different from other clustering algorithms as it does not
require to specify the number of clusters. In this, each data point sends a message
between the pair of data points until convergence. It has O(N 2T) time complexity, which
is the main drawback of this algorithm.
Applications of Clustering
Below are some commonly known applications of clustering technique in Machine
Learning:
o In Identification of Cancer Cells: The clustering algorithms are widely used for the
identification of cancerous cells. It divides the cancerous and non-cancerous data sets
into different groups.
o In Search Engines: Search engines also work on the clustering technique. The search
result appears based on the closest object to the search query. It does it by grouping
similar data objects in one group that is far from the other dissimilar objects. The
accurate result of a query depends on the quality of the clustering algorithm used.
o Customer Segmentation: It is used in market research to segment the customers based
on their choice and preferences.
o In Biology: It is used in the biology stream to classify different species of plants and
animals using the image recognition technique.
o In Land Use: The clustering technique is used in identifying the area of similar lands use
in the GIS database. This can be very useful to find that for what purpose the particular
land should be used, that means for which purpose it is more suitable.
.
Applications of Clustering in different fields:
1. Marketing: It can be used to characterize & discover customer segments for
marketing purposes.
2. Biology: It can be used for classification among different species of plants and
animals.
3. Libraries: It is used in clustering different books on the basis of topics and
information.
4. Insurance: It is used to acknowledge the customers, their policies and identifying
the frauds.
5. City Planning: It is used to make groups of houses and to study their values based
on their geographical locations and other factors present.
6. Earthquake studies: By learning the earthquake-affected areas we can determine
the dangerous zones.
7. Image Processing: Clustering can be used to group similar images together,
classify images based on content, and identify patterns in image data.
8. Genetics: Clustering is used to group genes that have similar expression patterns
and identify gene networks that work together in biological processes.
9. Finance: Clustering is used to identify market segments based on customer
behavior, identify patterns in stock market data, and analyze risk in investment
portfolios.
10.Customer Service: Clustering is used to group customer inquiries and complaints
into categories, identify common issues, and develop targeted solutions.
11.Manufacturing: Clustering is used to group similar products together, optimize
production processes, and identify defects in manufacturing processes.
12.Medical diagnosis: Clustering is used to group patients with similar symptoms or
diseases, which helps in making accurate diagnoses and identifying effective
treatments.
13.Fraud detection: Clustering is used to identify suspicious patterns or anomalies in
financial transactions, which can help in detecting fraud or other financial crimes.
14.Traffic analysis: Clustering is used to group similar patterns of traffic data, such
as peak hours, routes, and speeds, which can help in improving transportation
planning and infrastructure.
15.Social network analysis: Clustering is used to identify communities or groups
within social networks, which can help in understanding social behavior,
influence, and trends.
16.Cybersecurity: Clustering is used to group similar patterns of network traffic or
system behavior, which can help in detecting and preventing cyberattacks.
17.Climate analysis: Clustering is used to group similar patterns of climate data,
such as temperature, precipitation, and wind, which can help in understanding
climate change and its impact on the environment.
18.Sports analysis: Clustering is used to group similar patterns of player or team
performance data, which can help in analyzing player or team strengths and
weaknesses and making strategic decisions.
19.Crime analysis: Clustering is used to group similar patterns of crime data, such as
location, time, and type, which can help in identifying crime hotspots, predicting
future crime trends, and improving crime prevention strategies.