
Well-Posed Learning Problem – A computer program is said to learn from
experience E with respect to some task T and some performance measure P, if
its performance on T, as measured by P, improves with experience E.
A problem can be classified as a well-posed learning problem if it has
three traits –
 Task
 Performance Measure
 Experience
Some examples that clearly illustrate the well-posed learning
problem are –
1. Email spam filtering
 Task – Classifying emails as spam or not spam
 Performance Measure – The fraction of emails correctly classified as
spam or not spam
 Experience – Observing a user label emails as spam or not spam
2. Handwriting recognition
 Task – Recognizing handwritten words within images
 Performance Measure – Percentage of words correctly classified
 Experience – A database of handwritten words with given classifications
3. Fruit prediction
 Task – Recognizing and predicting different kinds of fruit
 Performance Measure – The variety of fruits the model can correctly predict
 Experience – Training the machine on a large dataset of fruit images
4. Face recognition
 Task – Predicting different types of faces
 Performance Measure – The variety of faces the model can correctly predict
 Experience – Training the machine on a large dataset of different face images

Machine Learning is defined as a technology that is used to train machines to perform various
actions such as predictions, recommendations, estimations, etc., based on historical data or past
experience.

Design a Learning System in Machine Learning
According to Arthur Samuel, “Machine Learning enables a machine to
automatically learn from data, improve performance from experience, and
predict things without being explicitly programmed.”
In simple words, when we feed training data to a machine learning
algorithm, the algorithm produces a mathematical model, and with the
help of that model the machine makes predictions and takes decisions
without being explicitly programmed. Moreover, the more the machine works
with the training data, the more experience it gains and the more
efficient its results become.

Example: In a driverless car, training data is fed to the algorithm describing how
to drive the car on highways and on busy and narrow streets, with factors like speed
limits, parking, stopping at signals, etc. A logical and mathematical model is then
created on that basis, and afterwards the car behaves according to that
logical model. Also, the more data is fed, the more efficient the output
produced.
Example: In spam e-mail detection,
 Task, T: To classify mails as Spam or Not Spam.
 Performance measure, P: Total percentage of mails correctly
classified as “Spam” or “Not Spam”.
 Experience, E: A set of mails with given labels (“Spam” / “Not Spam”)
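As a minimal sketch of the performance measure P in this example (the labels and predictions below are hypothetical), P is just the fraction of mails classified correctly:

```python
# Hypothetical true labels and model predictions for five mails.
true_labels = ["spam", "not spam", "spam", "not spam", "spam"]
predicted   = ["spam", "not spam", "not spam", "not spam", "spam"]

correct = sum(t == p for t, p in zip(true_labels, predicted))
accuracy = correct / len(true_labels)          # the performance measure P
print(f"P = {accuracy:.0%} of mails correctly classified")   # P = 80%
```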

The steps for designing a learning system are:


Step 1) Choosing the Training Experience: The first and very important
task is to choose the training data or training experience that will be fed to
the machine learning algorithm. It is important to note that the data or
experience we feed to the algorithm has a significant impact on the
success or failure of the model, so the training data or experience should be
chosen wisely.
Below are the attributes of the training experience that impact the success or failure of the model:
 Whether the training experience provides direct or indirect feedback
regarding choices. For example, while playing chess the training experience can
provide feedback such as: had this other move been chosen instead, the
chances of success would have increased.
 The second important attribute is the degree to which the learner controls
the sequence of training examples. For example, when training data is
first fed to the machine, its accuracy is very low, but as it gains
experience by playing again and again against itself or an opponent, the
machine's algorithm receives feedback and controls the chess game
accordingly.
 The third important attribute is how well the training experience represents
the distribution of examples over which the final performance will be measured.
A machine learning algorithm gains experience by going through a
number of different cases and examples. Thus, the algorithm gains more and
more experience by passing through more and more examples, and hence its
performance increases.
Step 2) Choosing the Target Function: The next important step is choosing the
target function. According to the knowledge fed to the algorithm, the
machine will learn a NextMove function that describes which legal move
should be taken. For example, while playing chess against an opponent, after
the opponent's move the machine learning algorithm decides which of the
possible legal moves to take in order to succeed.
Step 3) Choosing a Representation for the Target Function: Once the machine
knows all the possible legal moves, the next step is to choose
the optimized move using some representation, e.g. linear equations,
a hierarchical graph representation, tabular form, etc. The NextMove function
then selects, out of these moves, the one that offers the highest
success rate. For example, if a chess-playing machine has 4 possible
moves, it will choose the optimized move that leads it toward success.
A common representation is a linear combination of board features, as in the sketch below.
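A minimal sketch of this idea in Python; the board features and weights here are hypothetical, chosen only to illustrate the representation:

```python
# Sketch: representing the target function V(board) as a linear
# combination of board features (hypothetical features and weights).

def evaluate(board_features, weights):
    """Estimate how promising a board state is: V(b) = w0 + sum(wi * xi)."""
    w0, ws = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(ws, board_features))

def next_move(legal_moves, weights):
    """Choose the legal move whose resulting state scores highest."""
    return max(legal_moves, key=lambda features: evaluate(features, weights))

# Each candidate move is described by features, e.g. (my_pieces, opp_pieces).
moves = [(8, 7), (8, 8), (9, 8)]
weights = (0.0, 1.0, -0.5)          # reward my pieces, penalize the opponent's
print(next_move(moves, weights))    # (9, 8)
```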
Step 4) Choosing a Function Approximation Algorithm: An optimized move
cannot be chosen from the training data alone. The training data has to go
through a set of examples, and from these examples the learner
approximates which steps should be chosen; the machine then provides
feedback on them. For example, when training data for playing chess is fed
to the algorithm, it is not yet known whether the machine will fail or succeed;
from each failure or success it estimates, for the next move, which
step should be chosen and what its success rate is.
Step 5) Final Design: The final design is created at last, once the system has gone
through a number of examples, failures and successes, and correct and incorrect
decisions, and knows what the next step should be. Example: Deep Blue, an
ML-based intelligent computer, won a chess match against the chess
expert Garry Kasparov, becoming the first computer to beat a
human chess expert.

Common issues in Machine Learning


Although machine learning is used in every industry and helps organizations
make more informed, data-driven choices that are more effective than classical
methodologies, it still has many problems that cannot be ignored. Here are some
common issues in machine learning that professionals face while building ML skills and
creating applications from scratch.

1. Inadequate Training Data


The major issue that arises while using machine learning algorithms is the lack of
quality as well as quantity of data. Although data plays a vital role in the processing of
machine learning algorithms, many data scientists report that inadequate data, noisy
data, and unclean data severely hamper machine learning algorithms. For
example, a simple task requires thousands of sample data points, while an advanced task such as
speech or image recognition needs millions of sample data examples. Further, data
quality is also important for the algorithms to work ideally, yet a lack of data
quality is common in machine learning applications. Data quality can be affected by
factors such as the following:

o Noisy data - It is responsible for inaccurate predictions that affect decisions as well
as accuracy in classification tasks.
o Incorrect data - It is also responsible for faulty programming and faulty results obtained from
machine learning models. Hence, incorrect data may also affect the accuracy of the
results.
o Generalizing output data - Sometimes generalizing output data
becomes complex, which results in comparatively poor future actions.

2. Poor quality of data


As we have discussed above, data plays a significant role in machine learning, and it
must be of good quality as well. Noisy data, incomplete data, inaccurate data, and
unclean data lead to less accuracy in classification and low-quality results. Hence, data
quality can also be considered as a major common problem while processing machine
learning algorithms.

3. Non-representative training data


To make sure our training model generalizes well, we have to ensure that the
sample training data is representative of the new cases to which we need to generalize.
The training data must cover all cases that have already occurred as well as those that are occurring.

Further, if we use non-representative training data in the model, it results in less
accurate predictions. A machine learning model is said to be ideal if it predicts well for
generalized cases and provides accurate decisions. If there is too little training data,
there will be sampling noise in the model; such a set is called a non-representative
training set. A model trained on it won't be accurate in its predictions and will be
biased toward one class or group.

Hence, we should use representative training data to protect the model against bias
and to make accurate predictions without any drift.

4. Overfitting and Underfitting


Overfitting:

Overfitting is one of the most common issues faced by machine learning engineers and
data scientists. When a machine learning model is trained on a huge amount of
data, it starts capturing the noise and inaccuracies in the training data set, which negatively
affects the performance of the model. Let's understand this with a simple example: suppose we
have a training data set of 1,000 mangoes, 1,000 apples, 1,000 bananas, and
5,000 papayas. There is then a considerable probability of identifying an apple as a
papaya, because we have a massive amount of biased data in the training data set;
hence predictions are negatively affected. One reason given for overfitting is the use of
non-linear methods in machine learning algorithms, as they can build unrealistic data
models. We can mitigate overfitting by using simpler linear and parametric algorithms in the
machine learning models.

Methods to reduce overfitting (a regularization sketch follows this list):

o Increase the amount of training data in the dataset.
o Reduce model complexity by simplifying the model, e.g. selecting one with fewer
parameters.
o Apply Ridge regularization or Lasso regularization.
o Use early stopping during the training phase.
o Reduce the noise in the data.
o Reduce the number of attributes in the training data.
o Constrain the model.
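Below is a minimal sketch of the regularization idea using scikit-learn, on assumed synthetic data: Ridge and Lasso penalize large coefficients, shrinking the weights of irrelevant features and so curbing overfitting.

```python
# Compare plain linear regression with Ridge/Lasso on data where only
# one of twenty features actually matters (synthetic data assumed).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))               # few samples, many features
y = X[:, 0] * 3.0 + rng.normal(size=50)     # only feature 0 truly matters

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Regularized models keep the 19 irrelevant coefficients near zero.
for name, m in [("plain", plain), ("ridge", ridge), ("lasso", lasso)]:
    print(name, np.abs(m.coef_[1:]).mean().round(3))
```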

Underfitting:
Underfitting is just the opposite of overfitting. When a machine learning model is
trained with too little data, it produces incomplete and inaccurate output and
destroys the accuracy of the machine learning model.

Underfitting occurs when our model is too simple to capture the underlying structure of
the data, just like an undersized pair of pants. This generally happens when we have limited data
in the data set and try to build a linear model with non-linear data. In such
scenarios the model lacks complexity, the rules the machine learning
model learns are too simple for the data set, and the model starts making
wrong predictions as well.

Methods to reduce underfitting:

o Increase model complexity.
o Remove noise from the data.
o Train on more and better features.
o Reduce the constraints on the model.
o Increase the number of training epochs to get better results.

5. Monitoring and maintenance


As we know, generalized output data is mandatory for any machine learning model;
hence regular monitoring and maintenance are compulsory. Different
results for different actions require data changes; hence editing of code, as well as
resources for monitoring it, also becomes necessary.

6. Getting bad recommendations


A machine learning model operates within a specific context, which can result in bad
recommendations and concept drift in the model. Let's understand this with an example:
at a specific time a customer is looking for some gadgets, but the customer's
requirements change over time, while the machine learning model keeps showing the same
recommendations even though the customer's expectations have changed. This
incident is called Data Drift. It generally occurs when new data is introduced or the
interpretation of data changes. However, we can overcome this by regularly updating
and monitoring the data according to the expectations.
7. Lack of skilled resources
Although Machine Learning and Artificial Intelligence are continuously growing in the
market, these industries are still younger than others. The absence of skilled
resources in the form of manpower is also an issue. Hence, we need people with
in-depth knowledge of mathematics, science, and technology to develop and
manage scientific material for machine learning.

8. Customer Segmentation
Customer segmentation is also an important issue while developing a machine learning
algorithm: identifying which customers paid attention to the recommendations shown by the
model and which did not even check them. Hence, an algorithm is needed that
recognizes customer behaviour and triggers relevant recommendations for the user based on
past experience.

9. Process Complexity of Machine Learning


The machine learning process is very complex, which is another major issue faced
by machine learning engineers and data scientists. Machine Learning and
Artificial Intelligence are very new technologies, still in an experimental phase
and continuously changing over time. Much of the work is hit-and-trial
experimentation, so the probability of error is higher than expected. Further, the process
includes analyzing the data, removing data bias, training the model, applying complex
mathematical calculations, etc., making the procedure more complicated and quite
tedious.

10. Data Bias


Data bias is also a big challenge in Machine Learning. These errors exist when
certain elements of the dataset are weighted more heavily or given more importance than
others. Biased data leads to inaccurate results, skewed outcomes, and other analytical
errors. However, we can resolve this error by determining where the data is actually biased
in the dataset and then taking the necessary steps to reduce the bias.

Methods to remove Data Bias:

o Research more for customer segmentation.


o Be aware of your general use cases and potential outliers.
o Combine inputs from multiple sources to ensure data diversity.
o Include bias testing in the development process.
o Analyze data regularly and keep tracking errors to resolve them easily.
o Review the collected and annotated data.
o Use multi-pass annotation such as sentiment analysis, content moderation, and intent
recognition.

11. Lack of Explainability


This basically means the outputs cannot be easily comprehended, as the model is programmed in
specific ways to deliver outputs under certain conditions. Hence, a lack of explainability is also
found in machine learning algorithms, which reduces the credibility of the algorithms.

12. Slow implementations and results


This issue is also very commonly seen in machine learning models. Machine
learning models can be highly efficient at producing accurate results but are time-
consuming. Slow programs, excessive requirements, and overloaded data take more
time than expected to provide accurate results. This requires continuous maintenance and
monitoring of the model to keep it delivering accurate results.

13. Irrelevant features


Although machine learning models are intended to give the best possible outcome, if
we feed garbage data as input, the result will also be garbage. Hence, we should
use relevant features in our training sample. A machine learning model is said to be
good if its training data has a good set of features with few to no irrelevant ones.

Types of Machine Learning


Machine learning is a subset of AI, which enables the machine to automatically
learn from data, improve performance from past experiences, and make
predictions. Machine learning contains a set of algorithms that work on a huge amount
of data. Data is fed to these algorithms to train them, and on the basis of training, they
build the model & perform a specific task.

These ML algorithms help to solve different business problems like Regression,
Classification, Forecasting, Clustering, and Association, etc.

Based on the methods and way of learning, machine learning is divided into mainly four
types, which are:

1. Supervised Machine Learning


2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

1. Supervised Machine Learning


As its name suggests, supervised machine learning is based on supervision. In
the supervised learning technique, we train the machines using a "labelled" dataset,
and based on the training, the machine predicts the output. Here, the labelled data
specifies that some of the inputs are already mapped to outputs. More precisely, we
first train the machine with inputs and corresponding outputs, and then
we ask the machine to predict the output for a test dataset.

Let's understand supervised learning with an example. Suppose we have an input
dataset of cat and dog images. First, we train the machine to
understand the images: the shape and size of the tail of a cat and a dog, the shape of the
eyes, colour, height (dogs are taller, cats are smaller), etc. After completion of
training, we input the picture of a cat and ask the machine to identify the object and
predict the output. Since the machine is now well trained, it will check all the features of
the object, such as height, shape, colour, eyes, ears, tail, etc., and find that it's a cat. So, it
will put it in the Cat category. This is how the machine identifies
objects in supervised learning.

The main goal of the supervised learning technique is to map the input variable(x)
with the output variable(y). Some real-world applications of supervised learning
are Risk Assessment, Fraud Detection, Spam filtering, etc.

Categories of Supervised Machine Learning


Supervised machine learning can be classified into two types of problems, which are
given below:


o Classification
o Regression

a) Classification

Classification algorithms are used to solve the classification problems in which the
output variable is categorical, such as "Yes" or No, Male or Female, Red or Blue, etc.
The classification algorithms predict the categories present in the dataset. Some real-
world examples of classification algorithms are Spam Detection, Email filtering, etc.

Some popular classification algorithms are given below:

o Random Forest Algorithm


o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm
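As a minimal sketch (scikit-learn's built-in Iris dataset is assumed purely for illustration), any of the classifiers above can be trained and scored like this; logistic regression is shown:

```python
# Fit a classifier on labelled data and measure the fraction of
# correctly classified test samples.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))   # fraction correctly classified
```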
b) Regression

Regression algorithms are used to solve regression problems in which there is a linear
relationship between input and output variables. These are used to predict continuous
output variables, such as market trends, weather prediction, etc.

Some popular Regression algorithms are given below:

o Simple Linear Regression Algorithm


o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression
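A matching regression sketch on assumed synthetic data, predicting a continuous output variable with simple linear regression:

```python
# Fit a line to noiseless synthetic data and extrapolate.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.arange(10).reshape(-1, 1)      # single input feature
y = 2.5 * X.ravel() + 1.0             # continuous output variable

reg = LinearRegression().fit(X, y)
print(reg.predict([[12]]))            # extrapolates to about 31.0
```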

Advantages and Disadvantages of Supervised Learning
Advantages:

o Since supervised learning works with a labelled dataset, we can have an exact idea
about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.

Disadvantages:

o These algorithms are not able to solve complex tasks.


o It may predict the wrong output if the test data is different from the training data.
o It requires lots of computational time to train the algorithm.

Applications of Supervised Learning


Some common applications of Supervised Learning are given below:

o Image Segmentation:
Supervised Learning algorithms are used in image segmentation. In this process, image
classification is performed on different image data with pre-defined labels.
o Medical Diagnosis:
Supervised algorithms are also used in the medical field for diagnosis. This is
done using medical images and past data labelled with disease conditions.
With such a process, the machine can identify a disease for new patients.
o Fraud Detection - Supervised Learning classification algorithms are used for identifying
fraud transactions, fraud customers, etc. It is done by using historic data to identify the
patterns that can lead to possible fraud.
o Spam detection - In spam detection & filtering, classification algorithms are used. These
algorithms classify an email as spam or not spam. The spam emails are sent to the spam
folder.
o Speech Recognition - Supervised learning algorithms are also used in speech
recognition. The algorithm is trained with voice data, and various identifications can be
done using the same, such as voice-activated passwords, voice commands, etc.

2. Unsupervised Machine Learning


Unsupervised learning is different from the Supervised learning technique; as its name
suggests, there is no need for supervision. It means, in unsupervised machine learning,
the machine is trained using the unlabeled dataset, and the machine predicts the output
without any supervision.

In unsupervised learning, the models are trained with the data that is neither classified
nor labelled, and the model acts on that data without any supervision.

The main aim of the unsupervised learning algorithm is to group or categorize the
unsorted dataset according to similarities, patterns, and differences. Machines
are instructed to find the hidden patterns in the input dataset.

Let's take an example to understand it more precisely: suppose there is a basket of
fruit images, and we input it into the machine learning model. The images are totally
unknown to the model, and the task of the machine is to find the patterns and
categories of the objects.

So, the machine will discover its own patterns and differences, such as colour differences
and shape differences, and predict the output when it is tested with the test dataset.
Categories of Unsupervised Machine Learning
Unsupervised Learning can be further classified into two types, which are given below:

o Clustering
o Association

1) Clustering

The clustering technique is used when we want to find the inherent groups from the
data. It is a way to group the objects into a cluster such that the objects with the most
similarities remain in one group and have fewer or no similarities with the objects of
other groups. An example of the clustering algorithm is grouping the customers by their
purchasing behaviour.


Some of the popular clustering algorithms are given below:

o K-Means Clustering algorithm


o Mean-shift algorithm
o DBSCAN Algorithm
o Principal Component Analysis
o Independent Component Analysis

2) Association

Association rule learning is an unsupervised learning technique, which finds interesting


relations among variables within a large dataset. The main aim of this learning algorithm
is to find the dependency of one data item on another data item and map those
variables accordingly so that it can generate maximum profit. This algorithm is mainly
applied in Market Basket analysis, Web usage mining, continuous production, etc.

Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat, FP-
growth algorithm.
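As a rough illustration of the idea (not the Apriori algorithm itself), the sketch below computes the support and confidence of one rule on hypothetical transactions:

```python
# Hand-rolled association-rule measures on toy market-basket data.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"}, {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"}, {"bread", "milk", "diapers", "beer"},
]

pair_counts = Counter()
for t in transactions:
    pair_counts.update(combinations(sorted(t), 2))   # count item pairs

n = len(transactions)
support = pair_counts[("beer", "diapers")] / n                    # P(A and B)
confidence = pair_counts[("beer", "diapers")] / sum(
    1 for t in transactions if "diapers" in t)                    # P(B | A)
print(f"support={support:.2f}, confidence={confidence:.2f}")      # 0.75, 1.00
```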
Advantages and Disadvantages of Unsupervised
Learning Algorithm
Advantages:

o These algorithms can be used for more complicated tasks than supervised ones,
because they work on unlabeled datasets.
o Unsupervised algorithms are preferable for various tasks, as obtaining an unlabeled
dataset is easier than obtaining a labelled one.

Disadvantages:

o The output of an unsupervised algorithm can be less accurate, as the dataset is not
labelled and the algorithms are not trained on the exact output beforehand.
o Working with unsupervised learning is more difficult, as it works with unlabelled
data that does not map to an output.

Applications of Unsupervised Learning


o Network Analysis: Unsupervised learning is used in document network analysis of
text data, e.g. identifying plagiarism and copyright issues in scholarly articles.
o Recommendation Systems: Recommendation systems widely use unsupervised
learning techniques for building recommendation applications for different web
applications and e-commerce websites.
o Anomaly Detection: Anomaly detection is a popular application of unsupervised
learning, which can identify unusual data points within the dataset. It is used to discover
fraudulent transactions.
o Singular Value Decomposition: Singular Value Decomposition or SVD is used to extract
particular information from the database. For example, extracting information of each
user located at a particular location.

3. Semi-Supervised Learning
Semi-Supervised learning is a type of Machine Learning algorithm that lies
between Supervised and Unsupervised machine learning. It represents the
intermediate ground between Supervised (With Labelled training data) and
Unsupervised learning (with no labelled training data) algorithms and uses the
combination of labelled and unlabeled datasets during the training period.


Although semi-supervised learning is the middle ground between supervised and
unsupervised learning and operates on data that includes a few labels, the data mostly
consists of unlabeled examples. Labels are costly, but for corporate purposes a few
labels may be available. This makes it distinct from supervised and unsupervised
learning, which are defined by the presence or absence of labels.

The concept of semi-supervised learning was introduced to overcome the drawbacks of
supervised and unsupervised learning algorithms. The main aim
of semi-supervised learning is to effectively use all the available data, rather than only the
labelled data as in supervised learning. Initially, similar data is clustered with an
unsupervised learning algorithm, which then helps to label the unlabeled data.
This is done because labelled data is comparatively more expensive to acquire
than unlabeled data. A minimal sketch of the setting follows.
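This sketch uses scikit-learn's SelfTrainingClassifier, a pseudo-labelling approach to the same setting (rather than the clustering approach described above); the dataset and the fraction of hidden labels are assumptions made for illustration:

```python
# Most labels are hidden (marked -1); the wrapper pseudo-labels them
# from the few labelled samples and retrains iteratively.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.8] = -1    # hide ~80% of the labels

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)                     # uses labelled + unlabeled data
print("accuracy on all data:", model.score(X, y))
```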

We can picture these algorithms with an example. Supervised learning is a
student under the supervision of an instructor at home and at college. If that
student analyses the same concept himself without any help from the instructor, it
comes under unsupervised learning. Under semi-supervised learning, the student has to
revise the concept himself after first analysing it under the guidance of an instructor at
college.

Advantages and Disadvantages of Semi-supervised Learning
Advantages:

o The algorithm is simple and easy to understand.
o It is highly efficient.
o It helps to address the drawbacks of supervised and unsupervised learning algorithms.

Disadvantages:

o Iteration results may not be stable.


o We cannot apply these algorithms to network-level data.
o Accuracy is low.

4. Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent
(a software component) automatically explores its surroundings by hit and trial:
taking actions, learning from experience, and improving its performance. The agent
is rewarded for each good action and punished for each bad action; hence the
goal of a reinforcement learning agent is to maximize the rewards.

In reinforcement learning there is no labelled data, unlike supervised learning, and agents
learn from their experiences only.


The reinforcement learning process is similar to that of a human being; for example, a child
learns various things through experience in his day-to-day life. An example of reinforcement
learning is playing a game, where the game is the environment, the agent's moves at
each step define states, and the goal of the agent is to get a high score. The agent receives
feedback in terms of punishments and rewards.

Due to its way of working, reinforcement learning is employed in different fields such
as game theory, operations research, information theory, and multi-agent systems.

A reinforcement learning problem can be formalized using a Markov Decision
Process (MDP). In an MDP, the agent constantly interacts with the environment and
performs actions; at each action, the environment responds and generates a new state.
A minimal tabular Q-learning sketch for such a problem follows.
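The sketch below uses a hypothetical 5-state corridor MDP with assumed learning parameters: the agent is rewarded only for reaching the goal state, and the learned greedy policy is to always move right:

```python
# Tabular Q-learning on a 5-state corridor: states 0..4, goal at 4.
import random

n_states, goal = 5, 4
Q = [[0.0, 0.0] for _ in range(n_states)]    # Q[state][action]; 0=left, 1=right
alpha, gamma, epsilon = 0.5, 0.9, 0.3        # learning rate, discount, exploration

for _ in range(500):                         # training episodes
    s = 0
    while s != goal:
        if random.random() < epsilon:
            a = random.randrange(2)                    # explore
        else:
            a = 0 if Q[s][0] >= Q[s][1] else 1         # exploit
        s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s2 == goal else 0.0                 # reward only at the goal
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])   # Bellman update
        s = s2

# Greedy policy per state: 1 everywhere, i.e. "always move right".
print([0 if Q[s][0] >= Q[s][1] else 1 for s in range(n_states - 1)])
```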

Categories of Reinforcement Learning


Reinforcement learning is categorized mainly into two types of methods/algorithms:

o Positive Reinforcement Learning: Positive reinforcement learning means increasing
the tendency that the required behaviour will occur again by adding something. It
strengthens the agent's behaviour and impacts it positively.
o Negative Reinforcement Learning: Negative reinforcement learning works exactly
opposite to the positive RL. It increases the tendency that the specific behaviour would
occur again by avoiding the negative condition.
Real-world Use cases of Reinforcement Learning
o Video Games:
RL algorithms are very popular in gaming applications, where they are used to attain
super-human performance. Some popular systems that use RL algorithms are AlphaGO
and AlphaGO Zero.
o Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed how
to use RL in a computer to automatically learn to allocate and schedule resources
among waiting jobs, in order to minimize average job slowdown.
o Robotics:
RL is widely being used in Robotics applications. Robots are used in the industrial and
manufacturing area, and these robots are made more powerful with reinforcement
learning. There are different industries that have their vision of building intelligent robots
using AI and Machine learning technology.
o Text Mining
Text mining, one of the great applications of NLP, is now being implemented with the
help of Reinforcement Learning by Salesforce.

Advantages and Disadvantages of Reinforcement Learning
Advantages

o It helps in solving complex real-world problems that are difficult to solve with
general techniques.
o The learning model of RL is similar to human learning; hence very accurate
results can be obtained.
o It helps in achieving long-term results.

Disadvantages

o RL algorithms are not preferred for simple problems.


o RL algorithms require huge data and computations.
o Too much reinforcement learning can lead to an overload of states which can weaken
the results.

The curse of dimensionality limits reinforcement learning for real physical systems.

Support Vector Machine Algorithm


Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point
in the correct category in the future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed a Support
Vector Machine. Picture two different categories in a plane, classified by a decision
boundary, or hyperplane, between them.
Example: SVM can be understood with the example we used for the KNN
classifier. Suppose we see a strange cat that also has some features of dogs; if we
want a model that can accurately identify whether it is a cat or a dog, such a model can
be created using the SVM algorithm. We first train our model with lots of images
of cats and dogs so that it can learn their different features, and then
we test it with this strange creature. The support vectors create a decision boundary
between the two classes of data (cat and dog) and pick out the extreme cases
(support vectors) of cats and dogs. On the basis of the support vectors, the model
will classify the creature as a cat.
SVM algorithm can be used for Face detection, image classification, text
categorization, etc.

Types of SVM
SVM can be of two types:

ADVERTISEMENT

o Linear SVM: Linear SVM is used for linearly separable data: if a dataset can
be classified into two classes using a single straight line, the data is termed
linearly separable, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data: if a
dataset cannot be classified using a straight line, the data is termed non-linear,
and the classifier used is called a Non-linear SVM classifier.

Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane: There can be multiple lines/decision boundaries that segregate the classes in
n-dimensional space, but we need to find the best decision boundary for
classifying the data points. This best boundary is known as the hyperplane of the SVM.
The dimension of the hyperplane depends on the number of features in the dataset: with
2 features, the hyperplane is a straight line; with 3 features, it is a 2-dimensional plane.

We always create the hyperplane with the maximum margin, i.e. the maximum
distance between the hyperplane and the nearest data points.

Support Vectors:

The data points or vectors that are closest to the hyperplane and affect its position are
termed support vectors. Since these vectors support the hyperplane, they are
called support vectors.

Linear SVM:

The working of the SVM algorithm can be understood using an example. Suppose we
have a dataset with two tags (green and blue) and two features, x1 and x2. We want a
classifier that can classify a pair (x1, x2) of coordinates as either green or blue.

Since this is a 2-D space, we can separate the two classes with just a straight line,
but multiple lines can separate them. The SVM algorithm helps find the best line, or
decision boundary; this best boundary or region is called a hyperplane. The SVM
algorithm finds the points of both classes that are closest to the line; these points are
called support vectors. The distance between the vectors and the hyperplane is called
the margin, and the goal of SVM is to maximize this margin. The hyperplane with the
maximum margin is called the optimal hyperplane. A minimal sketch follows.
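This is a small linear-SVM sketch on assumed toy 2-D points, using scikit-learn to fit the classifier, inspect the support vectors, and classify a new point:

```python
# Fit a linear SVM to two well-separated groups of 2-D points.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [6, 6]])
y = np.array([0, 0, 0, 1, 1, 1])             # two tags, e.g. blue / green

clf = SVC(kernel="linear").fit(X, y)
print("support vectors:\n", clf.support_vectors_)   # the extreme points
print("prediction for (3, 3):", clf.predict([[3, 3]]))
```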
Non-Linear SVM:

If data is linearly arranged, we can separate it with a straight line, but for non-
linear data we cannot draw a single straight line.

To separate such data points, we need to add one more dimension. For linear data
we used the two dimensions x and y, so for non-linear data we add a third
dimension z, calculated as:

z = x² + y²

With the third dimension added, the two groups become separable by a plane
parallel to the x-axis. Converting that boundary back to 2-D space with z = 1 gives a
circle of radius 1: hence, for non-linear data of this kind, the decision boundary is a
circumference.
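A small sketch of this lifting step on assumed ring-shaped toy data: after computing z = x² + y², a straight cut in the lifted space separates the two rings perfectly.

```python
# Points on an inner and an outer ring are not linearly separable in
# 2-D, but a linear boundary on z = x^2 + y^2 separates them.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 100)
radii = np.concatenate([np.full(50, 1.0), np.full(50, 3.0)])   # two rings
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
y = np.concatenate([np.zeros(50), np.ones(50)])

z = (X ** 2).sum(axis=1)                       # the added dimension
clf = SVC(kernel="linear").fit(z.reshape(-1, 1), y)
print("accuracy in the lifted space:", clf.score(z.reshape(-1, 1), y))  # 1.0
```

In practice, a non-linear kernel such as RBF performs a mapping like this implicitly (the kernel trick), so the extra feature never has to be computed by hand.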
Ensemble means 'a collection of things', and in Machine Learning
terminology, ensemble learning refers to the approach of combining multiple
ML models to produce a more accurate and robust prediction compared to
any individual model. It implements an ensemble of fast algorithms
(classifiers), such as decision trees, and allows them to vote.

 The basic idea behind ensemble learning is to leverage the wisdom of
the crowd by aggregating the predictions of multiple models, each of
which may have its own strengths and weaknesses. This can lead to
improved performance and generalization.
 Ensemble learning can be thought of as compensation for weak learning
algorithms. Ensembles are computationally more expensive than a single model,
but they are more effective than a single non-ensemble model that has gone
through a lot of training. Below we look at how ensembling works, the common
ensemble classifiers, and some algorithms (such as Random Forest and XGBoost)
that illustrate their uses in the technical world.
 Several individual base models (experts) are fitted to the same
data, and an aggregation of their outputs is produced, on which the final
decision is based. These base models can be machine learning algorithms
such as decision trees (most commonly used), linear models, support vector
machines (SVMs), neural networks, or any other model that is capable of
making predictions.
 The most commonly used ensemble techniques include Bagging, used to
generate Random Forest algorithms, and Boosting, used to generate
algorithms such as AdaBoost and XGBoost.

Bagging Vs Boosting
We all use the decision tree technique in day-to-day life to make decisions, and
organizations use supervised machine learning techniques like decision trees to
make better decisions and to generate more surplus and profit.

Ensemble methods combine several decision trees to deliver better predictive results
than utilizing a single decision tree. The primary principle behind the ensemble
model is that a group of weak learners come together to form a strong learner.

There are two techniques, described below, that are used to perform ensemble decision trees.
Bagging
Bagging is used when our objective is to reduce the variance of a decision tree. Here the
concept is to create several subsets of data from the training sample, chosen
randomly with replacement. Each collection of subset data is then used to train its own
decision tree, so we end up with an ensemble of various models. The average of all
the predictions from the numerous trees is used, which is more robust than a single
decision tree.

Random Forest is an extension of bagging. It takes one additional step: besides taking a
random subset of data, it also makes a random selection of features, rather than using
all features, to grow the trees. When we have numerous random trees, it is called a
Random Forest.

The following steps are taken to implement a Random Forest (see the sketch after these steps):

o Consider X observations and Y features in the training data set. First, a sample from the
training data set is taken randomly with replacement.
o Each tree is grown to its largest extent.
o The given steps are repeated, and the prediction is given based on the aggregated
predictions from the n trees.
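A minimal Random Forest sketch (the Iris dataset is assumed purely for illustration): each tree is trained on a bootstrap sample, with a random subset of features considered at each split.

```python
# An ensemble of bagged decision trees with random feature selection.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)            # each tree sees a bootstrap sample
print("test accuracy:", forest.score(X_test, y_test))
```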

Advantages of using the Random Forest technique:

o It handles higher-dimensional data sets very well.
o It handles missing values and maintains accuracy for missing data.

Disadvantages of using the Random Forest technique:

Since the final prediction is based on the mean of the predictions from the subset trees,
it won't give precise values for regression models.

Boosting:
Boosting is another ensemble procedure for making a collection of predictors. Here
we fit consecutive trees, usually on random samples, and at each step the objective
is to reduce the net error from the prior trees.
If a given input is misclassified by one hypothesis, its weight is increased so that the
next hypothesis is more likely to classify it correctly; combining the entire set
at the end converts weak learners into better-performing models.

Gradient Boosting is an extension of the boosting procedure.

1. Gradient Boosting = Gradient Descent + Boosting

It utilizes a gradient descent algorithm that can optimize any differentiable loss function.
An ensemble of trees is constructed one by one, and the individual trees are summed
sequentially. Each next tree tries to reduce the residual loss (the difference between actual
and predicted values) left by the trees before it, as the sketch below illustrates.
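A minimal gradient-boosting sketch on assumed synthetic regression data: each new tree fits the residual error left by the previous trees.

```python
# Boosted regression trees fitted to a noisy sine curve.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.1, size=200)

gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1)
gbr.fit(X, y)                              # trees added one at a time
print("prediction at x=1.0:", gbr.predict([[1.0]]))   # close to sin(1) ~ 0.84
```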

Advantages of using Gradient Boosting methods:

o It supports different loss functions.


o It works well with interactions.

Disadvantages of using Gradient Boosting methods:

o It requires cautious tuning of different hyper-parameters.

Difference between Bagging and Boosting:

o Bagging: various training data subsets are randomly drawn with replacement from the
whole training dataset. Boosting: each new subset contains the components that were
misclassified by previous models.
o Bagging attempts to tackle the over-fitting issue; boosting tries to reduce bias.
o If the classifier is unstable (high variance), we apply bagging; if the classifier is steady
and straightforward (high bias), we apply boosting.
o Bagging: every model receives an equal weight. Boosting: models are weighted by
their performance.
o Bagging's objective is to decrease variance, not bias; boosting's objective is to decrease
bias, not variance.
o Bagging is the easiest way of combining predictions that belong to the same type;
boosting is a way of combining predictions that belong to different types.
o Bagging: every model is constructed independently. Boosting: new models are affected
by the performance of previously developed models.

Clustering in Machine Learning



What is Clustering?
The task of grouping data points based on their similarity with each other is called
Clustering or Cluster Analysis.

This method is defined under the branch of Unsupervised Learning, which aims at
gaining insights from unlabelled data points, that is, unlike supervised learning we
don’t have a target variable.
Clustering aims at forming groups of homogeneous data points from a heterogeneous
dataset. It evaluates similarity based on a metric like Euclidean distance, Cosine
similarity, Manhattan distance, etc., and then groups the points with the highest
similarity scores together.
For example, when plotted, data points often clearly form a few circular
clusters on the basis of distance.
It is not necessary, though, that the clusters formed are circular in shape. The shape
of clusters can be arbitrary, and there are many algorithms that work well at detecting
arbitrarily shaped clusters.

Types of Clustering
Broadly speaking, there are 2 types of clustering that can be performed to group
similar data points:
 Hard Clustering: In this type of clustering, each data point either belongs to a cluster
completely or not at all. For example, say there are 4 data points and we have to
cluster them into 2 clusters; each data point will then belong either to cluster 1 or to
cluster 2:

Data Point → Cluster
A → C1
B → C2
C → C2
D → C1

 Soft Clustering: In this type of clustering, instead of assigning each data point
to a single cluster, a probability or likelihood of the point belonging to each cluster is
evaluated. For example, say there are 4 data points and we have to cluster
them into 2 clusters; we then evaluate, for every data point, a probability of it
belonging to each of the two clusters:

Data Point → Probability of C1, Probability of C2
A → 0.91, 0.09
B → 0.3, 0.7
C → 0.17, 0.83
D → 1, 0
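A soft-clustering sketch mirroring the table above (toy points assumed): a Gaussian Mixture Model returns a probability per cluster for every point.

```python
# Two tight groups of points; each row of predict_proba gives that
# point's probability of belonging to each cluster.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [7.8, 8.2]])
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba(X).round(2))    # one row per point: P(C1), P(C2)
```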

Uses of Clustering
Now before we begin with types of clustering algorithms, we will go through the use
cases of Clustering algorithms. Clustering algorithms are majorly used for:
 Market Segmentation – Businesses use clustering to group their customers and use
targeted advertisements to attract more audience.
 Market Basket Analysis – Shop owners analyse their sales and figure out which
items are mostly bought together by customers. For example, according to a
study in the USA, diapers and beer were usually bought together by fathers.
 Social Network Analysis – Social media sites use your data to understand your
browsing behaviour and provide you with targeted friend recommendations or
content recommendations.
 Medical Imaging – Doctors use Clustering to find out diseased areas in diagnostic
images like X-rays.
 Anomaly Detection – Clustering can be used to find outliers in a real-time data
stream or to flag fraudulent transactions.

 Simplify working with large datasets – Each cluster is given a cluster ID after
clustering is complete; you can then reduce a record's whole feature set to
its cluster ID. Clustering is effective when it can represent a complicated case with
a straightforward cluster ID, and by the same principle clustering can make
complex datasets simpler.
There are many more use cases for clustering, but these are some of the major and
common ones.

Types of Clustering Algorithms


At the surface level, clustering helps in the analysis of unstructured data. Graphing,
the shortest distance, and the density of the data points are a few of the elements that
influence cluster formation. Clustering is the process of determining how related the
objects are based on a metric called the similarity measure. Similarity metrics are
easier to define for smaller sets of features; it gets harder to create similarity measures
as the number of features increases. Depending on the type of clustering algorithm
being utilized in data mining, several techniques are employed to group the data from
the datasets. The various types of clustering algorithms, each described briefly
below, are:

1. Partitioning Clustering (centroid-based methods)
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering (connectivity-based methods)
5. Fuzzy Clustering

Partitioning Clustering
It is a type of clustering that divides the data into non-hierarchical groups. It is also
known as the centroid-based method. The most common example of partitioning
clustering is the K-Means Clustering algorithm.

In this type, the dataset is divided into a set of k groups, where k defines the
number of pre-defined groups. The cluster centers are created in such a way that each
data point is closer to its own cluster's centroid than to any other cluster's centroid,
as in the k-means sketch below.
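A minimal k-means sketch on assumed toy 2-D points:

```python
# Partition six points into k=2 centroid-based groups.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("labels:", km.labels_)             # which group each point fell into
print("centers:", km.cluster_centers_)   # roughly (1, 2) and (10, 2)
```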
Density-Based Clustering
The density-based clustering method connects highly dense areas into clusters, and
arbitrarily shaped distributions are formed as long as the dense regions can be
connected. The algorithm does this by identifying different clusters in the dataset and
connecting the areas of high density into clusters. The dense areas in data space are
separated from each other by sparser areas.

These algorithms can face difficulty in clustering the data points if the dataset has
varying densities and high dimensions.
Distribution Model-Based Clustering
In the distribution model-based clustering method, the data is divided based on the
probability that each data point belongs to a particular distribution. The grouping is done
by assuming some distributions, commonly the Gaussian distribution.

An example of this type is the Expectation-Maximization clustering algorithm, which
uses Gaussian Mixture Models (GMM).
Hierarchical Clustering
Hierarchical clustering can be used as an alternative to partitional clustering, as
there is no requirement to pre-specify the number of clusters to be created. In this
technique, the dataset is divided into clusters to create a tree-like structure, which is also
called a dendrogram. The observations, or any number of clusters, can be selected by
cutting the tree at the correct level, as in the sketch below. The most common example
of this method is the Agglomerative Hierarchical algorithm.
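A minimal hierarchical-clustering sketch (toy points assumed), building the linkage tree behind a dendrogram with SciPy and cutting it into two flat clusters:

```python
# Agglomerative (bottom-up) clustering via SciPy's linkage tree.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
Z = linkage(X, method="ward")            # the tree behind a dendrogram
labels = fcluster(Z, t=2, criterion="maxclust")   # cut into 2 clusters
print(labels)                            # e.g. [1 1 1 2 2 2]
```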

Fuzzy Clustering
Fuzzy clustering is a type of soft method in which a data object may belong to more
than one group or cluster. Each data point has a set of membership coefficients that
express its degree of membership in each cluster. The Fuzzy C-means algorithm is
the example of this type of clustering; it is sometimes also known as the Fuzzy k-means
algorithm.

Clustering Algorithms
Clustering algorithms can be divided based on the models explained above. Many
different clustering algorithms have been published, but only a few are commonly used.
The choice of algorithm depends on the kind of data we are using: some algorithms
need an estimate of the number of clusters in the given dataset, whereas others need to
find the minimum distance between observations of the dataset.

Here we discuss the most popular clustering algorithms that are widely used in
machine learning:

1. K-Means algorithm: The k-means algorithm is one of the most popular clustering
algorithms. It classifies the dataset by dividing the samples into different clusters of
equal variances. The number of clusters must be specified in this algorithm. It is fast with
fewer computations required, with the linear complexity of O(n).
2. Mean-shift algorithm: Mean-shift algorithm tries to find the dense areas in the smooth
density of data points. It is an example of a centroid-based model, that works on
updating the candidates for centroid to be the center of the points within a given region.
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications
with Noise. It is an example of a density-based model similar to the mean-shift, but with
some remarkable advantages. In this algorithm, the areas of high density are separated
by the areas of low density. Because of this, the clusters can be found in any arbitrary
shape.
4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an
alternative to the k-means algorithm, or for cases where k-means fails. In
GMM, it is assumed that the data points are Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The agglomerative hierarchical algorithm
performs bottom-up hierarchical clustering. Each data point is treated as a
single cluster at the outset, and clusters are then successively merged. The cluster
hierarchy can be represented as a tree structure.
6. Affinity Propagation: It is different from other clustering algorithms in that it does not
require the number of clusters to be specified. In it, data points exchange messages in
pairs until convergence. Its O(N²T) time complexity is the main drawback of this
algorithm.
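A minimal DBSCAN sketch on scikit-learn's two-moons toy generator, an arbitrarily shaped dataset that centroid-based methods struggle with (the eps and min_samples values are assumptions for this data):

```python
# Density-based clustering of two interleaving half-moons.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
db = DBSCAN(eps=0.3, min_samples=5).fit(X)
print("clusters found:", len(set(db.labels_) - {-1}))   # -1 marks noise points
```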

Applications of Clustering
Below are some commonly known applications of clustering technique in Machine
Learning:
o In Identification of Cancer Cells: The clustering algorithms are widely used for the
identification of cancerous cells. It divides the cancerous and non-cancerous data sets
into different groups.
o In Search Engines: Search engines also work on the clustering technique. The search
result appears based on the closest object to the search query. It does it by grouping
similar data objects in one group that is far from the other dissimilar objects. The
accurate result of a query depends on the quality of the clustering algorithm used.
o Customer Segmentation: It is used in market research to segment the customers based
on their choice and preferences.
o In Biology: It is used in the biology stream to classify different species of plants and
animals using the image recognition technique.
o In Land Use: The clustering technique is used to identify areas of similar land use
in a GIS database. This can be very useful for determining the purpose for which a
particular piece of land is most suitable.

Applications of Clustering in different fields:
1. Marketing: It can be used to characterize & discover customer segments for
marketing purposes.
2. Biology: It can be used for classification among different species of plants and
animals.
3. Libraries: It is used in clustering different books on the basis of topics and
information.
4. Insurance: It is used to acknowledge the customers, their policies and identifying
the frauds.
5. City Planning: It is used to make groups of houses and to study their values based
on their geographical locations and other factors present.
6. Earthquake studies: By learning the earthquake-affected areas we can determine
the dangerous zones.
7. Image Processing: Clustering can be used to group similar images together,
classify images based on content, and identify patterns in image data.
8. Genetics: Clustering is used to group genes that have similar expression patterns
and identify gene networks that work together in biological processes.
9. Finance: Clustering is used to identify market segments based on customer
behavior, identify patterns in stock market data, and analyze risk in investment
portfolios.
10. Customer Service: Clustering is used to group customer inquiries and complaints
into categories, identify common issues, and develop targeted solutions.
11. Manufacturing: Clustering is used to group similar products together, optimize
production processes, and identify defects in manufacturing processes.
12. Medical diagnosis: Clustering is used to group patients with similar symptoms or
diseases, which helps in making accurate diagnoses and identifying effective
treatments.
13. Fraud detection: Clustering is used to identify suspicious patterns or anomalies in
financial transactions, which can help in detecting fraud or other financial crimes.
14. Traffic analysis: Clustering is used to group similar patterns of traffic data, such
as peak hours, routes, and speeds, which can help in improving transportation
planning and infrastructure.
15. Social network analysis: Clustering is used to identify communities or groups
within social networks, which can help in understanding social behavior,
influence, and trends.
16. Cybersecurity: Clustering is used to group similar patterns of network traffic or
system behavior, which can help in detecting and preventing cyberattacks.
17. Climate analysis: Clustering is used to group similar patterns of climate data,
such as temperature, precipitation, and wind, which can help in understanding
climate change and its impact on the environment.
18. Sports analysis: Clustering is used to group similar patterns of player or team
performance data, which can help in analyzing player or team strengths and
weaknesses and making strategic decisions.
19. Crime analysis: Clustering is used to group similar patterns of crime data, such as
location, time, and type, which can help in identifying crime hotspots, predicting
future crime trends, and improving crime prevention strategies.
