Machine Learning Tutorial


This Machine Learning tutorial provides basic and advanced concepts of machine learning, and it is designed for students and working professionals.

Machine learning is a growing technology that enables computers to learn automatically from past data. Machine learning uses various algorithms to build mathematical models and make predictions using historical data or information. Currently, it is being used for tasks such as image recognition, speech recognition, email filtering, Facebook auto-tagging, recommender systems, and many more.

This machine learning tutorial gives you an introduction to machine learning along with a wide range of machine learning techniques such as Supervised, Unsupervised, and Reinforcement learning. You will learn about regression and classification models, clustering methods, hidden Markov models, and various sequential models.

What is Machine Learning


In the real world, we are surrounded by humans who can learn everything from their experiences, and we have computers or machines that work on our instructions. But can a machine also learn from experience or past data like a human does? This is where Machine Learning comes in.



Machine Learning is a subset of artificial intelligence that is mainly concerned with the development of algorithms that allow a computer to learn from data and past experiences on its own. The term machine learning was first introduced by Arthur Samuel in 1959. We can define it in a summarized way as:

Machine learning enables a machine to automatically learn from data, improve performance from experience, and predict things without being explicitly programmed.

With the help of sample historical data, known as training data, machine learning algorithms build a mathematical model that helps in making predictions or decisions without being explicitly programmed. Machine learning brings computer science and statistics together to create predictive models. Machine learning constructs or uses algorithms that learn from historical data. The more information we provide, the better the performance.

A machine has the ability to learn if it can improve its performance by gaining
more data.

How does Machine Learning work


A Machine Learning system learns from historical data, builds prediction models, and whenever it receives new data, predicts the output for it. The accuracy of the predicted output depends on the amount of data, as a large amount of data helps to build a better model that predicts the output more accurately.
Suppose we have a complex problem for which we need to make predictions. Instead of writing code for it, we just feed the data to generic algorithms, and with the help of these algorithms, the machine builds the logic from the data and predicts the output. Machine learning has changed our way of thinking about such problems. In short, a machine learning algorithm takes historical data as input, builds a model from it, and uses that model to predict outputs for new data.
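
As a concrete illustration, the following is a minimal sketch of this workflow using scikit-learn. The feature names, numbers, and model choice are made up purely for demonstration:

from sklearn.tree import DecisionTreeClassifier

# Historical (training) data: [hours_studied, hours_slept] -> passed exam (1) or not (0)
# These values are hypothetical example data.
X_train = [[8, 7], [1, 4], [6, 8], [2, 5], [9, 6], [3, 3]]
y_train = [1, 0, 1, 0, 1, 0]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)          # build the model from past data

new_data = [[7, 6], [2, 8]]          # unseen inputs
print(model.predict(new_data))       # predicted outputs for the new data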

Features of Machine Learning:


o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is similar to data mining in that both deal with huge amounts of data.

Need for Machine Learning


The need for machine learning is increasing day by day. The reason is that machine learning can do tasks that are too complex for a person to implement directly. As humans, we have limitations: we cannot manually process huge amounts of data, so we need computer systems, and this is where machine learning makes things easy for us.

We can train machine learning algorithms by providing them with huge amounts of data and letting them explore the data, construct models, and predict the required output automatically. The performance of a machine learning algorithm depends on the amount of data, and it can be measured by a cost function. With the help of machine learning, we can save both time and money.

The importance of machine learning can be easily understood from its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions by Facebook, and so on. Various top companies such as Netflix and Amazon have built machine learning models that use vast amounts of data to analyze user interests and recommend products accordingly.

Following are some key points that show the importance of Machine Learning:
o Rapid increase in the production of data
o Solving complex problems that are difficult for a human
o Decision making in various sectors, including finance
o Finding hidden patterns and extracting useful information from data

Classification of Machine Learning


At a broad level, machine learning can be classified into three types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample labeled data to the machine learning system in order to train it, and on that basis, it predicts the output.

The system creates a model using labeled data to understand the datasets and learn about each data point. Once training and processing are done, we test the model by providing sample data to check whether it predicts the correct output.

The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, just as a student learns under the supervision of a teacher. An example of supervised learning is spam filtering.

Supervised learning can be further grouped into two categories of algorithms:

o Classification
o Regression
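
As an illustration of the regression category, a regression model fits a numeric relationship between inputs and outputs. Below is a minimal sketch using scikit-learn; the values are invented for illustration:

from sklearn.linear_model import LinearRegression

# Labeled training data: square metres -> price (hypothetical numbers)
X = [[30], [45], [60], [80], [100]]
y = [90000, 130000, 175000, 230000, 290000]

reg = LinearRegression().fit(X, y)
print(reg.predict([[70]]))   # predicted price for an unseen 70 square metre flat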

2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.

The machine is trained with a set of data that has not been labeled, classified, or categorized, and the algorithm has to act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or groups of objects with similar patterns.

In unsupervised learning, we don't have a predetermined result. The machine tries to find useful insights from the huge amount of data. It can be further classified into two categories of algorithms:

o Clustering
o Association
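
As an illustration of clustering, the short sketch below groups unlabeled 2-D points with k-means from scikit-learn; the points and the choice of two clusters are hypothetical:

from sklearn.cluster import KMeans

# Unlabeled data: each row is a point (x, y); no target values are given
X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # e.g. [0 0 0 1 1 1]: points grouped by similarity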

3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the maximum reward, and it improves its performance by doing so.

A robotic dog that automatically learns the movement of its legs is an example of reinforcement learning.
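
The reward-and-penalty loop can be made concrete with a toy tabular Q-learning sketch: an agent on a five-cell corridor gets a reward for reaching the rightmost cell and a small penalty for every other step, and learns which action (left or right) to prefer in each cell. The environment and all constants here are invented for illustration only:

import random

n_states, actions = 5, [0, 1]                 # action 0 = move left, 1 = move right
Q = [[0.0, 0.0] for _ in range(n_states)]     # one Q-value per (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(200):
    state = 0
    while state != n_states - 1:              # episode ends at the goal cell
        if random.random() < epsilon:
            a = random.choice(actions)        # explore
        else:
            a = 0 if Q[state][0] >= Q[state][1] else 1   # exploit best known action
        next_state = max(0, state - 1) if a == 0 else min(n_states - 1, state + 1)
        reward = 1.0 if next_state == n_states - 1 else -0.1
        # Q-learning update: move Q towards reward + discounted best future value
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

# Learned policy: in every non-goal cell the preferred action becomes 1 (move right)
print([0 if Q[s][0] >= Q[s][1] else 1 for s in range(n_states)])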

Note: We will learn about the above types of machine learning in detail in later chapters.

History of Machine Learning


A few decades ago (about 40-50 years back), machine learning was science fiction, but today it is part of our daily life. Machine learning is making day-to-day life easier, from self-driving cars to Amazon's virtual assistant "Alexa". However, the idea behind machine learning is quite old and has a long history. Below are some milestones in the history of machine learning:
The early history of Machine Learning (Pre-1940):

o 1834: In 1834, Charles Babbage, the father of the computer, conceived a device that could be programmed with punch cards. The machine was never built, but all modern computers rely on its logical structure.
o 1936: In 1936, Alan Turing gave a theory of how a machine can determine and execute a set of instructions.

The era of stored program computers:

o 1940s: During this decade, "ENIAC", the first electronic general-purpose computer, was built. After that, stored-program computers such as EDSAC in 1949 and EDVAC in 1951 were developed.
o 1943: In 1943, a neural network was modeled with an electrical circuit. Around 1950, scientists started applying this idea and analyzing how human neurons might work.

Computing machinery and intelligence:

o 1950: In 1950, Alan Turing published a seminal paper, "Computing Machinery and Intelligence," on the topic of artificial intelligence. In his paper, he asked, "Can machines think?"

Machine intelligence in Games:

o 1952: Arthur Samuel, a pioneer of machine learning, created a program that helped an IBM computer play checkers. The more it played, the better it performed.
o 1959: In 1959, the term "Machine Learning" was first coined by Arthur Samuel.

The first "AI" winter:

o The period from 1974 to 1980 was a tough time for AI and ML researchers, and this period is known as the AI winter.
o During this period, machine translation failed, people lost interest in AI, and government funding for research was reduced.

Machine Learning from theory to reality

o 1959: In 1959, the first neural network was applied to a real-world problem, removing echoes over phone lines using an adaptive filter.
o 1985: In 1985, Terry Sejnowski and Charles Rosenberg created the neural network NETtalk, which was able to teach itself to correctly pronounce 20,000 words in one week.
o 1997: IBM's Deep Blue intelligent computer won a chess match against the world champion Garry Kasparov, becoming the first computer to beat a human world chess champion.

Machine Learning in the 21st century

o 2006: In 2006, computer scientist Geoffrey Hinton gave neural net research the new name "deep learning," and nowadays it has become one of the most trending technologies.
o 2012: In 2012, Google created a deep neural network that learned to recognize images of humans and cats in YouTube videos.
o 2014: In 2014, the chatbot "Eugene Goostman" passed the Turing Test. It was the first chatbot to convince 33% of the human judges that it was not a machine.
o 2014: DeepFace was a deep neural network created by Facebook, which was claimed to recognize a person with the same precision as a human.
o 2016: AlphaGo beat the world's number two Go player, Lee Sedol. In 2017, it beat Ke Jie, the number one player of the game.
o 2017: In 2017, Alphabet's Jigsaw team built an intelligent system that was able to learn about online trolling. It read millions of comments from different websites in order to learn to stop online trolling.

Machine Learning at present:


Machine learning has now made great advances in research, and it is present everywhere around us, for example in self-driving cars, Amazon Alexa, chatbots, recommender systems, and many more. It includes supervised, unsupervised, and reinforcement learning with algorithms such as clustering, classification, decision trees, SVMs, etc.

Modern machine learning models can be used for making various predictions,
including weather prediction, disease prediction, stock market analysis, etc.

Prerequisites
Before learning machine learning, you must have basic knowledge of the following so that you can easily understand the concepts of machine learning:

o Fundamental knowledge of probability and linear algebra.
o The ability to code in any computer language, especially in Python.
o Knowledge of calculus, especially derivatives of single-variable and multivariate functions.

Audience
Our Machine Learning tutorial is designed to help beginners and professionals.
Problems
We assure you that you will not find any difficulty while learning this Machine Learning tutorial. But if there is any mistake in this tutorial, kindly post the problem or error in the contact form so that we can improve it.

Applications of Machine learning


Machine learning is a buzzword in today's technology, and it is growing very rapidly day by day. We use machine learning in our daily life even without knowing it, for example in Google Maps, Google Assistant, Alexa, etc. Below are some of the most trending real-world applications of Machine Learning:

1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, digital images, etc. A popular use case of image recognition and face detection is the automatic friend tagging suggestion:

Facebook provides a feature of auto friend tagging suggestions. Whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion with names, and the technology behind this is machine learning's face detection and recognition algorithms.

It is based on the Facebook project named "DeepFace," which is responsible for face recognition and person identification in pictures.


2. Speech Recognition
When we use Google, we get the option to "Search by voice"; this comes under speech recognition, and it is a popular application of machine learning.

Speech recognition is the process of converting voice instructions into text, and it is also known as "speech to text" or "computer speech recognition." At present, machine learning algorithms are widely used in various speech recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.

3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.

It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily congested, with the help of two kinds of information:

o The real-time location of vehicles from the Google Maps app and sensors
o The average time taken on past days at the same time of day

Everyone who uses Google Maps is helping to make this app better. It takes information from users and sends it back to its database to improve performance.

4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies such as Amazon, Netflix, etc., for product recommendations to the user. Whenever we search for a product on Amazon, we start getting advertisements for the same product while surfing the internet in the same browser, and this is because of machine learning.

Google understands user interest using various machine learning algorithms and suggests products according to customer interest.

Similarly, when we use Netflix, we find recommendations for entertainment series, movies, etc., and this is also done with the help of machine learning.

5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars. Machine learning plays a significant role in self-driving cars. Tesla, the most popular car manufacturing company, is working on self-driving cars. It uses unsupervised learning methods to train the car models to detect people and objects while driving.

6. Email Spam and Malware Filtering:


Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We always receive important mail in our inbox, marked with the important symbol, and spam emails in our spam box, and the technology behind this is machine learning. Below are some spam filters used by Gmail:

o Content filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters

Some machine learning algorithms, such as the Multi-Layer Perceptron, decision trees, and the Naïve Bayes classifier, are used for email spam filtering and malware detection.
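
As an illustration of how a Naïve Bayes spam filter can be trained, the sketch below uses scikit-learn on a few invented example messages; the messages and labels are made up purely for demonstration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "cheap pills limited offer",
          "meeting agenda for tomorrow", "project report attached"]
labels = [1, 1, 0, 0]                      # 1 = spam, 0 = normal (hypothetical labels)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)       # turn text into word-count features

clf = MultinomialNB().fit(X, labels)
new_email = vectorizer.transform(["free offer, win now"])
print(clf.predict(new_email))              # e.g. [1] -> classified as spam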

7. Virtual Personal Assistant:


We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using voice instructions. These assistants can help us in various ways just through voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, etc.

These virtual assistants use machine learning algorithms as an important part of their working.

These assistants record our voice instructions, send them to a server in the cloud, decode them using ML algorithms, and act accordingly.

8. Online Fraud Detection:


Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, there are various ways a fraudulent transaction can take place, such as fake accounts, fake IDs, and stealing money in the middle of a transaction. To detect this, a feed-forward neural network helps us by checking whether a transaction is genuine or fraudulent.

For each genuine transaction, the output is converted into hash values, and these values become the input for the next round. Each genuine transaction follows a specific pattern that changes for a fraudulent transaction; hence the system detects it and makes our online transactions more secure.

9. Stock Market trading:


Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups and downs in shares, so machine learning's long short-term memory (LSTM) neural networks are used to predict stock market trends.

10. Medical Diagnosis:


In medical science, machine learning is used for disease diagnosis. With this, medical technology is growing very fast and is able to build 3D models that can predict the exact position of lesions in the brain.

It helps in finding brain tumors and other brain-related diseases easily.

11. Automatic Language Translation:


Nowadays, if we visit a new place and are not aware of the language, it is not a problem at all, as machine learning helps us here too by converting text into languages we know. Google's GNMT (Google Neural Machine Translation) provides this feature; it is a neural machine translation system that translates text into our familiar language, and this is called automatic translation.

The technology behind automatic translation is a sequence-to-sequence learning algorithm, which is used together with image recognition to translate text from one language to another.

Machine learning Life cycle
Machine learning gives computer systems the ability to learn automatically without being explicitly programmed. But how does a machine learning system work? This can be described using the machine learning life cycle. The machine learning life cycle is a cyclic process for building an efficient machine learning project. The main purpose of the life cycle is to find a solution to the problem or project.

Machine learning life cycle involves seven major steps, which are given below:

o Gathering Data
o Data preparation
o Data Wrangling
o Analyse Data
o Train the model
o Test the model
o Deployment

The most important thing in the complete process is to understand the problem and to know its purpose. Therefore, before starting the life cycle, we need to understand the problem, because a good result depends on a good understanding of the problem.

In the complete life cycle process, to solve a problem, we create a machine learning system called a "model", and this model is created by "training" it. But to train a model we need data; hence, the life cycle starts with collecting data.


1. Gathering Data:
Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify and obtain all the data related to the problem.

In this step, we need to identify the different data sources, as data can be collected from various sources such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle. The quantity and quality of the collected data will determine the efficiency of the output. The more data there is, the more accurate the prediction will be.

This step includes the below tasks:

o Identify various data sources
o Collect data
o Integrate the data obtained from different sources

By performing the above tasks, we get a coherent set of data, also called a dataset. It will be used in further steps.

2. Data preparation
After collecting the data, we need to prepare it for further steps. Data preparation is
a step where we put our data into a suitable place and prepare it to use in our
machine learning training.

In this step, first we put all the data together and then randomize the ordering of the data.

This step can be further divided into two processes:

o Data exploration:
It is used to understand the nature of the data that we have to work with. We need to understand the characteristics, format, and quality of the data. A better understanding of the data leads to an effective outcome. In this step we find correlations, general trends, and outliers.
o Data pre-processing:
The next step is preprocessing the data for analysis.

3. Data Wrangling
Data wrangling is the process of cleaning and converting raw data into a usable format. It is the process of cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. It is one of the most important steps of the complete process. Cleaning the data is required to address quality issues.

The data we have collected is not necessarily all useful, as some of it may not be relevant. In real-world applications, collected data may have various issues, including:

o Missing values
o Duplicate data
o Invalid data
o Noise

So, we use various filtering techniques to clean the data.

It is mandatory to detect and remove the above issues because they can negatively affect the quality of the outcome.

4. Data Analysis
Now the cleaned and prepared data is passed on to the analysis step. This step involves:

o Selection of analytical techniques
o Building models
o Reviewing the result

The aim of this step is to build a machine learning model to analyze the data using various analytical techniques and review the outcome. It starts with determining the type of problem, where we select machine learning techniques such as classification, regression, cluster analysis, association, etc.; we then build the model using the prepared data and evaluate it.

Hence, in this step, we take the data and use machine learning algorithms to build the model.

5. Train Model
The next step is to train the model. In this step we train our model to improve its performance and obtain a better outcome for the problem.

We use datasets to train the model using various machine learning algorithms. Training a model is required so that it can learn the various patterns, rules, and features.

6. Test Model
Once our machine learning model has been trained on a given dataset, we test the model. In this step, we check the accuracy of our model by providing a test dataset to it.

Testing the model determines the percentage accuracy of the model as per the requirements of the project or problem.
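
A minimal sketch of this train/test split and accuracy check with scikit-learn is shown below; the Iris toy dataset and logistic regression model are stand-ins chosen only for the example:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                      # stand-in dataset for the example
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # train step
accuracy = accuracy_score(y_test, model.predict(X_test))          # test step
print(f"Test accuracy: {accuracy:.2%}")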

7. Deployment
The last step of the machine learning life cycle is deployment, where we deploy the model in a real-world system.

If the above-prepared model produces accurate results as per our requirements with acceptable speed, then we deploy the model in the real system. But before deploying the project, we check whether it keeps improving its performance using the available data or not. The deployment phase is similar to making the final report for a project.

Installing Anaconda and Python


To learn machine learning, we will use the Python programming language in this tutorial. So, in order to use Python for machine learning, we need to install it on our computer system along with a compatible IDE (Integrated Development Environment).

In this topic, we will learn to install Python and an IDE with the help of the Anaconda distribution.

The Anaconda distribution is a free and open-source platform for the Python and R programming languages. It can be easily installed on any OS, such as Windows, Linux, or macOS. It provides more than 1500 Python/R data science packages suitable for developing machine learning and deep learning models.

The Anaconda distribution provides Python along with various IDEs such as Jupyter Notebook, Spyder, the Anaconda prompt, etc. Hence, it is a very convenient packaged solution that you can easily download and install on your computer. It will automatically install Python and some basic IDEs and libraries with it.


The steps below show the process of downloading and installing Anaconda and an IDE:

Step 1: Download Anaconda Python:

o To download Anaconda on your system, first open your favorite browser, type Download Anaconda Python, and click on the first link. Alternatively, you can download it directly from this link: https://www.anaconda.com/distribution/#download-section.
o After clicking the first link, you will reach the Anaconda download page.
o Since Anaconda is available for Windows, Linux, and macOS, you can download it as per your OS type. It offers Python 2.7 and Python 3.7 versions; the latest version is 3.7, so we will download the Python 3.7 version. After clicking the download option, it will start downloading on your computer.
Note: In this topic, we are downloading Anaconda for Windows; you can choose it as per your OS.

Step 2: Install Anaconda Python (Python 3.7 version):

o Once the download is complete, go to Downloads and double-click the ".exe" file (Anaconda3-2019.03-Windows-x86_64.exe) of Anaconda. It will open a setup window for the Anaconda installation; click on Next.
o It will open a License Agreement window; click on the "I Agree" option and move further.

o In the next window, you will get two options for installation. Select the first option (Just Me) and click on Next.
o Now you will get a window for the installation location; here, you can leave it as the default or change it by browsing to a location, and then click on Next.
o Now select the second option and click on Install.
o Once the installation is complete, click on Next.
o Now the installation is complete; tick the checkbox if you want to learn more about Anaconda and Anaconda Cloud. Click on Finish to end the process.
Note: Here, we will use the Spyder IDE to run Python programs.

Step 3: Open Anaconda Navigator

o After successful installation of Anaconda, use Anaconda Navigator to launch a Python IDE such as Spyder or Jupyter Notebook.
o To open Anaconda Navigator, press the Windows key, search for Anaconda Navigator, and click on it.
o After opening the Navigator, launch the Spyder IDE by clicking the Launch button below Spyder. It will install the Spyder IDE on your system.

Run your Python program in the Spyder IDE:

o Open the Spyder IDE.
o Write your first program and save it with the .py extension (a tiny example is given below).
o Run the program using the triangular Run button.
o You can check the program's output in the console pane at the bottom right.

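As a first sanity check, a program as small as the following is enough to confirm that Python and Spyder are working (the file name hello.py is chosen here purely for illustration):

# hello.py - a minimal first program to verify the installation
message = "Hello, Machine Learning!"
print(message)
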
Step 4: Close the Spyder IDE.

Difference between Artificial Intelligence and Machine Learning

Artificial intelligence and machine learning are parts of computer science that are correlated with each other. These two technologies are the most trending technologies used for creating intelligent systems.

Although these are two related technologies and people sometimes use them as synonyms for each other, they are still two different terms in various contexts.

On a broad level, we can differentiate both AI and ML as:

AI is a bigger concept to create intelligent machines that can simulate human thinking
capability and behavior, whereas, machine learning is an application or subset of AI that
allows machines to learn from data without being programmed explicitly.
Below are some main differences between AI and machine learning along with the
overview of Artificial intelligence and machine learning.


Artificial Intelligence
Artificial intelligence is a field of computer science which makes a computer system that can mimic human intelligence. It is comprised of two words, "artificial" and "intelligence", which together mean "a human-made thinking power." Hence we can define it as:

Artificial intelligence is a technology with which we can create intelligent systems that can simulate human intelligence.

An artificial intelligence system does not need to be pre-programmed; instead, it uses algorithms that can work with their own intelligence. It involves machine learning algorithms such as reinforcement learning algorithms and deep learning neural networks. AI is being used in many places, such as Siri, Google's AlphaGo, AI in chess playing, etc.

Based on capabilities, AI can be classified into three types:

o Weak AI
o General AI
o Strong AI

Currently, we are working with weak AI and general AI. The future of AI is strong AI, which is said to be more intelligent than humans.

Machine learning
Machine learning is about extracting knowledge from data. It can be defined as:

Machine learning is a subfield of artificial intelligence which enables machines to learn from past data or experiences without being explicitly programmed.

Machine learning enables a computer system to make predictions or take decisions using historical data without being explicitly programmed. Machine learning uses a massive amount of structured and semi-structured data so that a machine learning model can generate accurate results or make predictions based on that data.

Machine learning works on algorithms that learn on their own using historical data. It works only for specific domains: for example, if we create a machine learning model to detect pictures of dogs, it will only give results for dog images, and if we provide new data such as a cat image, it will not respond correctly. Machine learning is being used in various places, such as online recommender systems, Google search algorithms, email spam filters, Facebook auto friend tagging suggestions, etc.

It can be divided into three types:

o Supervised learning
o Reinforcement learning
o Unsupervised learning
Key differences between Artificial Intelligence (AI) and Machine Learning (ML):

o AI: Artificial intelligence is a technology which enables a machine to simulate human behavior. ML: Machine learning is a subset of AI which allows a machine to automatically learn from past data without being programmed explicitly.

o AI: The goal of AI is to make a smart computer system, like humans, to solve complex problems. ML: The goal of ML is to allow machines to learn from data so that they can give accurate output.

o AI: In AI, we make intelligent systems to perform any task like a human. ML: In ML, we teach machines with data to perform a particular task and give an accurate result.

o AI: Machine learning and deep learning are the two main subsets of AI. ML: Deep learning is a main subset of machine learning.

o AI: AI has a very wide range of scope. ML: Machine learning has a limited scope.

o AI: AI works to create an intelligent system which can perform various complex tasks. ML: Machine learning works to create machines that can perform only those specific tasks for which they are trained.

o AI: An AI system is concerned with maximizing the chances of success. ML: Machine learning is mainly concerned with accuracy and patterns.

o AI: The main applications of AI are Siri, customer support chatbots, expert systems, online game playing, intelligent humanoid robots, etc. ML: The main applications of machine learning are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc.

o AI: On the basis of capabilities, AI can be divided into three types: Weak AI, General AI, and Strong AI. ML: Machine learning can also be divided into three main types: supervised learning, unsupervised learning, and reinforcement learning.

o AI: AI includes learning, reasoning, and self-correction. ML: ML includes learning and self-correction when introduced to new data.

o AI: AI deals with structured, semi-structured, and unstructured data. ML: Machine learning deals with structured and semi-structured data.

How to get datasets for Machine Learning
The key to success in the field of machine learning, or to becoming a great data scientist, is to practice with different types of datasets. But discovering a suitable dataset for each kind of machine learning project is a difficult task. So, in this topic, we will provide details of the sources from which you can easily get a dataset suitable for your project.

Before knowing the sources of the machine learning dataset, let's discuss datasets.

What is a dataset?
A dataset is a collection of data in which the data is arranged in some order. A dataset can contain anything from a simple array to a database table. The table below shows an example of a dataset:

Country Age Salary Purchased

India 38 48000 No

France 43 45000 Yes

Germany 30 54000 No

France 48 65000 No

Germany 40 Yes

India 35 58000 Yes

A tabular dataset can be understood as a database table or matrix, where each column corresponds to a particular variable and each row corresponds to a record of the dataset. The most widely supported file type for a tabular dataset is the "comma-separated values" file, or CSV. But to store "tree-like" data, we can use a JSON file more efficiently.

Types of data in datasets

o Numerical data: such as house price, temperature, etc.
o Categorical data: such as Yes/No, True/False, Blue/Green, etc.
o Ordinal data: similar to categorical data, but the values can be ordered for comparison.

Note: A real-world dataset is of huge size, which is difficult to manage and process at the initial level. Therefore, to practice machine learning algorithms, we can use any dummy dataset.

Need for a Dataset
To work on machine learning projects, we need a huge amount of data, because without data one cannot train ML/AI models. Collecting and preparing the dataset is one of the most crucial parts of creating an ML/AI project.

The technology applied behind any ML project cannot work properly if the dataset is not well prepared and pre-processed.

During the development of an ML project, the developers completely rely on the datasets. In building ML applications, datasets are divided into two parts:

o Training dataset
o Test dataset
Note: These datasets can be large, so to download them you should have a fast internet connection on your computer.

Popular sources for Machine Learning datasets
Below is a list of dataset sources that are freely available for the public to work with:

1. Kaggle Datasets
Kaggle is one of the best sources of datasets for data scientists and machine learners. It allows users to find, download, and publish datasets in an easy way. It also provides the opportunity to work with other machine learning engineers and solve difficult data-science-related tasks.

Kaggle provides high-quality datasets in different formats that we can easily find and download.

The link for the Kaggle datasets is https://www.kaggle.com/datasets.

2. UCI Machine Learning Repository

The UCI Machine Learning Repository is one of the great sources of machine learning datasets. This repository contains databases, domain theories, and data generators that are widely used by the machine learning community for the analysis of ML algorithms.

Since 1987, it has been widely used by students, professors, and researchers as a primary source of machine learning datasets.

It classifies the datasets by machine learning problem and task, such as regression, classification, clustering, etc. It also contains some popular datasets such as the Iris dataset, the Car Evaluation dataset, the Poker Hand dataset, etc.

The link for the UCI Machine Learning Repository is https://archive.ics.uci.edu/ml/index.php.

3. Datasets via AWS

We can search, download, access, and share datasets that are publicly available via AWS resources. These datasets can be accessed through AWS resources but are provided and maintained by different government organizations, researchers, businesses, or individuals.

Anyone can analyze and build various services using the data shared via AWS resources. Sharing datasets on the cloud helps users spend more time on data analysis rather than on data acquisition.

This source provides various types of datasets with examples and ways to use them. It also provides a search box with which we can search for the required dataset. Anyone can add any dataset or example to the Registry of Open Data on AWS.

The link for this resource is https://registry.opendata.aws/.

4. Google's Dataset Search Engine

Google Dataset Search is a search engine launched by Google on September 5, 2018. This source helps researchers find online datasets that are freely available for use.

The link for the Google dataset search engine is https://toolbox.google.com/datasetsearch.

5. Microsoft Datasets
Microsoft has launched the "Microsoft Research Open Data" repository with a collection of free datasets in various areas such as natural language processing, computer vision, and domain-specific sciences.

Using this resource, we can download datasets to use on the current device, or we can use them directly on cloud infrastructure.

The link to download or use datasets from this resource is https://msropendata.com/.

6. Awesome Public Dataset Collection

The Awesome Public Datasets collection provides high-quality datasets that are arranged in a well-organized list by topic, such as agriculture, biology, climate, complex networks, etc. Most of the datasets are free, but some may not be, so it is better to check the license before downloading a dataset.

The link to download datasets from the Awesome Public Datasets collection is https://github.com/awesomedata/awesome-public-datasets.

7. Government Datasets
There are different sources for government-related data. Various countries publish government data, collected from different departments, for public use.
The goal of providing these datasets is to increase the transparency of government work and to enable innovative uses of the data. Below are some links to government datasets:

o Indian Government dataset
o US Government Dataset
o Northern Ireland Public Sector Datasets
o European Union Open Data Portal

8. Computer Vision Datasets

VisualData provides a large number of great datasets that are specific to computer vision tasks such as image classification, video classification, image segmentation, etc. Therefore, if you want to build a project on deep learning or image processing, you can refer to this source.

The link for downloading datasets from this source is https://www.visualdata.io/.

9. Scikit-learn dataset
Scikit-learn is a great source for machine learning enthusiasts. This source provides both toy and real-world datasets. These datasets can be obtained from the sklearn.datasets package using the general dataset API.

The toy datasets available in scikit-learn can be loaded using predefined functions such as load_boston([return_X_y]), load_iris([return_X_y]), etc., rather than importing any file from external sources. These datasets, however, are not suitable for real-world projects.

The link to download datasets from this source is https://scikit-learn.org/stable/datasets/index.html.
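
As a quick illustration, a toy dataset can be loaded in a couple of lines; the sketch below uses load_iris and assumes a reasonably recent scikit-learn version:

from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)   # features and target labels as NumPy arrays
print(X.shape, y.shape)             # (150, 4) (150,): 150 samples, 4 features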

Data Preprocessing in Machine Learning
Data preprocessing is the process of preparing raw data and making it suitable for a machine learning model. It is the first and a crucial step when creating a machine learning model.

When creating a machine learning project, we do not always come across clean and formatted data. And before doing any operation with data, it is mandatory to clean it and put it in a formatted way. For this, we use the data preprocessing task.
Why do we need Data Preprocessing?
Real-world data generally contains noise and missing values, and may be in an unusable format that cannot be used directly for machine learning models. Data preprocessing is the required task for cleaning the data and making it suitable for a machine learning model, which also increases the accuracy and efficiency of the model.

It involves the following steps:


o Getting the dataset
o Importing libraries
o Importing datasets
o Finding missing data
o Encoding categorical data
o Splitting the dataset into training and test sets
o Feature scaling

1) Get the Dataset

To create a machine learning model, the first thing we require is a dataset, as a machine learning model works entirely on data. The data collected for a particular problem in a proper format is known as the dataset.

Datasets may come in different formats for different purposes; for example, the dataset for a business-oriented machine learning model will be different from the dataset required for a liver-patient model. So each dataset is different from the others. To use a dataset in our code, we usually put it into a CSV file. However, sometimes we may also need to use an HTML or xlsx file.

What is a CSV File?

CSV stands for "comma-separated values"; it is a file format which allows us to save tabular data, such as spreadsheets. It is useful for huge datasets, and such datasets can be used directly in programs.

Here we will use a demo dataset for data preprocessing; for practice, it can be downloaded from https://www.superdatascience.com/pages/machine-learning. For real-world problems, we can download datasets online from various sources such as https://www.kaggle.com/uciml/datasets, https://archive.ics.uci.edu/ml/index.php, etc.

We can also create our own dataset by gathering data using various APIs with Python and putting that data into a .csv file.

2) Importing Libraries
In order to perform data preprocessing using Python, we need to import some
predefined Python libraries. These libraries are used to perform some specific jobs.
There are three specific libraries that we will use for data preprocessing, which are:

Numpy: The Numpy Python library is used to include any type of mathematical operation in the code. It is the fundamental package for scientific computation in Python. It also supports large, multidimensional arrays and matrices. In Python, we can import it as:

import numpy as nm

Here we have used nm as a short name for Numpy, and it will be used in the whole program.

Matplotlib: The second library is matplotlib, which is a Python 2D plotting library; with this library, we need to import the sub-library pyplot. This library is used to plot any type of chart in Python. It is imported as below:

import matplotlib.pyplot as mpt

Here we have used mpt as a short name for this library.


Pandas: The last library is the Pandas library, which is one of the most famous Python libraries and is used for importing and managing datasets. It is an open-source data manipulation and analysis library. It is imported as below:

import pandas as pd

Here, we have used pd as a short name for this library.

3) Importing the Datasets

Now we need to import the dataset which we have collected for our machine learning project. But before importing a dataset, we need to set the current directory as the working directory. To set a working directory in the Spyder IDE, we follow the steps below:

1. Save your Python file in the directory which contains the dataset.
2. Go to the File Explorer option in the Spyder IDE and select the required directory.
3. Press the F5 key or click the Run option to execute the file.

Note: We can set any directory as the working directory, but it must contain the required dataset.

Once the Python file is saved alongside the required dataset, the current folder is set as the working directory.
read_csv() function:

Now, to import the dataset, we will use the read_csv() function of the pandas library, which is used to read a CSV file and perform various operations on it. Using this function, we can read a CSV file locally as well as through a URL.

We can use the read_csv function as below:

data_set= pd.read_csv('Dataset.csv')

Here, data_set is the name of the variable in which we store our dataset, and inside the function we have passed the name of our dataset file. Once we execute the above line of code, it will successfully import the dataset into our code. We can also inspect the imported dataset in the Variable Explorer section by double-clicking on data_set. The indexing starts from 0, which is the default indexing in Python. We can also change the format of our dataset by clicking on the format option.

Extracting dependent and independent variables:

In machine learning, it is important to distinguish the matrix of features (independent variables) and the dependent variable in the dataset. In our dataset, there are three independent variables, Country, Age, and Salary, and one dependent variable, Purchased.

Extracting the independent variables:

To extract the independent variables, we will use the iloc[ ] method of the Pandas library. It is used to extract the required rows and columns from the dataset.

x= data_set.iloc[:,:-1].values

In the above code, the first colon (:) is used to take all the rows, and the second colon (:) is for all the columns. Here we have used :-1 because we don't want to take the last column, as it contains the dependent variable. By doing this, we get the matrix of features.

By executing the above code, we will get the output as:

[['India' 38.0 68000.0]
 ['France' 43.0 45000.0]
 ['Germany' 30.0 54000.0]
 ['France' 48.0 65000.0]
 ['Germany' 40.0 nan]
 ['India' 35.0 58000.0]
 ['Germany' nan 53000.0]
 ['France' 49.0 79000.0]
 ['India' 50.0 88000.0]
 ['France' 37.0 77000.0]]

As we can see in the above output, only the three independent variables remain.

Extracting the dependent variable:

To extract the dependent variable, we will again use the Pandas .iloc[] method.

y= data_set.iloc[:,3].values

Here we have taken all the rows of the last column only. It will give the array of the dependent variable.

By executing the above code, we will get output as:

Output:

array(['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes'],
dtype=object)

Note: If you are using the Python language for machine learning, then this extraction is mandatory, but for the R language it is not required.

4) Handling Missing Data:

The next step of data preprocessing is to handle missing data in the dataset. If our dataset contains some missing data, it may create a big problem for our machine learning model. Hence it is necessary to handle any missing values present in the dataset.

Ways to handle missing data:

There are mainly two ways to handle missing data:

By deleting the particular row: The first way is commonly used to deal with null values. In this approach, we just delete the specific row or column which consists of null values. But this way is not very efficient, and removing data may lead to a loss of information which will not give an accurate output.

By calculating the mean: In this approach, we calculate the mean of the column (or row) which contains a missing value and put it in place of the missing value. This strategy is useful for features which have numeric data, such as age, salary, year, etc. Here, we will use this approach.

To handle missing values, we will use the Scikit-learn library in our code, which contains various classes for building machine learning models. Here we will use the Imputer class of the sklearn.preprocessing library (note that in recent scikit-learn versions this class has been replaced by SimpleImputer in sklearn.impute). Below is the code for it:

#handling missing data (Replacing missing data with the mean value)
from sklearn.preprocessing import Imputer
imputer= Imputer(missing_values ='NaN', strategy='mean', axis = 0)
#Fitting imputer object to the independent variables x.
imputer= imputer.fit(x[:, 1:3])
#Replacing missing data with the calculated mean value
x[:, 1:3]= imputer.transform(x[:, 1:3])

Output:

array([['India', 38.0, 68000.0],
       ['France', 43.0, 45000.0],
       ['Germany', 30.0, 54000.0],
       ['France', 48.0, 65000.0],
       ['Germany', 40.0, 65222.22222222222],
       ['India', 35.0, 58000.0],
       ['Germany', 41.111111111111114, 53000.0],
       ['France', 49.0, 79000.0],
       ['India', 50.0, 88000.0],
       ['France', 37.0, 77000.0]], dtype=object)

As we can see in the above output, the missing values have been replaced with the means of the remaining values in their columns.

5) Encoding Categorical Data:

Categorical data is data which has some categories; in our dataset, there are two categorical variables, Country and Purchased.

Since a machine learning model works entirely on mathematics and numbers, a categorical variable in our dataset may create trouble while building the model. So it is necessary to encode these categorical variables into numbers.

For the Country variable:

First, we will convert the country names into numeric categories. To do this, we will use the LabelEncoder() class from the preprocessing library.

#Categorical data
#for Country Variable
from sklearn.preprocessing import LabelEncoder
label_encoder_x= LabelEncoder()
x[:, 0]= label_encoder_x.fit_transform(x[:, 0])

Output:

Out[15]:
array([[2, 38.0, 68000.0],
[0, 43.0, 45000.0],
[1, 30.0, 54000.0],
[0, 48.0, 65000.0],
[1, 40.0, 65222.22222222222],
[2, 35.0, 58000.0],
[1, 41.111111111111114, 53000.0],
[0, 49.0, 79000.0],
[2, 50.0, 88000.0],
[0, 37.0, 77000.0]], dtype=object)

Explanation:

In the above code, we have imported the LabelEncoder class of the sklearn library. This class has successfully encoded the country names into digits.

But in our case, there are three country categories, and as we can see in the above output, they are encoded into 0, 1, and 2. From these values, the machine learning model may assume that there is some ordering or correlation between these categories, which would produce wrong output. To remove this issue, we will use dummy encoding.

Dummy Variables:

Dummy variables are variables which take the values 0 or 1. The value 1 indicates the presence of that category in a particular column, while the rest of the dummy variables are 0. With dummy encoding, we have a number of columns equal to the number of categories.

In our dataset, we have 3 categories, so it will produce three columns containing 0 and 1 values. For dummy encoding, we will use the OneHotEncoder class of the preprocessing library.

#for Country Variable
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
label_encoder_x= LabelEncoder()
x[:, 0]= label_encoder_x.fit_transform(x[:, 0])
#Encoding for dummy variables
onehot_encoder= OneHotEncoder(categorical_features= [0])
x= onehot_encoder.fit_transform(x).toarray()

Output:

array([[0.00000000e+00, 0.00000000e+00, 1.00000000e+00, 3.80000000e+01, 6.80000000e+04],
       [1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 4.30000000e+01, 4.50000000e+04],
       [0.00000000e+00, 1.00000000e+00, 0.00000000e+00, 3.00000000e+01, 5.40000000e+04],
       [1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 4.80000000e+01, 6.50000000e+04],
       [0.00000000e+00, 1.00000000e+00, 0.00000000e+00, 4.00000000e+01, 6.52222222e+04],
       [0.00000000e+00, 0.00000000e+00, 1.00000000e+00, 3.50000000e+01, 5.80000000e+04],
       [0.00000000e+00, 1.00000000e+00, 0.00000000e+00, 4.11111111e+01, 5.30000000e+04],
       [1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 4.90000000e+01, 7.90000000e+04],
       [0.00000000e+00, 0.00000000e+00, 1.00000000e+00, 5.00000000e+01, 8.80000000e+04],
       [1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 3.70000000e+01, 7.70000000e+04]])

As we can see in the above output, the country values are encoded into 0s and 1s and divided into three columns.

This can be seen more clearly in the Variable Explorer section by inspecting the x variable.
For the Purchased variable:

labelencoder_y= LabelEncoder()
y= labelencoder_y.fit_transform(y)

For the second categorical variable, we only use the labelencoder object of the LabelEncoder class. Here we are not using the OneHotEncoder class because the Purchased variable has only two categories, yes or no, which are automatically encoded into 0 and 1.

Output:

Out[17]: array([0, 1, 0, 0, 1, 1, 0, 1, 0, 1])

It can also be seen in the Variable Explorer section.


6) Splitting the Dataset into the Training Set and Test Set
In machine learning data preprocessing, we divide our dataset into a training set and a test set. This is one of the crucial steps of data preprocessing, as by doing this we can enhance the performance of our machine learning model.

Suppose we have trained our machine learning model on one dataset and then test it on a completely different dataset. This will create difficulties for our model in understanding the relationships in the data.

If we train our model very well and its training accuracy is very high, but we then provide a new dataset to it, its performance may decrease. So we always try to build a machine learning model which performs well both with the training set and with the test dataset. Here, we can define these datasets as:
Training set: a subset of the dataset used to train the machine learning model, for which we already know the output.

Test set: a subset of the dataset used to test the machine learning model; the model predicts the output for the test set.

To split the dataset, we will use the following lines of code:

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state=0)

Explanation:

o In the above code, the first line imports the function that splits the arrays of the dataset into random train and test subsets.
o In the second line, we have used four variables for the output:
o x_train: features for the training data
o x_test: features for the test data
o y_train: dependent variable for the training data
o y_test: dependent variable for the test data
o In the train_test_split() function, we have passed four parameters, of which the first two are the arrays of data, and test_size specifies the size of the test set. The test_size may be .5, .3, or .2, which gives the split ratio between the training and test sets.
o The last parameter, random_state, sets a seed for the random generator so that you always get the same result; the most used value for it is 42.

Output:
By executing the above code, we will get four different variables, which can be seen in the Variable Explorer section.

The x and y variables are split into four variables with the corresponding values.

7) Feature Scaling
Feature scaling is the final step of data preprocessing in machine learning. It is a technique to standardize the independent variables of the dataset to a specific range. In feature scaling, we put our variables in the same range and on the same scale so that no variable dominates another variable.

Consider the Age and Salary columns of our dataset: the values are not on the same scale. Many machine learning models are based on Euclidean distance, and if we do not scale the variables, it will cause issues in our machine learning model.

Euclidean distance is given as:
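For two points A(x1, y1) and B(x2, y2), it is:

d(A, B) = sqrt((x2 - x1)^2 + (y2 - y1)^2)

and the same idea extends to any number of features by summing the squared differences of every coordinate.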


If we compute any distance involving age and salary, the salary values will dominate the age values, and this will produce an incorrect result. To remove this issue, we need to perform feature scaling.

There are two ways to perform feature scaling in machine learning:

Standardization

Normalization
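
For reference, the two techniques apply different formulas (standard definitions, not
specific to this dataset): standardization rescales a value as x' = (x − mean) / standard
deviation, so the result has zero mean and unit variance, while normalization (min-max
scaling) rescales it as x' = (x − min) / (max − min), which maps values into the range 0 to 1.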
Here, we will use the standardization method for our dataset.

For feature scaling, we will import StandardScaler class


of sklearn.preprocessing library as:

1. from sklearn.preprocessing import StandardScaler

Now, we will create the object of StandardScaler class for independent variables or
features. And then we will fit and transform the training dataset.

1. st_x= StandardScaler()
2. x_train= st_x.fit_transform(x_train)

For the test dataset, we will directly apply the transform() function instead
of fit_transform() because the scaler has already been fitted on the training set.

1. x_test= st_x.transform(x_test)

Output:

By executing the above lines of code, we will get the scaled values for x_train and
x_test as:

x_train:
x_test:
As we can see in the above output, all the variables are scaled to a comparable range,
roughly between -1 and 1.

Note: Here, we have not scaled the dependent variable because it takes only the two
values 0 and 1. But if the dependent variable had a wider range of values, we would also
need to scale it.

Combining all the steps:

Now, in the end, we can combine all the steps together to make our complete code
more understandable.

1. # importing libraries
2. import numpy as nm
3. import matplotlib.pyplot as mtp
4. import pandas as pd
5.
6. #importing datasets
7. data_set= pd.read_csv('Dataset.csv')
8.
9. #Extracting Independent Variable
10. x= data_set.iloc[:, :-1].values
11.
12. #Extracting Dependent variable
13. y= data_set.iloc[:, 3].values
14.
15. #handling missing data(Replacing missing data with the mean value)
16. from sklearn.preprocessing import Imputer
17. imputer= Imputer(missing_values ='NaN', strategy='mean', axis = 0)
18.
19. #Fitting imputer object to the independent variables x.
20. imputer= imputer.fit(x[:, 1:3])
21.
22. #Replacing missing data with the calculated mean value
23. x[:, 1:3]= imputer.transform(x[:, 1:3])
24.
25. #for Country Variable
26. from sklearn.preprocessing import LabelEncoder, OneHotEncoder
27. label_encoder_x= LabelEncoder()
28. x[:, 0]= label_encoder_x.fit_transform(x[:, 0])
29.
30. #Encoding for dummy variables
31. onehot_encoder= OneHotEncoder(categorical_features= [0])
32. x= onehot_encoder.fit_transform(x).toarray()
33.
34. #encoding for purchased variable
35. labelencoder_y= LabelEncoder()
36. y= labelencoder_y.fit_transform(y)
37.
38. # Splitting the dataset into training and test set.
39. from sklearn.model_selection import train_test_split
40. x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state
=0)
41.
42. #Feature Scaling of datasets
43. from sklearn.preprocessing import StandardScaler
44. st_x= StandardScaler()
45. x_train= st_x.fit_transform(x_train)
46. x_test= st_x.transform(x_test)

In the above code, we have included all the data preprocessing steps together. But
there are some steps or lines of code which are not necessary for all machine
learning models. So we can exclude them from our code to make it reusable for all
models.
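
As a side note, the Imputer class and the categorical_features parameter used above belong
to older scikit-learn versions and have since been removed. On a recent scikit-learn
release, the same preprocessing steps can be sketched with SimpleImputer, ColumnTransformer,
and OneHotEncoder; the sketch below assumes the same Dataset.csv layout (Country, Age,
Salary, Purchased) and is only an illustrative equivalent, not the code used in this tutorial.

# A rough modern-scikit-learn equivalent of the preprocessing above (sketch only;
# assumes the same Dataset.csv layout: Country, Age, Salary, Purchased).
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split

data_set = pd.read_csv('Dataset.csv')
x = data_set.iloc[:, :-1].values
y = LabelEncoder().fit_transform(data_set.iloc[:, 3].values)

# Replace missing Age/Salary values with the column mean (SimpleImputer replaces Imputer)
x[:, 1:3] = SimpleImputer(strategy='mean').fit_transform(x[:, 1:3])

# One-hot encode the Country column (replaces categorical_features=[0]); force dense output
ct = ColumnTransformer([('country', OneHotEncoder(), [0])],
                       remainder='passthrough', sparse_threshold=0)
x = ct.fit_transform(x)

# Split and scale exactly as before
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)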

Supervised Machine Learning


Supervised learning is the type of machine learning in which machines are trained
using well "labelled" training data, and on the basis of that data, machines predict the
output. Labelled data means some input data is already tagged with the correct
output.

In supervised learning, the training data provided to the machines works as the
supervisor that teaches the machines to predict the output correctly. It applies the
same concept as a student learning under the supervision of a teacher.

Supervised learning is a process of providing input data as well as correct output
data to the machine learning model. The aim of a supervised learning algorithm is
to find a mapping function to map the input variable (x) with the output
variable (y).

In the real-world, supervised learning can be used for Risk Assessment, Image
classification, Fraud Detection, spam filtering, etc.


How Supervised Learning Works?


In supervised learning, models are trained using a labelled dataset, where the model
learns about each type of data. Once the training process is completed, the model is
tested on the basis of test data (a held-out subset of the data), and then it predicts the
output.

The working of Supervised learning can be easily understood by the below example
and diagram:
Suppose we have a dataset of different types of shapes which includes square,
rectangle, triangle, and Polygon. Now the first step is that we need to train the model
for each shape.

o If the given shape has four sides, and all the sides are equal, then it will be
labelled as a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides then it will be labelled as hexagon.

Now, after training, we test our model using the test set, and the task of the model is
to identify the shape.

The machine is already trained on all types of shapes, and when it finds a new shape,
it classifies the shape on the basis of its number of sides and predicts the output.

Steps Involved in Supervised Learning:


o First Determine the type of training dataset
o Collect/Gather the labelled training data.
o Split the training dataset into training dataset, test dataset, and validation
dataset.
o Determine the input features of the training dataset, which should have
enough knowledge so that the model can accurately predict the output.
o Determine the suitable algorithm for the model, such as support vector
machine, decision tree, etc.
o Execute the algorithm on the training dataset. Sometimes we need validation
sets as control parameters, which are subsets of the training dataset.
o Evaluate the accuracy of the model by providing the test set. If the model
predicts the correct output, then our model is accurate. (A minimal code sketch of
these steps follows below.)
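
As a minimal illustration of these steps (not part of the original example; the iris
dataset and the decision tree classifier are arbitrary choices used only for
demonstration), the whole supervised workflow can be sketched as:

# Minimal supervised-learning workflow sketch (dataset and algorithm choices are illustrative).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

x, y = load_iris(return_X_y=True)                        # labelled training data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier(random_state=0)           # a suitable algorithm
model.fit(x_train, y_train)                              # execute it on the training set

y_pred = model.predict(x_test)                           # predict on unseen test data
print('Accuracy:', accuracy_score(y_test, y_pred))       # evaluate with the test set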

Types of supervised Machine learning Algorithms:

Supervised learning can be further divided into two types of problems:
1. Regression

Regression algorithms are used if there is a relationship between the input variable
and the output variable. It is used for the prediction of continuous variables, such as
Weather forecasting, Market Trends, etc. Below are some popular Regression
algorithms which come under supervised learning:

o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

2. Classification

Classification algorithms are used when the output variable is categorical, which
means there are two or more classes such as Yes-No, Male-Female, True-False, etc. A
typical example is spam filtering. Below are some popular Classification algorithms
which come under supervised learning:
o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines

Note: We will discuss these algorithms in detail in later chapters.

Advantages of Supervised learning:


o With the help of supervised learning, the model can predict the output on the
basis of prior experiences.
o In supervised learning, we can have an exact idea about the classes of objects.
o Supervised learning model helps us to solve various real-world problems such
as fraud detection, spam filtering, etc.

Disadvantages of supervised learning:


o Supervised learning models are not suitable for handling complex tasks.
o Supervised learning cannot predict the correct output if the test data is
different from the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.

Unsupervised Machine Learning


In the previous topic, we learned about supervised machine learning, in which models are
trained using labeled data under supervision. But there may be
many cases in which we do not have labeled data and need to find the hidden
patterns from the given dataset. So, to solve such cases in machine learning,
we need unsupervised learning techniques.

What is Unsupervised Learning?


As the name suggests, unsupervised learning is a machine learning technique in
which models are not supervised using a training dataset. Instead, the models themselves
find the hidden patterns and insights from the given data. It can be compared to the
learning which takes place in the human brain while learning new things. It can be defined as:

Unsupervised learning is a type of machine learning in which models are trained using
unlabeled dataset and are allowed to act on that data without any supervision.

Unsupervised learning cannot be directly applied to a regression or classification
problem because, unlike supervised learning, we have the input data but no
corresponding output data. The goal of unsupervised learning is to find the
underlying structure of the dataset, group that data according to similarities, and
represent that dataset in a compressed format.

Example: Suppose the unsupervised learning algorithm is given an input dataset
containing images of different types of cats and dogs. The algorithm is never trained
upon the given dataset, which means it does not have any idea about the features of
the dataset. The task of the unsupervised learning algorithm is to identify the image
features on its own. The unsupervised learning algorithm will perform this task by
clustering the image dataset into groups according to the similarities between
images.


Why use Unsupervised Learning?


Below are some main reasons which describe the importance of Unsupervised
Learning:

o Unsupervised learning is helpful for finding useful insights from the data.
o Unsupervised learning is much like how a human learns to think from their own
experiences, which makes it closer to real AI.
o Unsupervised learning works on unlabeled and uncategorized data, which
makes it all the more important.
o In the real world, we do not always have input data with the corresponding
output, so to solve such cases, we need unsupervised learning.

Working of Unsupervised Learning


Working of unsupervised learning can be understood by the below diagram:

Here, we have taken unlabeled input data, which means it is not categorized and
the corresponding outputs are also not given. Now, this unlabeled input data is fed to
the machine learning model in order to train it. Firstly, the model will interpret the raw
data to find the hidden patterns in the data and then apply a suitable algorithm such
as k-means clustering, hierarchical clustering, etc.

Once it applies the suitable algorithm, the algorithm divides the data objects into
groups according to the similarities and differences between the objects.

Types of Unsupervised Learning Algorithm:


The unsupervised learning algorithm can be further categorized into two types of
problems:
o Clustering: Clustering is a method of grouping the objects into clusters such
that objects with the most similarities remain in a group and have few or no
similarities with the objects of another group. Cluster analysis finds the
commonalities between the data objects and categorizes them as per the
presence and absence of those commonalities.
o Association: An association rule is an unsupervised learning method which is
used for finding relationships between variables in a large database. It
determines the set of items that occur together in the dataset. Association
rules make marketing strategies more effective; for example, people who buy item X
(say, bread) also tend to purchase item Y (butter or jam). A typical
example of an association rule is Market Basket Analysis.

Note: We will learn these algorithms in later chapters.

Unsupervised Learning algorithms:


Below is the list of some popular unsupervised learning algorithms:

o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition
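
As a quick illustration of the first algorithm in this list (the data here is synthetic
and used only for demonstration), k-means groups unlabeled points without ever seeing a
target variable:

# Minimal k-means clustering sketch on synthetic, unlabeled data (illustrative only).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

x, _ = make_blobs(n_samples=300, centers=3, random_state=0)   # generated points; labels are ignored
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(x)                        # each point is assigned to one of 3 clusters
print(kmeans.cluster_centers_)                                # learned cluster centres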

Advantages of Unsupervised Learning


o Unsupervised learning is used for more complex tasks as compared to
supervised learning because, in unsupervised learning, we don't have labeled
input data.
o Unsupervised learning is preferable as it is easy to get unlabeled data in
comparison to labeled data.

Disadvantages of Unsupervised Learning


o Unsupervised learning is intrinsically more difficult than supervised learning as
it does not have corresponding output.
o The result of the unsupervised learning algorithm might be less accurate as
input data is not labeled, and algorithms do not know the exact output in
advance.

Difference between Supervised and Unsupervised Learning

Supervised and Unsupervised learning are the two techniques of machine learning.
But both techniques are used in different scenarios and with different datasets.
Below, the explanation of both learning methods along with their difference table is
given.
Supervised Machine Learning:
Supervised learning is a machine learning method in which models are trained using
labeled data. In supervised learning, models need to find the mapping function to
map the input variable (X) with the output variable (Y).

Supervised learning needs supervision to train the model, which is similar to how a
student learns things in the presence of a teacher. Supervised learning can be used
for two types of problems: Classification and Regression.



Example: Suppose we have images of different types of fruits. The task of our
supervised learning model is to identify the fruits and classify them accordingly. To
identify the images in supervised learning, we will give the input data as well as the
output for it, which means we will train the model on the shape, size, colour, and
taste of each fruit. Once the training is completed, we will test the model by giving it
a new set of fruits. The model will identify the fruit and predict the output using a
suitable algorithm.

Unsupervised Machine Learning:


Unsupervised learning is another machine learning method in which patterns are
inferred from unlabeled input data. The goal of unsupervised learning is to find
the structure and patterns in the input data. Unsupervised learning does not need
any supervision. Instead, it finds patterns from the data on its own.


Unsupervised learning can be used for two types of problems: Clustering and Association.

Example: To understand unsupervised learning, we will use the example given
above. Unlike supervised learning, here we will not provide any supervision to the
model. We will just provide the input dataset to the model and allow the model to
find the patterns from the data. With the help of a suitable algorithm, the model will
train itself and divide the fruits into different groups according to the most similar
features between them.

The main differences between Supervised and Unsupervised learning are given
below:

Supervised Learning | Unsupervised Learning

o Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data.
o Supervised learning model takes direct feedback to check if it is predicting the correct output or not. | Unsupervised learning model does not take any feedback.
o Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data.
o In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
o The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
o Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
o Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be classified into Clustering and Association problems.
o Supervised learning can be used for those cases where we know the input as well as the corresponding outputs. | Unsupervised learning can be used for those cases where we have only input data and no corresponding output data.
o Supervised learning model produces an accurate result. | Unsupervised learning model may give a less accurate result as compared to supervised learning.
o Supervised learning is not close to true Artificial Intelligence, as we first train the model for each data point, and only then can it predict the correct output. | Unsupervised learning is closer to true Artificial Intelligence, as it learns similarly to how a child learns daily routine things from his experiences.
o It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision tree, Bayesian Logic, etc. | It includes various algorithms such as Clustering, KNN, and the Apriori algorithm.

Note: Supervised and unsupervised learning are both machine learning methods, and the
choice between them depends on factors related to the structure and volume of your
dataset and the use case of the problem.

Regression Analysis in Machine Learning
Regression analysis is a statistical method to model the relationship between a
dependent (target) variable and one or more independent (predictor) variables. More
specifically, regression analysis helps us understand how the value of the dependent
variable changes corresponding to an independent variable when the other independent
variables are held fixed. It predicts continuous/real values such as temperature, age,
salary, price, etc.

We can understand the concept of regression analysis using the below example:

Example: Suppose there is a marketing company A which runs various advertisements
every year and gets sales from them. The below list shows the advertisements made by the
company in the last 5 years and the corresponding sales:

Now, the company wants to spend $200 on advertisement in the year 2019 and
wants to know the prediction of the sales for this year. To solve such prediction
problems in machine learning, we need regression analysis.

Regression is a supervised learning technique which helps in finding the correlation
between variables and enables us to predict a continuous output variable based on
one or more predictor variables. It is mainly used for prediction, forecasting,
time series modelling, and determining the cause-effect relationship between
variables.

In regression, we plot a graph between the variables which best fits the given
datapoints, and using this plot, the machine learning model can make predictions about
the data. In simple words, "Regression shows a line or curve that passes through
all the datapoints on the target-predictor graph in such a way that the vertical
distance between the datapoints and the regression line is minimum." The
distance between the datapoints and the line tells whether the model has captured a
strong relationship or not.

Some examples of regression can be as:

o Prediction of rain using temperature and other factors
o Determining market trends
o Prediction of road accidents due to rash driving.

Terminologies Related to the Regression Analysis:
o Dependent Variable: The main factor in Regression analysis which we want to
predict or understand is called the dependent variable. It is also called target
variable.
o Independent Variable: The factors which affect the dependent variables or
which are used to predict the values of the dependent variables are called
independent variable, also called as a predictor.
o Outliers: An outlier is an observation which contains either a very low value or a very
high value in comparison to other observed values. An outlier may hamper the
result, so it should be avoided.
o Multicollinearity: If the independent variables are highly correlated with each
other, then this condition is called multicollinearity. It should not be present in the
dataset, because it creates problems while ranking the most affecting variable.
o Underfitting and Overfitting: If our algorithm works well with the training
dataset but not with the test dataset, the problem is called overfitting.
And if our algorithm does not perform well even with the training dataset, the
problem is called underfitting.

Why do we use Regression Analysis?


As mentioned above, regression analysis helps in the prediction of a continuous
variable. There are various scenarios in the real world where we need future
predictions, such as weather conditions, sales prediction, marketing trends, etc. For
such cases, we need a technique which can make predictions more accurately.
Regression analysis is such a statistical method, used in machine learning and data
science. Below are some other reasons for using regression analysis:

o Regression estimates the relationship between the target and the independent variables.
o It is used to find the trends in data.
o It helps to predict real/continuous values.
o By performing regression, we can confidently determine the most
important factor, the least important factor, and how each factor affects
the other factors.

Types of Regression
There are various types of regressions which are used in data science and machine
learning. Each type has its own importance on different scenarios, but at the core, all
the regression methods analyze the effect of the independent variable on dependent
variables. Here we are discussing some important types of regression which are given
below:

o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression:
Linear Regression:

o Linear regression is a statistical regression method which is used for predictive
analysis.
o It is one of the very simple and easy algorithms which works on regression
and shows the relationship between the continuous variables.
o It is used for solving the regression problem in machine learning.
o Linear regression shows the linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), hence called linear
regression.
o If there is only one input variable (x), then such linear regression is
called simple linear regression. And if there is more than one input variable,
then such linear regression is called multiple linear regression.
o The relationship between variables in the linear regression model can be
explained using the below image. Here we are predicting the salary of an
employee on the basis of the year of experience.
o Below is the mathematical equation for Linear regression:

1. Y= aX+b

Here, Y = dependent variables (target variables),


X= Independent variables (predictor variables),
a and b are the linear coefficients

Some popular applications of linear regression are:

o Analyzing trends and sales estimates


o Salary forecasting
o Real estate prediction
o Arriving at ETAs in traffic.

Logistic Regression:

o Logistic regression is another supervised learning algorithm which is used to
solve classification problems. In classification problems, we have
dependent variables in a binary or discrete format such as 0 or 1.
o Logistic regression algorithm works with the categorical variable such as 0 or
1, Yes or No, True or False, Spam or not spam, etc.
o It is a predictive analysis algorithm which works on the concept of probability.
o Logistic regression is a type of regression, but it is different from the linear
regression algorithm in the term how they are used.
o Logistic regression uses the sigmoid function or logistic function, which is a
complex cost function. This sigmoid function is used to model the data in
logistic regression. The function can be represented as:

f(x) = 1 / (1 + e^(-x))

o f(x) = output between the 0 and 1 value.
o x = input to the function
o e = base of the natural logarithm.

When we provide the input values (data) to the function, it gives the S-curve as
follows:

o It uses the concept of threshold levels: values above the threshold level are
rounded up to 1, and values below the threshold level are rounded down to 0.

There are three types of logistic regression:

o Binary (0/1, pass/fail)
o Multinomial (cats, dogs, lions)
o Ordinal (low, medium, high)
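
A short, hedged sketch of binary logistic regression with scikit-learn (the breast-cancer
dataset is only an illustrative stand-in for any yes/no problem):

# Minimal binary logistic regression sketch (dataset choice is illustrative only).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

classifier = LogisticRegression(max_iter=5000)     # sigmoid-based classifier
classifier.fit(x_train, y_train)

print(classifier.predict_proba(x_test[:5]))        # sigmoid outputs between 0 and 1
print(classifier.predict(x_test[:5]))              # probabilities thresholded at 0.5 into class 0 or 1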

Polynomial Regression:
o Polynomial Regression is a type of regression which models the non-linear
dataset using a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve between
the value of x and corresponding conditional values of y.
o Suppose there is a dataset which consists of datapoints which are present in a
non-linear fashion, so for such case, linear regression will not best fit to those
datapoints. To cover such datapoints, we need Polynomial regression.
o In polynomial regression, the original features are transformed into
polynomial features of a given degree and then modelled using a linear
model, which means the datapoints are best fitted using a polynomial line.

o The equation for polynomial regression is also derived from the linear regression
equation: the linear regression equation Y = b0 + b1x is transformed
into the polynomial regression equation Y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n.
o Here Y is the predicted/target output, b0, b1, ..., bn are the regression
coefficients, and x is our independent/input variable.
o The model is still linear because the coefficients b0, b1, ..., bn enter linearly;
only the features are raised to higher powers.

Note: This is different from Multiple Linear regression in such a way that in Polynomial
regression, a single element has different degrees instead of multiple variables with the
same degree.
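
A minimal sketch of this idea, assuming scikit-learn: PolynomialFeatures expands x into its
powers, and an ordinary LinearRegression is then fitted on the expanded features (the data
below is synthetic and only illustrative):

# Minimal polynomial regression sketch on synthetic non-linear data (illustrative only).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

x = np.arange(0, 10, 0.5).reshape(-1, 1)
y = 2 + 3 * x.ravel() + 0.5 * x.ravel() ** 2        # a quadratic relationship

poly = PolynomialFeatures(degree=2)                 # adds x^2 (and a bias column)
x_poly = poly.fit_transform(x)

model = LinearRegression().fit(x_poly, y)           # still a linear model, on polynomial features
print(model.predict(poly.transform([[4.0]])))       # prediction for x = 4 (close to 22)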

Support Vector Regression:


Support Vector Machine is a supervised learning algorithm which can be used for
regression as well as classification problems. So if we use it for regression problems,
then it is termed as Support Vector Regression.

Support Vector Regression is a regression algorithm which works for continuous
variables. Below are some keywords which are used in Support Vector Regression:

o Kernel: It is a function used to map lower-dimensional data into higher-dimensional data.
o Hyperplane: In general SVM, it is a separation line between two classes, but in
SVR, it is a line which helps to predict the continuous variables and cover most
of the datapoints.
o Boundary line: Boundary lines are the two lines apart from hyperplane, which
creates a margin for datapoints.
o Support vectors: Support vectors are the datapoints which are nearest to the
hyperplane and opposite class.

In SVR, we always try to determine a hyperplane with a maximum margin, so that the
maximum number of datapoints is covered within that margin. The main goal of SVR
is to consider the maximum number of datapoints within the boundary lines, and the
hyperplane (best-fit line) must contain a maximum number of datapoints.
Consider the below image:

Here, the blue line is called hyperplane, and the other two lines are known as
boundary lines.
Decision Tree Regression:

o Decision Tree is a supervised learning algorithm which can be used for solving
both classification and regression problems.
o It can solve problems for both categorical and numerical data
o Decision Tree regression builds a tree-like structure in which each internal
node represents a "test" on an attribute, each branch represents the result of
the test, and each leaf node represents the final decision or result.
o A decision tree is constructed starting from the root node/parent node
(dataset), which splits into left and right child nodes (subsets of dataset).
These child nodes are further divided into their children node, and themselves
become the parent node of those nodes. Consider the below image:

The above image shows an example of Decision Tree regression; here, the model is
trying to predict the choice of a person between a sports car and a luxury car.

Random Forest Regression:

o Random forest is one of the most powerful supervised learning algorithms,
which is capable of performing regression as well as classification tasks.
o The Random Forest regression is an ensemble learning method which
combines multiple decision trees and predicts the final output based on the
average of each tree's output. The combined decision trees are called base
models, and it can be represented more formally as:

g(x)= f0(x)+ f1(x)+ f2(x)+....

o Random forest uses the Bagging or Bootstrap Aggregation technique of
ensemble learning, in which the aggregated decision trees run in parallel and do
not interact with each other.
o With the help of Random Forest regression, we can prevent Overfitting in the
model by creating random subsets of the dataset.
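
A minimal sketch of random forest regression with scikit-learn (the synthetic data and
the number of trees are arbitrary illustrative choices):

# Minimal random forest regression sketch (illustrative only).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

x, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=100, random_state=0)   # 100 bagged decision trees
forest.fit(x, y)
print(forest.predict(x[:3]))        # each prediction is the average over all trees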

Ridge Regression:

o Ridge regression is one of the most robust versions of linear regression, in
which a small amount of bias is introduced so that we can get better long-term
predictions.
o The amount of bias added to the model is known as the Ridge Regression
penalty. We can compute this penalty term by multiplying the lambda by the
squared weight of each individual feature.
o The equation (cost function) for ridge regression will be:

Cost = Σ(y − ŷ)² + λ Σ b²  (sum of squared residuals plus lambda times the sum of squared coefficients)
o A general linear or polynomial regression will fail if there is high collinearity
between the independent variables, so to solve such problems, Ridge
regression can be used.
o Ridge regression is a regularization technique, which is used to reduce the
complexity of the model. It is also called L2 regularization.
o It helps to solve the problem when we have more parameters than samples.

Lasso Regression:

o Lasso regression is another regularization technique to reduce the complexity
of the model.
o It is similar to Ridge Regression except that the penalty term contains only the
absolute weights instead of the squares of the weights.
o Since it takes absolute values, it can shrink the slope to exactly 0, whereas
Ridge Regression can only shrink it close to 0.
o It is also called L1 regularization. The equation (cost function) for Lasso regression will be:

Cost = Σ(y − ŷ)² + λ Σ |b|  (sum of squared residuals plus lambda times the sum of the absolute coefficients)
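
Both regularized variants are available in scikit-learn; a minimal sketch follows (alpha
plays the role of lambda above, and its value here is an arbitrary illustrative choice):

# Minimal Ridge (L2) and Lasso (L1) regression sketch (alpha values are illustrative).
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

x, y = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(x, y)      # shrinks coefficients towards zero
lasso = Lasso(alpha=1.0).fit(x, y)      # can shrink some coefficients to exactly zero

print(ridge.coef_)
print(lasso.coef_)                      # typically contains exact zeros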

Linear Regression in Machine Learning
Linear regression is one of the easiest and most popular Machine Learning
algorithms. It is a statistical method that is used for predictive analysis. Linear
regression makes predictions for continuous/real or numeric variables such as sales,
salary, age, product price, etc.

The linear regression algorithm shows a linear relationship between a dependent (y)
variable and one or more independent (x) variables, hence it is called linear regression.
Since linear regression shows a linear relationship, it finds how the value of the
dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the
relationship between the variables. Consider the below image:

Mathematically, we can represent a linear regression as:

y= a0+a1x+ ε

Here,

Y = Dependent Variable (Target Variable)
X = Independent Variable (Predictor Variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error

The values for x and y variables are training datasets for Linear Regression model
representation.

Types of Linear Regression


Linear regression can be further divided into two types of the algorithm:

o Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called Simple
Linear Regression.
o Multiple Linear regression:
If more than one independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm is
called Multiple Linear Regression.

Linear Regression Line


A linear line showing the relationship between the dependent and independent
variables is called a regression line. A regression line can show two types of
relationship:

o Positive Linear Relationship:
If the dependent variable increases on the Y-axis as the independent variable
increases on the X-axis, then such a relationship is termed a positive linear
relationship.
o Negative Linear Relationship:
If the dependent variable decreases on the Y-axis and independent variable
increases on the X-axis, then such a relationship is called a negative linear
relationship.

Finding the best fit line:


When working with linear regression, our main goal is to find the best-fit line, which
means the error between the predicted values and the actual values should be minimized.
The best-fit line will have the least error.

The different values for the weights or coefficients of the line (a0, a1) give different
regression lines, so we need to calculate the best values for a0 and a1 to find the best-fit
line; to calculate this we use the cost function.

Cost function-

o The different values for the weights or coefficients of the line (a0, a1) give
different regression lines, and the cost function is used to estimate the
values of the coefficients for the best-fit line.
o Cost function optimizes the regression coefficients or weights. It measures
how a linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping function,
which maps the input variable to the output variable. This mapping function is
also known as Hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is
the average of the squared errors between the predicted values and the actual
values. For the above linear equation, MSE can be calculated as:

MSE = (1/N) Σ (Yi − (a1xi + a0))²

Where,

N = Total number of observations
Yi = Actual value
(a1xi + a0) = Predicted value

Residuals: The distance between the actual value and the predicted value is called the
residual. If the observed points are far from the regression line, the residuals will
be high, and so the cost function will be high. If the scatter points are close to the
regression line, the residuals will be small and hence the cost function will be small too.

Gradient Descent:

o Gradient descent is used to minimize the MSE by calculating the gradient of
the cost function.
o A regression model uses gradient descent to update the coefficients of the
line by reducing the cost function.
o It is done by randomly selecting initial coefficient values and then iteratively
updating them to reach the minimum of the cost function, as sketched below.
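
A minimal NumPy sketch of gradient descent for the simple linear model y = a0 + a1x (the
learning rate, iteration count, and toy data are arbitrary illustrative choices):

# Minimal gradient descent sketch for simple linear regression (illustrative values).
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 5, 7, 9, 11], dtype=float)     # true relationship: y = 1 + 2x

a0, a1 = 0.0, 0.0                               # initial coefficient values
lr, n_iters = 0.01, 5000                        # learning rate and number of iterations

for _ in range(n_iters):
    y_pred = a0 + a1 * x
    d_a0 = -2 * np.mean(y - y_pred)             # gradient of MSE with respect to a0
    d_a1 = -2 * np.mean((y - y_pred) * x)       # gradient of MSE with respect to a1
    a0 -= lr * d_a0                             # step against the gradient
    a1 -= lr * d_a1

print(a0, a1)                                   # approaches 1 and 2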

Model Performance:
The Goodness of fit determines how the line of regression fits the set of
observations. The process of finding the best model out of various models is
called optimization. It can be achieved by below method:

1. R-squared method:

o R-squared is a statistical method that determines the goodness of fit.
o It measures the strength of the relationship between the dependent and
independent variables on a scale of 0-100%.
o A high value of R-squared indicates less difference between the predicted
values and the actual values, and hence represents a good model.
o It is also called the coefficient of determination, or the coefficient of multiple
determination for multiple regression.
o It can be calculated from the below formula:

R-squared = Explained variation / Total variation
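
In scikit-learn, R-squared can be computed directly with r2_score (the values below are
illustrative placeholders); regressor.score(x, y), used later in this tutorial, returns the
same quantity.

# Minimal R-squared calculation sketch (values are illustrative only).
from sklearn.metrics import r2_score

y_actual    = [3.0, 5.0, 7.0, 9.0]
y_predicted = [2.8, 5.1, 7.3, 8.9]

print(r2_score(y_actual, y_predicted))   # a value close to 1 indicates a good fit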

Assumptions of Linear Regression


Below are some important assumptions of Linear Regression. These are formal
checks while building a Linear Regression model, which ensure that we get the best
possible result from the given dataset.

o Linear relationship between the features and target:
Linear regression assumes a linear relationship between the dependent and
independent variables.
o Small or no multicollinearity between the features:
Multicollinearity means high correlation between the independent variables.
Due to multicollinearity, it may be difficult to find the true relationship between
the predictors and the target variable. Or we can say, it is difficult to determine
which predictor variable is affecting the target variable and which is not. So,
the model assumes either little or no multicollinearity between the features or
independent variables.
o Homoscedasticity Assumption:
Homoscedasticity is a situation when the error term is the same for all the
values of independent variables. With homoscedasticity, there should be no
clear pattern distribution of data in the scatter plot.
o Normal distribution of error terms:
Linear regression assumes that the error term should follow the normal
distribution pattern. If error terms are not normally distributed, then
confidence intervals will become either too wide or too narrow, which may
cause difficulties in finding coefficients.
It can be checked using a q-q plot. If the plot shows a straight line without
any deviation, it means the error is normally distributed.
o No autocorrelations:
The linear regression model assumes no autocorrelation in the error terms. If
there is any correlation in the error terms, it will drastically reduce the
accuracy of the model. Autocorrelation usually occurs if there is a dependency
between residual errors.

Simple Linear Regression in Machine Learning
Simple Linear Regression is a type of Regression algorithms that models the
relationship between a dependent variable and a single independent variable. The
relationship shown by a Simple Linear Regression model is linear or a sloped straight
line, hence it is called Simple Linear Regression.

The key point in Simple Linear Regression is that the dependent variable must be a
continuous/real value. However, the independent variable can be measured on
continuous or categorical values.

Simple Linear regression algorithm has mainly two objectives:

o Model the relationship between the two variables. Such as the relationship
between Income and expenditure, experience and Salary, etc.
o Forecasting new observations. Such as Weather forecasting according to
temperature, Revenue of a company according to the investments in a year,
etc.

Simple Linear Regression Model:


The Simple Linear Regression model can be represented using the below equation:

y= a0+a1x+ ε

Where,

a0 = The intercept of the regression line (can be obtained by putting x = 0)
a1 = The slope of the regression line, which tells whether the line is
increasing or decreasing.
ε = The error term (for a good model it will be negligible)
Implementation of Simple Linear
Regression Algorithm using Python
Problem Statement example for Simple Linear Regression:

Here we are taking a dataset that has two variables: salary (dependent variable) and
experience (independent variable). The goals of this problem are:

o To find out if there is any correlation between these two variables.
o To find the best-fit line for the dataset.
o To see how the dependent variable changes when the independent variable changes.

In this section, we will create a Simple Linear Regression model to find out the best
fitting line for representing the relationship between these two variables.

To implement the Simple Linear regression model in machine learning using Python,
we need to follow the below steps:

Step-1: Data Pre-processing

The first step for creating the Simple Linear Regression model is data pre-processing.
We have already done it earlier in this tutorial. But there will be some changes, which
are given in the below steps:

o First, we will import the three important libraries, which will help us for loading
the dataset, plotting the graphs, and creating the Simple Linear Regression
model.

1. import numpy as nm
2. import matplotlib.pyplot as mtp
3. import pandas as pd

o Next, we will load the dataset into our code:

1. data_set= pd.read_csv('Salary_Data.csv')

By executing the above line of code (ctrl+ENTER), we can read the dataset on our
Spyder IDE screen by clicking on the variable explorer option.
The above output shows the dataset, which has two variables: Salary and Experience.

Note: In Spyder IDE, the folder containing the code file must be saved as a working
directory, and the dataset or csv file should be in the same folder.

o After that, we need to extract the dependent and independent variables from
the given dataset. The independent variable is years of experience, and the
dependent variable is salary. Below is code for it:

1. x= data_set.iloc[:, :-1].values
2. y= data_set.iloc[:, 1].values

In the above lines of code, for the x variable, we have taken -1 as the value since we want to
remove the last column from the dataset. For the y variable, we have taken 1 as the
parameter, since we want to extract the second column and indexing starts from zero.

By executing the above line of code, we will get the output for X and Y variable as:
In the above output image, we can see the X (independent) variable and Y
(dependent) variable has been extracted from the given dataset.

o Next, we will split both variables into the test set and training set. We have 30
observations, so we will take 20 observations for the training set and 10
observations for the test set. We are splitting our dataset so that we can train
our model using a training dataset and then test the model using a test
dataset. The code for this is given below:

1. # Splitting the dataset into training and test set.


2. from sklearn.model_selection import train_test_split
3. x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 1/3, random_state
=0)

By executing the above code, we will get x-test, x-train and y-test, y-train dataset.
Consider the below images:

Test-dataset:
Training Dataset:

o For Simple Linear Regression, we will not use feature scaling, because the Python
libraries take care of it in some cases, so we don't need to perform it here.
Now, our dataset is well prepared to work on, and we are going to start
building a Simple Linear Regression model for the given problem.
Step-2: Fitting the Simple Linear Regression to the Training Set:

Now the second step is to fit our model to the training dataset. To do so, we will
import the LinearRegression class of the linear_model library from the scikit learn.
After importing the class, we are going to create an object of the class named as
a regressor. The code for this is given below:

1. #Fitting the Simple Linear Regression model to the training dataset


2. from sklearn.linear_model import LinearRegression
3. regressor= LinearRegression()
4. regressor.fit(x_train, y_train)

In the above code, we have used the fit() method to fit our Simple Linear Regression
object to the training set. In the fit() function, we have passed x_train and y_train,
which are our training data for the independent and dependent variables. We
have fitted our regressor object to the training set so that the model can easily learn
the correlations between the predictor and target variables. After executing the
above lines of code, we will get the below output.

Output:

Out[7]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,


normalize=False)

Step: 3. Prediction of test set result:

In the previous step, we fitted our model to the training set, which contains a
dependent (salary) and an independent variable (Experience). So now, our model is
ready to predict the output for new observations. In this step, we will provide the
test dataset (new observations) to the model to check whether it can predict the
correct output or not.

We will create prediction vectors y_pred and x_pred, which will contain the predictions
for the test dataset and the training set, respectively.

1. #Prediction of Test and Training set result


2. y_pred= regressor.predict(x_test)
3. x_pred= regressor.predict(x_train)

On executing the above lines of code, two variables named y_pred and x_pred will
be generated under the variable explorer option, containing the salary predictions for
the test set and the training set.

Output:
You can check the variable by clicking on the variable explorer option in the IDE, and
also compare the result by comparing values from y_pred and y_test. By comparing
these values, we can check how good our model is performing.

Step: 4. visualizing the Training set results:

Now in this step, we will visualize the training set result. To do so, we will use the
scatter() function of the pyplot library, which we have already imported in the pre-
processing step. The scatter() function will create a scatter plot of the observations.

On the x-axis, we will plot the years of experience of employees, and on the y-axis,
the salary of employees. In the function, we will pass the real values of the training set,
which means the years of experience x_train, the training set of salaries y_train, and the
colour of the observations. Here we are taking green for the observations, but it can be
any colour as per choice.

Now, we need to plot the regression line, so for this, we will use the plot()
function of the pyplot library. In this function, we will pass the years of experience
for training set, predicted salary for training set x_pred, and color of the line.

Next, we will give the title for the plot. So here, we will use the title() function of
the pyplot library and pass the name ("Salary vs Experience (Training Dataset)".

After that, we will assign labels for x-axis and y-axis using xlabel() and ylabel()
function.

Finally, we will represent all above things in a graph using show(). The code is given
below:

1. mtp.scatter(x_train, y_train, color="green")


2. mtp.plot(x_train, x_pred, color="red")
3. mtp.title("Salary vs Experience (Training Dataset)")
4. mtp.xlabel("Years of Experience")
5. mtp.ylabel("Salary(In Rupees)")
6. mtp.show()

Output:

By executing the above lines of code, we will get the below graph plot as an output.
In the above plot, we can see the real values observations in green dots and
predicted values are covered by the red regression line. The regression line shows a
correlation between the dependent and independent variable.

The good fit of the line can be observed by calculating the difference between actual
values and predicted values. But as we can see in the above plot, most of the
observations are close to the regression line, hence our model is good for the
training set.

Step: 5. visualizing the Test set results:

In the previous step, we have visualized the performance of our model on the
training set. Now, we will do the same for the Test set. The complete code will remain
the same as the above code, except in this, we will use x_test, and y_test instead of
x_train and y_train.

Here we are also changing the color of observations and regression line to
differentiate between the two plots, but it is optional.

1. #visualizing the Test set results


2. mtp.scatter(x_test, y_test, color="blue")
3. mtp.plot(x_train, x_pred, color="red")
4. mtp.title("Salary vs Experience (Test Dataset)")
5. mtp.xlabel("Years of Experience")
6. mtp.ylabel("Salary(In Rupees)")
7. mtp.show()

Output:

By executing the above line of code, we will get the output as:

In the above plot, there are observations given by the blue color, and prediction is
given by the red regression line. As we can see, most of the observations are close to
the regression line, hence we can say our Simple Linear Regression is a good model
and able to make good predictions.

Multiple Linear Regression


In the previous topic, we have learned about Simple Linear Regression, where a
single Independent/Predictor(X) variable is used to model the response variable (Y).
But there may be various cases in which the response variable is affected by more
than one predictor variable; for such cases, the Multiple Linear Regression algorithm
is used.

Moreover, Multiple Linear Regression is an extension of Simple Linear Regression, as it
takes more than one predictor variable to predict the response variable. We can
define it as:
Multiple Linear Regression is one of the important regression algorithms which models
the linear relationship between a single dependent continuous variable and more than one
independent variable.

Example:

Prediction of CO2 emission based on engine size and number of cylinders in a car.


Some key points about MLR:

o For MLR, the dependent or target variable (Y) must be continuous/real, but
the predictor or independent variables may be of continuous or categorical
form.
o Each feature variable must model the linear relationship with the dependent
variable.
o MLR tries to fit a regression line through a multidimensional space of data-
points.

MLR equation:
In Multiple Linear Regression, the target variable(Y) is a linear combination of
multiple predictor variables x1, x2, x3, ...,xn. Since it is an enhancement of Simple Linear
Regression, so the same is applied for the multiple linear regression equation, the
equation becomes:

1. Y= b0+ b1x1+ b2x2+ b3x3+ ...... + bnxn   ............... (a)

Where,

Y= Output/Response variable

b0, b1, b2, b3, ..., bn = Coefficients of the model.

x1, x2, x3, x4,...= Various Independent/feature variable

Assumptions for Multiple Linear Regression:


o A linear relationship should exist between the Target and predictor variables.
o The regression residuals must be normally distributed.
o MLR assumes little or no multicollinearity (correlation between the
independent variable) in data.

Implementation of Multiple Linear Regression


model using Python:
To implement MLR using Python, we have below problem:

Problem Description:

We have a dataset of 50 start-up companies. This dataset contains five main pieces of
information: R&D Spend, Administration Spend, Marketing Spend, State, and
Profit for a financial year. Our goal is to create a model that can easily determine
which company has the maximum profit, and which factor affects the profit of a
company the most.

Since we need to find the Profit, so it is the dependent variable, and the other four
variables are independent variables. Below are the main steps of deploying the MLR
model:

1. Data Pre-processing Steps
2. Fitting the MLR model to the training set
3. Predicting the result of the test set

Step-1: Data Pre-processing Step:

The very first step is data pre-processing, which we have already discussed in this
tutorial. This process contains the below steps:

o Importing libraries: Firstly we will import the library which will help in
building the model. Below is the code for it:

1. # importing libraries
2. import numpy as nm
3. import matplotlib.pyplot as mtp
4. import pandas as pd

o Importing dataset: Now we will import the dataset (50_CompList), which
contains all the variables. Below is the code for it:
1. #importing datasets
2. data_set= pd.read_csv('50_CompList.csv')

Output: We will get the dataset as:

In the above output, we can clearly see that there are five variables, in which four
variables are continuous and one is a categorical variable.

o Extracting dependent and independent Variables:

1. #Extracting Independent and dependent Variable


2. x= data_set.iloc[:, :-1].values
3. y= data_set.iloc[:, 4].values

Output:

Out[5]:

array([[165349.2, 136897.8, 471784.1, 'New York'],


[162597.7, 151377.59, 443898.53, 'California'],
[153441.51, 101145.55, 407934.54, 'Florida'],
[144372.41, 118671.85, 383199.62, 'New York'],
[142107.34, 91391.77, 366168.42, 'Florida'],
[131876.9, 99814.71, 362861.36, 'New York'],
[134615.46, 147198.87, 127716.82, 'California'],
[130298.13, 145530.06, 323876.68, 'Florida'],
[120542.52, 148718.95, 311613.29, 'New York'],
[123334.88, 108679.17, 304981.62, 'California'],
[101913.08, 110594.11, 229160.95, 'Florida'],
[100671.96, 91790.61, 249744.55, 'California'],
[93863.75, 127320.38, 249839.44, 'Florida'],
[91992.39, 135495.07, 252664.93, 'California'],
[119943.24, 156547.42, 256512.92, 'Florida'],
[114523.61, 122616.84, 261776.23, 'New York'],
[78013.11, 121597.55, 264346.06, 'California'],
[94657.16, 145077.58, 282574.31, 'New York'],
[91749.16, 114175.79, 294919.57, 'Florida'],
[86419.7, 153514.11, 0.0, 'New York'],
[76253.86, 113867.3, 298664.47, 'California'],
[78389.47, 153773.43, 299737.29, 'New York'],
[73994.56, 122782.75, 303319.26, 'Florida'],
[67532.53, 105751.03, 304768.73, 'Florida'],
[77044.01, 99281.34, 140574.81, 'New York'],
[64664.71, 139553.16, 137962.62, 'California'],
[75328.87, 144135.98, 134050.07, 'Florida'],
[72107.6, 127864.55, 353183.81, 'New York'],
[66051.52, 182645.56, 118148.2, 'Florida'],
[65605.48, 153032.06, 107138.38, 'New York'],
[61994.48, 115641.28, 91131.24, 'Florida'],
[61136.38, 152701.92, 88218.23, 'New York'],
[63408.86, 129219.61, 46085.25, 'California'],
[55493.95, 103057.49, 214634.81, 'Florida'],
[46426.07, 157693.92, 210797.67, 'California'],
[46014.02, 85047.44, 205517.64, 'New York'],
[28663.76, 127056.21, 201126.82, 'Florida'],
[44069.95, 51283.14, 197029.42, 'California'],
[20229.59, 65947.93, 185265.1, 'New York'],
[38558.51, 82982.09, 174999.3, 'California'],
[28754.33, 118546.05, 172795.67, 'California'],
[27892.92, 84710.77, 164470.71, 'Florida'],
[23640.93, 96189.63, 148001.11, 'California'],
[15505.73, 127382.3, 35534.17, 'New York'],
[22177.74, 154806.14, 28334.72, 'California'],
[1000.23, 124153.04, 1903.93, 'New York'],
[1315.46, 115816.21, 297114.46, 'Florida'],
[0.0, 135426.92, 0.0, 'California'],
[542.05, 51743.15, 0.0, 'New York'],
[0.0, 116983.8, 45173.06, 'California']], dtype=object)

As we can see in the above output, the last column contains categorical variables
which are not suitable to apply directly for fitting the model. So we need to encode
this variable.

Encoding Dummy Variables:

Since we have one categorical variable (State) which cannot be directly applied to the
model, we will encode it. To encode the categorical variable into numbers, we will
use the LabelEncoder class. But that is not sufficient on its own, because it still implies
a relational order between the categories, which may create a wrong model. So, in order
to remove this problem, we will use OneHotEncoder, which will create the dummy
variables. Below is the code for it:

1. #Categorical data
2. from sklearn.preprocessing import LabelEncoder, OneHotEncoder
3. labelencoder_x= LabelEncoder()
4. x[:, 3]= labelencoder_x.fit_transform(x[:,3])
5. onehotencoder= OneHotEncoder(categorical_features= [3])
6. x= onehotencoder.fit_transform(x).toarray()

Here we are only encoding one independent variable, which is State, as the other
variables are continuous.

Output:

As we can see in the above output, the state column has been converted into dummy
variables (0 and 1). Here each dummy variable column is corresponding to the
one State. We can check by comparing it with the original dataset. The first column
corresponds to the California State, the second column corresponds to the Florida
State, and the third column corresponds to the New York State.
Note: We should not use all the dummy variables at the same time; the number used must
be one less than the total number of dummy variables, else it will create a dummy variable trap.

o Now, we are writing a single line of code just to avoid the dummy variable
trap:

1. #avoiding the dummy variable trap:


2. x = x[:, 1:]

If we do not remove the first dummy variable, then it may introduce multicollinearity
in the model.

As we can see in the above output image, the first column has been removed.
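
As an aside, newer scikit-learn versions can avoid the dummy variable trap automatically:
OneHotEncoder accepts drop='first', which drops one dummy column per categorical feature.
A hedged sketch (this is an alternative, not the code used above, and it assumes x still
contains the raw State strings in column 3):

# Hedged alternative sketch: drop the first dummy column automatically (newer scikit-learn).
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(
    [('state', OneHotEncoder(drop='first'), [3])],   # column 3 holds the State names
    remainder='passthrough', sparse_threshold=0)
x_encoded = ct.fit_transform(x)     # dummy columns first, then the numeric columns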

o Now we will split the dataset into training and test set. The code for this is
given below:
1. # Splitting the dataset into training and test set.
2. from sklearn.model_selection import train_test_split
3. x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state
=0)

The above code will split our dataset into a training set and a test set.

Output: You can check the output by clicking on the variable explorer option given in the
Spyder IDE. The test set and training set will look like the below images:

Test set:

Training set:
Note: In MLR, we will not do feature scaling as it is taken care by the library, so we don't
need to do it manually.

Step: 2- Fitting our MLR model to the Training set:


Now that we have prepared our dataset, we will fit our regression model to the training
set. It will be similar to what we did in the Simple Linear Regression model. The code
for this will be:

1. #Fitting the MLR model to the training set:


2. from sklearn.linear_model import LinearRegression
3. regressor= LinearRegression()
4. regressor.fit(x_train, y_train)

Output:

Out[9]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

Now, we have successfully trained our model using the training dataset. In the next
step, we will test the performance of the model using the test dataset.

Step: 3- Prediction of Test set results:


The last step for our model is checking the performance of the model. We will do it
by predicting the test set result. For prediction, we will create a y_pred vector. Below
is the code for it:
1. #Predicting the Test set result;
2. y_pred= regressor.predict(x_test)

By executing the above lines of code, a new vector will be generated under the
variable explorer option. We can test our model by comparing the predicted values
and test set values.

Output:

In the above output, we have the predicted result set and the test set. We can check
model performance by comparing these two values index by index. For example, the first
index has a predicted profit of about $103,015 and a real (test) profit of about $103,282.
The difference is only around $267, which is a good prediction. So our model is
complete.

o We can also check the score for training dataset and test dataset. Below is the
code for it:

1. print('Train Score: ', regressor.score(x_train, y_train))
2. print('Test Score: ', regressor.score(x_test, y_test))

Output: The score is:

Train Score: 0.9501847627493607


Test Score: 0.9347068473282446

The above scores (the R² values returned by regressor.score) tell us that our model
explains about 95% of the variance in the training data and about 93% in the test data.
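For reference, the value returned by regressor.score is the R² coefficient of determination, R² = 1 - SS_res / SS_tot. A tiny sketch of the computation (the first pair of values echoes the prediction discussed above; the rest are made up purely for illustration):

# Sketch of the R^2 score that regressor.score reports.
import numpy as nm

y_true = nm.array([103282.0, 120000.0, 90000.0])   # actual profits (illustrative values)
y_pred = nm.array([103015.0, 118000.0, 93000.0])   # predicted profits (illustrative values)

ss_res = nm.sum((y_true - y_pred) ** 2)            # residual sum of squares
ss_tot = nm.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
print(1 - ss_res / ss_tot)                         # the R^2 score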

Note: In the next topic, we will see how we can improve the performance of the model
using the Backward Elimination process.

Applications of Multiple Linear Regression:


There are mainly two applications of Multiple Linear Regression:

o Effectiveness of an independent variable on the prediction
o Predicting the impact of changes

What is Backward Elimination?


Backward elimination is a feature selection technique used while building a machine
learning model. It is used to remove those features that do not have a significant
effect on the dependent variable or on the prediction of the output. There are various
ways to build a model in Machine Learning, which are:

1. All-in
2. Backward Elimination
3. Forward Selection
4. Bidirectional Elimination
5. Score Comparison

Above are the possible methods for building a model in Machine Learning, but here we
will only use the Backward Elimination process, as it is the fastest method.

Steps of Backward Elimination


Below are the main steps used to apply the backward elimination process:

Step-1: First, we need to select a significance level to stay in the model (SL = 0.05).


Step-2: Fit the complete model with all possible predictors/independent variables.

Step-3: Choose the predictor which has the highest P-value, such that:

a. If P-value > SL, go to Step-4.

b. Else finish; our model is ready.

Step-4: Remove that predictor.

Step-5: Rebuild and fit the model with the remaining variables, then return to Step-3 and
repeat until the highest remaining p-value is below the significance level. A compact
sketch of this loop is given below.
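The five steps above can also be written as one short loop. The sketch below is not part of the tutorial's original code; it assumes that x already includes the column of ones for the constant term (added later in this chapter), that y is the target vector, and that statsmodels is imported as sm:

# A minimal sketch of the backward elimination loop (assumptions noted above).
import numpy as nm
import statsmodels.api as sm

def backward_elimination(x, y, sl=0.05):
    x_opt = nm.array(x, dtype=float)
    while True:
        regressor_OLS = sm.OLS(endog=y, exog=x_opt).fit()  # Step-2/5: fit with the current predictors
        p_values = regressor_OLS.pvalues
        worst = p_values.argmax()                           # Step-3: predictor with the highest p-value
        if p_values[worst] > sl:                            # Step-3a: compare with the significance level
            x_opt = nm.delete(x_opt, worst, axis=1)         # Step-4: remove that predictor
        else:
            break                                           # Step-3b: all p-values below SL, model is ready
    return x_opt, regressor_OLS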

Need for Backward Elimination: An optimal Multiple Linear Regression model:
In the previous chapter, we discussed and successfully created our Multiple Linear
Regression model, where we took 4 independent variables (R&D spend,
Administration spend, Marketing spend, and state (dummy variables)) and one
dependent variable (Profit). But that model is not optimal, as we have included all
the independent variables and do not know which independent variable affects the
prediction the most and which affects it the least.

Unnecessary features increase the complexity of the model. Hence it is good to keep
only the most significant features and keep our model simple to get a better result.

So, in order to optimize the performance of the model, we will use the Backward
Elimination method. This process is used to optimize the performance of the MLR
model as it will only include the most affecting feature and remove the least affecting
feature. Let's start to apply it to our MLR model.

Steps for Backward Elimination method:


We will use the same model which we built in the previous chapter of MLR. Below is
the complete code for it:

1. # importing libraries
2. import numpy as nm
3. import matplotlib.pyplot as mtp
4. import pandas as pd
5.
6. #importing datasets
7. data_set= pd.read_csv('50_CompList.csv')
8.
9. #Extracting Independent and dependent Variable
10. x= data_set.iloc[:, :-1].values
11. y= data_set.iloc[:, 4].values
12.
13. #Categorical data
14. from sklearn.preprocessing import LabelEncoder, OneHotEncoder
15. labelencoder_x= LabelEncoder()
16. x[:, 3]= labelencoder_x.fit_transform(x[:,3])
17. onehotencoder= OneHotEncoder(categorical_features= [3])
18. x= onehotencoder.fit_transform(x).toarray()
19.
20. #Avoiding the dummy variable trap:
21. x = x[:, 1:]
22.
23.
24. # Splitting the dataset into training and test set.
25. from sklearn.model_selection import train_test_split
26. x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state
=0)
27.
28. #Fitting the MLR model to the training set:
29. from sklearn.linear_model import LinearRegression
30. regressor= LinearRegression()
31. regressor.fit(x_train, y_train)
32.
33. #Predicting the Test set result;
34. y_pred= regressor.predict(x_test)
35.
36. #Checking the score
37. print('Train Score: ', regressor.score(x_train, y_train))
38. print('Test Score: ', regressor.score(x_test, y_test))

From the above code, we get the training and test set scores as:

Train Score: 0.9501847627493607


Test Score: 0.9347068473282446

The difference between both scores is 0.0154.

Note: On the basis of this score, we will estimate the effect of features on our model
after using the Backward elimination process.

Step: 1- Preparation of Backward Elimination:

o Importing the library: Firstly, we need to import
the statsmodels.api library, which is used for the estimation of
various statistical models such as OLS (Ordinary Least Squares). Below is the
code for it:

1. import statsmodels.api as sm

o Adding a column in the matrix of features: As we can check in our MLR
equation (a), there is one constant term b0, but this term is not present in our
matrix of features, so we need to add it manually. We will add a column
having the value x0 = 1, associated with the constant term b0.
To add it, we will use the append function of the Numpy library (nm, which we have
already imported into our code) and will assign a value of 1. Below is the code
for it.

1. x = nm.append(arr = nm.ones((50,1)).astype(int), values=x, axis=1)

Here we have used axis =1, as we wanted to add a column. For adding a row, we can
use axis =0.
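As a side note, statsmodels also provides a helper, add_constant, that does the same job as the nm.append line above. A tiny self-contained sketch (the toy matrix below is made up purely for illustration):

# sm.add_constant prepends a column of ones corresponding to the constant term b0.
import numpy as nm
import statsmodels.api as sm

x_toy = nm.array([[2.5, 1.0], [3.0, 0.5], [4.2, 2.2]])  # made-up feature matrix
print(sm.add_constant(x_toy))                            # first column is all ones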

Output: By executing the above line of code, a new column will be added into our
matrix of features, which will have all values equal to 1. We can check it by clicking
on the x dataset under the variable explorer option.
As we can see in the above output image, the first column is added successfully,
which corresponds to the constant term of the MLR equation.

Step: 2:

o Now, we are actually going to apply the backward elimination process. Firstly we
will create a new feature vector x_opt, which will only contain the set of
independent features that significantly affect the dependent variable.
o Next, as per the Backward Elimination process, we need to choose a significance
level (SL = 0.05), and then fit the model with all possible predictors. So for
fitting the model, we will create a regressor_OLS object of the OLS class from
the statsmodels library. Then we will fit it by using the fit() method.
o Next we need p-value to compare with SL value, so for this we will
use summary() method to get the summary table of all the values. Below is
the code for it:

1. x_opt= x[:, [0,1,2,3,4,5]]
2. regressor_OLS= sm.OLS(endog = y, exog=x_opt).fit()
3. regressor_OLS.summary()

Output: By executing the above lines of code, we will get a summary table. Consider
the below image:

In the above image, we can clearly see the p-values of all the variables. Here x1, x2
are dummy variables, x3 is R&D spend, x4 is Administration spend, and x5 is
Marketing spend.

From the table, we will choose the highest p-value, which is 0.953 for x1. Since this
p-value is greater than the SL value (0.05), we will remove the x1 variable (a dummy
variable) from the matrix and refit the model. Below is the code for it:

1. x_opt=x[:, [0,2,3,4,5]]
2. regressor_OLS=sm.OLS(endog = y, exog=x_opt).fit()
3. regressor_OLS.summary()

Output:
As we can see in the output image, five variables now remain. Among these, the
highest p-value is 0.961, which belongs to the x1 variable (the other dummy variable)
and is again greater than the SL value. So we will remove it and refit the model.
Below is the code for it:

1. x_opt= x[:, [0,3,4,5]]
2. regressor_OLS=sm.OLS(endog = y, exog=x_opt).fit()
3. regressor_OLS.summary()

Output:
In the above output image, we can see that the dummy variable (x2) has been removed.
The next highest p-value is 0.602, which is still greater than 0.05, so we need to
remove it.

o Now we will remove the Admin spend variable, which has a 0.602 p-value, and
refit the model again.

1. x_opt=x[:, [0,3,5]]
2. regressor_OLS=sm.OLS(endog = y, exog=x_opt).fit()
3. regressor_OLS.summary()

Output:
As we can see in the above output image, the Admin spend variable has been
removed. But there is still one variable whose p-value is above the significance level:
Marketing spend, with a p-value of about 0.06, which is greater than 0.05. So we need
to remove it as well.

o Finally, we will remove one more variable, Marketing spend, whose p-value of
about 0.06 is still more than the significance level.
Below is the code for it:

1. x_opt=x[:, [0,3]]
2. regressor_OLS=sm.OLS(endog = y, exog=x_opt).fit()
3. regressor_OLS.summary()

Output:
As we can see in the above output image, only two columns are left: the constant and
R&D spend. So R&D spend is the only independent variable that is significant for the
prediction, and we can now predict efficiently using this variable alone.

Estimating the performance:


In the previous topic, we calculated the train and test scores of the model using all
the feature variables. Now we will check the score with only one feature variable
(R&D spend). Our dataset now looks like:
Below is the code for Building Multiple Linear Regression model by only using
R&D spend:

1. # importing libraries
2. import numpy as nm
3. import matplotlib.pyplot as mtp
4. import pandas as pd
5.
6. #importing datasets
7. data_set= pd.read_csv('50_CompList1.csv')
8.
9. #Extracting Independent and dependent Variable
10. x_BE= data_set.iloc[:, :-1].values
11. y_BE= data_set.iloc[:, 1].values
12.
13.
14. # Splitting the dataset into training and test set.
15. from sklearn.model_selection import train_test_split
16. x_BE_train, x_BE_test, y_BE_train, y_BE_test= train_test_split(x_BE, y_BE, test_size
= 0.2, random_state=0)
17.
18. #Fitting the MLR model to the training set:
19. from sklearn.linear_model import LinearRegression
20. regressor= LinearRegression()
21. regressor.fit(nm.array(x_BE_train).reshape(-1,1), y_BE_train)
22.
23. #Predicting the Test set result;
24. y_pred= regressor.predict(x_BE_test)
25.
26. #Checking the score
27. print('Train Score: ', regressor.score(x_BE_train, y_BE_train))
28. print('Test Score: ', regressor.score(x_BE_test, y_BE_test))

Output:

After executing the above code, we will get the Training and test scores as:

Train Score: 0.9449589778363044


Test Score: 0.9464587607787219

As we can see, the training score is about 94% and the test score is also about 94%.
The difference between the two scores is only about 0.0015, which is even smaller than
the previous difference of 0.0154 obtained when all the variables were included.

We got this result by using one independent variable (R&D spend) only instead
of four variables. Hence, now, our model is simple and accurate.

ML Polynomial Regression
o Polynomial Regression is a regression algorithm that models the relationship
between a dependent(y) and independent variable(x) as nth degree
polynomial. The Polynomial Regression equation is given below:

y = b0 + b1x1 + b2x1² + b3x1³ + ...... + bnx1ⁿ

o It is also called the special case of Multiple Linear Regression in ML. Because
we add some polynomial terms to the Multiple Linear regression equation to
convert it into Polynomial Regression.
o It is a linear model with some modification in order to increase the accuracy.
o The dataset used in Polynomial regression for training is of non-linear nature.
o It makes use of a linear regression model to fit the complicated and non-linear
functions and datasets.
o Hence, "In Polynomial regression, the original features are converted
into Polynomial features of required degree (2,3,..,n) and then modeled
using a linear model."

Need for Polynomial Regression:


The need of Polynomial Regression in ML can be understood in the below points:

o If we apply a linear model to a linear dataset, it gives us a good result, as we
have seen in Simple Linear Regression. But if we apply the same model, without
any modification, to a non-linear dataset, it will produce drastically wrong
output: the loss function will increase, the error rate will be high, and the
accuracy will decrease.
o So for such cases, where data points are arranged in a non-linear fashion,
we need the Polynomial Regression model. We can understand it in a
better way using the below comparison diagram of the linear dataset and
non-linear dataset.

o In the above image, we have taken a dataset which is arranged non-linearly.
If we try to cover it with a linear model, we can clearly see that it hardly
covers any data point. On the other hand, a curve (the Polynomial model) is
able to cover most of the data points.
o Hence, if the datasets are arranged in a non-linear fashion, then we should use
the Polynomial Regression model instead of Simple Linear Regression.

Note: A Polynomial Regression algorithm is also called Polynomial Linear Regression
because the model is still linear in its coefficients; only the features are raised to
higher powers.

Equation of the Polynomial Regression Model:

Simple Linear Regression equation: y = b0 + b1x .........(a)

Multiple Linear Regression equation: y = b0 + b1x1 + b2x2 + b3x3 + .... + bnxn .........(b)

Polynomial Regression equation: y = b0 + b1x + b2x² + b3x³ + .... + bnxⁿ .........(c)


When we compare the above three equations, we can clearly see that all three are
polynomial equations that differ only in the degree of the variables. The Simple and
Multiple Linear equations are polynomial equations of degree one, and the Polynomial
Regression equation is a linear equation of degree n. So if we raise the degree of our
linear equations, they are converted into Polynomial Linear equations.

Note: To better understand Polynomial Regression, you must have knowledge of Simple
Linear Regression.
Implementation of Polynomial Regression
using Python:
Here we will implement the Polynomial Regression using Python. We will understand
it by comparing Polynomial Regression model with the Simple Linear Regression
model. So first, let's understand the problem for which we are going to build the
model.

Problem Description: There is a Human Resources company that is going to hire a
new candidate. The candidate has stated that his previous salary was 160K per annum,
and HR has to check whether he is telling the truth or bluffing. To identify this, they
only have a dataset from his previous company, in which the salaries of the top 10
positions are listed along with their levels. By checking the available dataset, we have
found that there is a non-linear relationship between the Position levels and the
salaries. Our goal is to build a Bluffing detector regression model, so HR can hire
an honest candidate. Below are the steps to build such a model.

Steps for Polynomial Regression:


The main steps involved in Polynomial Regression are given below:

o Data Pre-processing
o Build a Linear Regression model and fit it to the dataset
o Build a Polynomial Regression model and fit it to the dataset
o Visualize the result for Linear Regression and Polynomial Regression model.
o Predicting the output.

Note: Here, we will build the Linear Regression model as well as the Polynomial Regression
model to compare their predictions; the Linear Regression model is only for reference.

Data Pre-processing Step:

The data pre-processing step will remain the same as in previous regression models,
except for some changes. In the Polynomial Regression model, we will not use
feature scaling, and also we will not split our dataset into training and test set. It has
two reasons:

o The dataset contains very few data points; dividing it into a test and training set
would not leave the model enough information to find the correlations between
the salaries and levels.
o In this model, we want very accurate predictions for salary, so the model
should have enough information.

The code for pre-processing step is given below:

1. # importing libraries
2. import numpy as nm
3. import matplotlib.pyplot as mtp
4. import pandas as pd
5.
6. #importing datasets
7. data_set= pd.read_csv('Position_Salaries.csv')
8.
9. #Extracting Independent and dependent Variable
10. x= data_set.iloc[:, 1:2].values
11. y= data_set.iloc[:, 2].values

Explanation:

o In the above lines of code, we have imported the important Python libraries to
import dataset and operate on it.
o Next, we have imported the dataset 'Position_Salaries.csv', which contains
three columns (Position, Levels, and Salary), but we will consider only two
columns (Salary and Levels).
o After that, we have extracted the dependent (y) and independent (x) variables
from the dataset. For the x variable, we have used the slice [:, 1:2], because
we want index 1 (Levels), and including :2 keeps it as a matrix (a 2-D array).

Output:

By executing the above code, we can read our dataset as:

As we can see in the above output, there are three columns (Position, Level, and
Salary). We are only considering two of them, because the Levels column is equivalent
to the Positions column; it can be seen as the encoded form of the positions.

Here we will predict the output for level 6.5, because the candidate has 4+ years'
experience as a regional manager, so he must be somewhere between level 6 and level 7.

Building the Linear regression model:

Now, we will build and fit the Linear regression model to the dataset. In building
polynomial regression, we will take the Linear regression model as reference and
compare both the results. The code is given below:

1. #Fitting the Linear Regression to the dataset
2. from sklearn.linear_model import LinearRegression
3. lin_regs= LinearRegression()
4. lin_regs.fit(x,y)

In the above code, we have created the Simple Linear model using lin_regs object
of LinearRegression class and fitted it to the dataset variables (x and y).

Output:

Out[5]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

Building the Polynomial regression model:

Now we will build the Polynomial Regression model, but it will be a little different
from the Simple Linear model. Because here we will use PolynomialFeatures class
of preprocessing library. We are using this class to add some extra features to our
dataset.

1. #Fitting the Polynomial regression to the dataset
2. from sklearn.preprocessing import PolynomialFeatures
3. poly_regs= PolynomialFeatures(degree= 2)
4. x_poly= poly_regs.fit_transform(x)
5. lin_reg_2 =LinearRegression()
6. lin_reg_2.fit(x_poly, y)

In the above lines of code, we have used poly_regs.fit_transform(x) because we first
convert our feature matrix into a polynomial feature matrix and then fit it to the
Polynomial regression model. The parameter value (degree= 2) depends on our choice;
we can choose it according to the polynomial features we want.
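To make the transformation concrete, here is a small illustration (with made-up level values) of what fit_transform produces for degree= 2: each row [x] becomes [1, x, x²]:

# Illustration of PolynomialFeatures(degree=2) on a single feature.
import numpy as nm
from sklearn.preprocessing import PolynomialFeatures

levels = nm.array([[1.0], [2.0], [3.0]])              # made-up level values
print(PolynomialFeatures(degree=2).fit_transform(levels))
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]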

After executing the code, we will get another matrix x_poly, which can be seen under
the variable explorer option:
Next, we have used another LinearRegression object, namely lin_reg_2, to fit
our x_poly vector to the linear model.

Output:

Out[11]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

Visualizing the result for Linear regression:

Now we will visualize the result for Linear regression model as we did in Simple
Linear Regression. Below is the code for it:

1. #Visualizing the result for Linear Regression model
2. mtp.scatter(x,y,color="blue")
3. mtp.plot(x,lin_regs.predict(x), color="red")
4. mtp.title("Bluff detection model(Linear Regression)")
5. mtp.xlabel("Position Levels")
6. mtp.ylabel("Salary")
7. mtp.show()

Output:

In the above output image, we can clearly see that the regression line is far from the
data points. Predictions lie on the red straight line, and the blue points are the
actual values. If we use this output to predict the salary of the CEO, it gives
approximately $600,000, which is far away from the real value.

So we need a curved model to fit the dataset, rather than a straight line.

Visualizing the result for Polynomial Regression

Here we will visualize the result of the Polynomial Regression model, whose code is a
little different from the above model.

Code for this is given below:

1. #Visualizing the result for Polynomial Regression
2. mtp.scatter(x,y,color="blue")
3. mtp.plot(x, lin_reg_2.predict(poly_regs.fit_transform(x)), color="red")
4. mtp.title("Bluff detection model(Polynomial Regression)")
5. mtp.xlabel("Position Levels")
6. mtp.ylabel("Salary")
7. mtp.show()
In the above code, we have used lin_reg_2.predict(poly_regs.fit_transform(x)) instead
of x_poly, because we want the linear regressor object to predict from the polynomial
features matrix.

Output:

As we can see in the above output image, the predictions are close to the real values.
The above plot will vary as we will change the degree.

For degree= 3:

If we change the degree to 3, we will get a more accurate plot, as shown in the
below image.
As we can see in the output image, the predicted salary for level 6.5 is near
$170K-$190K, which suggests that the future employee is telling the truth about his
salary.

Degree= 4: Let's change the degree to 4; now we will get the most accurate plot.
Hence we can get more accurate results by increasing the degree of the polynomial.

Predicting the final result with the Linear Regression model:

Now, we will predict the final output using the Linear regression model to see
whether an employee is saying truth or bluff. So, for this, we will use
the predict() method and will pass the value 6.5. Below is the code for it:

1. lin_pred = lin_regs.predict([[6.5]])
2. print(lin_pred)

Output:

[330378.78787879]

Predicting the final result with the Polynomial Regression model:

Now, we will predict the final output using the Polynomial Regression model to
compare with Linear model. Below is the code for it:

1. poly_pred = lin_reg_2.predict(poly_regs.fit_transform([[6.5]]))
2. print(poly_pred)

Output:

[158862.45265153]
As we can see, the predicted output for the Polynomial Regression is
[158862.45265153], which is much closer to the real value. Hence, we can say that the
future employee is telling the truth.

Classification Algorithm in Machine


Learning
As we know, the Supervised Machine Learning algorithm can be broadly classified
into Regression and Classification Algorithms. In Regression algorithms, we have
predicted the output for continuous values, but to predict the categorical values, we
need Classification algorithms.

What is the Classification Algorithm?


The Classification algorithm is a Supervised Learning technique that is used to
identify the category of new observations on the basis of training data. In
Classification, a program learns from the given dataset or observations and then
classifies new observation into a number of classes or groups. Such as, Yes or No, 0
or 1, Spam or Not Spam, cat or dog, etc. Classes can be called as targets/labels or
categories.

Unlike regression, the output variable of Classification is a category, not a value, such
as "Green or Blue", "fruit or animal", etc. Since the Classification algorithm is a
Supervised learning technique, hence it takes labeled input data, which means it
contains input with the corresponding output.

In a classification algorithm, a discrete output function (y) is mapped to the input
variable (x):


1. y=f(x), where y = categorical output


The best example of an ML classification algorithm is Email Spam Detector.

The main goal of the Classification algorithm is to identify the category of a given
dataset, and these algorithms are mainly used to predict the output for the
categorical data.

Classification algorithms can be better understood using the below diagram. In the
below diagram, there are two classes, class A and Class B. These classes have features
that are similar to each other and dissimilar to other classes.

The algorithm which implements the classification on a dataset is known as a


classifier. There are two types of Classifications:

o Binary Classifier: If the classification problem has only two possible
outcomes, then it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
o Multi-class Classifier: If a classification problem has more than two
outcomes, then it is called a Multi-class Classifier.
Example: Classification of types of crops, classification of types of music.

Learners in Classification Problems:


In the classification problems, there are two types of learners:

1. Lazy Learners: A lazy learner first stores the training dataset and waits until it
receives the test dataset. In the lazy learner case, classification is done on the
basis of the most related data stored in the training dataset. It takes less time
in training but more time for predictions.
Example: K-NN algorithm, Case-based reasoning
2. Eager Learners: Eager learners develop a classification model based on a
training dataset before receiving a test dataset. Opposite to lazy learners, an
eager learner takes more time in learning and less time in
prediction. Example: Decision Trees, Naïve Bayes, ANN.

Types of ML Classification Algorithms:


Classification Algorithms can be further divided into mainly two categories:

o Linear Models
  o Logistic Regression
  o Support Vector Machines
o Non-linear Models
  o K-Nearest Neighbours
  o Kernel SVM
  o Naïve Bayes
  o Decision Tree Classification
  o Random Forest Classification

Note: We will learn the above algorithms in later chapters.

Evaluating a Classification model:


Once our model is completed, it is necessary to evaluate its performance, whether it is a
Classification or a Regression model. For evaluating a Classification model, we have
the following ways:

1. Log Loss or Cross-Entropy Loss:

o It is used for evaluating the performance of a classifier whose output is a
probability value between 0 and 1.
o For a good binary Classification model, the value of log loss should be near to
0.
o The value of log loss increases if the predicted value deviates from the actual
value.
o The lower log loss represents the higher accuracy of the model.
o For Binary classification, cross-entropy can be calculated as:

1. -(y log(p) + (1 - y) log(1 - p))

Where y = actual output and p = predicted probability.
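A minimal sketch of computing this log loss for a few illustrative labels and predicted probabilities (the clipping step only avoids taking log(0)):

# Binary cross-entropy (log loss) on made-up values.
import numpy as nm

def binary_log_loss(y_true, p_pred, eps=1e-15):
    p = nm.clip(p_pred, eps, 1 - eps)                         # avoid log(0)
    return -nm.mean(y_true * nm.log(p) + (1 - y_true) * nm.log(1 - p))

y_true = nm.array([1, 0, 1, 1])            # actual labels (illustrative)
p_pred = nm.array([0.9, 0.1, 0.8, 0.6])    # predicted probabilities (illustrative)
print(binary_log_loss(y_true, p_pred))     # the closer to 0, the better the classifier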

2. Confusion Matrix:

o The confusion matrix provides us with a matrix/table as output and describes the
performance of the model.
o It is also known as the error matrix.
o The matrix summarizes the prediction results, showing the total number of
correct predictions and incorrect predictions. The matrix looks like the below table:

                       Actual Positive     Actual Negative
Predicted Positive     True Positive       False Positive
Predicted Negative     False Negative      True Negative

3. AUC-ROC curve:

o ROC curve stands for Receiver Operating Characteristics Curve and AUC
stands for Area Under the Curve.
o It is a graph that shows the performance of the classification model at
different thresholds.
o To visualize the performance of the multi-class classification model, we use
the AUC-ROC Curve.
o The ROC curve is plotted with TPR and FPR, where TPR (True Positive Rate) on
Y-axis and FPR(False Positive Rate) on X-axis.
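For reference, scikit-learn can compute both the curve and the area under it; a small sketch with made-up labels and scores:

# ROC curve points and AUC on illustrative values.
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1]                    # actual labels (illustrative)
y_score = [0.1, 0.4, 0.35, 0.8]          # predicted probabilities for the positive class (illustrative)

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # FPR for the X-axis, TPR for the Y-axis
print(roc_auc_score(y_true, y_score))               # area under the ROC curve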

Use cases of Classification Algorithms


Classification algorithms can be used in different places. Below are some popular use
cases of Classification Algorithms:

o Email Spam Detection


o Speech Recognition
o Identifications of Cancer tumor cells.
o Drugs Classification
o Biometric Identification, etc.

Logistic Regression in Machine


Learning
o Logistic regression is one of the most popular Machine Learning algorithms,
which comes under the Supervised Learning technique. It is used for
predicting the categorical dependent variable using a given set of
independent variables.
o Logistic regression predicts the output of a categorical dependent variable.
Therefore the outcome must be a categorical or discrete value. It can be either
Yes or No, 0 or 1, true or False, etc. but instead of giving the exact value as 0
and 1, it gives the probabilistic values which lie between 0 and 1.
o Logistic Regression is very similar to Linear Regression except in how it is
used. Linear Regression is used for solving regression problems,
whereas Logistic Regression is used for solving classification problems.
o In Logistic regression, instead of fitting a regression line, we fit an "S" shaped
logistic function, which predicts two maximum values (0 or 1).
o The curve from the logistic function indicates the likelihood of something such
as whether the cells are cancerous or not, a mouse is obese or not based on
its weight, etc.
o Logistic Regression is a significant machine learning algorithm because it has
the ability to provide probabilities and classify new data using continuous and
discrete datasets.
o Logistic Regression can be used to classify the observations using different
types of data and can easily determine the most effective variables used for
the classification. The below image is showing the logistic function:

Note: Logistic regression uses the concept of predictive modelling just as regression
does, which is why it is called logistic regression; but because it is used to classify
samples, it falls under the classification algorithms.

Logistic Function (Sigmoid Function):

o The sigmoid function is a mathematical function used to map the predicted
values to probabilities.
o It maps any real value into another value within a range of 0 and 1.
o The value of the logistic regression must be between 0 and 1, which cannot go
beyond this limit, so it forms a curve like the "S" form. The S-form curve is
called the Sigmoid function or the logistic function.
o In logistic regression, we use the concept of a threshold value, which decides
between 0 and 1: values above the threshold tend to 1, and values below the
threshold tend to 0.
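A small sketch of the sigmoid function and the threshold rule described above (the input values are arbitrary):

# The sigmoid maps any real value into (0, 1); 0.5 is used as the threshold here.
import numpy as nm

def sigmoid(z):
    return 1 / (1 + nm.exp(-z))

z = nm.array([-4.0, -1.0, 0.0, 1.0, 4.0])   # arbitrary inputs
probs = sigmoid(z)
labels = (probs >= 0.5).astype(int)         # values above the threshold tend to class 1
print(probs)
print(labels)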

Assumptions for Logistic Regression:


o The dependent variable must be categorical in nature.
o The independent variable should not have multi-collinearity.

Logistic Regression Equation:


The Logistic regression equation can be obtained from the Linear Regression
equation. The mathematical steps to get Logistic Regression equations are given
below:

o We know the equation of the straight line can be written as:

y = b0 + b1x1 + b2x2 + ... + bnxn

o In Logistic Regression, y can be between 0 and 1 only, so let's divide the above
equation by (1-y):

y / (1-y);  0 for y = 0, and infinity for y = 1

o But we need a range between -infinity and +infinity, so taking the logarithm of
the equation, it becomes:

log[y / (1-y)] = b0 + b1x1 + b2x2 + ... + bnxn
The above equation is the final equation for Logistic Regression.
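A quick numeric check of this relation, with made-up coefficient values: the linear part b0 + b1x equals the log-odds log(y/(1-y)), and applying the sigmoid to it recovers the probability y:

# Log-odds and probability are two views of the same quantity (illustrative coefficients).
import numpy as nm

b0, b1 = -3.0, 0.8        # made-up coefficients
x = 5.0                   # made-up input
log_odds = b0 + b1 * x                      # right-hand side of the final equation
y = 1 / (1 + nm.exp(-log_odds))             # probability between 0 and 1
print(log_odds, y, nm.log(y / (1 - y)))     # the last value equals log_odds again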

Type of Logistic Regression:


On the basis of the categories, Logistic Regression can be classified into three types:

o Binomial: In binomial Logistic regression, there can be only two possible
types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
o Multinomial: In multinomial Logistic regression, there can be 3 or more
possible unordered types of the dependent variable, such as "cat", "dogs", or
"sheep"
o Ordinal: In ordinal Logistic regression, there can be 3 or more possible
ordered types of dependent variables, such as "low", "Medium", or "High".
Python Implementation of Logistic Regression
(Binomial)
To understand the implementation of Logistic Regression in Python, we will use the
below example:


Example: We are given a dataset that contains information about various users,
obtained from a social networking site. A car manufacturer has recently launched a
new SUV, and the company wants to check how many users from the dataset want to
purchase the car.

For this problem, we will build a Machine Learning model using the Logistic
Regression algorithm. The dataset is shown in the below image. In this problem, we
will predict the purchased variable (dependent variable) by using Age and Salary
(independent variables).
Steps in Logistic Regression: To implement the Logistic Regression using Python,
we will use the same steps as we have done in previous topics of Regression. Below
are the steps:

o Data Pre-processing step


o Fitting Logistic Regression to the Training set
o Predicting the test result
o Test accuracy of the result(Creation of Confusion matrix)
o Visualizing the test set result.

1. Data Pre-processing step: In this step, we will pre-process/prepare the data so


that we can use it in our code efficiently. It will be the same as we have done in Data
pre-processing topic. The code for this is given below:

1. #Data Pre-processing Step
2. # importing libraries
3. import numpy as nm
4. import matplotlib.pyplot as mtp
5. import pandas as pd
6.
7. #importing datasets
8. data_set= pd.read_csv('user_data.csv')

By executing the above lines of code, we will get the dataset as the output. Consider
the given image:

Now, we will extract the dependent and independent variables from the given
dataset. Below is the code for it:

1. #Extracting Independent and dependent Variable
2. x= data_set.iloc[:, [2,3]].values
3. y= data_set.iloc[:, 4].values

In the above code, we have taken [2, 3] for x because our independent variables are
age and salary, which are at index 2, 3. And we have taken 4 for y variable because
our dependent variable is at index 4. The output will be:
Now we will split the dataset into a training set and test set. Below is the code for it:

1. # Splitting the dataset into training and test set.
2. from sklearn.model_selection import train_test_split
3. x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_stat
e=0)

The output for this is given below:

For the test set:

For the training set:


In logistic regression, we will do feature scaling because we want accurate prediction
results. Here we will only scale the independent variables, because the dependent
variable has only 0 and 1 values. Below is the code for it:

1. #feature Scaling
2. from sklearn.preprocessing import StandardScaler
3. st_x= StandardScaler()
4. x_train= st_x.fit_transform(x_train)
5. x_test= st_x.transform(x_test)

The scaled output is given below:

2. Fitting Logistic Regression to the Training set:

We have well prepared our dataset, and now we will train the dataset using the
training set. For providing training or fitting the model to the training set, we will
import the LogisticRegression class of the sklearn library.

After importing the class, we will create a classifier object and use it to fit the model
to the logistic regression. Below is the code for it:

1. #Fitting Logistic Regression to the training set
2. from sklearn.linear_model import LogisticRegression
3. classifier= LogisticRegression(random_state=0)
4. classifier.fit(x_train, y_train)

Output: By executing the above code, we will get the below output:

Out[5]:

1. LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
2. intercept_scaling=1, l1_ratio=None, max_iter=100,
3. multi_class='warn', n_jobs=None, penalty='l2',
4. random_state=0, solver='warn', tol=0.0001, verbose=0,
5. warm_start=False)

Hence our model is well fitted to the training set.

3. Predicting the Test Result

Our model is well trained on the training set, so we will now predict the result by
using test set data. Below is the code for it:

1. #Predicting the test set result
2. y_pred= classifier.predict(x_test)

In the above code, we have created a y_pred vector to predict the test set result.

Output: By executing the above code, a new vector (y_pred) will be created under
the variable explorer option. It can be seen as:
The above output image shows the corresponding predicted users who want to
purchase or not purchase the car.

4. Test Accuracy of the result

Now we will create the confusion matrix here to check the accuracy of the
classification. To create it, we need to import the confusion_matrix function of the
sklearn library. After importing the function, we will call it using a new variable cm.
The function takes two parameters: y_true (the actual values) and y_pred (the values
predicted by the classifier). Below is the code for it:

1. #Creating the Confusion matrix
2. from sklearn.metrics import confusion_matrix
3. cm= confusion_matrix(y_test, y_pred)

Output:
By executing the above code, a new confusion matrix will be created. Consider the
below image:

We can find the accuracy of the predicted result by interpreting the confusion matrix.
From the above output, we can see that 65+24= 89 predictions are correct and 8+3= 11
predictions are incorrect. A short sketch of turning these counts into an accuracy value
follows below.
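As a small illustration, the same accuracy can be computed directly from the matrix; how the 11 incorrect predictions split across the two off-diagonal cells below is shown only for illustration:

# Accuracy from a confusion matrix: correct predictions lie on the diagonal.
import numpy as nm

cm = nm.array([[65, 3],
               [8, 24]])                   # illustrative split of the 89 correct / 11 incorrect counts
print(nm.trace(cm) / cm.sum())             # (65 + 24) / 100 = 0.89
# sklearn.metrics.accuracy_score(y_test, y_pred) would report the same value directly.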

5. Visualizing the training set result

Finally, we will visualize the training set result. To visualize the result, we will
use ListedColormap class of matplotlib library. Below is the code for it:

1. #Visualizing the training set result
2. from matplotlib.colors import ListedColormap
3. x_set, y_set = x_train, y_train
4. x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].
max() + 1, step =0.01),
5. nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
6. mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape
(x1.shape),
7. alpha = 0.75, cmap = ListedColormap(('purple','green' )))
8. mtp.xlim(x1.min(), x1.max())
9. mtp.ylim(x2.min(), x2.max())
10. for i, j in enumerate(nm.unique(y_set)):
11. mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
12. c = ListedColormap(('purple', 'green'))(i), label = j)
13. mtp.title('Logistic Regression (Training set)')
14. mtp.xlabel('Age')
15. mtp.ylabel('Estimated Salary')
16. mtp.legend()
17. mtp.show()

In the above code, we have imported the ListedColormap class of Matplotlib library
to create the colormap for visualizing the result. We have created two new
variables x_set and y_set to replace x_train and y_train. After that, we have used
the nm.meshgrid command to create a rectangular grid that extends from the minimum
value minus 1 to the maximum value plus 1 of each feature. The pixel points we have
taken have a resolution of 0.01.

To create a filled contour, we have used the mtp.contourf command, which creates
regions of the provided colors (purple and green). In this function, we have passed
classifier.predict to show the region predicted by the classifier for each grid point.

Output: By executing the above code, we will get the below output:

The graph can be explained in the below points:

o In the above graph, we can see that there are some Green points within the
green region and Purple points within the purple region.
o All these data points are the observation points from the training set, which
shows the result for purchased variables.
o This graph is made by using two independent variables i.e., Age on the x-
axis and Estimated salary on the y-axis.
o The purple point observations are for which purchased (dependent variable)
is probably 0, i.e., users who did not purchase the SUV car.
o The green point observations are for which purchased (dependent variable)
is probably 1 means user who purchased the SUV car.
o We can also estimate from the graph that the users who are younger with low
salary, did not purchase the car, whereas older users with high estimated
salary purchased the car.
o But there are some purple points in the green region (Buying the car) and
some green points in the purple region(Not buying the car). So we can say
that younger users with a high estimated salary purchased the car, whereas an
older user with a low estimated salary did not purchase the car.

The goal of the classifier:

We have successfully visualized the training set result for the logistic regression, and
our goal for this classification is to divide the users who purchased the SUV car and
who did not purchase the car. So from the output graph, we can clearly see the two
regions (Purple and Green) with the observation points. The Purple region is for
those users who didn't buy the car, and Green Region is for those users who
purchased the car.

Linear Classifier:

As we can see from the graph, the classifier is a straight line, i.e., linear in nature, as we
have used the linear model for Logistic Regression. In further topics, we will learn
about non-linear classifiers.

Visualizing the test set result:

Our model is well trained using the training dataset. Now, we will visualize the result
for new observations (Test set). The code for the test set will remain same as above
except that here we will use x_test and y_test instead of x_train and y_train. Below
is the code for it:

1. #Visualizing the test set result
2. from matplotlib.colors import ListedColormap
3. x_set, y_set = x_test, y_test
4. x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].
max() + 1, step =0.01),
5. nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
6. mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape
(x1.shape),
7. alpha = 0.75, cmap = ListedColormap(('purple','green' )))
8. mtp.xlim(x1.min(), x1.max())
9. mtp.ylim(x2.min(), x2.max())
10. for i, j in enumerate(nm.unique(y_set)):
11. mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
12. c = ListedColormap(('purple', 'green'))(i), label = j)
13. mtp.title('Logistic Regression (Test set)')
14. mtp.xlabel('Age')
15. mtp.ylabel('Estimated Salary')
16. mtp.legend()
17. mtp.show()

Output:

The above graph shows the test set result. As we can see, the graph is divided into
two regions (Purple and Green). And Green observations are in the green region, and
Purple observations are in the purple region. So we can say it is a good prediction
and model. Some of the green and purple data points are in different regions, which
can be ignored as we have already calculated this error using the confusion matrix
(11 Incorrect output).

Hence our model is pretty good and ready to make new predictions for this
classification problem.
K-Nearest Neighbor(KNN) Algorithm
for Machine Learning
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms
based on Supervised Learning technique.
o K-NN algorithm assumes the similarity between the new case/data and
available cases and put the new case into the category that is most similar to
the available categories.
o K-NN algorithm stores all the available data and classifies a new data point
based on similarity. This means that when new data appears, it can be
easily classified into a well-suited category by using the K-NN algorithm.
o K-NN algorithm can be used for Regression as well as for Classification but
mostly it is used for the Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any
assumption on underlying data.
o It is also called a lazy learner algorithm because it does not learn from the
training set immediately instead it stores the dataset and at the time of
classification, it performs an action on the dataset.
o KNN algorithm at the training phase just stores the dataset and when it gets
new data, then it classifies that data into a category that is much similar to the
new data.
o Example: Suppose, we have an image of a creature that looks similar to cat
and dog, but we want to know either it is a cat or dog. So for this
identification, we can use the KNN algorithm, as it works on a similarity
measure. Our KNN model will find the similar features of the new data set to
the cats and dogs images and based on the most similar features it will put it
in either cat or dog category.
Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a
new data point x1; in which of these categories will this data point lie? To solve this
type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily
identify the category or class of a particular data point. Consider the below diagram:

How does K-NN work?


The K-NN working can be explained on the basis of the below algorithm:

o Step-1: Select the number K of the neighbors


o Step-2: Calculate the Euclidean distance of K number of neighbors
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these k neighbors, count the number of the data points in
each category.
o Step-5: Assign the new data points to that category for which the number of
the neighbor is maximum.
o Step-6: Our model is ready.

Suppose we have a new data point and we need to put it in the required category.
Consider the below image:

o Firstly, we will choose the number of neighbors, so we will choose k=5.
o Next, we will calculate the Euclidean distance between the data points. The
Euclidean distance is the distance between two points, which we have already
studied in geometry. For two points (x1, y1) and (x2, y2) it can be calculated as
d = √((x2 - x1)² + (y2 - y1)²).
o By calculating the Euclidean distance, we get the nearest neighbors: three
nearest neighbors in category A and two nearest neighbors in category B.
Consider the below image:
o Since 3 of the 5 nearest neighbors are from category A, this new data point must
belong to category A. A small from-scratch sketch of this distance-and-vote
procedure is given below.
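The following is a from-scratch sketch of this distance-and-vote procedure on a few made-up 2-D points; in practice we will use sklearn's KNeighborsClassifier, as shown later in this chapter:

# K-NN by hand: Euclidean distance to every training point, then a majority vote.
import numpy as nm
from collections import Counter

def knn_predict(x_train, y_train, point, k=5):
    distances = nm.sqrt(((x_train - point) ** 2).sum(axis=1))  # Euclidean distance to each training point
    nearest = nm.argsort(distances)[:k]                        # indices of the k nearest neighbours
    votes = Counter(y_train[i] for i in nearest)                # count neighbours per category
    return votes.most_common(1)[0][0]                           # category with the maximum count

x_train = nm.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6]])   # made-up points
y_train = nm.array(['A', 'A', 'A', 'B', 'B'])
print(knn_predict(x_train, y_train, nm.array([2, 2]), k=3))    # -> 'A'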

How to select the value of K in the K-NN


Algorithm?
Below are some points to remember while selecting the value of K in the K-NN
algorithm:


o There is no particular way to determine the best value for "K", so we need to
try some values to find the best out of them. The most preferred value for K is
5.
o A very low value for K, such as K=1 or K=2, can be noisy and lead to the effects
of outliers in the model.
o Larger values for K reduce the effect of noise, but they can make the boundary
between classes less distinct. One simple way to compare different values of K is
sketched below.
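A common, simple way to compare values of K is to loop over a few candidates and check the held-out score. The sketch below uses a synthetic dataset from make_classification only so that it runs on its own; with the tutorial's data you would reuse its x_train, x_test, y_train and y_test instead:

# Trying several K values and printing the test accuracy for each.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

x, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)      # synthetic data for illustration
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

for k in range(1, 11):
    classifier = KNeighborsClassifier(n_neighbors=k, metric='minkowski', p=2)
    classifier.fit(x_train, y_train)
    print(k, classifier.score(x_test, y_test))                 # pick a K that gives a good, stable score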

Advantages of KNN Algorithm:


o It is simple to implement.
o It is robust to the noisy training data
o It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:

o We always need to determine the value of K, which may be complex at times.
o The computation cost is high because of calculating the distance between the
data points for all the training samples.
Python implementation of the KNN
algorithm
To do the Python implementation of the K-NN algorithm, we will use the same
problem and dataset which we have used in Logistic Regression. But here we will
improve the performance of the model. Below is the problem description:

Problem for the K-NN Algorithm: A car manufacturer has produced a new SUV and
wants to show ads to the users who are interested in buying it. For this problem, we
have a dataset that contains information about multiple users from a social network.
The dataset contains lots of information, but we will consider Estimated Salary and
Age as the independent variables and the Purchased variable as the dependent variable.
Below is the dataset:

Steps to implement the K-NN algorithm:

o Data Pre-processing step


o Fitting the K-NN algorithm to the Training set
o Predicting the test result
o Test accuracy of the result(Creation of Confusion matrix)
o Visualizing the test set result.

Data Pre-Processing Step:

The Data Pre-processing step will remain exactly the same as Logistic Regression.
Below is the code for it:

1. # importing libraries
2. import numpy as nm
3. import matplotlib.pyplot as mtp
4. import pandas as pd
5.
6. #importing datasets
7. data_set= pd.read_csv('user_data.csv')
8.
9. #Extracting Independent and dependent Variable
10. x= data_set.iloc[:, [2,3]].values
11. y= data_set.iloc[:, 4].values
12.
13. # Splitting the dataset into training and test set.
14. from sklearn.model_selection import train_test_split
15. x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_stat
e=0)
16.
17. #feature Scaling
18. from sklearn.preprocessing import StandardScaler
19. st_x= StandardScaler()
20. x_train= st_x.fit_transform(x_train)
21. x_test= st_x.transform(x_test)

By executing the above code, our dataset is imported to our program and well pre-
processed. After feature scaling our test dataset will look like:
From the above output image, we can see that our data is successfully scaled.

o Fitting K-NN classifier to the Training data:


Now we will fit the K-NN classifier to the training data. To do this we will
import the KNeighborsClassifier class of Sklearn Neighbors library. After
importing the class, we will create the classifier object of the class. The
parameters of this class will be:
o n_neighbors: To define the required neighbors of the algorithm.
Usually, it takes 5.
o metric='minkowski': This is the default parameter and it decides the
distance between the points.
o p=2: It is equivalent to the standard Euclidean metric.

And then we will fit the classifier to the training data. Below is the code for it:

1. #Fitting K-NN classifier to the training set
2. from sklearn.neighbors import KNeighborsClassifier
3. classifier= KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2 )
4. classifier.fit(x_train, y_train)

Output: By executing the above code, we will get the output as:
Out[10]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')

o Predicting the Test Result: To predict the test set result, we will create
a y_pred vector as we did in Logistic Regression. Below is the code for it:

1. #Predicting the test set result
2. y_pred= classifier.predict(x_test)

Output:

The output for the above code will be:

o Creating the Confusion Matrix:


Now we will create the Confusion Matrix for our K-NN model to see the
accuracy of the classifier. Below is the code for it:
1. #Creating the Confusion matrix
2. from sklearn.metrics import confusion_matrix
3. cm= confusion_matrix(y_test, y_pred)

In the above code, we have imported the confusion_matrix function and stored its
result in the variable cm.

Output: By executing the above code, we will get the matrix as below:

In the above image, we can see there are 64+29= 93 correct predictions and 3+4= 7
incorrect predictions, whereas, in Logistic Regression, there were 11 incorrect
predictions. So we can say that the performance of the model is improved by using
the K-NN algorithm.

o Visualizing the Training set result:


Now, we will visualize the training set result for K-NN model. The code will
remain same as we did in Logistic Regression, except the name of the graph.
Below is the code for it:

1. #Visualizing the training set result
2. from matplotlib.colors import ListedColormap
3. x_set, y_set = x_train, y_train
4. x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].
max() + 1, step =0.01),
5. nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
6. mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape
(x1.shape),
7. alpha = 0.75, cmap = ListedColormap(('red','green' )))
8. mtp.xlim(x1.min(), x1.max())
9. mtp.ylim(x2.min(), x2.max())
10. for i, j in enumerate(nm.unique(y_set)):
11. mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
12. c = ListedColormap(('red', 'green'))(i), label = j)
13. mtp.title('K-NN Algorithm (Training set)')
14. mtp.xlabel('Age')
15. mtp.ylabel('Estimated Salary')
16. mtp.legend()
17. mtp.show()

Output:

By executing the above code, we will get the below graph:

The output graph is different from the graph which we obtained in Logistic
Regression. It can be understood in the below points:

o As we can see the graph is showing the red point and green points. The
green points are for Purchased(1) and Red Points for not Purchased(0)
variable.
o The graph is showing an irregular boundary instead of showing any
straight line or any curve because it is a K-NN algorithm, i.e., finding
the nearest neighbor.
o The graph has classified users in the correct categories as most of the
users who didn't buy the SUV are in the red region and users who
bought the SUV are in the green region.
o The graph shows a good result, but there are still some green points
in the red region and some red points in the green region. This is not a big
issue, as it keeps the model from overfitting.
o Hence our model is well trained.
o Visualizing the Test set result:
After the training of the model, we will now test the result by putting a new
dataset, i.e., Test dataset. Code remains the same except some minor changes:
such as x_train and y_train will be replaced by x_test and y_test.
Below is the code for it:

1. #Visualizing the test set result
2. from matplotlib.colors import ListedColormap
3. x_set, y_set = x_test, y_test
4. x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].
max() + 1, step =0.01),
5. nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
6. mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape
(x1.shape),
7. alpha = 0.75, cmap = ListedColormap(('red','green' )))
8. mtp.xlim(x1.min(), x1.max())
9. mtp.ylim(x2.min(), x2.max())
10. for i, j in enumerate(nm.unique(y_set)):
11. mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
12. c = ListedColormap(('red', 'green'))(i), label = j)
13. mtp.title('K-NN algorithm(Test set)')
14. mtp.xlabel('Age')
15. mtp.ylabel('Estimated Salary')
16. mtp.legend()
17. mtp.show()

Output:
The above graph shows the output for the test data set. As we can see, the predicted
output is quite good, as most of the red points are in the red region and most of the
green points are in the green region.

However, there are few green points in the red region and a few red points in the
green region. So these are the incorrect observations that we have observed in the
confusion matrix(7 Incorrect output).

Support Vector Machine Algorithm


Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that
can segregate n-dimensional space into classes so that we can easily put the new
data point in the correct category in the future. This best decision boundary is called
a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed a
Support Vector Machine. Consider the below diagram, in which two different
categories are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we used for the KNN
classifier. Suppose we see a strange cat that also has some features of dogs. If we
want a model that can accurately identify whether it is a cat or a dog, such a model
can be created by using the SVM algorithm. We will first train our model with lots of
images of cats and dogs so that it can learn their different features, and then we test
it with this strange creature. The SVM creates a decision boundary between the two
classes (cat and dog) and chooses the extreme cases (support vectors), so it will look
at the extreme cases of cats and dogs. On the basis of the support vectors, it will
classify the creature as a cat. Consider the below diagram:

SVM algorithm can be used for Face detection, image classification, text
categorization, etc.

Types of SVM
SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data, which means that if a
dataset can be classified into two classes by using a single straight line, then
such data is termed linearly separable data, and the classifier used is called the
Linear SVM classifier.
o Non-linear SVM: Non-Linear SVM is used for non-linearly separable data, which
means that if a dataset cannot be classified by using a straight line, then such
data is termed non-linear data, and the classifier used is called the Non-linear
SVM classifier.

Hyperplane and Support Vectors in the SVM


algorithm:
Hyperplane: There can be multiple lines/decision boundaries to segregate the
classes in n-dimensional space, but we need to find out the best decision boundary
that helps to classify the data points. This best boundary is known as the hyperplane
of SVM.
The dimensions of the hyperplane depend on the number of features present in the
dataset: if there are 2 features (as shown in the image), then the hyperplane will be a
straight line, and if there are 3 features, then the hyperplane will be a 2-dimensional plane.

We always create the hyperplane that has the maximum margin, which means the
maximum distance between the hyperplane and the nearest data points.

Support Vectors:

The data points or vectors that are closest to the hyperplane and which affect the
position of the hyperplane are termed support vectors. Since these vectors support
the hyperplane, they are called support vectors.

How does SVM work?


Linear SVM:

The working of the SVM algorithm can be understood by using an example. Suppose
we have a dataset that has two tags (green and blue), and the dataset has two
features x1 and x2. We want a classifier that can classify the pair(x1, x2) of
coordinates in either green or blue. Consider the below image:

Since this is a 2-D space, we can easily separate these two classes with just a straight
line. But there can be multiple lines that separate these classes. Consider the
below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called a hyperplane. The SVM algorithm finds the points of both
classes that are closest to the line. These points are called support vectors. The
distance between the support vectors and the hyperplane is called the margin, and
the goal of SVM is to maximize this margin. The hyperplane with the maximum margin
is called the optimal hyperplane.
Non-Linear SVM:

If data is linearly arranged, then we can separate it by using a straight line, but for
non-linear data, we cannot draw a single straight line. Consider the below image:
So to separate these data points, we need to add one more dimension. For linear
data, we have used two dimensions x and y, so for non-linear data, we will add a
third dimension z. It can be calculated as:

z = x² + y²

By adding the third dimension, the sample space will become as below image:

So now, SVM will divide the datasets into classes in the following way. Consider the
below image:
Since we are now in 3-D space, the separating boundary looks like a plane parallel to
the x-axis. If we convert it back to 2-D space with z = 1, it becomes:

Hence we get a circumference of radius 1 in the case of non-linear data.
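As a rough illustration of this mapping (not part of the original tutorial code; the points and the radius threshold below are made up), the following minimal sketch shows how adding the feature z = x² + y² turns a circular boundary into a simple threshold:

# Illustrative only: the z = x^2 + y^2 trick on synthetic points.
# Points inside a circle and points outside it cannot be split by one straight
# line in 2-D, but become separable by a single threshold on the new feature z.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one class near the origin, the other further away.
inner = rng.normal(0, 0.4, size=(50, 2))
outer = rng.normal(0, 2.0, size=(200, 2))
outer = outer[np.linalg.norm(outer, axis=1) > 1.5][:50]

def add_z(points):
    # z = x^2 + y^2, the third dimension described above
    z = (points ** 2).sum(axis=1)
    return np.column_stack([points, z])

inner3d = add_z(inner)
outer3d = add_z(outer)

# In the new space a plane such as z = 1 (roughly) separates the two classes.
print("typical z of inner class:", inner3d[:, 2].max())
print("typical z of outer class:", outer3d[:, 2].min())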


Python Implementation of Support Vector Machine

Now we will implement the SVM algorithm using Python. Here we will use the same
dataset user_data, which we have used in Logistic regression and KNN classification.

o Data Pre-processing step

Till the Data pre-processing step, the code will remain the same. Below is the code:

#Data Pre-processing Step
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing datasets
data_set = pd.read_csv('user_data.csv')

# Extracting Independent and dependent Variable
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature Scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)

After executing the above code, we will pre-process the data. The code will give the
dataset as:
The scaled output for the test set will be:
Fitting the SVM classifier to the training set:

Now the training set will be fitted to the SVM classifier. To create the SVM classifier,
we will import the SVC class from the sklearn.svm library. Below is the code for it:

from sklearn.svm import SVC  # "Support vector classifier"

classifier = SVC(kernel='linear', random_state=0)
classifier.fit(x_train, y_train)

In the above code, we have used kernel='linear', as here we are creating an SVM for
linearly separable data. However, we can change it for non-linear data. We then
fitted the classifier to the training dataset (x_train, y_train).

Output:

Out[8]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
kernel='linear', max_iter=-1, probability=False, random_state=0,
shrinking=True, tol=0.001, verbose=False)

The model performance can be altered by changing the value of C (the regularization
factor), gamma, and the kernel.
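As a hedged illustration of that remark (this snippet is not part of the tutorial's code, and the C and gamma values are arbitrary), the same classifier could be re-fitted with a non-linear RBF kernel and different hyperparameters, assuming the scaled x_train, x_test, y_train, y_test from the pre-processing step above:

# Illustrative only: an RBF-kernel SVM with explicit C and gamma values.
from sklearn.svm import SVC

rbf_classifier = SVC(kernel='rbf', C=10.0, gamma=0.5, random_state=0)
rbf_classifier.fit(x_train, y_train)
print(rbf_classifier.score(x_test, y_test))  # accuracy on the test set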
o Predicting the test set result:
Now, we will predict the output for the test set. For this, we will create a new
vector y_pred. Below is the code for it:

#Predicting the test set result
y_pred = classifier.predict(x_test)

After getting the y_pred vector, we can compare the result of y_pred and y_test to
check the difference between the actual value and predicted value.

Output: Below is the output for the prediction of the test set:

o Creating the confusion matrix:


Now we will check the performance of the SVM classifier: how many incorrect
predictions it makes compared to the Logistic Regression classifier. To create the
confusion matrix, we need to import the confusion_matrix function from the sklearn
library. After importing the function, we will call it and store the result in a new
variable cm. The function takes two main parameters, y_true (the actual values)
and y_pred (the values returned by the classifier). Below is the code for it:

#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Output:

As we can see in the above output image, there are 66+24 = 90 correct predictions
and 8+2 = 10 incorrect predictions. Therefore, we can say that our SVM model
improved compared to the Logistic Regression model.

o Visualizing the training set result:


Now we will visualize the training set result, below is the code for it:

from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
mtp.title('SVM classifier (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:

By executing the above code, we will get the output as:

As we can see, the above output appears similar to the Logistic Regression output. In
the output, we got a straight line as the hyperplane because we used a linear kernel
in the classifier. And as discussed above, for 2-D space the hyperplane in SVM is a
straight line.

o Visualizing the test set result:

#Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
mtp.title('SVM classifier (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:

By executing the above code, we will get the output as:

As we can see in the above output image, the SVM classifier has divided the users
into two regions (Purchased or Not purchased). Users who purchased the SUV are in
the red region with the red scatter points, and users who did not purchase the SUV
are in the green region with the green scatter points. The hyperplane has divided the
data into the two classes of the Purchased variable.

Naïve Bayes Classifier Algorithm


o Naïve Bayes algorithm is a supervised learning algorithm, which is based
on Bayes theorem and used for solving classification problems.
o It is mainly used in text classification that includes a high-dimensional training
dataset.
o Naïve Bayes Classifier is one of the simplest and most effective classification
algorithms, and it helps in building fast machine learning models that can make
quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object.
o Some popular examples of Naïve Bayes Algorithm are spam filtration,
Sentimental analysis, and classifying articles.

Why is it called Naïve Bayes?


The Naïve Bayes algorithm comprises two words, Naïve and Bayes, which can be
described as:

o Naïve: It is called Naïve because it assumes that the occurrence of a certain
feature is independent of the occurrence of other features. For example, if a fruit
is identified on the basis of color, shape, and taste, then a red, spherical, and
sweet fruit is recognized as an apple. Hence each feature individually contributes
to identifying it as an apple, without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes'
Theorem.

Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on
the conditional probability.
o The formula for Bayes' theorem is given as:

P(A|B) = [P(B|A) · P(A)] / P(B)

Where,

P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.


P(B|A) is Likelihood probability: Probability of the evidence given that the
hypothesis is true.


P(A) is Prior Probability: Probability of hypothesis before observing the evidence.

P(B) is Marginal Probability: Probability of Evidence.

Working of Naïve Bayes' Classifier:


Working of Naïve Bayes' Classifier can be understood with the help of the below
example:

Suppose we have a dataset of weather conditions and a corresponding target
variable "Play". Using this dataset, we need to decide whether we should play or not
on a particular day according to the weather conditions. To solve this problem, we
need to follow the below steps:

1. Convert the given dataset into frequency tables.


2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.

Problem: If the weather is sunny, then the Player should play or not?

Solution: To solve this, first consider the below dataset:

Outlook Play

0 Rainy Yes

1 Sunny Yes
2 Overcast Yes

3 Overcast Yes

4 Sunny No

5 Rainy Yes

6 Sunny Yes

7 Overcast Yes

8 Rainy No

9 Sunny No

10 Sunny Yes

11 Rainy No

12 Overcast Yes

13 Overcast Yes

Frequency table for the Weather Conditions:

Weather Yes No

Overcast 5 0

Rainy 2 2

Sunny 3 2

Total 10 5

Likelihood table for the weather conditions:

Weather      No             Yes
Overcast     0              5             5/14 = 0.35
Rainy        2              2             4/14 = 0.29
Sunny        2              3             5/14 = 0.35
All          4/14 = 0.29    10/14 = 0.71

Applying Bayes' theorem:

P(Yes|Sunny)= P(Sunny|Yes)*P(Yes)/P(Sunny)

P(Sunny|Yes)= 3/10= 0.3

P(Sunny)= 0.35

P(Yes)=0.71

So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60

P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)

P(Sunny|NO)= 2/4=0.5

P(No)= 0.29

P(Sunny)= 0.35

So P(No|Sunny)= 0.5*0.29/0.35 = 0.41

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).

Hence on a Sunny day, the Player can play the game.
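The hand calculation above can be reproduced with a few lines of Python. This is only an illustrative sketch built from the frequency table shown earlier; the function name posterior is made up. Note that without intermediate rounding, P(No|Sunny) comes out as 0.40 rather than 0.41:

# Illustrative sketch: reproduce the hand calculation from the frequency table.
counts = {            # weather -> (No count, Yes count), taken from the table above
    'Overcast': (0, 5),
    'Rainy':    (2, 2),
    'Sunny':    (2, 3),
}
total = sum(no + yes for no, yes in counts.values())   # 14 days in total
total_yes = sum(yes for _, yes in counts.values())     # 10 "Yes" days
total_no = sum(no for no, _ in counts.values())        # 4 "No" days

def posterior(play_yes, weather):
    # P(Play | weather) = P(weather | Play) * P(Play) / P(weather)
    no, yes = counts[weather]
    if play_yes:
        likelihood, prior = yes / total_yes, total_yes / total
    else:
        likelihood, prior = no / total_no, total_no / total
    evidence = (no + yes) / total
    return likelihood * prior / evidence

print(posterior(True, 'Sunny'))   # 0.6, matches P(Yes|Sunny) above
print(posterior(False, 'Sunny'))  # 0.4 (the text gets 0.41 because of rounding)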

Advantages of Naïve Bayes Classifier:

o Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the
class of a dataset.
o It can be used for Binary as well as Multi-class Classifications.
o It performs well in Multi-class predictions as compared to the other
Algorithms.
o It is the most popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:


o Naive Bayes assumes that all features are independent or unrelated, so it
cannot learn the relationship between features.

Applications of Naïve Bayes Classifier:

o It is used for Credit Scoring.


o It is used in medical data classification.
o It can be used in real-time predictions because Naïve Bayes Classifier is an
eager learner.
o It is used in Text classification such as Spam filtering and Sentiment
analysis.

Types of Naïve Bayes Model:


There are three types of Naive Bayes Model, which are given below:

o Gaussian: The Gaussian model assumes that features follow a normal


distribution. This means if predictors take continuous values instead of
discrete, then the model assumes that these values are sampled from the
Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the data is
multinomially distributed. It is primarily used for document classification
problems, i.e., deciding which category a particular document belongs to, such as
Sports, Politics, Education, etc. The classifier uses the frequency of words as the
predictors.
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier,
but the predictor variables are independent Boolean variables, such as whether a
particular word is present or not in a document. This model is also well known
for document classification tasks.
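As a hedged sketch of these three variants (the toy feature matrices below are invented purely for illustration), scikit-learn provides GaussianNB, MultinomialNB, and BernoulliNB in the sklearn.naive_bayes module:

# Illustrative only: the three Naive Bayes variants in scikit-learn.
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# GaussianNB: continuous features (e.g. age, salary).
x_cont = np.array([[25, 30000], [47, 85000], [35, 42000], [52, 110000]])
y = np.array([0, 1, 0, 1])
print(GaussianNB().fit(x_cont, y).predict([[40, 60000]]))

# MultinomialNB: count features (e.g. word frequencies in documents).
x_counts = np.array([[3, 0, 1], [0, 2, 4], [2, 1, 0], [0, 3, 3]])
print(MultinomialNB().fit(x_counts, y).predict([[1, 0, 0]]))

# BernoulliNB: binary "word present or not" features.
x_bool = (x_counts > 0).astype(int)
print(BernoulliNB().fit(x_bool, y).predict([[1, 0, 0]]))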

Python Implementation of the Naïve Bayes


algorithm:
Now we will implement a Naive Bayes Algorithm using Python. So for this, we will
use the "user_data" dataset, which we have used in our other classification model.
Therefore we can easily compare the Naive Bayes model with the other models.
Steps to implement:

o Data Pre-processing step


o Fitting Naive Bayes to the Training set
o Predicting the test result
o Test accuracy of the result(Creation of Confusion matrix)
o Visualizing the test set result.

1) Data Pre-processing step:


In this step, we will pre-process/prepare the data so that we can use it efficiently in
our code. It is similar to what we did in the earlier data pre-processing step. The code
for this is given below:

# Importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('user_data.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

In the above code, we have loaded the dataset into our program using "dataset =
pd.read_csv('user_data.csv')". The loaded dataset is divided into the training and
test set, and then we have scaled the feature variables.
The output for the dataset is given as:

2) Fitting Naive Bayes to the Training Set:


After the pre-processing step, now we will fit the Naive Bayes model to the Training
set. Below is the code for it:

# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)

In the above code, we have used the GaussianNB classifier to fit it to the training
dataset. We can also use other classifiers as per our requirement.
Output:

Out[6]: GaussianNB(priors=None, var_smoothing=1e-09)

3) Prediction of the test set result:


Now we will predict the test set result. For this, we will create a new predictor
variable y_pred, and will use the predict function to make the predictions.

# Predicting the Test set results
y_pred = classifier.predict(x_test)

Output:

The above output shows the result for the prediction vector y_pred and the real vector
y_test. We can see that some predictions are different from the real values; these
are the incorrect predictions.
4) Creating Confusion Matrix:
Now we will check the accuracy of the Naive Bayes classifier using the Confusion
matrix. Below is the code for it:

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Output:

As we can see in the above confusion matrix output, there are 7+3= 10 incorrect
predictions, and 65+25=90 correct predictions.

5) Visualizing the training set result:


Next we will visualize the training set result using Naïve Bayes Classifier. Below is the
code for it:

# Visualising the Training set results
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
X1, X2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(X1, X2, classifier.predict(nm.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Naive Bayes (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:

In the above output, we can see that the Naïve Bayes classifier has segregated the
data points with a fine boundary. It is a Gaussian curve, as we have used
the GaussianNB classifier in our code.

6) Visualizing the Test set result:

# Visualising the Test set results
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
X1, X2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(X1, X2, classifier.predict(nm.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Naive Bayes (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:

The above output is the final output for the test set data. As we can see, the classifier
has created a Gaussian curve to divide the "purchased" and "not purchased" classes.
There are some wrong predictions, which we counted in the confusion matrix, but it is
still a pretty good classifier.
Regression vs. Classification in
Machine Learning
Regression and Classification algorithms are Supervised Learning algorithms. Both
the algorithms are used for prediction in Machine learning and work with the labeled
datasets. But the difference between both is how they are used for different machine
learning problems.

The main difference between Regression and Classification algorithms is that
Regression algorithms are used to predict continuous values such as price, salary,
age, etc., while Classification algorithms are used to predict/classify discrete values
such as Male or Female, True or False, Spam or Not Spam, etc.

Consider the below diagram:

Classification:
Classification is a process of finding a function which helps in dividing the dataset
into classes based on different parameters. In Classification, a computer program is
trained on the training dataset and based on that training, it categorizes the data
into different classes.


The task of the classification algorithm is to find the mapping function to map the
input(x) to the discrete output(y).

Example: The best example to understand the Classification problem is Email Spam
Detection. The model is trained on the basis of millions of emails on different
parameters, and whenever it receives a new email, it identifies whether the email is
spam or not. If the email is spam, then it is moved to the Spam folder.

Types of ML Classification Algorithms:

Classification Algorithms can be further divided into the following types:

o Logistic Regression
o K-Nearest Neighbours
o Support Vector Machines
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification

Regression:
Regression is a process of finding the correlations between dependent and
independent variables. It helps in predicting the continuous variables such as
prediction of Market Trends, prediction of House prices, etc.

The task of the Regression algorithm is to find the mapping function to map the
input variable(x) to the continuous output variable(y).

Example: Suppose we want to do weather forecasting, so for this, we will use the
Regression algorithm. In weather prediction, the model is trained on the past data,
and once the training is completed, it can easily predict the weather for future days.
Types of Regression Algorithm:

o Simple Linear Regression


o Multiple Linear Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression

Difference between Regression and Classification

Regression Algorithm: In Regression, the output variable must be of continuous nature or a real value.
Classification Algorithm: In Classification, the output variable must be a discrete value.

Regression Algorithm: The task of the regression algorithm is to map the input value (x) to the continuous output variable (y).
Classification Algorithm: The task of the classification algorithm is to map the input value (x) to the discrete output variable (y).

Regression Algorithm: Regression algorithms are used with continuous data.
Classification Algorithm: Classification algorithms are used with discrete data.

Regression Algorithm: In Regression, we try to find the best fit line, which can predict the output more accurately.
Classification Algorithm: In Classification, we try to find the decision boundary, which can divide the dataset into different classes.

Regression Algorithm: Regression algorithms can be used to solve regression problems such as Weather Prediction, House price prediction, etc.
Classification Algorithm: Classification algorithms can be used to solve classification problems such as Identification of spam emails, Speech Recognition, Identification of cancer cells, etc.

Regression Algorithm: The Regression algorithm can be further divided into Linear and Non-linear Regression.
Classification Algorithm: The Classification algorithms can be divided into Binary Classifier and Multi-class Classifier.
Linear Regression vs Logistic
Regression
Linear Regression and Logistic Regression are two well-known Machine Learning
algorithms that come under the supervised learning technique. Since both algorithms
are supervised in nature, they use labeled datasets to make predictions. But the main
difference between them is how they are used: Linear Regression is used for solving
Regression problems, whereas Logistic Regression is used for solving Classification
problems. The description of both algorithms is given below, along with a difference
table.

Linear Regression:

o Linear Regression is one of the simplest machine learning algorithms. It comes
under the supervised learning technique and is used for solving regression
problems.
o It is used for predicting the continuous dependent variable with the help of
independent variables.
o The goal of Linear Regression is to find the best fit line that can accurately
predict the output for the continuous dependent variable.
o If a single independent variable is used for prediction, it is called Simple
Linear Regression, and if there is more than one independent variable, such
regression is called Multiple Linear Regression.
o By finding the best fit line, the algorithm establishes the relationship between
the dependent variable and the independent variables, and this relationship
should be linear.
o The output for Linear regression should only be the continuous values such as
price, age, salary, etc. The relationship between the dependent variable and
independent variable can be shown in below image:

In the above image, the dependent variable (salary) is on the y-axis and the
independent variable (experience) is on the x-axis. The regression line can be written as:

y= a0+a1x+ ε

Where, a0 and a1 are the coefficients and ε is the error term.
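As a small illustrative sketch of fitting this line (the experience/salary numbers below are made up, and np.polyfit is just one convenient way to get the least-squares coefficients):

# Illustrative only: fitting y = a0 + a1*x on made-up experience/salary data.
import numpy as np

experience = np.array([1, 2, 3, 4, 5], dtype=float)    # x, in years
salary = np.array([30, 35, 41, 44, 50], dtype=float)   # y, in thousands

a1, a0 = np.polyfit(experience, salary, deg=1)          # least-squares slope and intercept
print(f"best fit line: y = {a0:.2f} + {a1:.2f} * x")
print("prediction for 6 years of experience:", a0 + a1 * 6)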

Logistic Regression:

o Logistic Regression is one of the most popular machine learning algorithms that
come under the supervised learning technique.
o It can be used for Classification as well as for Regression problems, but it is
mainly used for Classification problems.
o Logistic Regression is used to predict the categorical dependent variable with
the help of independent variables.
o The output of a Logistic Regression problem can only be between 0 and 1.
o Logistic Regression can be used where the probability of belonging to one of two
classes is required, such as whether it will rain today or not: either 0 or 1,
true or false, etc.
o Logistic Regression is based on the concept of Maximum Likelihood estimation.
According to this estimation, the observed data should be most probable.
o In Logistic Regression, we pass the weighted sum of inputs through an activation
function that maps values between 0 and 1. Such an activation function is known
as the sigmoid function, and the curve obtained is called the sigmoid curve or
S-curve. Consider the below image:

o The equation for logistic regression (the log-odds written as a linear combination of the inputs) is:

log[ y / (1 − y) ] = b0 + b1x1 + b2x2 + ... + bnxn
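As a minimal sketch of the sigmoid idea described above (the coefficients b0, b1 and the input x below are invented for illustration):

# Illustrative only: the sigmoid (logistic) function maps any real value into (0, 1).
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Weighted sum of inputs with illustrative coefficients b0, b1 and a feature x.
b0, b1, x = -4.0, 1.5, 3.0
probability = sigmoid(b0 + b1 * x)
print(probability)   # ~0.62 -> classified as 1 if we use a 0.5 threshold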

Difference between Linear Regression and Logistic Regression:


Linear Regression: Linear Regression is used to predict the continuous dependent variable using a given set of independent variables.
Logistic Regression: Logistic Regression is used to predict the categorical dependent variable using a given set of independent variables.

Linear Regression: Linear Regression is used for solving Regression problems.
Logistic Regression: Logistic Regression is used for solving Classification problems.

Linear Regression: In Linear Regression, we predict the value of continuous variables.
Logistic Regression: In Logistic Regression, we predict the values of categorical variables.

Linear Regression: In Linear Regression, we find the best fit line, by which we can easily predict the output.
Logistic Regression: In Logistic Regression, we find the S-curve, by which we can classify the samples.

Linear Regression: The least squares estimation method is used for estimation of accuracy.
Logistic Regression: The maximum likelihood estimation method is used for estimation of accuracy.

Linear Regression: The output of Linear Regression must be a continuous value, such as price, age, etc.
Logistic Regression: The output of Logistic Regression must be a categorical value such as 0 or 1, Yes or No, etc.

Linear Regression: In Linear Regression, it is required that the relationship between the dependent variable and the independent variable be linear.
Logistic Regression: In Logistic Regression, it is not required to have a linear relationship between the dependent and independent variables.

Linear Regression: In Linear Regression, there may be collinearity between the independent variables.
Logistic Regression: In Logistic Regression, there should not be collinearity between the independent variables.

Decision Tree Classification


Algorithm
o Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision
rules and each leaf node represents the outcome.
o In a decision tree, there are two types of nodes, which are the Decision
Node and the Leaf Node. Decision nodes are used to make decisions and
have multiple branches, whereas leaf nodes are the outputs of those decisions
and do not contain any further branches.
o The decisions or the test are performed on the basis of features of the given
dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root
node, which expands on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
o A decision tree simply asks a question, and based on the answer (Yes/No), it
splits the tree further into subtrees.
o Below diagram explains the general structure of a decision tree:

Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.
Why use Decision Trees?
There are various algorithms in Machine learning, so choosing the best algorithm for
the given dataset and problem is the main point to remember while creating a
machine learning model. Below are the two reasons for using the Decision tree:

o Decision Trees usually mimic human thinking ability while making a decision,
so it is easy to understand.
o The logic behind the decision tree can be easily understood because it shows
a tree-like structure.

Decision Tree Terminologies


o Root Node: The root node is from where the decision tree starts. It represents the
entire dataset, which further gets divided into two or more homogeneous sets.
o Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be segregated
further after reaching a leaf node.
o Splitting: Splitting is the process of dividing the decision node/root node into
sub-nodes according to the given conditions.
o Branch/Sub Tree: A tree formed by splitting the tree.
o Pruning: Pruning is the process of removing unwanted branches from the tree.
o Parent/Child node: The root node of the tree is called the parent node, and the
other nodes are called the child nodes.
How does the Decision Tree algorithm Work?

In a decision tree, for predicting the class of the given dataset, the algorithm starts
from the root node of the tree. This algorithm compares the values of root attribute
with the record (real dataset) attribute and, based on the comparison, follows the
branch and jumps to the next node.

For the next node, the algorithm again compares the attribute value with the other
sub-nodes and moves further. It continues this process until it reaches a leaf node of
the tree. The complete process can be better understood using the below algorithm:


o Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the best
attribute.
o Step-4: Generate the decision tree node, which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset
created in Step-3. Continue this process until a stage is reached where you
cannot further classify the nodes; the final node of each branch is then called a
leaf node.

Example: Suppose there is a candidate who has a job offer and wants to decide
whether he should accept the offer or Not. So, to solve this problem, the decision
tree starts with the root node (Salary attribute by ASM). The root node splits further
into the next decision node (distance from the office) and one leaf node based on
the corresponding labels. The next decision node further gets split into one decision
node (Cab facility) and one leaf node. Finally, the decision node splits into two leaf
nodes (Accepted offers and Declined offer). Consider the below diagram:
Attribute Selection Measures
While implementing a decision tree, the main issue that arises is how to select the best
attribute for the root node and for the sub-nodes. To solve such problems, there is a
technique called the Attribute Selection Measure, or ASM. With this measurement, we
can easily select the best attribute for the nodes of the tree. There are two popular
techniques for ASM, which are:

o Information Gain
o Gini Index

1. Information Gain:

o Information gain is the measurement of changes in entropy after the


segmentation of a dataset based on an attribute.
o It calculates how much information a feature provides us about a class.
o According to the value of information gain, we split the node and build the
decision tree.
o A decision tree algorithm always tries to maximize the value of information
gain, and a node/attribute having the highest information gain is split first. It
can be calculated using the below formula:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies
the randomness in the data. Entropy can be calculated as:

Entropy(S) = −P(yes)·log₂ P(yes) − P(no)·log₂ P(no)

Where,

o S = the total number of samples
o P(yes) = the probability of yes
o P(no) = the probability of no

2. Gini Index:

o Gini index is a measure of impurity or purity used while creating a decision


tree in the CART(Classification and Regression Tree) algorithm.
o An attribute with the low Gini index should be preferred as compared to the
high Gini index.
o It only creates binary splits, and the CART algorithm uses the Gini index to
create binary splits.
o Gini index can be calculated using the below formula:

Gini Index = 1 − Σj (Pj)²
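As a hedged sketch of how these measures are computed (the parent-node counts and the candidate split below are toy numbers, not taken from the tutorial's dataset):

# Illustrative only: Entropy, Information Gain and Gini Index
# for a toy parent node and one candidate binary split.
from math import log2

def entropy(p_yes, p_no):
    result = 0.0
    for p in (p_yes, p_no):
        if p > 0:
            result -= p * log2(p)
    return result

def gini(p_yes, p_no):
    return 1 - (p_yes ** 2 + p_no ** 2)

# Toy parent node: 9 "yes" and 5 "no" samples (14 in total).
parent = entropy(9 / 14, 5 / 14)

# Candidate split into two children: (6 yes, 2 no) and (3 yes, 3 no).
left = entropy(6 / 8, 2 / 8)
right = entropy(3 / 6, 3 / 6)
weighted = (8 / 14) * left + (6 / 14) * right

information_gain = parent - weighted
print("entropy of parent :", round(parent, 3))                 # ~0.940
print("information gain  :", round(information_gain, 3))
print("gini of parent    :", round(gini(9 / 14, 5 / 14), 3))   # ~0.459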

Pruning: Getting an Optimal Decision tree


Pruning is a process of deleting the unnecessary nodes from a tree in order to get the
optimal decision tree.

A too-large tree increases the risk of overfitting, and a small tree may not capture all
the important features of the dataset. Therefore, a technique that decreases the size
of the learning tree without reducing accuracy is known as pruning. There are mainly
two types of tree pruning techniques used:

o Cost Complexity Pruning


o Reduced Error Pruning.

Advantages of the Decision Tree


o It is simple to understand as it follows the same process which a human follow
while making any decision in real-life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes for a problem.
o There is less requirement of data cleaning compared to other algorithms.

Disadvantages of the Decision Tree


o The decision tree contains lots of layers, which makes it complex.
o It may have an overfitting issue, which can be resolved using the Random
Forest algorithm.
o For more class labels, the computational complexity of the decision tree may
increase.

Python Implementation of Decision Tree


Now we will implement the Decision Tree using Python. For this, we will use the
dataset "user_data.csv", which we have used in previous classification models. By
using the same dataset, we can compare the Decision Tree classifier with other
classification models such as KNN, SVM, Logistic Regression, etc.

Steps will also remain the same, which are given below:

o Data Pre-processing step


o Fitting a Decision-Tree algorithm to the Training set
o Predicting the test result
o Test accuracy of the result(Creation of Confusion matrix)
o Visualizing the test set result.

1. Data Pre-Processing Step:


Below is the code for the pre-processing step:

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing datasets
data_set = pd.read_csv('user_data.csv')

# Extracting Independent and dependent Variable
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature Scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)

In the above code, we have pre-processed the data and loaded the dataset, which is
given as:
2. Fitting a Decision-Tree algorithm to the Training
set
Now we will fit the model to the training set. For this, we will import
the DecisionTreeClassifier class from sklearn.tree library. Below is the code for it:

#Fitting Decision Tree classifier to the training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)

In the above code, we have created a classifier object, in which we have passed two
main parameters:

o criterion='entropy': the criterion is used to measure the quality of a split,
which here is calculated by the information gain given by entropy.
o random_state=0: for generating reproducible random states.

Below is the output for this:

Out[8]:
DecisionTreeClassifier(class_weight=None, criterion='entropy',
max_depth=None,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False,
random_state=0, splitter='best')

3. Predicting the test result


Now we will predict the test set result. We will create a new prediction
vector y_pred. Below is the code for it:

#Predicting the test set result
y_pred = classifier.predict(x_test)

Output:

In the below output image, the predicted output and real test output are given. We
can clearly see that there are some values in the prediction vector, which are different
from the real vector values. These are prediction errors.
4. Test accuracy of the result (Creation of
Confusion matrix)
In the above output, we have seen that there were some incorrect predictions, so if
we want to know the number of correct and incorrect predictions, we need to use
the confusion matrix. Below is the code for it:

#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Output:

In the above output image, we can see the confusion matrix, which has 6+3 = 9
incorrect predictions and 62+29 = 91 correct predictions. Therefore, we can say
that compared to other classification models, the Decision Tree classifier made
a good prediction.

5. Visualizing the training set result:


Here we will visualize the training set result. To visualize the training set result we will
plot a graph for the decision tree classifier. The classifier will predict yes or No for the
users who have either Purchased or Not purchased the SUV car as we did in Logistic
Regression. Below is the code for it:

#Visualizing the training set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Decision Tree Algorithm (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:

The above output is completely different from the other classification models. It has
both vertical and horizontal lines that split the dataset according to the age and
estimated salary variables.

As we can see, the tree is trying to capture every data point, which is a case of
overfitting.
6. Visualizing the test set result:
Visualization of test set result will be similar to the visualization of the training set
except that the training set will be replaced with the test set.

#Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Decision Tree Algorithm (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:
As we can see in the above image, there are some green data points within the
purple region and vice versa. These are the incorrect predictions that we discussed
in the confusion matrix.

Random Forest Algorithm


Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and Regression
problems in ML. It is based on the concept of ensemble learning, which is a process
of combining multiple classifiers to solve a complex problem and to improve the
performance of the model.

As the name suggests, "Random Forest is a classifier that contains a number of


decision trees on various subsets of the given dataset and takes the average to
improve the predictive accuracy of that dataset." Instead of relying on one
decision tree, the random forest takes the prediction from each tree and, based on
the majority vote of predictions, it predicts the final output.

The greater number of trees in the forest leads to higher accuracy and prevents
the problem of overfitting.

The below diagram explains the working of the Random Forest algorithm:

Note: To better understand the Random Forest Algorithm, you should have knowledge
of the Decision Tree Algorithm.

Assumptions for Random Forest


Since the random forest combines multiple trees to predict the class of the dataset, it
is possible that some decision trees may predict the correct output, while others may
not. But together, all the trees predict the correct output. Therefore, below are two
assumptions for a better Random forest classifier:

o There should be some actual values in the feature variable of the dataset so
that the classifier can predict accurate results rather than a guessed result.
o The predictions from each tree must have very low correlations.

Why use Random Forest?


Below are some points that explain why we should use the Random Forest algorithm:

o It takes less training time as compared to other algorithms.
o It predicts output with high accuracy; even for a large dataset, it runs
efficiently.
o It can also maintain accuracy when a large proportion of data is missing.

How does Random Forest algorithm work?


Random Forest works in two phases: the first is to create the random forest by
combining N decision trees, and the second is to make predictions with each tree
created in the first phase.

The Working process can be explained in the below steps and diagram:

Step-1: Select random K data points from the training set.

Step-2: Build the decision trees associated with the selected data points (Subsets).

Step-3: Choose the number N for decision trees that you want to build.

Step-4: Repeat Step 1 & 2.

Step-5: For new data points, find the predictions of each decision tree, and assign
the new data points to the category that wins the majority votes.

The working of the algorithm can be better understood by the below example:

Example: Suppose there is a dataset that contains multiple fruit images. So, this
dataset is given to the Random forest classifier. The dataset is divided into subsets
and given to each decision tree. During the training phase, each decision tree
produces a prediction result, and when a new data point occurs, then based on the
majority of results, the Random Forest classifier predicts the final decision. Consider
the below image:
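As a hedged sketch of these two phases (the toy dataset from make_classification and the manual bootstrap loop below are illustrative only; in practice sklearn's RandomForestClassifier, used later in this chapter, does this internally):

# Illustrative only: build N decision trees on random subsets, then majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

x, y = make_classification(n_samples=200, n_features=4, random_state=0)  # toy data
rng = np.random.default_rng(0)
n_trees = 10

trees = []
for _ in range(n_trees):
    idx = rng.integers(0, len(x), size=len(x))          # random subset (bootstrap sample)
    trees.append(DecisionTreeClassifier().fit(x[idx], y[idx]))

new_point = x[:1]                                        # pretend this is unseen data
votes = [tree.predict(new_point)[0] for tree in trees]   # each tree votes
prediction = max(set(votes), key=votes.count)            # majority vote wins
print("votes:", votes, "-> final prediction:", prediction)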
Applications of Random Forest
There are mainly four sectors where the Random Forest is mostly used:

1. Banking: Banking sector mostly uses this algorithm for the identification of
loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the
disease can be identified.
3. Land Use: We can identify the areas of similar land use by this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.

Advantages of Random Forest


o Random Forest is capable of performing both Classification and Regression
tasks.
o It is capable of handling large datasets with high dimensionality.
o It enhances the accuracy of the model and prevents the overfitting issue.

Disadvantages of Random Forest


o Although random forest can be used for both classification and regression
tasks, it is not as suitable for regression tasks.

Python Implementation of Random Forest


Algorithm
Now we will implement the Random Forest algorithm using Python. For this, we
will use the same dataset "user_data.csv", which we have used in previous
classification models. By using the same dataset, we can compare the Random Forest
classifier with other classification models such as the Decision Tree classifier,
KNN, SVM, Logistic Regression, etc.

Implementation Steps are given below:

o Data Pre-processing step


o Fitting the Random forest algorithm to the Training set
o Predicting the test result
o Test accuracy of the result (Creation of Confusion matrix)
o Visualizing the test set result.

1.Data Pre-Processing Step:


Below is the code for the pre-processing step:

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing datasets
data_set = pd.read_csv('user_data.csv')

# Extracting Independent and dependent Variable
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature Scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)

In the above code, we have pre-processed the data and loaded the dataset, which is
given as:
2. Fitting the Random Forest algorithm to the
training set:
Now we will fit the Random forest algorithm to the training set. To fit it, we will
import the RandomForestClassifier class from the sklearn.ensemble library. The
code is given below:

#Fitting Random Forest classifier to the training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=10, criterion="entropy")
classifier.fit(x_train, y_train)

In the above code, the classifier object takes below parameters:

o n_estimators = the required number of trees in the Random Forest. The
default value is 10. We can choose any number, but we need to take care of the
overfitting issue.
o criterion = a function to measure the quality of the split. Here we have
taken "entropy" for the information gain.

Output:

RandomForestClassifier(bootstrap=True, class_weight=None,
criterion='entropy',
max_depth=None, max_features='auto',
max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10,
n_jobs=None, oob_score=False, random_state=None,
verbose=0, warm_start=False)

3. Predicting the Test Set result


Since our model is fitted to the training set, we can now predict the test result. For
prediction, we will create a new prediction vector y_pred. Below is the code for it:

#Predicting the test set result
y_pred = classifier.predict(x_test)

Output:

The prediction vector is given as:


By checking the above prediction vector and test set real vector, we can determine
the incorrect predictions done by the classifier.

4. Creating the Confusion Matrix


Now we will create the confusion matrix to determine the correct and incorrect
predictions. Below is the code for it:

#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Output:
As we can see in the above matrix, there are 4+4= 8 incorrect
predictions and 64+28= 92 correct predictions.

5. Visualizing the training Set result


Here we will visualize the training set result. To visualize the training set result we will
plot a graph for the Random forest classifier. The classifier will predict yes or No for
the users who have either Purchased or Not purchased the SUV car as we did
in Logistic Regression. Below is the code for it:

from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Random Forest Algorithm (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:

The above image is the visualization result for the Random Forest classifier working
with the training set result. It is very much similar to the Decision tree classifier. Each
data point corresponds to each user of the user_data, and the purple and green
regions are the prediction regions. The purple region is classified for the users who
did not purchase the SUV car, and the green region is for the users who purchased
the SUV.

So, in the Random Forest classifier, we have taken 10 trees that have predicted Yes or
NO for the Purchased variable. The classifier took the majority of the predictions and
provided the result.

6. Visualizing the test set result


Now we will visualize the test set result. Below is the code for it:

#Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Random Forest Algorithm (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:

The above image is the visualization result for the test set. We can check that there is
a minimum number of incorrect predictions (8) without the Overfitting issue. We will
get different results by changing the number of trees in the classifier.

Clustering in Machine Learning


Clustering or cluster analysis is a machine learning technique, which groups the
unlabelled dataset. It can be defined as "A way of grouping the data points into
different clusters, consisting of similar data points. The objects with the
possible similarities remain in a group that has less or no similarities with
another group."
It does this by finding similar patterns in the unlabelled dataset, such as shape,
size, color, and behavior, and divides the data points according to the presence and
absence of those patterns.

It is an unsupervised learning method, hence no supervision is provided to the


algorithm, and it deals with the unlabeled dataset.

After applying this clustering technique, each cluster or group is given a
cluster-ID. An ML system can use this ID to simplify the processing of large and
complex datasets.


The clustering technique is commonly used for statistical data analysis.

Note: Clustering is somewhere similar to the classification algorithm, but the difference is
the type of dataset that we are using. In classification, we work with the labeled data set,
whereas in clustering, we work with the unlabelled dataset.

Example: Let's understand the clustering technique with the real-world example of
a shopping mall: when we visit any shopping mall, we can observe that things with
similar usage are grouped together, such as t-shirts in one section and trousers in
another; similarly, in the vegetable section, apples, bananas, mangoes, etc., are
grouped in separate sections so that we can easily find things. The clustering
technique works in the same way. Other examples of clustering are grouping documents
according to the topic.

The clustering technique can be widely used in various tasks. Some most common
uses of this technique are:

o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.

Apart from these general usages, it is used by Amazon in its recommendation system
to provide recommendations based on past product searches. Netflix also uses this
technique to recommend movies and web series to its users based on their watch
history.

The below diagram explains the working of the clustering algorithm. We can see the
different fruits are divided into several groups with similar properties.

Types of Clustering Methods


The clustering methods are broadly divided into Hard clustering (a data point belongs
to only one group) and Soft clustering (data points can belong to more than one
group). But various other approaches to clustering also exist. Below are the
main clustering methods used in Machine learning:

1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
Partitioning Clustering
It is a type of clustering that divides the data into non-hierarchical groups. It is also
known as the centroid-based method. The most common example of partitioning
clustering is the K-Means Clustering algorithm.

In this type, the dataset is divided into a set of k groups, where k defines the
number of pre-defined groups. The cluster centers are created in such a way that
the distance between the data points of one cluster and their own centroid is
minimal compared to the distance to the other cluster centroids.
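As a minimal sketch of partitioning clustering (the three blobs of points below are synthetic and purely illustrative), k-means from scikit-learn can be applied as follows:

# Illustrative only: centroid-based (partitioning) clustering with k-means on toy 2-D data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three made-up blobs of points around different centers.
data = np.vstack([
    rng.normal([0, 0], 0.5, size=(30, 2)),
    rng.normal([5, 5], 0.5, size=(30, 2)),
    rng.normal([0, 5], 0.5, size=(30, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
print("cluster centers:\n", kmeans.cluster_centers_)
print("label of first point:", kmeans.labels_[0])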

Density-Based Clustering
The density-based clustering method connects highly dense areas into clusters, and arbitrarily shaped clusters can be formed as long as the dense regions can be connected. The algorithm does this by identifying dense regions in the dataset and joining the areas of high density into clusters; the dense areas in data space are separated from each other by sparser areas.

These algorithms can face difficulty in clustering the data points if the dataset has
varying densities and high dimensions.
Distribution Model-Based Clustering
In the distribution model-based clustering method, the data is divided based on the probability that it belongs to a particular distribution. The grouping is done by assuming some distributions, most commonly the Gaussian distribution.

The example of this type is the Expectation-Maximization Clustering algorithm, which uses Gaussian Mixture Models (GMM).

Hierarchical Clustering
Hierarchical clustering can be used as an alternative to partitioning clustering, as there is no requirement to pre-specify the number of clusters to be created. In this technique, the dataset is divided into clusters to create a tree-like structure, which is also called a dendrogram. Any number of clusters can then be selected by cutting the tree at the appropriate level. The most common example of this method is the Agglomerative Hierarchical algorithm.

Fuzzy Clustering
Fuzzy clustering is a type of soft method in which a data object may belong to more than one group or cluster. Each data point has a set of membership coefficients that indicate its degree of membership in each cluster. The Fuzzy C-means algorithm is the example of this type of clustering; it is sometimes also known as the Fuzzy k-means algorithm.

Clustering Algorithms
Clustering algorithms can be divided based on the models explained above. Many different clustering algorithms have been published, but only a few are commonly used. The choice of algorithm depends on the kind of data we are using: some algorithms need the number of clusters to be guessed in advance, whereas others work by finding the minimum distance between the observations of the dataset.

Here we discuss the most popular clustering algorithms that are widely used in machine learning; a short scikit-learn sketch follows the list:
1. K-Means algorithm: The k-means algorithm is one of the most popular
clustering algorithms. It classifies the dataset by dividing the samples into
different clusters of equal variances. The number of clusters must be specified
in this algorithm. It is fast with fewer computations required, with the linear
complexity of O(n).
2. Mean-shift algorithm: Mean-shift algorithm tries to find the dense areas in
the smooth density of data points. It is an example of a centroid-based model,
that works on updating the candidates for centroid to be the center of the
points within a given region.
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of
Applications with Noise. It is an example of a density-based model similar to
the mean-shift, but with some remarkable advantages. In this algorithm, the
areas of high density are separated by the areas of low density. Because of
this, the clusters can be found in any arbitrary shape.
4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an alternative to the k-means algorithm or for cases where K-means may fail. In GMM, it is assumed that the data points are Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The Agglomerative hierarchical
algorithm performs the bottom-up hierarchical clustering. In this, each data
point is treated as a single cluster at the outset and then successively merged.
The cluster hierarchy can be represented as a tree-structure.
6. Affinity Propagation: It is different from other clustering algorithms as it does not require the number of clusters to be specified. In this algorithm, data points exchange messages between pairs of points until convergence. Its O(N²T) time complexity is the main drawback of this algorithm.
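As a quick illustration of how these algorithms can be tried in practice, the sketch below runs a few of them with scikit-learn on a small synthetic dataset; the make_blobs data and all parameter values are assumptions chosen only for demonstration, not part of this tutorial's examples.

# A minimal comparison of a few clustering algorithms on synthetic data
# (scikit-learn is assumed to be installed; parameter values are illustrative).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, MeanShift, DBSCAN, AgglomerativeClustering
from sklearn.mixture import GaussianMixture

# 300 points scattered around 4 centres
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

labels = {
    "K-Means": KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X),
    "Mean-Shift": MeanShift().fit_predict(X),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5).fit_predict(X),
    "Agglomerative": AgglomerativeClustering(n_clusters=4).fit_predict(X),
    "GMM (EM)": GaussianMixture(n_components=4, random_state=42).fit_predict(X),
}

for name, y in labels.items():
    # DBSCAN labels noise points as -1, so count only non-noise clusters
    print(name, "->", len(set(y) - {-1}), "clusters")

Note how Mean-shift and DBSCAN discover the number of clusters on their own, while K-Means, Agglomerative clustering and the Gaussian mixture need it specified, which matches the descriptions above.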

Applications of Clustering
Below are some commonly known applications of clustering technique in Machine
Learning:

o In Identification of Cancer Cells: Clustering algorithms are widely used for the identification of cancerous cells. They divide the cancerous and non-cancerous data points into different groups.
o In Search Engines: Search engines also work on the clustering technique. The
search result appears based on the closest object to the search query. It does
it by grouping similar data objects in one group that is far from the other
dissimilar objects. The accurate result of a query depends on the quality of the
clustering algorithm used.
o Customer Segmentation: It is used in market research to segment the
customers based on their choice and preferences.
o In Biology: It is used in the biology stream to classify different species of
plants and animals using the image recognition technique.
o In Land Use: The clustering technique is used to identify areas of similar land use in a GIS database. This is very useful for finding the purpose for which a particular piece of land is most suitable.

Hierarchical Clustering in Machine Learning
Hierarchical clustering is another unsupervised machine learning algorithm, which is used to group unlabeled datasets into clusters. It is also known as hierarchical cluster analysis or HCA.

In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this
tree-shaped structure is known as the dendrogram.

Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two differ in how they work; in particular, there is no requirement to predetermine the number of clusters, as we do in the K-means algorithm.

The hierarchical clustering technique has two approaches:


1. Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm starts by taking all data points as single clusters and keeps merging them until one cluster is left.
2. Divisive: Divisive algorithm is the reverse of the agglomerative algorithm as it
is a top-down approach.

Why hierarchical clustering?

As we already have other clustering algorithms such as K-Means Clustering, why do we need hierarchical clustering? As we have seen, the K-means clustering algorithm has some challenges: it needs a predetermined number of clusters, and it always tries to create clusters of the same size. To solve these two challenges, we can opt for the hierarchical clustering algorithm, because in this algorithm we don't need to know the number of clusters in advance.

In this topic, we will discuss the Agglomerative Hierarchical clustering algorithm.

Agglomerative Hierarchical clustering

The agglomerative hierarchical clustering algorithm is a popular example of HCA. To group the datasets into clusters, it follows the bottom-up approach. It means this algorithm considers each data point as a single cluster at the beginning, and then starts combining the closest pairs of clusters. It does this until all the clusters are merged into a single cluster that contains the entire dataset.

This hierarchy of clusters is represented in the form of the dendrogram.

How does the Agglomerative Hierarchical clustering work?
The working of the AHC algorithm can be explained using the below steps:
o Step-1: Create each data point as a single cluster. Let's say there are N data
points, so the number of clusters will also be N.

o Step-2: Take two closest data points or clusters and merge them to form one
cluster. So, there will now be N-1 clusters.
o Step-3: Again, take the two closest clusters and merge them together to form
one cluster. There will be N-2 clusters.

o Step-4: Repeat Step 3 until only one cluster is left. So, we will get the following clusters. Consider the below images:
o Step-5: Once all the clusters are combined into one big cluster, develop the
dendrogram to divide the clusters as per the problem.

Note: To better understand hierarchical clustering, it is advised to have a look at k-means clustering.

Measure for the distance between two clusters


As we have seen, the distance between two clusters is crucial for hierarchical clustering. There are various ways to calculate the distance between two clusters, and this choice decides the rule for clustering. These measures are called Linkage methods. Some of the popular linkage methods are given below:
1. Single Linkage: It is the Shortest Distance between the closest points of the
clusters. Consider the below image:

2. Complete Linkage: It is the farthest distance between the two points of two
different clusters. It is one of the popular linkage methods as it forms tighter
clusters than single-linkage.

3. Average Linkage: It is the linkage method in which the distance between each pair of data points (one from each cluster) is added up and then divided by the total number of pairs to calculate the average distance between two clusters. It is also one of the most popular linkage methods.
4. Centroid Linkage: It is the linkage method in which the distance between the
centroid of the clusters is calculated. Consider the below image:

From the above-given approaches, we can apply any of them according to the type of problem or business requirement; a small SciPy sketch of these linkage options is given below.
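The following sketch illustrates how the linkage method can be switched when building a dendrogram with SciPy; the random 20-point dataset is an assumption used only for illustration, not the mall data of the later sections.

# Comparing linkage methods on a small random dataset (illustrative sketch).
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))          # 20 two-dimensional points

for method in ["single", "complete", "average", "centroid"]:
    Z = linkage(points, method=method)     # pairwise merge history
    plt.figure()
    dendrogram(Z)
    plt.title(f"{method} linkage")
plt.show()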

Working of the Dendrogram in Hierarchical clustering


The dendrogram is a tree-like structure that is mainly used to record each merge step that the HC algorithm performs. In the dendrogram plot, the Y-axis shows the Euclidean distances between the data points (or clusters), and the X-axis shows all the data points of the given dataset.

The working of the dendrogram can be explained using the below diagram:
In the above diagram, the left part is showing how clusters are created in
agglomerative clustering, and the right part is showing the corresponding
dendrogram.

o As we have discussed above, first the data points P2 and P3 combine together and form a cluster; correspondingly, a dendrogram is created, which connects P2 and P3 with a rectangular shape. The height is decided according to the Euclidean distance between the data points.
o In the next step, P5 and P6 form a cluster, and the corresponding dendrogram is created. It is higher than the previous one, as the Euclidean distance between P5 and P6 is a little greater than that between P2 and P3.
o Again, two new dendrograms are created that combine P1, P2, and P3 in one
dendrogram, and P4, P5, and P6, in another dendrogram.
o At last, the final dendrogram is created that combines all the data points
together.

We can cut the dendrogram tree structure at any level as per our requirement.
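For example, a minimal SciPy sketch of cutting the tree into flat clusters (the random data and the distance threshold of 3.0 are assumed values used only for illustration):

# Cutting the hierarchical tree into flat clusters at a chosen distance.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
points = rng.normal(size=(15, 2))

Z = linkage(points, method="ward")
# Cut the dendrogram at distance 3.0; every merge above this height is undone.
flat_labels = fcluster(Z, t=3.0, criterion="distance")
print(flat_labels)  # cluster id for each of the 15 points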

Python Implementation of Agglomerative Hierarchical Clustering
Now we will see the practical implementation of the agglomerative hierarchical
clustering algorithm using Python. To implement this, we will use the same dataset
problem that we have used in the previous topic of K-means clustering so that we
can compare both concepts easily.

The dataset contains the information of customers who have visited a mall for shopping. The mall owner wants to find some patterns or some particular behavior of his customers using this dataset information.

Steps for implementation of AHC using Python:


The steps for implementation will be the same as the k-means clustering, except for
some changes such as the method to find the number of clusters. Below are the
steps:

1. Data Pre-processing
2. Finding the optimal number of clusters using the Dendrogram
3. Training the hierarchical clustering model
4. Visualizing the clusters

Data Pre-processing Steps:


In this step, we will import the libraries and datasets for our model.

o Importing the libraries

1. # Importing the libraries
2. import numpy as nm
3. import matplotlib.pyplot as mtp
4. import pandas as pd

The above lines of code import the libraries needed for specific tasks: numpy for the mathematical operations, matplotlib for drawing the graphs or scatter plots, and pandas for importing the dataset.

o Importing the dataset

1. # Importing the dataset
2. dataset = pd.read_csv('Mall_Customers_data.csv')

As discussed above, we have imported the same dataset of Mall_Customers_data.csv, as we use in k-means clustering. Consider the below output:
o Extracting the matrix of features

Here we will extract only the matrix of features as we don't have any further
information about the dependent variable. Code is given below:

1. x = dataset.iloc[:, [3, 4]].values

Here we have extracted only columns 3 and 4, as we will use a 2D plot to see the clusters. So, we are considering Annual Income and Spending Score as the matrix of features.

Step-2: Finding the optimal number of clusters using the Dendrogram
Now we will find the optimal number of clusters using the Dendrogram for our
model. For this, we are going to use scipy library as it provides a function that will
directly return the dendrogram for our code. Consider the below lines of code:
1. #Finding the optimal number of clusters using the dendrogram
2. import scipy.cluster.hierarchy as shc
3. dendro = shc.dendrogram(shc.linkage(x, method="ward"))
4. mtp.title("Dendrogram Plot")
5. mtp.ylabel("Euclidean Distances")
6. mtp.xlabel("Customers")
7. mtp.show()

In the above lines of code, we have imported the hierarchy module of the scipy library. This module provides us a method shc.dendrogram(), which takes the output of linkage() as a parameter. The linkage function is used to define the distance between two clusters, so here we have passed x (the matrix of features) and the method "ward," a popular linkage method in hierarchical clustering.

The remaining lines of code are to describe the labels for the dendrogram plot.

Output:

By executing the above lines of code, we will get the below output:

Using this Dendrogram, we will now determine the optimal number of clusters for
our model. For this, we will find the maximum vertical distance that does not cut
any horizontal bar. Consider the below diagram:
In the above diagram, we have shown the vertical distances that do not cut any horizontal bar. As we can see, the 4th distance looks the largest, so according to this, the number of clusters will be 5 (the number of vertical lines in this range). We could also take the 2nd largest distance, as it is approximately equal to the 4th, but we will consider 5 clusters because that is the same number we calculated with the K-means algorithm.

So, the optimal number of clusters will be 5, and we will train the model in the
next step, using the same.

Step-3: Training the hierarchical clustering model


As we know the required optimal number of clusters, we can now train our model.
The code is given below:

1. #training the hierarchical model on dataset
2. from sklearn.cluster import AgglomerativeClustering
3. hc= AgglomerativeClustering(n_clusters=5, affinity='euclidean', linkage='ward')
4. y_pred= hc.fit_predict(x)
In the above code, we have imported the AgglomerativeClustering class of the cluster module of the scikit-learn library.

Then we have created the object of this class named as hc. The
AgglomerativeClustering class takes the following parameters:

o n_clusters=5: It defines the number of clusters, and we have taken 5 here because it is the optimal number of clusters.
o affinity='euclidean': It is a metric used to compute the linkage.
o linkage='ward': It defines the linkage criteria, here we have used the "ward"
linkage. This method is the popular linkage method that we have already used
for creating the Dendrogram. It reduces the variance in each cluster.

In the last line, we have created the variable y_pred by fitting the model. fit_predict() not only trains the model but also returns the cluster to which each data point belongs.

After executing the above lines of code, if we go through the variable explorer option in our Spyder IDE, we can check the y_pred variable. We can compare the original dataset with the y_pred variable. Consider the below image:

As we can see in the above image, y_pred shows the cluster values, which means customer id 1 belongs to the 5th cluster (as indexing starts from 0, a value of 4 means the 5th cluster), customer id 2 belongs to the 4th cluster, and so on.
Step-4: Visualizing the clusters
As we have trained our model successfully, now we can visualize the clusters
corresponding to the dataset.

Here we will use the same lines of code as in k-means clustering, except for one change: we will not plot the centroids as we do in k-means, because here we have used the dendrogram to determine the optimal number of clusters. The code is given below:

1. #visualizing the clusters
2. mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')
3. mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s = 100, c = 'green', label = 'Cluster 2')
4. mtp.scatter(x[y_pred == 2, 0], x[y_pred == 2, 1], s = 100, c = 'red', label = 'Cluster 3')
5. mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
6. mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
7. mtp.title('Clusters of customers')
8. mtp.xlabel('Annual Income (k$)')
9. mtp.ylabel('Spending Score (1-100)')
10. mtp.legend()
11. mtp.show()

Output: By executing the above lines of code, we will get the below output:
K-Means Clustering Algorithm
K-Means Clustering is an unsupervised learning algorithm that is used to solve the
clustering problems in machine learning or data science. In this topic, we will learn
what the K-means clustering algorithm is, how the algorithm works, and the Python implementation of k-means clustering.

What is K-Means Algorithm?


K-Means Clustering is an Unsupervised Learning algorithm, which groups the
unlabeled dataset into different clusters. Here K defines the number of pre-defined
clusters that need to be created in the process, as if K=2, there will be two clusters,
and for K=3, there will be three clusters, and so on.

It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group with similar properties.

It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any training.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.


The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until it finds the best clusters. The value of k should be predetermined in this algorithm.

The k-means clustering algorithm mainly performs two tasks:

o Determines the best value for the K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. Those data points which are
near to the particular k-center, create a cluster.

Hence each cluster has datapoints with some commonalities, and it is away from
other clusters.

The below diagram explains the working of the K-means Clustering Algorithm:

How does the K-Means Algorithm Work?


The working of the K-Means algorithm is explained in the below steps:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points or centroids. (They may be points other than those in the input dataset.)

Step-3: Assign each data point to their closest centroid, which will form the
predefined K clusters.

Step-4: Calculate the variance and place a new centroid of each cluster.

Step-5: Repeat the third step, which means reassigning each data point to the new closest centroid of each cluster.

Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.

Step-7: The model is ready.

Let's understand the above steps by considering the visual plots:

Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two
variables is given below:
o Let's take the number of clusters k, i.e., K=2, to identify the dataset and to put the points into different clusters. It means here we will try to group these data points into two different clusters.
o We need to choose some random k points or centroid to form the cluster.
These points can be either the points from the dataset or any other point. So,
here we are selecting the below two points as k points, which are not the part
of our dataset. Consider the below image:

o Now we will assign each data point of the scatter plot to its closest K-point or
centroid. We will compute it by applying some mathematics that we have
studied to calculate the distance between two points. So, we will draw a
median between both the centroids. Consider the below image:

From the above image, it is clear that the points on the left side of the line are near the K1 or blue centroid, and the points to the right of the line are close to the yellow centroid. Let's color them blue and yellow for clear visualization.
o As we need to refine the clusters, we will repeat the process by choosing new centroids. To choose the new centroids, we will compute the center of gravity of each cluster and will find the new centroids as below:

o Next, we will reassign each datapoint to the new centroid. For this, we will
repeat the same process of finding a median line. The median will be like
below image:

From the above image, we can see that one yellow point is on the left side of the line and two blue points are to the right of the line, so these three points will be assigned to new centroids.
As reassignment has taken place, we will again go to step-4, which is finding new centroids or K-points.

o We will repeat the process by finding the center of gravity of centroids, so the
new centroids will be as shown in the below image:
o As we got the new centroids so again will draw the median line and reassign
the data points. So, the image will be:

o We can see in the above image that there are no dissimilar data points on either side of the line, which means our model is formed. Consider the below image:

As our model is ready, we can now remove the assumed centroids, and the two final clusters will be as shown in the below image. A small NumPy sketch of one assignment-and-update iteration of these steps is given after the image.
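The following is a minimal NumPy sketch of a single assignment-and-update iteration; the five 2-D points and the initial centroids are made-up values used only for illustration, not the dataset from this example.

# One assignment + update iteration of k-means on toy 2-D data (K = 2).
import numpy as np

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 10.0], [8.5, 9.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])   # randomly chosen initial K points

# Step 3: assign each data point to its closest centroid
distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
labels = distances.argmin(axis=1)

# Step 4: move each centroid to the mean of the points assigned to it
new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])
print(labels, new_centroids)

Running these two steps repeatedly until the assignments stop changing is exactly the loop described in steps 3 to 6 above.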
How to choose the value of "K number of
clusters" in K-means Clustering?
The performance of the K-means clustering algorithm depends upon the quality of the clusters it forms, but choosing the optimal number of clusters is a big task. There are several ways to find the optimal number of clusters; here we discuss the most common method for finding the number of clusters, or the value of K. The method is given below:

Elbow Method
The Elbow method is one of the most popular ways to find the optimal number of
clusters. This method uses the concept of WCSS value. WCSS stands for Within
Cluster Sum of Squares, which defines the total variations within a cluster. The
formula to calculate the value of WCSS (for 3 clusters) is given below:

WCSS = ∑Pi in Cluster1 distance(Pi, C1)² + ∑Pi in Cluster2 distance(Pi, C2)² + ∑Pi in Cluster3 distance(Pi, C3)²

In the above formula of WCSS,

∑Pi in Cluster1 distance(Pi, C1)²: It is the sum of the squares of the distances between each data point in Cluster1 and its centroid C1, and the same holds for the other two terms. To measure the distance between the data points and a centroid, we can use any method such as Euclidean distance or Manhattan distance.
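As a small illustration of this formula, the sketch below computes WCSS directly for a made-up assignment of four points to two centroids (the numbers are assumptions, not the mall dataset):

# Computing WCSS by hand for a toy assignment of points to 2 centroids.
import numpy as np

points = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 9.5]])
labels = np.array([0, 0, 1, 1])                    # cluster of each point
centroids = np.array([[1.25, 1.9], [8.5, 8.75]])   # C1 and C2

wcss = 0.0
for k, c in enumerate(centroids):
    diff = points[labels == k] - c                 # offsets to own centroid
    wcss += np.sum(diff ** 2)                      # squared Euclidean distances
print(wcss)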

To find the optimal value of clusters, the elbow method follows the below steps:

o It executes K-means clustering on a given dataset for different K values (ranging from 1 to 10).
o For each value of K, calculates the WCSS value.
o Plots a curve between calculated WCSS values and the number of clusters K.
o The sharp point of bend in the plot, where the curve looks like an arm's elbow, is considered the best value of K.

Since the graph shows the sharp bend, which looks like an elbow, hence it is known
as the elbow method. The graph for the elbow method looks like the below image:

Note: We could choose the number of clusters equal to the number of data points; in that case, the value of WCSS becomes zero, and that will be the endpoint of the plot.

Python Implementation of K-means Clustering Algorithm
In the above section, we have discussed the K-means algorithm, now let's see how it
can be implemented using Python.
Before implementation, let's understand what type of problem we will solve here. So,
we have a dataset of Mall_Customers, which is the data of customers who visit the
mall and spend there.

In the given dataset, we have Customer_Id, Gender, Age, Annual Income ($), and Spending Score (a calculated value of how much a customer has spent in the mall; the higher the value, the more he has spent). From this dataset, we need to find some patterns; as it is an unsupervised method, we don't know exactly what to calculate.

The steps to be followed for the implementation are given below:

o Data Pre-processing
o Finding the optimal number of clusters using the elbow method
o Training the K-means algorithm on the training dataset
o Visualizing the clusters

Step-1: Data pre-processing Step


The first step will be the data pre-processing, as we did in our earlier topics of
Regression and Classification. But for the clustering problem, it will be different from
other models. Let's discuss it:

o Importing Libraries
As we did in previous topics, firstly, we will import the libraries for our model,
which is part of data pre-processing. The code is given below:

1. # importing libraries
2. import numpy as nm
3. import matplotlib.pyplot as mtp
4. import pandas as pd

In the above code, numpy is imported for performing the mathematical calculations, matplotlib for plotting the graph, and pandas for managing the dataset.

o Importing the Dataset:


Next, we will import the dataset that we need to use. So here, we are using the Mall_Customers_data.csv dataset. It can be imported using the below code:

1. # Importing the dataset
2. dataset = pd.read_csv('Mall_Customers_data.csv')

By executing the above lines of code, we will get our dataset in the Spyder IDE. The
dataset looks like the below image:

From the above dataset, we need to find some patterns in it.

o Extracting Independent Variables

Here we don't need any dependent variable for data pre-processing step as it is a
clustering problem, and we have no idea about what to determine. So we will just
add a line of code for the matrix of features.

1. x = dataset.iloc[:, [3, 4]].values

As we can see, we are extracting only the 3rd and 4th features. This is because we need a 2D plot to visualize the model, and some features, such as customer_id, are not required.
Step-2: Finding the optimal number of clusters
using the elbow method
In the second step, we will try to find the optimal number of clusters for our
clustering problem. So, as discussed above, here we are going to use the elbow
method for this purpose.

As we know, the elbow method uses the WCSS concept to draw the plot by plotting
WCSS values on the Y-axis and the number of clusters on the X-axis. So we are going
to calculate the value for WCSS for different k values ranging from 1 to 10. Below is
the code for it:

1. #finding optimal number of clusters using the elbow method
2. from sklearn.cluster import KMeans
3. wcss_list= [] #Initializing the list for the values of WCSS
4.
5. #Using for loop for iterations from 1 to 10.
6. for i in range(1, 11):
7.     kmeans = KMeans(n_clusters=i, init='k-means++', random_state= 42)
8.     kmeans.fit(x)
9.     wcss_list.append(kmeans.inertia_)
10. mtp.plot(range(1, 11), wcss_list)
11. mtp.title('The Elbow Method Graph')
12. mtp.xlabel('Number of clusters(k)')
13. mtp.ylabel('wcss_list')
14. mtp.show()

As we can see in the above code, we have used the KMeans class of the sklearn.cluster library to form the clusters.

Next, we have created the wcss_list variable to initialize an empty list, which is used
to contain the value of wcss computed for different values of k ranging from 1 to 10.

After that, we have initialized the for loop to iterate over values of k ranging from 1 to 10; since range() in Python excludes the upper bound, it is given as 11 to include the 10th value.

The rest part of the code is similar as we did in earlier topics, as we have fitted the
model on a matrix of features and then plotted the graph between the number of
clusters and WCSS.
Output: After executing the above code, we will get the below output:

From the above plot, we can see the elbow point is at 5. So the number of clusters
here will be 5.
Step- 3: Training the K-means algorithm on the
training dataset
As we have got the number of clusters, so we can now train the model on the
dataset.

To train the model, we will use the same two lines of code as we have used in the
above section, but here instead of using i, we will use 5, as we know there are 5
clusters that need to be formed. The code is given below:

1. #training the K-means model on a dataset
2. kmeans = KMeans(n_clusters=5, init='k-means++', random_state= 42)
3. y_predict= kmeans.fit_predict(x)

The first line is the same as above for creating the object of KMeans class.

In the second line of code, we have created the variable y_predict by fitting the model with fit_predict(), which also returns the cluster assigned to each data point.

By executing the above lines of code, we will get the y_predict variable. We can check
it under the variable explorer option in the Spyder IDE. We can now compare the
values of y_predict with our original dataset. Consider the below image:

From the above image, we can now see that CustomerID 1 belongs to cluster 3 (as the index starts from 0, a value of 2 corresponds to cluster 3), CustomerID 2 belongs to cluster 4, and so on.
Step-4: Visualizing the Clusters
The last step is to visualize the clusters. As we have 5 clusters for our model, so we
will visualize each cluster one by one.

To visualize the clusters, we will use a scatter plot created with the mtp.scatter() function of matplotlib.

1. #visualizing the clusters
2. mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s = 100, c = 'blue', label = 'Cluster 1') #for first cluster
3. mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s = 100, c = 'green', label = 'Cluster 2') #for second cluster
4. mtp.scatter(x[y_predict == 2, 0], x[y_predict == 2, 1], s = 100, c = 'red', label = 'Cluster 3') #for third cluster
5. mtp.scatter(x[y_predict == 3, 0], x[y_predict == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4') #for fourth cluster
6. mtp.scatter(x[y_predict == 4, 0], x[y_predict == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5') #for fifth cluster
7. mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 300, c = 'yellow', label = 'Centroid')
8. mtp.title('Clusters of customers')
9. mtp.xlabel('Annual Income (k$)')
10. mtp.ylabel('Spending Score (1-100)')
11. mtp.legend()
12. mtp.show()

In the above lines of code, we have written one scatter call for each cluster, ranging from 1 to 5. The first argument of mtp.scatter, i.e., x[y_predict == 0, 0], selects the x values (the first feature column) of the points assigned to that cluster, while y_predict itself contains cluster labels ranging from 0 to 4.

Output:
The output image clearly shows the five different clusters with different colors. The clusters are formed between two parameters of the dataset: Annual Income of the customer and Spending Score. We can change the colors and labels as per our requirement or choice. We can also observe some points from the above patterns, which are given below:

o Cluster1 shows the customers with average salary and average spending, so we can categorize these customers as standard.
o Cluster2 shows the customers with a high income but low spending, so we can categorize them as careful.
o Cluster3 shows the low income and also low spending so they can be
categorized as sensible.
o Cluster4 shows the customers with low income with very high spending so
they can be categorized as careless.
o Cluster5 shows the customers with high income and high spending so they
can be categorized as target, and these customers can be the most profitable
customers for the mall owner.

Apriori Algorithm in Machine Learning
The Apriori algorithm uses frequent itemsets to generate association rules, and it is designed to work on databases that contain transactions. With the help of these association rules, it determines how strongly or how weakly two objects are connected. This algorithm uses a breadth-first search and a Hash Tree to calculate the itemset associations efficiently. It is an iterative process for finding the frequent itemsets in a large dataset.

This algorithm was proposed by R. Agrawal and R. Srikant in 1994. It is mainly
used for market basket analysis and helps to find those products that can be bought
together. It can also be used in the healthcare field to find drug reactions for
patients.

What is Frequent Itemset?

Frequent itemsets are those itemsets whose support is greater than the threshold value, or user-specified minimum support. It also means that if {A, B} is a frequent itemset, then A and B individually must also be frequent itemsets.


Suppose there are two transactions: A = {1,2,3,4,5} and B = {2,3,7}. In these two transactions, 2 and 3 are the frequent items.

Note: To better understand the apriori algorithm, and related term such as support and
confidence, it is recommended to understand the association rule learning.

Steps for Apriori Algorithm


Below are the steps for the apriori algorithm:

Step-1: Determine the support of itemsets in the transactional database, and select
the minimum support and confidence.

Step-2: Take all the itemsets in the transactional database with a support value higher than the minimum or selected support value.

Step-3: Find all the rules of these subsets that have higher confidence value than the
threshold or minimum confidence.

Step-4: Sort the rules in decreasing order of lift.

Apriori Algorithm Working


We will understand the apriori algorithm using an example and mathematical
calculation:

Example: Suppose we have the following dataset that has various transactions, and
from this dataset, we need to find the frequent itemsets and generate the association
rules using the Apriori algorithm:
Solution:

Step-1: Calculating C1 and L1:

o In the first step, we will create a table that contains support count (The
frequency of each itemset individually in the dataset) of each itemset in the
given dataset. This table is called the Candidate set or C1.

o Now, we will take out all the itemsets that have a support count greater than or equal to the Minimum Support (2). It will give us the table for the frequent itemset L1. Since all the itemsets have a support count greater than or equal to the minimum support, except E, the E itemset will be removed.

Step-2: Candidate Generation C2, and L2:


o In this step, we will generate C2 with the help of L1. In C2, we will create the
pair of the itemsets of L1 in the form of subsets.
o After creating the subsets, we will again find the support count from the main
transaction table of datasets, i.e., how many times these pairs have occurred
together in the given dataset. So, we will get the below table for C2:

o Again, we need to compare the C2 Support count with the minimum support
count, and after comparing, the itemset with less support count will be
eliminated from the table C2. It will give us the below table for L2

Step-3: Candidate generation C3, and L3:

o For C3, we will repeat the same two processes, but now we will form the C3
table with subsets of three itemsets together, and will calculate the support
count from the dataset. It will give the below table:

o Now we will create the L3 table. As we can see from the above C3 table, there
is only one combination of itemset that has support count equal to the
minimum support count. So, the L3 will have only one combination, i.e., {A, B,
C}.
Step-4: Finding the association rules for the
subsets:
To generate the association rules, first, we will create a new table with the possible rules from the frequent combination {A, B, C}. For each rule X → Y, we will calculate the Confidence using the formula sup(X ∧ Y)/sup(X). After calculating the confidence value for all rules, we will exclude the rules that have confidence less than the minimum threshold (50%).

Consider the below table:

Rules       Support   Confidence

A^B → C     2         sup{(A^B)^C}/sup(A^B) = 2/4 = 0.5 = 50%

B^C → A     2         sup{(B^C)^A}/sup(B^C) = 2/4 = 0.5 = 50%

A^C → B     2         sup{(A^C)^B}/sup(A^C) = 2/4 = 0.5 = 50%

C → A^B     2         sup{C^(A^B)}/sup(C) = 2/5 = 0.4 = 40%

A → B^C     2         sup{A^(B^C)}/sup(A) = 2/6 = 0.33 = 33.33%

B → A^C     2         sup{B^(A^C)}/sup(B) = 2/7 = 0.28 = 28%

As the given threshold or minimum confidence is 50%, the first three rules (A^B → C, B^C → A, and A^C → B) can be considered strong association rules for the given problem. The candidate-and-prune steps above can also be reproduced in a few lines of Python, as sketched below.
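The toy transaction list in this sketch is an assumption, chosen so that the rule A^B → C again comes out with a confidence of 2/4 = 50%; it is not the exact table used in the example above.

# Counting itemset supports and pruning by minimum support (illustrative toy data).
from itertools import combinations

transactions = [
    {"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"},
    {"A", "B", "C"}, {"B"}, {"A", "B"},
]
min_support = 2

def support(itemset):
    # Number of transactions that contain every item of the itemset
    return sum(1 for t in transactions if itemset <= t)

items = sorted({i for t in transactions for i in t})

# L1: frequent single items, then C2/L2: frequent pairs built from L1
L1 = [frozenset({i}) for i in items if support(frozenset({i})) >= min_support]
C2 = [a | b for a, b in combinations(L1, 2)]
L2 = [c for c in C2 if support(c) >= min_support]

for rule_from, rule_to in [({"A", "B"}, {"C"})]:
    conf = support(frozenset(rule_from | rule_to)) / support(frozenset(rule_from))
    print("Rule", rule_from, "->", rule_to, "confidence =", round(conf, 2))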

Advantages of Apriori Algorithm

o It is an easy-to-understand algorithm.
o The join and prune steps of the algorithm can be easily implemented on large datasets.

Disadvantages of Apriori Algorithm

o The apriori algorithm works slowly compared to other algorithms.
o The overall performance can be reduced as it scans the database multiple times.
o The time complexity and space complexity of the apriori algorithm are O(2^D), which is very high. Here D represents the horizontal width (the number of distinct items) present in the database.

Python Implementation of Apriori Algorithm


Now we will see the practical implementation of the Apriori algorithm. To implement this, we have the problem of a retailer who wants to find the associations between his shop's products, so that he can provide a "Buy this and get that" offer to his customers.

The retailer has a dataset information that contains a list of transactions made by his
customer. In the dataset, each row shows the products purchased by customers or
transactions made by the customer. To solve this problem, we will perform the below
steps:

o Data Pre-processing
o Training the Apriori model on the dataset
o Visualizing the results

1. Data Pre-processing Step:

The first step is data pre-processing step. Under this, first, we will perform the
importing of the libraries. The code for this is given below:

o Importing the libraries:

Before importing the libraries, we will use the below line of code to install the apyori
package to use further, as Spyder IDE does not contain it:

1. pip install apyori

Below is the code to implement the libraries that will be used for different tasks of
the model:

1. import numpy as nm
2. import matplotlib.pyplot as mtp
3. import pandas as pd

o Importing the dataset:


Now, we will import the dataset for our apriori model. To import the dataset, there will be some changes here. All the rows of the dataset show different transactions made by the customers. The first row is the transaction made by the first customer, which means there is no particular name for each column; each cell simply holds an individual product detail (see the dataset shown below after the code). So, we need to mention in our code that no header is specified. The code is given below:

1. #Importing the dataset
2. dataset = pd.read_csv('Market_Basket_data1.csv', header=None)
3. transactions=[]
4. for i in range(0, 7501):
5.     transactions.append([str(dataset.values[i,j]) for j in range(0,20)])

In the above code, the first line imports the dataset into a pandas DataFrame. The remaining lines are needed because the apriori() function that we will use for training our model takes the dataset as a list of transactions, so we have created an empty list of transactions and filled it with the rows 0 to 7500. We have taken 7501 because, in Python, the upper bound of range() is not included.

The dataset looks like the below image:

2. Training the Apriori Model on the dataset

To train the model, we will use the apriori function that will be imported from the apyori package. This function will return the rules to train the model on the
dataset. Consider the below code:

1. from apyori import apriori
2. rules= apriori(transactions= transactions, min_support=0.003, min_confidence= 0.2, min_lift=3, min_length=2, max_length=2)

In the above code, the first line is to import the apriori function. In the second line,
the apriori function returns the output as the rules. It takes the following parameters:

o transactions: A list of transactions.
o min_support= To set the minimum support float value. Here we have used 0.003, which is calculated by taking 3 transactions per customer each week relative to the total number of transactions.
o min_confidence: To set the minimum confidence value. Here we have taken
0.2. It can be changed as per the business problem.
o min_lift= To set the minimum lift value.
o min_length= It takes the minimum number of products for the association.
o max_length = It takes the maximum number of products for the association.

3. Visualizing the result

Now we will visualize the output for our apriori model. Here we will follow some
more steps, which are given below:

o Displaying the result of the rules occurred from the apriori function

1. results= list(rules)
2. results

By executing the above lines of code, we will get the 9 rules. Consider the below
output:

Output:

[RelationRecord(items=frozenset({'chicken', 'light cream'}),


support=0.004533333333333334,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}),
items_add=frozenset({'chicken'}), confidence=0.2905982905982906,
lift=4.843304843304844)]),
RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}),
support=0.005733333333333333,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream
sauce'}), items_add=frozenset({'escalope'}),
confidence=0.30069930069930073, lift=3.7903273197390845)]),
RelationRecord(items=frozenset({'escalope', 'pasta'}),
support=0.005866666666666667,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}),
items_add=frozenset({'escalope'}), confidence=0.37288135593220345,
lift=4.700185158809287)]),
RelationRecord(items=frozenset({'fromage blanc', 'honey'}),
support=0.0033333333333333335,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage
blanc'}), items_add=frozenset({'honey'}), confidence=0.2450980392156863,
lift=5.178127589063795)]),
RelationRecord(items=frozenset({'ground beef', 'herb & pepper'}),
support=0.016,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'herb &
pepper'}), items_add=frozenset({'ground beef'}),
confidence=0.3234501347708895, lift=3.2915549671393096)]),
RelationRecord(items=frozenset({'tomato sauce', 'ground beef'}),
support=0.005333333333333333,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'tomato
sauce'}), items_add=frozenset({'ground beef'}),
confidence=0.37735849056603776, lift=3.840147461662528)]),
RelationRecord(items=frozenset({'olive oil', 'light cream'}),
support=0.0032,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}),
items_add=frozenset({'olive oil'}), confidence=0.20512820512820515,
lift=3.120611639881417)]),
RelationRecord(items=frozenset({'olive oil', 'whole wheat pasta'}),
support=0.008,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'whole wheat
pasta'}), items_add=frozenset({'olive oil'}),
confidence=0.2714932126696833, lift=4.130221288078346)]),
RelationRecord(items=frozenset({'pasta', 'shrimp'}),
support=0.005066666666666666,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}),
items_add=frozenset({'shrimp'}), confidence=0.3220338983050848,
lift=4.514493901473151)])]

As we can see, the above output is in a form that is not easily understandable. So, we will print all the rules in a more readable format.

o Visualizing the rule, support, confidence, and lift in a clearer way:

1. for item in results:
2.     pair = item[0]
3.     items = [x for x in pair]
4.     print("Rule: " + items[0] + " -> " + items[1])
5.
6.     print("Support: " + str(item[1]))
7.     print("Confidence: " + str(item[2][0][2]))
8.     print("Lift: " + str(item[2][0][3]))
9.     print("=====================================")

Output:

By executing the above lines of code, we will get the below output:
Rule: chicken -> light cream
Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
=====================================
Rule: escalope -> mushroom cream sauce
Support: 0.005733333333333333
Confidence: 0.30069930069930073
Lift: 3.7903273197390845
=====================================
Rule: escalope -> pasta
Support: 0.005866666666666667
Confidence: 0.37288135593220345
Lift: 4.700185158809287
=====================================
Rule: fromage blanc -> honey
Support: 0.0033333333333333335
Confidence: 0.2450980392156863
Lift: 5.178127589063795
=====================================
Rule: ground beef -> herb & pepper
Support: 0.016
Confidence: 0.3234501347708895
Lift: 3.2915549671393096
=====================================
Rule: tomato sauce -> ground beef
Support: 0.005333333333333333
Confidence: 0.37735849056603776
Lift: 3.840147461662528
=====================================
Rule: olive oil -> light cream
Support: 0.0032
Confidence: 0.20512820512820515
Lift: 3.120611639881417
=====================================
Rule: olive oil -> whole wheat pasta
Support: 0.008
Confidence: 0.2714932126696833
Lift: 4.130221288078346
=====================================
Rule: pasta -> shrimp
Support: 0.005066666666666666
Confidence: 0.3220338983050848
Lift: 4.514493901473151
=====================================

From the above output, we can analyze each rule. The first rule, light cream → chicken, states that light cream and chicken are frequently bought together by customers. The support for this rule is 0.0045, and the confidence is 29%. Hence, if a customer buys light cream, there is a 29% chance that he also buys chicken, and this pair appears in about 0.45% of the transactions. We can check these values for the other rules as well.

Association Rule Learning


Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item on another data item and maps them accordingly so that the relationship can be exploited, for example to make a business more profitable. It tries to find interesting relations or associations among the variables of a dataset, using different rules to discover the interesting relations between variables in the database.

The association rule learning is one of the very important concepts of machine
learning, and it is employed in Market Basket analysis, Web usage mining,
continuous production, etc. Here, market basket analysis is a technique used by various big retailers to discover the associations between items. We can understand it
by taking an example of a supermarket, as in a supermarket, all products that are
purchased together are put together.

For example, if a customer buys bread, he most likely can also buy butter, eggs, or
milk, so these products are stored within a shelf or mostly nearby. Consider the
below diagram:

Association rule learning can be divided into three types of algorithms:


1. Apriori
2. Eclat
3. F-P Growth Algorithm

We will understand these algorithms in later chapters.

How does Association Rule Learning work?


Association rule learning works on the concept of an If-Then statement, such as if A then B.

Here the If element is called the antecedent, and the Then statement is called the consequent. These types of relationships, where we can find out some association or relation between two items, are known as single cardinality. It is all about creating rules, and if the number of items increases, then the cardinality also increases accordingly. So, to measure the associations between thousands of data items, there are several metrics. These metrics are given below:

o Support
o Confidence
o Lift

Let's understand each of them:

Support
Support is the frequency of A, or how frequently an item appears in the dataset. It is defined as the fraction of the transactions T that contain the itemset X. For an itemset X and T total transactions, it can be written as:

Support(X) = Frequency(X) / T
Confidence
Confidence indicates how often the rule has been found to be true, or how often the items X and Y occur together in the dataset when the occurrence of X is already given. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X:

Confidence(X → Y) = Frequency(X, Y) / Frequency(X)

Lift
It is the strength of any rule, which can be defined by the below formula:

Lift(X → Y) = Support(X, Y) / (Support(X) × Support(Y))

It is the ratio of the observed support to the expected support if X and Y were independent of each other. It has three possible values (a short computational sketch of these three metrics follows the list below):

o If Lift= 1: The occurrence of the antecedent and that of the consequent are independent of each other.
o Lift>1: It determines the degree to which the two itemsets are dependent to
each other.
o Lift<1: It tells us that one item is a substitute for other items, which means
one item has a negative effect on another.
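The following short sketch shows how these three measures can be computed from a list of transactions; the toy transactions and the bread → butter rule are assumptions used only for illustration.

# Support, confidence and lift for a candidate rule X -> Y (toy transactions).
transactions = [
    {"bread", "butter"}, {"bread", "milk"}, {"bread", "butter", "eggs"},
    {"milk", "eggs"}, {"bread", "butter", "milk"},
]
n = len(transactions)

def freq(items):
    # Number of transactions containing every item of the given set
    return sum(1 for t in transactions if items <= t)

X, Y = {"bread"}, {"butter"}
support_xy = freq(X | Y) / n                          # P(X and Y)
confidence = freq(X | Y) / freq(X)                    # P(Y given X)
lift = support_xy / ((freq(X) / n) * (freq(Y) / n))   # observed / expected support

print(round(support_xy, 2), round(confidence, 2), round(lift, 2))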

Types of Association Rule Learning


Association rule learning can be divided into three algorithms:

Apriori Algorithm
This algorithm uses frequent itemsets to generate association rules. It is designed to work on databases that contain transactions. This algorithm uses a breadth-first search and a Hash Tree to calculate the itemsets efficiently.
It is mainly used for market basket analysis and helps to understand the products
that can be bought together. It can also be used in the healthcare field to find drug
reactions for patients.

Eclat Algorithm
Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a
depth-first search technique to find frequent itemsets in a transaction database. It
performs faster execution than Apriori Algorithm.

F-P Growth Algorithm


The F-P Growth algorithm stands for Frequent Pattern Growth, and it is an improved version of the Apriori algorithm. It represents the database in the form of a tree structure that is known as a frequent pattern tree (FP-tree). The purpose of this frequent tree is to extract the most frequent patterns.

Applications of Association Rule Learning


It has various applications in machine learning and data mining. Below are some
popular applications of association rule learning:

o Market Basket Analysis: It is one of the popular examples and applications of association rule mining. This technique is commonly used by big retailers to determine the associations between items.
o Medical Diagnosis: With the help of association rules, patients can be treated more effectively, as the rules help in identifying the probability of illness for a particular disease.
o Protein Sequence: The association rules help in determining the synthesis of
artificial Proteins.
o It is also used for the Catalog Design and Loss-leader Analysis and many
more other applications.

Confusion Matrix in Machine Learning
The confusion matrix is a matrix used to determine the performance of the
classification models for a given set of test data. It can only be determined if the true
values for test data are known. The matrix itself can be easily understood, but the
related terminologies may be confusing. Since it shows the errors in the model performance in the form of a matrix, it is also known as an error matrix. Some features of the confusion matrix are given below:

o For a classifier with 2 prediction classes, the matrix is a 2*2 table; for 3 classes, it is a 3*3 table, and so on.
o The matrix is divided into two dimensions, that are predicted
values and actual values along with the total number of predictions.
o Predicted values are those values, which are predicted by the model, and
actual values are the true values for the given observations.
o It looks like the below table:

The above table has the following cases:

o True Negative: Model has given prediction No, and the real or actual value
was also No.
o True Positive: The model has predicted Yes, and the actual value was also Yes.
o False Negative: The model has predicted No, but the actual value was Yes. It is also called a Type-II error.
o False Positive: The model has predicted Yes, but the actual value was No. It is
also called a Type-I error.

Need for Confusion Matrix in Machine Learning
o It evaluates the performance of the classification models, when they make
predictions on test data, and tells how good our classification model is.
o It not only tells us the errors made by the classifier but also the type of those errors, i.e., whether they are Type-I or Type-II errors.
o With the help of the confusion matrix, we can calculate the different
parameters for the model, such as accuracy, precision, etc.

Example: We can understand the confusion matrix using an example.

Suppose we are trying to create a model that can predict the result for the disease
that is either a person has that disease or not. So, the confusion matrix for this is
given as:


From the above example, we can conclude that:

o The table is given for a two-class classifier, which has two predictions, "Yes" and "No." Here, Yes means the patient has the disease, and No means the patient does not have the disease.
o The classifier has made a total of 100 predictions. Out of 100 predictions, 89 are correct predictions, and 11 are incorrect predictions.
o The model has predicted "Yes" 32 times and "No" 68 times, whereas the actual "Yes" occurred 27 times and the actual "No" 73 times. These totals also let us work out each cell of the matrix, as below.
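Working backwards from those totals, the confusion matrix consistent with them has True Positive = 24, False Positive = 8, False Negative = 3, and True Negative = 65, since 24 + 65 = 89 correct predictions, 24 + 8 = 32 predicted "Yes", and 24 + 3 = 27 actual "Yes".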

Calculations using Confusion Matrix:


We can perform various calculations for the model, such as the model's accuracy, using this matrix. These calculations are given below, followed by a short scikit-learn sketch:

o Classification Accuracy: It is one of the important parameters for determining the accuracy of classification problems. It defines how often the model predicts the correct output. It can be calculated as the ratio of the number of correct predictions made by the classifier to the total number of predictions made by the classifier:
Accuracy = (TP + TN) / (TP + TN + FP + FN)

o Misclassification rate: It is also termed the Error rate, and it defines how often the model gives wrong predictions. The error rate can be calculated as the ratio of the number of incorrect predictions to the total number of predictions made by the classifier:
Error rate = (FP + FN) / (TP + TN + FP + FN)

o Precision: It can be defined as the number of correct positive outputs provided by the model, or, out of all the positive classes predicted by the model, how many of them were actually true. It can be calculated using the below formula:
Precision = TP / (TP + FP)

o Recall: It is defined as, out of the total positive classes, how many our model predicted correctly. The recall should be as high as possible:
Recall = TP / (TP + FN)

o F-measure: If two models have low precision and high recall or vice versa, it is difficult to compare them. So, for this purpose, we can use the F-score. This score helps us evaluate recall and precision at the same time. The F-score is maximum when the recall is equal to the precision. It can be calculated using the below formula:
F-measure = (2 * Recall * Precision) / (Recall + Precision)
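A minimal sketch of these calculations with scikit-learn; the two label vectors below are made-up values used only for illustration.

# Deriving the confusion matrix and the metrics above with scikit-learn.
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual values (made-up)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # model predictions (made-up)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, FN, TN =", tp, fp, fn, tn)

print("Accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN) / total
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F-measure:", f1_score(y_true, y_pred))          # 2PR / (P + R)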

Other important terms used in Confusion Matrix:


o Null Error rate: It defines how often our model would be incorrect if it always
predicted the majority class. As per the accuracy paradox, it is said that "the
best classifier has a higher error rate than the null error rate."
o ROC Curve: The ROC is a graph displaying a classifier's performance for all
possible thresholds. The graph is plotted between the true positive rate (on
the Y-axis) and the false Positive rate (on the x-axis).

Cross-Validation in Machine Learning


Cross-validation is a technique for validating the model efficiency by training it on
the subset of input data and testing on previously unseen subset of the input
data. We can also say that it is a technique to check how a statistical model
generalizes to an independent dataset.

In machine learning, there is always a need to test the stability of the model; we cannot judge the model based only on how well it fits the training dataset. For this purpose, we reserve a particular sample of the dataset that was not part of the training dataset. After that, we test our model on that sample before deployment, and this complete process comes under cross-validation. This is something different from the general train-test split.

Hence the basic steps of cross-validations are:

o Reserve a subset of the dataset as a validation set.


o Provide the training to the model using the training dataset.
o Now, evaluate model performance using the validation set. If the model
performs well with the validation set, perform the further step, else check for
the issues.

Methods used for Cross-Validation


There are some common methods that are used for cross-validation. These methods
are given below:


1. Validation Set Approach


2. Leave-P-out cross-validation
3. Leave one out cross-validation
4. K-fold cross-validation
5. Stratified k-fold cross-validation

Validation Set Approach


In the validation set approach, we divide the input dataset into a training set and a
test or validation set, with each subset receiving 50% of the dataset.

But it has a big disadvantage: since we use only 50% of the dataset to train the
model, the model may fail to capture important information present in the data. This
approach also tends to produce an underfitted model.

Leave-P-out cross-validation
In this approach, p data points are left out of the training data. It means, if there are
a total of n data points in the original input dataset, then n-p data points are used as
the training dataset and the p data points as the validation set. This complete
process is repeated for all possible samples, and the average error is calculated to
know the effectiveness of the model.

The disadvantage of this technique is that it can be computationally expensive for
large p.

Leave one out cross-validation


This method is similar to leave-p-out cross-validation, but instead of p, only one data
point is left out of the training data. It means, in this approach, for each learning set,
only one data point is reserved for validation, and the remaining dataset is used to
train the model. This process repeats for each data point. Hence, for n samples, we
get n different training sets and n test sets. It has the following features:

o In this approach, the bias is minimal, as almost all the data points are used for training.
o The process is executed n times; hence the execution time is high.
o This approach leads to high variation in testing the effectiveness of the model,
as we iteratively test against a single data point.

K-Fold Cross-Validation
The k-fold cross-validation approach divides the input dataset into K groups of
samples of equal size. These samples are called folds. For each learning set, k-1 folds
are used for training, and the remaining fold is used as the test set. This approach is
a very popular CV approach because it is easy to understand, and the output is less
biased than that of other methods.

The steps for k-fold cross-validation are:

o Split the input dataset into K groups


o For each group:
o Take one group as the reserve or test data set.
o Use remaining groups as the training dataset
o Fit the model on the training set and evaluate the performance of the
model using the test set.

Let's take an example of 5-fold cross-validation. The dataset is grouped into 5 folds.
In the 1st iteration, the first fold is reserved for testing the model, and the rest are
used to train the model. In the 2nd iteration, the second fold is used to test the
model, and the rest are used to train it. This process continues until each fold has
been used as the test fold.

Consider the below diagram:
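As a complementary illustration, here is a minimal sketch of 5-fold cross-validation using scikit-learn; the iris dataset and the DecisionTreeClassifier are illustrative choices, not prescribed by this tutorial.

from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)           # an example dataset
model = DecisionTreeClassifier(random_state=0)

# Split the data into 5 folds; each fold is used once as the test set.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=kf)

print(scores)         # accuracy of each of the 5 iterations
print(scores.mean())  # average performance across the folds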

Stratified k-fold cross-validation


This technique is similar to k-fold cross-validation with a few small changes. This
approach is based on the concept of stratification, which is the process of
rearranging the data to ensure that each fold or group is a good representative of the
complete dataset. It is one of the best approaches to deal with bias and variance.

It can be understood with an example of housing prices, where the price of some
houses can be much higher than that of other houses. To tackle such situations, the
stratified k-fold cross-validation technique is useful.

Holdout Method
This method is the simplest cross-validation technique of all. In this method, we set
aside a subset of the dataset and train the model on the remaining part; the held-out
subset is then used to get prediction results.

The error that occurs in this process tells how well our model will perform with
unknown data. Although this approach is simple to perform, it still suffers from high
variance, and it sometimes produces misleading results.

Comparison of Cross-validation to
train/test split in Machine Learning
o Train/test split: The input data is divided into two parts, a training set and a
test set, typically in a ratio such as 70:30 or 80:20. Its high variance is one of
its biggest disadvantages.
o Training Data: The training data is used to train the model, and the
dependent variable is known.
o Test Data: The test data is used to make predictions from the model
that has already been trained on the training data. It has the same
features as the training data but is not part of it.
o Cross-Validation dataset: It is used to overcome the disadvantage of the
train/test split by splitting the dataset into groups of train/test splits and
averaging the result. It can be used if we want to optimize a model that has
been trained on the training dataset for the best performance. It is more
efficient than a plain train/test split, as every observation is used for both
training and testing (see the sketch after this list).
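The sketch below contrasts the two approaches; the dataset, model, and split ratio are only illustrative placeholders.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Train/test split: a single 80:20 split, so the score depends heavily on which
# observations happen to land in the test set (high variance).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
split_score = model.fit(X_tr, y_tr).score(X_te, y_te)

# Cross-validation: every observation is used for both training and testing,
# and the results are averaged, giving a more stable estimate.
cv_scores = cross_val_score(model, X, y, cv=5)

print(split_score, cv_scores.mean())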

Limitations of Cross-Validation
There are some limitations of the cross-validation technique, which are given below:
o Under ideal conditions, it provides the optimum output. But for inconsistent
data, it may produce drastically varying results. This is one of the big
disadvantages of cross-validation, as there is no certainty about the type of
data in machine learning.
o In predictive modeling, the data evolves over a period of time, due to which
differences may appear between the training and validation sets. For example,
if we create a model for the prediction of stock market values and train it on
the previous 5 years of stock values, the realistic values for the next 5 years
may be drastically different, so it is difficult to expect correct output in such
situations.

Applications of Cross-Validation
o This technique can be used to compare the performance of different
predictive modeling methods.
o It has great scope in the medical research field.
o It can also be used for the meta-analysis, as it is already being used by the
data scientists in the field of medical statistics.

Difference Between Data Science


and Machine Learning
Data Science is the study of data cleansing, preparation, and analysis, while machine
learning is a branch of AI and a subfield of data science. Data Science and Machine
Learning are two popular modern technologies, and they are growing at a
tremendous rate. But these two buzzwords, along with artificial intelligence and deep
learning, are often confusing terms, so it is important to understand how they are
different from each other. In this topic, we will understand the difference between
Data Science and Machine Learning only, and how they relate to each other.

Data Science and Machine Learning are closely related to each other but have
different functionalities and different goals. At a glance, Data Science is a field for
studying approaches to find insights from raw data, whereas Machine Learning is a
technique used by data scientists to enable machines to learn automatically from
past data. To understand the difference in depth, let's first have a brief introduction
to these two technologies.
Note: Data Science and Machine Learning are closely related to each other but cannot
be treated as synonyms.

What is Data Science?


Data science, as its name suggests, is all about the data. Hence, we can define it
as, "A field of deep study of data that includes extracting useful insights from
the data, and processing that information using different tools, statistical
models, and Machine learning algorithms." It is a concept that is used to handle
big data that includes data cleaning, data preparation, data analysis, and data
visualization.

A data scientist collects raw data from various sources, prepares and pre-processes
the data, and applies machine learning algorithms and predictive analysis to extract
useful insights from the collected data.


For example, Netflix uses data science techniques to understand user interest by
mining the data and viewing patterns of its users.

Skills Required to become Data Scientist

o An excellent programming knowledge of Python, R, SAS, or Scala.


o Experience in SQL database Coding.
o Knowledge of Machine Learning Algorithms.
o Deep Knowledge of Statistics concepts.
o Data Mining, cleaning, and Visualizing skills.
o Skills to use Big data tools such as Hadoop.

What is Machine Learning?


Machine learning is a part of artificial intelligence and the subfield of Data Science. It
is a growing technology that enables machines to learn from past data and perform
a given task automatically. It can be defined as:

Machine Learning allows computers to learn from past experiences on their own; it
uses statistical methods to improve performance and predict the output without
being explicitly programmed.

The popular applications of ML are Email spam filtering, product


recommendations, online fraud detection, etc.

Skills Needed for the Machine Learning Engineer:

o Understanding and implementation of Machine Learning Algorithms.


o Natural Language Processing.
o Good Programming knowledge of Python or R.
o Knowledge of Statistics and probability concepts.
o Knowledge of data modeling and data evaluation.

Where is Machine Learning used in Data


Science?
The use of machine learning in data science can be understood by the development
process or life cycle of Data Science. The different steps that occur in Data science
lifecycle are as follows:

1. Business Requirements: In this step, we try to understand the requirement
of the business problem for which we want to use machine learning. Suppose we
want to create a recommendation system, and the business requirement is to
increase sales.
2. Data Acquisition: In this step, the data is acquired to solve the given
problem. For the recommendation system, we can get the ratings provided by
the user for different products, comments, purchase history, etc.
3. Data Processing: In this step, the raw data acquired from the previous step is
transformed into a suitable format, so that it can be easily used by the further
steps.
4. Data Exploration: It is a step where we understand the patterns of the data,
and try to find out the useful insights from the data.
5. Modeling: The data modeling is a step where machine learning algorithms
are used. So, this step includes the whole machine learning process. The
machine learning process involves importing the data, data cleaning, building
a model, training the model, testing the model, and improving the model's
efficiency.
6. Deployment & Optimization: This is the last step where the model is
deployed on an actual project, and the performance of the model is checked.

Comparison Between Data Science and


Machine Learning
The below table describes the basic differences between Data Science and ML:

Data Science: It deals with understanding and finding hidden patterns or useful
insights from the data, which helps in taking smarter business decisions.
Machine Learning: It is a subfield of data science that enables the machine to learn
from past data and experiences automatically.

Data Science: It is used for discovering insights from the data.
Machine Learning: It is used for making predictions and classifying the result for new
data points.

Data Science: It is a broad term that includes various steps to create a model for a
given problem and deploy the model.
Machine Learning: It is used in the data modeling step of the data science lifecycle as
a complete process.

Data Science: A data scientist needs to have skills to use big data tools like Hadoop,
Hive and Pig, statistics, and programming in Python, R, or Scala.
Machine Learning: A Machine Learning Engineer needs to have skills such as
computer science fundamentals, programming skills in Python or R, statistics and
probability concepts, etc.

Data Science: It can work with raw, structured, and unstructured data.
Machine Learning: It mostly requires structured data to work on.

Data Science: Data scientists spend a lot of time handling the data, cleansing it, and
understanding its patterns.
Machine Learning: ML engineers spend a lot of time managing the complexities that
occur during the implementation of algorithms and the mathematical concepts
behind them.

Difference between Machine


Learning and Deep Learning
Machine Learning and Deep Learning are the two main concepts of Data Science and
subsets of Artificial Intelligence. Most people think of machine learning, deep
learning, and artificial intelligence as the same buzzwords, but in actuality, all these
terms are different yet related to each other.

In this topic, we will learn how machine learning is different from deep learning. But
before learning the differences, let's first have a brief introduction to machine
learning and deep learning.

What is Machine Learning?


Machine learning is a part of artificial intelligence and a growing technology that
enables machines to learn from past data and perform a given task automatically.

Machine Learning allows computers to learn from experience on their own; it uses
statistical methods to improve performance and predict the output without being
explicitly programmed.

The popular applications of ML are Email spam filtering, product recommendations,


online fraud detection, etc.


Some useful ML algorithms are:

o Decision Tree algorithm


o Naïve Bayes
o Random Forest
o K-means clustering
o KNN algorithm
o Apriori Algorithm, etc.

How does Machine Learning work?


The working of machine learning models can be understood by the example of
identifying an image as a cat or a dog. To identify this, the ML model takes images of
both cats and dogs as input, extracts the different features of the images such as
shape, height, nose, eyes, etc., applies a classification algorithm, and predicts the
output. Consider the below image:
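The same feature-based workflow can be sketched in code; the feature values, their names, and the RandomForestClassifier used here are purely illustrative stand-ins for real extracted image features.

from sklearn.ensemble import RandomForestClassifier

# Hypothetical, manually extracted features for each image:
# [ear_shape, snout_length, whisker_count] -- illustrative numbers only.
X_train = [[0.9, 0.2, 30], [0.8, 0.3, 28], [0.2, 0.9, 5], [0.1, 0.8, 6]]
y_train = ["cat", "cat", "dog", "dog"]

# The classification algorithm learns from the extracted features...
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# ...and predicts the output for a new image's features.
print(clf.predict([[0.85, 0.25, 29]]))   # expected: ['cat']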

What is Deep Learning?


Deep Learning is the subset of machine learning or can be said as a special kind of
machine learning. It works technically in the same way as machine learning does, but
with different capabilities and approaches. It is inspired by the functionality of human
brain cells, which are called neurons, and leads to the concept of artificial neural
networks. It is also called a deep neural network or deep neural learning.

In deep learning, models use different layers to learn and discover insights from the
data.

Some popular applications of deep learning are self-driving cars, language


translation, natural language processing, etc.

Some popular deep learning models are:

o Convolutional Neural Network


o Recurrent Neural Network
o Autoencoders
o Classic Neural Networks, etc.

How Deep Learning Works?


We can understand the working of deep learning with the same example of
identifying cat vs. dog. The deep learning model takes the images as input and feeds
them directly to the algorithms without requiring any manual feature extraction step.
The images pass through the different layers of the artificial neural network, which
predicts the final output.

Consider the below image:
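As a contrast to the previous sketch, the snippet below (using Keras, which is only an assumption since this tutorial names no library) feeds raw pixel data straight into a small neural network; no feature-extraction step is written by hand, and the random "images" are placeholders for a real cat/dog dataset.

import numpy as np
import tensorflow as tf

# Stand-in data: 100 random 64x64 RGB "images" with cat/dog labels (0 or 1).
images = np.random.rand(100, 64, 64, 3).astype("float32")
labels = np.random.randint(0, 2, size=100)

# The layers themselves learn features from the raw pixels.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(images, labels, epochs=2, verbose=0)

print(model.predict(images[:1]))   # predicted probability for the first image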

Key comparisons between Machine


Learning and Deep Learning
Let's understand the key differences between these two terms based on different
parameters:
Parameter: Data Dependency
Machine Learning: Although machine learning depends on a huge amount of data, it
can also work with a smaller amount of data.
Deep Learning: Deep learning algorithms depend highly on a large amount of data,
so we need to feed a large amount of data for good performance.

Parameter: Execution time
Machine Learning: A machine learning algorithm takes less time to train the model
than deep learning, but it takes a long time to test the model.
Deep Learning: Deep learning takes a long execution time to train the model, but
less time to test the model.

Parameter: Hardware Dependencies
Machine Learning: Since machine learning models do not need a huge amount of
data, they can work on low-end machines.
Deep Learning: Deep learning models need a huge amount of data to work
efficiently, so they need GPUs and hence high-end machines.

Parameter: Feature Engineering
Machine Learning: Machine learning models need a feature extraction step
performed by an expert before proceeding further.
Deep Learning: Deep learning is the enhanced version of machine learning, so it
does not need a separate feature extractor for each problem; instead, it tries to learn
high-level features from the data on its own.

Parameter: Problem-solving approach
Machine Learning: To solve a given problem, the traditional ML approach breaks the
problem into sub-parts, solves each part, and then produces the final result.
Deep Learning: The problem-solving approach of a deep learning model is different
from the traditional ML model: it takes the input for a given problem and directly
produces the end result. Hence it follows an end-to-end approach.

Parameter: Interpretation of result
Machine Learning: The interpretation of the result for a given problem is easy; when
we work with machine learning, we can interpret the result and explain why it
occurred and what the process was.
Deep Learning: The interpretation of the result for a given problem is very difficult; a
deep learning model may give a better result than a machine learning model, but we
cannot easily find out why this particular outcome occurred or the reasoning
behind it.

Parameter: Type of data
Machine Learning: Machine learning models mostly require data in a structured
form.
Deep Learning: Deep learning models can work with both structured and
unstructured data, as they rely on the layers of the artificial neural network.

Parameter: Suitable for
Machine Learning: Machine learning models are suitable for solving simple or
moderately complex problems.
Deep Learning: Deep learning models are suitable for solving complex problems.

Which one to select among ML and Deep Learning?


As we have seen a brief introduction to ML and DL with some comparisons, the
question now is which one should be chosen to solve a particular problem. It can be
understood with the following rule of thumb:

If you have lots of data and high hardware capabilities, go with deep learning.
But if you don't have either of them, choose an ML model to solve your problem.
Conclusion: In conclusion, we can say that deep learning is machine learning with
more capabilities and a different working approach. Selecting either of them to
solve a particular problem depends on the amount of data and the complexity of the
problem.

Introduction to Dimensionality
Reduction Technique
What is Dimensionality Reduction?
The number of input features, variables, or columns present in a given dataset is
known as dimensionality, and the process to reduce these features is called
dimensionality reduction.

A dataset may contain a huge number of input features in various cases, which makes
the predictive modeling task more complicated. Because it is very difficult to visualize
or make predictions for a training dataset with a high number of features,
dimensionality reduction techniques are required in such cases.

Dimensionality reduction technique can be defined as, "It is a way of converting


the higher dimensions dataset into lesser dimensions dataset ensuring that it
provides similar information." These techniques are widely used in machine
learning for obtaining a better fit predictive model while solving the classification
and regression problems.

It is commonly used in the fields that deal with high-dimensional data, such
as speech recognition, signal processing, bioinformatics, etc. It can also be used
for data visualization, noise reduction, cluster analysis, etc.

The Curse of Dimensionality
Handling high-dimensional data is very difficult in practice, a problem commonly
known as the curse of dimensionality. As the dimensionality of the input dataset
increases, any machine learning algorithm and model becomes more complex. As the
number of features increases, the number of samples required also increases
proportionally, and the chance of overfitting also increases. If a machine learning
model is trained on high-dimensional data, it can become overfitted and result in
poor performance.

Hence, it is often required to reduce the number of features, which can be done with
dimensionality reduction.

Benefits of applying Dimensionality


Reduction
Some benefits of applying dimensionality reduction technique to the given dataset
are given below:

o By reducing the dimensions of the features, the space required to store the
dataset also gets reduced.
o Less Computation training time is required for reduced dimensions of
features.
o Reduced dimensions of features of the dataset help in visualizing the data
quickly.
o It removes the redundant features (if present) by taking care of
multicollinearity.

Disadvantages of dimensionality Reduction


There are also some disadvantages of applying the dimensionality reduction, which
are given below:

o Some data may be lost due to dimensionality reduction.


o In the PCA dimensionality reduction technique, sometimes the principal
components required to consider are unknown.

Approaches of Dimension Reduction


There are two ways to apply the dimension reduction technique, which are given
below:

Feature Selection
Feature selection is the process of selecting the subset of the relevant features and
leaving out the irrelevant features present in a dataset to build a model of high
accuracy. In other words, it is a way of selecting the optimal features from the input
dataset.

Three methods are used for the feature selection:

1. Filters Methods

In this method, the dataset is filtered, and a subset that contains only the relevant
features is taken. Some common techniques of filters method are:

o Correlation
o Chi-Square Test
o ANOVA
o Information Gain, etc.
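A minimal sketch of a filter-style selection using the chi-square test with scikit-learn's SelectKBest; the iris dataset and the choice of k = 2 are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Score every feature against the target with the chi-square test and keep
# only the k best-scoring ones; the ML model itself is never consulted.
selector = SelectKBest(score_func=chi2, k=2)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)   # (150, 4) -> (150, 2)
print(selector.scores_)                 # chi-square score of each feature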

2. Wrappers Methods
The wrapper method has the same goal as the filter method, but it uses a machine
learning model for its evaluation. In this method, some features are fed to the ML
model, and its performance is evaluated. The performance decides whether to add
or remove those features to increase the accuracy of the model. This method is more
accurate than the filter method but more computationally expensive. Some common
techniques of wrapper methods are:
of wrapper methods are:

o Forward Selection
o Backward Selection
o Bi-directional Elimination

3. Embedded Methods: Embedded methods check the different training iterations


of the machine learning model and evaluate the importance of each feature. Some
common techniques of Embedded methods are:

o LASSO
o Elastic Net
o Ridge Regression, etc.
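A short sketch of an embedded method using LASSO, whose L1 penalty drives the coefficients of unimportant features toward zero; the diabetes dataset and alpha value are illustrative assumptions.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# Fit LASSO: feature importance is evaluated as part of training itself.
lasso = Lasso(alpha=1.0).fit(X, y)

# Features whose coefficient was shrunk to exactly zero can be dropped.
kept = [i for i, coef in enumerate(lasso.coef_) if coef != 0.0]
print("coefficients:", lasso.coef_)
print("features kept:", kept)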

Feature Extraction:
Feature extraction is the process of transforming the space containing many
dimensions into space with fewer dimensions. This approach is useful when we want
to keep the whole information but use fewer resources while processing the
information.

Some common feature extraction techniques are:

a. Principal Component Analysis

b. Linear Discriminant Analysis

c. Kernel PCA

d. Quadratic Discriminant Analysis

Common techniques of Dimensionality


Reduction
a. Principal Component Analysis
b. Backward Elimination

c. Forward Selection

d. Score comparison

e. Missing Value Ratio

f. Low Variance Filter

g. High Correlation Filter

h. Random Forest

i. Factor Analysis

j. Auto-Encoder

Principal Component Analysis (PCA)


Principal Component Analysis is a statistical process that converts the observations of
correlated features into a set of linearly uncorrelated features with the help of
orthogonal transformation. These new transformed features are called the Principal
Components. It is one of the popular tools that is used for exploratory data analysis
and predictive modeling.

PCA works by considering the variance of each attribute, because a high variance
shows a good split between the classes, and hence it reduces the dimensionality.
Some real-world applications of PCA are image processing, movie
recommendation system, optimizing the power allocation in various
communication channels.

Backward Feature Elimination


The backward feature elimination technique is mainly used while developing Linear
Regression or Logistic Regression model. Below steps are performed in this
technique to reduce the dimensionality or in feature selection:

o In this technique, firstly, all the n variables of the given dataset are taken to
train the model.
o The performance of the model is checked.
o Now we will remove one feature each time and train the model on n-1
features for n times, and will compute the performance of the model.
o We will check the variable that has made the smallest or no change in the
performance of the model, and then we will drop that variable or features;
after that, we will be left with n-1 features.
o Repeat the complete process until no feature can be dropped.

In this technique, by selecting the optimum performance of the model and the
maximum tolerable error rate, we can define the optimal number of features required
for the machine learning algorithm.

Forward Feature Selection


Forward feature selection follows the inverse process of the backward elimination
process. It means, in this technique, we don't eliminate the feature; instead, we will
find the best features that can produce the highest increase in the performance of
the model. Below steps are performed in this technique:

o We start with a single feature only, and progressively we will add each feature
at a time.
o Here we will train the model on each feature separately.
o The feature with the best performance is selected.
o The process will be repeated until we get a significant increase in the
performance of the model.

Missing Value Ratio


If a dataset has too many missing values, then we drop those variables as they do not
carry much useful information. To perform this, we can set a threshold level, and if a
variable has missing values more than that threshold, we will drop that variable. The
higher the threshold value, the more efficient the reduction.

Low Variance Filter


Similar to the missing value ratio technique, data columns with very little variation in
the data carry less information. Therefore, we need to calculate the variance of each
variable, and all data columns with a variance lower than a given threshold are
dropped, because low-variance features will not affect the target variable.
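A minimal sketch of a low variance filter with scikit-learn's VarianceThreshold; the toy data and the threshold value are illustrative assumptions.

from sklearn.feature_selection import VarianceThreshold

# Toy dataset: the second column barely changes, so it carries little information.
X = [[1.0, 0.0, 3.1],
     [2.0, 0.0, 2.9],
     [3.0, 0.1, 3.3],
     [4.0, 0.0, 2.7]]

# Drop every column whose variance falls below the chosen threshold.
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)

print(X_reduced.shape)   # (4, 2): the near-constant column was removed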

High Correlation Filter


High correlation refers to the case when two variables carry approximately the same
information. Due to this factor, the performance of the model can be degraded. The
correlation between independent numerical variables is measured using the
correlation coefficient. If this value is higher than a threshold value, we can remove
one of the variables from the dataset. We keep the variables or features that show a
higher correlation with the target variable.

Random Forest
Random Forest is a popular and very useful feature selection algorithm in machine
learning. This algorithm contains an in-built feature importance package, so we do
not need to program it separately. In this technique, we need to generate a large set
of trees against the target variable, and with the help of usage statistics of each
attribute, we need to find the subset of features.

The random forest algorithm takes only numerical variables, so we need to convert
the input data into numeric data using one-hot encoding.

Factor Analysis
Factor analysis is a technique in which each variable is kept within a group according
to the correlation with other variables, it means variables within a group can have a
high correlation between themselves, but they have a low correlation with variables
of other groups.

We can understand it with an example: suppose we have two variables, Income and
Spend. These two variables have a high correlation, which means people with high
income spend more, and vice versa. Such variables are put into a group, and that
group is known as a factor. The number of these factors will be reduced as
compared to the original dimension of the dataset.

Auto-encoders
One of the popular methods of dimensionality reduction is auto-encoder, which is a
type of ANN or artificial neural network, and its main aim is to copy the inputs to
their outputs. In this, the input is compressed into latent-space representation, and
output is occurred using this representation. It has mainly two parts:

o Encoder: The function of the encoder is to compress the input to form the
latent-space representation.
o Decoder: The function of the decoder is to recreate the output from the
latent-space representation.
Machine Learning Algorithms
Machine Learning algorithms are the programs that can learn the hidden patterns
from the data, predict the output, and improve the performance from experiences on
their own. Different algorithms can be used in machine learning for different tasks,
such as simple linear regression that can be used for prediction problems like stock
market prediction, and the KNN algorithm can be used for classification
problems.

In this topic, we will see the overview of some popular and most commonly
used machine learning algorithms along with their use cases and categories.

Types of Machine Learning Algorithms


Machine Learning Algorithm can be broadly classified into three types:

1. Supervised Learning Algorithms


2. Unsupervised Learning Algorithms
3. Reinforcement Learning algorithm

The below diagram illustrates the different ML algorithm, along with the categories:

1) Supervised Learning Algorithm
Supervised learning is a type of Machine learning in which the machine needs
external supervision to learn. The supervised learning models are trained using the
labeled dataset. Once the training and processing are done, the model is tested by
providing a sample test data to check whether it predicts the correct output.

The goal of supervised learning is to map input data with the output data.
Supervised learning is based on supervision, and it is the same as when a student
learns things in the teacher's supervision. The example of supervised learning
is spam filtering.

Supervised learning can be divided further into two categories of problem:

o Classification
o Regression

Examples of some popular supervised learning algorithms are Simple Linear
Regression, Decision Tree, Logistic Regression, KNN algorithm, etc.

2) Unsupervised Learning Algorithm


It is a type of machine learning in which the machine does not need any external
supervision to learn from the data, hence called unsupervised learning. The
unsupervised models can be trained using the unlabelled dataset that is not
classified, nor categorized, and the algorithm needs to act on that data without any
supervision. In unsupervised learning, the model doesn't have a predefined output,
and it tries to find useful insights from the huge amount of data. These are used to
solve the Association and Clustering problems. Hence further, it can be classified
into two types:

o Clustering
o Association

Examples of some unsupervised learning algorithms are K-means Clustering,
Apriori Algorithm, Eclat, etc.

3) Reinforcement Learning
In reinforcement learning, an agent interacts with its environment by producing
actions and learns with the help of feedback. The feedback is given to the agent in
the form of rewards: for each good action, it gets a positive reward, and for each bad
action, it gets a negative reward. There is no supervision provided to the agent. The
Q-Learning algorithm is commonly used in reinforcement learning.

List of Popular Machine Learning Algorithm


1. Linear Regression Algorithm
2. Logistic Regression Algorithm
3. Decision Tree
4. SVM
5. Naïve Bayes
6. KNN
7. K-Means Clustering
8. Random Forest
9. Apriori
10. PCA

1. Linear Regression
Linear regression is one of the most popular and simple machine learning algorithms
that is used for predictive analysis. Here, predictive analysis defines prediction of
something, and linear regression makes predictions for continuous numbers such
as salary, age, etc.

It shows the linear relationship between the dependent and independent variables,
and shows how the dependent variable(y) changes according to the independent
variable (x).

It tries to fit the best line between the dependent and independent variables, and this
best-fit line is known as the regression line.

The equation for the regression line is:

y = a0 + a1*x

Here, y = dependent variable

x = independent variable

a0 = intercept of the line

a1 = linear regression coefficient (slope of the line)

Linear regression is further divided into two types:

o Simple Linear Regression: In simple linear regression, a single independent
variable is used to predict the value of the dependent variable.
o Multiple Linear Regression: In multiple linear regression, more than one
independent variable is used to predict the value of the dependent variable.

The below diagram shows the linear regression for the prediction of weight according
to height.
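A minimal sketch of simple linear regression with scikit-learn; the height/weight numbers are made-up illustrative values.

from sklearn.linear_model import LinearRegression

# Illustrative data: height in cm (independent) and weight in kg (dependent).
heights = [[150], [160], [170], [180], [190]]
weights = [50, 58, 66, 74, 82]

model = LinearRegression().fit(heights, weights)

print(model.intercept_)          # a0, the intercept of the regression line
print(model.coef_)               # a1, the slope
print(model.predict([[175]]))    # predicted weight for a 175 cm person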
2. Logistic Regression
Logistic regression is the supervised learning algorithm, which is used to predict the
categorical variables or discrete values. It can be used for the classification
problems in machine learning, and the output of the logistic regression algorithm can
be either Yes or NO, 0 or 1, Red or Blue, etc.

Logistic regression is similar to the linear regression except how they are used, such
as Linear regression is used to solve the regression problem and predict continuous
values, whereas Logistic regression is used to solve the Classification problem and
used to predict the discrete values.

Instead of fitting the best fit line, it forms an S-shaped curve that lies between 0 and
1. The S-shaped curve is also known as a logistic function that uses the concept of
the threshold. Any value above the threshold will tend to 1, and below the threshold
will tend to 0.

3. Decision Tree Algorithm


A decision tree is a supervised learning algorithm that is mainly used to solve the
classification problems but can also be used for solving the regression problems. It
can work with both categorical variables and continuous variables. It shows a tree-
like structure that includes nodes and branches, and starts with the root node that
expand on further branches till the leaf node. The internal node is used to represent
the features of the dataset, branches show the decision rules, and leaf nodes
represent the outcome of the problem.
Some real-world applications of decision tree algorithms are identification of
cancerous and non-cancerous cells, suggesting whether a customer should buy a car,
etc.

4. Support Vector Machine Algorithm


A support vector machine or SVM is a supervised learning algorithm that can also be
used for classification and regression problems. However, it is primarily used for
classification problems. The goal of SVM is to create a hyperplane or decision
boundary that can segregate datasets into different classes.

The data points that help to define the hyperplane are known as support vectors,
and hence it is named as support vector machine algorithm.

Some real-life applications of SVM are face detection, image classification, Drug
discovery, etc. Consider the below diagram:

As we can see in the above diagram, the hyperplane has classified datasets into two
different classes.

5. Naïve Bayes Algorithm:


The Naïve Bayes classifier is a supervised learning algorithm, which is used to make
predictions based on the probability of an object. The algorithm is named Naïve
Bayes as it is based on the Bayes theorem and follows the naïve assumption that the
variables are independent of each other.
The Bayes theorem is based on conditional probability; it gives the likelihood
that event A will happen, given that event B has already happened. The equation for
the Bayes theorem is given as:

P(A|B) = P(B|A) * P(A) / P(B)

The Naïve Bayes classifier is one of the best classifiers that provide a good result for a
given problem. It is easy to build a naïve Bayesian model, and it is well suited for
huge datasets. It is mostly used for text classification.

6. K-Nearest Neighbour (KNN)


K-Nearest Neighbour is a supervised learning algorithm that can be used for both
classification and regression problems. This algorithm works by assuming
similarities between the new data point and the available data points. Based on these
similarities, the new data point is put in the most similar category. It is also
known as the lazy learner algorithm, as it stores all the available data and
classifies each new case with the help of its K neighbours. The new case is assigned to
the class with the most similarity, and a distance function measures the
distance between the data points. The distance function can be Euclidean,
Minkowski, Manhattan, or Hamming distance, based on the requirement.

7. K-Means Clustering
K-means clustering is one of the simplest unsupervised learning algorithms, which is
used to solve clustering problems. The datasets are grouped into K different
clusters based on similarities and dissimilarities; it means that data points with most
of the commonalities remain in one cluster and have very few or no commonalities
with data points of other clusters. In K-means, K refers to the number of clusters,
and means refers to averaging the data in order to find the centroid.

It is a centroid-based algorithm, and each cluster is associated with a centroid. This


algorithm aims to reduce the distance between the data points and their centroids
within a cluster.

This algorithm starts with a group of randomly selected centroids that form the
clusters initially, and then performs an iterative process to optimize the positions of
these centroids.
It can be used for spam detection and filtering, identification of fake news, etc.
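A minimal K-means sketch with scikit-learn; the two-dimensional points and the choice of K = 2 are illustrative assumptions.

from sklearn.cluster import KMeans

# Toy 2-D points forming two loose groups.
X = [[1, 2], [1, 4], [1, 0],
     [10, 2], [10, 4], [10, 0]]

# K refers to the number of clusters; the centroids are optimized iteratively.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)            # cluster assigned to each point
print(kmeans.cluster_centers_)   # final centroid of each cluster
print(kmeans.predict([[0, 0], [12, 3]]))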

8. Random Forest Algorithm


Random forest is a supervised learning algorithm that can be used for both
classification and regression problems in machine learning. It is an ensemble learning
technique that provides predictions by combining multiple classifiers to improve the
performance of the model.

It contains multiple decision trees built on subsets of the given dataset, and averages
their results to improve the predictive accuracy of the model. A random forest should
contain 64-128 trees. A greater number of trees leads to higher accuracy of the
algorithm.

To classify a new dataset or object, each tree gives the classification result and based
on the majority votes, the algorithm predicts the final output.

Random forest is a fast algorithm and can efficiently deal with missing and
incorrect data.

9. Apriori Algorithm
The Apriori algorithm is an unsupervised learning algorithm that is used to solve
association problems. It uses frequent itemsets to generate association rules, and it is
designed to work on databases that contain transactions. With the help of these
association rules, it determines how strongly or how weakly two objects are
connected to each other. This algorithm uses a breadth-first search and a Hash Tree
to calculate the itemsets efficiently.

The algorithm works iteratively to find the frequent itemsets from a large dataset.

The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994. It is
mainly used for market basket analysis and helps to understand the products that
can be bought together. It can also be used in the healthcare field to find drug
reactions in patients.

10. Principle Component Analysis


Principle Component Analysis (PCA) is an unsupervised learning technique, which is
used for dimensionality reduction. It helps in reducing the dimensionality of the
dataset that contains many features correlated with each other. It is a statistical
process that converts the observations of correlated features into a set of linearly
uncorrelated features with the help of orthogonal transformation. It is one of the
popular tools that is used for exploratory data analysis and predictive modeling.

PCA works by considering the variance of each attribute because the high variance
shows the good split between the classes, and hence it reduces the dimensionality.

Some real-world applications of PCA are image processing, movie recommendation


system, optimizing the power allocation in various communication channels.

Overfitting and Underfitting in


Machine Learning
Overfitting and Underfitting are the two main problems that occur in machine
learning and degrade the performance of the machine learning models.

The main goal of each machine learning model is to generalize well.


Here, generalization defines the ability of an ML model to provide a suitable output
by adapting to a given set of unknown inputs. It means that after being trained on the
dataset, the model can produce reliable and accurate output. Hence, underfitting and
overfitting are the two terms that need to be checked to judge the performance of
the model and whether the model is generalizing well or not.

Before understanding the overfitting and underfitting, let's understand some basic
term that will help to understand this topic well:

o Signal: It refers to the true underlying pattern of the data that helps the
machine learning model to learn from the data.
o Noise: Noise is unnecessary and irrelevant data that reduces the performance
of the model.
o Bias: Bias is a prediction error that is introduced in the model due to
oversimplifying the machine learning algorithms. Or it is the difference
between the predicted values and the actual values.
o Variance: If the machine learning model performs well with the training
dataset, but does not perform well with the test dataset, then variance occurs.

Overfitting
Overfitting occurs when our machine learning model tries to cover all the data points,
or more than the required data points, present in the given dataset. Because of this,
the model starts capturing noise and inaccurate values present in the dataset, and all
these factors reduce the efficiency and accuracy of the model. An overfitted model
has low bias and high variance.


The chances of overfitting increase the more training we provide to our model: the
more we train our model on the same data, the higher the chance of obtaining an
overfitted model.

Overfitting is the main problem that occurs in supervised learning.

Example: The concept of the overfitting can be understood by the below graph of
the linear regression output:

As we can see from the above graph, the model tries to cover all the data points
present in the scatter plot. It may look efficient, but in reality it is not, because the
goal of the regression model is to find the best-fit line; here we have not found a true
best fit, so the model will generate prediction errors on new data.
How to avoid the Overfitting in Model
Both overfitting and underfitting cause the degraded performance of the machine
learning model. But the main cause is overfitting, so there are some ways by which
we can reduce the occurrence of overfitting in our model.

o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling

Underfitting
Underfitting occurs when our machine learning model is not able to capture the
underlying trend of the data. To avoid the overfitting in the model, the fed of training
data can be stopped at an early stage, due to which the model may not learn enough
from the training data. As a result, it may fail to find the best fit of the dominant
trend in the data.

In the case of underfitting, the model is not able to learn enough from the training
data, and hence it reduces the accuracy and produces unreliable predictions.

An underfitted model has high bias and low variance.

Example: We can understand the underfitting using below output of the linear
regression model:
As we can see from the above diagram, the model is unable to capture the data
points present in the plot.

How to avoid underfitting:

o By increasing the training time of the model.


o By increasing the number of features.

Goodness of Fit
The "Goodness of fit" term is taken from the statistics, and the goal of the machine
learning models to achieve the goodness of fit. In statistics modeling, it defines how
closely the result or predicted values match the true values of the dataset.

The model with a good fit is between the underfitted and overfitted model, and
ideally, it makes predictions with 0 errors, but in practice, it is difficult to achieve it.

As when we train our model for a time, the errors in the training data go down, and
the same happens with test data. But if we train the model for a long duration, then
the performance of the model may decrease due to the overfitting, as the model also
learn the noise present in the dataset. The errors in the test dataset start
increasing, so the point, just before the raising of errors, is the good point, and we can
stop here for achieving a good model.

There are two other methods by which we can get a good point for our model, which
are the resampling method to estimate model accuracy and validation dataset.

Principal Component Analysis


Principal Component Analysis is an unsupervised learning algorithm that is used for
the dimensionality reduction in machine learning. It is a statistical process that
converts the observations of correlated features into a set of linearly uncorrelated
features with the help of orthogonal transformation. These new transformed features
are called the Principal Components. It is one of the popular tools that is used for
exploratory data analysis and predictive modeling. It is a technique to draw strong
patterns from the given dataset by reducing the variances.

PCA generally tries to find the lower-dimensional surface to project the high-
dimensional data.

PCA works by considering the variance of each attribute, because a high variance
shows a good split between the classes, and hence it reduces the dimensionality.
Some real-world applications of PCA are image processing, movie
recommendation system, optimizing the power allocation in various
communication channels. It is a feature extraction technique, so it contains the
important variables and drops the least important variable.

The PCA algorithm is based on some mathematical concepts such as:


o Variance and Covariance
o Eigenvalues and Eigenvectors

Some common terms used in PCA algorithm:

o Dimensionality: It is the number of features or variables present in the given


dataset. More easily, it is the number of columns present in the dataset.
o Correlation: It signifies that how strongly two variables are related to each
other. Such as if one changes, the other variable also gets changed. The
correlation value ranges from -1 to +1. Here, -1 occurs if variables are
inversely proportional to each other, and +1 indicates that variables are
directly proportional to each other.
o Orthogonal: It defines that variables are not correlated to each other, and
hence the correlation between the pair of variables is zero.
o Eigenvectors: Given a square matrix M and a non-zero vector v, v is an
eigenvector of M if Mv is a scalar multiple of v.
o Covariance Matrix: A matrix containing the covariance between the pair of
variables is called the Covariance Matrix.

Principal Components in PCA


As described above, the transformed new features or the output of PCA are the
Principal Components. The number of these PCs are either equal to or less than the
original features present in the dataset. Some properties of these principal
components are given below:

o The principal component must be the linear combination of the original


features.
o These components are orthogonal, i.e., the correlation between a pair of
variables is zero.
o The importance of each component decreases when going from 1 to n; the
1st PC has the most importance, and the nth PC has the least importance.

Steps for PCA algorithm

1. Getting the dataset


Firstly, we need to take the input dataset and divide it into two subparts X and
Y, where X is the training set, and Y is the validation set.
2. Representing data into a structure
Now we will represent our dataset into a structure. Such as we will represent
the two-dimensional matrix of independent variable X. Here each row
corresponds to the data items, and the column corresponds to the Features.
The number of columns is the dimensions of the dataset.
3. Standardizing the data
In this step, we will standardize our dataset. Such as in a particular column, the
features with high variance are more important compared to the features with
lower variance.
If the importance of features is independent of the variance of the feature,
then we will divide each data item in a column with the standard deviation of
the column. Here we will name the matrix as Z.
4. Calculating the Covariance of Z
To calculate the covariance of Z, we will take the matrix Z, and will transpose
it. After transpose, we will multiply it by Z. The output matrix will be the
Covariance matrix of Z.
5. Calculating the Eigenvalues and Eigenvectors
Now we need to calculate the eigenvalues and eigenvectors of the resultant
covariance matrix Z. The eigenvectors of the covariance matrix are the
directions of the axes with the highest information, and the corresponding
eigenvalues measure the amount of variance along those directions.
6. Sorting the Eigenvectors
In this step, we will take all the eigenvalues and sort them in decreasing
order, which means from largest to smallest, and simultaneously sort the
eigenvectors accordingly in a matrix P. The resultant matrix will be
named P*.
7. Calculating the new features, or Principal Components
Here we will calculate the new features. To do this, we will multiply the Z
matrix by P*. In the resultant matrix Z*, each observation is a linear
combination of the original features, and each column of the Z* matrix is
independent of the others.
8. Removing less important features from the new dataset
Once the new feature set is obtained, we decide what to keep and what to
remove. It means we will only keep the relevant or important features in the
new dataset, and the unimportant features will be removed.
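In practice, libraries bundle all of these steps together; below is a minimal sketch with scikit-learn's PCA, where the iris dataset and the choice of 2 components are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardize the data, then project it onto the top 2 principal components.
Z = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
Z_star = pca.fit_transform(Z)

print(Z.shape, "->", Z_star.shape)     # (150, 4) -> (150, 2)
print(pca.explained_variance_ratio_)   # variance captured by each component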

Applications of Principal Component


Analysis
o PCA is mainly used as the dimensionality reduction technique in various AI
applications such as computer vision, image compression, etc.
o It can also be used for finding hidden patterns if data has high dimensions.
Some fields where PCA is used are Finance, data mining, Psychology, etc.
What is P-Value
In statistical hypothesis testing, the P-value, sometimes called the probability value, is
the probability of observing the test results, or more extreme results, under the
assumption that the null hypothesis (H0) is true. In data science, there are lots of
concepts that are borrowed from different disciplines, and the p-value is one of them.
The concept of the p-value comes from statistics and is widely used in machine
learning and data science.

o The p-value is also used as an alternative way to determine the point of
rejection, since it provides the smallest significance level at which the null
hypothesis would be rejected.
o It is expressed as a level of significance that lies between 0 and 1; the smaller
the p-value, the stronger the evidence to reject the null hypothesis. If the
p-value is very small, it means the observed output is feasible but is unlikely
under the null hypothesis conditions (H0).
o The p-value of 0.05 is known as the level of significance (α). Usually, it is
interpreted using two rules, which are given below (see the sketch after this list):
o If p-value > 0.05: The large p-value shows that the null hypothesis
cannot be rejected.
o If p-value < 0.05: The small p-value shows that the null hypothesis
should be rejected, and the result is declared statistically significant.
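A short sketch of reading a p-value from a hypothesis test, here a two-sample t-test with SciPy; the sample data and the choice of alpha = 0.05 are illustrative assumptions.

from scipy import stats

# Two illustrative samples, e.g. measurements from two groups.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
group_b = [5.8, 6.1, 5.9, 6.0, 5.7, 6.2]

t_stat, p_value = stats.ttest_ind(group_a, group_b)

alpha = 0.05
print("p-value:", p_value)
if p_value < alpha:
    print("Reject the null hypothesis: the result is statistically significant.")
else:
    print("Fail to reject the null hypothesis.")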

In Statistics, our main goal is to determine the statistical significance of our result,
and this statistical significance is made on below three concepts:

o Hypothesis Testing
o Normal Distribution
o Statistical Significance

Let's understand each of them.

Hypothesis Testing
Hypothesis testing is defined in terms of two hypotheses: the null hypothesis and the
alternative hypothesis. It is used to check the validity of the null hypothesis, or a
claim made using the sample data. Here, the null hypothesis (H0) is defined as the
hypothesis of no statistically significant relationship between two variables, while the
alternative hypothesis is defined as the hypothesis of a statistically significant
relationship between the two variables. No significant relationship between the two
variables means that one variable will not affect the other variable. Thus, the null
hypothesis states that what you are trying to prove does not actually happen. If the
independent variable does affect the dependent variable, that corresponds to the
alternative hypothesis condition.


In a simple way, we can say that in hypothesis testing, first, we make a claim that is
assumed as a null hypothesis using the sample data. If this claim is found invalid, then
the alternative hypothesis is selected. This assumption or claim is validated using the
p-value to see if it is statistically significant or not using the evidence. If the evidence
supports the alternative hypothesis, then the null hypothesis is rejected.

Steps for Hypothesis testing

Below are the steps to perform an experiment for hypothesis testing:

1. Claim or state a Null hypothesis for the experiment.


2. State the alternative hypothesis, which is opposite to the null hypothesis.
3. Set the value of alpha to be used in the experiment.
4. Determine the z-score using the normal distribution.
5. Compare the P-value to validate the statistical significance.

Normal Distribution
The normal distribution, which is also known as Gaussian distribution, is the
Probability distribution function. It is symmetric about the mean, and use to see the
distribution of data using a graph plot. It shows that data near the mean is more
frequent to occur as compared to data which is far from the mean, and it looks like
a bell-shaped curve. The two main terms of the normal distribution are mean(μ) and
standard deviation(σ). For a normal distribution, the mean is zero, and the standard
deviation is 1.
In hypothesis testing, we need to calculate z-score. Z-score is the number of
standard deviations from the mean of data-point.

Here, the z-score inform us that where the data lies compared to the average
population.

Statistical significance:
The goal of calculating the p-value is to determine the statistical significance of the
hypothesis test. To do this, first we need to set a threshold, called alpha. We should
always set the value of alpha before the experiment, and it is usually set to either
0.05 or 0.01 (depending on the type of problem).

The result is concluded as a significant result if the observed p-value is lower than
alpha.

Errors in P-value
Two types of errors are defined for the p-value; these errors are given below:

1. Type I error
2. Type II error

Type I Error:
It is defined as the incorrect or false rejection of the Null hypothesis. For this error,
the maximum probability is alpha, and it is set in advance. The error is not affected
by the sample size of the dataset. The type I error increases as we increase the
number of tests or endpoints.

Type II error
Type II error is defined as the wrong acceptance of the Null hypothesis. The
probability of type II error is beta, and the beta depends upon the sample size and
value of alpha. The beta cannot be determined as the function of the true population
effect. The value of beta is inversely proportional to the sample size, and it means
beta decreases as the sample size increases.

The value of beta also decreases when we increase the number of tests or endpoints.
We can understand the relationship between the true state of the null hypothesis and the
decision taken on the basis of the table below:

Truth          Decision: Accept H0    Decision: Reject H0
H0 is true     Correct decision       Type I error
H0 is false    Type II error          Correct decision

Importance of P-value
The importance of the p-value can be understood in two aspects:

o Statistics Aspect: In statistics, the concept of the p-value is important for hypothesis
testing and for statistical methods such as Regression.
o Data Science Aspect: In data science too, the p-value is an important concept. Here, a
smaller p-value indicates that there is an association between the predictor and the
response. While working on machine learning problems in data science, it is advised to
interpret the p-value carefully.

Regularization in Machine Learning


What is Regularization?
Regularization is one of the most important concepts of machine learning. It is a
technique to prevent the model from overfitting by adding extra information to it.

Sometimes a machine learning model performs well on the training data but does not perform
well on the test data. This means the model is unable to predict the output for unseen data
because it has also learned the noise in the training data; such a model is called
overfitted. This problem can be dealt with using a regularization technique.

This technique allows us to keep all the variables or features in the model while reducing
their magnitude. Hence, it maintains accuracy as well as the generalization of the model.
It mainly regularizes, or shrinks, the coefficients of the features toward zero. In simple
words, "in the regularization technique, we reduce the magnitude of the coefficients while
keeping the same number of features."


How does Regularization Work?


Regularization works by adding a penalty or complexity term to the complex model.
Let's consider the simple linear regression equation:

y = β0 + β1x1 + β2x2 + β3x3 + ⋯ + βnxn + b

In the above equation, y represents the value to be predicted,

x1, x2, …, xn are the features for y,

β0, β1, …, βn are the weights or magnitudes attached to the features, respectively. Here,
β0 represents the bias of the model, and b represents the intercept.

Linear regression models try to optimize β0 and b to minimize the cost function. The loss
function for linear regression is called RSS, or the Residual Sum of Squares, and the cost
function for the linear model is given below:

RSS = Σi (yi − (β0 + β1xi1 + β2xi2 + ⋯ + βnxin + b))²

We then optimize the parameters with respect to this loss so that the model can predict an
accurate value of y.

Techniques of Regularization
There are mainly two types of regularization techniques, which are given below:
o Ridge Regression
o Lasso Regression

Ridge Regression

o Ridge regression is one of the types of linear regression in which a small


amount of bias is introduced so that we can get better long-term predictions.
o Ridge regression is a regularization technique, which is used to reduce the
complexity of the model. It is also called as L2 regularization.
o In this technique, the cost function is altered by adding a penalty term to it. The
amount of bias added to the model is called the Ridge Regression penalty. We can
calculate it by multiplying lambda (λ) by the squared weight of each individual feature.
o The equation for the cost function in ridge regression will be:

Cost = Σi (yi − ŷi)² + λ Σj βj²

o In the above equation, the penalty term regularizes the coefficients of the
model, and hence ridge regression reduces the amplitudes of the coefficients
that decreases the complexity of the model.
o As we can see from the above equation, if the values of λ tend to zero, the
equation becomes the cost function of the linear regression
model. Hence, for the minimum value of λ, the model will resemble the linear
regression model.
o A general linear or polynomial regression will fail if there is high collinearity
between the independent variables, so to solve such problems, Ridge
regression can be used.
o It helps to solve the problems if we have more parameters than samples.

Lasso Regression:

o Lasso regression is another regularization technique to reduce the complexity of the
model. It stands for Least Absolute Shrinkage and Selection Operator.
o It is similar to the Ridge Regression except that the penalty term contains only
the absolute weights instead of a square of weights.
o Since it takes absolute values, it can shrink a coefficient exactly to 0, whereas Ridge
Regression can only shrink it close to 0.
o It is also called L1 regularization. The equation for the cost function of Lasso
regression will be:

Cost = Σi (yi − ŷi)² + λ Σj |βj|

o Some of the features in this technique are completely neglected for model
evaluation.
o Hence, the Lasso regression can help us to reduce the overfitting in the model
as well as the feature selection.
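
As a quick illustration of both techniques, the sketch below fits Ridge and Lasso regression with scikit-learn on a synthetic dataset; the regularization strength alpha (the λ above) and the data itself are arbitrary choices for demonstration.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# Synthetic regression data with several uninformative features (illustrative only)
X, y = make_regression(n_samples=100, n_features=10, n_informative=4,
                       noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0)   # L2 penalty: shrinks coefficients toward zero
lasso = Lasso(alpha=1.0)   # L1 penalty: can shrink coefficients exactly to zero

ridge.fit(X, y)
lasso.fit(X, y)

print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("Coefficients set to zero by Lasso:", int(np.sum(lasso.coef_ == 0)))

Notice that Lasso typically sets some coefficients exactly to zero, which is why it also performs feature selection, while Ridge only shrinks them.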

Key Difference between Ridge Regression and


Lasso Regression

o Ridge regression is mostly used to reduce the overfitting in the model, and it
includes all the features present in the model. It reduces the complexity of the
model by shrinking the coefficients.
o Lasso regression helps to reduce the overfitting in the model as well as
feature selection.

Examples of Machine Learning


Machine Learning technology has widely changed the lifestyle of human beings, as we are now
highly dependent on it. It is a subset of Artificial Intelligence, and we all use it either
knowingly or unknowingly. For example, we use Google Assistant, which employs ML concepts;
we take help from online customer support, which is also an example of machine learning;
and many more.

Machine Learning uses statistical techniques to make a computer more intelligent, helping
it fetch business data and utilize it automatically as per the requirement. There are many
examples of Machine Learning in the real world, some of which are as follows:
1. Speech & Image Recognition
Computer Speech Recognition or Automatic Speech Recognition helps to convert
speech into text. Many applications convert the live speech into an audio file format
and later convert it into a text file.

Voice search, voice dialing, and appliance control are some real-world examples of speech
recognition. Alexa and Google Home are among the most widely used speech recognition
applications.


Similar to speech recognition, image recognition is another widely used example of Machine
Learning technology that helps identify objects in a digital image. There are some
real-world examples of image recognition, such as:

Tagging names on photos, as we have seen on Facebook. Image recognition is also used in
recognizing handwriting by segmenting a single letter into smaller images.

Further, the biggest example of image recognition is facial recognition. Most
new-generation mobile phones use facial recognition techniques to unlock the device, which
also helps to increase the security of the system.

2. Traffic alerts using Google Map


Google Maps is one of the most widely used applications when anyone travels to a
destination. The map helps us find the best or fastest route, current traffic, and much
more. But how does it provide this information to us? Google Maps uses different
technologies, including machine learning, which collects information from different users,
analyzes that information, keeps it updated, and makes predictions. With the help of these
predictions, it can tell us the traffic conditions before we start our journey. Machine
Learning also helps identify the best and fastest route while we are stuck in traffic.
Further, it can answer questions such as: does the route still have traffic? This
information and data are stored automatically in a database, which Machine Learning uses to
provide accurate information to other people in traffic. Google Maps also helps find places
such as hotels, malls, restaurants, cinema halls, bus stops, etc.

3. Chatbot (Online Customer Support)


Chatbots are among the most widely used software in every industry, such as banking,
medical, education, health, etc. You can see chatbots in any banking application providing
quick online support to customers. These chatbots also work on the concepts of Machine
Learning. The programmers feed in some basic questions and answers based on frequently
asked queries. So, whenever a customer asks a query, the chatbot recognizes the question's
keywords from a database and then provides an appropriate resolution to the customer. This
helps provide quick customer service.

4. Google Translation
Suppose you work on an international banking project whose documents are in French, German,
etc., but you only know English. In that case, this would be a very stressful situation,
because you cannot proceed without reviewing the documents. Google Translate helps to
translate any language into the desired language. So, in this way, you can convert French,
German, etc., into English, Hindi, or any other language. This makes the job of different
sectors very easy, as a user can work on any country's project hassle-free.

Google uses the Google Neural Machine Translation to detect any language and
translate it into any desired language.
5. Prediction
Prediction systems also use Machine Learning algorithms for making predictions.
There are various sectors where predictions are used. For example, in bank loan
systems, error probability can be determined using predictions with machine
learning. For this, the available data are classified into different groups with the set of
rules provided by analysts, and once the classification is done, the error probability is
predicted.

6. Extraction
One of the best examples of machine learning is the extraction of information. In this
process, structured data is extracted from unstructured data and then used in predictive
analytics tools. The data is usually found in a raw or unstructured form that is not
directly useful, and the extraction process is used to make it useful. Some real-world
examples of extraction are:

o Generating a model to predict vocal cord disorders.


o Helping to diagnose and treat problems faster.

7. Statistical Arbitrage
Statistical arbitrage is an automated trading process used in the finance industry to
manage a large volume of securities. The process uses a trading algorithm to analyze a set
of securities using economic variables and correlations. Some examples of statistical
arbitrage are as follows:

o Algorithmic trading that analyses a market microstructure


o Analyze large data sets
o Identify real-time arbitrage opportunities
o Machine learning optimizes the arbitrage strategy to enhance results.

8. Auto-Friend Tagging Suggestion


One of the popular examples of machine learning is the auto friend tagging suggestion
feature on Facebook. Whenever we upload a new picture on Facebook with friends, it suggests
tagging the friends and automatically provides their names. Facebook does this using
DeepFace, a facial recognition system created by Facebook, which identifies faces in images.

9. Self-driving cars
The future of the automobile industry is self-driving cars. These are driverless cars based
on concepts of deep learning and machine learning. Some commonly used machine learning
algorithms in self-driving cars are Scale-Invariant Feature Transform (SIFT), AdaBoost,
TextonBoost, and YOLO (You Only Look Once).

10. Ads Recommendation


Nowadays, most people spend many hours surfing the internet, and while browsing any webpage
or website, they see multiple ads on each page. These ads are different for each user, even
when two users are in the same location on the same network. These ad recommendations are
made with the help of machine learning algorithms and are based on each user's search
history. For example, if a user searches for a shirt on Amazon or any other e-commerce
website, he will start getting ad recommendations for shirts after some time.

11. Video Surveillance


Video Surveillance is an advanced application of AI and machine learning, which can detect
a crime before it happens. It is much more efficient than human observation, because it is
a difficult and tedious task for a human to keep monitoring multiple video feeds; machines
are therefore the better option. Video surveillance systems are very useful as they keep
looking for specific behaviors of people, such as standing motionless for a long time,
stumbling, or napping on benches. Whenever the surveillance system finds any unusual
activity, it alerts the respective team, which can stop or help avoid a mishap at that
place.

Some popular uses of video surveillance are:

o Facility protections
o Operation monitoring
o Parking lots
o Traffic monitoring
o Shopping patterns

12. Email & spam filtering


Emails are filtered automatically whenever we receive a new email, and this is also an
example of machine learning. Important mail arrives in our inbox marked with the important
symbol, while spam emails go to our spam box; the technology behind this is Machine
Learning. Below are some spam filters used by Gmail:
o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters

Some machine learning algorithms that are used in email spam filtering and malware
detection are Multi-Layer Perceptron, Decision tree, and Naïve Bayes classifier.

13. Real-Time Dynamic Pricing


Whenever we book an Uber during peak office hours in the morning or evening, we see a
difference in prices compared to normal hours. The prices are raised due to surge pricing
applied by companies whenever demand is high. But how are these surge prices determined and
applied? The technologies behind this are AI and machine learning. They address two main
business questions:

o The reaction of customers to surge prices
o Suggesting optimum prices so that the business does not lose customers

Machine Learning technology also helps in finding discounted prices, best prices,
promotional prices, etc., for each customer.

14. Gaming and Education


Machine learning technology is widely being used in gaming and education. There
are various gaming and learning apps that are using AI and Machine learning.
Among these apps, Duolingo is a free language learning app, which is designed in a
fun and interactive way. While using this app, people feel like playing a game on the
phone.

It collects data from the user's answers and creates a statistical model to determine how
long a person can remember a word, and it prompts a refresher just before the word is
likely to be forgotten.

15. Virtual Assistants


Virtual assistants, which are smart software embedded in smartphones or laptops, are very
popular in today's world. These assistants work as personal assistants and help in
searching for information asked over voice. A virtual assistant understands human language
or natural-language voice commands and performs the task for the user. Some examples of
virtual assistants are Siri, Alexa, Google Assistant, Cortana, etc. To start working with
these virtual assistants, first, they need to be
activated, and then we can ask anything, and they will answer it. For example,
"What's the date today?", "Tell me a joke", and many more. The technologies used
behind Virtual assistants are AI, machine learning, natural language processing, etc.
Machine learning algorithms collect and analyze the data based on the previous
involvement of the user and predict data as per the user preferences.

Introduction to Semi-Supervised
Learning
Semi-Supervised learning is a type of Machine Learning algorithm that
represents the intermediate ground between Supervised and Unsupervised
learning algorithms. It uses the combination of labeled and unlabeled datasets
during the training period.

Before understanding Semi-Supervised learning, you should know the main categories of
Machine Learning algorithms. Machine Learning consists of three main categories: Supervised
Learning, Unsupervised Learning, and Reinforcement Learning. The basic difference between
supervised and unsupervised learning is that supervised datasets have an output label
associated with each training tuple, whereas unsupervised datasets do not. Semi-supervised
learning is an important category that lies between Supervised and Unsupervised machine
learning: it operates on data that contains a few labels but mostly consists of unlabeled
data, since labels are costly to obtain and, in practice, only a few may be available.

The basic disadvantage of supervised learning is that it requires hand-labeling by ML
specialists or data scientists, which is costly. Unsupervised learning, on the other hand,
has a limited spectrum of applications. To overcome these drawbacks, the concept of
Semi-supervised learning was introduced. In this approach, the training data is a
combination of labeled and unlabeled data: the labeled portion is very small, while the
unlabeled portion is huge. Initially, similar data is grouped using an unsupervised
learning algorithm, and this grouping then helps to assign labels to the unlabeled data.
This is useful because labeled data is comparatively more expensive to acquire than
unlabeled data.

We can understand these algorithms with an example. Supervised learning is where a student
is under the supervision of an instructor at home and at college. If that student analyzes
the same concept on their own, without any help from the instructor, it comes under
unsupervised learning. Under semi-supervised learning, the student first learns the concept
under the guidance of an instructor at college and then revises it on their own.


Assumptions followed by Semi-Supervised


Learning
To work with the unlabeled dataset, there must be a relationship between the
objects. To understand this, semi-supervised learning uses any of the following
assumptions:
o Continuity Assumption: As per the continuity assumption, objects near each other tend to
share the same group or label. This assumption is also used in supervised learning, where
the classes are separated by decision boundaries. In semi-supervised learning, the decision
boundaries are additionally placed in low-density regions (the smoothness assumption).
o Cluster Assumption: In this assumption, the data is divided into discrete clusters, and
points in the same cluster are more likely to share the same output label.
o Manifold Assumption: This assumption states that the data lie approximately on a manifold
of much lower dimension than the input space, which makes it possible to use distances and
densities defined on that manifold. The high-dimensional data is generated by a process
with fewer degrees of freedom and may be hard to model directly; this assumption becomes
practical when the dimensionality is high.

Working of Semi-Supervised Learning


Semi-supervised learning uses pseudo labeling to train a model with less labeled training
data than supervised learning. The process can combine various neural network models and
training methods. The whole working of semi-supervised learning is explained in the points
below, followed by a short code sketch:

o Firstly, it trains a model on the small amount of labeled training data, similar to
supervised learning. The training continues until the model gives accurate results on this
data.
o In the next step, the model is used to assign pseudo labels to the unlabeled dataset;
these results may not yet be accurate.
o Now, the labels from the labeled training data and the pseudo labels are linked together.
o The input data from the labeled training set and the unlabeled set are also linked.
o In the end, the model is trained again on the new combined input, as in the first step.
This reduces errors and improves the accuracy of the model.
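
Here is a minimal sketch of this pseudo-labeling loop, assuming a scikit-learn-style classifier and hypothetical arrays X_labeled, y_labeled, and X_unlabeled; the confidence threshold of 0.9 is an arbitrary illustrative choice.

import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_training(X_labeled, y_labeled, X_unlabeled, threshold=0.9):
    # Step 1: train on the small labeled set
    model = LogisticRegression(max_iter=1000)
    model.fit(X_labeled, y_labeled)

    # Step 2: predict pseudo labels for the unlabeled data
    pseudo_labels = model.predict(X_unlabeled)
    confidence = model.predict_proba(X_unlabeled).max(axis=1)
    confident = confidence >= threshold          # keep only confident predictions

    # Steps 3-4: link the labeled data with the confidently pseudo-labeled data
    X_combined = np.vstack([X_labeled, X_unlabeled[confident]])
    y_combined = np.concatenate([y_labeled, pseudo_labels[confident]])

    # Step 5: retrain the model on the combined dataset
    model.fit(X_combined, y_combined)
    return model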

Difference between Semi-supervised and


Reinforcement Learning.
Reinforcement learning is different from semi-supervised learning, as it works with rewards
and feedback. Reinforcement learning aims to maximize rewards through trial-and-error
actions, whereas in semi-supervised learning, we train the model with a partially labeled
dataset.

Real-world applications of Semi-supervised


Learning-
Semi-supervised learning models are becoming more popular in the industries. Some
of the main applications are as follows.

o Speech Analysis - This is a classic example of a semi-supervised learning application.
Since labeling audio data is a very laborious task that requires many human resources, this
problem can be naturally overcome by applying a semi-supervised learning (SSL) model.
o Web content classification - It is practically impossible to label every page on the
internet, because that would need enormous human intervention. This problem can be reduced
through Semi-Supervised learning algorithms.
Further, Google also uses semi-supervised learning algorithms to rank webpages for a given
query.
o Protein sequence classification - Because DNA strands are large, labeling them requires
active human intervention, so semi-supervised models have become prominent in this field.
o Text document classification - As it is very difficult to obtain a large amount of
labeled text data, semi-supervised learning is an ideal approach to overcome this.

Essential Mathematics for Machine


Learning | Important concepts of
Mathematics for Machine Learning
Nowadays, machine learning is one of the most trending technologies among researchers,
industries, and enthusiastic learners because it makes human life easier. It is widely used
in almost all areas of the real world, from Google Assistant to self-driving cars. Machine
learning is about developing models that can automatically extract important information
and patterns from data. But here an important question arises: what is the magic behind ML?
The answer is mathematics. Mathematics is at the core of designing ML algorithms that can
automatically learn from data and make predictions. Therefore, it is very important to
understand the maths before going deep into ML algorithms.

Mathematics has always been a good friend to some people and a source of anxiety for
others. Many students around the globe do not find mathematics interesting because they
think the topics covered are barely relevant to practical or real-world problems. But with
the growth of machine learning, people are getting motivated to learn mathematics, as it is
directly used in designing ML algorithms, and it is also very helpful for understanding the
concepts behind them. In this topic, we will cover all the essential concepts of
Mathematics that are used in Machine Learning.

Note: It is not required to go deep in learning Mathematics for working with simple
machine learning models; rather, knowing essential Maths concepts is enough to
understand how it is applied in ML.

Why to learn Mathematics for Machine


Learning?
Enthusiastic learners often ask: what is the need for mathematics in machine learning, when
computers can solve maths problems faster than humans? The answer is that learning
mathematics for machine learning is not about solving maths problems by hand, but about
understanding how maths is applied inside ML algorithms and how they work. The points below
explain the significance of maths in ML:
o Mathematics defines the concepts behind the ML algorithms and helps in choosing the right
algorithm by considering accuracy, training time, the complexity of the model, and the
number of features.
o Computers understand data differently than humans, such as an image is seen
as a 2D-3D matrix by a computer for which mathematics is required.
o With maths, we can correctly determine confidence intervals and uncertainty.
o It helps in selecting correct parameter values and validation methods.
o Understanding the Bias-Variance trade-off helps us identify underfitting and
overfitting issues that are the main issues in ML models.

Essential Mathematics for Machine


Learning
After understanding the need for Maths, the next question arises: what level of maths
is required and what concepts one needs to understand. To answer this question, we
have provided the basic level of mathematics required for an ML Engineer/ Scientist.
Apart from the below concepts, the level of maths also depends upon the individual's
interest and the type of research someone is working on.

o Linear algebra
o Multivariate Calculus
o Probability Theory
o Discrete Mathematics
o Statistics
o Algorithm & Optimization
o Others

Different maths concepts carry different weight in Machine Learning. The most important
part of Mathematics for ML is Linear Algebra, which is the most widely used.

1. Linear Algebra for Machine Learning


Linear algebra is about the study of vectors and some rules of manipulating these
vectors. The concepts of linear algebra are widely used in developing algorithms in
machine learning. It enables ML algorithms to run on huge datasets. It helps perform the
following tasks:

o It is used almost everywhere in the ML world.
o Linear algebra helps in representing and transforming data efficiently.
o It is used in loss functions, regularization, covariance matrices, Singular Value
Decomposition (SVD), matrix operations, and support vector machine classification.
o It is also used in Linear Regression in Machine Learning.

Different topics of linear algebra are used in ML such as Principal Component


Analysis (PCA), Singular Value Decomposition (SVD), Eigen decomposition of a matrix,
LU Decomposition, QR Decomposition/Factorization, Symmetric Matrices,
Orthogonalization & Orthonormalization, Matrix Operations, Projections, Eigenvalues
& Eigenvectors, Vector Spaces, and Norms. These topics are needed for understanding
the optimization methods.

Besides these uses, linear algebra is also widely used in neural networks and the data
science field. In short, Linear Algebra provides a Platform or base for all ML algorithms
to show their results.
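
As a small illustration of linear algebra in practice, the sketch below uses NumPy to compute the Singular Value Decomposition of a made-up data matrix and reconstruct a low-rank approximation, the core operation behind PCA-style dimensionality reduction.

import numpy as np

# Hypothetical data matrix: 6 samples with 4 features each (illustrative values)
X = np.array([[2.0, 0.5, 1.0, 0.1],
              [1.8, 0.4, 1.1, 0.0],
              [0.2, 2.2, 0.1, 1.9],
              [0.1, 2.0, 0.2, 2.1],
              [2.1, 0.6, 0.9, 0.2],
              [0.3, 2.1, 0.0, 2.0]])

# Singular Value Decomposition: X = U * diag(S) * Vt
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the top-2 singular values for a rank-2 approximation
k = 2
X_approx = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

print("Singular values:", np.round(S, 3))
print("Rank-2 reconstruction error:", round(float(np.linalg.norm(X - X_approx)), 4))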

Although linear algebra is a must-know part of mathematics for machine learning, it is not
required to go too deep into it. In other words, it is not required to be an expert in
linear algebra; a good working knowledge of these concepts is enough for machine learning.

2. Calculus for Machine Learning


Calculus Mathematics is an integral part of Machine learning, but it is not required to
be a master of it; rather, only knowledge of basic concepts is enough. Multivariate
calculus helps in solving optimization problems in machine learning. Different ML
algorithms optimize an objective function with respect to a set of desired model
parameters that control how well a model explains the data. The process of getting
the best parameters is known as optimization, and multivariate calculus helps solve
optimization problems in the ML model. It helps in optimization and getting good
results from the model.

Multivariate calculus is used in algorithm training and gradient descent. We need to


learn and implement some important concepts of multivariate calculus, such
as Derivatives, divergence, curvature, and quadratic approximations.

Some essential topics of multivariate calculus are:

o Partial Derivatives
o Vector-Values Functions
o Directional Gradient
o Hessian, Jacobian
o Laplacian and Lagrangian Distribution.
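
To connect calculus with optimization, here is a minimal sketch of gradient descent for a one-variable linear model y ≈ w·x + b, where the partial derivatives of the squared-error loss with respect to w and b drive the parameter updates; the data and the learning rate are arbitrary illustrative choices.

import numpy as np

# Hypothetical training data (illustrative only): y is roughly 3x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.9, 7.2, 9.8, 13.1])

w, b = 0.0, 0.0        # initial parameters
learning_rate = 0.01

for step in range(2000):
    y_pred = w * x + b
    error = y_pred - y
    # Partial derivatives of the mean squared error with respect to w and b
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Gradient descent update: move against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"Learned parameters: w = {w:.3f}, b = {b:.3f}")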

3. Probability in Machine Learning


Probability is always an important and interesting part of Mathematics; it measures the
likelihood of an event happening. The higher the probability of an event, the more likely
that event is to occur. ML also helps in predicting the likelihood of future events.
Probability is required to work properly on an ML prediction and modeling project. It also
helps in hypothesis testing and with distributions
such as Gaussian distribution and Probability density function.

Some important Probability concepts that one needs to know are given below:

o Joint, Marginal, and Conditional Probability,


o Probability Distributions (Discrete, Continuous),
o Density Estimation
o Maximum Likelihood Estimation
o Regression with Maximum Likelihood
o Bayes Theorem, etc.
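
As a small worked example of one of these concepts, the sketch below applies Bayes' theorem to a hypothetical diagnostic-test scenario; the prevalence, sensitivity, and false-positive rate are made-up numbers used only for illustration.

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
# Hypothetical diagnostic-test scenario (all numbers are made up for illustration)
p_condition = 0.01            # prior: P(condition)
p_pos_given_condition = 0.95  # sensitivity: P(positive | condition)
p_pos_given_healthy = 0.05    # false-positive rate: P(positive | no condition)

# Total probability of a positive test (law of total probability)
p_positive = (p_pos_given_condition * p_condition
              + p_pos_given_healthy * (1 - p_condition))

# Posterior probability via Bayes' theorem
p_condition_given_pos = p_pos_given_condition * p_condition / p_positive
print(f"P(condition | positive test) = {p_condition_given_pos:.3f}")  # about 0.161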

4. Statistics in Machine Learning


Statistics helps in drawing logical conclusions from the given data. It is a crucial
subject that every machine learning engineer/scientist must learn in order to understand
the working of classification algorithms such as logistic regression, distributions,
discriminant analysis, and hypothesis testing in Machine Learning. It helps in performing
the following tasks:

o It is a collection of tools that helps to identify the goal from the available data
and information.
o Statistics helps to understand the data and transform the sample observations
into meaningful information.
o No system in the world has perfect data stored and readily available as
needed. Every system has data anomalies like incomplete, corrupted data, etc.
Statistical concepts will be your best friend to help in such complex situations.
o It helps in finding answers to the questions such as, "Who scored the
maximum & minimum in a cricket tournament?" "Which technology is on-
trend in 2021?", and many more.

Some fundamental concepts of Statistics needed for ML are given below:

o Combinatorics
o Axioms
o Bayes' Theorem
o Variance and Expectation
o Random Variables
o Conditional and Joint Distributions.

5. Discrete Mathematics in Machine Learning


Discrete mathematics is the study of mathematical concepts based on discrete
(non-continuous) values, most often integers. Discrete mathematics has wide applications in
different fields such as algorithms, programming languages, cryptography, software
development, etc.

There are many cases in machine learning & AI where discrete mathematics is required. For
example, a neural network contains an integer number of nodes and interconnections; it
cannot have 0.56 nodes. For such cases, a discrete representation is needed, and hence
discrete mathematics is required. Graph structures and graph algorithms are some important
topics of discrete mathematics for machine learning.

For normal ML projects, only the fundamentals of discrete mathematics are enough.
At the same time, if we want to work with graphical models, relational domains,
structured prediction, etc., you need to refer to a discrete mathematics book.
However, for the science graduates, most of the concepts are covered during
College.

6. Algorithms and Complex Optimization


The optimization algorithms are important to understand better the computational
efficiency and scalability of machine learning algorithms. The conceptual knowledge
of data structures (Binary Trees, Hashing, Heap, Stack, etc.), Dynamic
Programming, Randomized & Sublinear Algorithm, Graphs, Gradient/Stochastic
Descents and Primal-Dual methods are needed.

Best Way/Resources to learn Mathematics


for Machine learning
Learning maths for machine learning is not a difficult task, because there are multiple
resources available, including books, online courses, and blogs. All these resources
provide plenty of knowledge on different maths topics. However, each resource is good for
certain concepts, so here we provide a list of important resources that will help you learn
maths in a better and simpler way.
1. Text-Books to learn Maths for Machine Learning

o Mathematics for Machine Learning by Marc Peter Deisenroth is one of the


best books to begin your mathematical journey for machine learning. In this book, the
practical applications of the algorithms and the maths behind them are explained in detail,
and the essential mathematics for machine learning is presented in an accessible way. The
book can be downloaded from the authors' website.
o Hands-on Mathematics for deep-learning by Jay Dawani is another book
for advanced maths concepts that help understand advanced ML algorithms
and deep learning models. This book also provides a brief introduction to
linear algebra, calculus, probability, and statistics. In the second edition of the
book, you will get a detailed explanation of the mathematics of multilayer
perceptron, convolutional neural networks (CNN), and recurrent neural
networks (RNN). It also explains some crucial concepts such as regularization
(L1 and L2 norm), dropout layers, and many more.

2. Online Videos to Learn Maths for Machine


Learning

o Khan Academy

Khan Academy is a popular online resource that provides well-explained maths and science
courses, and that too for free. From these videos, you can easily learn different concepts
of Mathematics such as Linear Algebra, Probability & Statistics, Multivariable Calculus,
and Optimization.

o Udacity

Introduction to Statistics by Udacity is another free video resource by which you


can understand the fundamental concepts of statistics that are needed for Machine
Learning & Data Science.

o Multivariate Calculus by Imperial College London:

Imperial College London has provided a YouTube series on some concepts


of multivariate calculus and its application in various ml algorithms. If you want the
entire mathematics course for Machine Learning, you need to enroll with Coursera;
however, Imperial College London has made the Multivariate calculus available for
free for all enthusiastic learners.
Conclusion
Mathematics is one of the most important parts of Machine Learning. However, how much maths
you need to learn completely depends on what you want to learn and how deep you want to go
into that topic. For developing simple ML models, you don't need to go deep into
Mathematics; a basic knowledge of maths concepts (as studied in college) is enough. But if
you want to develop complex models and work with advanced concepts, then you also need to
understand the maths behind them. Learning maths and applying it practically with ML
algorithms typically takes approximately 3-4 months.

Overfitting in Machine Learning


In the real world, the dataset present will never be clean and perfect. It means each
dataset contains impurities, noisy data, outliers, missing data, or imbalanced data.
Due to these impurities, different problems occur that affect the accuracy and the
performance of the model. One of such problems is Overfitting in Machine
Learning. Overfitting is a problem that a model can exhibit.

A statistical model is said to be overfitted if it can’t generalize well with unseen data.

Before understanding overfitting, we need to know some basic terms, which are:

Noise: Noise is meaningless or irrelevant data present in the dataset. It affects the
performance of the model if it is not removed.

Bias: Bias is a prediction error that is introduced in the model due to oversimplifying
the machine learning algorithms. Or it is the difference between the predicted values
and the actual values.

Variance: If the machine learning model performs well with the training dataset but does
not perform well with the test dataset, the model is said to have high variance.

Generalization: It shows how well a model is trained to predict unseen data.

What is Overfitting?

o Overfitting & underfitting are the two main errors/problems in the machine
learning model, which cause poor performance in Machine Learning.
o Overfitting occurs when the model fits more data than required, and it tries to
capture each and every datapoint fed to it. Hence it starts capturing noise and
inaccurate data from the dataset, which degrades the performance of the
model.
o An overfitted model doesn't perform accurately with the test/unseen dataset
and can’t generalize well.
o An overfitted model is said to have low bias and high variance.

Example to Understand Overfitting


We can understand overfitting with a general example. Suppose there are three
students, X, Y, and Z, and all three are preparing for an exam. X has studied only
three sections of the book and left all other sections. Y has a good memory, hence
memorized the whole book. And the third student, Z, has studied and practiced all
the questions. So, in the exam, X will only be able to solve the questions if the exam
has questions related to section 3. Student Y will only be able to solve questions if
they appear exactly the same as given in the book. Student Z will be able to solve all
the exam questions in a proper way.
The same happens with machine learning: if the algorithm learns from only a small part of
the data, it is unable to capture the required patterns and is therefore underfitted.

Suppose instead the model memorizes the training dataset, like student Y. It performs very
well on the seen dataset but performs badly on unseen data or unknown instances. In such
cases, the model is said to be overfitting.

And if the model performs well with the training dataset and also with the
test/unseen dataset, similar to student Z, it is said to be a good fit.

How to detect Overfitting?


Overfitting in the model can only be detected once you test the model on unseen data. To
detect the issue, we can perform a train/test split.

In the train-test split of the dataset, we can divide our dataset into random test and
training datasets. We train the model with a training dataset which is about 80% of
the total dataset. After training the model, we test it with the test dataset, which is 20
% of the total dataset.

Now, if the model performs well with the training dataset but not with the test
dataset, then it is likely to have an overfitting issue.

For example, if the model shows 85% accuracy with training data and 50% accuracy
with the test dataset, it means the model is not performing well.
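
A minimal sketch of this check with scikit-learn is shown below; the fully grown decision tree and the synthetic dataset are arbitrary choices, used only to make the train/test accuracy gap visible.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset, split 80% train / 20% test
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A fully grown decision tree tends to memorize the training data
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"Train accuracy: {train_acc:.2f}, Test accuracy: {test_acc:.2f}")
# A large gap between the two accuracies suggests the model is overfitting.
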
Ways to prevent the Overfitting
Although overfitting is an error in Machine Learning that reduces the performance of the
model, we can prevent it in several ways. Using a simple linear model can help avoid
overfitting; however, many real-world problems are non-linear, so it is important to
prevent overfitting in those models as well. Below are several ways that can be used to
prevent overfitting:

1. Early Stopping
2. Train with more data
3. Feature Selection
4. Cross-Validation
5. Data Augmentation
6. Regularization
Early Stopping
In this technique, the training is paused before the model starts learning the noise in the
training data. While training the model iteratively, we measure the performance of the
model after each iteration and continue only as long as new iterations keep improving that
performance.

After that point, the model begins to overfit the training data; hence we need to stop
the process before the learner passes that point.

Stopping the training process before the model starts capturing noise from the data
is known as early stopping.

However, this technique may lead to the underfitting problem if training is paused
too early. So, it is very important to find that "sweet spot" between underfitting and
overfitting.
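
Here is a minimal, framework-agnostic sketch of early stopping, assuming hypothetical train_one_epoch and evaluate functions that return a validation loss; the patience value is an arbitrary illustrative choice.

def fit_with_early_stopping(model, train_one_epoch, evaluate, max_epochs=100, patience=5):
    # `train_one_epoch(model)` and `evaluate(model)` are hypothetical callables
    # supplied by the user; `evaluate` must return the current validation loss.
    best_loss = float("inf")
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)
        val_loss = evaluate(model)

        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0    # improvement: reset the counter
        else:
            epochs_without_improvement += 1   # no improvement this epoch

        if epochs_without_improvement >= patience:
            print(f"Early stopping at epoch {epoch}; best validation loss {best_loss:.4f}")
            break
    return model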

Train with More data


Increasing the training set by including more data can enhance the accuracy of the
model, as it provides more chances to discover the relationship between input and
output variables.

It may not always prevent overfitting, but it helps the algorithm detect the signal better
and minimize errors. When a model is fed with more training data, it becomes unable to
overfit all the samples and is forced to generalize well.

But in some cases, the additional data may add more noise to the model; hence we need to
make sure the data is clean and free from inconsistencies before feeding it to the model.

Feature Selection
While building an ML model, we have a number of parameters or features that are used to
predict the outcome. However, sometimes some of these features are redundant or less
important for the prediction, and for this, the feature selection process is applied. In
the feature selection process, we identify the most important features in the training
data, and the other features are removed. This process helps to simplify the model and
reduces noise in the data. Some algorithms perform feature selection automatically; if not,
we can perform this process manually.

Cross-Validation
Cross-validation is one of the powerful techniques to prevent overfitting.

In the general k-fold cross-validation technique, we divide the dataset into k equal-sized
subsets, known as folds. The model is trained k times; in each round, k − 1 folds are used
for training and the remaining fold is used for validation, and the scores are averaged to
estimate how well the model generalizes to unseen data.
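
A minimal sketch of k-fold cross-validation with scikit-learn, using an arbitrary synthetic dataset and k = 5:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold is used exactly once as the validation set
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores.round(3))
print("Mean accuracy:", round(scores.mean(), 3))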

Data Augmentation
Data Augmentation is a data analysis technique, which is an alternative to adding
more data to prevent overfitting. In this technique, instead of adding more training
data, slightly modified copies of already existing data are added to the dataset.

The data augmentation technique makes each data sample appear slightly different every time
it is processed by the model. Hence, each sample appears unique to the model, which helps
prevent overfitting.

Regularization
If overfitting occurs when a model is complex, we can reduce the number of features.
However, overfitting may also occur with a simpler model, more specifically the
Linear model, and for such cases, regularization techniques are much helpful.

Regularization is the most popular technique to prevent overfitting. It is a group of


methods that forces the learning algorithms to make a model simpler. Applying the
regularization technique may slightly increase the bias but slightly reduces the
variance. In this technique, we modify the objective function by adding the
penalizing term, which has a higher value with a more complex model.

The two commonly used regularization techniques are L1 Regularization and L2


Regularization.

Ensemble Methods
In ensemble methods, prediction from different machine learning models is
combined to identify the most popular result.

The most commonly used ensemble methods are Bagging and Boosting.

In bagging, individual data points can be selected more than once to form several sample
datasets. These models are trained independently, and depending on the type of task
(regression or classification), the average or the majority vote of their predictions is
used to produce a more accurate result. Moreover, bagging reduces the chances of
overfitting in complex models.

In boosting, a large number of weak learners arranged in a sequence are trained in


such a way that each learner in the sequence learns from the mistakes of the learner
before it. It combines all the weak learners to come out with one strong learner. In
addition, it improves the predictive flexibility of simple models.
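
A brief sketch of both ideas with scikit-learn, on an arbitrary synthetic dataset (the base learners in both ensembles default to decision trees):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: many base learners trained on bootstrap samples, predictions aggregated by voting
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: weak learners trained in sequence, each focusing on the mistakes of the previous ones
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("Bagging", bagging), ("Boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name} mean accuracy: {scores.mean():.3f}")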

Types of Encoding Techniques


The process of conversion of data from one form to another form is known as
Encoding. It is used to transform the data so that data can be supported and used by
different systems. Encoding works similarly to converting temperature from centigrade to
Fahrenheit: the value is expressed in another form, but the original information remains
the same. Encoding is mainly used in two fields:

o Encoding in Electronics: In electronics, encoding refers to converting analog


signals to digital signals.
o Encoding in Computing: In computing, encoding is a process of converting
data to an equivalent cipher by applying specific code, letters, and numbers to
the data.

Note: Encoding is different from encryption as its main purpose is not to hide the data
but to convert it into a format so that it can be properly consumed.

In this topic, we are going to discuss the different types of encoding techniques that
are used in computing.
Type of Encoding Technique

o Character Encoding
o Image & Audio and Video Encoding

Character Encoding
Character encoding encodes characters into bytes. It informs the computer how to interpret
zeros and ones as real characters, numbers, and symbols. The computer understands only
binary data; hence it is required to convert these characters into numeric codes. To
achieve this, each character is converted into binary code, and for this, text documents
are saved with an encoding type. This is done by pairing numbers with characters. If we
don't apply character encoding, our website will not display the characters and text in a
proper format. Hence it will decrease readability, and the machine will not be able to
process data correctly. Further, character encoding makes sure that each character has a
proper representation in computer or binary format.

There are different types of Character Encoding techniques, which are given below:


1. HTML Encoding
2. URL Encoding
3. Unicode Encoding
4. Base64 Encoding
5. Hex Encoding
6. ASCII Encoding

HTML Encoding
HTML encoding is used to display an HTML page in the proper format. With encoding, a web
browser gets to know which character set is to be used.

In HTML, various characters such as < and > are part of the HTML markup itself. To display
these characters as content rather than markup, we need to encode them.
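
For instance, Python's standard html module can encode markup characters so they are displayed as content rather than interpreted as tags:

import html

text = '5 < 10 & "quotes"'
encoded = html.escape(text)   # replaces <, >, &, and quotes with HTML entities
print(encoded)                # 5 &lt; 10 &amp; &quot;quotes&quot;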

URL Encoding
URL (Uniform resource locator) Encoding is used to convert characters in such a
format that they can be transmitted over the internet. It is also known as percent-
encoding. The URL Encoding is performed to send the URL to the internet using the
ASCII character-set. Non-ASCII characters are replaced with a %, followed by the
hexadecimal digits.
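
A quick illustration with Python's urllib, which percent-encodes characters that are not safe to transmit in a URL:

from urllib.parse import quote

query = "machine learning & AI tutorial"
print(quote(query))   # machine%20learning%20%26%20AI%20tutorial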

UNICODE Encoding
Unicode is an encoding standard for a universal character set. It allows encoding,
representing, and handling the text written in most of the languages and writing systems
available worldwide. It provides a code point, or number, for each character in every
supported language and can represent approximately all characters of all these languages.
A particular sequence of bits is known as a coding unit.

A UNICODE standard can use 8, 16, or 32 bits to represent the characters.

The Unicode standard defines Unicode Transformation Format (UTF) to encode the
code points.

UNICODE Encoding standard has the following UTF schemes:


o UTF-8 Encoding
The UTF8 is defined by the UNICODE standard, which is variable-width
character encoding used in Electronics Communication. UTF-8 is capable of
encoding all 1,112,064 valid character code points in Unicode using one to
four one-byte (8-bit) code units.
o UTF-16 Encoding
UTF16 Encoding represents a character's code points using one of two 16-bits
integers.
o UTF-32 Encoding
UTF32 Encoding represents each code point as 32-bit integers.
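
The difference between the UTF schemes can be seen by encoding the same character in Python and comparing the number of bytes used:

ch = "€"                      # a character outside the ASCII range
print(ch.encode("utf-8"))     # b'\xe2\x82\xac' -> three one-byte code units
print(ch.encode("utf-16"))    # a 2-byte byte-order mark followed by 2 bytes
print(ch.encode("utf-32"))    # a 4-byte byte-order mark followed by 4 bytes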

Base64 Encoding
Base64 Encoding is used to encode binary data into equivalent ASCII Characters. The
Base64 encoding is used in the Mail system as mail systems such as SMTP can't work
with binary data because they accept ASCII textual data only. It is also used in simple
HTTP authentication to encode the credentials. Moreover, it is also used to transfer
the binary data into cookies and other parameters to make data unreadable to
prevent tampering. If an image or another file is transferred without Base64
encoding, it will get corrupted as the mail system is not able to deal with binary data.

Base64 represents the data in blocks of 3 bytes, where each byte contains 8 bits; hence
each block represents 24 bits. These 24 bits are divided into four groups of 6 bits, and
each of these groups, or chunks, is converted into the equivalent Base64 value.
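
A small example using Python's base64 module, encoding arbitrary bytes into ASCII-safe text and decoding them back:

import base64

data = b"Hello, ML!"              # binary data (here, simple ASCII bytes)
encoded = base64.b64encode(data)  # ASCII-safe Base64 representation
print(encoded)                    # b'SGVsbG8sIE1MIQ=='
print(base64.b64decode(encoded))  # b'Hello, ML!'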

ASCII Encoding
American Standard Code for Information Interchange (ASCII) is a type of
character-encoding. It was the first character encoding standard released in the year
1963.

The ASCII code is used to represent English characters as numbers, where each letter is
assigned a number from 0 to 127. Most modern character-encoding schemes are based on ASCII,
though they support many additional characters. It is a single-byte encoding that uses only
the bottom 7 bits. In an ASCII file, each alphabetic,
numeric, or special character is represented with a 7-bit binary number. Each
character of the keyboard has an equivalent ASCII value.
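
For example, in Python the built-in ord and chr functions expose this character-to-number mapping directly:

print(ord("A"))   # 65  -> the ASCII code of 'A'
print(chr(97))    # 'a' -> the character with ASCII code 97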

Image and Audio & Video Encoding


Image and audio & video encoding are performed to save storage space. A media
file such as image, audio, and video are encoded to save them in a more efficient and
compressed format.

These encoded files contain the same content with usually similar quality, but in
compressed size, so that they can be saved within less space, can be transferred
easily via mail, or can be downloaded on the system.

For example, a .WAV audio file may be converted into a .MP3 file to reduce its size to
roughly one-tenth of the original.

Feature Selection Techniques in


Machine Learning
Feature selection is a way of selecting the subset of the most relevant features from the
original features set by removing the redundant, irrelevant, or noisy features.

While developing a machine learning model, only a few variables in the dataset are useful
for building the model, and the rest of the features are either redundant or irrelevant. If
we input the dataset with all these redundant and irrelevant features, it may negatively
impact and reduce the overall performance and accuracy of the model.
Hence it is very important to identify and select the most appropriate features from
the data and remove the irrelevant or less important features, which is done with the
help of feature selection in machine learning.
Feature selection is one of the important concepts of machine learning, which highly
impacts the performance of the model. As machine learning works on the concept of
"Garbage In Garbage Out", so we always need to input the most appropriate and
relevant dataset to the model in order to get a better result.

In this topic, we will discuss different feature selection techniques for machine
learning. But before that, let's first understand some basics of feature selection.

o What is Feature Selection?


o Need for Feature Selection
o Feature Selection Methods/Techniques
o Feature Selection statistics

What is Feature Selection?


A feature is an attribute that has an impact on a problem or is useful for the problem,
and choosing the important features for the model is known as feature selection.
Each machine learning process depends on feature engineering, which mainly
contains two processes; which are Feature Selection and Feature Extraction. Although
feature selection and extraction processes may have the same objective, both are
completely different from each other. The main difference between them is that
feature selection is about selecting the subset of the original feature set, whereas
feature extraction creates new features. Feature selection is a way of reducing the input
variables for the model by using only relevant data, in order to reduce overfitting in the
model.


So, we can define feature Selection as, "It is a process of automatically or


manually selecting the subset of most appropriate and relevant features to be
used in model building." Feature selection is performed by either including the
important features or excluding the irrelevant features in the dataset without
changing them.

Need for Feature Selection


Before implementing any technique, it is really important to understand why it is needed,
and the same holds for Feature Selection. As we know, in machine learning it is necessary
to provide a pre-processed and good input dataset in order to get better outcomes. We
collect a huge amount of data to train our model and help it learn better. Generally, the
dataset consists of noisy data, irrelevant data, and some useful data. Moreover, a huge
amount of data also slows down the training process of the model, and with noise and
irrelevant data, the model may not predict and perform well. So, it is very necessary to
remove such noise and less-important data from the dataset, and to do this, feature
selection techniques are used.

Selecting the best features helps the model to perform well. For example, Suppose
we want to create a model that automatically decides which car should be crushed
for a spare part, and to do this, we have a dataset. This dataset contains a Model of
the car, Year, Owner's name, Miles. So, in this dataset, the name of the owner does
not contribute to the model performance as it does not decide if the car should be
crushed or not, so we can remove this column and select the rest of the
features(column) for the model building.

Below are some benefits of using feature selection in machine learning:

o It helps in avoiding the curse of dimensionality.


o It helps in the simplification of the model so that it can be easily
interpreted by the researchers.
o It reduces the training time.
o It reduces overfitting and hence enhances generalization.

Feature Selection Techniques


There are mainly two types of Feature Selection techniques, which are:

o Supervised Feature Selection technique


Supervised Feature selection techniques consider the target variable and can
be used for the labelled dataset.
o Unsupervised Feature Selection technique
Unsupervised Feature selection techniques ignore the target variable and can
be used for the unlabelled dataset.

There are mainly three techniques under supervised feature Selection:

1. Wrapper Methods
In the wrapper methodology, the selection of features is treated as a search problem, in
which different combinations are made, evaluated, and compared with other combinations. It
trains the algorithm iteratively using subsets of features. On the basis of the output of
the model, features are added or removed, and the model is trained again with the new
feature set.

Some techniques of wrapper methods are:

o Forward selection - Forward selection is an iterative process, which begins


with an empty set of features. After each iteration, it keeps adding on a
feature and evaluates the performance to check whether it is improving the
performance or not. The process continues until the addition of a new
variable/feature does not improve the performance of the model.
o Backward elimination - Backward elimination is also an iterative approach,
but it is the opposite of forward selection. This technique begins the process
by considering all the features and removes the least significant feature. This
elimination process continues until removing the features does not improve
the performance of the model.
o Exhaustive Feature Selection - Exhaustive feature selection is one of the best feature
selection methods, which evaluates each feature subset by brute force. It means this method
tries every possible combination of features and returns the best-performing feature set.
o Recursive Feature Elimination-
Recursive feature elimination is a recursive greedy optimization approach,
where features are selected by recursively taking a smaller and smaller subset
of features. An estimator is trained on each set of features, and the importance of each
feature is determined using the coef_ attribute or the feature_importances_ attribute.
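
A minimal sketch of one wrapper-style method, Recursive Feature Elimination, using scikit-learn on an arbitrary synthetic dataset:

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

# Recursively remove the weakest features until 4 remain
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4)
selector.fit(X, y)

print("Selected feature mask:", selector.support_)
print("Feature ranking (1 = selected):", selector.ranking_)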

2. Filter Methods
In the Filter Method, features are selected on the basis of statistical measures. This
method does not depend on the learning algorithm and chooses the features as a
pre-processing step.

The filter method filters out the irrelevant feature and redundant columns from the
model by using different metrics through ranking.

The advantage of using filter methods is that it needs low computational time and
does not overfit the data.

Some common techniques of Filter methods are as follows:

o Information Gain
o Chi-square Test
o Fisher's Score
o Missing Value Ratio
Information Gain: Information gain determines the reduction in entropy while
transforming the dataset. It can be used as a feature selection technique by
calculating the information gain of each variable with respect to the target variable.

Chi-square Test: Chi-square test is a technique to determine the relationship


between the categorical variables. The chi-square value is calculated between each
feature and the target variable, and the desired number of features with the best chi-
square value is selected.

Fisher's Score:

Fisher's score is one of the popular supervised techniques of feature selection. It returns the rank of each variable according to Fisher's criterion in descending order; we can then select the variables with a large Fisher's score.

Missing Value Ratio:

The missing value ratio can be used to evaluate a feature against a threshold value. It is obtained by dividing the number of missing values in a column by the total number of observations. A variable whose missing value ratio exceeds the threshold can be dropped.
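As a rough sketch of filter-based selection, scikit-learn's SelectKBest can rank features by the chi-square statistic or by mutual information (information gain); the Iris data and k=2 below are illustrative choices.

# A minimal sketch of filter methods with SelectKBest (chi-square and mutual information).
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_iris(return_X_y=True)   # chi2 requires non-negative feature values

# Keep the two features with the highest chi-square score.
X_chi2 = SelectKBest(score_func=chi2, k=2).fit_transform(X, y)

# The same idea using mutual information (information gain).
X_mi = SelectKBest(score_func=mutual_info_classif, k=2).fit_transform(X, y)

print(X_chi2.shape, X_mi.shape)   # both reduced to (150, 2)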

3. Embedded Methods
Embedded methods combine the advantages of both filter and wrapper methods by considering the interaction of features while keeping the computational cost low. They are fast, like filter methods, but more accurate. These methods are also iterative: each training iteration is evaluated, and the features that contribute most to that iteration are identified. Some techniques of embedded methods are:

o Regularization - Regularization adds a penalty term to the parameters of the machine learning model to avoid overfitting. Because the penalty is applied to the coefficients, it can shrink some coefficients to exactly zero, and the features with zero coefficients can be removed from the dataset. Common choices are L1 regularization (Lasso) and the Elastic Net (a combination of L1 and L2 regularization).
o Random Forest Importance - Tree-based methods provide feature importances that can be used directly for feature selection. Here, feature importance indicates which features matter most in model building or have the greatest impact on the target variable. Random Forest is such a tree-based method: a bagging algorithm that aggregates a number of decision trees. It automatically ranks the nodes by their decrease in impurity (Gini impurity) over all the trees. Nodes are arranged according to the impurity values, which allows the tree to be pruned below a specific node; the remaining nodes correspond to a subset of the most important features.
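A minimal sketch of the two embedded techniques above, assuming scikit-learn; the synthetic dataset and the alpha value are illustrative.

# L1 (Lasso) regularization and random forest feature importances as embedded selectors.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=8, n_informative=3, noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
print("Lasso coefficients:", lasso.coef_)   # uninformative coefficients shrink to (near) zero

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print("RF importances:", forest.feature_importances_)   # impurity-based importance per feature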
How to choose a Feature Selection
Method?
For machine learning engineers, it is very important to understand which feature selection method will work well for their model. The better we know the data types of the variables, the easier it is to choose the appropriate statistical measure for feature selection.

To know this, we need to first identify the type of input and output variables. In
machine learning, variables are of mainly two types:

o Numerical Variables: variables with numerical values such as integers or floats.
o Categorical Variables: variables with categorical values such as Boolean, ordinal, or nominal values.

Below are some univariate statistical measures, which can be used for filter-based
feature selection:

1. Numerical Input, Numerical Output:

Numerical Input variables are used for predictive regression modelling. The common
method to be used for such a case is the Correlation coefficient.

o Pearson's correlation coefficient (For linear Correlation).


o Spearman's rank coefficient (for non-linear correlation).
2. Numerical Input, Categorical Output:

Numerical input with categorical output is the case for classification predictive modelling problems. Here, too, correlation-based techniques should be used, but adapted to a categorical output.

o ANOVA correlation coefficient (linear).


o Kendall's rank coefficient (nonlinear).

3. Categorical Input, Numerical Output:

This is the case of regression predictive modelling with categorical input. It is a less common type of regression problem; we can use the same measures as in the previous case, but applied in reverse.

4. Categorical Input, Categorical Output:

This is a case of classification predictive modelling with categorical Input variables.

The commonly used technique for such a case is Chi-Squared Test. We can also use
Information gain in this case.

We can summarise the above cases with appropriate measures in the below
table:

Input Variable   Output Variable   Feature Selection Technique

Numerical        Numerical         Pearson's correlation coefficient (for linear correlation); Spearman's rank coefficient (for non-linear correlation)

Numerical        Categorical       ANOVA correlation coefficient (linear); Kendall's rank coefficient (non-linear)

Categorical      Numerical         Kendall's rank coefficient (linear); ANOVA correlation coefficient (non-linear)

Categorical      Categorical       Chi-Squared test (contingency tables); Mutual Information

Conclusion
Feature selection is a complicated and vast field of machine learning, and many studies have already been made to discover the best methods. There is no fixed rule for choosing the best feature selection method; the choice depends on the machine learning engineer, who can combine and adapt approaches to find the best method for a specific problem. One should try a variety of model fits on different subsets of features selected through different statistical measures.

Bias and Variance in Machine


Learning
Machine learning is a branch of Artificial Intelligence that allows machines to perform data analysis and make predictions. However, if the machine learning model is not accurate, it can make prediction errors, and these prediction errors are usually known as bias and variance. In machine learning, these errors will always be present, as there is always a slight difference between the model's predictions and the actual values. The main aim of ML/data science analysts is to reduce these errors in order to get more accurate results. In this topic, we are going to discuss bias and variance, the bias-variance trade-off, underfitting, and overfitting. But before starting, let's first understand what errors in machine learning are.
Errors in Machine Learning
In machine learning, an error is a measure of how accurately an algorithm can make predictions on a previously unseen dataset. On the basis of these errors, the machine learning model that performs best on the particular dataset is selected. There are mainly two types of errors in machine learning, which are:

o Reducible errors: These errors can be reduced to improve the model accuracy. They can further be classified into bias and variance.
o Irreducible errors: These errors will always be present in the model, regardless of which algorithm has been used. They are caused by unknown variables whose influence on the output cannot be removed.

What is Bias?
In general, a machine learning model analyses the data, finds patterns in it, and makes predictions. While training, the model learns these patterns in the dataset and applies them to test data for prediction. While making predictions, a difference occurs between the values predicted by the model and the actual/expected values, and this difference is known as bias error or error due to bias. It can be defined as the inability of machine learning algorithms such as Linear Regression to capture the true relationship between the data points. Each algorithm begins with some amount of bias, because bias arises from assumptions in the model that make the target function simpler to learn. A model has either:


o Low Bias: A low bias model will make fewer assumptions about the form of
the target function.
o High Bias: A model with a high bias makes more assumptions, and the model
becomes unable to capture the important features of our dataset. A high bias
model also cannot perform well on new data.

Generally, a linear algorithm has a high bias, which makes it fast to learn. The simpler the algorithm, the more bias it is likely to introduce, whereas a nonlinear algorithm often has low bias.

Some examples of machine learning algorithms with low bias are Decision Trees, k-Nearest Neighbours, and Support Vector Machines. At the same time, algorithms with high bias include Linear Regression, Linear Discriminant Analysis, and Logistic Regression.

Ways to reduce High Bias:


High bias mainly occurs due to an overly simple model. Below are some ways to reduce high bias:

o Increase the input features as the model is underfitted.


o Decrease the regularization term.
o Use more complex models, such as including some polynomial features.

What is a Variance Error?


Variance specifies how much the prediction would change if a different training dataset were used. In simple words, variance tells how much a random variable differs from its expected value. Ideally, a model should not vary too much from one training dataset to another, which means the algorithm should be good at capturing the hidden mapping between the input and output variables. Variance errors are classified as either low variance or high variance.

Low variance means there is a small variation in the prediction of the target function
with changes in the training data set. At the same time, High variance shows a large
variation in the prediction of the target function with changes in the training dataset.

A model with high variance learns a lot from the training dataset and performs well on it, but does not generalize well to unseen data. As a result, such a model gives good results on the training dataset but shows high error rates on the test dataset.

Since a high-variance model learns too much from the dataset, it leads to overfitting. A model with high variance has the following problems:

o A high-variance model leads to overfitting.
o It increases model complexity.

Usually, nonlinear algorithms, which have a lot of flexibility in fitting the model, have high variance.

Some examples of machine learning algorithms with low variance are Linear Regression, Logistic Regression, and Linear Discriminant Analysis. At the same time, algorithms with high variance are Decision Trees, Support Vector Machines, and k-Nearest Neighbours.

Ways to Reduce High Variance:

o Reduce the input features or the number of parameters, as the model is overfitted.
o Do not use an overly complex model.
o Increase the training data.
o Increase the regularization term.
Different Combinations of Bias-Variance
There are four possible combinations of bias and variance, which are represented by the diagram below:

1. Low-Bias, Low-Variance:
The combination of low bias and low variance gives an ideal machine learning model. However, it is rarely achievable in practice.
2. Low-Bias, High-Variance: With low bias and high variance, model predictions are inconsistent but accurate on average. This case occurs when the model learns a large number of parameters and hence leads to overfitting.
3. High-Bias, Low-Variance: With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when a model does not learn well from the training dataset or uses only a few parameters. It leads to underfitting problems in the model.
4. High-Bias, High-Variance:
With high bias and high variance, predictions are both inconsistent and inaccurate on average.

How to identify High variance or High


Bias?
High variance can be identified if the model has:

o Low training error and high test error.

High Bias can be identified if the model has:

o High training error, and test error that is almost the same as the training error.
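In practice, these diagnostics are often checked by comparing training and test error directly. Below is a rough sketch, assuming scikit-learn and a synthetic dataset, where an underfit (degree-1) and an overfit (degree-15) polynomial model show the two patterns above.

# Compare training vs. test error for a too-simple and a too-complex model.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):   # degree 1: high bias; degree 15: high variance
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(degree, round(train_err, 3), round(test_err, 3))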

Bias-Variance Trade-Off
While building a machine learning model, it is really important to take care of bias and variance in order to avoid overfitting and underfitting. If the model is very simple, with few parameters, it may have low variance and high bias; if the model has a large number of parameters, it will have high variance and low bias. So, a balance must be struck between bias and variance errors, and this balance is known as the bias-variance trade-off.
For accurate predictions, algorithms need both low variance and low bias. But this is generally not possible, because bias and variance are related to each other:

o If we decrease the variance, it will increase the bias.


o If we decrease the bias, it will increase the variance.

The bias-variance trade-off is a central issue in supervised learning. Ideally, we need a model that accurately captures the regularities in the training data and simultaneously generalizes well to unseen data. Unfortunately, it is not possible to do both perfectly: a high-variance algorithm may perform well on the training data but overfit noisy data, whereas a high-bias algorithm produces a very simple model that may not even capture the important regularities in the data. So, we need to find a sweet spot between bias and variance to build an optimal model.

Hence, the Bias-Variance trade-off is about finding the sweet spot to make a
balance between bias and variance errors.

Machine Learning Tools


Machine learning is one of the most revolutionary technologies making our lives simpler. It is a subfield of Artificial Intelligence that analyses data, builds models, and makes predictions. Due to its popularity and great applications, every tech enthusiast wants to learn it and build new machine learning apps. However, to build ML models, it is important to master machine learning tools. Mastering these tools will enable you to play with data, train your models, discover new methods, and create algorithms.

There are different tools, software, and platforms available for machine learning, and new software and tools evolve day by day. Although there are many options, choosing the best tool for your model is a challenging task. If you choose the right tool, you can make your work faster and more efficient. In this topic, we will discuss some popular and commonly used machine learning tools and their features.

1. TensorFlow
TensorFlow is one of the most popular open-source libraries used to build and train both machine learning and deep learning models. It was developed by the Google Brain Team and also provides a JavaScript library (TensorFlow.js). It is very popular among machine learning enthusiasts, who use it for building different ML applications. It offers a powerful set of libraries, tools, and resources for numerical computation, specifically for large-scale machine learning and deep learning projects. It enables data scientists and ML developers to build and deploy machine learning applications efficiently. For building and training ML models, TensorFlow provides the high-level Keras API, which lets users easily get started with TensorFlow and machine learning.

Features:
Below are some top features:


o TensorFlow enables us to build and train ML models easily.
o It also enables you to run existing models using TensorFlow.js.
o It provides multiple abstraction levels, allowing the user to select the right resource as per the requirement.
o It helps in building neural networks.
o It provides support for distributed computing.
o For greater flexibility while building a model, it provides eager execution, which enables immediate iteration and intuitive debugging.
o It is open-source software and highly flexible.
o It also enables developers to perform numerical computations using data flow graphs.
o It runs on GPUs and CPUs, and also on various mobile computing platforms.
o It provides auto-differentiation functionality (automatically computing gradients is called automatic differentiation, or autodiff).
o It enables models to be easily trained and deployed in the cloud.
o It can be used in two ways, i.e., by installing through NPM or by script tags.
o It is free to use.
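As a hedged illustration of the Keras API mentioned above, the following minimal sketch defines, compiles, and trains a tiny model; the architecture and the random data are purely illustrative.

# A minimal TensorFlow/Keras model (illustrative data and architecture).
import numpy as np
import tensorflow as tf

X = np.random.rand(100, 4).astype("float32")
y = (X.sum(axis=1) > 2).astype("float32")   # a toy binary target

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)

print(model.predict(X[:3]))   # probabilities for the first three samples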

2. PyTorch

PyTorch is an open-source machine learning framework based on the Torch library. The framework is free and open-source and was developed by FAIR (Facebook's AI Research lab). It is one of the popular ML frameworks and can be used for various applications, including computer vision and natural language processing. PyTorch has Python and C++ interfaces; however, the Python interface is the more interactive one. Different deep learning software is built on top of PyTorch, such as PyTorch Lightning, Hugging Face's Transformers, Tesla Autopilot, etc.

It provides a Tensor class representing an n-dimensional array that supports tensor computations with GPU acceleration.

Features:
Below are some top features:

o It enables developers to create neural networks using the Autograd module.
o It is well suited for deep learning research, with good speed and flexibility.
o It can also be used on cloud platforms.
o It includes tutorial courses, various tools, and libraries.
o It also provides a dynamic computational graph, which makes the library popular.
o It allows changing the network behaviour on the fly without any lag.
o It is easy to use due to its hybrid front-end.
o It is freely available.
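A small sketch of the Tensor class and the Autograd module mentioned above; the values are illustrative.

# PyTorch tensors with automatic differentiation (autograd).
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w = torch.tensor([0.5, 0.5, 0.5], requires_grad=True)

loss = ((w * x).sum() - 4.0) ** 2   # a tiny scalar "loss"
loss.backward()                     # autograd computes gradients through the graph

print(w.grad)   # d(loss)/dw, computed automatically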

3. Google Cloud ML Engine

While training a classifier on a huge amount of data, a single computer system might not perform well; many machine learning or deep learning projects require millions or billions of training examples, or the algorithm being used takes a long time to execute. In such cases, one should go for the Google Cloud ML Engine. It is a hosted platform where ML developers and data scientists build and run high-quality machine learning models. It provides a managed service that allows developers to easily create ML models with any type of data and of any size.

Features:
Below are the top features:

o Provides machine learning model building, training, deep learning, and predictive modelling.
o The two services, namely prediction and training, can be used independently or together.
o It can be used by enterprises, e.g., for identifying clouds in a satellite image or responding faster to customer emails.
o It can be widely used to train complex models.

4. Amazon Machine Learning (AML)


Amazon provides a great number of machine learning tools, and one of them
is Amazon Machine Learning or AML. Amazon Machine Learning (AML) is a cloud-
based and robust machine learning software application, which is widely used for
building machine learning models and making predictions. Moreover, it integrates
data from multiple sources, including Redshift, Amazon S3, or RDS.

Features
Below are some top features:

o AML offers visualization tools and wizards.


o Enables the users to identify the patterns, build mathematical models, and
make predictions.
o It provides support for three types of models, which are multi-class
classification, binary classification, and regression.
o It permits users to import the model into or export the model out from
Amazon Machine Learning.
o It also provides core concepts of machine learning, including ML models, Data
sources, Evaluations, Real-time predictions and Batch predictions.
o It enables the user to retrieve predictions with the help of batch APIs for bulk
requests or real-time APIs for individual requests.

5. Accord.NET
Accord.NET is a .NET-based machine learning framework used for scientific computing. It comes with audio and image processing libraries written in C#. The framework provides different libraries for various ML applications, such as pattern recognition, linear algebra, and statistical data processing. Popular packages of the Accord.NET framework are Accord.Statistics, Accord.Math, and Accord.MachineLearning.

Features
Below are some top features:

o It contains 38+ kernel functions.
o It consists of more than 40 non-parametric and parametric estimators of statistical distributions.
o It is used for creating production-grade computer audition, computer vision, signal processing, and statistics apps.
o It contains more than 35 hypothesis tests, including two-way and one-way ANOVA tests and non-parametric tests such as the Kolmogorov-Smirnov test, and many more.

6. Apache Mahout
Apache Mahout is an open-source project of Apache Software Foundation, which is
used for developing machine learning applications mainly focused on Linear Algebra.
It is a distributed linear algebra framework with a mathematically expressive Scala DSL, which enables developers to promptly implement their own algorithms. It also provides Java/Scala libraries for mathematical operations, mainly based on linear algebra and statistics.

Features:
Below are some top features:

o It enables developers to implement machine learning techniques, including


recommendation, clustering, and classification.
o It is an efficient framework for implementing scalable algorithms.
o It consists of matrix and vector libraries.
o It provides support for multiple distributed backends (including Apache Spark).
o It runs on top of Apache Hadoop using the MapReduce paradigm.

7. Shogun

Shogun is a free and open-source machine learning software library, which was
created by Gunnar Raetsch and Soeren Sonnenburg in the year 1999. This
software library is written in C++ and supports interfaces for different languages
such as Python, R, Scala, C#, Ruby, etc., using SWIG (Simplified Wrapper and Interface Generator). The main focus of Shogun is on kernel-based algorithms such as Support Vector Machines (SVM) and K-Means Clustering for regression and classification problems. It also provides a complete implementation of Hidden Markov Models.

Features:
Below are some top features:

o Its main focus is on kernel-based algorithms such as Support Vector Machines (SVM) and K-Means Clustering for regression and classification problems.
o It provides support for the use of pre-calculated kernels.
o It also offers to use a combined kernel using Multiple kernel Learning
Functionality.
o This was initially designed for processing a huge dataset that consists of up to
10 million samples.
o It also enables users to work with interfaces in different programming languages such as Lua, Python, Java, C#, Octave, Ruby, MATLAB, and R.

8. Oryx2

It is a realization of the lambda architecture, built on Apache Kafka and Apache Spark, and is widely used for real-time, large-scale machine learning projects. It is a framework for building apps, including packaged end-to-end applications for collaborative filtering, regression, classification, and clustering. It is written in Java and uses technologies such as Apache Spark, Hadoop, Tomcat, Kafka, etc. The latest version of Oryx2 is Oryx 2.8.0.

Features:
Below are some top features:

o It has three tiers: a generic lambda architecture tier, a specialization on top providing ML abstractions, and an end-to-end implementation of the same standard ML algorithms.
o The original project was Oryx 1, and after some upgrades, Oryx 2 was launched.
o It is well suited for large-scale, real-time machine learning projects.
o It contains three side-by-side layers, named the speed layer, batch layer, and serving layer.
o It also has a data transport layer that transfers data between the different layers and receives input from external sources.
9. Apache Spark MLlib

Apache Spark MLlib is a scalable machine learning library that runs on Apache Mesos, Hadoop, Kubernetes, standalone, or in the cloud, and it can access data from different data sources. Spark is an open-source cluster-computing framework that offers an interface for programming entire clusters with data parallelism and fault tolerance.

For optimized numerical processing of data, MLlib uses linear algebra packages such as Breeze and netlib-java. It uses a query optimizer and physical execution engine to achieve high performance with both batch and streaming data.

Features
Below are some top features:

o MLlib contains various algorithms, including classification, regression, clustering, recommendation, association rules, etc.
o It runs on different platforms such as Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, against diverse data sources.
o It contains high-quality algorithms that provide great results and performance.
o It is easy to use, as it provides interfaces in Java, Python, Scala, R, and SQL.

10. Google ML kit for Mobile


For mobile app developers, Google brings ML Kit, which packages machine learning expertise and technology to create more robust, optimized, and personalized apps. This toolkit can be used for face detection, text recognition, landmark detection, image labelling, and barcode scanning applications. It can also work offline.

Features:
Below are some top features:

o The ML kit is optimized for mobile.


o It includes the advantages of different machine learning technologies.
o It provides easy-to-use APIs that enable powerful use cases in your mobile apps.
o It includes Vision and Natural Language APIs to detect faces, text, and objects, identify different languages, and provide reply suggestions.

Conclusion
In this topic, we have discussed some popular machine learning tools. There are many more ML tools, but choosing one depends entirely on the requirements of the project, your skills, and the price of the tool. Most of these tools are freely available, except for some, such as RapidMiner. Each tool works with different languages and offers its own set of features.

Prerequisites for Machine Learning


Nowadays, machine learning has become one of the most sought-after technologies
of the era, and undoubtedly it is the wave of the future. If you are interested in
learning machine learning, then you must be aware of the prerequisites for machine
learning. The machine learning prerequisites will help you to make a better career
path.

Machine Learning is an interdisciplinary field of mathematics and computer


science that aims to teach machines to perform cognitive activity similar to
humans. In machine learning, the term learning specifies a way by which machines
take input data, examine or analyze data, and gain insights from it. Machine learning
systems use different algorithms to automatically learn patterns from datasets that
may include structured data, numeric data, textual data, visual data, etc. In order to
succeed in machine learning technology, it is very important to understand each
concept in a proper way.

In this topic, we will discuss the prerequisites for machine learning so that you can build a better base for learning its advanced concepts.

What are the prerequisites for machine


learning?
To get started with machine learning, you must be aware of the below points.


o Educational Prerequisites for Machine learning Career


o Skills-based Prerequisites for Machine learning Career

1. Statistics
2. Linear Algebra
3. Calculus
4. Probability
5. Programming Languages

Educational Prerequisites for machine


learning
Is a Master's/Ph.D. degree required to become a machine learning engineer?

This is one of the most common questions about educational qualifications among aspirants who want to learn machine learning and make a career in it. The answer is NO: it is not necessary to have a master's or Ph.D. degree to learn and make a career in machine learning. Many people have built careers in this field without such a degree. However, having a Ph.D. or master's degree will definitely give you additional benefits and make the path smoother. The certificate works as a way to showcase your skills, but in the end, your practical knowledge and skills will help you build a project or make a career in machine learning. So, if you have enough time and funds for a master's or Ph.D. degree, you can pursue one, and it will surely benefit you. But even without a degree, with good ML skills you can still make the transition into machine learning.

Skill-based Prerequisites for Machine learning Career

1. Statistics
Machine learning and statistics are two tightly coupled fields, as most machine learning concepts are either taken from statistics or depend on it. Machine learning techniques and algorithms rely heavily on statistical concepts and theories; hence, statistics is a crucial prerequisite for ML.

Statistics is a field of mathematics that allows us to draw logical conclusions from data. Every machine learning enthusiast must understand statistical concepts in order to learn the workings of algorithms such as logistic regression, distributions, hypothesis testing, etc. It helps in performing the following tasks:

o It contains various tools that allow us to get some outcomes from the
available data and information.
o It finds outcomes from the data and transforms sample observations into
meaningful information.
o Each raw data is not perfect and contains different impurities in it, such as
incomplete data, corrupted data, etc. In such cases, statistical concepts help to
identify these impurities.
o It helps in obtaining answers for different questions such as, who scored the
maximum & minimum in the cricket tournament? Which technology is on-
trend in 2021? etc.
o Statistical hypothesis tests help in selecting the best model for any kind of predictive modeling problem.

Some fundamental concepts of Statistics needed for ML are given below:

o Combinatorics
o Axioms
o Bayes' Theorem
o Variance and Expectation
o Random Variables
o Conditional and Joint Distributions.

2. Linear Algebra
Linear algebra deals with the study of vectors, matrices, and linear transformations, and the rules for manipulating them. It is one of the integral parts of machine learning and helps ML algorithms run on huge, multi-dimensional datasets.

The concepts of linear algebra are widely used in developing algorithms in machine
learning. It can perform the following task:

o Linear algebra has vast application in machine learning.


o Linear algebra is essential for optimizing the data in machine learning.
o It is used in loss functions, regularisation, covariance matrices, Singular Value
Decomposition (SVD), Matrix Operations, and support vector machine
classification.
o Linear algebra is also used for performing Principal Component Analysis(PCA)
for dimensionality reduction.
o Apart from the above applications, it is also used in neural networks and the
data science field.
Although linear algebra is one of the crucial prerequisites for machine learning, it is not required to go in depth, at least not for a beginner; an understanding of the basic concepts is enough to start.
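To make the PCA point above concrete, here is a minimal numpy sketch of PCA via the singular value decomposition; the random data and the choice of two components are illustrative.

# PCA via SVD with plain numpy (illustrative random data).
import numpy as np

X = np.random.rand(50, 5)               # 50 samples, 5 features
X_centered = X - X.mean(axis=0)         # centre each feature

U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_reduced = X_centered @ Vt[:2].T       # project onto the top two principal components

explained = (S ** 2) / (S ** 2).sum()
print(X_reduced.shape, explained[:2])   # (50, 2) and the variance explained by PC1, PC2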

3. Probability
In the real world, there are various scenarios where the behavior or output can vary
for the same input. Probability has always been an essential part of Mathematics,
which measures the uncertainty of the event. The higher the probability of an event,
the more chances that event will occur. In Machine learning, probability helps to
make predictions with incomplete information. It helps in predicting the
likelihood of future events. With the help of probability, we can model elements of
uncertainty such as risk in a business process or transaction, i.e., we can work with
non-deterministic problems. Whereas in traditional programming, we deal with
deterministic problems; output is not affected by uncertainty. It also helps in
hypothesis testing and distributions such as Gaussian distribution and Probability
density function.

Probability theory and statistics are related fields; probability deals with future
events, whereas statistics deal with the analysis of past events.

Below are some commonly used Probability concepts:

o Maximum Likelihood Estimation
o Regression with Maximum Likelihood Estimation
o Joint, Marginal, and Conditional Probability
o Probability Distributions (Discrete, Continuous)
o Density Estimation
o Likelihood and Bayes' Theorem, etc.
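As a tiny worked example of Bayes' theorem from the list above, consider a screening test; the numbers (1% prevalence, 95% sensitivity, 90% specificity) are made up purely for illustration.

# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_disease = 0.01
p_pos_given_disease = 0.95              # sensitivity
p_pos_given_healthy = 0.10              # 1 - specificity

p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(p_disease_given_pos, 3))    # about 0.088: a positive result is far from certain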

4. Calculus
Calculus is also an integral part of Machine learning, but it is not required to go in-
depth of it at the beginner level; rather, only knowledge of basic concepts is enough.
In machine learning, the process of getting the best parameters is known as
optimization, and multivariate calculus helps in solving optimization problems in the
ML model. It helps in optimization and in getting good results from the model. For machine learning, we don't need to solve complex derivatives manually; rather, we must understand how differentiation works and how it is applied in vector calculus. Multivariate calculus is used not only for training algorithms but also for gradient descent. Some crucial concepts of multivariate calculus are derivatives, divergence, curvature, quadratic approximations, the Laplacian and Lagrangian, directional gradients, etc.

5. Programming Languages
Apart from the mathematical concepts, it is very important to have a good
knowledge of a programming language and coding capabilities for machine learning.
Some of the most popular programming languages for machine learning are as
follows:

Python

Python is a powerful and easy language that anyone can learn. It was initially released in 1991. Most developers and programmers choose Python as their favorite programming language for developing machine learning and AI solutions. The best part about Python is that it is very easy to learn compared to other programming languages, and it also offers great career opportunities for programmers and data scientists.

Python provides excellent community support and an extensive set of libraries, along with great flexibility. Python is a platform-independent language and provides extensive frameworks for deep learning and machine learning.

Python is also a very portable language, as it can be used on different platforms, including Linux, Windows, macOS, and UNIX.

R

R is one of the great languages for statistical processing in programming. It may not be the perfect language for machine learning, but it provides great performance while dealing with large numbers. Some built-in features, such as functional programming, object-oriented nature, and vectorized computation, make it a worthwhile programming language for machine learning.

R contains several packages that are specially designed for ML, which are:

o gmodels - This package provides different tools for the model fitting task.
o TM - It is a great framework that is used for text mining applications.
o RODBC - It is an ODBC interface.
o OneR - This package is used to implement the One Rule Machine Learning
classification algorithm.
Java:

Java is one of the most widely used programming languages among developers and programmers in the world. Java can easily run on various platforms thanks to the JVM (Java Virtual Machine). The best thing about Java is that once it is written and compiled on one platform, it does not need to be compiled again and again; this is known as the WORA (Write Once, Run Anywhere) principle. Java has many features that make it well suited for machine learning. These are as follows:

o Portable
o Memory manager
o Cross-platform.
o Easy to learn and use.
o Easy-to-code Algorithms.
o Built-in garbage collector.
o Swing and Standard Widget Toolkit.
o Simplified work with large-scale projects.
o Better user interaction.
o Easy to debug

Selecting the correct Programming Language


Apart from the above-mentioned programming languages, there are many other
programming languages that are being used in Machine learning, such as C, C++,
MATLAB, JavaScript, etc. However, choosing the best languages may become a
challenging task for beginners. In machine learning, Python and R are the two most
preferred languages because of their great benefits and vast libraries. However, other
general-purpose languages can also be used, such as Java, C, C++, but make sure
you are skilled with these languages.

Apart from the above programming and mathematics skills, awareness of some basic
concepts of machine learning is required to learn advanced concepts. These concepts
include machine learning types (Supervised, unsupervised, Reinforcement learning),
techniques, model building, etc.

Gradient Descent in Machine


Learning
Gradient Descent is known as one of the most commonly used optimization
algorithms to train machine learning models by means of minimizing errors between
actual and expected results. Further, gradient descent is also used to train Neural
Networks.

In mathematical terminology, Optimization algorithm refers to the task of


minimizing/maximizing an objective function f(x) parameterized by x. Similarly, in
machine learning, optimization is the task of minimizing the cost function
parameterized by the model's parameters. The main objective of gradient descent is
to minimize the convex function using iteration of parameter updates. Once these
machine learning models are optimized, these models can be used as powerful tools
for Artificial Intelligence and various computer science applications.

In this tutorial on Gradient Descent in Machine Learning, we will learn in detail about
gradient descent, the role of cost functions specifically as a barometer within
Machine Learning, types of gradient descents, learning rates, etc.

What is Gradient Descent or Steepest


Descent?
Gradient descent was initially proposed by Augustin-Louis Cauchy in the mid-19th century. Gradient descent is defined as one of the most commonly used iterative optimization algorithms in machine learning, used to train machine learning and deep learning models. It helps in finding the local minimum of a function.


The best way to define the local minimum or local maximum of a function using
gradient descent is as follows:
o If we move towards a negative gradient or away from the gradient of the
function at the current point, it will give the local minimum of that function.
o Whenever we move towards a positive gradient or towards the gradient of the
function at the current point, we will get the local maximum of that function.

Moving against the gradient in this way is also known as steepest descent (whereas moving along the gradient is gradient ascent). The main objective of using a gradient descent algorithm is to minimize the cost function using iteration. To achieve this goal, it performs two steps iteratively:

o Calculate the first-order derivative of the function to compute the gradient or slope at the current point.
o Move in the direction opposite to the gradient, stepping away from the current point by alpha times the gradient, where alpha is the learning rate: a tuning parameter of the optimization process that decides the length of the steps.
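In symbols, assuming the usual notation, each iteration applies the update rule

\theta := \theta - \alpha \, \nabla_{\theta} J(\theta)

where \theta denotes the model parameters, \alpha the learning rate, and \nabla_{\theta} J(\theta) the gradient of the cost function at the current point.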

What is Cost-function?
The cost function is defined as the measurement of difference or error between
actual values and expected values at the current position and present in the
form of a single real number. It helps to increase and improve machine learning
efficiency by providing feedback to this model so that it can minimize error and find
the local or global minimum. Further, it continuously iterates along the direction of
the negative gradient until the cost function approaches zero. At this steepest
descent point, the model will stop learning further. Although the cost function and the loss function are often considered synonymous, there is a slight difference between them: the loss function refers to the error of a single training example, while the cost function calculates the average error across the entire training set.

The cost function is calculated after making a hypothesis with initial parameters and
modifying these parameters using gradient descent algorithms over known data to
reduce the cost function.

Hypothesis:

Parameters:

Cost function:

Goal:
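The formula images referenced above are not reproduced in this text. Assuming the usual simple-linear-regression presentation, they are typically written as:

Hypothesis: h_{\theta}(x) = \theta_0 + \theta_1 x

Parameters: \theta_0, \theta_1

Cost function: J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^2

Goal: \min_{\theta_0, \theta_1} J(\theta_0, \theta_1)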

How does Gradient Descent work?


Before studying the working principle of gradient descent, we should know some basic concepts for finding the slope of a line in linear regression. The equation for simple linear regression is given as:

1. Y=mX+c

Where 'm' represents the slope of the line, and 'c' represents the intercept on the y-axis.
The starting point (shown in the figure above) is just an arbitrary point used to evaluate the performance. From this starting point, we take the first derivative (the slope) and use a tangent line to measure its steepness. This slope then informs the updates to the parameters (the weights and bias).

The slope is steeper at the starting point, but as new parameters are generated, the steepness gradually reduces until it approaches the lowest point, which is called the point of convergence.

The main objective of gradient descent is to minimize the cost function, i.e., the error between the expected and actual values. Minimizing the cost function requires two elements:

o Direction and learning rate

These two factors determine the partial-derivative calculations of future iterations and allow the algorithm to reach the point of convergence, i.e., the local or global minimum. Let's discuss the learning rate in brief:

Learning Rate:
It is defined as the step size taken to reach the minimum or lowest point. It is typically a small value that is evaluated and updated based on the behavior of the cost function. A high learning rate results in larger steps but risks overshooting the minimum, while a low learning rate results in small step sizes, which compromises overall efficiency but gives the advantage of more precision.
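Putting the pieces together, here is a minimal numpy sketch of gradient descent fitting the line Y = mX + c from above; the learning rate, iteration count, and synthetic data are illustrative choices.

# Batch gradient descent for simple linear regression y = m*x + c.
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=100)   # true slope 3, intercept 2

m, c = 0.0, 0.0
alpha = 0.1            # learning rate (step size)
n = len(x)

for _ in range(2000):
    y_pred = m * x + c
    grad_m = (2.0 / n) * np.sum((y_pred - y) * x)   # dJ/dm for mean squared error
    grad_c = (2.0 / n) * np.sum(y_pred - y)         # dJ/dc
    m -= alpha * grad_m                             # step against the gradient
    c -= alpha * grad_c

print(round(m, 2), round(c, 2))   # should end up close to 3 and 2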

Types of Gradient Descent


Based on the error in various training models, the Gradient Descent learning
algorithm can be divided into Batch gradient descent, stochastic gradient
descent, and mini-batch gradient descent. Let's understand these different types
of gradient descent:

1. Batch Gradient Descent:


Batch gradient descent (BGD) is used to find the error for each point in the training
set and update the model after evaluating all training examples. This procedure is
known as the training epoch. In simple words, it is a greedy approach where we have
to sum over all examples for each update.

Advantages of Batch gradient descent:

o It produces less noise than other types of gradient descent.
o It produces stable gradient descent convergence.
o It is computationally efficient, as all resources are used for processing all training samples.

2. Stochastic gradient descent


Stochastic gradient descent (SGD) is a type of gradient descent that processes one training example per iteration; in other words, it updates the parameters for each training example, one at a time. Since it requires only one training example at a time, it is easier to fit in allocated memory. However, it loses some computational efficiency compared to batch gradient descent, because its frequent updates cost more computation. Because of these frequent updates, the gradient is also noisier; however, this noise can sometimes be helpful in escaping local minima and finding the global minimum.

Advantages of Stochastic gradient descent:

In stochastic gradient descent (SGD), learning happens on every example, and it has a few advantages over other types of gradient descent.

o It fits more easily in the available memory.
o It is relatively faster to compute than batch gradient descent.
o It is more efficient for large datasets.

3. MiniBatch Gradient Descent:


Mini-batch gradient descent is a combination of batch gradient descent and stochastic gradient descent. It divides the training dataset into small batches and performs an update on each batch separately. Splitting the training dataset into smaller batches strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent. Hence, we can achieve a form of gradient descent with higher computational efficiency and a less noisy gradient.
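The splitting idea can be sketched as follows, reusing the simple linear-regression example from earlier; the batch size and learning rate are illustrative.

# One epoch of mini-batch gradient descent for y = m*x + c (illustrative parameters).
import numpy as np

def minibatch_gd_epoch(x, y, m, c, alpha=0.1, batch_size=16):
    idx = np.random.permutation(len(x))             # shuffle the training data
    for start in range(0, len(x), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = x[batch], y[batch]
        y_pred = m * xb + c
        m -= alpha * (2.0 / len(xb)) * np.sum((y_pred - yb) * xb)   # update per batch
        c -= alpha * (2.0 / len(xb)) * np.sum(y_pred - yb)
    return m, c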

Advantages of Mini Batch gradient descent:

o It is easier to fit in allocated memory.


o It is computationally efficient.
o It produces stable gradient descent convergence.

Challenges with the Gradient Descent


Although gradient descent is one of the most popular methods for optimization problems, it still has some challenges, such as the following:

1. Local Minima and Saddle Point:


For convex problems, gradient descent can find the global minimum easily, while for
non-convex problems, it is sometimes difficult to find the global minimum, where the
machine learning models achieve the best results.

Whenever the slope of the cost function is zero or very close to zero, the model stops learning. Apart from the global minimum, this zero slope can also occur at saddle points and local minima. A local minimum has a shape similar to the global minimum, where the slope of the cost function increases on both sides of the current point.

In contrast, at a saddle point the negative gradient only occurs on one side of the point, so the point is a local maximum along one direction and a local minimum along another. The name saddle point comes from the shape of a horse's saddle.

A local minimum is so named because the value of the loss function is minimal at that point within a local region. In contrast, the global minimum is so named because the value of the loss function is minimal there across the entire domain of the loss function.

2. Vanishing and Exploding Gradient


In a deep neural network, if the model is trained with gradient descent and
backpropagation, there can occur two more issues other than local minima and
saddle point.

Vanishing Gradients:
A vanishing gradient occurs when the gradient is smaller than expected. During backpropagation, this gradient becomes smaller and smaller, causing the earlier layers of the network to learn more slowly than the later layers. Once this happens, the weight parameters receive updates so small that they become insignificant.

Exploding Gradient:

The exploding gradient is just the opposite of the vanishing gradient: it occurs when the gradient is too large and creates an unstable model. In this scenario, the model weights grow so large that they end up being represented as NaN. One way to mitigate this problem is to reduce the complexity of the model, for example with dimensionality reduction.

Machine Learning Experts Salary in


India
Machine Learning (ML) is one of the most popular subfields of Artificial Intelligence (AI); it uses important concepts from mathematics and data science to create human-like intelligent machines. Machine learning is currently being introduced in technical and non-technical industries alike to solve various complex computational problems. It helps find patterns in datasets, and these patterns are used to make predictions through data modeling. Due to all these advantages, machine learning technology is becoming more popular day by day. In this article, 'Machine Learning Experts Salary in India', we will discuss the basic introduction of machine learning experts, their salaries, the factors affecting their salaries in India and other countries, the importance of machine learning experts, their job roles and responsibilities, the skills required to become a machine learning expert, etc.
Let's start with the introduction of Machine Learning Experts.

Machine Learning Expert: Introduction


A machine learning expert is a dedicated programmer who helps machines understand and pick up vital information as required. The primary work of machine learning experts ranges from creating and developing ML applications to enabling machines to take specific actions without explicit directions. Machine learning experts are quite similar to data scientists, as both work on heavy volumes of data and have strong skills in data handling. Hence, we can say that machine learning experts and data scientists work together: the data scientist extracts important insights from the datasets and shares them with the team, while the ML expert ensures that these models are used effectively.

Further, all machine learning experts are also responsible for customizing data for
analysis purposes, improving web and app-like experience, and identifying and
predicting business requirements. Moreover, machine learning experts are also
involved in robotics, web development, developing chatbots, data analytics,
intelligent application development, etc.


Machine Learning Experts: Roles and


Responsibilities
As similar to other technologies experts, machine learning experts also have their
own roles and responsibilities. Some of them are as follows:

o To create machine learning programs using predefined ML libraries.


o Identifying, examining, clustering, and mining data for data modeling
purposes.
o Modify machine learning programs for scalability purposes.
o Maintain the flow of data between the database and backend.
o Debugging of machine learning codes.
o Optimizing machine learning technologies in a production environment.
o Developing neural network models that support the business/customer use
cases.
o Applying machine learning techniques to real-world problems.

Machine Learning Experts: Salaries in


India
As per statistics, machine learning experts earn salaries according to their experience level, job title, company, location, and skills. On average, a machine learning expert earns between 7 and 8 lakhs per annum in total compensation. According to Glassdoor, the average machine learning expert's salary in India is 7.6 lakh per annum, while as per PayScale data, it is around 7 lakh per annum.

Machine Learning Experts Salary: Based on


Experience Level

Experience Level Salary

Fresher 5-6 LPA

Seniors 6-15 LPA

Experts >15 LPA

Machine Learning Experts Salary: Based on


Company

Company Salary

Deloitte 6.5 LPA

Amazon 8.3 LPA


Accenture 15.5-16 LPA

Factors impacting the salary of Machine


Learning experts
The following factors majorly impact the salary of machine learning experts:

Experience:
Like all other fields, total years of relevant domain working experience also matter for
deciding an employee's salary. It helps you understand the problems and give an
appropriate production-ready solution. Hence, experience is one of the most
important deciding factors in total compensation.

If a company hires a beginner or fresher-level candidate, then according to the sources, the average salary ranges between 5 and 6 lakh per annum (LPA). Similarly, senior-level candidates with 4-5 years of experience are offered 6-15 lakh per annum, and expert candidates with more than 6-8 years of experience can earn a good salary package of more than 15 lakh per annum.

Company:
Other than the experience of candidates, the company is also one of the most
important factors, which decides the salary of the machine learning experts in the
industry. It directly affects the salary and perks of the candidates.

Company Average Total Compensation

TCS 5 lakh per annum (LPA)

Cognizant 5.5 lakh per annum (LPA)

Wipro 5.6 lakh per annum (LPA)

Infosys 6.3 lakh per annum (LPA)

Accenture 7.75 lakh per annum (LPA)

Oracle 10.35 lakh per annum (LPA)

Google 12.15 lakh per annum (LPA)


Qualcomm 14.2 lakh per annum (LPA)

Professional Skills:
Professional skills are a major factor in deciding how much machine learning experts earn in the industry. Every hiring process is based on the skill sets of the candidates. Good skill sets, matched to industry demand, help a candidate clear interviews and perform well in a production environment. Hence, based on their professional skill sets, employees earn more salary and compensation according to company policies and terms & conditions.

Skills Average Total Compensation

Machine Learning (ML) 7 lakhs per annum (LPA)

Computer Vision 7.25 lakhs per annum (LPA)

Natural Language Processing (NLP) 7.3 lakhs per annum (LPA)

Deep Learning 7.5 lakhs per annum (LPA)

Artificial Intelligence (AI) 8 lakhs per annum (LPA)

Location:
In earlier days, location was undoubtedly an important factor in deciding an employee's salary. But nowadays, with remote working culture, location does not play as vital a role in compensation. However, it can still affect salary and compensation in terms of house rent allowance (cost of living in urban, rural, or metro cities) and travel allowances (cost of commuting or employee pick-up and drop). These types of compensation also attract many candidates.

Location Average Total Compensation

Bangalore 8.7 lakhs per annum (LPA)

Chennai 7.25 lakhs per annum (LPA)


Delhi 7 lakhs per annum (LPA)

Hyderabad 6.8 lakhs per annum (LPA)

Kolkata 6.4 lakhs per annum (LPA)

Mumbai 6.25 lakhs per annum (LPA)

Pune 6.15 lakhs per annum (LPA)

Noida 6 lakhs per annum (LPA)

Gurugram 5.35 lakhs per annum (LPA)

Skills Required for Machine Learning


Experts
If you are interested in the machine learning industry, then you should have deep knowledge of at least one modern programming language. As a machine learning expert, you should be proficient in one of the programming languages such as C, C++, Java, Python, R, Scala, MATLAB, etc.

Other than hands-on knowledge of the above-mentioned programming languages, you should have knowledge of various frameworks like Keras, TensorFlow, PyTorch, etc. Further, one should understand a few language-specific libraries and packages, such as SciPy, NumPy, pandas, and matplotlib in Python.

Moreover, as data science and machine learning are completely based on data, experience with RDBMS and NoSQL databases is necessary to extract and process data effectively. Hadoop, Spark, and Hive are a few important data processing ecosystems in the computer science world.
There are some other skills that help to become a machine learning expert:

o Knowledge of cloud-based container environments like Docker, Mesos,


Kubernetes.
o Knowledge of Natural Language Processing and Deep neural networks like
RNN, LSTM, GRU, CNN, etc.
o Basic Knowledge of working with GPU, Cuda/CuDNN, profiling, and low-level
optimizations.

Steps to Become a Machine Learning


Expert
Some major steps to become a sound machine learning expert as per industry
requirements are given below. It is recommended to follow these steps, which help
beginners as well as experts to move forward. These steps are as follows:

1. Learn Programming Language:


Having sound knowledge of a programming language is the most important step towards becoming a machine learning expert. Python is well suited to most machine learning tasks because of its ease of learning and its frameworks. So, one should initially learn the basic concepts of Python and then move to the advanced level to become a machine learning specialist in the industry.

2. Knowledge of Mathematics for Machine


Learning:
Mathematics is undoubtedly the most important foundation step to learning
machine learning algorithms. One must have to gain expertise in Statistics,
Probability, Derivatives, Linear Algebra, and Partial Derivatives.

3. Learning the basic concept of Machine Learning:


After learning the above steps, you should need to learn the basic concepts of
machine learning in order to become a machine learning expert in the industry.
There are a few topics that you must learn:

o Linear Regression
o Logistic Regression
o K Nearest Neighbours (KNN)
o Decision Tree
o Random Forest Algorithm
o Support Vector Machine (SVM)
o K Means Clustering
o Cross-Validation and Bias-Variance Trade-off

4. Frameworks to build Machine learning concept


To implement machine learning algorithms and concepts easily, an open-source framework is essential. You need to learn these frameworks and their libraries in your preferred programming language. Some of the popular frameworks are TensorFlow, Keras, Torch, PyTorch, etc.

5. Understanding concepts of Natural Language Processing (NLP) and Deep Learning (DL):

After completing the above-mentioned steps, you will have a strong foundation
in machine learning. If you want to gain expertise in this field, then you should not
stop here. You should explore further topics such as Natural Language Processing,
Deep Learning, Reinforcement Learning, etc. Gaining expertise in any of these
areas would make you a domain expert.

Machine Learning Experts' Salary in other countries

Similar to the previously mentioned factors affecting salary and compensation, a
Machine Learning Expert's salary also varies across countries. The average annual
income of a machine learning expert in the USA is 120K USD, and in the United
Kingdom (UK) it is 50K GBP.

Here's the list of salaries of Machine Learning experts in other countries:

Country                             Salary (Annual)

United States of America (USA)      $140,675
Canada                              $93,684
Australia                           $106,532

Conclusion
Machine Learning is a very powerful technology that offers high salary packages to
its experts in India as well as in other countries. Salaries are highly impacted by total
years of experience in the corresponding industry, the scale of the company, the skills
of the candidate, and the location. Generally, you can earn 5-6 lakhs per annum (LPA)
at the very beginner level, and after gaining expertise, you can earn exponentially
higher salary packages in IT hubs in India such as Bangalore, Delhi, Noida, Pune,
Hyderabad, and Kolkata.

Machine Learning Models


A machine learning model is defined as a mathematical representation of the
output of the training process. Machine learning is the study of different algorithms
that can improve automatically through experience & old data and build the model.
A machine learning model is similar to computer software designed to recognize
patterns or behaviors based on previous experience or data. The learning algorithm
discovers patterns within the training data, and it outputs an ML model which
captures these patterns and makes predictions on new data.

Let's understand an example of an ML model where we are creating an app to
recognize the user's emotions based on facial expressions. Creating such an app
is possible with machine learning models, where we train a model by feeding it
images of faces with various emotions labelled on them. Whenever this app is used
to determine the user's mood, it compares the new face against what it learned
from the labelled data and predicts the user's emotion.

Hence, in simple words, we can say that a machine learning model is a simplified
representation of something or a process. In this topic, we will discuss different
machine learning models and their techniques and algorithms.
What is a Machine Learning Model?
Machine Learning models can be understood as programs that have been trained to
find patterns within new data and make predictions. These models are represented
as a mathematical function that takes requests in the form of input data, makes
predictions on the input data, and then provides an output in response. First, these
models are trained over a set of data, and then they are provided an algorithm to
reason over that data, extract patterns from the fed data, and learn from it. Once
these models are trained, they can be used to predict on unseen datasets.


There are various types of machine learning models available based on different
business goals and data sets.

Classification of Machine Learning Models:


Based on different business goals and data sets, there are three learning models for
algorithms. Each machine learning algorithm settles into one of the three models:

o Supervised Learning
o Unsupervised Learning
o Reinforcement Learning

Supervised Learning is further divided into two categories:

o Classification
o Regression

Unsupervised Learning is also divided into below categories:


o Clustering
o Association Rule
o Dimensionality Reduction

1. Supervised Machine Learning Models


Supervised Learning is the simplest machine learning model to understand, in which
input data is called training data and has a known label or result as an output. So, it
works on the principle of input-output pairs. It requires creating a function that can
be trained using a training data set, which is then applied to unknown data to make
predictions. Supervised learning is task-based and tested on labelled data sets.

We can implement a supervised learning model on simple real-life problems. For
example, if we have a dataset consisting of age and height, then we can build a
supervised learning model to predict a person's height based on their age.

Supervised Learning models are further classified into two categories:

Regression
In regression problems, the output is a continuous variable. Some commonly used
Regression models are as follows:

a) Linear Regression

Linear regression is the simplest machine learning model, in which we try to predict
one output variable using one or more input variables. The representation of linear
regression is a linear equation, which combines a set of input values (x) with the
predicted output (y) for those input values. It is represented in the form of a line:

y = bx + c

The main aim of the linear regression model is to find the best-fit line for the
data points.

Linear regression is extended to multiple linear regression (find a plane of best fit)
and polynomial regression (find the best fit curve).
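
As a rough sketch of this idea (assuming scikit-learn is available; the numbers below are made up), a simple linear regression can be fitted and used for prediction as follows:

# A minimal sketch of simple linear regression on hypothetical data.
import numpy as np
from sklearn.linear_model import LinearRegression

# x: input values, y: observed outputs (roughly y = 2x + 1 plus noise)
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])

model = LinearRegression().fit(x, y)            # finds the best-fit line y = bx + c
print("slope b:", model.coef_[0])               # estimated b
print("intercept c:", model.intercept_)         # estimated c
print("prediction for x = 6:", model.predict([[6.0]])[0])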

b) Decision Tree

Decision trees are the popular machine learning models that can be used for both
regression and classification problems.

A decision tree uses a tree-like structure of decisions along with their possible
consequences and outcomes. In this, each internal node is used to represent a test
on an attribute, and each branch is used to represent the outcome of the test. The
more nodes a decision tree has, the more complex decision rules it can capture,
although very deep trees can overfit the training data.

The advantage of decision trees is that they are intuitive and easy to implement, but
on their own they often lack the accuracy of ensemble methods.

Decision trees are widely used in operations research, specifically in decision
analysis and strategic planning, and mainly in machine learning.

c) Random Forest

Random Forest is the ensemble learning method, which consists of a large number of
decision trees. Each decision tree in a random forest predicts an outcome, and the
prediction with the majority of votes is considered as the outcome.
A random forest model can be used for both regression and classification problems.

For the classification task, the outcome of the random forest is taken from the
majority of votes. Whereas in the regression task, the outcome is taken from the
mean or average of the predictions generated by each tree.
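
A minimal sketch of this voting behaviour, assuming scikit-learn and its bundled Iris dataset, might look like this:

# Sketch: a random forest classifier where the majority vote of 100 trees gives the prediction.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)   # 100 decision trees
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))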

d) Neural Networks

Neural networks are a subset of machine learning and are also known as artificial
neural networks. Neural networks are made up of artificial neurons and designed in a
way that resembles the structure and working of the human brain. Each artificial
neuron connects with many other neurons in a neural network, and millions of such
connected neurons create a sophisticated cognitive structure.

Neural networks consist of a multilayer structure, containing one input layer, one or
more hidden layers, and one output layer. As each neuron is connected with another
neuron, it transfers data from one layer to the other neuron of the next layers. Finally,
data reaches the last layer or output layer of the neural network and generates
output.

Neural networks depend on training data to learn and improve their accuracy.
However, a perfectly trained & accurate neural network can cluster data quickly and
become a powerful machine learning and AI tool. One of the best-known neural
networks is Google's search algorithm.

Classification
Classification models are the second type of Supervised Learning techniques, which
are used to generate conclusions from observed values in categorical form. For
example, a classification model can identify whether an email is spam or not, or
whether a buyer will purchase a product or not. Classification algorithms predict
categorical classes and categorize the output into different groups.

In classification, a classifier model is designed that classifies the dataset into different
categories, and each category is assigned a label.

There are two types of classifications in machine learning:

o Binary classification: If the problem has only two possible classes, it is called
binary classification. For example, cat or dog, Yes or No.
o Multi-class classification: If the problem has more than two possible classes,
it is called multi-class classification.

Some popular classification algorithms are as below:

a) Logistic Regression

Logistic Regression is used to solve classification problems in machine learning. It is
similar to linear regression but is used to predict categorical variables. It can predict
the output as Yes or No, 0 or 1, True or False, etc. However, rather than giving exact
values, it provides probabilistic values between 0 and 1.
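
A small sketch of this behaviour, using scikit-learn and a made-up "hours studied vs. passed" dataset, could look like the following:

# Sketch: logistic regression returning class labels and probabilities between 0 and 1.
import numpy as np
from sklearn.linear_model import LogisticRegression

hours = np.array([[1], [2], [3], [4], [5], [6]])   # hypothetical hours studied
passed = np.array([0, 0, 0, 1, 1, 1])              # 1 = passed, 0 = failed

clf = LogisticRegression().fit(hours, passed)
print(clf.predict([[3.5]]))          # hard class label: 0 or 1
print(clf.predict_proba([[3.5]]))    # probabilities for each class, between 0 and 1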

b) Support Vector Machine

Support vector machine or SVM is the popular machine learning algorithm, which is
widely used for classification and regression tasks. However, specifically, it is used to
solve classification problems. The main aim of SVM is to find the best decision
boundaries in an N-dimensional space, which can segregate data points into classes,
and the best decision boundary is known as Hyperplane. SVM selects the extreme
vector to find the hyperplane, and these vectors are known as support vectors.
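
As a rough illustration (assuming scikit-learn; the blob data below is synthetic), a linear SVM can be fitted and its support vectors inspected like this:

# Sketch: a linear SVM finding a separating hyperplane between two synthetic clusters.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
svm = SVC(kernel="linear").fit(X, y)

print("support vectors per class:", svm.n_support_)   # the extreme vectors used for the boundary
print("prediction for first point:", svm.predict(X[:1]))
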
c) Naïve Bayes

Naïve Bayes is another popular classification algorithm used in machine learning. It is
called so as it is based on Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B), and it follows
the naïve (independence) assumption between the features:

Each naïve Bayes classifier assumes that the value of a specific variable is
independent of any other variable/feature. For example, if a fruit needs to be
classified based on color, shape, and taste. So yellow, oval, and sweet will be
recognized as mango. Here each feature is independent of other features.
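
A minimal sketch of this idea, with entirely hypothetical fruit features and Gaussian naïve Bayes from scikit-learn, might be:

# Sketch: naive Bayes treating each feature as independent given the class.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# features: [colour code, shape code, sweetness score]; labels: 0 = mango, 1 = apple (made up)
X = np.array([[1, 2, 8], [1, 2, 9], [2, 1, 6], [2, 1, 5]])
y = np.array([0, 0, 1, 1])

nb = GaussianNB().fit(X, y)
print(nb.predict([[1, 2, 7]]))   # combines P(class) with P(feature | class) for each feature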

2. Unsupervised Machine learning models


Unsupervised Machine learning models implement the learning process opposite to
supervised learning, which means it enables the model to learn from the unlabeled
training dataset. Based on the unlabeled dataset, the model predicts the output.
Using unsupervised learning, the model learns hidden patterns from the dataset by
itself without any supervision.

Unsupervised learning models are mainly used to perform three tasks, which are as
follows:
o Clustering
Clustering is an unsupervised learning technique that involves clustering or
grouping the data points into different clusters based on similarities and
differences. The objects with the most similarities remain in the same group,
and they have no or very few similarities with objects of other groups (see the
short sketch after this list).
Clustering algorithms are widely used in different tasks such as image
segmentation, statistical data analysis, market segmentation, etc.
Some commonly used clustering algorithms are K-means clustering,
hierarchical clustering, DBSCAN, etc.

o Association Rule Learning


Association rule learning is an unsupervised learning technique, which finds
interesting relations among variables within a large dataset. The main aim of
this learning algorithm is to find the dependency of one data item on another
data item and map those variables accordingly so that it can generate
maximum profit. This algorithm is mainly applied in Market Basket analysis,
Web usage mining, continuous production, etc.
Some popular algorithms of Association rule learning are Apriori Algorithm,
Eclat, FP-growth algorithm.
o Dimensionality Reduction
The number of features/variables present in a dataset is known as the
dimensionality of the dataset, and the technique used to reduce the
dimensionality is known as the dimensionality reduction technique.
Although more data provides more accurate results, it can also affect the
performance of the model/algorithm, for example by causing overfitting
issues. In such cases, dimensionality reduction techniques are used.
"It is a process of converting the higher dimensions dataset into lesser
dimensions dataset ensuring that it provides similar information."
Popular dimensionality reduction methods include PCA (Principal
Component Analysis), Singular Value Decomposition (SVD), etc.
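
As a rough illustration of the first and third tasks above (clustering and dimensionality reduction), the sketch below groups a few made-up points with K-means and then compresses them with PCA; scikit-learn is assumed to be installed and all data is hypothetical:

# Sketch: unsupervised learning on six unlabeled 2-D points.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

points = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
                   [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# Clustering: group the points into 2 clusters without any labels
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print("cluster labels:", kmeans.labels_)

# Dimensionality reduction: project the 2-D points down to 1 dimension
reduced = PCA(n_components=1).fit_transform(points)
print("reduced shape:", reduced.shape)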

Reinforcement Learning
In reinforcement learning, the algorithm learns actions for a given set of states that
lead to a goal state. It is a feedback-based learning model that takes feedback
signals after each state or action by interacting with the environment. This feedback
works as a reward (positive for each good action and negative for each bad action),
and the agent's goal is to maximize the positive rewards to improve their
performance.

The behaviour of the model in reinforcement learning is similar to human learning, as
humans learn things through experience as feedback while interacting with the
environment.

Below are some popular algorithms that come under reinforcement learning:

o Q-learning: Q-learning is one of the popular model-free algorithms of
reinforcement learning, which is based on the Bellman equation.

It aims to learn the policy that can help the AI agent take the best action for
maximizing the reward under a specific circumstance. It maintains a Q-value for
each state-action pair that indicates the reward for following a given state path,
and it tries to maximize the Q-value (a minimal update-rule sketch follows this
list).

o State-Action-Reward-State-Action (SARSA): SARSA is an on-policy
algorithm based on the Markov decision process. It uses the action performed
by the current policy to learn the Q-value. The SARSA algorithm stands for
State Action Reward State Action, which symbolizes the tuple (s, a, r, s', a').
o Deep Q Network: DQN or Deep Q Neural Network is Q-learning within a
neural network. It is basically employed in big state-space environments
where defining a Q-table would be a complex task. In such a case, rather
than using a Q-table, a neural network estimates the Q-values for each action
based on the state.
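
As a minimal, library-free sketch of the Q-learning idea referred to above (the state/action sizes, learning rate, and transition below are all made up for illustration):

# Sketch of the tabular Q-learning update rule on a toy Q-table.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))      # Q-table: expected reward for each state-action pair
alpha, gamma = 0.1, 0.9                  # learning rate and discount factor

def q_update(state, action, reward, next_state):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])

# one hypothetical transition: in state 0, action 1 gave reward +1 and led to state 2
q_update(state=0, action=1, reward=1.0, next_state=2)
print(Q[0])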

Training Machine Learning Models


Once the Machine learning model is built, it is trained in order to get the appropriate
results. To train a machine learning model, one needs a huge amount of pre-
processed data. Here pre-processed data means data in structured form with
reduced null values, etc. If we do not provide pre-processed data, then there are
huge chances that our model may perform terribly.

How to choose the best model?


In the above section, we have discussed different machine learning models and
algorithms. But one of the most confusing questions for any beginner is:
"which model should I choose?". The answer is that it depends mainly on the
business or project requirement. Apart from this, it also depends on the
associated attributes, the volume of the available dataset, the number of features,
complexity, etc. However, in practice, it is recommended to always start with the
simplest model that can be applied to the particular problem and then gradually
increase the complexity and test the accuracy with the help of parameter tuning
and cross-validation.

Difference between Machine learning


model and Algorithms
One of the most confusing questions among beginners is: are machine learning
models and algorithms the same? In various cases in machine learning
and data science, these two terms are used interchangeably.

The answer to this question is No: a machine learning model is not the same
as an algorithm. In a simple way, an ML algorithm is like a procedure or method
that runs on data to discover patterns from it and generate the model. At the
same time, a machine learning model is like a computer program that generates
output or makes predictions. More specifically, when we train an algorithm with
data, it becomes a model.

Machine Learning Model = Model Data + Prediction Algorithm

Machine Learning Books


Machine Learning is one of the most popular and fastest-growing domains in the
computer science world. Machine Learning and Artificial Intelligence are rapidly
growing and providing incredible power to humans. They help tasks run in an
automated manner and make our lives more comfortable.

Let's see how Google's CEO, Mr. Sundar Pichai, explains Artificial Intelligence
(AI) and Machine Learning (ML):

'Machine learning is a core, transformative way by which we're rethinking
everything we're doing. We're thoughtfully applying it across all our products,
be it search, ads, YouTube, or Play. We're in the early days, but you'll see us in a
systematic way think about how we can apply machine learning to all these
areas.'

- Google's CEO, Mr. Sundar Pichai

Although Machine learning is continuously growing, changing the way we live,
and trending among all technologies, we usually hear about advanced
implementations in the news that may seem scary and inaccessible. However,
no such invention has so far proved hazardous to humans; rather, it provides us
more benefits and new opportunities.

In this article, ''Machine Learning Books,'' we will briefly discuss the most
popular books that will help you start your journey from beginner to advanced
level. If anyone is curious to know about the best machine learning books, this
article will be very helpful for them. Here we are going to discuss some of the
best recently-published titles on deep learning and machine learning.

1. Hands-On Machine Learning with Scikit-Learn and TensorFlow (2nd Edition), written by Aurélien Géron
13. Why should you read this book?
14. Aurelien Geron has shared his ideas and presented theory with examples in a
very effective manner. Everyone can learn concepts, tools, and techniques to
build an intelligent system quickly through this book. So, if you really want to
get started with a practical approach, then go ahead and just buy it instantly.
This book uses concrete examples, minimal theory, and two production-ready
Python frameworks (Scikit-Learn and TensorFlow 2.0), which help you to
gain knowledge of building an intelligent system. You can use concepts for
your interview as well as your job.
15. This book consists of two parts:
16. Part 1: The first part is Scikit-Learn which helps to understand basic machine
learning tasks such as simple Linear Regression.
17. Part 2: The second part has been significantly updated and employs Keras
and TensorFlow 2.0, which helps to understand the concepts of advanced
machine learning methods using Deep learning networks. Further, each
chapter ends with an exercise that helps you to apply the knowledge that
you've learned in the entire chapter and boost your confidence.
18. Where you can get this book:
19. You can get this book online from the Amazon marketplace or from any
store.
20. Amazon Link: https://www.amazon.in/Hands-Machine-Learning-Scikit-Learn-
TensorFlow-ebook/dp/B07XGF2G87

2. The Hundred-Page Machine Learning Book, written by Andriy Burkov
24. Why should you read this book?
25. This book does not need too much introduction as this book is available in the
best seller category on the Amazon marketplace. This is unbelievable to
everyone that, unlike other typical 500-1000 pages Machine Learning
books, Andriy Burkov has just finished this book in 100 pages and also
explained the core concepts in just a few words. This book can be very helpful
for beginners in this industry as well as experts who want to enhance their
knowledge and want to gain a broad view in this field.
26. Where to buy: This book is distributed on the ''read first, buy later" principle,
which means first you can read this book online, and when you think this is
helpful, then you can buy it on the Amazon marketplace site.
27. Amazon Link: https://www.amazon.com/Hundred-Page-Machine-Learning-
Book-ebook/dp/B07MGCNKXB

3. Building Machine Learning Powered Applications: Going from Idea to Product, written by Emmanuel Ameisen
31. Why should you read this book?
32. Emmanuel Ameisen has invested his 13 months on just 250 pages to write
this book which includes how to ship Machine Learning in practice. If you
want to learn the necessary skills to design, build and deploy applications
powered by machine learning, then this book can be very helpful, as it ends
with a hands-on exercise that builds your concepts from machine learning
models to production. This book is appreciated by all data scientists, software
engineers, product managers, and experts also due to the explanation of
machine learning applications in a good step-by-step manner. This book is
distributed in four parts. In the first part, you can learn how to plan a
Machine Learning model and measure success. In the 2nd part, you can learn
to build a machine learning model. In the 3rd part, you can learn methods to
improve the model to fulfill your original vision. Further, in the last or 4th part,
you can build your deployment and monitoring strategies.
33. Where to buy: This book is highly recommended by data scientists, software
engineers, and product managers. You can purchase this book
on Amazon or O'Reilly Shop.
34. Amazon Link: https://www.amazon.com/Building-Machine-Learning-
Powered-Applications/dp/149204511X/
35. O'Reilly Shop: https://www.oreilly.com/library/view/building-machine-
learning/9781492045106/

4. Grokking Deep Learning, written by Andrew W. Trask
39. Why should you read this book?
40. Grokking Deep Learning was written by Andrew W.Trask. In this book, Mr.
Andrew has described how to build deep learning neural network from
scratch. Using only Python and maths supporting libraries, NumPy, you will
train your own neural networks to see and understand images, translate text
into various languages and even write like William Shakespeare. When you're
done, you'll be fully prepared to move on to mastering deep learning
frameworks.
41. Where to buy: This book covers all the basic principles and approaches of
learning machine learning and neural networks using low-level building blocks
with NumPy. You can purchase this book on Amazon or Manning
Publications.
42. Amazon Link:https://www.amazon.com/Grokking-Deep-Learning-Andrew-
Trask/dp/1617293709
43. Manning Publications: https://www.manning.com/books/grokking-deep-
learning

5. Deep Learning with Python, written by Francois Chollet
47. Why should you read this book?
48. This book consists of core concepts of deep learning using the python
language and Keras library. Francois Chollet, who is well known for the
creation of Keras and Google Artificial Intelligence researcher, wrote this book
with intuitive explanations and practical examples. This book helps you to
explore core concepts and their practical applications in computer vision, NLP,
and learning models. After completion of this book, you will get to know all
hands-on skills as well as a theoretical understanding of deep learning using
python language and libraries.
49. Where to buy: Readers should have basic Python skills before purchasing this
book. Further, if you are even a beginner in Keras, TensorFlow, and the
Machine Learning field, then this book can help you a lot. You can purchase
this book on Amazon marketplace, manning publications, or O'Reilly websites.
The links are given below:
50. Amazon Link: https://www.amazon.com/Deep-Learning-Python-Francois-
Chollet/dp/1617294438/
51. Manning Publications: https://www.manning.com/books/deep-learning-
with-python
52. O'Reilly: https://www.oreilly.com/library/view/deep-learning-with/
9781617294433/

6. Deep Learning, written by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
56. Why should you read this book?
57. This book is considered the Bible of Deep Learning, written by three
experts: Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Although the
book is full of technical mathematical principles and the authors have explained
each concept in a perfect manner, it is not recommended if you are just starting
your deep learning journey, because to understand all the concepts you first
need to build your algebraic foundation; only then should you consider this
book.
58. This book has comprehensive mathematics and conceptual background in
Linear Algebra, Probability theory, information theory, numerical computation,
and Machine Learning. Along with deep learning techniques, the authors of
this book explained deep feedforward networks, regularization,
optimization algorithms, convolutional networks, sequence modeling,
and practical methodology in a very easy manner. Further, besides deep
learning technologies, you can enhance knowledge of various applications
such as natural language processing, speech recognition, computer vision,
online recommendation systems, bioinformatics, and videogames. This
book covers all theoretical topics such as autoencoders, representation
learning, structured probabilistic models, Monte Carlo methods, the partition
function, approximate inference, and deep generative models, etc.
59. Where to buy: This book can be very helpful for students as well as experts or
researchers who are planning to do some different in this industry. You can
purchase this book on Amazon.
60. Amazon Link: https://www.amazon.com/Deep-Learning-Adaptive-
Computation-Machine/dp/0262035618/

7. Reinforcement Learning: An Introduction (2nd Edition), written by Richard S. Sutton and Andrew G. Barto
64. Why should you read this book?
65. This book is available in various categories such as Machine Learning,
Reinforcement Learning, Deep Learning, Deep Reinforcement Learning, and
Artificial Intelligence.
66. This book was written by Mr. Richard S. Sutton and Andrew G. Barto. If
Deep Learning book (mentioned above) is considered as the Bible of Deep
Learning, then this book is also considered as the Bible of Reinforcement
Learning. If you really want to start a career in the Reinforcement Learning
field, then this book can be very helpful for you.
67. In this book, the author has significantly explained their clear ideas on Artificial
Intelligence algorithms. Similar to the first edition, the second edition is also
focused on core learning algorithms such as UCB, Expected Sarsa, and Double
Learning. Further, this book is distributed in various parts, which includes
topics such as artificial neural networks, Fourier basis, policy gradient methods,
reinforcement learning's relationships to psychology and neuroscience,
AlphaGo, AlphaGo Zero, Atari game playing, and IBM Watson's wagering
strategy.
68. Where to buy: You can purchase this book on the Amazon marketplace and
also read free online on the below-given link.
69. Amazon link: https://www.amazon.com/dp/0262039249/
70. Read here free
PDF: https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook
2ndEd.pdf

8. Deep Reinforcement Learning Hands-On (2nd Edition), written by Maxim Lapan
74. Why should you read this book?
75. This book is written by Mr. Maxim Lapan and helps you to understand the
practical approaches of Reinforcement Learning by balancing theory with
hands-on coding practice. As per different reviews, if you really want to gain
hands-on experience along with theoretical knowledge of reinforcement
learning, then this book is the best fit. This book is also available in various
categories such as Machine Learning, Reinforcement Learning, Deep Learning,
Deep Reinforcement Learning, and Artificial Intelligence.
76. Where to buy: You can purchase this book on Amazon or the Packt website.
77. Amazon link: https://www.amazon.com/Deep-Reinforcement-Learning-
Hands-optimization/dp/1838826998
78. Packt Link: https://www.packtpub.com/product/deep-reinforcement-
learning-hands-on/9781788834247

9. Learning From Data, written by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin
82. Why should you read this book?
83. This book is written by three authors Yaser S. Abu-Mostafa, Malik Magdon-
Ismail, and Hsuan-Tien Lin. If you really want to enhance your knowledge
about the core concepts of Machine Learning, then this is the best book to
follow.
84. This book contains the complete introduction of Machine Learning and is
freely available to access online. Machine Learning is employed in various
industries such as engineering, science, finance, and commerce, etc. This
technology helps you to enable a computational system and improve the
performance through old records. Hence, this book is designed as a crash
course of machine learning and contains core topics that really all students
and experts should know.
85. Where to Buy:
86. This book is available online for free access and designed in e-chapters, and
regularly updated with current trends in Machine Learning. You can purchase
this book on Amazon also.
87. Amazon Link: https://www.amazon.com/Learning-Data-Yaser-S-Abu-
Mostafa/dp/1600490069

10. The Book of Why, written by Judea Pearl and Dana Mackenzie
91. Why should you read this book?
92. This book is jointly written by Judea Pearl and Dana Mackenzie, and it is
the most controversial book on this list. In this book, the authors
introduce a causality framework that goes beyond curve-fitting Machine
Learning or Deep Learning models and also share their thoughts on achieving
Artificial General Intelligence.
93. This book is based on the principle of "Correlation is not causation."
94. After reading this book, you will get to know how to manage and think about
an easy thing and how to answer hard questions. Further, this book shows us
the essence of human thought and the key to artificial intelligence.
95. Where to Buy:
96. If you want to enhance your thinking capability, then this book is probably the
best available book over the internet. You can purchase this book on Amazon.
97. Amazon
Link: https://www.amazon.com/Book-Why-Science-Cause-Effect/dp/0465097
60X

Linear Algebra for Machine learning


Machine learning has a strong connection with mathematics. Each machine learning
algorithm is based on the concepts of mathematics & also with the help of
mathematics, one can choose the correct algorithm by considering training time,
complexity, number of features, etc. Linear Algebra is an essential field of
mathematics, which defines the study of vectors, matrices, planes, mapping,
and lines required for linear transformation.

The term Linear Algebra was initially introduced in the early 18th century to find the
unknowns in linear equations and solve equations easily; hence it is an
important branch of mathematics that helps in studying data. Also, no one can deny
that Linear Algebra is undoubtedly an important and primary tool for the
applications of Machine Learning. It is also a prerequisite to start learning Machine
Learning and data science.

Linear algebra plays a vital role and is a key foundation in machine learning, and it
enables ML algorithms to run on a huge number of datasets.

The concepts of linear algebra are widely used in developing algorithms in machine
learning. Although it is used almost in each concept of Machine learning, specifically,
it can perform the following task:


o Optimization of data.
o Applicable in loss functions, regularisation, covariance matrices, Singular Value
Decomposition (SVD), Matrix Operations, and support vector machine
classification.
o Implementation of Linear Regression in Machine Learning.

Besides the above uses, linear algebra is also used in neural networks and the data
science field.

Basic mathematics principles and concepts like Linear algebra are the foundation of
Machine Learning and Deep Learning systems. To learn and understand Machine
Learning or Data Science, one needs to be familiar with linear algebra and
optimization theory. In this topic, we will explain all the Linear algebra concepts
required for machine learning.

Note: Although linear algebra is a must-know part of mathematics for machine learning,
it is not required to master it. It means it is not required to be an expert in linear
algebra; instead, a good working knowledge of these concepts is more than enough for
machine learning.

Why learn Linear Algebra before learning Machine Learning?
Linear Algebra is to Machine Learning what flour is to a bakery. Just as a cake is
based on flour, every Machine Learning model is also based on Linear Algebra.
Further, just as the cake needs more ingredients like eggs, sugar, and cream,
Machine Learning also requires more concepts, such as vector calculus,
probability, and optimization theory. So, we can say that Machine Learning creates a
useful model with the help of the above-mentioned mathematical concepts.

Below are some benefits of learning Linear Algebra before Machine learning:

o Better Graphic experience


o Improved Statistics
o Creating better Machine Learning algorithms
o Estimating the forecast of Machine Learning
o Easy to Learn

Better Graphics Experience:


Linear Algebra helps to provide better graphical processing in Machine Learning like
Image, audio, video, and edge detection. These are the various graphical
representations supported by Machine Learning projects that you can work on.
Further, parts of the given data set are trained based on their categories by classifiers
provided by machine learning algorithms. These classifiers also remove the errors
from the trained data.

Moreover, Linear Algebra helps solve and compute large and complex data set
through a specific terminology named Matrix Decomposition Techniques. There
are two most popular matrix decomposition techniques, which are as follows:

o QR decomposition
o LU decomposition

Improved Statistics:
Statistics is an important concept to organize and integrate data in Machine
Learning. Also, linear Algebra helps to understand the concept of statistics in a better
manner. Advanced statistical topics can be integrated using methods, operations,
and notations of linear algebra.

Creating better Machine Learning algorithms:


Linear Algebra also helps to create better supervised as well as unsupervised
Machine Learning algorithms.

Few supervised learning algorithms can be created using Linear Algebra, which is as
follows:

o Logistic Regression
o Linear Regression
o Decision Trees
o Support Vector Machines (SVM)

Further, below are some unsupervised learning algorithms listed that can also be
created with the help of linear algebra as follows:

o Singular Value Decomposition (SVD)
o Clustering
o Component Analysis
With the help of Linear Algebra concepts, you can also self-customize the various
parameters in the live project and understand in-depth knowledge to deliver the
same with more accuracy and precision.

Estimating the forecast of Machine Learning:


If you are working on a Machine Learning project, you need to be broad-minded
and able to bring in more perspectives. Hence, in this regard, you should increase
your awareness of and affinity for Machine Learning concepts. You can begin by
setting up different graphs and visualizations, using various parameters for
diverse machine learning algorithms, or taking up things that others around you
might find difficult to understand.

Easy to Learn:
Linear Algebra is an important department of Mathematics that is easy to
understand. It is taken into consideration whenever there is a requirement of
advanced mathematics and its applications.

Minimum Linear Algebra for Machine Learning
Notation:
Notation in linear algebra enables you to read algorithm descriptions in papers,
books, and websites to understand the algorithm's working. Even if you use for-loops
rather than matrix operations, you will be able to piece things together.

Operations:
Working with an advanced level of abstractions in vectors and matrices can make
concepts clearer, and it can also help in the description, coding, and even thinking
capability. In linear algebra, it is required to learn the basic operations such as
addition, multiplication, inversion, transposing of matrices, vectors, etc.
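
As a short sketch of these basic operations, using NumPy with made-up matrices:

# Sketch: the basic matrix/vector operations mentioned above.
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
v = np.array([1.0, 2.0])

print(A + B)              # element-wise addition
print(A @ B)              # matrix multiplication
print(A @ v)              # matrix-vector product
print(A.T)                # transpose
print(np.linalg.inv(A))   # inverse (only for non-singular square matrices)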

Matrix Factorization:
One of the most recommended areas of linear algebra is matrix factorization,
specifically matrix decomposition methods such as SVD and QR.
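
For example, a QR factorization (one of the decompositions mentioned above) can be computed with NumPy as a quick sketch:

# Sketch: QR factorization of a made-up matrix.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0], [0.0, 1.0]])
Q, R = np.linalg.qr(A)
print(Q.shape, R.shape)        # Q has orthonormal columns, R is upper triangular
print(np.allclose(Q @ R, A))   # the factors reconstruct A
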
Examples of Linear Algebra in Machine Learning
Below are some popular examples of linear algebra in Machine learning:

o Datasets and Data Files


o Linear Regression
o Recommender Systems
o One-hot encoding
o Regularization
o Principal Component Analysis
o Images and Photographs
o Singular-Value Decomposition
o Deep Learning
o Latent Semantic Analysis

1. Datasets and Data Files


Each machine learning project works on the dataset, and we fit the machine learning
model using this dataset.

Each dataset resembles a table-like structure consisting of rows and columns, where
each row represents an observation and each column represents a feature/variable.
This dataset is handled as a matrix, which is a key data structure in Linear Algebra.

Further, when this dataset is divided into input and output for the supervised
learning model, it represents a Matrix(X) and Vector(y), where the vector is also an
important concept of linear algebra.
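
A minimal sketch of this split, using NumPy and made-up rows, could be:

# Sketch: a tabular dataset as an input matrix X and an output vector y.
import numpy as np

# rows = observations; columns = features (age, height, weight) plus a target column
data = np.array([[25, 170, 68, 1],
                 [32, 160, 55, 0],
                 [47, 180, 85, 1]])

X = data[:, :3]   # input matrix
y = data[:, 3]    # output vector
print(X.shape, y.shape)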

2. Images and Photographs


In machine learning, images/photographs are used for computer vision applications.
Each Image is an example of the matrix from linear algebra because an image is a
table structure consisting of height and width for each pixel.

Moreover, different operations on images, such as cropping, scaling, resizing, etc.,


are performed using notations and operations of Linear Algebra.

3. One Hot Encoding


In machine learning, sometimes, we need to work with categorical data. These
categorical variables are encoded to make them simpler and easier to work with, and
the popular encoding technique to encode these variables is known as one-hot
encoding.

In the one-hot encoding technique, a table is created that shows a variable with one
column for each category and one row for each example in the dataset. Further, each
row is encoded as a binary vector, which contains either zero or one value. This is an
example of sparse representation, which is a subfield of Linear Algebra.
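
As a small sketch (assuming pandas; the colour column is made up), one-hot encoding can be produced as follows:

# Sketch: one-hot encoding a categorical column.
import pandas as pd

df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})
encoded = pd.get_dummies(df, columns=["colour"])
print(encoded)
# Each row becomes a binary vector with exactly one 1 for its original category.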

4. Linear Regression
Linear regression is a popular technique of machine learning borrowed from
statistics. It describes the relationship between input and output variables and is
used in machine learning to predict numerical values. Linear regression problems
are most commonly solved using Least Squares Optimization, which in turn is
computed with the help of matrix factorization methods. Some commonly used
matrix factorization methods are LU decomposition and Singular-Value
Decomposition, which are concepts of linear algebra.
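
As a rough sketch with NumPy (the data is made up; NumPy's least-squares routine is itself backed by a matrix decomposition), the regression coefficients can be obtained like this:

# Sketch: solving linear regression with a least-squares solve.
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])  # column of ones for the intercept
y = np.array([3.0, 5.1, 6.9, 9.2])

coeffs, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print("intercept and slope:", coeffs)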

5. Regularization
In machine learning, we usually look for the simplest possible model to achieve the
best outcome for the specific problem. Simpler models generalize well, ranging from
specific examples to unknown datasets. These simpler models are often considered
models with smaller coefficient values.

A technique used to minimize the size of coefficients of a model while it is being fit
on data is known as regularization. Common regularization techniques are L1 and L2
regularization. Both of these forms of regularization are, in fact, a measure of the
magnitude or length of the coefficients as a vector and are methods lifted directly
from linear algebra called the vector norm.
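
A tiny sketch of the two norms involved, using NumPy on a made-up coefficient vector:

# Sketch: the L1 and L2 vector norms that L1/L2 regularization penalize.
import numpy as np

coefficients = np.array([0.5, -1.2, 0.0, 3.4])
l1 = np.linalg.norm(coefficients, ord=1)   # sum of absolute values (used by Lasso / L1)
l2 = np.linalg.norm(coefficients, ord=2)   # Euclidean length (used by Ridge / L2)
print(l1, l2)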

6. Principal Component Analysis


Generally, each dataset contains thousands of features, and fitting the model with
such a large dataset is one of the most challenging tasks of machine learning.
Moreover, a model built with irrelevant features is less accurate than a model built
with relevant features. There are several methods in machine learning that
automatically reduce the number of columns of a dataset, and these methods are
known as Dimensionality reduction. The most commonly used dimensionality
reductions method in machine learning is Principal Component Analysis or PCA. This
technique makes projections of high-dimensional data for both visualizations and
training models. PCA uses the matrix factorization method from linear algebra.

7. Singular-Value Decomposition
Singular-Value decomposition is also one of the popular dimensionality reduction
techniques and is also written as SVD in short form.

It is the matrix-factorization method of linear algebra, and it is widely used in


different applications such as feature selection, visualization, noise reduction, and
many more.
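
As a short sketch (NumPy assumed, matrix made up), an SVD can be computed and truncated for dimensionality reduction as follows:

# Sketch: singular-value decomposition and a rank-2 projection of the data.
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 1.0],
              [1.0, 1.0, 4.0],
              [2.0, 2.0, 0.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                   # keep only the 2 largest singular values
A_reduced = U[:, :k] * s[:k]            # data projected onto a 2-dimensional space
print("singular values:", s)
print("reduced shape:", A_reduced.shape)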

8. Latent Semantic Analysis


Natural Language Processing or NLP is a subfield of machine learning that works
with text and spoken words.

NLP represents a text document as large matrices with the occurrence of words. For
example, the matrix column may contain the known vocabulary words, and rows may
contain sentences, paragraphs, pages, etc., with cells in the matrix marked as the
count or frequency of the number of times the word occurred. It is a sparse matrix
representation of text. Documents processed in this way are much easier to compare,
query, and use as the basis for a supervised machine learning model.

This form of data preparation is called Latent Semantic Analysis, or LSA for short, and
is also known by the name Latent Semantic Indexing or LSI.

9. Recommender System
A recommender system is a sub-field of machine learning, a predictive modelling
problem that provides recommendations of products. For example, online
recommendation of books based on the customer's previous purchase history,
recommendation of movies and TV series, as we see in Amazon & Netflix.

The development of recommender systems is mainly based on linear algebra


methods. We can understand it as an example of calculating the similarity between
sparse customer behaviour vectors using distance measures such as Euclidean
distance or dot products.
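
A minimal sketch of such a comparison, with two made-up customer behaviour vectors and NumPy:

# Sketch: comparing two customers' purchase vectors.
import numpy as np

customer_a = np.array([1, 0, 3, 0, 2])   # counts of items bought, one position per product
customer_b = np.array([0, 0, 2, 1, 2])

dot = np.dot(customer_a, customer_b)
distance = np.linalg.norm(customer_a - customer_b)
cosine = dot / (np.linalg.norm(customer_a) * np.linalg.norm(customer_b))
print(dot, distance, cosine)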

Different matrix factorization methods such as singular-value decomposition are


used in recommender systems to query, search, and compare user data.

10. Deep Learning


Artificial Neural Networks or ANN are non-linear ML algorithms that process
information and transfer it from one layer to another in a way inspired by the brain.

Deep learning studies these neural networks, which implement newer and faster
hardware for the training and development of larger networks with a huge dataset.
All deep learning methods achieve great results for different challenging tasks such
as machine translation, speech recognition, etc. The core of processing neural
networks is based on linear algebra data structures, which are multiplied and added
together. Deep learning algorithms also work with vectors, matrices, tensors (matrix
with more than two dimensions) of inputs and coefficients for multiple dimensions.

Conclusion
In this topic, we have discussed Linear algebra, its role and its importance in machine
learning. For each machine learning enthusiast, it is very important to learn the basic
concepts of linear algebra to understand the working of ML algorithms and choose
the best algorithm for a specific problem.

Types of Machine Learning


Machine learning is a subset of AI, which enables the machine to automatically
learn from data, improve performance from past experiences, and make
predictions. Machine learning contains a set of algorithms that work on a huge
amount of data. Data is fed to these algorithms to train them, and on the basis of
training, they build the model & perform a specific task.
These ML algorithms help to solve different business problems like Regression,
Classification, Forecasting, Clustering, and Associations, etc.

Based on the methods and way of learning, machine learning is divided into mainly
four types, which are:

1. Supervised Machine Learning


2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

In this topic, we will provide a detailed description of the types of Machine Learning
along with their respective algorithms:


1. Supervised Machine Learning


As its name suggests, supervised machine learning is based on supervision. It means
in the supervised learning technique, we train the machines using the "labelled"
dataset, and based on the training, the machine predicts the output. Here, the
labelled data specifies that some of the inputs are already mapped to the output.
More precisely, we can say: first, we train the machine with the input and
corresponding output, and then we ask the machine to predict the output using the
test dataset.

Let's understand supervised learning with an example. Suppose we have an input


dataset of cats and dog images. So, first, we will provide the training to the machine
to understand the images, such as the shape & size of the tail of cat and dog,
Shape of eyes, colour, height (dogs are taller, cats are smaller), etc. After
completion of training, we input the picture of a cat and ask the machine to identify
the object and predict the output. Now, the machine is well trained, so it will check
all the features of the object, such as height, shape, colour, eyes, ears, tail, etc., and
find that it's a cat. So, it will put it in the Cat category. This is the process of how the
machine identifies the objects in Supervised Learning.

The main goal of the supervised learning technique is to map the input
variable(x) with the output variable(y). Some real-world applications of supervised
learning are Risk Assessment, Fraud Detection, Spam filtering, etc.

Categories of Supervised Machine Learning


Supervised machine learning can be classified into two types of problems, which are
given below:

o Classification
o Regression

a) Classification

Classification algorithms are used to solve the classification problems in which the
output variable is categorical, such as "Yes" or No, Male or Female, Red or Blue,
etc. The classification algorithms predict the categories present in the dataset. Some
real-world examples of classification algorithms are Spam Detection, Email
filtering, etc.

Some popular classification algorithms are given below:

o Random Forest Algorithm


o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm

b) Regression

Regression algorithms are used to solve regression problems, in which there is a
relationship between input and output variables and the output variable is
continuous. These are used to predict continuous output values, such as market
trends, weather prediction, etc.

Some popular Regression algorithms are given below:

o Simple Linear Regression Algorithm


o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression

Advantages and Disadvantages of Supervised


Learning
Advantages:

o Since supervised learning works with the labelled dataset, we can have an
exact idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior
experience.

Disadvantages:

o These algorithms are not able to solve complex tasks.


o It may predict the wrong output if the test data is different from the training
data.
o It requires lots of computational time to train the algorithm.

Applications of Supervised Learning


Some common applications of Supervised Learning are given below:

o Image Segmentation:
Supervised Learning algorithms are used in image segmentation. In this
process, image classification is performed on different image data with pre-
defined labels.
o Medical Diagnosis:
Supervised algorithms are also used in the medical field for diagnosis
purposes. It is done by using medical images and past labelled data with
labels for disease conditions. With such a process, the machine can identify a
disease for the new patients.
o Fraud Detection - Supervised Learning classification algorithms are used for
identifying fraud transactions, fraud customers, etc. It is done by using historic
data to identify the patterns that can lead to possible fraud.
o Spam detection - In spam detection & filtering, classification algorithms are
used. These algorithms classify an email as spam or not spam. The spam
emails are sent to the spam folder.
o Speech Recognition - Supervised learning algorithms are also used in speech
recognition. The algorithm is trained with voice data, and various
identifications can be done using the same, such as voice-activated
passwords, voice commands, etc.

2. Unsupervised Machine Learning


Unsupervised learning is different from the Supervised learning technique; as its
name suggests, there is no need for supervision. It means, in unsupervised machine
learning, the machine is trained using the unlabeled dataset, and the machine
predicts the output without any supervision.

In unsupervised learning, the models are trained with the data that is neither
classified nor labelled, and the model acts on that data without any supervision.

The main aim of the unsupervised learning algorithm is to group or categorize
the unsorted dataset according to similarities, patterns, and
differences. Machines are instructed to find the hidden patterns in the input
dataset.

Let's take an example to understand it more precisely. Suppose there is a basket of
fruit images, and we input it into the machine learning model. The images are totally
unknown to the model, and the task of the machine is to find the patterns and
categories of the objects.

So, now the machine will discover its patterns and differences, such as colour
difference, shape difference, and predict the output when it is tested with the test
dataset.
Categories of Unsupervised Machine Learning
Unsupervised Learning can be further classified into two types, which are given
below:

o Clustering
o Association

1) Clustering

The clustering technique is used when we want to find the inherent groups from the
data. It is a way to group the objects into a cluster such that the objects with the
most similarities remain in one group and have fewer or no similarities with the
objects of other groups. An example of the clustering algorithm is grouping the
customers by their purchasing behaviour.

Some of the popular clustering algorithms are given below:

o K-Means Clustering algorithm


o Mean-shift algorithm
o DBSCAN Algorithm
o Principal Component Analysis
o Independent Component Analysis

2) Association

Association rule learning is an unsupervised learning technique, which finds


interesting relations among variables within a large dataset. The main aim of this
learning algorithm is to find the dependency of one data item on another data item
and map those variables accordingly so that it can generate maximum profit. This
algorithm is mainly applied in Market Basket analysis, Web usage mining,
continuous production, etc.

Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat,
FP-growth algorithm.
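
As a rough, library-free sketch of what such rules measure (the baskets are made up), the support and confidence of a rule like {bread} -> {butter} can be computed by hand:

# Sketch: support and confidence for the association rule {bread} -> {butter}.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk"},
]

n = len(baskets)
bread = sum(1 for b in baskets if "bread" in b)
bread_and_butter = sum(1 for b in baskets if {"bread", "butter"} <= b)

support = bread_and_butter / n          # how often bread and butter appear together
confidence = bread_and_butter / bread   # how often butter appears when bread does
print(support, confidence)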

Advantages and Disadvantages of Unsupervised


Learning Algorithm
Advantages:
o These algorithms can be used for complicated tasks compared to the
supervised ones because these algorithms work on the unlabeled dataset.
o Unsupervised algorithms are preferable for various tasks as getting the
unlabeled dataset is easier as compared to the labelled dataset.

Disadvantages:

o The output of an unsupervised algorithm can be less accurate as the dataset is
not labelled, and the algorithms are not trained with the exact output in advance.
o Working with unsupervised learning is more difficult as it works with
unlabelled data that does not map to a known output.

Applications of Unsupervised Learning

o Network Analysis: Unsupervised learning is used for identifying plagiarism


and copyright in document network analysis of text data for scholarly articles.
o Recommendation Systems: Recommendation systems widely use
unsupervised learning techniques for building recommendation applications
for different web applications and e-commerce websites.
o Anomaly Detection: Anomaly detection is a popular application of
unsupervised learning, which can identify unusual data points within the
dataset. It is used to discover fraudulent transactions.
o Singular Value Decomposition: Singular Value Decomposition or SVD is
used to extract particular information from the database. For example,
extracting information of each user located at a particular location.

3. Semi-Supervised Learning
Semi-Supervised learning is a type of Machine Learning algorithm that lies
between Supervised and Unsupervised machine learning. It represents the
intermediate ground between Supervised (With Labelled training data) and
Unsupervised learning (with no labelled training data) algorithms and uses the
combination of labelled and unlabeled datasets during the training period.

Although semi-supervised learning is the middle ground between supervised and
unsupervised learning and operates on data that contains a few labels, the data
mostly consists of unlabelled examples. Labels are costly to obtain, so for practical
corporate purposes, a dataset may have only a few of them. This setup is distinct
from supervised and unsupervised learning, which are defined by the presence or
absence of labels respectively.
To overcome the drawbacks of supervised learning and unsupervised learning
algorithms, the concept of Semi-supervised learning is introduced. The main aim
of semi-supervised learning is to effectively use all the available data, rather than
only labelled data like in supervised learning. Initially, similar data is clustered along
with an unsupervised learning algorithm, and further, it helps to label the unlabeled
data into labelled data. It is because labelled data is a comparatively more expensive
acquisition than unlabeled data.

We can imagine these algorithms with an example. Supervised learning is where a
student is under the supervision of an instructor at home and college. Further, if that
student is self-analysing the same concept without any help from the instructor, it
comes under unsupervised learning. Under semi-supervised learning, the student has
to revise on his own after analysing the same concept under the guidance of an
instructor at college.

Advantages and disadvantages of Semi-supervised


Learning
Advantages:

o It is simple and easy to understand the algorithm.


o It is highly efficient.
o It is used to solve drawbacks of Supervised and Unsupervised Learning
algorithms.

Disadvantages:

o Iterations results may not be stable.


o We cannot apply these algorithms to network-level data.
o Accuracy is low.

4. Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI
agent (a software component) automatically explores its surroundings by trial
and error, taking actions, learning from experiences, and improving its
performance. The agent gets rewarded for each good action and punished for each
bad action; hence the goal of the reinforcement learning agent is to maximize the
rewards.
In reinforcement learning, there is no labelled data like supervised learning, and
agents learn from their experiences only.

The reinforcement learning process is similar to a human being; for example, a child
learns various things by experiences in his day-to-day life. An example of
reinforcement learning is to play a game, where the Game is the environment, moves
of an agent at each step define states, and the goal of the agent is to get a high
score. Agent receives feedback in terms of punishment and rewards.

Due to its way of working, reinforcement learning is employed in different fields such
as Game theory, Operation Research, Information theory, multi-agent systems.

A reinforcement learning problem can be formalized using Markov Decision


Process(MDP). In MDP, the agent constantly interacts with the environment and
performs actions; at each action, the environment responds and generates a new
state.

Categories of Reinforcement Learning


Reinforcement learning is categorized mainly into two types of methods/algorithms:

o Positive Reinforcement Learning: Positive reinforcement increases the


tendency that the required behaviour will occur again by adding a desirable
stimulus. It strengthens the agent's behaviour and impacts it positively.
o Negative Reinforcement Learning: Negative reinforcement works exactly
opposite to positive RL: it increases the tendency that a specific behaviour
will occur again by removing or avoiding a negative condition.

Real-world Use cases of Reinforcement Learning

o Video Games:
RL algorithms are very popular in gaming applications, where they are used to
achieve super-human performance. Some popular game-playing systems that
use RL algorithms are AlphaGO and AlphaGO Zero.
o Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper
showed how RL can be used to make a computer automatically learn to
allocate and schedule resources for waiting jobs in order to minimize average
job slowdown.
o Robotics:
RL is widely used in robotics applications. Robots are used in the industrial
and manufacturing sector, and these robots are made more capable with
reinforcement learning. Different industries have a vision of building
intelligent robots using AI and Machine Learning technology.
o Text Mining:
Text mining, one of the great applications of NLP, is now being implemented
with the help of Reinforcement Learning by Salesforce.

Advantages and Disadvantages of Reinforcement


Learning
Advantages

o It helps in solving complex real-world problems that are difficult to solve


with conventional techniques.
o The learning model of RL is similar to the way human beings learn; hence it
can produce highly accurate results.
o It helps in achieving long-term results.

Disadvantage

o RL algorithms are not preferred for simple problems.


o RL algorithms require huge amounts of data and computation.
o Too much reinforcement learning can lead to an overload of states, which can
weaken the results.

The curse of dimensionality limits reinforcement learning for real physical systems.

Feature Engineering for Machine


Learning
Feature engineering is the pre-processing step of machine learning, which is
used to transform raw data into features that can be used for creating a
predictive model using Machine learning or statistical Modelling. Feature
engineering in machine learning aims to improve the performance of models. In this
topic, we will understand the details about feature engineering in Machine learning.
But before going into the details, let's first understand what features are and what
the need for feature engineering is.

What is a feature?
Generally, all machine learning algorithms take input data to generate output.
The input data is usually in tabular form, consisting of rows (instances or
observations) and columns (variables or attributes), and these attributes are often
known as features. For example, an image is an instance in computer vision, but a
line in the image could be a feature. Similarly, in NLP, a document can be an
observation, and the word count could be a feature. So, we can say a feature is an
attribute that impacts a problem or is useful for the problem.

What is Feature Engineering?


Feature engineering is the pre-processing step of machine learning, which
extracts features from raw data. It helps to represent the underlying problem to
predictive models in a better way, which as a result improves the accuracy of the
model on unseen data. The predictive model contains predictor variables and an
outcome variable, and the feature engineering process selects the most useful
predictor variables for the model.
Since 2016, automated feature engineering has also been used in different machine
learning software to help automatically extract features from raw data. Feature
engineering in ML mainly contains four processes: Feature Creation,
Transformations, Feature Extraction, and Feature Selection.

These processes are described as below:

1. Feature Creation: Feature creation is finding the most useful variables to be


used in a predictive model. The process is subjective, and it requires human
creativity and intervention. New features are created by combining existing
features using operations such as addition, subtraction, and ratio, and these
new features offer great flexibility.
2. Transformations: The transformation step of feature engineering involves
adjusting the predictor variables to improve the accuracy and performance of
the model. For example, it ensures that the model is flexible enough to take a
variety of data as input, and it ensures that all the variables are on the same
scale, making the model easier to understand. It improves the model's accuracy
and ensures that all the features are within an acceptable range to avoid any
computational error.
3. Feature Extraction: Feature extraction is an automated feature engineering
process that generates new variables by extracting them from the raw data.
The main aim of this step is to reduce the volume of data so that it can be
easily used and managed for data modelling. Feature extraction methods
include cluster analysis, text analytics, edge detection algorithms, and
principal components analysis (PCA).
4. Feature Selection: While developing a machine learning model, only a few
variables in the dataset are useful for building the model; the remaining
features are either redundant or irrelevant. If we feed the dataset with all these
redundant and irrelevant features into the model, it may negatively impact and
reduce the overall performance and accuracy of the model. Hence, it is very
important to identify and select the most appropriate features from the data
and remove the irrelevant or less important ones, which is done with the help
of feature selection in machine learning. "Feature selection is a way of
selecting the subset of the most relevant features from the original feature set
by removing the redundant, irrelevant, or noisy features."

Below are some benefits of using feature selection in machine learning:

o It helps in avoiding the curse of dimensionality.


o It helps in the simplification of the model so that the researchers can easily
interpret it.
o It reduces the training time.
o It reduces overfitting hence enhancing the generalization.

Need for Feature Engineering in Machine


Learning
In machine learning, the performance of the model depends on data pre-processing
and data handling. But if we create a model without pre-processing or data handling,
then it may not give good accuracy. Whereas, if we apply feature engineering on the
same model, then the accuracy of the model is enhanced. Hence, feature engineering
in machine learning improves the model's performance. Below are some points that
explain the need for feature engineering:

o Better features mean flexibility.


In machine learning, we always try to choose the optimal model to get good
results. However, even if we happen to choose a less suitable model, we can
still get reasonable predictions, and this is because of better features. Flexible
features also enable you to select less complex models, which are faster to run
and easier to understand and maintain, which is always desirable.
o Better features mean simpler models.
If we feed well-engineered features to our model, then even with parameters
that are not fully optimal, we can still get good outcomes. After feature
engineering, it is not necessary to work as hard at picking the right model
with the most optimized parameters. With good features, we can better
represent the complete data and use it to best characterize the given
problem.
o Better features mean better results.
As already discussed, the output of a machine learning model depends on the
data we provide to it. So, to obtain better results, we must use better features.

Steps in Feature Engineering


The steps of feature engineering may vary as per different data scientists and ML
engineers. However, there are some common steps that are involved in most
machine learning algorithms, and these steps are as follows:

o Data Preparation: The first step is data preparation. In this step, raw data
acquired from different sources is prepared and put into a suitable format
so that it can be used in the ML model. Data preparation may involve data
cleaning, delivery, augmentation, fusion, ingestion, or loading.
o Exploratory Analysis: Exploratory analysis, or exploratory data analysis (EDA),
is an important step of feature engineering that is mainly used by data
scientists. This step involves analysing and investigating the data set and
summarizing its main characteristics. Different data visualization techniques
are used to better understand the manipulation of data sources, to find the
most appropriate statistical technique for data analysis, and to select the best
features for the data.
o Benchmark: Benchmarking is a process of setting a standard baseline for
accuracy to compare all the variables from this baseline. The benchmarking
process is used to improve the predictability of the model and reduce the
error rate.

Feature Engineering Techniques


Some of the popular feature engineering techniques include:

1. Imputation
Feature engineering has to deal with inappropriate data, missing values, human
errors, general errors, insufficient data sources, etc. Missing values within the
dataset strongly affect the performance of the algorithm, and to deal with them the
"Imputation" technique is used. Imputation is responsible for handling
irregularities within the dataset.

For example, rows or columns with a large percentage of missing values may simply
be removed. But at the same time, to maintain the data size, it is often preferable to
impute the missing data instead, which can be done as follows:

o For numerical data imputation, a default value can be imputed in a column,


or missing values can be filled with the mean or median of the column.
o For categorical data imputation, missing values can be replaced with the most
frequently occurring value in the column. Both approaches are shown in the
sketch below.
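A minimal pandas sketch of both approaches; the toy DataFrame and its column names are assumptions for illustration.

# A minimal imputation sketch with pandas on a hypothetical toy DataFrame.
import pandas as pd

df = pd.DataFrame({
    "age":  [25, None, 31, 42, None],                  # numerical column with missing values
    "city": ["Delhi", "Pune", None, "Pune", "Pune"],   # categorical column with a missing value
})

# Numerical imputation: fill missing ages with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Categorical imputation: fill missing cities with the most frequent value.
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df)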

2. Handling Outliers
Outliers are deviant values or data points that lie so far away from the other data
points that they badly affect the performance of the model. Outliers can be handled
with this feature engineering technique, which first identifies the outliers and then
removes them.

The standard deviation can be used to identify outliers. For example, every value lies
at some distance from the average, but if a value lies farther away than a certain
threshold, it can be considered an outlier. The Z-score can also be used to detect
outliers, as in the sketch below.
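Here is a minimal sketch of z-score-based outlier removal on a toy series; the values and the threshold of 2.0 are assumptions (3.0 is another common choice).

# A minimal z-score outlier-removal sketch on a toy series.
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 95, 11, 12])   # 95 is an obvious outlier

z_scores = (values - values.mean()) / values.std()
threshold = 2.0                                         # assumed cut-off for this example
cleaned = values[z_scores.abs() <= threshold]           # keep only points within the threshold

print(cleaned.tolist())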

3. Log transform
Logarithm transformation, or log transform, is one of the commonly used
mathematical techniques in machine learning. The log transform helps in handling
skewed data, making the distribution closer to normal after transformation. It also
reduces the effect of outliers on the data: because the magnitude differences are
normalized, the model becomes more robust.

Note: Log transformation is only applicable to positive values; otherwise, it will give an
error. To avoid this, we can add 1 to the data before transformation, which ensures the
values being transformed are positive.
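A minimal sketch of the log transform using NumPy's log1p, which computes log(1 + x) and therefore handles zeros, matching the "add 1" tip in the note above; the sample values are assumptions.

# A minimal log-transform sketch for skewed data.
import numpy as np

skewed = np.array([0, 1, 10, 100, 1000, 10000], dtype=float)
transformed = np.log1p(skewed)   # log(1 + x), safe for zero values

print(transformed)               # the values are compressed onto a much less skewed scale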

4. Binning
In machine learning, overfitting is one of the main issues that degrade the
performance of the model and which occurs due to a greater number of parameters
and noisy data. However, one of the popular techniques of feature engineering,
"binning", can be used to normalize the noisy data. This process involves segmenting
different features into bins.
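A minimal binning sketch using pandas.cut; the ages, bin edges, and labels are assumptions for illustration.

# A minimal binning sketch: segment a numeric feature into labelled bins.
import pandas as pd

ages = pd.Series([5, 17, 23, 35, 46, 61, 78])
age_bins = pd.cut(ages, bins=[0, 18, 40, 65, 100],
                  labels=["child", "young", "middle", "senior"])

print(age_bins.tolist())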
5. Feature Split
As the name suggests, feature split is the process of splitting a feature into two or
more parts in order to make new features. This technique helps the
algorithms to better understand and learn the patterns in the dataset.

The feature splitting process enables the new features to be clustered and binned,
which results in extracting useful information and improving the performance of the
data models.
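A minimal sketch of a feature split with pandas, where a single name column is split into two new features; the column names are assumptions for illustration.

# A minimal feature-split sketch: one column becomes two new features.
import pandas as pd

df = pd.DataFrame({"full_name": ["Ada Lovelace", "Alan Turing", "Grace Hopper"]})

df[["first_name", "last_name"]] = df["full_name"].str.split(" ", n=1, expand=True)

print(df)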

6. One hot encoding


One hot encoding is a popular encoding technique in machine learning. It is a
technique that converts categorical data into a form that can be easily
understood by machine learning algorithms, so that they can make good predictions.
It enables the grouping of categorical data without losing any information.
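A minimal one-hot encoding sketch with pandas.get_dummies; the toy column is an assumption for illustration.

# A minimal one-hot encoding sketch.
import pandas as pd

df = pd.DataFrame({"colour": ["red", "blue", "red", "green"]})
encoded = pd.get_dummies(df, columns=["colour"])

print(encoded)   # one binary column per category: colour_blue, colour_green, colour_red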

Conclusion
In this topic, we have explained a detailed description of feature engineering in
machine learning, working of feature engineering, techniques, etc.

Although feature engineering helps in increasing the accuracy and performance of


the model, there are also other methods that can increase prediction accuracy.
Moreover, beyond the techniques given above, there are many more feature
engineering techniques available, but we have mentioned the most commonly used
ones.

Top 10 Machine Learning Courses in


2021
Machine Learning & AI is one of the latest, booming technologies impacting most
industrial sectors. It is one of the most popular and exciting fields of computer
science and is growing day by day. Chatbots, spam filtering, search engines, fraud
detection, etc., are some of the most impressive examples of how ML is making
human life smoother. Due to its popularity and increasing demand among
companies, people are getting more excited about this technology and aiming to
learn it. If you also want to learn this technology without joining a university and
without spending a lot of money, then it is possible to do so.
There are various amazing courses of Machine Learning available online, among
which some are absolutely free, and some are paid. Some popular platforms, such
as Coursera, Udemy, EdX, etc., provide online courses along with certification. These
courses are taught by renowned people from the best universities. You can easily
learn these courses online and can access them from anywhere. Some courses are
free; however, to take the certification, you might need to pay. These certification
courses help to learn the basics of machine learning and use them in projects and
help to become an ML expert.

In this topic, we are providing a list of the best machine learning courses. Some of
these courses are easy to start, while some may need some advanced aspects of
learning.

1. Machine Learning by Andrew Ng/ Machine Learning Course by Stanford


University (Coursera's best course)
2. Intro to Machine Learning by Udacity (FREE)
3. Machine Learning A-Z: Hands-On Python & R In Data Science (Udemy
best course)
4. Machine Learning Crash Course - Google AI
5. Machine Learning Courses-EdX
6. Introduction to Machine Learning for Coders - Fast.ai
7. Introduction to Machine Learning by Datacamp
8. Machine Learning Specialization by Coursera
9. Machine Learning with Python
10. Python for Data Science and Machine Learning Bootcamp

1. Machine Learning by Andrew Ng/


Machine Learning Course by Stanford
University (Coursera's best course)
One of the best and most popular Machine Learning courses on the Internet is the
course by Andrew Ng on Coursera. The course is offered by Stanford University on
the Coursera platform and is structured and taught by Andrew Ng, a world-renowned
expert, Stanford professor, and co-founder of Coursera. This course has
approximately 4,330,425 learners worldwide, with an average rating of 4.9 out of 5.

This course helps to understand and learn the theory behind effective machine
learning techniques with practical implementation.

This course not only provides theoretical knowledge of machine learning techniques
but also teaches you how to apply these techniques practically yourself.

In this course, you will be tested after completing each topic, and after completion a
final course score will be given. You will also get a detailed description of the
mathematics behind each ML algorithm.

Time to complete the Course: Approx. 55 Hours

Level: Beginner

Pre-requisites:

Ratings: 4.9/5
Cost: Free to Audit, Paid Certification.

Key Highlights of Course

o You will get Silicon Valley's best practices in innovation in the field of Machine
Learning and AI.
o The Course Structure contains topics from basic to advanced that start from
Introduction (Supervised and Unsupervised Learning), and covers Linear
Regression with One Variable, Linear Algebra Review, Logistic Regression,
Regularization, Neural Networks: Representation, Machine Learning System
Design, etc.
o The major skills to learn are logistic regression and artificial neural networks.
You will also learn to implement your own neural network for digit
recognition.
o Practical Implementation of different algorithms, and learn how to apply these
algorithms for building smart robots (perception, control), text understanding
(web search, anti-spam), computer vision, medical informatics, audio, database
mining, and other areas.
o You can learn complete courses from anywhere at any time online.

2. Intro to Machine Learning by Udacity


(FREE)
This is one of the top machine learning courses, providing both theoretical
and practical concepts of machine learning. One of the best things about this course
is that it is taught by Sebastian Thrun, one of the pioneers behind self-driving cars.

The course structure and style of presentation make machine learning even more
interesting to learn. Along with the concepts of ML, it also provides programming
knowledge of Python.

This course is freely available to learn, but no certification will be awarded.

Key Highlights of Course

o The course involves interactive quizzes that enable you to enhance your
knowledge of the topics covered.
o It has a big student support community that anyone can join to exchange
ideas, share their own, and ask questions.
o Anyone can learn it from anywhere at their convenience.
o Each enrolled student can get a one-on-one mentor, which means personal
career coaching is provided along with access to the student community.

3. Machine Learning A-Z: Hands-On Python


& R In Data Science (Udemy best course)
This machine learning course by Udemy is one of the best machine learning courses.
This course helps you to learn about machine learning algorithms. You can learn ML
in Python and R from two Data Science experts. This is a hands-on course and
includes a lot of code examples for you to practice. Around 799,851 students are
enrolled in the course.

Time to complete the Course: Approx. 45 Hours

Level: Beginner

Pre-requisites:

Ratings: 4.5/5

Cost: Paid Course and Certification

Key Highlights of Course

o Great tutorial to get started with the topic with little or no prior experience.
o The Course structure contains different topics that start from Data
Preprocessing, Regression, Clustering, Association Rule Learning, Natural
Language Processing, Artificial Neural Networks, Dimensionality Reduction,
and other important concepts.
o You will get lifetime access to the course once purchased and accessible on
mobile & tv.
o A detailed explanation of each topic with theory as well as practical.
o This course is available in both Python and R programming languages. You
can also download templates and use them in your ML projects.
4. Machine Learning Crash Course - Google
AI
The Machine Learning Crash Course is provided by Google AI education, a free
platform to learn key AI and Machine Learning concepts. This course is the best fit
for those who want to learn ML concepts at a fast pace and cover the basics of the
key concepts, which may take several hours. But if you are a complete beginner
without any prior understanding of ML concepts, linear algebra, statistics, etc., then
this course may be a little difficult for you.

This crash course includes theoretical video lectures, practical exercises, real-world
examples, and hands-on practical implementation of examples. This course is taught
by Google experts who explain different key concepts of Machine learning.

Time to complete the Course: Approx. 15 Hours

Pre-requisites: Python Programming knowledge, must be comfortable with linear


equations, graphs of functions, histograms, and statistical means.

Cost: Free

Provider: Google AI

The course structure has topics that start from Machine Learning Basics and cover
Generalization, Training and Test Sets, Representation, Logistic Regression,
Classification, Neural Networks, Embeddings, and ML Engineering.

Key Highlights of Course

o Interactive Video lectures with real-world Case studies.


o Visualization of Algorithm in action.
o Lectures on key ML concepts by Google Researchers.
o Covers the basics of ML in the best way and fast pace.
o The course is structured in a straightforward way, so you can learn at your
own pace according to your prior knowledge.

5. Machine Learning Courses-EdX


EdX is one of the best platforms to learn different courses and technologies hosted
by different popular Universities such as Harvard, Columbia, etc., across the globe.
You can explore the top courses on Machine Learning, Data Science, AI, and also
other technologies. Each technology has several courses taught by the world's best
institutions. Besides courses, it also provides online Master's degree programs,
MicroMasters programs, etc.

If you search for the Machine Learning Program, you will get search results for
different courses, out of which most are free to audit but to gain a certificate, you
need to pay. Some popular courses on Data Science and Machine Learning are Data
Science from Harvard, Artificial Intelligence from Columbia, Python Data
Science from IBM, Machine Learning from Texas, and Data Science from
Microsoft, among a host of other courses. On each course, the timing is different,
and the mode is online.

Cost: Free to audit, Paid certification.

Provider: EdX platform collaboration with renowned institutions.

Duration: Approx. 9-12 weeks

Key Highlights of the Course

o One can freely audit courses on machine learning and on other technologies
from renowned institutions.
o Explore the different courses and build a strong, deep understanding of them.
o Video lectures with theory, practical implementations, and knowledge checks.
o Subtitles are available for each lecture.
o A course may be archived after some time if you don't upgrade it.

6. Introduction to Machine Learning for


Coders - Fast.ai
Fast.ai is one of the best portals that provide courses on topics that come under
artificial intelligence and machine learning. This portal focuses on creating different
courses that enable you to make a good start in the field of AI.

Each course is structured in such a way that it covers all concepts from scratch and
focuses on learning by doing. You can choose the course best suited for you as per
your level of learning (beginner or experienced). So, if you are serious about getting
started in this area, the easiest way is to select a course from here.
"Introduction to machine learning for coders" focuses on the practical
implementation of each algorithm from scratch.

Course Highlights

o Each topic is explained in detail with the help of screenshots and examples.
o You will get the complete guide for the configuration of software and getting
started with the course.
o It allows you to join the forum, where you can communicate with other
learners and professionals and can help each other.
o Models are trained with the fast.ai library.
o One of the great things about this course is that it is available for free, and
other courses on this platform are also free.
o Duration: Self-paced

7. Introduction to Machine Learning by


Datacamp
Machine Learning Course with R by DataCamp is worth taking for those who are
good in R programming language and Statistics.

One important prerequisite for this course is that you need to have knowledge of the
R language. The course mainly focuses on providing useful knowledge on different
machine learning techniques.

The course is designed in an interactive and interesting way; some of the content is
free, but beyond that you need to pay for it.

The course mainly covers how machine learning works, where to use ML algorithms,
the difference between AI and machine learning, etc. It also includes information
about machine learning models, deep learning, etc.

Course Highlights

o Provides information about how machine learning works, the workflow of an


ML model, and the different steps to build a model, and offers a comparison
between different ML techniques.
o Content is designed in an interactive way that makes learning simpler and fun.
o Hands-on exercises.
o The basic content of the course is available for free.

8. Machine Learning Specialization by


Coursera
Machine learning specialization by Coursera is an advanced level course that helps
you in creating solutions for complex problems of practical usage of Machine
learning. This course provides you with a certification of course completion after
completing the course and programming assignments of this course. This course is
structured in such a way that you can get the maximum benefit of learning.

As this is an advanced specialization course, hence it is required that you must have
basic or intermediate knowledge of Machine learning, Probability theory, Linear
algebra and calculus, and Python programming to enrol and understand this course.
So, it is suggested that if you are a beginner in machine learning, you first brush up
on your maths & programming skills and then move on to this course to complete
your learning.

Course Highlights

o This course enables us to resolve various machine learning problems with


complex input with the help of modern deep learning.
o This course helps us to participate in various competitions using effective
machine learning tools.
o This course helps us to enhance hands-on experiences in Data exploration,
preprocessing and feature engineering.
o After completion of this course, you can perform Bayesian inference,
understand Bayesian Neural Networks and Variational Autoencoders.
o This course helps you to create agents for games and other environments
using reinforcement learning methods.

9. Machine Learning with Python


Machine Learning with Python is one of the great courses for beginners that mainly
focuses on the fundamentals of machine learning algorithms. The lectures are
presented in a very interesting way, with slide animations and clear explanations of
the algorithms.
The course mainly uses the Python programming language for the practical
implementation of algorithms. With each topic, you will get a chance to practise
what you have just learned, on your own, in a Jupyter notebook.

Each notebook will enhance your knowledge and give you an understanding of how
to use these algorithms with real-world data.

Provider: IBM, Cognitive Class

Price: Free to audit, Paid Certification

Course Highlights

o Covers the fundamental concepts of Machine learning in a very intuitive way.


o The course contains topics such as Introduction to Machine Learning,
Regression, Classification, Clustering, and Recommender Systems.
o There is practical knowledge provided in the course for each algorithm.
o For each algorithm, you will get to know its introduction, pros, cons, and
where to use it in real-world situations.
o Suitable for new learners to understand the broader context.
o It will let you understand the purpose of machine learning and where it is
being applied in the real world.

10. Python for Data Science and Machine


Learning Bootcamp (Udemy)
Python for Data Science and Machine Learning is one of the best of the top 10
machine learning courses. This comprehensive course shows you how to use the
Python programming language for analyzing data, creating visualizations, and
building powerful ML algorithms.

This course is not only focused on machine learning concepts but also helps you to
start your career in the Data Science field.

Data Scientist is frequently ranked as the number 1 job (for example, in Glassdoor's


rankings) with an average salary of around $120,000. So, if you really want to make a
good start to a career in the field of data science and machine learning, then this
course comes recommended.

Time to complete the Course: Approx. 45 Hours


Level: Beginner

Pre-requisites: Basics of Python

Ratings: 4.7/5

Cost: Paid Course and Certification

Course Highlights

o The course starts with a Python crash course, so anyone can easily learn and
understand each concept of this course.
o Deep explanation of each concept throughout the complete course.
o You will be provided with written notes that are very helpful for learning.
o It contains different exercises for practising each concept and also provides
solutions so you can check your knowledge and build your confidence.

Epoch in Machine Learning


In Machine Learning, whenever you train a model with some data,
an Epoch refers to one complete pass of the training dataset through the
algorithm. Training a machine learning model usually takes a few epochs, but you
will face an issue if you try to feed the entire training data into the model at once.
This issue arises from the limitations of computer memory. To overcome it, we break
the training data into small batches according to the computer's memory or storage
capacity, and then train the model by feeding these batches one at a time. Each such
subset is called a batch in machine learning, and when all batches have been fed
exactly once to train the model, this entire procedure is known as an Epoch. In this
article, "Epoch in Machine Learning", we will briefly discuss the epoch, the batch, and
the sample. So let's start with the definition of the Epoch in Machine Learning.
What is Epoch in Machine Learning?
An epoch is defined as one complete cycle of training the machine learning model
with all of the training data; in an epoch, all of the training data is used exactly once.
In other words, the number of epochs is the total number of passes the algorithm has
completed over the training dataset. A forward pass and a backward pass together
are counted as one pass in training.

Usually, training a machine learning model requires a number of epochs. An epoch is
often confused with an iteration.

What is Iteration?
An iteration is one pass of a single batch through the algorithm; hence the total
number of iterations required to complete one epoch equals the number of batches
into which the training data has been divided.

Let's understand the iteration and epoch with an example, where we have 3000
training examples that we are going to use to train a machine learning model.

In the above scenario, we can break up the training dataset into sizeable batches.
Suppose we use batches of 500 examples each; then it will take 6 iterations to
complete 1 epoch.

Mathematically, we can understand it as follows (a small code sketch is shown after this list):

o Total number of training examples = 3000;


o Assume each batch size = 500;
o Then the total number of Iterations = Total number of training
examples/Individual batch size = 3000/500
o Total number of iterations = 6
o And 1 Epoch = 6 Iterations
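The same arithmetic in a tiny code sketch:

# Epoch / batch / iteration arithmetic from the example above.
training_examples = 3000
batch_size = 500

iterations_per_epoch = training_examples // batch_size
print(iterations_per_epoch)   # 6 iterations make up 1 epoch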

Now, understand the Batch size in brief.

What is Batch in Machine Learning?


Before starting with the introduction of a batch in machine learning, keep in mind
that the batch size and the batch are two separate concepts in machine learning.

Batch size is defined as the total number of training examples that exist in a single
batch. You can understand batch with the above-mentioned example also, where we
have divided the entire training dataset/examples into different batches or sets or
parts.

Let's clear up the confusion between an epoch and an iteration with the example
below, where we consider 1000 training examples.
We can understand this concept as follows:

o If the Batch size is 1000, then an epoch will complete in one iteration.
o If the Batch size is 500, then an epoch will complete in 2 iterations.

Similarly, if the batch size is smaller, say 100, then the epoch will complete in 10
iterations. So, for each epoch, the required number of iterations times the batch size
gives the number of data points. However, we usually use multiple epochs to train a
machine learning model, as in the sketch below.
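As a rough illustration, here is how epochs and batch size are typically passed to a training call, sketched with Keras; the model architecture and the random data are assumptions, not part of the text.

# A minimal Keras sketch showing where epochs and batch_size appear in training.
import numpy as np
from tensorflow import keras

X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=(1000,))

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# With 1000 examples and batch_size=500, each epoch takes 2 iterations,
# and the whole dataset is passed through the network 10 times.
model.fit(X, y, epochs=10, batch_size=500)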

Key points about Epoch and Batch in


Machine Learning:
There are a few important points that everyone should keep in mind during training
a machine learning model. These are as follows:

o Epoch is a machine learning term that refers to the number of complete


passes the training data makes through the machine learning algorithm.
o If a large amount of data is available, you can divide the entire data set into
smaller groups, or batches.
o The process of running one batch through the learning model is known as an
iteration. One cycle through the entire training data set is called an epoch, but
training a model typically requires multiple epochs.
o Better generalization to new inputs can be achieved by using more epochs
when training the machine learning model.
o Given the complexity and variety of data in real-world applications, hundreds
to thousands of epochs may be required to achieve reasonable accuracy on
test data. Furthermore, the term epoch has several definitions depending
on the topic at hand.

Why use more than one Epoch?


It may seem strange that passing the entire dataset through an ML algorithm or
neural network once is not enough, and that we need to pass it through the same
algorithm multiple times.

Keep in mind that to optimize the learning we use gradient descent, an iterative
process. Hence, it is not enough to update the weights with a single pass, i.e. one
epoch.

Moreover, training for only a single epoch usually leaves the model underfitted.

Machine Learning with Anomaly


Detection
Anomaly detection is the process of finding those rare items, data points, events,
or observations that raise suspicion by being different from the rest of the data
points or observations. Anomaly detection is also known as outlier detection.
Generally, anomalous data is related to some kind of problem such as bank fraud,
medical problems, malfunctioning equipment, etc.

Finding an anomaly depends on the ability to define what is 'normal'. For example,
in the image below, the yellow vehicle is an anomaly among all the red vehicles.

Types of Anomaly Detection


1. Point Anomaly

A tuple within the dataset is said to be a point anomaly if it is far away from the
rest of the data.

Example: An example of a point anomaly is a sudden transaction of a huge amount


from a credit card.

2. Contextual Anomaly

A contextual anomaly is also known as a conditional outlier. If a particular observation is


different from other data points within a specific context, then it is known as a
contextual anomaly. In such anomalies, an anomaly in one context may not be an
anomaly in another context.

3. Collective Anomaly

Collective anomalies occur when a collection of related data points is anomalous with
respect to the whole dataset; such values are known as collective outliers. In this case,
the specific or individual values are not anomalous on their own, either globally or
contextually.

Categories of Anomaly detection


techniques
Anomaly detection techniques are broadly categorized into two types:

1. Supervised Anomaly detection


2. Unsupervised Anomaly detection

Supervised Anomaly Detection


Supervised Anomaly detection needs the labeled training data, which contains both
normal and anomalous data for creating a predictive model. Some of the common
supervised methods are neural networks, support vector machines, k-nearest
neighbors, Bayesian networks, decision trees, etc.

K-nearest neighbors is a popular nonparametric technique that finds the approximate
distance between different points in the input space. It is one of the best anomaly
detection methods. Another popular model is the Bayesian network, which is used for
anomaly detection when combined with statistical schemes. This model encodes a
probabilistic relationship among the variables of interest.

Supervised anomaly detection techniques have several advantages, such as the


capability of encoding interdependencies between variables and of predicting events;
they also provide the ability to incorporate both prior knowledge and data.

Unsupervised Anomaly Detection


Unsupervised Anomaly detection does not require labeled training data. These
techniques are based on two assumptions, which are,

o Most of the network connections are from normal traffic, and only a small
amount of data is abnormal.
o Malicious traffic is statistically different from normal traffic.

On the basis of these assumptions, data clusters of similar data points that occur
frequently are assumed to be normal traffic, and those data groups that are
infrequent are considered abnormal or malicious.

Some of the common unsupervised anomaly detection algorithms are self-organizing
maps (SOM), K-means, C-means, the expectation-maximization meta-algorithm (EM),
adaptive resonance theory (ART), and one-class support vector machines. SOM, or
self-organizing map, is a popular technique that aims to reduce the dimensionality of
data for visualization. A small one-class SVM sketch is shown below.
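A minimal sketch of unsupervised anomaly detection with a one-class SVM, one of the algorithms listed above; the toy "traffic" data is an assumption for illustration.

# A minimal unsupervised anomaly-detection sketch with a one-class SVM.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
normal_traffic = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # mostly normal points
anomalies = np.array([[6.0, 6.0], [-7.0, 5.0]])                   # two obvious outliers
X = np.vstack([normal_traffic, anomalies])

detector = OneClassSVM(nu=0.05, kernel="rbf", gamma="auto")
detector.fit(X)

labels = detector.predict(X)    # +1 for inliers (normal), -1 for anomalies
print("Detected anomalies:", X[labels == -1])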

Anomaly detection can effectively help in catching fraud and discovering strange
activity in large and complex big data sets. This can prove useful in areas such as
banking security, the natural sciences, medicine, and marketing, which are prone to
malicious activities. With machine learning, an organization can intensify its search
for anomalies and increase the effectiveness of its digital business initiatives.

Need of Anomaly Detection


1. Anomaly detection for application performance
The application performance of any company can either boost or reduce workforce
productivity and revenue. Traditional approaches to monitoring application
performance only allow teams to react to issues, by which time the business has
already suffered and users have been affected. But with the help of anomaly
detection using machine learning, it is easy to identify and resolve application
performance issues before they affect the business and its users.

Anomaly detection using machine learning algorithms can correlate data with the
corresponding application performance metrics and build a complete picture of the
issue. Different industries also employ anomaly detection techniques for their
businesses, such as telco, adtech, etc.

2. Anomaly detection for product quality


It is not enough for product managers to trust another department to take care of
the required monitoring and alerts; product managers need to be able to trust that
the product will work smoothly. This is because a product is constantly changing,
from each version release to every new feature upgrade, and these changes generate
anomalies. If these anomalies are not properly monitored, they may cause millions in
lost revenue and can also damage brand reputation.

3. Anomaly detection for user experience


If you release a faulty version, you risk outages such as a DDoS attack or usage
lapses across customer experiences. So, you need to react to such issues before they
impact the user experience in order to reduce the chance of revenue loss.

Proactively streamlining and improving user experiences will help improve customer
satisfaction in a variety of industries, including Gaming, online business, etc.

Conclusion
In this topic, we have provided a detailed description of anomaly detection and its
use cases in business. Anomaly detection is very helpful in different business
applications such as Credit Card Fraud detection systems, Intrusion detection, etc.

Cost Function in Machine Learning


A Machine Learning model should have a very high level of accuracy in order to
perform well in real-world applications. But how do we measure the accuracy of the
model, i.e., how well or poorly it will perform in the real world? This is where the
cost function comes in. It is an important machine learning quantity used to
correctly evaluate the model.

The cost function also plays a crucial role in understanding how well your model
estimates the relationship between the input and output parameters.

In this topic, we will explain the cost function in Machine Learning, Gradient descent,
and types of cost functions.

What is Cost Function?


A cost function is an important parameter that determines how well a machine
learning model performs for a given dataset. It calculates the difference between
the expected value and predicted value and represents it as a single real number.

In machine learning, once we train our model, we want to see how well it is
performing. Although there are various accuracy metrics that tell you how your model
is performing, they do not give insight into how to improve it. So, we need a function
that can find when the model is most accurate by finding the sweet spot between an
undertrained and an overtrained model.

In simple, "Cost function is a measure of how wrong the model is in estimating


the relationship between X(input) and Y(output) Parameter." A cost function is
sometimes also referred to as Loss function, and it can be estimated by iteratively
running the model to compare estimated predictions against the known values of Y.

The main aim of each ML model is to determine parameters or weights that can
minimize the cost function.

Why use Cost Function?


If there are various accuracy metrics, why do we need a cost function for a machine
learning model? We can understand this with an example of data classification.
Suppose we have a dataset that contains the heights and weights of cats & dogs, and
we need to classify them accordingly. If we plot the records using these two features,
we will get a scatter plot as below:

In the above image, the green dots are cats, and the yellow dots are dogs. Below are
the three possible solutions for this classification problem.
In the above solutions, all three classifiers have high accuracy, but the third solution
is the best because it correctly classifies each data point. The reason it is the best
classification is that its boundary lies midway between the two classes, neither too
close to nor too far from either of them.

To get such results, we need a cost function: to obtain the optimal solution, a cost
function is required. It calculates the difference between the actual values and the
predicted values and measures how wrong our model's predictions are. By
minimizing the value of the cost function, we can obtain the optimal solution.

Gradient Descent: Minimizing the cost


function
As we discussed in the above section, the cost function tells how wrong your model
is, and each machine learning model tries to minimize the cost function in order to
give the best results. This is where gradient descent comes in.

"Gradient Descent is an optimization algorithm which is used for optimizing the


cost function or error in the model." It lets the model follow the gradient, i.e. the
direction that reduces the error, until it reaches the least possible error. Here,
direction refers to how the model parameters should be adjusted to further reduce
the cost function. The error in your model can be different at different points, and
you have to find the quickest way to minimize it, to prevent wasted resources.

Gradient descent is an iterative process where the model gradually converges


towards a minimum value, and if the model iterates further than this point, it
produces little or zero changes in the loss. This point is known as convergence, and
at this point, the error is least, and the cost function is optimized.

Below is the general form of the gradient descent update used in linear regression,
where J(θ) is the cost function and θ denotes the model parameters:

θ := θ - α * ∂J(θ)/∂θ

In the gradient descent equation, alpha (α) is known as the learning rate. This
parameter decides how fast you move down the slope: for a large alpha the algorithm
takes big steps, and for a small alpha it takes small steps.
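Below is a minimal sketch of this update applied to simple one-variable linear regression; the data, learning rate, and number of iterations are assumptions for illustration, not values from the text.

# A minimal gradient-descent sketch for simple linear regression.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * X + 1.0                      # true relationship the model should recover

w, b = 0.0, 0.0                        # model parameters
alpha = 0.05                           # learning rate (step size down the slope)

for _ in range(2000):
    y_pred = w * X + b
    error = y_pred - y
    # Gradients of the mean-squared-error cost with respect to w and b
    grad_w = (2.0 / len(X)) * np.dot(error, X)
    grad_b = (2.0 / len(X)) * error.sum()
    w -= alpha * grad_w
    b -= alpha * grad_b

print(w, b)   # converges close to w = 2, b = 1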

Types of Cost Function


Cost functions can be of various types depending on the problem. However, they are
mainly of three types, which are as follows:

1. Regression Cost Function


2. Binary Classification cost Functions
3. Multi-class Classification Cost Function.

1. Regression Cost Function


Regression models are used to make a prediction for the continuous variables such
as the price of houses, weather prediction, loan predictions, etc. When a cost
function is used with Regression, it is known as the "Regression Cost Function." In
this, the cost function is calculated as the error based on the distance, such as:

1. Error= Actual Output-Predicted output

There are three commonly used Regression cost functions, which are as follows:

a. Mean Error

In this type of cost function, the error is calculated for each training example, and
then the mean of all the error values is taken.

It is one of the simplest cost functions possible.

The errors from the training data can be either negative or positive. When the mean
is taken, they can cancel each other out and result in a zero mean error for the
model, so it is not a recommended cost function for a model.

However, it provides a base for other cost functions of regression models.


b. Mean Squared Error (MSE)

Mean Squared Error is one of the most commonly used cost functions. It improves on
the drawbacks of the Mean Error cost function, as it calculates the square of the
difference between the actual value and the predicted value. Because the difference
is squared, there is no possibility of negative errors cancelling out.

The formula for calculating MSE is:

MSE = (1/N) * Σ (actual value - predicted value)²

Mean squared error is also known as L2 Loss.

In MSE, each error is squared, which penalizes larger deviations more heavily than
MAE does. But if the dataset has outliers that generate large prediction errors, then
squaring these errors will increase them many times over. Hence, we can say MSE is
less robust to outliers.

c. Mean Absolute Error (MAE)

Mean Absolute Error also overcomes the issue of the Mean Error cost function by
taking the absolute difference between the actual value and the predicted value.

The formula for calculating Mean Absolute Error is:

MAE = (1/N) * Σ |actual value - predicted value|

The Mean Absolute Error cost function is also known as L1 Loss. It is much less
affected by noise or outliers, hence giving better results if the dataset contains them.
A small sketch computing these regression cost functions follows.
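A minimal NumPy sketch computing the mean error, MSE, and MAE described above; the actual and predicted values are toy assumptions.

# Regression cost functions on a tiny toy example.
import numpy as np

actual    = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

error = actual - predicted
mean_error = error.mean()                 # positive and negative errors can cancel out
mse = np.mean(error ** 2)                 # Mean Squared Error (L2 loss)
mae = np.mean(np.abs(error))              # Mean Absolute Error (L1 loss)

print(mean_error, mse, mae)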

2. Binary Classification Cost Functions


Classification models are used to make predictions of categorical variables, such as
predictions for 0 or 1, Cat or dog, etc. The cost function used in the classification
problem is known as the Classification cost function. However, the classification cost
function is different from the Regression cost function.

One of the commonly used loss functions for classification is cross-entropy loss.

The binary cross-entropy cost function is a special case of categorical cross-entropy,
where there is only a single output variable distinguishing two classes, for example,
classification between red and blue.
To understand it better, let's suppose there is only a single output variable Y:

1. Cross-entropy(D) = - y*log(p) when y = 1
2. Cross-entropy(D) = - (1-y)*log(1-p) when y = 0

The error in binary classification is calculated as the mean of the cross-entropy over
all N training examples, which means:

1. Binary Cross-Entropy = (Sum of Cross-Entropy for N data)/N
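A minimal NumPy sketch that averages the per-sample cross-entropy exactly as in the formula above; the labels and probabilities are toy assumptions.

# Binary cross-entropy on a tiny toy example.
import numpy as np

y_true = np.array([1, 0, 1, 1])            # actual labels
p_pred = np.array([0.9, 0.2, 0.7, 0.6])    # predicted probabilities of class 1

cross_entropy = -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
binary_cross_entropy = cross_entropy.mean()

print(binary_cross_entropy)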

3. Multi-class Classification Cost Function


A multi-class classification cost function is used in the classification problems for
which instances are allocated to one of more than two classes. Here also, similar to
binary class classification cost function, cross-entropy or categorical cross-entropy is
commonly used cost function.

It is designed in such a way that it can be used for multi-class classification with
target classes labelled 0, 1, 2, ..., n.

In a multi-class classification problem, cross-entropy will generate a score that


summarizes the mean difference between actual and anticipated probability
distribution.

For a perfect model, the cross-entropy value is zero, which is when the score is fully minimized.

Bayes Theorem in Machine learning


Machine Learning is one of the fastest-emerging technologies of Artificial Intelligence.
We are living in the 21st century, which is driven by new technologies and gadgets,
some of which are yet to be used and few of which have reached their full potential.
Similarly, Machine Learning is a technology that is still in its developing phase. There
are lots of concepts that make machine learning a better technology, such as
supervised learning, unsupervised learning, reinforcement learning, perceptron
models, neural networks, etc. In this article, "Bayes Theorem in Machine Learning",
we will discuss another important concept of machine learning: Bayes theorem. We
will cover what exactly Bayes theorem is, why it is used in machine learning, examples
of Bayes theorem in machine learning, and much more. So, let's start with a brief
introduction to Bayes theorem.
Introduction to Bayes Theorem in Machine
Learning
Bayes theorem was given by an English statistician, philosopher, and Presbyterian
minister named Thomas Bayes in the 18th century. Bayes contributed his ideas to
decision theory, a field that makes extensive use of probability. Bayes theorem is also
widely used in Machine Learning, where we need to predict classes precisely and
accurately. An important concept built on Bayes theorem, the Bayesian method, is
used to calculate conditional probabilities in Machine Learning applications that
include classification tasks. Further, a simplified version of Bayes theorem (Naïve
Bayes classification) is also used to reduce computation time and the average cost of
projects.

Bayes theorem is also known by other names such as Bayes rule or Bayes law. It helps
to determine the probability of an event under uncertain knowledge. It is used to
calculate the probability of one event occurring given that another event has already
occurred, and it is the standard way to relate conditional probability and marginal
probability.

In simple words, we can say that Bayes theorem helps to produce more accurate
results.

Bayes Theorem is used to estimate the precision of values and provides a method for
calculating conditional probability. While it is in principle a simple calculation, it makes
it easy to calculate the conditional probability of events where intuition often fails.
Some data scientists assume that Bayes theorem is mostly used in the financial
industry, but that is not the case: beyond finance, it is also extensively applied in
health and medicine, the research and survey industry, the aeronautical sector, etc.
What is Bayes Theorem?
Bayes theorem is one of the most popular machine learning concepts. It helps to
calculate the probability of one event occurring, under uncertain knowledge, given
that another event has already occurred.

Bayes' theorem can be derived using the product rule and the conditional probability
of event X given a known event Y:

o According to the product rule, we can express the probability of event X


occurring together with known event Y as follows:

1. P(X ∩ Y) = P(X|Y) P(Y) {equation 1}

o Further, the probability of event Y with known event X:

1. P(X ∩ Y) = P(Y|X) P(X) {equation 2}

Mathematically, Bayes theorem can be obtained by equating the right-hand sides of
the two equations, since both express P(X ∩ Y). Dividing through by P(Y), we get:

P(X|Y) = [P(Y|X) * P(X)] / P(Y)

The above equation is called Bayes rule or Bayes theorem.

o P(X|Y) is called the posterior, which we need to calculate. It is defined as the


updated probability after considering the evidence.
o P(Y|X) is called the likelihood. It is the probability of the evidence when the
hypothesis is true.
o P(X) is called the prior probability, the probability of the hypothesis before
considering the evidence.
o P(Y) is called the marginal probability. It is defined as the probability of the
evidence under any consideration.
Hence, Bayes Theorem can be written as:

posterior = likelihood * prior / evidence
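A small worked sketch of this rule in Python; the numbers describe a hypothetical diagnostic test and are assumptions for illustration.

# Bayes' rule: posterior = likelihood * prior / evidence (hypothetical numbers).
prior = 0.01            # P(disease): 1% of people have the disease
likelihood = 0.95       # P(positive test | disease)
false_positive = 0.05   # P(positive test | no disease)

# Evidence P(positive test) via the law of total probability
evidence = likelihood * prior + false_positive * (1 - prior)

posterior = likelihood * prior / evidence   # P(disease | positive test)
print(posterior)                            # roughly 0.16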

Prerequisites for Bayes Theorem


While studying Bayes theorem, we need to understand a few important concepts.
These are as follows:

1. Experiment

An experiment is defined as a planned operation carried out under controlled


conditions, such as tossing a coin, drawing a card, or rolling a die.

2. Sample Space

The results we obtain during an experiment are called outcomes, and the set of all
possible outcomes of an experiment is known as the sample space. For example, if we
are rolling a die, the sample space will be:

S1 = {1, 2, 3, 4, 5, 6}

Similarly, if our experiment is related to toss a coin and recording its outcomes, then
sample space will be:

S2 = {Head, Tail}

3. Event

An event is defined as a subset of the sample space of an experiment. In other words,


it is a set of outcomes.
Assume in our experiment of rolling a dice, there are two event A and B such that;

A = Event when an even number is obtained = {2, 4, 6}

B = Event when a number is greater than 4 = {5, 6}

o Probability of the event A ''P(A)''= Number of favourable outcomes / Total


number of possible outcomes
P(A) = 3/6 = 1/2 = 0.5
o Similarly, Probability of the event B ''P(B)''= Number of favourable
outcomes / Total number of possible outcomes
=2/6
=1/3
=0.333
o Union of event A and B:
A∪B = {2, 4, 5, 6}

o Intersection of event A and B:


A∩B= {6}

o Disjoint Event: If the intersection of events A and B is an empty (null) set,
then such events are known as disjoint events, also called mutually exclusive
events.

4. Random Variable:

It is a real-valued function that maps the sample space of an experiment onto the
real line. A random variable takes on random values, each with some probability.
Strictly speaking, it is neither random nor a variable; it behaves as a function, which
can be discrete, continuous, or a combination of both.

5. Exhaustive Event:

As the name suggests, a set of events of which at least one must occur at a time is
called an exhaustive set of events of an experiment.

Thus, two events A and B are said to be exhaustive if either A or B definitely occurs;
for example, while tossing a coin, the outcome is either a Head or a Tail (and in this
case the two events are also mutually exclusive).

6. Independent Event:

Two events are said to be independent when the occurrence of one event does not
affect the occurrence of the other event. In simple words, we can say that the
probability of one event's outcome does not depend on the other.

Mathematically, two events A and B are said to be independent if:

P(A ∩ B) = P(AB) = P(A)*P(B)

7. Conditional Probability:
Conditional probability is defined as the probability of an event A, given that another
event B has already occurred (i.e. A conditional B). This is represented by P(A|B) and
we can define it as:

P(A|B) = P(A ∩ B) / P(B)

8. Marginal Probability:

Marginal probability is defined as the probability of an event A occurring


independent of any other event B. Further, it is considered as the probability of
evidence under any consideration.

P(A) = P(A|B)*P(B) + P(A|~B)*P(~B)

Here ~B represents the event that B does not occur.

How to apply Bayes Theorem or Bayes rule


in Machine Learning?
Bayes theorem helps us to calculate the single term P(B|A) in terms of P(A|B), P(B),
and P(A). This rule is very helpful in scenarios where we have good estimates of
P(A|B), P(B), and P(A) and need to determine the fourth term.

The Naïve Bayes classifier is one of the simplest applications of Bayes theorem; it is
used in classification algorithms to separate data into classes quickly and accurately.

Let's understand the use of Bayes theorem in machine learning with below example.

Suppose we have a vector A with i attributes, i.e.


A = A1, A2, A3, A4……………Ai

Further, we have n classes represented as C1, C2, C3, C4…………Cn.

Given these two facts, our machine learning classifier has to predict the class of A,
and the first thing it must choose is the best possible class. So, with the help of Bayes
theorem, we can write this as:

P(Ci/A)= [ P(A/Ci) * P(Ci)] / P(A)

Here;

P(A) is the class-independent term.

P(A) remains constant across the classes, meaning it does not change its value as the
class changes. Therefore, to maximize P(Ci/A), we only have to maximize the term
P(A/Ci) * P(Ci).

With n number classes on the probability list let's assume that the possibility of any
class being the right answer is equally likely. Considering this factor, we can say that:

P(C1)=P(C2)-P(C3)=P(C4)=…..=P(Cn).

This process helps us to reduce the computation cost as well as time. This is how
Bayes theorem plays a significant role in Machine Learning and Naïve Bayes theorem
has simplified the conditional probability tasks without affecting the precision.
Hence, we can conclude that:

P(Ai/C)= P(A1/C)* P(A2/C)* P(A3/C)*……*P(An/C)

Hence, by using Bayes theorem in Machine Learning we can easily describe the
possibilities of smaller events.

What is Naïve Bayes Classifier in Machine Learning

Naïve Bayes is a supervised algorithm that is based on Bayes theorem and used to solve
classification problems. It is one of the simplest and most effective classification
algorithms in Machine Learning and enables us to build ML models that make quick
predictions. It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object. Some popular applications of Naïve Bayes are spam filtering,
sentiment analysis, and article classification.
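
As a concrete, hedged illustration, here is a minimal sketch using scikit-learn's GaussianNB
classifier; the dataset choice (Iris) and the split parameters are assumptions for the
example, not part of the original text:

# Minimal sketch: a Gaussian Naïve Bayes classifier on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = GaussianNB()            # assumes each feature is Gaussian within a class
model.fit(X_train, y_train)     # estimates P(Ci) and the per-class feature likelihoods

y_pred = model.predict(X_test)  # picks the class Ci that maximizes P(A|Ci) * P(Ci)
print("Accuracy:", accuracy_score(y_test, y_pred))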
Advantages of Naïve Bayes Classifier in Machine Learning:

o It is one of the simplest and most effective methods for calculating conditional
probabilities and for text classification problems.
o A Naïve Bayes classifier often outperforms other models when the assumption of
independent predictors holds true.
o It is easier to implement than most other models.
o It requires only a small amount of training data to estimate its parameters, which keeps
the training time short.
o It can be used for binary as well as multi-class classification.

Disadvantages of Naïve Bayes Classifier in Machine Learning:

The main disadvantage of the Naïve Bayes classifier is its strong assumption of independent
predictors: it implicitly assumes that all attributes are independent or unrelated, but in
real life it is rarely possible to obtain fully independent attributes.

Conclusion
Although we live in a technology-driven world where many new technologies are still in a
developing phase, they remain incomplete without the classical theorems and algorithms that
are already available. Bayes theorem is one of the most popular examples used in Machine
Learning and has many applications there. In classification-related problems, it is one of
the most preferred methods. Hence, we can say that Machine Learning depends heavily on Bayes
theorem. In this article, we have discussed Bayes theorem, how to apply it in Machine
Learning, the Naïve Bayes Classifier, and more.

Perceptron in Machine Learning

In Machine Learning and Artificial Intelligence, Perceptron is one of the most commonly used
terms. It is the first step in learning Machine Learning and Deep Learning technologies and
consists of a set of weights, input values or scores, and a threshold. The Perceptron is a
building block of an Artificial Neural Network.
In the mid-20th century, Frank Rosenblatt invented the Perceptron for performing certain
calculations to detect capabilities in input data. The Perceptron is a linear Machine
Learning algorithm used for the supervised learning of various binary classifiers. The
algorithm enables a neuron to learn and processes training elements one at a time. In this
tutorial, "Perceptron in Machine Learning," we will discuss the Perceptron and its basic
functions in brief. Let's start with a basic introduction to the Perceptron.

What is the Perceptron model in Machine Learning?

Perceptron is a Machine Learning algorithm for the supervised learning of various binary
classification tasks. Further, a Perceptron can also be understood as an artificial neuron
or neural network unit that helps to detect certain computations on input data in business
intelligence.

The Perceptron model is also regarded as one of the best and simplest types of Artificial
Neural Networks. It is a supervised learning algorithm for binary classifiers. Hence, we can
consider it a single-layer neural network with four main parameters, i.e., input values,
weights and bias, net sum, and an activation function.

What is a Binary classifier in Machine Learning?

In Machine Learning, a binary classifier is defined as a function that decides whether input
data, represented as a vector of numbers, belongs to some specific class.

Binary classifiers can be considered linear classifiers. In simple words, we can understand
a binary classifier as a classification algorithm that predicts using a linear predictor
function in terms of a weight vector and a feature vector.

Basic Components of Perceptron

Frank Rosenblatt invented the perceptron model as a binary classifier which contains three
main components. These are as follows:

o Input Nodes or Input Layer:

This is the primary component of Perceptron which accepts the initial data into the
system for further processing. Each input node contains a real numerical value.

o Weight and Bias:

The weight parameter represents the strength of the connection between units. This is
another important parameter of the Perceptron's components. Weight is directly proportional
to the strength of the associated input neuron in deciding the output. Further, bias can be
considered as the intercept term in a linear equation.

o Activation Function:

These are the final and important components that help to determine whether the
neuron will fire or not. Activation Function can be considered primarily as a step
function.

Types of Activation functions:

o Sign function
o Step function, and
o Sigmoid function

The data scientist chooses the activation function based on the problem statement and the
desired form of the output. The activation function used in a perceptron model (e.g., Sign,
Step, or Sigmoid) may be changed after checking whether the learning process is slow or
suffers from vanishing or exploding gradients.

How does Perceptron work?

In Machine Learning, Perceptron is considered as a single-layer neural network that
consists of four main parameters named input values (Input nodes), weights and Bias,
net sum, and an activation function. The perceptron model begins with the
multiplication of all input values and their weights, then adds these values together
to create the weighted sum. Then this weighted sum is applied to the activation
function 'f' to obtain the desired output. This activation function is also known as
the step function and is represented by 'f'.
This step function or Activation function plays a vital role in ensuring that output is
mapped between required values (0,1) or (-1,1). It is important to note that the
weight of input is indicative of the strength of a node. Similarly, an input's bias value
gives the ability to shift the activation function curve up or down.

Perceptron model works in two important steps as follows:

Step-1

In the first step, multiply all input values with their corresponding weight values and then
add the products to determine the weighted sum. Mathematically, we can calculate the
weighted sum as follows:

∑wi*xi = x1*w1 + x2*w2 + … + xn*wn

Add a special term called bias 'b' to this weighted sum to improve the model's
performance.

∑wi*xi + b

Step-2

In the second step, an activation function is applied to the above-mentioned weighted sum,
which gives us an output either in binary form or as a continuous value, as follows:

Y = f(∑wi*xi + b)
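
Below is a minimal, illustrative Python sketch of these two steps together with the classic
perceptron weight-update rule; the AND training data, learning rate, and epoch count are
assumptions made for this example, not part of the original text:

import numpy as np

def step(z):
    """Step activation: maps the weighted sum to a binary output."""
    return 1 if z > 0 else 0

def predict(x, w, b):
    """Step 1 and 2: weighted sum plus bias, then activation f(sum(wi*xi) + b)."""
    return step(np.dot(w, x) + b)

# Toy training set: the logical AND function (an assumed example).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):                       # a few passes over the data
    for xi, target in zip(X, y):
        error = target - predict(xi, w, b)
        w += lr * error * xi              # perceptron learning rule
        b += lr * error

print([predict(xi, w, b) for xi in X])    # expected: [0, 0, 0, 1]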

Types of Perceptron Models

Based on the layers, Perceptron models are divided into two types. These are as follows:

1. Single-layer Perceptron Model
2. Multi-layer Perceptron Model

Single-Layer Perceptron Model:

This is one of the simplest types of Artificial Neural Network (ANN). A single-layer
perceptron model consists of a feed-forward network and includes a threshold transfer
function inside the model. The main objective of the single-layer perceptron model is to
classify linearly separable objects with binary outcomes.

A single-layer perceptron does not contain any prior recorded data, so it begins with
randomly allocated weight parameters. It then sums up all the weighted inputs. If the total
sum is more than a pre-determined threshold value, the model gets activated and shows the
output value as +1.

If the output matches the desired (pre-determined) value, the performance of the model is
considered satisfactory, and the weights are not changed. However, the model runs into
discrepancies when some input patterns produce the wrong output. In that case, to obtain the
desired output and minimize errors, the weights must be adjusted.

"Single-layer perceptron can learn only linearly separable patterns."

Multi-Layer Perceptron Model:

Like a single-layer perceptron model, a multi-layer perceptron model has the same basic
structure but contains a greater number of hidden layers.

The multi-layer perceptron model is trained with the Backpropagation algorithm, which
executes in two stages as follows:

o Forward Stage: Activations propagate from the input layer through the hidden layers and
terminate at the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified as per the
model's requirement. The error between the actual and the desired output is propagated
backward, starting at the output layer and ending at the input layer.
Hence, a multi-layer perceptron model can be considered as multiple artificial neural
network layers in which the activation function need not remain linear, unlike in a
single-layer perceptron model. Instead of a linear function, the activation function can be
a sigmoid, TanH, ReLU, etc.

A multi-layer perceptron model has greater processing power and can process linear
and non-linear patterns. Further, it can also implement logic gates such as AND, OR,
XOR, NAND, NOT, XNOR, NOR.

Advantages of Multi-Layer Perceptron:

o A multi-layer perceptron model can be used to solve complex non-linear problems.
o It works well with both small and large input data.
o It helps us to obtain quick predictions after training.
o It helps to obtain the same accuracy ratio with large as well as small data.

Disadvantages of Multi-Layer Perceptron:

o In a multi-layer perceptron, computations are difficult and time-consuming.
o In a multi-layer perceptron, it is difficult to tell how much each independent variable
affects the dependent variable.
o The model's functioning depends on the quality of the training.

Perceptron Function
The perceptron function 'f(x)' is obtained by multiplying the input 'x' with the learned
weight coefficients 'w' and comparing the result against a threshold.

Mathematically, we can express it as follows:

f(x) = 1 if w·x + b > 0

otherwise, f(x) = 0

o 'w' represents the real-valued weight vector
o 'b' represents the bias
o 'x' represents the vector of input values.

Characteristics of Perceptron
The perceptron model has the following characteristics.

1. Perceptron is a machine learning algorithm for supervised learning of binary classifiers.
2. In Perceptron, the weight coefficient is automatically learned.
3. Initially, weights are multiplied with input features, and the decision is made
whether the neuron is fired or not.
4. The activation function applies a step rule to check whether the weight
function is greater than zero.
5. The linear decision boundary is drawn, enabling the distinction between the
two linearly separable classes +1 and -1.
6. If the added sum of all input values is more than the threshold value, it must
have an output signal; otherwise, no output will be shown.

Limitations of Perceptron Model

A perceptron model has the following limitations:

o The output of a perceptron can only be a binary number (0 or 1) due to the hard-limit
transfer function.
o A perceptron can only classify linearly separable sets of input vectors. If the input
vectors are not linearly separable, it is not easy to classify them properly.

Future of Perceptron
The future of the Perceptron model is bright and significant, as it helps to
interpret data by building intuitive patterns and applying them in the future. Machine
learning is a rapidly growing technology of Artificial Intelligence that is continuously
evolving and in the developing phase; hence the future of perceptron technology will
continue to support and facilitate analytical behavior in machines that will, in turn,
add to the efficiency of computers.

The perceptron model is continuously becoming more advanced and working efficiently on
complex problems with the help of artificial neurons.

Conclusion:
In this article, you have learned how Perceptron models are the simplest type of
artificial neural network which carries input and their weights, the sum of all
weighted input, and an activation function. Perceptron models are continuously
contributing to Artificial Intelligence and Machine Learning, and these models are
becoming more advanced. Perceptron enables the computer to work more efficiently
on complex problems using various Machine Learning technologies. The Perceptrons
are the fundamentals of artificial neural networks, and everyone should have in-
depth knowledge of perceptron models to study deep neural networks.

Entropy in Machine Learning

We are living in a technology world, and somewhere everything is related to
technology. Machine Learning is also the most popular technology in the computer
science world that enables the computer to learn automatically from past
experiences.

Also, Machine Learning is so much demanded in the IT world that most companies
want highly skilled machine learning engineers and data scientists for their business.
Machine Learning contains lots of algorithms and concepts that solve complex
problems easily, and one of them is entropy in Machine Learning. Almost everyone
must have heard the Entropy word once during their school or college days in
physics and chemistry. The base of entropy comes from physics, where it is defined
as the measurement of disorder, randomness, unpredictability, or impurity in the
system. In this article, we will discuss what entropy is in Machine Learning and why
entropy is needed in Machine Learning. So let's start with a quick introduction to the
entropy in Machine Learning.

Introduction to Entropy in Machine Learning
Entropy is defined as a measure of the randomness or disorder of the information being
processed in Machine Learning. In other words, entropy is the machine learning metric that
measures the unpredictability or impurity in the system.

When information is processed in the system, every piece of information has a specific value
and can be used to draw conclusions. If it is easy to draw a valuable conclusion from a
piece of information, the entropy is low; if the entropy is high, it is difficult to draw
any conclusion from that piece of information.


Entropy is a useful tool in machine learning for understanding concepts such as feature
selection, building decision trees, and fitting classification models. As a machine learning
engineer or professional data scientist, you should have in-depth knowledge of entropy in
machine learning.

What is Entropy in Machine Learning

Entropy is the measurement of disorder or impurities in the information processed in
machine learning. It determines how a decision tree chooses to split data.
We can understand entropy with a simple example: flipping a coin. When we flip a coin, there
are two possible outcomes, and it is impossible to say in advance which one will occur
because there is no relationship between the act of flipping and the outcome. Both outcomes
have a 50% probability, and in such scenarios the entropy is high. This is the essence of
entropy in machine learning.

Mathematical Formula for Entropy

Consider a data set having a total of N classes; then the entropy E can be determined with
the formula below:

E = -∑i=1..N pi log2(pi)

Where;

pi = Probability of randomly selecting an example in class i.

For two classes, entropy always lies between 0 and 1; however, depending on the number of
classes in the dataset, it can be greater than 1. A high value of entropy indicates a high
level of disorder or impurity in the data.

Let's understand this with an example where we have a dataset with three colours of fruit:
red, green, and yellow. Suppose we have 2 red, 2 green, and 4 yellow observations in the
dataset. Then, as per the above equation:

E = −(pr log2 pr + pg log2 pg + py log2 py)

Where;

pr = Probability of choosing a red fruit;

pg = Probability of choosing a green fruit; and

py = Probability of choosing a yellow fruit.

pr = 2/8 = 1/4 [as only 2 out of 8 observations represent red fruits]

pg = 2/8 = 1/4 [as only 2 out of 8 observations represent green fruits]

py = 4/8 = 1/2 [as only 4 out of 8 observations represent yellow fruits]

Now our final equation becomes:

E = −(1/4 log2(1/4) + 1/4 log2(1/4) + 1/2 log2(1/2)) = 0.5 + 0.5 + 0.5 = 1.5

So, the entropy will be 1.5.

Let's consider the case when all observations belong to the same class; then the entropy
will always be 0.

E = −(1 log2 1) = 0

When the entropy is 0, the dataset has no impurity. Datasets with no impurity are not useful
for learning. Further, if the entropy is 1 (the classes are perfectly mixed), this kind of
dataset is good for learning.
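
Here is a small Python sketch, using the same fruit counts as the example above, that
reproduces these entropy values:

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy E = -sum(pi * log2(pi)) over the class proportions."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

fruits = ["red"] * 2 + ["green"] * 2 + ["yellow"] * 4
print(entropy(fruits))              # 1.5  (mixed classes)
print(entropy(["yellow"] * 8))      # 0.0  (all observations in one class)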
What is a Decision Tree in Machine
Learning?
A decision tree is defined as a supervised learning algorithm used for classification as
well as regression problems. However, it is primarily used for solving classification
problems. Its structure is similar to a tree, where internal nodes represent the features of
the dataset, branches represent the decision rules, and leaf nodes represent the outcome.

Decision trees are used to predict an outcome based on historical data. The decision
tree works on the sequence of 'if-then-else' statements and a root which is our
initial problem to solve.

Terminologies used in Decision Tree:

Leaf Node: A leaf node is the output of a decision node; it does not contain any further
branches, which means the tree cannot be split any further from this node.

Root Node: As the name suggests, a root node is the origin point of any decision
tree. It contains the entire data set, which gets divided further into two or more sub-
sets. This node includes multiple branches and is used to make any decision in
classification problems.

Splitting: It is a process that divides the root node into multiple sub-nodes under
some defined conditions.

Branches: Branches are formed by splitting the root node or decision node.

Pruning: Pruning is defined as the process of removing unwanted branches from the
tree.

Parent Node: A node that splits into sub-nodes is called the parent node of those sub-nodes;
the root node is the topmost parent node in the tree.

Child Node: The sub-nodes produced by splitting a parent node are called its child nodes; in
a decision tree, every node except the root is a child node.

Use of Entropy in Decision Tree

In decision trees, the heterogeneity in a leaf node can be reduced by using entropy as the
cost function. At the root level, the entropy of the target column is determined with the
Shannon formula, and the weighted entropy is the entropy calculated for the target column on
every branch, weighted by the proportion of samples that reach that branch. In simple words,
you can understand the weighted entropy as the entropy of each branch weighted by its share
of the data. The greater the decrease in entropy after a split, the more information is
gained.

What is information gain in Entropy?

Information gain is defined as the reduction in entropy obtained by splitting the dataset on
an attribute; it measures how much a pattern in the data reduces uncertainty.

Mathematically, information gain can be expressed with the formula below:

Information Gain = (Entropy of parent node) - (Weighted entropy of child nodes)

Note: When the parent node's entropy is 1, as in the scenarios below, information gain
simplifies to 1 - Entropy of the children.

Let's understand it with an example having three scenarios as follows:

             Entropy     Information Gain
Scenario 1   0.7812345   0.2187655
Scenario 2   0           1
Scenario 3   1           0

Let's say we have a tree with a total of four values at the root node, split at the first
level into one value in one branch (say, Branch 1) and three values in the other branch
(Branch 2). The entropy at the root node is 1.

Now, to compute the entropy of child node 1 (Branch 2, which holds three values), the class
proportions within that branch are 1/3 and 2/3, and Shannon's entropy formula gives the
value below. As we saw above, the entropy of child node 2 (Branch 1) is zero because there
is only one value in that child node, meaning there is no uncertainty and hence no
heterogeneity.

H(X) = - [(1/3 * log2(1/3)) + (2/3 * log2(2/3))] = 0.9184

The information gain for the above case is the reduction in the weighted average of the
entropy, where the branch weights are 3/4 and 1/4 (the share of samples in each branch).

Information Gain = 1 - (3/4 * 0.9184) - (1/4 * 0) = 0.3112

The more the entropy is removed, the greater the information gain. The higher the
information gain, the better the split.
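
A short Python sketch of this calculation, mirroring the four-value split described above:

import math

def entropy(probabilities):
    """Shannon entropy of a list of class proportions."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

parent_entropy = 1.0                       # root node with two evenly mixed classes
branch2 = entropy([1/3, 2/3])              # child with three values -> about 0.9183
branch1 = entropy([1.0])                   # child with one value    -> 0.0

info_gain = parent_entropy - (3/4) * branch2 - (1/4) * branch1
print(round(branch2, 4), round(info_gain, 4))   # 0.9183 0.3113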

How to build decision trees using information gain:

After understanding the concepts of information gain and entropy individually, we can easily
build a decision tree. The steps to build a decision tree using information gain are as
follows (a short code sketch follows the steps):

1. The attribute with the highest information gain from the set should be selected as the
parent (root) node; say this is attribute A.

2. Build child nodes for every value of attribute A.

3. Repeat iteratively until you finish constructing the whole tree.
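
As a hedged illustration, scikit-learn's decision tree performs this entropy-based,
information-gain splitting when criterion="entropy" is selected; the dataset and depth limit
below are assumptions for the example:

# Minimal sketch: a decision tree that splits on information gain (entropy criterion).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_train, y_train)                 # each split maximizes information gain
print("Test accuracy:", tree.score(X_test, y_test))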

Advantages of the Decision Tree:

o A decision tree is easy to understand as it follows the same process a human follows when
making a decision.
o It can be used to solve decision-related problems in machine learning.
o It helps in finding out all the possible outcomes for a problem.
o It requires less data cleaning compared to other algorithms.

Issues in Machine Learning

"Machine Learning" is one of the most popular technology among all data scientists
and machine learning enthusiasts. It is the most effective Artificial Intelligence
technology that helps create automated learning systems to take future decisions
without being constantly programmed. It can be considered an algorithm that
automatically constructs various computer software using past experience and
training data. It can be seen in every industry, such as healthcare, education, finance,
automobile, marketing, shipping, infrastructure, automation, etc. Almost all big
companies like Amazon, Facebook, Google, Adobe, etc., are using various machine
learning techniques to grow their businesses. But everything in this world has bright
as well as dark sides. Similarly, Machine Learning offers great opportunities, but some
issues need to be solved.

This article will discuss some major practical issues and their business
implementation, and how we can overcome them. So let's start with a quick
introduction to Machine Learning.

What is Machine Learning?

Machine Learning is defined as the study of computer algorithms for
automatically constructing computer software through past experience and
training data.

It is a branch of Artificial Intelligence and computer science that helps build a model
based on training data and make predictions and decisions without being constantly
programmed. Machine Learning is used in various applications such as email
filtering, speech recognition, computer vision, self-driven cars, Amazon product
recommendation, etc.


Commonly used Algorithms in Machine Learning

Machine Learning is the study of learning algorithms that use past experience to make future
decisions. Although Machine Learning has a wide variety of models, here is a list of the
machine learning algorithms most commonly used by data scientists and professionals today.

o Linear Regression
o Logistic Regression
o Decision Tree
o Bayes Theorem and Naïve Bayes Classification
o Support Vector Machine (SVM) Algorithm
o K-Nearest Neighbor (KNN) Algorithm
o K-Means
o Gradient Boosting algorithms
o Dimensionality Reduction Algorithms
o Random Forest
Common issues in Machine Learning
Although machine learning is used in every industry and helps organizations make more
informed, data-driven choices that are more effective than classical methodologies, it still
has problems that cannot be ignored. Here are some common issues in Machine Learning that
professionals face while building ML skills and creating applications from scratch.

1. Inadequate Training Data

The major issue that arises while using machine learning algorithms is a lack of both
quality and quantity of data. Although data plays a vital role in the processing of machine
learning algorithms, many data scientists point out that inadequate, noisy, and unclean data
severely hampers machine learning algorithms. For example, a simple task may require
thousands of samples, while an advanced task such as speech or image recognition may need
millions of examples. Further, data quality is also important for the algorithms to work
well, yet poor data quality is common in Machine Learning applications. Data quality can be
affected by factors such as the following:

o Noisy Data- It is responsible for inaccurate predictions that affect the decision as well
as the accuracy of classification tasks.
o Incorrect data- It is also responsible for faulty programming and faulty results obtained
from machine learning models. Hence, incorrect data may also affect the accuracy of the
results.
o Generalizing of output data- Sometimes it is also found that generalizing output data
becomes complex, which results in comparatively poor future actions.

2. Poor quality of data

As we have discussed above, data plays a significant role in machine learning, and it
must be of good quality as well. Noisy data, incomplete data, inaccurate data, and
unclean data lead to less accuracy in classification and low-quality results. Hence,
data quality can also be considered as a major common problem while processing
machine learning algorithms.

3. Non-representative training data

To make sure our training model generalizes well, we have to ensure that the sample training
data is representative of the new cases to which we need to generalize. The training data
must cover all cases that have already occurred as well as those that may occur.

If we use non-representative training data in the model, it results in less accurate
predictions. A machine learning model is said to be ideal if it predicts well for
generalized cases and provides accurate decisions. If there is too little training data,
there will be sampling noise in the model; this is called a non-representative training set,
and it will not be accurate in its predictions. As a result, the model will be biased
against one class or group.

Hence, we should use representative data for training to protect the model against bias and
to make accurate predictions without any drift.

4. Overfitting and Underfitting

Overfitting:

Overfitting is one of the most common issues faced by Machine Learning engineers and data
scientists. Whenever a machine learning model is trained on a huge amount of data, it starts
capturing the noise and inaccuracies in the training data set, which negatively affects the
performance of the model. Let's understand this with a simple example: suppose the training
set contains 1000 mangoes, 1000 apples, 1000 bananas, and 5000 papayas. Then there is a
considerable probability of an apple being identified as a papaya because we have a massive
amount of biased data in the training set; hence the predictions are negatively affected. A
common reason behind overfitting is the use of highly flexible non-linear methods, as they
can build unrealistic data models. One way to reduce overfitting is to use simpler linear
and parametric algorithms in the machine learning models.

Methods to reduce overfitting:

o Increase the training data in the dataset.
o Reduce model complexity by simplifying the model and selecting one with fewer parameters.
o Apply Ridge regularization and Lasso regularization (see the sketch after this list).
o Use early stopping during the training phase.
o Reduce the noise.
o Reduce the number of attributes in the training data.
o Constrain the model.
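
As a hedged illustration of the regularization item above, here is a minimal scikit-learn
sketch; the synthetic dataset and the alpha values are assumptions for the example:

# Minimal sketch: Ridge and Lasso regularization to keep a linear model from overfitting.
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X_train, y_train)                    # penalty shrinks the coefficients
    print(type(model).__name__, "test R^2:", round(model.score(X_test, y_test), 3))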
Underfitting:

Underfitting is just the opposite of overfitting. It occurs whenever a machine learning
model is trained on too little data or is too simple; as a result, it produces incomplete
and inaccurate predictions and destroys the accuracy of the machine learning model.

Underfitting occurs when our model is too simple to capture the underlying structure of the
data, just like an undersized pair of pants. This generally happens when we have limited
data in the dataset and try to build a linear model with non-linear data. In such scenarios,
the model lacks the required complexity, its rules become too simple for the dataset, and it
starts making wrong predictions.

Methods to reduce underfitting:

o Increase the model complexity.
o Remove noise from the data.
o Train on more and better features.
o Reduce the constraints.
o Increase the number of epochs to get better results.

5. Monitoring and maintenance

As we know, generalized output data is mandatory for any machine learning model; hence,
regular monitoring and maintenance are compulsory. Different results for different actions
require data changes; hence, editing the code as well as allocating resources for monitoring
also become necessary.

6. Getting bad recommendations

A machine learning model operates within a specific context, which can lead to bad
recommendations and concept drift in the model. Let's understand this with an example: at a
specific time, a customer is looking for some gadgets, but the customer's requirements
change over time while the machine learning model keeps showing the same recommendations
even though the customer's expectations have changed. This phenomenon is called data drift.
It generally occurs when new data is introduced or the interpretation of data changes.
However, we can overcome this by regularly updating and monitoring the data according to
expectations.

7. Lack of skilled resources

Although Machine Learning and Artificial Intelligence are continuously growing in the
market, these industries are still young compared to others. The absence of skilled
manpower is also an issue. Hence, we need people with in-depth knowledge of mathematics,
science, and technology for developing and managing scientific content for machine learning.

8. Customer Segmentation
Customer segmentation is also an important issue while developing a machine learning
algorithm. We need to identify the customers who act on the recommendations shown by the
model and those who do not even check them. Hence, an algorithm is necessary to recognize
customer behaviour and trigger relevant recommendations for the user based on past
experience.

9. Process Complexity of Machine Learning

The machine learning process is very complex, which is another major issue faced by machine
learning engineers and data scientists. Machine Learning and Artificial Intelligence are
relatively new technologies, are still in an experimental phase, and are continuously
changing over time. There is a great deal of trial and error, so the probability of mistakes
is higher than expected. Further, the process also includes analyzing the data, removing
data bias, training the data, applying complex mathematical calculations, etc., which makes
the procedure more complicated and quite tedious.

10. Data Bias

Data bias is also a big challenge in Machine Learning. These errors occur when certain
elements of the dataset are weighted more heavily or given more importance than others.
Biased data leads to inaccurate results, skewed outcomes, and other analytical errors.
However, we can resolve this error by determining where the data is actually biased in the
dataset and then taking the necessary steps to reduce it.

Methods to remove Data Bias:

o Research your customer segments more thoroughly.
o Be aware of your general use cases and potential outliers.
o Combine inputs from multiple sources to ensure data diversity.
o Include bias testing in the development process.
o Analyze data regularly and keep tracking errors to resolve them easily.
o Review the collected and annotated data.
o Use multi-pass annotation for tasks such as sentiment analysis, content moderation, and
intent recognition.

11. Lack of Explainability

This basically means that the outputs cannot be easily understood, as the model is
programmed in specific ways to deliver results under certain conditions. This lack of
explainability in machine learning algorithms reduces their credibility.

12. Slow implementations and results

This issue is also very commonly seen with machine learning models. Machine learning models
can be highly accurate, but producing results is time-consuming. Slow programs, excessive
requirements, and overloaded data take more time than expected to provide accurate results.
The model also needs continuous maintenance and monitoring to keep delivering accurate
results.

13. Irrelevant features

Although machine learning models are intended to give the best possible outcome, if we feed
garbage data as input, then the result will also be garbage. Hence, we should use relevant
features in our training samples. A machine learning model is said to be good if the
training data has a good set of features with few to no irrelevant ones.

Conclusion
An ML system doesn't perform well if the training set is too small or if the data is not
representative, is noisy, or is corrupted with irrelevant features. We went through some of
the basic challenges faced by beginners while practicing machine learning. Machine
learning is all set to bring a big bang transformation in technology. It is one of the
most rapidly growing technologies used in medical diagnosis, speech recognition,
robotic training, product recommendations, video surveillance, and this list goes on.
This continuously evolving domain offers immense job satisfaction, excellent
opportunities, global exposure, and exorbitant salary. It is high risk and a high return
technology. Before starting your machine learning journey, ensure that you carefully
examine the challenges mentioned above. To learn this fantastic technology, you
need to plan carefully, stay patient, and maximize your efforts. Once you win this
battle, you can conquer the Future of work and land your dream job!
Precision and Recall in Machine
Learning
While building any machine learning model, the first thing that comes to our mind is
how we can build an accurate & 'good fit' model and what the challenges are that
will come during the entire procedure. Precision and Recall are the two most
important but confusing concepts in Machine Learning. Precision and recall are
performance metrics used for pattern recognition and classification in machine
learning. These concepts are essential to build a perfect machine learning model
which gives more precise and accurate results. Some of the models in machine
learning require more precision and some model requires more recall. So, it is
important to know the balance between Precision and recall or, simply, precision-
recall trade-off.

In this article, we will understand Precision and recall, the most confusing but
important concepts in machine learning that lots of professionals face during their
entire data science & machine learning career. But before starting, first, we need to
understand the confusion matrix concept. So, let's start with the quick introduction
of Confusion Matrix in Machine Learning.

Confusion Matrix in Machine Learning

Confusion Matrix helps us to display the performance of a model or how a model has
made its prediction in Machine Learning.

Confusion Matrix helps us to visualize the point where our model gets confused in
discriminating two classes. It can be understood well through a 2×2 matrix where the
row represents the actual truth labels, and the column represents the predicted
labels.


This matrix consists of 4 main elements that show different metrics for counting the number
of correct and incorrect predictions. Each element is named with two words, one from each of
the following pairs:

o True or False
o Positive or Negative

If the predicted and truth labels match, then the prediction is said to be correct, but
when the predicted and truth labels are mismatched, then the prediction is said to be
incorrect. Further, positive and negative represents the predicted labels in the matrix.

There are four metrics combinations in the confusion matrix, which are as follows:
o True Positive: This combination tells us how many times a model correctly
classifies a positive sample as Positive?
o False Negative: This combination tells us how many times a model incorrectly
classifies a positive sample as Negative?
o False Positive: This combination tells us how many times a model incorrectly
classifies a negative sample as Positive?
o True Negative: This combination tells us how many times a model correctly
classifies a negative sample as Negative?

Hence, using a confusion matrix, we can count these four kinds of predictions in binary
classification problems and derive further performance metrics from them.
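
As a brief, assumed example, scikit-learn can compute these four counts directly from the
true and predicted labels (the toy labels below are illustrative, not from the original
text):

# Minimal sketch: computing the confusion matrix for a binary problem.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual truth labels (assumed toy data)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # labels predicted by some model

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)   # TP: 4 FP: 1 FN: 1 TN: 4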

Now we can understand the concepts of Precision and Recall.

What is Precision?
Precision is defined as the ratio of correctly classified positive samples (True Positives)
to the total number of samples classified as positive (whether correctly or incorrectly).

1. Precision = True Positive / (True Positive + False Positive)
2. Precision = TP / (TP + FP)

o TP - True Positive
o FP - False Positive

o The precision of a machine learning model will be low when the denominator TP + FP is
much larger than the numerator TP (i.e., there are many false positives).
o The precision of the machine learning model will be high when the numerator TP is close to
the denominator TP + FP (i.e., there are few false positives).

Hence, precision helps us to gauge how reliable the machine learning model is when it
classifies a sample as positive.

Examples to calculate Precision in a machine learning model

Below are some examples of calculating Precision in Machine Learning:
Case 1- In this scenario, the model correctly classified two positive samples while
incorrectly classifying one negative sample as positive. Hence, according to the precision
formula:

Precision = TP / (TP + FP)

Precision = 2 / (2 + 1) = 2/3 = 0.667

Case 2- In this scenario, we have three positive samples that are correctly classified,
and one negative sample that is incorrectly classified as positive.

Putting TP = 3 and FP = 1 in the precision formula, we get:

Precision = TP / (TP + FP)

Precision = 3 / (3 + 1) = 3/4 = 0.75

Case 3- In this scenario, we have three positive samples that are correctly classified
and no negative sample that is incorrectly classified as positive.

Putting TP = 3 and FP = 0 in the precision formula, we get:

Precision = TP / (TP + FP)

Precision = 3 / (3 + 0) = 3/3 = 1

Hence, in the last scenario, we have a precision value of 1, or 100%, when all samples
classified as positive are truly positive and no negative sample is incorrectly classified
as positive.

What is Recall?
Recall is calculated as the ratio between the number of positive samples correctly
classified as positive and the total number of positive samples. Recall measures the model's
ability to detect positive samples. The higher the recall, the more positive samples are
detected.
1. Recall = True Positive / (True Positive + False Negative)
2. Recall = TP / (TP + FN)

o TP - True Positive
o FN - False Negative

o The recall of a machine learning model will be low when the denominator TP + FN is much
larger than the numerator TP (i.e., there are many false negatives).
o The recall of a machine learning model will be high when the numerator TP is close to the
denominator TP + FN (i.e., there are few false negatives).

Unlike precision, recall is independent of how negative samples are classified. Further, if
the model classifies all positive samples as positive, then the recall will be 1.

Examples to calculate Recall in a machine learning model

Below are some examples of calculating Recall in machine learning.

Example 1- Let's understand the calculation of Recall with four different cases, where each
case has the same Recall of 0.667 but differs in the classification of negative samples. See
how:

In this scenario, the classification of the negative samples is different in each case.
Case A has two negative samples classified as negative, and case B also has two negative
samples classified as negative; case C has only one negative sample classified as negative,
while case D does not classify any negative sample as negative.

However, recall is independent of how the negative samples are classified in the model;
hence, we can neglect the negative samples and only consider the samples that are truly
positive.
In this example, we have two positive samples that are correctly classified as positive,
while one positive sample is incorrectly classified as negative.

Hence, the number of true positives is 2 and the number of false negatives is 1. Then the
recall will be:

1. Recall = True Positive / (True Positive + False Negative)

Recall = TP / (TP + FN)

= 2 / (2 + 1)

= 2/3

= 0.667

Note: This means the model has correctly classified only 66.7% of the positive samples.

Example-2

Now we have another scenario where all positive samples are classified correctly as
positive. Hence, the number of true positives is 3 and the number of false negatives is 0.

Recall = TP / (TP + FN) = 3 / (3 + 0) = 3/3 = 1

If the recall is 100%, it tells us the model has detected all positive samples as positive,
while ignoring how the negative samples are classified. However, the model could still be
misclassifying many negative samples as positive; recall simply neglects those samples,
which can hide a high false positive rate in the model.

Note: This means the model has correctly classified 100% of Positive Samples.

Example-3

In this scenario, the model does not identify any positive sample as positive; all positive
samples are incorrectly classified as negative. Hence, the number of true positives is 0,
and the number of false negatives is 3. Then the recall will be:

Recall = TP / (TP + FN) = 0 / (0 + 3) = 0/3 = 0

This means the model has not correctly classified any Positive Samples.
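
To tie the two metrics together, here is a short, assumed scikit-learn example that computes
precision and recall on the same toy labels used in the confusion matrix sketch earlier:

# Minimal sketch: precision and recall computed from true and predicted labels.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP) = 4 / 5 = 0.8
print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN) = 4 / 5 = 0.8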

Difference between Precision and Recall in Machine Learning

Precision: It helps us to measure the ability of the model to classify positive samples
correctly.
Recall: It helps us to measure how many of the actual positive samples were correctly
classified by the ML model.

Precision: While calculating the precision of a model, we consider both the positive and
the negative samples that are classified as positive.
Recall: While calculating the recall of a model, we only need the positive samples; all
negative samples are neglected.

Precision: When a model classifies most of the positive samples correctly but also produces
many false positives, the model is said to be a high recall and low precision model.
Recall: When a model classifies a sample as positive but can only detect a few of the
positive samples, the model is said to be a high precision and low recall model.

Precision: The precision of a machine learning model depends on both the negative and
positive samples.
Recall: The recall of a machine learning model depends only on the positive samples and is
independent of the negative samples.

Precision: In precision, we consider all samples that are classified as positive, whether
correctly or incorrectly.
Recall: Recall cares about correctly classifying all positive samples. It does not consider
whether any negative sample is classified as positive.

Why use Precision and Recall in Machine Learning models?

This question is very common among machine learning engineers and data researchers. The
choice between Precision and Recall depends on the type of problem being solved.

o If the goal is that every sample classified as positive is truly positive, i.e., we care
about false positives regardless of how many positives are missed, then use Precision.
o On the other hand, if our goal is to detect all positive samples, then use Recall. Here,
we do not care whether negative samples are correctly or incorrectly classified.

Conclusion:
In this tutorial, we have discussed performance metrics such as the confusion matrix,
Precision, and Recall for binary classification problems of a machine learning model. We
have also seen various examples of calculating the Precision and Recall of a machine
learning model, and when to use precision and when to use recall.
Genetic Algorithm in Machine
Learning
A genetic algorithm is an adaptive heuristic search algorithm inspired by
"Darwin's theory of evolution in Nature." It is used to solve optimization problems
in machine learning. It is one of the important algorithms as it helps solve complex
problems that would take a long time to solve.

Genetic Algorithms are being widely used in different real-world applications, for
example, Designing electronic circuits, code-breaking, image processing, and
artificial creativity.

In this topic, we will explain Genetic algorithm in detail, including basic terminologies
used in Genetic algorithm, how it works, advantages and limitations of genetic
algorithm, etc.

What is a Genetic Algorithm?

Before understanding the Genetic algorithm, let's first go over some basic terminologies to
better understand this algorithm:


o Population: The population is the subset of all possible or probable solutions that can
solve the given problem.
o Chromosomes: A chromosome is one of the solutions in the population for the given
problem, and a collection of genes makes up a chromosome.
o Gene: A chromosome is divided into genes; a gene is an element of the chromosome.
o Allele: An allele is the value assigned to a gene within a particular chromosome.
o Fitness Function: The fitness function is used to determine an individual's fitness level
in the population. It measures the ability of an individual to compete with other
individuals. In every iteration, individuals are evaluated based on their fitness function.
o Genetic Operators: In a genetic algorithm, the best individuals mate to produce offspring
better than their parents. Genetic operators are what change the genetic composition of the
next generation.
o Selection

After calculating the fitness of every individual in the population, a selection process is
used to determine which of the individuals in the population will get to reproduce and
create the offspring that will form the next generation.

Types of selection methods available

o Roulette wheel selection
o Tournament selection
o Rank-based selection

So, now we can define a genetic algorithm as a heuristic search algorithm to solve
optimization problems. It is a subset of evolutionary algorithms, which is used in
computing. A genetic algorithm uses genetic and natural selection concepts to solve
optimization problems.
How does a Genetic Algorithm Work?
The genetic algorithm works on an evolutionary generational cycle to generate high-quality
solutions. It uses different operations that either enhance or replace the population to
produce an improved, fitter solution.

It basically involves five phases to solve the complex optimization problems, which
are given as below:

o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination

1. Initialization
The process of a genetic algorithm starts by generating the set of individuals, which
is called population. Here each individual is the solution for the given problem. An
individual contains or is characterized by a set of parameters called Genes. Genes are
combined into a string and generate chromosomes, which is the solution to the
problem. One of the most popular techniques for initialization is the use of random
binary strings.

2. Fitness Assignment
The fitness function is used to determine how fit an individual is, i.e., its ability to
compete with other individuals. In every iteration, individuals are evaluated based on their
fitness function. The fitness function provides a fitness score to each individual, and this
score determines the probability of being selected for reproduction: the higher the fitness
score, the greater the chance of being selected for reproduction.

3. Selection
The selection phase involves selecting individuals for the reproduction of offspring. All
the selected individuals are then arranged in pairs of two, and these pairs transfer their
genes to the next generation.

There are three types of Selection methods available, which are:

o Roulette wheel selection
o Tournament selection
o Rank-based selection

4. Reproduction
After the selection process, the creation of a child occurs in the reproduction step. In
this step, the genetic algorithm uses two variation operators that are applied to the
parent population. The two operators involved in the reproduction phase are given
below:

o Crossover: Crossover plays the most significant role in the reproduction phase of the
genetic algorithm. In this process, a crossover point is selected at random within the
genes. Then the crossover operator swaps the genetic information of two parents from the
current generation to produce a new individual representing the offspring.

The genes of the parents are exchanged among themselves until the crossover point is
reached. These newly generated offspring are added to the population. This process is also
called crossover. Types of crossover styles available:
o One point crossover
o Two-point crossover
o Livery crossover
o Inheritable Algorithms crossover
o Mutation
The mutation operator inserts random genes in the offspring (new child) to
maintain the diversity in the population. It can be done by flipping some bits
in the chromosomes.
Mutation helps in solving the issue of premature convergence and enhances
diversification. The below image shows the mutation process:
Types of mutation styles available,
o Flip bit mutation
o Gaussian mutation
o Exchange/Swap mutation

5. Termination
After the reproduction phase, a stopping criterion is applied as a base for
termination. The algorithm terminates after the threshold fitness solution is reached.
It will identify the final solution as the best solution in the population.
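
Below is a compact, illustrative Python sketch of these five phases for a toy problem
(maximizing the number of 1-bits in a binary chromosome); the fitness function, population
size, rates, and the fixed generation count used for termination are assumptions made for
the example:

import random

GENES, POP_SIZE, GENERATIONS, MUTATION_RATE = 16, 20, 50, 0.05

def fitness(chrom):                       # fitness assignment: count of 1-bits
    return sum(chrom)

def select(pop):                          # selection: simple tournament of two
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):                    # reproduction: one-point crossover
    point = random.randint(1, GENES - 1)
    return p1[:point] + p2[point:]

def mutate(chrom):                        # mutation: flip bits with small probability
    return [1 - g if random.random() < MUTATION_RATE else g for g in chrom]

# Initialization: random binary strings.
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):              # termination: simplified to a fixed generation count
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print("Best fitness:", fitness(best), "Chromosome:", best)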

General Workflow of a Simple Genetic Algorithm

Advantages of Genetic Algorithm
o Genetic algorithms have strong parallel capabilities.
o They help in optimizing various problems such as discrete functions, multi-objective
problems, and continuous functions.
o They provide a solution to a problem that improves over time.
o A genetic algorithm does not need derivative information.

Limitations of Genetic Algorithms

o Genetic algorithms are not efficient for solving simple problems.
o They do not guarantee the quality of the final solution to a problem.
o Repetitive calculation of fitness values may create computational challenges.

Difference between Genetic Algorithms and Traditional Algorithms

o A search space is the set of all possible solutions to the problem. In a traditional
algorithm, only one set of solutions is maintained, whereas in a genetic algorithm, several
sets of solutions in the search space can be used.
o Traditional algorithms need more information in order to perform a search, whereas genetic
algorithms need only one objective function to calculate the fitness of an individual.
o Traditional algorithms cannot work in parallel, whereas genetic algorithms can work in
parallel (the fitness calculations of the individuals are independent of one another).
o One big difference is that rather than operating directly on candidate solutions, genetic
algorithms operate on their representations (or encodings), frequently referred to as
chromosomes.
o In other words, unlike a traditional algorithm, a genetic algorithm does not operate
directly on the candidate solutions themselves.
o Traditional algorithms can only generate one result in the end, whereas genetic algorithms
can generate multiple optimal results from different generations.
o A traditional algorithm is unlikely to produce near-optimal results for hard problems,
whereas genetic algorithms, although they do not guarantee a globally optimal result, have a
good chance of finding a near-optimal result because they use genetic operators such as
crossover and mutation.
o Traditional algorithms are deterministic in nature, whereas genetic algorithms are
probabilistic and stochastic in nature.

Normalization in Machine Learning

Normalization is one of the most frequently used data preparation techniques,
which helps us to change the values of numeric columns in the dataset to use a
common scale.

Although Normalization is not mandatory for all datasets in machine learning, it is used
whenever the attributes of the dataset have different ranges. It helps to enhance the
performance and reliability of a machine learning model. In this article, we will briefly
discuss various Normalization techniques in machine learning, why normalization is used,
examples of normalization in an ML model, and much more. So, let's start with the definition
of Normalization in Machine Learning.

What is Normalization in Machine Learning?
Normalization is a scaling technique in Machine Learning applied during data
preparation to change the values of numeric columns in the dataset to use a
common scale. It is not necessary for all datasets in a model. It is required only when
features of machine learning models have different ranges.

Mathematically, we can calculate normalization with the below formula:


1. Xn = (X - Xminimum) / (Xmaximum - Xminimum)

o Xn = Normalized value
o X = Original value of the feature
o Xmaximum = Maximum value of the feature
o Xminimum = Minimum value of the feature

Example: Let's assume we have a dataset whose feature has some maximum and minimum value. To
normalize it for a machine learning model, the values are shifted and rescaled so that their
range varies between 0 and 1. This technique is also known as Min-Max scaling. In this
scaling technique, we change the feature values as follows:

Case 1- If the value of X is the minimum, the value of the numerator will be 0; hence the
normalized value will also be 0.

Xn = (X - Xminimum) / (Xmaximum - Xminimum)

Putting X = Xminimum in the above formula, we get:

Xn = (Xminimum - Xminimum) / (Xmaximum - Xminimum)

Xn = 0

Case 2- If the value of X is the maximum, then the value of the numerator is equal to the
denominator; hence the normalized value will be 1.

Xn = (X - Xminimum) / (Xmaximum - Xminimum)

Putting X = Xmaximum in the above formula, we get:

Xn = (Xmaximum - Xminimum) / (Xmaximum - Xminimum)

Xn = 1

Case 3- On the other hand, if the value of X is neither the maximum nor the minimum, then
the normalized value lies between 0 and 1.
Hence, Normalization can be defined as a scaling method in which values are shifted and
rescaled so that they lie between 0 and 1; in other words, it can be referred to as the
Min-Max scaling technique.

Normalization techniques in Machine Learning

Although there are many feature normalization techniques in Machine Learning, a few of them
are most frequently used. These are as follows:

o Min-Max Scaling: This technique is also referred to simply as scaling. As we have already
discussed above, the Min-Max scaling method shifts and rescales the values of a dataset's
attributes so that they end up ranging between 0 and 1.
o Standardization scaling:

Standardization scaling is also known as Z-score normalization, in which values are


centered around the mean with a unit standard deviation, which means the attribute
becomes zero and the resultant distribution has a unit standard deviation.
Mathematically, we can calculate the standardization by subtracting the feature value
from the mean and dividing it by standard deviation.

Hence, standardization can be expressed as follows:

X' = (X - µ) / σ

Here, µ represents the mean of the feature values, and σ represents the standard deviation of the feature values.

However, unlike the Min-Max scaling technique, feature values are not restricted to a specific range in the standardization technique.

This technique is helpful for machine learning algorithms that use distance measures, such as KNN, K-means clustering, and Principal Component Analysis. Further, it works best when the model assumes that the data is normally distributed.
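
As a quick illustration, scikit-learn provides ready-made transformers for both techniques. The sketch below, with a small made-up two-column array, shows how they are typically applied:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: [age, salary] rows -- values are illustrative only
X = np.array([[25, 30000],
              [35, 52000],
              [45, 75000],
              [60, 110000]], dtype=float)

# Min-Max scaling: each column rescaled to [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization: each column centered to mean 0 with unit standard deviation
X_standard = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_standard)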

Difference between Normalization and Standardization

o Normalization uses the minimum and maximum values for scaling, whereas Standardization uses the mean and standard deviation.
o Normalization is helpful when features are on different scales, whereas Standardization is helpful when the mean of a variable should be 0 and its standard deviation should be 1.
o Normalization scales values into a range such as [0, 1] or [-1, 1], whereas standardized values are not restricted to a specific range.
o Normalization is affected by outliers, whereas Standardization is comparatively less affected by outliers.
o Scikit-Learn provides the MinMaxScaler transformer for Normalization and the StandardScaler transformer for Standardization.
o Normalization is also called scaling normalization, whereas Standardization is known as Z-score normalization.
o Normalization is useful when the feature distribution is unknown, whereas Standardization is useful when the feature distribution is normal (Gaussian).

When to use Normalization or Standardization?

Which is suitable for our machine learning model, Normalization or Standardization? This is a common source of confusion among data scientists and machine learning engineers. Although both techniques serve a similar purpose, the choice between normalization and standardization depends on your problem and the algorithm you are using in your model.

1. Normalization is a transformation technique that helps to improve the performance as well as the accuracy of your model. Normalization is useful when you don't know the feature distribution exactly, in other words, when the feature distribution of the data does not follow a Gaussian (bell curve) distribution. Because normalization maps values into a bounded range, outliers in the data strongly affect the normalized result.

Further, it is also useful for distance-based algorithms such as KNN and artificial neural networks, since these methods make no assumptions about the distribution of the data.

2. Standardization in a machine learning model is useful when you know the feature distribution of the data exactly, in other words, when your data follows a Gaussian distribution. However, this does not necessarily have to be true. Unlike normalization, standardization does not impose a bounding range, so outliers in your data are less disruptive to standardization.

Further, it is also useful when the features have very different scales, and for techniques such as linear regression, logistic regression, and linear discriminant analysis.

Example: Let's consider an experiment where we have a dataset with two attributes, age and salary, where age ranges from 0 to 80 years and income varies from 0 to 75,000 dollars or more. Income values are roughly 1,000 times larger than age values. As a result, the ranges of these two attributes are very different from one another.

Because of its larger values, the income attribute will naturally dominate the outcome when we perform further analysis, such as multivariate linear regression. However, this does not necessarily imply that it is a better predictor. As a result, we normalize the data so that all of the variables fall in the same range.

Further, normalization is also helpful in predicting credit risk scores, where it is applied to all numeric data except the class column. In that case the tanh transformation technique can be used, which converts the numeric features into values in the range 0 to 1.

Conclusion
Normalization removes the problems associated with raw, differently scaled data by creating new values while preserving the general distribution and ratios in the data. It also improves the performance and accuracy of machine learning models. Hence, although the concepts of normalization and standardization can be a bit confusing, they are very important for building a better machine learning model.

Adversarial Machine Learning


The term "adversary" is used in the field of computer security to make a fool or
misguide a machine learning model with malicious input. Cyber security is one of
the most important concepts for all data scientists and programmers as well. As
hackers always try to hack data using different techniques. Similarly, Adversarial
machine learning is also a technique that misguides any machine learning model
with deceptive data and reduces the accuracy and performance of the model. In this
article, we will discuss a very important concept of Machine Learning and Artificial
intelligence that helps you to protect machine learning models from digital attacks
and make them secure from unauthorized attacks. So, let's start with a quick
introduction to Adversarial Machine Learning.

What is Adversarial Machine Learning?


Adversarial Machine Learning refers to a class of cyber-attacks that aim to fool or misguide a model with malicious input. Such attacks corrupt or disrupt a machine learning model by providing deceptive input. Adversarial attacks are widely studied in image classification and spam detection, where small changes made to a set of images cause a classifier to produce incorrect predictions.

Examples of Adversarial Machine Learning


Adversarial examples are deceptive inputs crafted to misguide or disrupt a machine learning model or computer program. An attacker can craft images that a model cannot predict correctly. Let's understand this with the popular example of Panda vs. Gibbon. Although the two images are technically different, they are indistinguishable to the human eye.
The image on the left is one of the clean images in the ImageNet dataset, used to train the GoogLeNet model.


The third image, however, is slightly different: it is a modified version of the first, created by introducing a small perturbation (shown in the central image).

The first image is predicted by the model to be a panda, as expected, while the right-side image is recognized as a gibbon with high confidence.

Hence, adding a carefully crafted adversarial perturbation to a normal image can cause a classifier to mistake a panda for a gibbon.

Now, take another example that shows different views of a 3D-printed turtle and the misclassifications made by the Google Inception v3 model.

Adversarial machine learning has yielded results that range from the funny, benign, and embarrassing, such as the turtle below being mistaken for a rifle, to potentially harmful examples, such as a self-driving car mistaking a stop sign for a speed limit sign.
What do you mean by adversarial Whitebox
and Blackbox attacks?
There are two ways in which attacks are categorized in machine learning. These are
as follows:

o Black Box Attack
o White Box Attack

Black Box Attack: In a black-box attack, the attacker has no information about the targeted model and no access to its architecture, parameters, or gradients.

White Box Attack: These attacks are the opposite of black-box attacks: the attacker has full access to the targeted model, including its architecture, parameters, and gradients.

Black box attacks and white box attacks are further categorized into two types as
follows:

o Targeted Attacks: In this type of attack, the attacker perturbs the input in such a way that the model predicts a specific target class.
o Un-targeted Attacks: In this type of attack, the attacker perturbs the input in such a way that the model predicts any class other than the true class.

How to protect against Adversarial Examples?

Although adversarial machine learning is always harmful to a model from a security perspective, we can protect a model by giving it adversarial training. Just as a general machine learning model is trained on historical data to predict outcomes, an adversarially trained model is additionally exposed during training to a variety of adversarial examples, so that it becomes robust against manipulated data.
However, adversarial training is not easy, as it is a slow and costly process. Every single training example must be probed for adversarial weaknesses, and then the model must be retrained on all those examples. Scientists are developing methods to optimize the process of discovering and patching adversarial weaknesses in machine learning models. Further, some AI researchers are also working on preventing such attacks using deep learning concepts, for example by combining parallel and generalized neural networks.

Types of Adversarial Attacks

There are many types of adversarial attacks that can harm a machine learning system. The aim of these attacks is to decrease the accuracy and performance of classifiers on specific tasks and to misguide the model. Adversarial Machine Learning is the branch of machine learning that studies these attacks and how to reduce their effect on a model.

There are some important types of Adversarial Attacks as follows:

Poisoning Attack:
Poisoning attacks take place while the machine learning model is being trained, or when it is re-trained during deployment. They are also referred to as contaminating attacks.

In poisoning attacks, attackers influence the training data or its labels while the model is in the training phase, which causes the system to become skewed and generate inaccurate decisions in the future. This reduces the accuracy and performance of the machine learning system.

Further, when a machine learning model is re-trained during deployment, attackers can introduce malicious input to disrupt the model. It is very difficult for data scientists to identify when the data has been poisoned and the model misbehaves on specific types of input samples. It is also hard to detect which types of samples will trigger a machine learning model to behave incorrectly.

Let's understand this with the example of poisoning a chatbot. Microsoft launched a chatbot on Twitter that learned to engage in conversation through repeated interactions with other users. Initially, it engaged in casual and playful conversation, but it later became clear that the chatbot did not contain appropriate filters. Because of this, it began incorporating abusive tweets into its algorithm, and as the number of users increased, the abusive tweets increased as well. As a result, Microsoft had to shut down the chatbot on the same day it launched.

Evasion Attacks:
These attacks are the opposite of poisoning attacks: they take place after a machine learning system has already been trained. They are the most commonly encountered type of attack in machine learning.

An evasion attack occurs when the model is presented with a manipulated sample at prediction time, and such attacks are often developed through trial-and-error methods. The attackers manipulate the data during deployment without knowing exactly when the machine learning model will break.

Let's understand this with an example. Suppose an attacker wants to investigate the algorithm of a machine learning model designed to filter spam email content. The attacker may run experiments on different emails to bypass the spam filter, for example by crafting an email that includes enough extraneous words to "tip" the algorithm into classifying it as not spam.

These attacks may affect the integrity and confidentiality of a machine learning model, leading it to produce the malicious output intended by the attacker. They can also be used to reveal private or sensitive information. One of the most prevalent examples of evasion attacks is spoofing attacks against biometric verification systems.

Model Extraction:
In a model extraction attack, the attacker probes a black-box machine learning system in order to reconstruct the model or extract the data on which it was trained. For example, attackers may steal a stock market prediction model and later reconstruct a similar model for their own financial benefit. Model extraction attacks matter most when either the training data or the model itself is sensitive and confidential.
Techniques/Methods used in generating Adversarial Attacks

o L-BFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno): A non-linear, gradient-based numerical optimization algorithm used to minimize the perturbation added to images. It is effective at generating adversarial examples, but it is a very complex, computationally intensive optimization technique and is comparatively time-consuming.

o FGSM (Fast Gradient Sign Method): A comparatively simple and fast gradient-based method used in adversarial machine learning. Its drawback is that the perturbation is added to every attribute of the input.

o JSMA (Jacobian-based Saliency Map Attack): Uses feature selection to cause classification errors while perturbing fewer attributes than FGSM. However, it is more computationally intensive than the FGSM method.

o Deepfool Attack: An untargeted adversarial example generation method. It focuses on minimizing the Euclidean distance between the perturbed sample and the original sample; decision boundaries are estimated, and the perturbation is added iteratively. It efficiently produces adversarial examples with little perturbation and high misclassification rates, but it is more computationally intensive than FGSM and JSMA, and the generated examples are likely not optimal.

o C&W (Carlini & Wagner) Attack: Similar to the L-BFGS attack, but it does not use box constraints and uses different objective functions, which makes it more effective at generating adversarial examples. It is among the most effective methods for generating adversarial examples and can also mislead adversarial defense technologies, but it is more computationally intensive than Deepfool, FGSM, and JSMA.

o GAN (Generative Adversarial Networks): Used to generate adversarial attacks with two neural networks: a generator that produces samples and a discriminator that tries to distinguish real samples from generated ones. The two networks play a zero-sum game. GANs can generate samples that differ from the training samples, but they are highly computationally intensive.

o ZOO (Zeroth-Order Optimization) attack: Estimates the classifier's gradients without direct access to the classifier. Its performance is similar to the C&W attack and it requires no training of substitute classifiers, but it needs a very large number of queries to the target classifier.
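
To make the FGSM idea from the list above concrete, here is a minimal PyTorch sketch; the model, input tensor, label, and epsilon value are hypothetical placeholders, not part of any specific attack described in this article:

import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.01):
    """Generate an adversarial example with the Fast Gradient Sign Method."""
    image = image.clone().detach().requires_grad_(True)

    # Forward pass and loss with respect to the true label
    output = model(image)
    loss = F.cross_entropy(output, label)

    # Backward pass to get the gradient of the loss w.r.t. the input image
    model.zero_grad()
    loss.backward()

    # Perturb the image in the direction of the sign of the gradient
    adv_image = image + epsilon * image.grad.sign()

    # Keep pixel values in a valid range (assuming inputs are scaled to [0, 1])
    return torch.clamp(adv_image, 0.0, 1.0).detach()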

Conclusion
In this way, we have seen why adversarial machine learning examples are so important from a security perspective in machine learning and Artificial Intelligence. Hopefully, after reading this tutorial, you have a complete basic understanding of adversarial machine learning.

Basic Concepts in Machine Learning


Machine Learning is continuously growing in the IT world and gaining strength in different business sectors. Although Machine Learning is still in a developing phase, it is one of the most popular technologies today. It is a field of study that makes computers capable of automatically learning and improving from experience. Hence, Machine Learning strengthens computer programs by learning from data collected from various observations. In this article, ''Concepts in Machine Learning'', we will discuss a few basic concepts used in Machine Learning, such as what Machine Learning is, technologies and algorithms used in Machine Learning, applications and examples of Machine Learning, and much more. So, let's start with a quick introduction to machine learning.

What is Machine Learning?


Machine Learning is defined as a technology that is used to train machines to
perform various actions such as predictions, recommendations, estimations, etc.,
based on historical data or past experience.

Machine Learning enables computers to behave like human beings by training them
with the help of past experience and predicted data.

There are three key aspects of Machine Learning, which are as follows:


o Task: A task is defined as the main problem in which we are interested. This
task/problem can be related to the predictions and recommendations and
estimations, etc.
o Experience: It is defined as learning from historical or past data and used to
estimate and resolve future tasks.
o Performance: It is defined as the capacity of any machine to resolve any
machine learning task or problem and provide the best outcome for the same.
However, performance is dependent on the type of machine learning
problems.

Techniques in Machine Learning


Machine Learning techniques are divided mainly into the following 4 categories:

1. Supervised Learning
Supervised learning is applicable when a machine has sample data, i.e., input as well as output data with correct labels. The labels are used to check the correctness of the model's predictions. The supervised learning technique helps us to predict future events with the help of past experience and labeled examples. Initially, it analyses the known training dataset, and later it derives an inferred function that makes predictions about output values. Further, it also identifies errors during this learning process and corrects them through the algorithm.

Example: Let's assume we have a set of images tagged as ''dog''. A machine learning
algorithm is trained with these dog images so it can easily distinguish whether an
image is a dog or not.

2. Unsupervised Learning
In unsupervised learning, a machine is trained only with input samples; the output is not known. The training information is neither classified nor labeled; hence, a machine may not always provide output as correct as in supervised learning.

Although unsupervised learning is less common in practical business settings, it helps in exploring the data and can draw inferences from datasets to describe hidden structures in unlabeled data.

Example: Let's assume a machine is trained with a set of documents belonging to different categories (Type A, B, and C), and we have to organize them into appropriate groups. Because the machine is provided only with the input samples and no output labels, it can organize these documents into type A, type B, and type C clusters, but it is not guaranteed that the grouping is correct.

3. Reinforcement Learning
Reinforcement Learning is a feedback-based machine learning technique. In such
type of learning, agents (computer programs) need to explore the environment,
perform actions, and on the basis of their actions, they get rewards as feedback. For
each good action, they get a positive reward, and for each bad action, they get a
negative reward. The goal of a Reinforcement learning agent is to maximize the
positive rewards. Since there is no labeled data, the agent is bound to learn by its
experience only.
4. Semi-supervised Learning
Semi-supervised learning is an intermediate technique between supervised and unsupervised learning. It works on datasets that contain a few labeled examples along with a larger amount of unlabeled data. Because labels are costly to obtain, this reduces the cost of building the machine learning model, while the few available labels still improve its accuracy and performance.

Semi-supervised learning helps data scientists to overcome the drawbacks of purely supervised and unsupervised learning. Speech analysis, web content classification, protein sequence classification, and text document classification are some important applications of semi-supervised learning.

Applications of Machine Learning


Machine Learning is widely being used in approximately every sector, including
healthcare, marketing, finance, infrastructure, automation, etc. There are some
important real-world examples of machine learning, which are as follows:

Healthcare and Medical Diagnosis:

Machine Learning is used in the healthcare industry, for example to build neural networks. These self-learning neural networks help specialists provide quality treatment by analyzing external data on a patient's condition, X-rays, CT scans, and various tests and screenings. Beyond treatment, machine learning is also helpful for tasks such as automatic billing, clinical decision support, and the development of clinical care guidelines.

Marketing:
Machine learning helps marketers create hypotheses and carry out testing, evaluation, and analysis of datasets. It helps us make quick predictions based on the concept of big data. It is also helpful in stock trading, as most trading is done through bots based on calculations from machine learning algorithms. Various deep learning neural networks, such as Convolutional Neural Networks, Recurrent Neural Networks, and Long Short-Term Memory networks, help to build trading models.

Self-driving cars:
This is one of the most exciting applications of machine learning in today's world. It
plays a vital role in developing self-driving cars. Various automobile companies like
Tesla, Tata, etc., are continuously working for the development of self-driving cars. It
also becomes possible by the machine learning method (supervised learning), in
which a machine is trained to detect people and objects while driving.

Speech Recognition:
Speech Recognition is one of the most popular applications of machine learning.
Nowadays, almost every mobile application comes with a voice search facility. This
''Search By Voice'' facility is also a part of speech recognition. In this method, voice instructions are converted into text, which is known as "Speech to Text" or "Computer Speech Recognition".

Google assistant, SIRI, Alexa, Cortana, etc., are some famous applications of speech
recognition.

Traffic Prediction:
Machine Learning also helps us find the shortest route to our destination using Google Maps. It also helps us predict traffic conditions, whether the route is clear or congested, using real-time location data from the Google Maps app and device sensors.

Image Recognition:
Image recognition is also an important application of machine learning for
identifying objects, persons, places, etc. Face detection and auto friend tagging
suggestion is the most famous application of image recognition used by Facebook,
Instagram, etc. Whenever we upload photos with our Facebook friends, it
automatically suggests their names through image recognition technology.

Product Recommendations:
Machine Learning is widely used in business industries for the marketing of various
products. Almost all big and small companies like Amazon, Alibaba, Walmart, Netflix,
etc., use machine learning techniques to recommend products to their users. Whenever we search for a product on their websites, we automatically start seeing lots of advertisements for similar products. This is made possible by machine learning algorithms that learn users' interests and, based on past data, suggest products to the user.

Automatic Translation:
Automatic language translation is also one of the most significant applications of machine learning. It is based on sequence algorithms that translate text from one language into other desired languages. Google GNMT (Google Neural Machine Translation) provides this feature. Further, you can also translate selected text in images, as well as complete documents, through Google Lens.

Virtual Assistant:
A virtual personal assistant is also one of the most popular applications of machine learning. It first records our voice, sends the recording to a cloud-based server, and then decodes it with the help of machine learning algorithms. All big companies like Amazon, Google, etc., use these features for playing music, calling someone, opening an app, searching data on the internet, and so on.

Email Spam and Malware Filtering:


Machine Learning also helps us to filter the emails received in our mailbox according to their category, such as important, normal, and spam. This is made possible by ML algorithms such as the Multi-Layer Perceptron, Decision Tree, and Naïve Bayes classifier.

Commonly used Machine Learning


Algorithms
Here is a list of a few commonly used Machine Learning Algorithms as follows:
Linear Regression
Linear Regression is one of the simplest and most popular machine learning algorithms recommended by data scientists. It is used for predictive analysis, making predictions for real-valued variables such as experience, salary, cost, etc.

It is a statistical approach that represents the linear relationship between two or more variables, either dependent or independent, hence called Linear Regression. It shows how the value of the dependent variable changes with respect to the independent variable, and the best-fitting line through this relationship is called the regression line.

Linear Regression can be expressed mathematically as follows:

y = a0 + a1x + ε

Y= Dependent Variable

X= Independent Variable

a0= intercept of the line (Gives an additional degree of freedom)

a1 = Linear regression coefficient (scale factor to each input value).


ε = random error

The values for x and y variables are training datasets for Linear Regression model
representation.

Types of Linear Regression:

o Simple Linear Regression
o Multiple Linear Regression

Applications of Linear Regression:

Linear Regression is helpful for evaluating the business trends and forecasts such as
prediction of salary of a person based on their experience, prediction of crop
production based on the amount of rainfall, etc.
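
As a quick illustration, here is a minimal scikit-learn sketch that fits a simple linear regression on a tiny, made-up experience-vs-salary dataset (the numbers are purely illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience -> salary (illustrative values only)
X = np.array([[1], [2], [3], [5], [8]])             # independent variable
y = np.array([30000, 35000, 42000, 55000, 72000])   # dependent variable

model = LinearRegression()
model.fit(X, y)

print("intercept (a0):", model.intercept_)
print("coefficient (a1):", model.coef_[0])
print("prediction for 6 years:", model.predict([[6]])[0])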

Logistic Regression
Logistic Regression is a subset of the supervised learning technique. It helps us to predict the output of a categorical dependent variable using a given set of independent variables. The output can be binary (0 or 1) or Boolean (true/false), but instead of giving an exact value, the model gives a probabilistic value between 0 and 1. It is much like Linear Regression in how it is used in a machine learning model: as Linear Regression is used for solving regression problems, Logistic Regression is helpful for solving classification problems.

Logistic Regression can be visualized as an S-shaped curve called the sigmoid function, which maps predictions toward its two extreme values (0 and 1).

Mathematically, the sigmoid function at the heart of logistic regression can be expressed as follows:

f(x) = 1 / (1 + e^-x)

Types of Logistic Regression:

o Binomial
o Multinomial
o Ordinal
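
For illustration, here is a minimal scikit-learn sketch that trains a logistic regression classifier on a tiny, made-up dataset of exam scores versus admission outcomes (all values are hypothetical):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: [test score 1, test score 2] -> admitted (1) or not (0)
X = np.array([[35, 40], [50, 45], [60, 65], [75, 80], [85, 90], [40, 38]])
y = np.array([0, 0, 1, 1, 1, 0])

clf = LogisticRegression()
clf.fit(X, y)

# Predicted probability of admission for a new student
print(clf.predict_proba([[70, 72]])[0, 1])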

K Nearest Neighbour (KNN)


It is also one of the simplest machine learning algorithms that come under
supervised learning techniques. It is helpful for solving regression as well as
classification problems. It assumes the similarity between the new data and available
data and puts the new data into the category that is most similar to the available
categories. It is also known as Lazy Learner Algorithms because it does not learn
from the training set immediately; instead, it stores the dataset, and at the time of
classification, it performs an action on the dataset. Let's suppose we have a few sets
of images of cats and dogs and want to identify whether a new image is of a cat or
dog. Then KNN algorithm is the best way to identify the cat from available data sets
because it works on similarity measures. Hence, the KNN model will compare the
new image with available images and put the output in the cat's category.

Let's understand the KNN algorithm with the below screenshot, where we have to
assign a new data point based on the similarity with available data points.

Applications of KNN algorithm in Machine Learning

Including Machine Learning, KNN algorithms are used in so many fields as follows:

o Healthcare and Medical diagnosis
o Credit score checking
o Text Editing
o Hotel Booking
o Gaming
o Natural Language Processing, etc.
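
Here is a minimal scikit-learn sketch of a K Nearest Neighbour classifier on a tiny, made-up two-feature dataset (the classes and values are purely illustrative):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data: two features per sample, labels 0 = "cat", 1 = "dog"
X = np.array([[1.0, 1.2], [1.1, 0.9], [0.9, 1.0],
              [3.0, 3.2], [3.1, 2.9], [2.9, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# A new point is assigned to the class of its 3 nearest neighbours
print(knn.predict([[2.8, 3.1]]))  # expected: [1]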

K-Means Clustering
K-Means Clustering is a subset of unsupervised learning techniques. It helps us to
solve clustering problems by means of grouping the unlabeled datasets into different
clusters. Here K defines the number of pre-defined clusters that need to be created
in the process, as if K=2, there will be two clusters, and for K=3, there will be three
clusters, and so on.

Decision Tree
Decision Tree is also another type of Machine Learning technique that comes under
Supervised Learning. Similar to KNN, the decision tree also helps us to solve
classification as well as regression problems, but it is mostly preferred to solve
classification problems. The name decision tree is because it consists of a tree-
structured classifier in which attributes are represented by internal nodes, decision
rules are represented by branches, and the outcome of the model is represented by
each leaf of a tree. The tree starts from the decision node, also known as the root
node, and ends with the leaf node.

Decision nodes help us to make any decision, whereas leaves are used to determine
the output of those decisions.

A Decision Tree is a graphical representation for getting all the possible outcomes to
a problem or decision depending on certain given conditions.

Random Forest
Random Forest is also one of the most preferred machine learning algorithms that come under the Supervised Learning technique. Similar to KNN and Decision Tree, it also allows us to solve classification as well as regression problems, but it is preferred whenever we need to solve a complex problem and improve the performance of the model.

A random forest algorithm is based on the concept of ensemble learning, which is the process of combining multiple classifiers.

A random forest classifier is built from a number of decision trees, each trained on a different subset of the given dataset. The final prediction is taken as the average (or majority vote) of the predictions from all trees, which improves the accuracy of the model. A greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting. Further, it also takes less training time compared to many other algorithms.

Support Vector Machines (SVM)


It is also one of the most popular machine learning algorithms that come as a subset
of the Supervised Learning technique in machine learning. The goal of the support
vector machine algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data
point in the correct category in the future. This best decision boundary is called a
hyperplane. It is also used to solve classification as well as regression problems. It is
used for Face detection, image classification, text categorization, etc.
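
As a small illustration, here is a scikit-learn sketch that fits a support vector classifier on a tiny, made-up two-class dataset (the values are illustrative only):

import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D points belonging to two classes
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel learns the separating hyperplane (decision boundary)
svm = SVC(kernel="linear")
svm.fit(X, y)

print(svm.predict([[5, 5]]))  # point on the class-1 side of the hyperplane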

Naïve Bayes
The naïve Bayes algorithm is one of the simplest and most effective machine learning
algorithms that come under the supervised learning technique. It is based on the
concept of the Bayes Theorem, used to solve classification-related problems. It helps
to build fast machine learning models that can make quick predictions with greater
accuracy and performance. It is mostly preferred for text classification having high-
dimensional training datasets.

It is used as a probabilistic classifier which means it predicts on the basis of the


probability of an object. Spam filtration, Sentimental analysis, and classifying articles
are some important applications of the Naïve Bayes algorithm.

It is based on Bayes' Theorem, which is also known as Bayes' Rule or Bayes' Law. Mathematically, Bayes' Theorem can be expressed as follows:

P(A|B) = P(B|A) * P(A) / P(B)

Where,

o P(A) is the Prior Probability
o P(B) is the Marginal Probability
o P(A|B) is the Posterior Probability
o P(B|A) is the Likelihood Probability
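
For illustration, here is a minimal scikit-learn sketch of a Naïve Bayes text classifier for spam filtering, using a tiny, made-up set of messages (all texts and labels are hypothetical):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical messages and labels: 1 = spam, 0 = not spam
texts = ["win a free prize now", "meeting at noon tomorrow",
         "claim your free reward", "project report attached"]
labels = [1, 0, 1, 0]

# Count word frequencies, then apply the multinomial Naïve Bayes classifier
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)

print(clf.predict(["free prize waiting for you"]))  # expected: [1]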

Difference between machine learning and Artificial Intelligence
o Artificial intelligence is a technology using which we can create intelligent
systems that can simulate human intelligence, whereas Machine learning is a
subfield of artificial intelligence, which enables machines to learn from past
data or experiences.
o Artificial Intelligence is a technology used to create an intelligent system that
enables a machine to simulate human behavior. Whereas, Machine Learning is
a branch of AI which helps a machine to learn from experience without being
explicitly programmed.
o AI helps to build human-like intelligent computer systems that solve complex problems, whereas ML is used to make accurate predictions from past data or experience.
o AI can be divided into Weak AI, General AI, and Strong AI, whereas ML can be divided into Supervised learning, Unsupervised learning, and Reinforcement learning.
o Each AI agent includes learning, reasoning, and self-correction. Each ML
model includes learning and self-correction when introduced with new data.
o AI deals with Structured, semi-structured, and unstructured data. ML deals
with Structured and semi-structured data.
o Applications of AI: Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc. Applications of ML: online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc.

Conclusion
This article has introduced you to a few important basic concepts of Machine Learning. We can now say that machine learning helps to build smart machines that learn from past experience and work fast. There are many machine-learning-powered game players available on the internet, such as chess engines and AlphaGo, that play much faster and better than human players. Machine learning is a broad subject, but you can learn each individual concept within a few hours of study. If you are preparing to become a data scientist or machine learning engineer, you must have in-depth knowledge of every machine learning concept.

Machine Learning Techniques


Machine learning is a data analytics technique that teaches computers to do what
comes naturally to humans and animals: learn from experience. Machine learning
algorithms use computational methods to directly "learn" from data without relying
on a predetermined equation as a model.

As the number of samples available for learning increases, the algorithm adapts to
improve performance. Deep learning is a special form of machine learning.

How does machine learning work?


Machine learning uses two main techniques: supervised learning, which trains a model on known input and output data to predict future outputs, and unsupervised learning, which finds hidden patterns or internal structures in the input data.

Supervised learning
Supervised machine learning creates a model that makes predictions based on
evidence in the presence of uncertainty. A supervised learning algorithm takes a
known set of input data and known responses to the data (output) and trains a
model to generate reasonable predictions for the response to the new data. Use
supervised learning if you have known data for the output you are trying to estimate.

Supervised learning uses classification and regression techniques to develop machine


learning models.

Classification models classify input data and predict discrete responses, for example, whether an email is genuine or spam, or whether a tumor is cancerous or benign. Typical applications include medical imaging, speech recognition, and credit scoring.

Use classification if your data can be tagged, categorized, or divided into specific groups or classes. For example, handwriting recognition applications use classification to recognize letters and numbers. In image processing and computer vision, pattern recognition techniques are used for object detection and image segmentation.

Common algorithms for performing classification include support vector machines


(SVMs), boosted and bagged decision trees, k-nearest neighbors, Naive Bayes,
discriminant analysis, logistic regression, and neural networks.

Regression techniques predict continuous responses - for example, changes in


temperature or fluctuations in electricity demand. Typical applications include power
load forecasting and algorithmic trading.

If you are working with a data range or if the nature of your response is a real
number, such as temperature or the time until a piece of equipment fails, use
regression techniques.

Common regression algorithms include linear and nonlinear models, regularization, stepwise regression, boosted and bagged decision trees, neural networks, and adaptive neuro-fuzzy learning.

Using supervised learning to predict heart attacks


Physicians want to predict whether someone will have a heart attack within a year.
They have data on previous patients, including age, weight, height, and blood
pressure. They know if previous patients had had a heart attack within a year. So the
problem is to combine existing data into a model that can predict whether a new
person will have a heart attack within a year.

Unsupervised Learning
Unsupervised learning detects hidden patterns or internal structures in data. It is used to draw inferences from datasets containing input data without labeled responses.

Clustering is a common unsupervised learning technique. It is used for exploratory


data analysis to find hidden patterns and clusters in the data. Applications for cluster
analysis include gene sequence analysis, market research, and commodity
identification.

For example, if a cell phone company wants to optimize the locations where it builds towers, it can use machine learning to estimate the number of people relying on each tower.

A phone can only talk to one tower at a time, so the team uses clustering algorithms to design good placements of cell towers that optimize signal reception for groups or clusters of customers.

Common algorithms for performing clustering are k-means and k-medoids,


hierarchical clustering, Gaussian mixture models, hidden Markov models, self-
organizing maps, fuzzy C-means clustering, and subtractive clustering.

Ten methods are described below; they form a foundation you can build on to improve your machine learning knowledge and skills:

o Regression
o Classification
o Clustering
o Dimensionality Reduction
o Ensemble Methods
o Neural Nets and Deep Learning
o Transfer Learning
o Reinforcement Learning
o Natural Language Processing
o Word Embeddings

Let's differentiate between the two general categories of machine learning: supervised and unsupervised. We apply supervised ML techniques when we have a quantity that we want to predict or interpret; we use previous input and output data to predict the output for new inputs.

For example, you can use supervised ML techniques to help a service business that wants to estimate the number of new users who will sign up for the service in the next month. In contrast, unsupervised ML looks at ways of relating and grouping data points without the use of a target variable to predict.

In other words, it evaluates data in terms of traits and uses the traits to group objects that are similar to one another. For example, you can use unsupervised learning techniques to help a retailer who wants to segment products with similar characteristics, without having to specify in advance which characteristics to use.

1. Regression
Regression methods fall under the category of supervised ML. They help predict or
interpret a particular numerical value based on prior data, such as predicting an
asset's price based on past pricing data for similar properties.

The simplest method is linear regression, where we use the mathematical equation of
the line (y = m * x + b) to model the data set. We train a linear regression model
with multiple data pairs (x, y) by computing the position and slope of a line that
minimizes the total distance between all data points and the line. In other words, we
calculate the slope (M) and the y-intercept (B) for a line that best approximates the
observations in the data.
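
As a sketch of what that fitting step looks like in code, NumPy's polyfit can compute the slope m and intercept b of the best-fit line for a small, made-up (x, y) dataset (the values are illustrative):

import numpy as np

# Hypothetical (x, y) observations -- illustrative values only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Fit y = m * x + b by least squares (degree-1 polynomial)
m, b = np.polyfit(x, y, 1)

print("slope m:", m)
print("intercept b:", b)
print("prediction at x = 6:", m * 6 + b)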

Let us consider a more concrete example of linear regression. I once used linear
regression to predict the energy consumption (in kW) of some buildings by
gathering together the age of the building, the number of stories, square feet, and
the number of wall devices plugged in.

Since there was more than one input (age, square feet, etc.), I used a multivariable
linear regression. The principle was similar to a one-to-one linear regression. Still, in
this case, the "line" I created occurred in a multi-dimensional space depending on
the number of variables.

Now imagine that you have access to the characteristics of a building (age, square
feet, etc.), but you do not know the energy consumption. In this case, we can use the
fitted line to estimate the energy consumption of the particular building. The plot
below shows how well the linear regression model fits the actual energy
consumption of the building.

Note that you can also use linear regression to estimate the weight of each factor
that contributes to the final prediction of energy consumed. For example, once you
have a formula, you can determine whether age, size, or height are most important.

Linear regression model estimates of building energy consumption (kWh).

Regression techniques run the gamut from simple (linear regression) to complex (regularized linear regression, polynomial regression, decision trees, random forest regression, and neural nets). But don't get confused: start by studying simple linear regression, master the technique, and move on from there.
2. Classification
In another class of supervised ML, classification methods predict or explain a class
value. For example, they can help predict whether an online customer will purchase a
product. Output can be yes or no: buyer or no buyer. But the methods of
classification are not limited to two classes. For example, a classification method can
help assess whether a given image contains a car or a truck. The simplest
classification algorithm is logistic regression, which sounds like a regression method,
but it is not. Logistic regression estimates the probability of occurrence of an event
based on one or more inputs.

For example, logistic regression can take two test scores for a student to predict that
the student will get admission to a particular college. Because the guess is a
probability, the output is a number between 0 and 1, where 1 represents absolute
certainty. For the student, if the predicted probability is greater than 0.5, we predict that they will be admitted. If the predicted probability is less than 0.5, we predict that they will be rejected.

The chart below shows the marks of past students and whether they were admitted.
Logistic regression allows us to draw a line that represents the decision boundary.

Because logistic regression is the simplest classification model, it is a good place to


start for classification. As you progress, you can dive into nonlinear classifiers such as
decision trees, random forests, support vector machines, and neural nets, among
others.
3. Clustering
Clustering methods fall under unsupervised ML because they aim to group or cluster observations that have similar characteristics. Clustering methods do not use output information for training; instead, they let the algorithm define the output. In clustering methods, we can only use visualization to inspect the quality of the solution.

The most popular clustering method is K-Means, where "K" represents the number of clusters chosen by the user. (Note that there are several techniques for selecting the value of K, such as the elbow method.) Roughly, K-Means works as follows (see the sketch after this list):

o Randomly choose K centers within the data.
o Assign each data point to the randomly generated center it is closest to.
o Recompute each center as the mean of the points assigned to it.
o If the centers do not change (or change very little), the process is over. Otherwise, return to step 2. (To prevent ending up in an infinite loop if the centers keep changing, set a maximum number of iterations in advance.)
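
Here is a minimal scikit-learn sketch of those steps on a tiny, made-up 2-D dataset (the values and the choice of K = 2 are illustrative):

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D observations -- illustrative values only
X = np.array([[1.0, 1.1], [0.9, 1.3], [1.2, 0.8],
              [8.0, 8.2], [8.1, 7.9], [7.8, 8.1]])

# K = 2 clusters; scikit-learn iterates the assign/recompute steps internally
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("cluster labels:", labels)
print("cluster centers:", kmeans.cluster_centers_)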

The next plot applies the K-means to the building's data set. The four measurements
pertain to air conditioning, plug-in appliances (microwave, refrigerator, etc.),
household gas, and heating gas. Each column of the plot represents the efficiency of
each building.

Clustering Buildings into Efficient (Green) and Inefficient (Red) Groups.

As you explore clustering, you will come across very useful algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Mean-Shift Clustering, Agglomerative Hierarchical Clustering, and Expectation-Maximization Clustering using Gaussian Mixture Models, among others.

4. Dimensionality Reduction
We use dimensionality reduction to remove the least important information (sometimes redundant columns) from a data set. For example, images may consist of thousands of pixels, not all of which matter to your analysis. Or, when testing microchips within the manufacturing process, you may have thousands of measurements and tests applied to each chip, many of which provide redundant information. In these cases, you need a dimensionality reduction algorithm to make the data set manageable.

The most popular dimensionality reduction method is Principal Component Analysis


(PCA), which reduces the dimensionality of the feature space by finding new vectors
that maximize the linear variance of the data. (You can also measure the extent of
information loss and adjust accordingly.) When the linear correlations of the data are
strong, PCA can dramatically reduce the dimension of the data without losing too
much information.

Another popular method is t-distributed stochastic neighbor embedding (t-SNE), which performs nonlinear dimensionality reduction. People usually use t-SNE for data visualization, but you can also use it for machine learning tasks such as reducing the feature space and clustering, to mention just a few.

The next plot shows the analysis of the MNIST database of handwritten digits. MNIST
contains thousands of images of numbers 0 to 9, which the researchers use to test
their clustering and classification algorithms. Each row of the data set is a vector
version of the original image (size 28 x 28 = 784) and a label for each image (zero,
one, two, three, …, nine). Therefore, we are reducing the dimensionality from 784
(pixels) to 2 (the dimensions in our visualization). Projecting to two dimensions allows
us to visualize higher-dimensional original data sets.
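
As a sketch of this idea, scikit-learn's PCA can project the library's bundled 8x8 digits dataset (64 pixels per image, a small stand-in for MNIST) down to 2 dimensions for visualization:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 8x8 handwritten digit images flattened to 64-dimensional vectors
digits = load_digits()
X = digits.data          # shape (n_samples, 64)

# Project from 64 dimensions down to 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                      # (n_samples, 2)
print(pca.explained_variance_ratio_)   # variance captured by each component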

5. Ensemble Methods
Imagine that you have decided to build a bicycle because you are not happy with the options available in stores and online. You might start by finding the best of each part you need. Once you assemble all these great parts, the resulting bike will outshine all the other options.

Ensemble methods use this same idea of combining several predictive models (supervised ML) to obtain higher-quality predictions than any of the individual models could provide on its own.

For example, the Random Forest algorithm is an ensemble method that combines
multiple decision trees trained with different samples from a data set. As a result, the
quality of predictions of a random forest exceeds the quality of predictions predicted
with a single decision tree.

Ensemble methods help reduce the variance and bias of a single machine learning model: by combining two models, the quality of the predictions is balanced out. With different data, the relative accuracy of the individual models might be reversed, which matters because any given model may be accurate under some conditions but inaccurate under others.

Most of the top winners of Kaggle competitions use some form of ensemble method. The most popular ensemble algorithms are Random Forest, XGBoost, and LightGBM.
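
For illustration, here is a minimal scikit-learn sketch of a random forest ensemble on the library's bundled iris dataset (the number of trees is an arbitrary illustrative choice):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees, each trained on a bootstrap sample
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))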

6. Neural networks and deep learning


Unlike linear and logistic regression, which are considered linear models, neural networks aim to capture nonlinear patterns in data by adding layers of parameters to the model. The simple neural net in the image below has three inputs, a hidden layer with five parameters, and an output layer.

Neural network with a hidden layer.

The neural network structure is flexible enough to construct our famous linear and
logistic regression. The term deep learning comes from a neural net with many
hidden layers and encompasses a variety of architectures.
It is especially difficult to keep up with development in deep learning as the research
and industry communities redouble their deep learning efforts, spawning whole new
methods every day.

Deep learning: A neural network with multiple hidden layers.

Deep learning techniques require a lot of data and computation power for best
performance as this method is self-tuning many parameters within vast architectures.
It quickly becomes clear why deep learning practitioners need powerful computers
with GPUs (Graphical Processing Units).

In particular, deep learning techniques have been extremely successful in vision


(image classification), text, audio, and video. The most common software packages
for deep learning are Tensorflow and PyTorch.
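
As a small sketch of the layered idea (using scikit-learn's MLPClassifier as a simple stand-in rather than the deep learning frameworks mentioned above; the hidden-layer size and dataset choice are illustrative):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer with 32 units; max_iter raised so training converges
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
net.fit(X_train, y_train)

print("test accuracy:", net.score(X_test, y_test))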

7. Transfer learning
Let's say you are a data scientist working in the retail industry. You've spent months
training a high-quality model to classify images as shirts, t-shirts, and polos. Your
new task is to create a similar model to classify clothing images like jeans, cargo,
casual, and dress pants.

Transfer learning refers to reusing part of an already trained neural net and adapting
it to a new but similar task. Specifically, once you train a neural net using the data for
a task, you can move a fraction of the trained layers and combine them with some
new layers that you can use for the new task. The new neural net can learn and adapt
quickly to a new task by adding a few layers.
The main advantage of transfer learning is that you need less data to train the neural net, which is especially important because training deep learning algorithms is expensive in terms of both time and money (computational resources), and it is often hard to find enough labeled data for training.

Let's come back to your example and assume that you use a neural net with 20
hidden layers for the shirt model. After running a few experiments, you realize that
you can move the 18 layers of the shirt model and combine them with a new layer of
parameters to train on the pant images.

So the pants model will have 19 hidden layers. The inputs and outputs of the two tasks are different, but the reusable layers summarize information that is relevant to both, for example, aspects of fabric.

Transfer learning has become more and more popular, and there are many concrete
pre-trained models now available for common deep learning tasks such as image
and text classification.

8. Reinforcement Learning
Imagine a mouse in a maze trying to find hidden pieces of cheese. At first, the mouse may move randomly, but after a while, the mouse's experience helps it sense which actions bring it closer to the cheese. The more times we expose the mouse to the maze, the better it gets at finding the cheese.

This process for the mouse mirrors what we do with Reinforcement Learning (RL) to train a system or a game agent. Generally speaking, RL is a machine learning method that helps an agent learn from experience.

RL can maximize a cumulative reward by recording actions and using a trial-and-


error approach in a set environment. In our example, the Mouse is the agent, and the
maze is the environment. The set of possible actions for the Mouse is: move forward,
backward, left, or right. The reward is cheese.

You can use RL when you have little or no historical data about a problem, because it does not require prior information (unlike traditional machine learning methods). In the RL framework, you learn from the data as you go. Not surprisingly, RL is particularly successful with games, especially games of "perfect information" such as chess and Go. With games, feedback from the agent and the environment comes quickly, allowing the model to learn fast. The downside of RL is that it can take a very long time to train if the problem is complex.
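
As a rough sketch of the trial-and-error idea, the snippet below implements tabular Q-learning for a toy, made-up 1-D corridor where the cheese sits in the last cell (the environment, rewards, and hyperparameters are illustrative assumptions, not part of the original example):

import numpy as np

n_states, n_actions = 5, 2             # corridor cells; actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))    # table of expected future rewards
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate

for episode in range(300):
    state = 0                                      # the mouse starts at the left end
    while state != n_states - 1:                   # the cheese sits in the last cell
        # Explore randomly sometimes (or when nothing has been learned yet)
        if np.random.rand() < epsilon or not Q[state].any():
            action = np.random.randint(n_actions)
        else:
            action = int(np.argmax(Q[state]))

        next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == n_states - 1 else 0.0

        # Q-learning update: move the estimate toward reward + discounted future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q)  # after training, the "move right" column dominates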

Just as IBM's Deep Blue beat the best human chess player in 1997, the RL-based AlphaGo algorithm beat the best Go player in 2016. The current frontrunners in RL are the DeepMind teams in the UK.

In April 2019, the OpenAI Five team was the first AI to defeat the world champion
team of e-sport Dota 2, a very complex video game that the OpenAI Five team chose
because there were no RL algorithms capable of winning it. You can tell that
reinforcement learning is a particularly powerful form of AI, and we certainly want to
see more progress from these teams. Still, it's also worth remembering the
limitations of the method.

9. Natural Language Processing


A large percentage of the world's data and knowledge is in some form of human
language. For example, we can train our phones to autocomplete our text messages
or correct misspelled words. We can also teach a machine to have a simple
conversation with a human.

Natural Language Processing (NLP) is not a machine learning method in itself, but rather a widely used set of techniques for preparing text for machine learning. Think of the many text documents in different formats (Word documents, online blogs, and so on). Most of these text documents will be full of typos, missing characters, and other words that need to be filtered out. At the moment, one of the most popular packages for processing text is NLTK (the Natural Language Toolkit), originally developed by researchers at the University of Pennsylvania.

The easiest way to map text to a numerical representation is to count the frequency
of each word in each text document. Think of a matrix of integers where each row
represents a text document, and each column represents a word. This matrix
representation of the term frequency is usually called the term frequency matrix
(TFM). We can create a more popular matrix representation of a text document by
dividing each entry on the matrix by the weighting of how important each word is in
the entire corpus of documents. We call this method Term Frequency Inverse
Document Frequency (TFIDF), and it generally works better for machine learning
tasks.
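
As a small sketch, scikit-learn's TfidfVectorizer builds exactly this kind of weighted term matrix from a tiny, made-up corpus (the documents are illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus: each string is one "document"
docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "dogs and cats are pets"]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(docs)   # rows = documents, columns = words

print(vectorizer.get_feature_names_out())       # the vocabulary (column names)
print(tfidf_matrix.toarray().round(2))          # TF-IDF weights per document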

10. Word Embedding


TFM and TFIDF are numerical representations of text documents that consider only
frequency and weighted frequencies to represent text documents. In contrast, word
embedding can capture the context of a word in a document. As with word context,
embeddings can measure similarity between words, allowing us to perform
arithmetic with words.

Word2Vec is a neural net-based method that maps words in a corpus to a numerical


vector. We can then use these vectors to find synonyms, perform arithmetic
operations with words, or represent text documents (by taking the mean of all word
vectors in the document). For example, we use a sufficiently large corpus of text
documents to estimate word embeddings.

Let's say vector('word') is the numeric vector that represents the word 'word'. To approximate vector('queen'), we can perform an arithmetic operation with the vectors:

vector('king') + vector('woman') - vector('man') ~ vector('queen')

Arithmetic with Word (Vectors) Embeddings.

The word representation allows finding the similarity between words by computing
the cosine similarity between the vector representations of two words. The cosine
similarity measures the angle between two vectors.
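
As a quick sketch, the cosine similarity between two word vectors can be computed directly with NumPy (the two vectors below are made-up stand-ins for real embeddings):

import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional "embeddings" for two words
vec_king = np.array([0.8, 0.1, 0.6, 0.3])
vec_queen = np.array([0.7, 0.3, 0.6, 0.4])

print(cosine_similarity(vec_king, vec_queen))  # close to 1 for similar words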

We calculate word embeddings using machine learning methods, but this is often a
pre-processing stage before applying other machine learning algorithms on top. For
example, suppose we have access to the tweets of several thousand Twitter users, and
we also know which of these users bought a house. To estimate the probability that a
new Twitter user will buy a house, we can combine Word2Vec with logistic regression,
as sketched below.
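
The sketch below is purely illustrative: the tweets, the labels, and the embedding
settings are all assumptions. It shows the general recipe of averaging word vectors
per user and fitting a logistic regression on top.

# Hypothetical sketch: mean word vectors per user + logistic regression.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

tweets = [["looking", "for", "a", "new", "house", "mortgage"],
          ["great", "game", "tonight", "football"],
          ["viewing", "a", "house", "this", "weekend"],
          ["match", "highlights", "tonight"]]
bought_house = [1, 0, 1, 0]                      # made-up labels

wv = Word2Vec(tweets, vector_size=25, min_count=1, epochs=50).wv

def user_vector(tokens):
    # Mean of the word vectors of all in-vocabulary tokens; zeros if none are known.
    vecs = [wv[w] for w in tokens if w in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

X = np.vstack([user_vector(t) for t in tweets])
clf = LogisticRegression().fit(X, bought_house)
print(clf.predict_proba(X)[:, 1])                # estimated probability of buying a house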

You can train the word embeddings yourself or use a pre-trained set of word vectors
(transfer learning). To download pre-trained word vectors in 157 different languages,
take a look at fastText.

Summary
Studying these methods thoroughly and fully understanding the basics of each can
serve as a solid starting point for further study of more advanced algorithms and
methods.

There is no single best method, and no one-size-fits-all solution. Finding the right
algorithm is partly just trial and error: even highly experienced data scientists can't
tell whether an algorithm will work without trying it out. Algorithm selection also
depends on the size and type of data you're working with, the insights you want to
derive from the data, and how those insights will be used.

AutoML | Automated Machine Learning
AutoML enables everyone to build machine learning models and make use of their
power without having expertise in machine learning.

In recent years, Machine Learning has evolved very rapidly and has become one of
the most popular and in-demand technologies of our time. It is being used in every
field, which makes it even more valuable. But there are two big barriers to making
efficient use of machine learning (classical & deep learning): skills
and computing resources. Computing resources can be obtained by spending a good
amount of money, but the skills needed to solve machine learning problems remain
scarce, which puts machine learning out of reach for those with limited machine
learning knowledge. To solve this problem, Automated Machine Learning (AutoML)
came into existence. In this topic, we will look at what AutoML is and how it affects
the world.

What is AutoML?
Automated Machine Learning or AutoML is a way to automate the time-consuming
and iterative tasks involved in the machine learning model development process. It
provides various methods to make machine learning available for people with limited
knowledge of Machine Learning. It aims to reduce the need for skilled people to
build the ML model. It also helps to improve efficiency and to accelerate the research
on Machine learning.

To better understand automated machine learning, we must know the life cycle of a
data science or ML project. A typical lifecycle of a data science project contains the
following phases:

o Data Cleaning
o Feature Selection/Feature Engineering
o Model Selection
o Parameter Optimization
o Model Validation.

Although the technology has become very advanced, all these phases still require
manual work, which is time-consuming and demands skilled data scientists.
Completing these tasks is very difficult for non-ML experts. The rapid growth of ML
applications has created demand for automating these processes so that they can be
used without expert knowledge. Hence, Automated Machine Learning came into
existence to automate the entire process, from data cleaning to parameter
optimization. It not only saves time but also delivers strong performance.


AutoML Platforms
AutoML has been evolving for many years, but only in the last few years has it gained
popularity. Several platforms and frameworks have emerged, many of which enable
users to train models using drag-and-drop design tools.
1. Google Cloud AutoML

Google has launched several AutoML products for building custom machine learning
models as per business needs, and it also allows us to integrate these models into our
applications or websites. Google has created the following products:

o AutoML Natural Language


o AutoML Tables
o AutoML translation
o AutoML Video Intelligence
o AutoML Vision

The above products provide various tools to train models for specific use cases with
limited machine learning expertise. With Cloud AutoML, we don't need to know about
transfer learning or how to create a neural network, as it provides out-of-the-box
support for deep learning models.

2. Microsoft Azure AutoML

Microsoft Azure AutoML was released in 2018. It offers a transparent model selection
process that lets non-ML experts build ML models.

3. H2O.ai

H2O is an open-source platform that enables the user to create ML models. It can be
used for automating the machine learning workflow, such as automatic training and
tuning of many models within a user-specified time limit. Although H2O AutoML makes
the development of ML models easier for non-experts, a good knowledge of data
science is still required to build high-performing ML models.

4. TPOT

TPOT (Tree-based Pipeline Optimization Tool) can be considered a data science
assistant for developers. It is a Python-packaged Automated Machine Learning tool
that uses genetic programming to optimize machine learning pipelines. It is built on
top of scikit-learn, so it is easy to work with for developers who already know
scikit-learn. It automates the tedious parts of the ML lifecycle by exploring thousands
of possible pipelines to find the best one for the particular requirement. After
finishing the search, it provides us with the Python code for the best pipeline.
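
A minimal usage sketch is shown below, assuming the tpot package is installed; the
dataset and search settings are illustrative only, not recommendations.

# Minimal sketch: searching for a pipeline with TPOT on a toy dataset.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Genetic programming searches over scikit-learn pipelines for a few generations.
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

# Export the Python code of the best pipeline found.
tpot.export("best_pipeline.py")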

5. DataRobot
DataRobot is one of the best AutoML platforms. It provides complete automation of
the ML pipeline and supports all the steps required for preparing, building, deploying,
monitoring, and maintaining powerful AI applications.

6. Auto-Sklearn

Auto-Sklearn is an open-source library built on top of scikit-learn. It automatically
performs algorithm selection and hyperparameter tuning for a machine learning model
and provides out-of-the-box supervised learning.
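
A minimal usage sketch, assuming the auto-sklearn package is installed (it runs on
Linux), might look like the following; the time budget and dataset are illustrative.

# Minimal sketch: automated algorithm selection and tuning with auto-sklearn.
import autosklearn.classification
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Search for models and hyperparameters within a 2-minute budget.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120, per_run_time_limit=30)
automl.fit(X_train, y_train)
print(accuracy_score(y_test, automl.predict(X_test)))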

7. MLBox

MLBox is another powerful Python library for automated machine learning.

How does Automated Machine Learning Work?
Automated Machine Learning (AutoML) automates each step of the machine learning
lifecycle, from preparing a dataset to deploying an ML model. It works quite
differently from the traditional approach, in which we develop the model manually
and handle each step separately.
AutoML automatically selects the most suitable algorithm for the given problem or
task. It relies on two basic concepts:

o Neural Architecture Search: It helps automate the design of neural networks,
enabling AutoML models to discover new architectures as the problem requires.
o Transfer Learning: With transfer learning, previously trained models can apply
what they have learned to new datasets, enabling AutoML models to apply existing
architectures to new problems.

With AutoML, a machine learning enthusiast can use machine learning or deep
learning models with just the Python language. The following steps of the machine
learning lifecycle are automated by AutoML:

o Raw data processing


o Feature engineering
o Model selection
o Hyperparameter optimization and parameter optimization
o Deployment with consideration for business and technology constraints
o Evaluation metric selection
o Monitoring and problem checking
o Result Analysis

Pros of AutoML
o Performance: AutoML performs most of the steps automatically and gives a
great performance.
o Efficiency: It provides good efficiency by speeding up the machine learning
process and by reducing the training time required to train the models.
o Cost Savings: Because it saves time and speeds up the learning process of machine
learning models, it also reduces the cost of developing an ML model.

Cons of AutoML
o One of the main challenges of AutoML is that it is sometimes viewed as a
replacement for, or alternative to, human knowledge and intervention. Like other
automation, AutoML is designed to perform routine tasks efficiently and accurately
so that humans can focus on more complex tasks. Routine tasks such as monitoring,
analysis, and problem detection are much faster when done automatically; however,
humans should still supervise the model, even though they need not be involved in
every step. AutoML is meant to help humans by enhancing their efficiency, not to
replace them.
o AutoML is a comparatively new & developing field, and most of the popular
tools are not yet fully developed.

Applications of AutoML
AutoML shares common use cases with traditional machine learning. Some of these
include:

o Image Recognition: AutoML is also used in image recognition for Facial


Recognition.
o Risk Assessment: For banking, finance, and insurance, it can be used for Risk
Assessment and management.
o Cybersecurity: In the cybersecurity field, it can be used for risk monitoring,
assessment, and testing.
o Customer Support: AutoML can be used for sentiment analysis in chatbots and to
increase the efficiency of customer support teams.
o Malware & Spam: AutoML can be used to build adaptive models that detect
malware and spam and keep pace with evolving cyberthreats.
o Agriculture: In the Agriculture field, it can be used to accelerate the quality
testing process.
o Marketing: In the marketing field, AutoML is employed for predictive analytics
and to improve engagement rates. Moreover, it can also be used to enhance the
efficiency of behavioral marketing campaigns on social media.
o Entertainment: In the entertainment field, it can be used as the content
selection engine.
o Retail: In retail, AutoML can be used to improve profits and reduce inventory
carrying costs.
Demystifying Machine Learning
Machine Learning: what a powerful term! Machine learning is one of the hottest
topics of our time, and why shouldn't it be? Most of the "enticing" new developments
in Computer Science and Software Development have something connected to
machine learning hidden behind a veil. Microsoft's Cortana - Machine Learning.
Object and Face Recognition - Machine Learning and Computer Vision. The most
advanced UX improvement programs include Machine Learning (yes, the Amazon
product suggestions we receive are the result of the number-crunching efforts of a
Machine Learning Algorithm).

It's not only that. Machine Learning and Data Science generally are everywhere. Why?
Because data is everywhere!

Therefore, it's only natural that anyone with an above-average brain who can
distinguish between programming paradigms by looking at code is enthralled by the
prospect of Machine Learning.

What do we mean by Machine Learning? And how big is Machine Learning? Let's
explore Machine Learning, once and for all. Instead of presenting the technical specs,
we'll use the "Understand by Example" approach.


Machine Learning: What is it really?


Machine Learning is a subfield of Artificial Intelligence that evolved from Pattern
Recognition and Computational Learning Theory. Arthur Lee Samuel defined
Machine Learning as the field of study that gives computers the ability to learn
without being explicitly programmed.

It is an area of Computer Science and Artificial Intelligence that "learns" by studying
data, without human intervention.
However, this notion is not without flaws. Because of it, when the term Machine
Learning is thrown around, it is usually equated with "Artificial Intelligence", "neural
networks that emulate human brains (currently not possible)", self-driving cars, and
so on. But Machine Learning is far broader than that. Below we explore some typical,
and some less commonly considered, aspects of modern computing where Machine
Learning is at work.

Machine Learning: The Expected


Let's start by highlighting some areas in which Machine Learning plays a role.

1. Speech Recognition (Natural Language Processing in more technical terms):
We communicate with Cortana on Windows devices, but how does it
comprehend what we're saying? Through the field of Natural Language
Processing, or NLP: the study of interactions between machines and humans
through linguistics. At the centre of NLP are machine learning algorithms and
systems (Hidden Markov Models being just one example).
2. Computer Vision:
Computer Vision is a subfield of Artificial Intelligence that studies a machine's
(probable) perception of the real world. Facial recognition, pattern recognition,
and character recognition techniques are all part of Computer Vision. Once
again, Machine Learning, with its broad range of algorithms, is at the centre of
Computer Vision.
3. Google's Self-Driving Car:
It's easy to imagine what drives the car: more Machine Learning goodness.
These are not necessarily new applications. Even the most sceptical of people
have some understanding of these technological feats, brought to life by
certain "mystical (and extremely difficult), mind-boggling computer magic".

Machine Learning: The Unexpected


Let's look at some fields that people don't normally connect with Machine
Learning:

o Amazon's Product Recommendations: We might have wondered why Amazon
always offers suggestions that entice us to spend a little more. It's machine-
learning algorithms known as "Recommender Systems" that are working
behind the scenes, analysing each user's preferences and providing
suggestions based on them.
o YouTube/Netflix: They function exactly like the above!
o Data Mining / Big Data: This may not come as a surprise to some. Machine
Learning is lurking nearby wherever the goal is to obtain information from
data; Data Mining and Big Data are simply ways of learning from and studying
data at a larger scale.
o Real Estate, Stock Markets, Housing Finance: All of these fields make use of
a number of Machine Learning systems in order to be able to assess the
market, specifically "Regression Techniques", for things as basic as predicting
the value of a House or studying trends in the stock market.

As we may have noticed by now, Machine Learning is everywhere, from Research
and Development to improving the business of small companies. It's all over, which
makes for a great career opportunity, since the field is growing and that growth will
not end anytime soon.

Challenges of Machine Learning


Machine learning is a subfield of artificial intelligence (AI) and computer science that
focuses on using algorithms and data to replicate the way humans learn, gradually
improving its accuracy over time.

In this tutorial, we will discuss the challenges of Machine Learning.

Challenges of Machine Learning


The advancement of machine learning technology in recent years certainly has
improved our lives. However, the implementation of machine learning in companies
has also brought up several ethical issues regarding AI technology. A few of them
are:

Technological Singularity:
Although this topic attracts a lot of public attention, many scientists are not
concerned with the notion of AI exceeding human intelligence anytime in the
immediate future. This is often referred to as superintelligence or strong AI, which
Nick Bostrom describes as an intelligence that far surpasses the best human brains in
virtually every field, including general wisdom, scientific creativity, and social abilities.
Even though superintelligence and strong AI are not yet a reality, the concept poses
some interesting questions when we contemplate the use of autonomous systems,
such as self-driving vehicles. It is unrealistic to expect that a driverless car would
never be involved in an accident, but who is responsible and liable in those
situations? Should we continue to develop fully autonomous vehicles, or should we
restrict this technology to semi-autonomous cars that promote driver safety? The
jury is still out on this issue, but these kinds of ethical debates are being fought as
new and innovative AI technology is developed.


AI Impact on Jobs:
While much public opinion about artificial intelligence centres on job loss, the
concern should probably be reframed. With every new, disruptive technology, we see
shifts in demand for certain job roles. For instance, in the automotive industry, many
manufacturers such as GM are focusing their efforts on electric vehicles to align with
green policies. The energy sector isn't going away, but its primary source is shifting
from a fuel-based economy to an electric one. Artificial intelligence should be viewed
in a similar way: it is expected to shift the demand for jobs to other areas. There will
need to be people who can manage these systems as data expands and changes
every day, and human resources will still be needed to solve more complicated
problems in the sectors most likely to be affected by demand shifts, such as customer
service. The most important aspect of artificial intelligence and its impact on the
employment market will be helping individuals adapt to the new areas of demand
that the market creates.

Privacy:
Privacy is frequently discussed in relation to data privacy, data protection, and data
security, and these concerns have allowed policymakers to make progress in recent
years. For instance, in 2016 the GDPR legislation was introduced to safeguard the
personal data of individuals in the European Union and the European Economic Area,
giving individuals more control over their data. In the United States, individual states
are developing policies, such as the California Consumer Privacy Act (CCPA), which
require companies to inform consumers about the collection and processing of their
data. Legislation like this is forcing companies to think about how they handle and
store personally identifiable information (PII). As a result, security investments have
become a business priority as companies seek to remove any vulnerabilities and
opportunities for hacking, surveillance, and cyber-attacks.

Bias and Discrimination:


Discrimination and bias in various intelligent systems have raised several ethical
questions about the use of artificial intelligence. How can we protect ourselves from
bias and discrimination when the training data itself could be biased? While most
companies have well-meaning intentions regarding their automation initiatives,
Reuters highlighted the unexpected effects of incorporating AI into hiring practices.
In trying to automate and simplify its recruitment process, Amazon unintentionally
discriminated against job candidates by gender for technical roles, and ultimately
had to scrap the project. When events like these come to light, Harvard Business
Review has raised pertinent questions about the use of AI in hiring practices: for
example, what kind of data should you be able to analyse when evaluating a
candidate for a particular role?

Discrimination and bias aren't just limited to the human resource function. They are
present in a variety of applications ranging from software for facial recognition to
algorithms for social media.

Accountability:
There is no significant legislation to regulate AI practices, and no enforcement
mechanism to ensure that ethical AI is used. Companies' main incentive to adhere to
ethical standards is the negative effect an untrustworthy AI system can have on their
bottom line. To address the issue, ethical frameworks have been developed in
partnerships between researchers and ethicists to govern the creation and use of AI
models. For the time being, however, they only serve as guidance for the
development of AI models. Research has shown that shared responsibility and
insufficient awareness of potential consequences are not ideal for protecting society
from harm.
Difference between Model Parameter
and Hyperparameter
For a Machine learning beginner, there can be so many terms that could seem
confusing, and it is important to clear this confusion to be proficient in this field. For
example, "Model Parameters" and "Hyperparameters". Not having a clear
understanding of both terms is a common struggle for beginners. So, in order to
clear this confusion, let's understand the difference between parameter and
hyperparameter and how they can be related to each other.

What is a Model Parameter?


Model parameters are configuration variables that are internal to the model,
and a model learns them on its own. Examples include the weights or coefficients of
the independent variables in a linear regression model or an SVM, the weights and
biases of a neural network, and the cluster centroids in clustering.

We can understand model parameters using the below image:


The above plot shows the model representation of Simple Linear Regression. Here, x
is an independent variable, y is the dependent variable, and the goal is to fit the best
regression line for the given data to define a relationship between x and y. The
regression line can be given by the equation:


y = mx + c

Where m is the slope of the line, and c is the intercept of the line. These two
parameters are calculated by fitting the line by minimizing RMSE, and these are
known as model parameters.

Some key points for model parameters are as follows:

o The model uses them for making predictions.


o They are learned by the model from the data itself
o These are usually not set manually.
o These are the part of the model and key to Machine Learning Algorithms.
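
As a minimal illustration with made-up data, the sketch below fits a simple linear
regression with scikit-learn; the slope and intercept it prints are the model
parameters learned from the data, not set by the user.

# Minimal sketch: model parameters (m and c) learned by linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1], [2], [3], [4], [5]])    # independent variable
y = np.array([3, 5, 7, 9, 11])             # dependent variable (follows y = 2x + 1)

reg = LinearRegression().fit(x, y)
print(reg.coef_, reg.intercept_)           # learned parameters: [2.] 1.0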

What is Model Hyperparameter?


Hyperparameters are those parameters that are explicitly defined by the user to
control the learning process.

o These are usually defined manually by the machine learning engineer.


o One cannot know the exact best value for hyperparameters for the given
problem. The best value can be determined either by the rule of thumb or by
trial and error.

Some examples of Hyperparameters are the learning rate for training a neural
network, K in the KNN algorithm, etc.
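
For instance, the short sketch below (using scikit-learn and the Iris dataset purely for
illustration) shows the hyperparameter k being chosen by the user before training, in
contrast to the parameters learned from data in the previous sketch.

# Minimal sketch: k (n_neighbors) is a hyperparameter set explicitly by the user.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)   # hyperparameter chosen before training
knn.fit(X, y)
print(knn.predict(X[:3]))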

Comparison between Parameters and Hyperparameters
o Parameters are the configuration variables of the model and are internal to it,
whereas hyperparameters are explicitly specified parameters that control the
training process.
o Parameters are essential for making predictions, whereas hyperparameters are
essential for optimizing the model.
o Parameters are specified or estimated while training the model, whereas
hyperparameters are set before the training of the model begins.
o Parameters are internal to the model, whereas hyperparameters are external to
the model.
o Parameters are learned and set by the model itself, whereas hyperparameters are
set manually by a machine learning engineer/practitioner.
o Parameters depend on the dataset used for training, whereas hyperparameters
are independent of the dataset.
o The values of parameters can be estimated by optimization algorithms such as
Gradient Descent, whereas the values of hyperparameters can be estimated by
hyperparameter tuning.
o The final parameters estimated after training decide the model's performance on
unseen data, whereas the selected or fine-tuned hyperparameters decide the
quality of the model.
o Examples of model parameters are weights in an ANN, support vectors in an SVM,
and coefficients in linear or logistic regression, whereas examples of model
hyperparameters are the learning rate for training a neural network, K in the KNN
algorithm, etc.

Conclusion
In this article, we have seen clear definitions of model parameters and
hyperparameters and the difference between the two. In brief, model parameters are
internal to the model and estimated from data automatically, whereas
hyperparameters are set manually, are used in the optimization of the model, and
help in estimating the model parameters.
Hyperparameters in Machine
Learning
Hyperparameters in Machine learning are those parameters that are explicitly
defined by the user to control the learning process. These hyperparameters are
used to improve the learning of the model, and their values are set before starting
the learning process of the model.

In this topic, we are going to discuss one of the most important concepts of machine
learning, i.e., hyperparameters: their examples, hyperparameter tuning, the categories
of hyperparameters, and how a hyperparameter differs from a parameter in machine
learning. But before starting, let's first understand what a hyperparameter is.

What are hyperparameters?


In Machine Learning/Deep Learning, a model is represented by its parameters. In
contrast, a training process involves selecting the best/optimal hyperparameters that
are used by learning algorithms to provide the best result. So, what are these
hyperparameters? The answer is, "Hyperparameters are defined as the parameters
that are explicitly defined by the user to control the learning process."

Here the prefix "hyper" suggests that the parameters are top-level parameters that
are used in controlling the learning process. The value of the Hyperparameter is
selected and set by the machine learning engineer before the learning algorithm
begins training the model. Hence, these are external to the model, and their
values cannot be changed during the training process.

Some examples of Hyperparameters in Machine Learning

o The k in kNN or K-Nearest Neighbour algorithm


o Learning rate for training a neural network
o Train-test split ratio
o Batch Size
o Number of Epochs
o Branches in Decision Tree
o Number of clusters in Clustering Algorithm

Difference between Parameter and Hyperparameter
There is always a big confusion between Parameters and hyperparameters or model
hyperparameters. So, in order to clear this confusion, let's understand the difference
between both of them and how they are related to each other.

Model Parameters:
Model parameters are configuration variables that are internal to the model, and a
model learns them on its own. Examples include the weights or coefficients of the
independent variables in a linear regression model or an SVM, the weights and biases
of a neural network, and the cluster centroids in clustering. Some key points for
model parameters are as follows:

o They are used by the model for making predictions.


o They are learned by the model from the data itself
o These are usually not set manually.
o These are the part of the model and key to a machine learning Algorithm.

Model Hyperparameters:
Hyperparameters are those parameters that are explicitly defined by the user to
control the learning process. Some key points for model hyperparameters are as follows:
o These are usually defined manually by the machine learning engineer.
o One cannot know the exact best value for hyperparameters for the given
problem. The best value can be determined either by the rule of thumb or by
trial and error.
o Some examples of hyperparameters are the learning rate for training a
neural network, K in the KNN algorithm, etc.

Categories of Hyperparameters
Broadly hyperparameters can be divided into two categories, which are given below:

1. Hyperparameter for Optimization


2. Hyperparameter for Specific Models

Hyperparameter for Optimization


The process of selecting the best hyperparameters to use is known as
hyperparameter tuning, and the tuning process is also known as hyperparameter
optimization. Optimization parameters are used for optimizing the model.
Some of the popular optimization parameters are given below:

o Learning Rate: The learning rate is the hyperparameter in optimization
algorithms that controls how much the model changes in response to the
estimated error each time the model's weights are updated. It is one of the
crucial settings when building a neural network, since it determines how
quickly the model parameters are adjusted. Selecting an appropriate learning
rate is challenging: if the learning rate is very small, training may become very
slow, whereas if it is too large, the model may not converge properly.

Note: Learning rate is a crucial hyperparameter for optimizing the model, so if there is a
requirement of tuning only a single hyperparameter, it is suggested to tune the learning
rate.

o Batch Size: To enhance the speed of the learning process, the training set is
divided into smaller subsets known as batches; the batch size is the number of
samples processed before the model's weights are updated (see the sketch
after this list).
o Number of Epochs: An epoch can be defined as one complete pass over the
training data. It represents an iterative learning process, and the number of
epochs varies from model to model. To determine the right number of
epochs, the validation error is taken into account: the number of epochs is
increased as long as the validation error keeps decreasing, and when it stops
improving for consecutive epochs, that indicates it is time to stop increasing
the number of epochs.
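
The sketch below shows one possible way to set these optimization hyperparameters,
using scikit-learn's MLPClassifier purely for illustration; the specific values are
assumptions, not recommendations.

# Minimal sketch: learning rate, batch size, and epoch budget set by the user.
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

clf = MLPClassifier(
    learning_rate_init=0.001,   # learning rate
    batch_size=32,              # size of each training batch
    max_iter=50,                # upper bound on training epochs
    early_stopping=True,        # stop when the validation score stops improving
    random_state=0,
)
clf.fit(X, y)
print(clf.n_iter_)              # epochs actually run before stopping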

Hyperparameter for Specific Models


Hyperparameters that are involved in the structure of the model are known as
hyperparameters for specific models. These are given below:

o Number of Hidden Units: Hidden units are part of neural networks; they are
the processing components that make up the layers between the input and
output units of the network.

It is important to specify the number of hidden units as a hyperparameter for the
neural network. It should generally lie between the size of the input layer and the
size of the output layer; a common rule of thumb is roughly 2/3 of the size of the
input layer plus the size of the output layer.
For complex functions, more hidden units may be necessary, but too many can cause
the model to overfit.

o Number of Layers: A neural network is made up of vertically stacked
components called layers: input layers, hidden layers, and output layers. A
3-layer neural network often performs better than a 2-layer network, and for
convolutional neural networks, a greater number of layers usually makes a
better model. A tuning sketch for these model-specific hyperparameters
follows below.
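
The sketch below illustrates tuning such model-specific hyperparameters with a simple
grid search over the hidden-layer sizes of an MLPClassifier; the candidate values and
dataset are assumptions for illustration only.

# Minimal sketch: hyperparameter tuning of hidden layers/units via grid search.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

param_grid = {
    "hidden_layer_sizes": [(32,), (64,), (64, 32)],   # one or two hidden layers
}
search = GridSearchCV(MLPClassifier(max_iter=200, random_state=0),
                      param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))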

Conclusion
Hyperparameters are the parameters that are explicitly defined to control the
learning process before applying a machine-learning algorithm to a dataset. These
are used to specify the learning capacity and complexity of the model. Some of the
hyperparameters are used for the optimization of the models, such as Batch size,
learning rate, etc., and some are specific to the models, such as Number of Hidden
layers, etc.

Importance of Machine Learning


Machine Learning is one of the most popular sub-fields of Artificial Intelligence.
Machine learning concepts are used almost everywhere, such as Healthcare, Finance,
Infrastructure, Marketing, Self-driving cars, recommendation systems, chatbots, social
sites, gaming, cyber security, and many more.
Currently, Machine Learning is still in its development phase, and many new
techniques are continuously being added to it. It helps us in many ways, such as
analyzing large chunks of data, extracting and interpreting data, etc. Hence, Machine
Learning has countless uses. In this topic, we will discuss the importance of Machine
Learning with examples. So, let's start with a quick introduction to Machine Learning.

What is Machine Learning?


Machine Learning is a branch of Artificial Intelligence that allows machines to learn
and improve from experience automatically. It is defined as the field of study that
gives computers the capability to learn without being explicitly programmed. It is
quite different than traditional programming.

How Machine Learning Works?


Machine Learning is a core form of Artificial Intelligence that enables machines to
learn from past data and make predictions.


It involves data exploration and pattern matching with minimal human intervention.
Machine learning mainly works through four types of techniques:

1. Supervised Learning:
Supervised Learning is a machine learning method that needs supervision similar to
the student-teacher relationship. In supervised Learning, a machine is trained with
well-labeled data, which means some data is already tagged with correct outputs. So,
whenever new data is introduced into the system, supervised learning algorithms
analyze this sample data and predict correct outputs with the help of that labeled
data.
It is classified into two different categories of algorithms. These are as follows:

o Classification: It is used when the output is a category, such as yellow or
blue, right or wrong, etc.
o Regression: It is used when the output variable is a real value, such as age or
height.

This technique allows us to produce outputs from experience. It works in much the
same way as humans learn, using the labeled data points of the training set. It helps
optimize the performance of models using experience and solve various complex
computational problems.
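
As a minimal illustration, the sketch below trains a classifier on the labeled Iris
dataset with scikit-learn; the dataset and model choice are just examples of
supervised learning, not a prescribed approach.

# Minimal sketch: supervised learning on labeled data, then prediction on new data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=500).fit(X_train, y_train)   # learn from labels
print(clf.score(X_test, y_test))                               # accuracy on unseen data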

2. Unsupervised Learning:
Unlike supervised learning, unsupervised Learning does not require classified or well-
labeled data to train a machine. It aims to make groups of unsorted information
based on some patterns and differences even without any labelled training data. In
unsupervised Learning, no supervision is provided, so no sample data is given to the
machines. Hence, machines are restricted to finding hidden structures in unlabeled
data on their own.

It is classified into two different categories of algorithms. These are as follows:

o Clustering: It is used when we need to find inherent groupings in the
training data, e.g., grouping students by their area of interest (see the
clustering sketch after this list).
o Association: It deals with discovering rules that describe large portions of the
data, e.g., students who are interested in ML also tend to be interested in AI.
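
The following minimal sketch uses scikit-learn's K-Means on made-up interest scores
to illustrate clustering; the data and the number of clusters are assumptions chosen
only for this example.

# Minimal sketch: unsupervised clustering of unlabeled points with K-Means.
import numpy as np
from sklearn.cluster import KMeans

scores = np.array([[9, 1], [8, 2], [1, 9], [2, 8], [5, 5], [6, 4]])  # toy interest scores
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
print(labels)   # cluster assignment for each student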

3. Semi-supervised learning:
Semi-supervised Learning is defined as the combination of both supervised and
unsupervised learning methods. It is used to overcome the drawbacks of both
supervised and unsupervised learning methods.

In the semi-supervised learning method, a machine is trained with both labeled and
unlabeled data. Typically, it involves a few labeled examples and a large number of
unlabeled examples.

Speech analysis, web content classification, protein sequence classification, and text
documents classifiers are some most popular real-world applications of semi-
supervised Learning.

4. Reinforcement learning:
Reinforcement learning is defined as a feedback-based machine learning method
that does not require labeled data. In this learning method, an agent learns to
behave in an environment by performing actions and seeing the results of those
actions. The agent receives positive feedback for each good action and negative
feedback for bad actions. Since there is no labeled training data in reinforcement
learning, agents are restricted to learning from their own experience.

Importance of Machine Learning


Although machine learning is still evolving, with many new techniques emerging, it is
already used across a wide range of industries.

Machine learning is important because it gives enterprises a view of trends in


customer behavior and operational business patterns, as well as supports the
development of new products. Many of today's leading companies, such
as Facebook, Google, and Uber, make machine learning a central part of their
operations. Machine learning has become a significant competitive differentiator for
many companies.

Machine learning has several practical applications that drive the kind of real
business results - such as time and money savings - that have the potential to
dramatically impact the future of your organization. In particular, we see tremendous
impact occurring within the customer care industry, whereby machine learning is
allowing people to get things done more quickly and efficiently. Through Virtual
Assistant solutions, machine learning automates tasks that would otherwise need to
be performed by a live agent - such as changing a password or checking an account
balance. This frees up valuable agent time that can be used to focus on the kind of
customer care that humans perform best: high touch, complicated decision-making
that is not as easily handled by a machine. At Interactions, we further improve the
process by eliminating the decision of whether a request should be sent to a human
or a machine: with its unique Adaptive Understanding technology, the machine learns
to be aware of its limitations and hands over to a human when it has low confidence
in providing the correct solution.

Use cases of Machine Learning Technology
Machine Learning is broadly used in every industry and has a wide range of
applications, especially those that involve collecting, analyzing, and responding to
large sets of data. The importance of Machine Learning can be understood through
these important applications.
Some important applications in which machine learning is widely used are given
below:

1. Healthcare: Machine Learning is widely used in the healthcare industry. It


helps healthcare researchers analyze data points and suggest outcomes.
Natural language processing helps provide accurate insights that lead to better
patient results. Further, machine learning has improved treatment methods by
analyzing external data on patients' conditions from X-rays, ultrasound, CT scans, etc.
CT-scan, etc. NLP, medical imaging, and genetic information are key areas of
machine learning that improve the diagnosis, detection, and prediction system
in the healthcare sector.
2. Automation: This is one of the significant applications of machine learning
that helps to make the system automated. It helps machines to perform
repetitive tasks without human intervention. A machine learning engineer or
data scientist may be expected to perform a given task many times without
errors, which is not practically possible for humans. Hence, machine learning
provides models that automate the process and can perform repetitive tasks
in less time.
3. Banking and Finance: Machine Learning is a subset of AI that uses statistical
models to make accurate predictions. In the banking and finance sector,
machine learning helped in many ways, such as fraud detection, portfolio
management, risk management, chatbots, document analysis, high-frequency
trading, mortgage underwriting, AML detection, anomaly detection, risk credit
score detection, KYC processing, etc. Hence, machine learning is widely
applied in the banking and finance sector to reduce error as well as time.
4. Transportation and Traffic Prediction: This is one of the most common
applications of Machine Learning that is widely used by all individuals in their
daily routine. It helps to ensure highly secured routes, generate accurate ETAs,
predict vehicle breakdowns, drive prescriptive analytics, etc. Although
machine learning has solved many transportation problems, it still requires
further improvement. Statistical machine learning algorithms help build smart
transportation systems, and deep learning explores the complex interactions
of roads, highways, traffic, environmental elements, crashes, etc. Hence,
machine learning technology has improved daily traffic management as well
as the collection of traffic data used to predict insights about routes and traffic.
5. Image Recognition: This is one of the most common applications of machine
learning, used to identify objects and people in images on the internet. Social
media sites such as Facebook use image recognition to tag images of your
Facebook friends with the feature named auto friend tagging suggestion.
Further, nowadays almost all mobile devices come with face detection
features. Using face unlocking, you can secure your mobile data, so if anyone
tries to access your mobile device, they cannot open it without face
recognition.
6. Speech Recognition: Speech recognition is one of the biggest achievements
of machine learning applications. It enables users to search content without
writing text or, in other words, 'search by voice'. It can search
content/products on YouTube, Google, Amazon, etc. platforms by your voice.
This technology is referred to as speech recognition.
It is the process of converting voice instructions into text; hence it is also
known as 'speech to text' or 'computer speech recognition'. Some important
examples of speech recognition are Google Assistant, Siri, Cortana, Alexa,
etc.
7. Product Recommendation: It is one of the biggest achievements made by
machine learning which helps various e-commerce and entertainment
companies like Flipkart, Amazon, Netflix, etc., to digitally advertise their
products over the internet. When anyone searches for any product, they start
getting an advertisement for the same product while internet surfing on the
same browser.
This is possible through machine learning algorithms that work on users'
interests or past behaviour and accordingly recommend products. For
example, when we search for a laptop on Amazon, we then start seeing many
other laptops in the same categories and price range. Similarly, when we use
Netflix, we find recommendations for entertainment series, movies, etc. This
too is made possible by machine learning algorithms.
8. Virtual Personal Assistance: This feature helps us in many ways, such as
searching content using voice instruction, calling a number using voice,
searching contact in your mobile, playing music, opening an email, Scheduling
an appointment, etc. Nowadays, we have all seen commands like "Alexa!
Play the music"; this, too, is done with the help of machine learning. Google
Assistant, Alexa, Cortana, Siri, etc., are a few common applications of machine
learning. These virtual personal assistants record our voice instructions, send
them to a server in the cloud, decode them using ML algorithms, and act
accordingly.
9. Email Spam and Malware detection & Filtering: Machine learning also
helps us for filtering emails in different categories such as spam, important,
general, etc. In this way, users can easily identify whether the email is useful or
spam. This is also possible by machine learning algorithms such as Multi-
Layer Perceptron, Decision tree, and Naïve Bayes classifier. Content filter,
header filter, rules-based filter, permission filter, general blacklist filter, etc.,
are some important spam filters used by Google.
10. Self-driving cars: This is one of the most exciting applications of machine
learning. Machine learning plays a vital role in the manufacturing of self-
driving cars. It uses an unsupervised learning method to train car models to
detect people and objects while driving. Tata and Tesla are the most popular
car manufacturing companies working on self-driving cars. Hence, it is a big
revolution in a technological era which is also done with the help of machine
learning.
11. Credit card fraud detection: Credit cards have become very easy targets for
online fraudsters. As the culture of online/digital payments grows, the risk of
credit/debit card fraud increases in parallel. Machine Learning helps
developers detect and analyze fraud in online transactions. One approach is a
fraud detection method for streaming transaction data that analyzes
customers' past transaction details and extracts their behavioral patterns.
Cardholders are then clustered into categories by transaction amount so that
the behavioral pattern of each group can be extracted. Credit card fraud
detection of this kind combines an aggregation strategy with a feedback
mechanism in machine learning.
12. Stock Marketing and Trading: Machine learning also helps in the stock
marketing and trading sector, where it uses historical trends and past
experience to predict market risk. Since the share market carries significant
risk, machine learning helps reduce it to some extent by predicting against
market risk. Machine learning's long short-term memory (LSTM) neural
network is used for predicting stock market trends.
13. Language Translation: The use of machine learning can be seen in language
translation. It uses sequence-to-sequence learning algorithms to translate text
from one language to another, and it also uses image recognition techniques
to identify and translate text in images. Google's GNMT (Google Neural
Machine Translation) provides this feature: it is a neural machine translation
system that translates text into our familiar language, which is called
automatic translation.

Conclusion:
Machine Learning is directly or indirectly involved in our daily routine. We have seen
various machine learning applications that are very useful for surviving in this
technical world. Although machine learning is in the developing phase, it is
continuously evolving rapidly. The best thing about machine learning is its High-
value predictions that can guide better decisions and smart actions in real-time
without human intervention. Hence, at the end of this article, we can say that the
machine learning field is very vast, and its importance is not limited to a specific
industry or sector; it is applicable everywhere for analyzing or predicting future
events.

Machine Learning and Cloud Computing
In this technology-driven time, Machine Learning and Cloud Computing are the most
powerful technologies worldwide. Both these technologies play a crucial role for
small and big organizations to grow their businesses.

Machine Learning helps users make predictions and develop algorithms that can
automatically learn by using historical data. However, various machine learning
algorithms such as Linear Regression, Logistic Regression, SVM, Decision Tree,
Naïve Bayes, K-Means, Random Forest, Gradient Boosting, etc., require a massive
amount of storage and computing power, which becomes quite challenging for data
scientists as well as machine learning professionals. Cloud computing becomes a
game-changer for deploying machine learning models in such situations. Cloud
computing helps to enhance and expand machine learning applications. The
combination of machine learning and cloud computing is also known as the
intelligent Cloud.

This article will discuss machine learning and cloud computing, the advantages of ML
using the Cloud, applications of ML algorithms using Cloud, and much more. So, let's
start with a quick introduction to Machine Learning and Cloud computing.

What is Machine Learning?


Machine Learning is an Artificial Intelligence (AI) application that allows machines to
learn and improve from experience automatically. Machine Learning can be classified
as follows:


o Supervised
o Unsupervised
o Semi-supervised
o Reinforcement

The primary aim of Machine Learning is to give computers the ability to learn
automatically, without human intervention or assistance, and to adjust their actions
accordingly.

What is Cloud Computing?


Cloud computing is defined as a technology for outsourcing computing resources,
which enables us to access applications and data remotely. It does not require
installing software or storing data on your computer's hard drive; you only have to
sign up to use the services online.
Types of Cloud Computing
Cloud computing is mainly categorized into three types as follows:

o Software as a Service (SaaS)


o Platform as a Service (PaaS)
o Infrastructure as a Service (IaaS)

Why Cloud computing in Machine Learning


Although cloud computing and machine learning are emerging technologies,
machine learning is comparatively new. Both technologies play important roles in
companies' growth, but they become more powerful together. Machine learning
makes intelligent machines or software, and on the other hand, cloud computing
provides storage and security to access these applications.
The main connection between machine learning and cloud computing is
resource demand. Machine learning requires a lot of processing power, data
storage, and often many servers working simultaneously on an algorithm. Cloud
computing plays a significant role here by providing new servers, pre-configured
environments, and elastic resources over the Cloud (internet). Using cloud computing,
you can spin up any number of servers you want, run the algorithm, and then destroy
the machines again when the job is complete.

Cloud computing is primarily used for computation, and machine learning needs a
lot of computational power to process and generate data; not everyone has access
to many powerful machines. Machine learning therefore often relies on cloud
computing for task scheduling and storage.

Advantages of Machine Learning with


Cloud Computing
Although machine learning and cloud computing have their advantages individually,
together, they have 3 core advantages as follows:

1. Cloud works on the principle of 'pay for what you need'. The Cloud's pay-per-
use model is good for companies who wish to leverage ML capabilities for
their business without much expenditure.
2. It provides the flexibility to work with machine learning functionalities without
having advanced data science skills.
3. It makes it easy to experiment with various ML technologies and to scale up as
projects go into production and demand increases.
There are many cloud service providers that offer a wide range of ML technologies
for everyone, even those without prior knowledge of AI and ML.

Top Cloud computing platforms for Machine Learning
Although there are so many cloud computing platforms available on the internet, few
of them are most popular for machine learning. Let's discuss them in detail.

1. Amazon Web Services (AWS)


Amazon Web Services (AWS) is one of the most popular cloud computing platforms
for Machine Learning, developed by Amazon in 2006. There are so many products
provided by AWS as follows:

o Amazon SageMaker: This product primarily helps to create and train machine
learning models.
o Amazon Forecast: This product helps increase the forecast accuracy of ML
models.
o Amazon Translate: It is used to translate languages in NLP and ML.
o Amazon Personalize: This product creates various personal
recommendations in the ML system.
o Amazon Polly: It is used to convert text into a speech format.
o AWS Deep Learning AMIs: This product is primarily used to solve deep
learning problems in ML.
o Amazon Augmented AI: It implements human review in ML models.

2. Microsoft Azure:
Microsoft Azure is also a popular cloud computing platform offered by Microsoft in
2010. It is popular among data scientists and machine learning professionals for data
analytics requirements.

There are some Microsoft Azure products available for machine learning as follows:

o Microsoft Azure Cognitive Service: This product helps you provide


intelligent cognitive services for ML applications.
o Microsoft Azure Bot Service: This product primarily focuses on creating
smart and intelligent bot services for ML applications.
o Microsoft Azure Databricks: This product provides Apache Spark-based
analytics.
o Microsoft Azure Cognitive Search: This product focuses on mobile and web
applications in Machine Learning.
o Microsoft Azure Machine Learning: This product is responsible for
deploying ML models over the cloud.

3. Google Cloud
Google Cloud, or Google Cloud Platform, is a cloud computing platform from Google,
launched in 2008. It provides its infrastructure to customers for developing machine
learning models over the cloud.

There are a few Google Cloud products available for machine learning as follows:

o Google Cloud Vision AI: This product allows machine learning applications to
easily integrate vision detection features such as image labeling, text
detection, face detection, tagging, etc.
o Google Cloud AI Platform: This product helps develop, sample, and manage
machine learning models.
o Google Cloud Text-to-Speech: This product helps convert text data into
speech format, for example for training machine learning models.
o Google Cloud Speech-to-Text: This is also one of the important products; it
supports 120+ languages for converting speech data into text format.
o Google Cloud AutoML: It helps train machine learning models and automates
the generation of machine learning models.
o Google Cloud Natural Language: This product is used in NLP to analyze and
classify text.

4. IBM Cloud:
IBM Cloud (formerly known as Bluemix) is also one of IBM's most popular open-
source cloud computing platforms. It includes various cloud delivery models that are
public, private, and hybrid models.

There are a few IBM Cloud products available for machine learning as follows:

o IBM Watson Studio: This product helps develop, run, and manage machine
learning and artificial intelligence models.
o IBM Watson Natural Language Understanding: It helps us analyze and classify
text in NLP.
o IBM Watson Speech-to-Text: As the name suggests, this product is responsible
for converting speech or voice instructions into text format.
o IBM Watson Assistant: This product is used for creating and managing personal
virtual assistants.
o IBM Watson Visual Recognition: It helps machine learning systems search and
classify visual images.
o IBM Watson Text-to-Speech: This product is responsible for converting text or
written instructions into voice format.

We have discussed various cloud computing platforms used in machine learning.


These cloud platforms offer machine learning capabilities and provide support for
three types of predictions as follows:

o Binary Prediction
o Category Prediction
o Value Prediction

Binary Prediction:

In this type of machine learning prediction, we get responses either as true or false.
Binary predictions are useful for credit card fraud detections, order processing,
recommendation systems, etc.

Category Prediction:

These machine learning predictions are responsible for categorizing a dataset based
on experience. For instance, insurance companies use category prediction to
categorize different types of claims.

Value Prediction:

This type of prediction finds patterns within the accumulated data by using learning
models to show the quantitative measure of all the likely outcomes. It helps to
predict the future sale of products in a manufacturing industry.

Applications of Machine Learning Algorithms using the Cloud
Cognitive Computing
Cognitive computing is a special type of technology that works on the principle of
artificial intelligence and signal processing to reflect human actions. In cognitive
computing, a large amount of data is used to train a machine-learning algorithm.
When cloud and machine learning technologies are used together, it is
called cognitive Cloud, which can be used to access cognitive computing
applications.

The cognitive Cloud is considered a self-learning system that performs human-like
tasks without human intervention. It uses various machine learning techniques, such
as neural networks, pattern recognition, natural language processing, and data
mining, to perform human-like actions. It can be applied in several industries such as
retail, logistics, banking & finance, power & energy, cyber security, healthcare,
education, and many more.

Business intelligence:
Business intelligence primarily focuses on improving decision-making for businesses.
Machine learning is a process of automated decision-making, while business
intelligence is used to understand, organize, and improve that decision-making.
Cloud computing deals with the large amounts of data used to train machine
learning models, so business intelligence becomes important for storing raw data.
This unstructured data is then transformed into a structured format using
manipulation, transformation, and classification techniques; these structured data
sets are referred to as data warehouses.

Business analysts explore structured data sets using data visualization techniques.
These techniques are used to create visual dashboards, which help communicate
information to others. The dashboards help analyze and understand past
performance and are used to adapt future strategies to improve KPIs (Key
Performance Indicators).

Internet of Things (IoT)


Internet of Things (IoT) is a platform that offers cloud facilities, including data storage and processing through the Internet. Recently, cloud-based ML models have become popular. A typical workflow takes input data from the client end, processes it with machine learning algorithms such as artificial neural networks (ANNs) on cloud servers, and returns the output to the client. In this scenario, the client's sensitive information may be stored on the server, raising privacy issues and making users reluctant to use the services.

Cloud computing is the easiest method to process the bulk data packages generated through IoT over the internet. It is generally used in real-time project scenarios as an event processing engine. It works as part of this collaboration to store IoT data that can be accessed remotely. For example, when IoT is integrated with personal devices, it can fetch the booking status of your bus and train reservations and rebook tickets for passengers whose trains are delayed or canceled.

Personal Assistant:
Personal virtual assistants have become essential for growing an organization's business, as they provide human-like support to customers. Nowadays, all industries such as banking, healthcare, education, infrastructure, etc., are implementing these chatbots or personal virtual assistants in their businesses to perform multiple tasks.

Although they are still in their developing phase and require more improvement, they already reduce the burden of resolving common customer problems using frequently asked questions. Cortana, Siri, and Alexa are among the most popular such assistants.

AI-as-a-Service:
Nowadays, all big cloud companies provide AI facilities through AI-as-a-service platforms. Open-source AI functionalities are considerably cheaper when deployed in the cloud. These services provide Artificial Intelligence and machine learning functionalities, build the capacity for cognitive computation, and make systems more intelligent. They also help make systems relatively fast and efficient.

Conclusion
Machine learning with cloud computing is crucial for next-generation technologies. The demand for machine learning in the cloud is continuously increasing, as the cloud offers an ideal environment for machine learning models that need large amounts of data. Further, it can be used to train new systems, identify patterns, and make predictions. The cloud offers a scalable, on-demand environment to collect, store, curate, and process data.

Moreover, all cloud service providers realize the importance of machine learning in the cloud, which is increasing the demand for cloud-based ML models among small, mid-size, and large organizations. Machine learning and cloud computing complement one another: machine learning makes cloud computing more enhanced, efficient, and scalable, while cloud computing expands the horizon for machine learning applications. Hence, we can say ML and cloud computing are intricately interrelated, and used together they can give tremendous results.

Anti-Money Laundering using Machine Learning
When we talk about financial crime, money laundering is one of the biggest threats in the financial world. Money laundering is one of the most common ways to convert black money into white money. Although various financial institutions follow acts and rules to prevent money laundering, in this technology era, where everything is digital and recorded by financial software, it is not easy to prevent such activities in traditional ways. Hence, financial institutions are adopting and equipping themselves with powerful technologies and analytical tools to combat money laundering.

Machine learning also plays a significant role in detecting money laundering activities in financial institutions and can automatically restrict users from using their accounts until the issue is resolved. Machine learning employs various algorithms to identify money laundering activities and helps prevent them. In this topic, "Anti-Money Laundering using Machine Learning", we will learn how a machine learning model can help identify suspicious account activity and provide better support to the AML team. Before starting, we should understand the terms money laundering and anti-money laundering (AML). So let's start with a quick introduction to money laundering and anti-money laundering, and then move on to anti-money laundering using machine learning models.
What is Money Laundering?
Money laundering is defined as converting a large amount of money obtained from illegal sources into money that appears to originate from a legitimate source.

In simple words, it is a process to convert black money into white money.


Sources of Money Laundering

Money laundering can be carried out through many sources, such as black salaries, round tripping, smuggling, illegal weapons, casinos, multiple cash withdrawals in high-cash jurisdictions, etc.

What is Anti-Money Laundering?


Anti-money laundering is defined as the laws, regulations, and procedures
followed by banking and financial institutions to prevent money laundering
activities.

Money laundering typically involves three stages, which AML measures aim to detect and disrupt:

o Placement: This is the step where money obtained from illegal sources is put into financial institutions for the first time.
o Layering: In this step, money launderers create multiple layers by dividing the money across multiple bank accounts to confuse banking analysts and ML algorithms, so the actual source of the money cannot be identified.
o Integration: This is the final step, in which the layered money is sent to the money launderer's account.
Anti Money Laundering (AML) using
Machine Learning Applications:
Machine learning plays a significant role in preventing money laundering activities in financial industries. To prevent money laundering, it typically uses supervised machine learning techniques, in which an ML model is trained on various types of data or trends to assess the alerts and suspicious transactions flagged by the internal banking system. These machine learning models help examine the suspicious transactions, the sender's and beneficiary's financial records, their patterns of making transactions based on transaction history, etc.

Machine Learning algorithms help in AML and reduce human error to a great extent.
Machine learning models use a few techniques to prevent money laundering.

Natural Language Processing (NLP) helps machines process human language and supports tasks such as assessing alerts, processing mortgage loans, negative news screening, payments screening, etc. Further, these machine learning technologies help monitor various suspicious activities and support transaction monitoring. ML teaches machines to detect and identify transaction patterns, behavior, and associated suspicious users/accounts, and to classify alerts into risk categories such as high, medium, and low risk. Further, it checks alerts, automatically clears some of them, and makes accounts fully operational based on their account behavior and the required documents.

Machines can be taught to recognize, score, triage, enrich, close, or hibernate alerts. These processes are complex and time-consuming for humans, but with the help of machine learning technologies they become considerably easier than the classical approach. Natural Language Generation (NLG) helps fill in Suspicious Activity Reports (SARs) and provides the narratives for them. This reduces dependence on human operators for routine tasks, reduces the total time it takes to triage alerts, and allows personnel to focus on more valuable and complex activities.

With the introduction of ML into AML transaction monitoring (TM) alert triage, SAR conversion rates should improve from the current unacceptable rate of ~1% in the banking sector.
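As a hedged illustration of the supervised approach described above, the following sketch trains a classifier on synthetic, hypothetical alert features and then counts its false positives. The feature names and numbers are invented purely for demonstration; real AML systems use far richer features and labelled historical alerts (scikit-learn assumed):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5000
# Hypothetical alert features: amount, daily transaction count, share sent to new beneficiaries
X = np.column_stack([
    rng.lognormal(mean=7, sigma=1, size=n),   # transaction amount
    rng.poisson(lam=3, size=n),               # transactions per day
    rng.uniform(0, 1, size=n),                # share to new beneficiaries
])
# Synthetic label: 1 = genuinely suspicious alert, 0 = benign alert
y = ((X[:, 0] > 5000) & (X[:, 2] > 0.7)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Count how many benign alerts the model would still flag (false positives)
tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1]).ravel()
print("false positive rate:", fp / (fp + tn))
print("missed suspicious alerts:", fn)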
Why use Machine Learning in Anti Money
laundering (AML)
Machine Learning is widely used in the banking and finance industry, and AML is one
of the best examples of using machine learning. There are a few important reasons
that show machine learning plays a vital role as follows:

o Reduction of false positives in the AML process:

Compliance teams estimate that only around 1% to 2% of AML alerts turn out to be genuinely suspicious, so machine learning's ability to identify and filter out false positives is extremely valuable. In the AML process, some alerts are generated wrongly and affect the customer's account by placing restrictions on it, even though these alerts should never have been triggered. Machine learning helps reduce the rate of false positives by using semantic and statistical analysis to identify the risk factors that lead to true positive results, and ML algorithms help eliminate these false positives during the transaction monitoring process.

o Detecting the change in customer behavior

Machine learning teaches computers about past transactions and customer profiles, which helps detect changes in customer behavior. These machines first learn from old data and then analyze new activity against the customer's transaction history. Based on transaction behavior and patterns, they detect suspicious activities and the users who were associated with any suspicious activity in the past (a minimal code sketch of this idea appears after this list). Traditional approaches to profiling customer behavior are inaccurate and time-consuming; machine learning technology has reduced the chances of human error. It also reduces investigation time by monitoring customer transactions using rule engines. Hence, machine learning makes this process considerably faster, which matters because money launderers are generally one step ahead.

o Analysis of unstructured data and external data

Banking and financial institutions analyze customer data such as KYC, screening,
residence country, professions, politically exposed person (PEP) status, social status,
etc., to check their behavior. These all are the main factors that affect the business of
any financial institution. To reduce the financial risk, financial institutions use many
external datasets such as LinkedIn, Bloomberg, BBL, Norkom, social networks,
company houses, and other open-source data.

BBL and Norkom are software tools that help find matches or perform name searches using external data and tell analysts whether a customer is associated with any fraud or suspicious activity, is a PEP, or is a high-risk entity. NLP supplements these classical approaches and helps analyze this unstructured data and establish connections.

Hence, machine learning technologies help analyze unstructured and external data far more effectively than classical methods, and with greater accuracy.

o Robotic Process Automation (RPA) in AML and KYC

RPA plays a significant role in the banking and finance sectors, and many banks are still adopting RPA to automate their business processes. When RPA is combined with machine learning, it becomes even more powerful, providing intelligent automation techniques for different banking operations such as Know Your Customer (KYC), transaction monitoring, screening, alert elimination, etc.

RPA with machine learning helps in the following ways:

o It helps create a 360-degree view of customer data, including data duplication and reconciliation from the back-end.
o It helps create and update customer profile data using external data sources.
o It helps in alert elimination using external and internal data. Further, it also supports the enhancement of customer data like periodic KYC, alerts, profile, risk status, customer information portfolio, and geolocation data.
o It helps perform account analysis of ultimate beneficial owners using external data sources.
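Before moving on to the challenges, here is the behavior-change idea mentioned above as a minimal, hypothetical sketch: an unsupervised anomaly detector (scikit-learn's IsolationForest) fitted on a customer's past transaction amounts flags a day that departs from the learned pattern. All numbers are invented for illustration only:

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Hypothetical history of a customer's daily outgoing transfers (amounts in dollars)
normal_history = rng.normal(loc=200, scale=40, size=(365, 1))
model = IsolationForest(contamination=0.01, random_state=0).fit(normal_history)

# New activity: two ordinary days and one large outlier
new_days = np.array([[210.0], [185.0], [9500.0]])
print(model.predict(new_days))   # 1 = looks normal, -1 = flagged as anomalous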
Challenges to Machine Learning in AML
A few challenges have been identified while implementing Machine Learning in Anti-
money laundering or other financial services.

These challenges include data quality management (poor data quality), profile refresh, lack of a 360-degree view of the customer, insufficient knowledge of banking, finance, and AML processes such as Know Your Customer (KYC), limited regulatory appetite, and a lack of straightforward processes to follow for machine learning implementations.

o Data quality management & profile refresh:

Data quality management is one of the most important factors for implementing
machine learning applications in AML. It is required for both monitoring as well as for
analytics purposes. Lack of data traceability and data lineage is also found in both
static and dynamic customer profile records. Static data can be like KYC documents,
and dynamic data may be their incoming and outgoing transactions.

Sometimes, a few alerts are generated wrongly on customer accounts, i.e., false positives, even though they should not have been raised at all. This may lead to various types of restrictions on customers' accounts and affect the entire business. Good data quality management reduces the recurrence of this noise, i.e., false positives, on the user's account. Other techniques are also applicable, such as large-scale, one-off data reconciliation or refresh exercises. Many FIs have undertaken large and costly data remediation projects to improve data and have implemented frameworks to manage data quality during the last few years; even so, financial specialists still find data quality a major issue. On the other hand, a profile refresh can also be a significant part of managing data quality: relationship managers and back-end associates can refresh profiles at regular intervals by reaching out to customers and validating their documents.

o Lack of 360-degree view of the customer:

This is another important issue in implementing ML applications in the AML process. Financial institutions rarely disclose their customer data to build a comprehensive network, and FIs do not cooperate on AML to build a 360-degree view of customers across regulatory agencies, as this is a costly exercise. Instead, FIs prefer to file suspicious activity reports with appropriate automatic narration, submit them to the regulator, and share information securely between FIs and regulators using external datasets like KYC. Some acts and regulations, such as US Patriot Act sections 314(a) and 314(b) and PSD2, also support such information sharing. Furthermore, the UK Treasury helps share data via an Open Banking API and the Open Banking Working Group.

o Limited knowledge of both banking and financial services and ML:

Machine learning is a relatively new technology in the market, and there are few experienced ML engineers and professionals in the industry. Further, a lack of knowledge of banking and financial operations is also common among analysts, leading to major problems for both start-ups and established vendors. This is one of the most common issues found while implementing machine learning in AML and other banking operations.

o Limited regulatory appetite:

Regulators expect an ML model's choices, limitations, and results to be fully documented before it is implemented in the AML process. Many ML algorithms do not guarantee that results can be reproduced exactly for a given input, yet regulators expect reproducible results when the model is used in the AML process. Some regulators want intelligent and adaptive solutions for transaction monitoring, which is a complex scenario for ML applications.

o Lack of straightforward process:

Machine learning is a relatively new technology and still under development. Hence, there are few established, straightforward processes to follow when implementing it. Teaching systems to detect certain types of financial crime can be tricky without knowing what to look for. For example, how does one teach a system to recognize terrorist financing? There are known patterns such as carousel fraud, but nothing similar for terrorist financing (nothing, that is, other than name matching against terrorist lists). While some of these problems are better suited to unsupervised learning, model validators should be clear about the desired outcomes.

Conclusion
Anti-money laundering is a broad field in the banking and financial industry and one of the most important factors in preventing the illegal flow of money. Machine learning plays a significant role in the AML process, producing better results with greater efficiency and effectiveness. Although many financial institutions also adopt automation such as Robotic Process Automation (RPA) in their business processes, some rely on machine learning and artificial intelligence to run their business. Moreover, robotics and ML complement each other: ML models give robotic processes stronger decision-making (for example via NLP) and reading ability (via optical character recognition).
Data Science Vs. Machine Learning
Vs. Big Data
Data Science, Machine Learning, and Big Data are all buzzwords in today's time. Data
science is a method for preparing, organizing, and manipulating data to perform
data analysis. After analyzing data, we need to extract the structured data, which is
used in various machine learning algorithms to train ML models later. Hence, these
three technologies are interrelated with each other, and together they provide
unexpected outcomes. Data is the most important key player in this IT world, and all
these technologies are based on data.

Data Science, Machine Learning, and Big Data are all the hottest technologies in the
entire world and growing exponentially. All big as well as small-size companies are now looking for IT professionals who can sift through the goldmine of data and help them drive smooth business decisions efficiently. Data science, Big Data, and
machine learning are crucial terms that help businesses to grow and develop as per
the current competitive situation. In this topic, "Data Science vs. Machine Learning
vs. Big Data", we will discuss the basic definition and required skills to learn them.
Also, we will see the basic difference between Data Science, ML, and Big data. So,
let's start with a quick introduction of all one by one.

What is Data Science?


Data science is defined as the field of study of various scientific methods,
algorithms, tools, and processes that extract useful insights from a vast amount
of data. It also enables data scientists to discover hidden patterns from raw data.
This concept allows us to deal with Big Data, including its extraction, organization, preparation, and analysis.

Data can be either structured or unstructured.



Data Science helps us to transform a business problem into a research project and
then transform it into a practical solution again. The term Data Science has emerged
because of the evolution of mathematical statistics, data analysis, and big data.

Skills required for Data Science


If you are looking to shift your career in Data Science, then you must have in-depth
knowledge of mathematics, statistics, programming, and analytical tools. Below are
some important skills that you should have before entering this domain.

o Strong knowledge of Python, R, SAS, and Scala


o Strong practical knowledge in the SQL domain
o Ability to work with various formats of data such as video, text, audio, etc.
o Knowledge of various analytical functions.
o Basic level knowledge of Machine Learning and AI.

What is Machine Learning?


Machine Learning is defined as the subset of Artificial Intelligence that enables
machines/systems to learn from past experiences or trends and predict future events
accurately.

It helps systems learn from sample/training data and predict results by teaching themselves with various algorithms. An ideal machine learning model would not require human intervention at all; however, such models do not yet exist.

The use of Machine Learning can be seen in various sectors such as healthcare,
infrastructure, science, education, banking, finance, marketing, etc.

Skills required for Machine Learning

Below are a few skills sets that you should have to build a career in this domain:
o In-depth knowledge of computer science and fundamentals.
o Strong programming skills such as Python, Java, R, etc.,
o Basic Mathematical knowledge like probability and statistics
o Knowledge of Data Modelling.

What is Big Data?


Big data is huge, large, or voluminous data, information, or the relevant
statistics acquired by large organizations that are difficult to process by
traditional tools. Big data can be structured, unstructured, or semi-structured. Data is one of the key players in running any business, and it is increasing exponentially with the passage of time. A decade ago, organizations could deal with only gigabytes of data and struggled with data storage, but with the emergence of big data, organizations can now handle petabytes and exabytes of data and store huge volumes using the cloud and big data frameworks such as Hadoop.

Big Data is used to store, analyze and organize the huge volume of structured as well
as unstructured datasets. Big Data can be described mainly with 5 V's as follows:

o Volume
o Variety
o Velocity
o Value
o Veracity

Skills required for Big Data


o Strong knowledge of Machine Learning concepts
o Understanding of databases such as SQL, NoSQL, etc.
o In-depth knowledge of programming languages such as Java and Python, and of frameworks such as Hadoop
o Knowledge of Apache Kafka, Scala, and cloud computing
o Knowledge of data warehouses such as Hive.

Difference between Data Science and Machine Learning
Data science and machine learning are two of the most searched buzzwords of the 21st century among data scientists, machine learning engineers, and professionals. Small, mid, and large-sized companies alike, such as Amazon, Facebook, and Netflix, use these technologies to run and grow their businesses.

When it comes to the difference between Data science and machine learning
technologies, Drew Conway's Venn Diagram is the best option to understand this.

In Drew Conway's Venn diagram, there are three primary sections that everyone should look at. These are as follows:

Hacking skills: Skills such as organizing data, using vectorized operations, and thinking algorithmically like a computer, which make a skilled data hacker.

Maths and statistics knowledge: After storing and cleaning data, we must know the appropriate mathematical and statistical methods. You should have a good understanding of ordinary least squares regression.

Substantive expertise: Domain knowledge of the field you are working in, which is needed to ask the right questions of the data.

Below is the difference table between data science and machine learning.

Data Science: Data science is a field of computer science used to extract useful insights from structured, unstructured, and semi-structured data.
Machine Learning: Machine Learning is a subset of Artificial Intelligence that helps make computers capable of predicting outcomes based on training from old data/experience.

Data Science: It primarily deals with data.
Machine Learning: Machine Learning uses data to learn from it and predict insights or results.

Data Science: Data in data science may or may not have evolved from a machine or mechanical process.
Machine Learning: It includes various techniques like supervised, unsupervised, semi-supervised and reinforcement learning, regression, clustering, etc.

Data Science: It is broadly used as a multidisciplinary term.
Machine Learning: It is used within data science.

Data Science: It includes various data operations such as cleaning, collection, manipulation, etc.
Machine Learning: It includes operations such as data preparation, data wrangling, data analysis, training the model, etc.

Data Science: It requires knowledge of various analytical functions and a basic understanding of machine learning and Artificial Intelligence.
Machine Learning: It needs advanced knowledge of data modelling.

Data Science: It requires strong knowledge of Python, R, SAS, and Scala, as well as hands-on knowledge of SQL databases.
Machine Learning: It requires knowledge of programming languages like Java, Python, and R, as well as in-depth knowledge of mathematical concepts such as probability and statistics.

Difference between Big Data and Machine Learning
Big Data deals with a huge volume of data that helps us to discover patterns and
trends as well as make decisions related to human behavior and interaction
technology. On the other hand, machine learning is the study of making machines/computers learn automatically and predict results from past data using algorithms. Machine learning uses algorithms to train models and make predictions. However, machine learning requires bulk data, which 'Big data' makes possible. Big data helps extract structured as well as unstructured data from huge volumes of datasets, which is later used as input to train machine learning models.

Below is the table to understand the difference between Machine Learning and Big
Data.

Machine Learning: It deals with using data as input and algorithms to predict future outcomes based on trends.
Big Data: It deals with the extraction as well as analysis of data from a large number of datasets.

Machine Learning: It includes techniques such as supervised, unsupervised, semi-supervised and reinforcement learning, etc.
Big Data: Big data can be categorized as structured, unstructured, and semi-structured.

Machine Learning: It uses tools such as NumPy, Pandas, Scikit-learn, TensorFlow, Keras, etc., to analyze datasets.
Big Data: It requires tools like Apache Hadoop and MongoDB.

Machine Learning: Machine learning can learn from training data and act intelligently, making effective predictions by teaching itself using algorithms.
Big Data: Big data analytics pulls in raw data and looks for patterns that help in stronger decision-making for firms.

Machine Learning: Machine learning is helpful for providing virtual assistance, product recommendations, email spam filtering, etc.
Big Data: Big data is helpful for handling different purposes, including stock analysis, market analysis, etc.

Machine Learning: The scope of machine learning is vast, such as improving the quality of predictions, building strong decision-making capability, cognitive analysis, improving healthcare services, speech and text recognition, etc.
Big Data: The scope of big data is not limited to collecting a huge amount of data only but also extends to optimizing that data for analysis.

Machine Learning: It has a wide range of applications such as email and spam filtering, product recommendation, infrastructure, marketing, transportation, medicine, finance & banking, education, self-driving cars, etc.
Big Data: It also has a wide range of applications for analyzing and storing data in a structured format, such as stock market analysis, etc.

Machine Learning: Machine learning does not need human intervention for the complete process because it uses various algorithms to build intelligent models to predict results. Further, it deals with limited-dimensional data, making it easier to recognize features.
Big Data: It requires human intervention because of the huge amount of multidimensional data; due to this multidimensionality, it becomes difficult to extract features from the data.

Difference between Big Data and Data Science
Big data: Big data is huge, large, or voluminous data, information, or the relevant
statistics acquired by large organizations that are difficult to process by traditional
tools. It is referred to as the study of collecting and analyzing the huge volume of
data sets to find a hidden pattern that helps in stronger decision-making for the
firms using specialized software and analytical tools. Big data can be structured,
unstructured, or semi-structured.

Big Data is used to store, analyze and organize the huge volume of structured as well
as unstructured datasets. Big Data can be described mainly with 5 V's such as
Volume, Variety, velocity, value, and Veracity.

Data Science: Data science is the study of working with a huge volume of data and enables building descriptive, predictive, and prescriptive analytical models. It helps extract useful insights from raw data in vast data sets using various scientific methods, algorithms, tools, and processes. It includes digging into, capturing, analyzing, and utilizing data from a vast volume of datasets.

It is a combination of various fields such as computer science, machine learning, AI, mathematics, business, and statistics.

Let's discuss some major differences between Data Science and Big Data in the
below table.

Data Science: Data science is the study of working with a huge volume of data and enables building descriptive, predictive, and prescriptive analytical models.
Big Data: Big data is the study of collecting and analyzing a huge volume of data sets to find hidden patterns that help in stronger decision-making.

Data Science: It is a combination of various concepts of computer science, statistics, and applied mathematics.
Big Data: It is a technique for extracting meaningful insights from complex data sets.

Data Science: The main aim of data science is to build data-based products for firms.
Big Data: The main goal of big data is to extract useful information from a huge volume of data and use it for building products for firms.

Data Science: It requires strong knowledge of Python, R, SAS, and Scala, as well as hands-on knowledge of SQL databases.
Big Data: It requires tools like Apache Hadoop and MongoDB.

Data Science: It is used for scientific or research purposes.
Big Data: It is used for businesses and customer satisfaction.

Data Science: It broadly focuses on the science of the data.
Big Data: It is more involved with the processes of handling voluminous data.

Data Science: It includes various data operations such as cleaning, collection, manipulation, etc.
Big Data: It includes analysis of data stored in a structured format, such as stock market analysis, etc.

Conclusion:
Machine learning, data science, and big data are among the most popular technologies, widely used across the world. Although each technology is significant on its own, when combined they become even more powerful for building models and projects. Big data technology is a huge source of data, data science extracts useful insights from big data, and this useful information is used in machine learning to teach machines or computers to predict future results based on past experience and build strong decision-making capability.

Popular Machine Learning Platforms


Machine learning platforms are the software that data scientists and machine learning professionals use to deploy machine learning models and algorithms. With the evolution of data, the use of machine learning has increased exponentially. Machine learning has solved various problems by automating business processes and predicting results using experience or historical trends.

Have you ever thought about why you get product recommendations from various
online platforms such as Amazon, Netflix, Flipkart, etc.? The short answer is Machine
Learning. It has become the most popular buzzword in all of technology today, and the entire 21st century, as well as the upcoming generation, is going to use machine learning technology for business. All small and big companies, including Facebook, Google, Amazon, IBM, Oracle, etc., employ machine learning technologies to run and grow their business. So, don't worry! You are exactly in the right place.
Although machine learning is used everywhere, the main problem is the platforms
that support machine learning services. This article will discuss some of the most
popular machine learning platforms that'll help you manage your experiments at
every stage, such as preparing data for deployment, monitoring, and managing
machine learning models. So let's start with a quick introduction to Machine learning
first.

What is Machine Learning?


Machine Learning is defined as the state-of-the-art application of artificial
intelligence that helps machines/computers to learn and improve from experience
and predict results for the future using various algorithms.

ML uses various techniques such as supervised, unsupervised, semi-supervised, and reinforcement learning to teach machines. It has a wide range of applications
such as speech recognition, text recognition, self-driving vehicle, email & spam
filtering, healthcare, medicine, banking & finance, virtual personal assistant, chatbots,
education, marketing, and many more. So the scope of machine learning is not
limited to a few fields; it is employed everywhere around us.


What are Machine Learning Platforms?


The machine learning platform is used to automate and quicken the delivery lifecycle
of predictive applications that have the capabilities to process big data.

It provides building blocks for solving various ML and data science problems and offers a suitable environment in which users have complete freedom to deploy their products.

We will discuss a few most popular machine learning platforms for deploying ML
models.

Most popular Machine Learning Platforms


Machine Learning is the most popular technology in the 21st century that has various
capabilities such as text recognition, image recognition, training, tuning, etc. There
are some best machine learning platforms or software given below, using which you
can effectively deploy machine learning in your business.

o Amazon Sagemaker
o TIBCO Software
o Alteryx Analytics
o SAS
o H2O.ai
o DataRobot
o RapidMiner

1. Amazon SageMaker
Amazon SageMaker is an Amazon Web Services (AWS) entity that helps data
scientists and ML experts prepare, build, train, and deploy high-quality ML models. It
provides one-click deployment support for various open-source models such as NLP,
object detection, image classification, etc.

Top Features:

o Build highly accurate training datasets


o It helps to extract and analyze data automatically for better accuracy and
faster decision-making.
o It helps detect frauds such as suspicious transactions and trigger alerts on
customer accounts.
o Churn prediction
o It helps deliver customized and personal recommendations to the customer to
improve and grow their business process.
o It removes the need to break data sets down into multiple chunks.

2. Alteryx Analytics
Alteryx is the best data science platform that accelerates digital transformation. It
offers data accessibility and data science processes. It enables you to do complex
things with data without having prior experience in coding and data mining
techniques.

Features of Alteryx Analytics:

o Automate manual data tasks into repeatable analytics workflows


o It provides the flexibility of deploying and managing analytical models and
helps analysts prepare, organize and analyze data faster with zero coding
skills.
o It helps you with the flexibility of using all data sources and visualization tools.
o It does not require complex coding skills to perform statistical problems in
building predictive models.
3. TIBCO
TIBCO is a data science platform that supports the entire analytics lifecycle with
capabilities to include cloud-based analytics that integrates with many open source
libraries.

It is a cloud platform that runs and adapts your connected business.

TIBCO data science allows the user to prepare data and build, deploy, and monitor
the model. It is widely known for use cases, such as product refinement and business
exploration.

Features of TIBCO:

o It enables users to easily and quickly connect applications and APIs using the
browser.
o It provides the services like metadata management, data catalog, data
governance, etc.
o It facilitates users' actionable intelligence in real-time.
o It helps to build smart apps with a single click.
o It supports cloud messaging for reliable and secure data distribution.
o It reduces decision latency to a greater extent and acts in real-time.

4. SAS
SAS provides advanced data science and data analytics software that offers easy access to data irrespective of its source and format.

It uses natural language processing to work with real-time scenarios. Further, it automatically generates a pipeline that helps organize data in a better way. It allows all users to work with open-source models for their projects.

Features of SAS:

o It offers a visual interface for data analytics. It allows users to explore data
within the model studio.
o You can access training data within the model studio from each node.

5. H2O.ai
H2O.ai offers various facilities and functionalities of Artificial Intelligence and data
science. It supports a highly scalable elastic environment for the AI life cycle.

It is an open-source platform that provides a distributed, in-memory ML platform with linear scalability.

It is a cloud-based AI platform that deals with complex business problems and accelerates the discovery of new ideas with results you can understand and trust. It is a single platform with endless solutions that primarily focuses on the following:

Make: It helps build ML models and applications with more accuracy, speed, and transparency.

Operate: It supports various machine learning operations that streamline performance monitoring and rapidly adapt to changing conditions.

Innovate: It includes an AI AppStore that helps deliver innovative solutions to end-users easily.

Use cases of H2O.ai:

o Credit risk scoring


o Predicting Hospital Acquired Infections (HAIs)
o Medical testing
o Predictive manufacturing design
o Supply chain optimization
o Pricing Optimization
o Anomaly detection
o Customer churn management
o Product recommendation
o Content Personalization
o AML, lead scoring, fraud detection, KYC, smart segmentation, etc.

Features of H2O.ai

o H2O is the open source leader in AI, which aims to democratize AI.
o It supports the facility of building responsible AI models and applications.
o It also helps build explainable AI models with greater transparency,
accountability, and trustworthiness in AI.
o It provides automatic feature recommendation, drift, insights, versioning,
metadata, rank and bias identification, etc.

6. DataRobot
DataRobot is an AI cloud platform that helps build, prepare, deploy, predict, monitor,
and optimize industry data models.

It offers services to various technologies such as data engineering, machine learning,


MLOps, decision intelligence, trusted AI.

DataRobots in Data Engineering:

o It provides cloud capabilities for enterprise AI visual data preparation and


builds and runs sophisticated data pipelines in the desired language.
o It helps to generate the best feature for your models by connecting various
data sources and formats.
o It helps to explore and visualize data to find new patterns and insights.

DataRobots in Machine Learning:

o It is used to create advanced ML models automatically.


o It is used to forecast the real world with an automated time-series feature.
o It uses Natural language processing to extract meaning from text data.
o It adds geospatial context to ML models.
o It supports human-readable mathematical formulas that can solve
sophisticated machine learning problems.

DataRobot in MLOps:

o It helps deploy, monitor, and manage any ML model in any location.


o It provides portable prediction servers that help in easy-to-use Docker
containers to host production models.
o It is used in the model registry, etc.

Features of DataRobot

o Speed: It helps to bring AI into production faster than ever.


o Impact: It helps in transforming data to business results with confidence.
o Scale: It helps to deploy AI anywhere at scale.

7. RapidMiner
RapidMiner is one of the most popular multimodal predictive analytics, Machine
Learning, and end-to-end data science solution platform. It is used to optimize
decision-making. It offers a variety of sophisticated, flexible approaches that will turn
the data into insights that can be used to overcome challenges and achieve unique
goals. It has extensive experience in all major industries such as manufacturing,
energy, utilities, automotive, healthcare, financial services, insurance, life science,
communication, travel, transport, logistics, etc.

Use cases on RapidMiner

o Churn prevention, which means identifying customers likely to leave and taking preventative action.
o It is used to make intelligent decisions automatically through AI and ML using
cognitive RPA.
o Text mining, i.e., extracting insight from unstructured content.
o It helps predict the next best action, which means the right action at the right
time for the right customer.
o It helps in identifying fraudulent activity quickly and resolves it too.
o It gives quality assurance and resolves quality issues before they become a
problem.

Features of RapidMiner

o Ubiquitous, portable & extensible


o Easy to Trust, Tune & Explain
o Deliver ROI & results, not just technically sound models
o Increase productivity and performance
o Transformational business impact
o Upskill Your Organization

Conclusion
With data science and big data, machine learning has become even more powerful among data scientists and professionals. Machine learning platforms play a significant role in developing and deploying ML models. This software is a key player in growing your business and in customer satisfaction and support. If you want to upskill your organization, you can choose any of the machine learning platforms given above to keep your business running smoothly.

Deep learning vs. Machine learning vs. Artificial Intelligence
Deep Learning, Machine Learning, and Artificial Intelligence are the most used terms
on the internet for IT folks. However, all these three technologies are connected with
each other. Artificial Intelligence (AI) can be understood as an umbrella that
consists of both Machine learning and deep learning. Or We can say deep
learning and machine learning both are subsets of artificial intelligence.

Because these technologies look similar, many people have the misconception that deep learning, machine learning, and artificial intelligence are all the same. In reality, although all three are used to build intelligent machines or applications that behave like a human, they differ in their functionality and scope.

These three terms are often used interchangeably, but they do not refer to quite the same things. Artificial Intelligence is a branch of computer science that helps us create smart, intelligent machines. ML is a subfield of AI that helps teach machines and build AI-driven applications. Deep learning, in turn, is a sub-branch of ML that trains models on huge amounts of input data using complex algorithms and mainly works with neural networks.


In this article, "Deep Learning vs. Machine Learning vs. Artificial Intelligence", we will
help you to gain a clear understanding of concepts related to these technologies and
how they differ from each other. So, let's start this topic with each technology
individually.

What is Artificial Intelligence (AI)?


Artificial Intelligence is defined as a field of science and engineering that deals
with making intelligent machines or computers to perform human-like
activities.

Mr. John McCarthy is widely known as the father of artificial intelligence. There are some popular definitions of AI, which are as follows:

"AI is defined as the capability of machines to imitate intelligent human behavior."

"A computer system able to perform tasks that normally require human intelligence,
such as visual perception, speech recognition, decision-making, and translation
between languages."

Types of Artificial Intelligence

AI can be categorized mainly into 4 types as follows:

1. Reactive machine
2. Limited memory
3. Theory of Mind
4. Self-awareness

Applications of Artificial Intelligence

o Language Translations
o AI in healthcare
o Speech recognition, text recognition, and image recognition
o AI in astronomy
o AI in gaming
o AI in finance
o AI in data security
o AI in social media
o AI in travel and transport
o AI in Automotive Industry
o AI in robots
o AI in Entertainment, agriculture, E-commerce, education, etc.

We have taken a basic knowledge of Artificial Intelligence. Now, let's discuss the
basic understanding of Machine Learning.

What is Machine Learning?


Machine Learning is defined as the branch of Artificial Intelligence and computer
science that focuses on learning and improving the performance of
computers/machines through past experience by using algorithms.

AI is used to make intelligent machines/robots, whereas machine learning helps those machines train to predict outcomes without human intervention.

How does Machine Learning work?


Machine Learning uses algorithms and techniques that enable the machines to learn
from past experience/trends and predict the output based on that data.

First, machine learning accesses a huge amount of data through data pre-processing. This data can be structured, semi-structured, or unstructured. The data is then fed to machines through various techniques and algorithms, and based on previous trends, the machine predicts outputs automatically.

After understanding the working of machine learning models, it's time to move on to
types of machine learning.

Types of Machine Learning


Based on the methods and techniques to teach machines, Machine Learning is
categorized into mainly four types, which are as follows:
1. Supervised Machine Learning
This type of ML method uses labeled datasets to train machines and, based on
these datasets, machines predict the output. It needs supervision to train
models and predict outputs. Image segmentation, medical diagnosis, fraud
detection, spam detection, speech recognition, etc., are some important
applications of supervised machine learning.
Supervised machine learning can be further categorized into 2 types of
problems as follows:
o Classification
o Regression

Advantages of Supervised machine learning

o Supervised machine learning helps to predict output based on prior experience.
o It helps to provide an exact idea about classes of objects.

Disadvantages of Supervised machine learning

o This method is not well suited to solving complex problems.
o This method does not guarantee exact outputs, as it can involve both structured and unstructured data.
o It needs more computational time to teach ML models.
2. Unsupervised Machine Learning
Unsupervised machine learning is just the opposite of supervised learning.
Unlike supervised machine learning, it does not need supervision, which
means it does not require labeled datasets to train machines. Hence, in
unsupervised machine learning, the output is predicted without any
supervision. The main aim of the unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences (a short clustering sketch illustrating this appears after this list of types). Network analysis, recommendation systems, anomaly detection, singular value decomposition, etc., are some important applications of unsupervised machine learning.
Unsupervised machine learning is further categorized into two types:
o Clustering
o Association
Advantages of unsupervised machine learning

o It can be used to solve complex ML problems as it works with unlabelled data sets.
o It can be used for a wider range of tasks in comparison to supervised learning.

Disadvantages of unsupervised machine learning

o Because it uses unlabeled data sets, it may produce less accurate outputs.
o It is a relatively complex approach, as it deals with unlabelled datasets that are not mapped to known outputs.
3. Semi-supervised Machine learning
Semi-supervised learning is the combination of both supervised and
unsupervised machine learning. Although it uses both labeled and unlabelled datasets to train models and predict the output, it mostly relies on unlabelled datasets.
Advantages of Semi-supervised machine learning
o It is simple and easy to understand the algorithm.
o It is more efficient.
o It is used to solve the drawbacks of Supervised and Unsupervised
Learning algorithms.

Disadvantages of Semi-supervised machine learning

o It does not include applicable network-level data


o It gives less accurate results
o Iteration results may not be stable.
4. Reinforcement Learning
Reinforcement learning is defined as the feedback-based method to learn
from past experience and improve the performance of models. In this method, an AI agent automatically explores its surroundings through trial-and-error actions. In reinforcement learning algorithms, machines learn from experience or past data and do not use labeled data. It can be applied in various real-world cases such as video games, resource management, robotics, text mining, operations research, etc.
Reinforcement learning is further categorized into two types:
o Positive reinforcement learning
o Negative reinforcement learning

Advantages of reinforcement learning

o It is used to resolve complex real-time scenarios where other techniques are not useful.
o It provides the most accurate results because it learns similarly to a
human.
o It is significant for achieving long-term results.

Disadvantages of Reinforcement Learning

o It is not significant for simple scenarios.


o It needs a vast amount of data as well as computations.
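Here is the short clustering sketch referred to in the unsupervised learning section above. A KMeans model groups unlabelled points purely by similarity, with no target labels involved, which contrasts with the supervised examples elsewhere in this tutorial (scikit-learn assumed; the data is synthetic):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Unlabelled data drawn from three loose groups
data = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
    rng.normal(loc=[0, 5], scale=0.5, size=(50, 2)),
])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
print(kmeans.labels_[:10])       # cluster id assigned to each point, no labels required
print(kmeans.cluster_centers_)   # the three discovered group centres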

Steps involved in machine learning


There are 7 simple steps involved in machine learning, as follows (a short end-to-end code sketch of these steps appears after the list):

o Data gathering
o Data pre-processing
o Choose model
o Train model
o Test model
o Tune model
o Prediction
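Here is the end-to-end sketch referred to above, walking through the seven steps with scikit-learn's built-in Iris dataset. It is a minimal illustration, not a production workflow:

from sklearn.datasets import load_iris                      # 1. data gathering
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)                      # 2. data pre-processing
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

model = LogisticRegression(max_iter=1000)                   # 3. choose model
model.fit(X_train, y_train)                                 # 4. train model
print("test accuracy:", model.score(X_test, y_test))        # 5. test model

grid = GridSearchCV(LogisticRegression(max_iter=1000),      # 6. tune model
                    param_grid={"C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_train, y_train)

print("prediction:", grid.predict(X_test[:3]))              # 7. prediction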

We have discussed machine learning and artificial intelligence basics, and it's time to
move towards the basics of deep learning.

What is Deep Learning?


"Deep learning is defined as the subset of machine learning and artificial intelligence
that is based on artificial neural networks". In deep learning, the deep word refers to
the number of layers in a neural network.

Deep learning is a set of algorithms inspired by the structure and function of the human brain. It uses huge amounts of structured as well as unstructured data to teach computers and predict accurate results. The main difference between machine learning and deep learning lies in how data is represented: classical machine learning typically learns from features extracted from structured or unstructured data, while deep learning learns its own representations using neural networks.

In machine learning, if a model predicts inaccurate results, we often need to fix it manually. In deep learning techniques, much of this correction happens automatically during training, and we do not need to intervene explicitly. A self-driving vehicle is one of the best examples for understanding deep learning.

Deep learning can be useful to solve many complex problems with more accurate
predictions such as image recognition, voice recognition, product
recommendations systems, natural language processing (NLP), etc.

The basic structure of deep learning


Deep learning includes various neural networks that possess different layers, such as
input layers, hidden layers, and output layers. The input layer accepts input data;
hidden layers are used to find any hidden pattern and feature from the data, and
output layers show the expected results.

How does deep learning work?


There are a few simple steps that deep learning follows (a minimal single-neuron sketch of these steps is given after the list):
1. Calculate the weighted sum of the inputs.
2. Use this weighted sum from step 1, plus a bias term, as input to the activation function.
3. The activation function decides whether the neuron should be activated or not.
4. Predict the output at the output layer.
5. Compare the predicted output with the actual output and use the backpropagation method to improve the performance of the model. In this step, the cost function plays a vital role in reducing the error rate.
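Here is the minimal single-neuron sketch referred to above, written with NumPy only. It computes the weighted sum plus bias, applies a sigmoid activation, compares the prediction with the target, and updates the weights using the gradient of a squared-error cost, which is the one-neuron analogue of backpropagation. All numbers are illustrative:

import numpy as np

# One training example with three inputs and a target output
x = np.array([0.5, -1.2, 3.0])
target = 1.0

w = np.zeros(3)      # weights
b = 0.0              # bias
lr = 0.1             # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(100):
    z = np.dot(w, x) + b          # steps 1-2: weighted sum plus bias fed to the activation
    y_hat = sigmoid(z)            # step 3: activation decides how strongly the neuron fires
    error = y_hat - target        # steps 4-5: compare predicted and actual output
    # gradient of the squared-error cost w.r.t. the weights and bias
    grad = error * y_hat * (1 - y_hat)
    w -= lr * grad * x
    b -= lr * grad

print("final prediction:", sigmoid(np.dot(w, x) + b))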

Types of deep neural networks


There are some different types of deep learning networks available. These are as
follows:

o Feedforward neural network


o Radial basis function neural networks
o Multi-layer perceptron
o Convolution neural network (CNN)
o Recurrent neural network
o Modular neural network
o Sequence to sequence models

Applications of deep learning


Deep learning can be applied in various industries such as:
o Self-driving vehicles
o Fraud detection
o Natural language processing
o Virtual personal assistance
o Text, speech, and image recognition
o Healthcare, infrastructure, banking & finance, marketing
o Entertainment
o Education
o Automatic game playing
o Auto handwriting generation
o Automatic language translation
o Pixel restoration and photo description & tagging
o Demographic and election predictions, etc.

Conclusion
Artificial intelligence is one of the most popular fifth-generation technologies, and it is changing the world through its subdomains, machine learning and deep learning. AI helps us create intelligent systems and provide cognitive abilities to machines. Machine learning enables machines to learn from experience without human intervention and makes them capable of learning and predicting results from given data. Deep learning, in turn, is a breakthrough in the field of AI that uses multiple layers of artificial neural networks to achieve impressive results on problems such as image recognition and text recognition. Hence, after reading this topic, the confusion most people face in differentiating these terms should be resolved, and you should have a clear understanding of the basic difference between artificial intelligence (AI), machine learning (ML), and deep learning (DL).

Machine Learning Applications in Defense/Military
Machine learning is one of the most trending technologies today. It is widely used in
various industries such as healthcare, manufacturing, automation, infrastructure,
banking, finance, transport, product recommendations, social media, news,
defense, marketing, and many more. Among all these industries, defense is one of
the most important parts of the development of any country, and machine learning
also plays a significant role in modern warfare systems, such as developing
autonomous weapons. Although autonomous weapons have existed for more than a century, combining them with machine learning gives them far more functionality now.

Machine learning technologies are used in many ways, such as image recognition, which helps identify, detect, track, and classify targets or objects using various sensors. Hence, machine learning applications are very much helpful for the
various sensors. Hence, machine learning applications are very much helpful for the
defense sector. This topic will discuss various ML applications and their use cases in
the military system. So, let's start with a quick introduction to machine learning and
technologies used in ML.

What is Machine Learning?


Machine Learning is a branch of computer science and sub-branch of Artificial
Intelligence. "Machine Learning is defined as the study of various technologies or
algorithms that allow systems to automatically learn and improve from past
experience."

Types of Machine Learning


Machine learning can be categorized into mainly three types as follows:


o Supervised learning
o Unsupervised learning
o Reinforcement learning

Applications of Machine Learning


Machine Learning is a broad term and can be applied in many industries. There are a
few popular machine learning applications as follows:

1. Speech recognition using natural language processing (NLP)


2. Text recognition
3. Image recognition
4. Big data and business intelligence analysis
5. Robotics and automation
6. Traffic prediction
7. Product recommendations
8. Self-driving cars
9. Email spam and malware filtering
10. Virtual personal assistant
11. Fraud detection
12. Stock marketing and trading
13. Healthcare and medicine
14. Automatic language Translation
15. Manufacturing industry, etc.

Applications of Machine Learning in Defense
Machine learning and artificial intelligence are currently being used in various military applications, and most countries spend huge amounts of money on researching and developing military applications. There are a few major military applications where machine learning is being applied and will prove its importance in the years to come.

1. ML in Warfare Platforms
2. ML in Cyber security
3. ML in Logistics and Transportation
4. ML in Target Recognition and tracking
5. ML in Battlefield Healthcare
6. ML in defense Combat Training
7. ML in Threat Monitoring
8. ML in Maritime situational awareness
9. ML in Unmanned sensor systems: UAVs, UGVs, UUVs
10. ML in Unattended sensors and systems
11. ML in Compound security and force protection
12. Border protection
13. Route planning clearance
14. Reconnaissance and surveillance
15. Vehicle situation awareness
16. Improved visualization

Let's discuss some important ML applications in defense systems:

o Warfare Platforms
Machine learning and artificial intelligence are being embedded into weapons
and other military systems of different countries across the globe, used on
land, naval, airborne, and space platforms.
The application of AI-enabled systems on these platforms helps develop
efficient warfare systems that require less human intervention. It also helps
increase synergy and enhances the performance of warfare systems while
requiring less maintenance. AI and ML are expected to empower autonomous
and high-speed weapons to perform collaborative attacks.
o Defense Cybersecurity
The military system of any country is one of the most important parts of
maintaining the security of the whole nation. Hence, military/defense systems
are highly sensitive to cyberattacks, as an attack can lead to the loss of crucial
military information and can damage the whole system.
However, AI- and ML-embedded systems can automatically protect networks,
computers, programs, and data from any kind of unauthorized access. Further,
ML-enabled web security systems can record the patterns of cyberattacks and
develop counter-attack tools to tackle them.
o Logistics & Transportation
Machine Learning plays a crucial role in defense logistics and transportation
systems. For each successful military operation, it is required to effective
transportation of essential components of a military such as goods, weapons,
ammunition, etc.
Embedding AI/ML with a military transportation system can reduce
transportation costs and also human operational efforts.
Recently, the US Army collaborated with IBM to use its Watson artificial
intelligence platform to help pre-identify maintenance problems in Stryker
combat vehicles.
o Target Recognition and Tracking
Machine learning and artificial intelligence are also involved in enhancing the
accuracy of target recognition in complex combat environments. These
techniques allow defense forces to gain an in-depth understanding of
potential operation areas by analyzing reports, documents, news feeds, and
other forms of unstructured information.
o Battlefield Healthcare
Machine learning and artificial intelligence help in battlefield healthcare, such
as evacuation activities, remote surgical systems, etc. In war zones, various
robotic surgical systems and robotic ground platforms equipped with ML
technologies help with difficult medical diagnoses and with handling injuries
in combat situations.
o Defense Combat Training
Machine learning enables computers or machines to train troops on the
various combat systems deployed in military operations in war zones. It
provides simulation and training, supported by the necessary software skills,
that help during difficult situations. The USA is investing heavily in simulation
and training applications. Further, various countries use ML-equipped combat
training systems to train their soldiers instead of the classical approach, which
requires more money and time. These modern approaches are more efficient
and also adaptive.
Reinforcement learning helps in building combat training systems in which an
agent learns from reward and punishment as feedback. This approach is
significant for maintaining an enhanced training system for individual soldiers.
o Threat Monitoring
Threat monitoring is a network monitoring solution/system dedicated to
analyzing, evaluating, and monitoring an organization's network and endpoints
to prevent various security threats such as network intrusion, ransomware, and
other malware attacks.
Machine learning helps in threat detection through various detection
categories such as configuration, modeling, indicators, and threat behavior. By
using sophisticated ML algorithms, computer systems are trained to detect
malware, run pattern recognition, and spot malware or ransomware behavior
before it enters the system. AI also plays a vital role in developing intelligent
systems for threat awareness, such as drones. These drones are equipped with
intelligent software and algorithms that enable them to detect threats, analyze
them, and prevent them from entering the system. Large countries such as the
USA, Russia, China, France, Britain, Japan, and India are investing huge
amounts of money in drones that can detect threats and targets, which is
especially useful in remote areas.
o Anomaly detection
Anomaly detection (outlier detection) is the process of identifying suspicious
events, items, and observations that deviate from a dataset's normal behavior.
It is also used to identify patterns of abnormality in data and to separate those
patterns from the normal state, i.e., to flag outliers. ML and AI help anomaly
detection find outliers within a series of data, and supervised machine learning
plays an important role in the pattern recognition behind it. (A small
illustrative sketch follows this list.)
o Surveillance applications
Reconnaissance and surveillance systems have become crucial for any country
to collect and manage huge amounts of defense data. These applications use
various sensors and continuously transmit a stream of information through
data networks to data centers. Data scientists analyze that data and extract
useful information from it. In this entire procedure, machine learning (ML)
helps data analysts to detect, analyze, organize, and manage data
automatically.
o Decision-support system
Decision-support systems are helpful in different applications across
industries, such as medical treatment, manufacturing, marketing, and
self-driven equipment (drones). Similarly, ML helps build enhanced
decision-support systems for the defense sector, such as intelligent drones,
automatic cruise missiles, and automatic weapons that make decisions about
suspicious objects. ML helps machines make a decision by analyzing data and
proposing the best course of action.
o Border protection
One of the main goals of the defense sector is to protect the country from
attacks across the border by patrolling that region. Although soldiers are
always positioned to watch the border, nowadays various smart sensors and
intelligent machines such as drones also play a crucial role in the border
security system. These drones are equipped with ML algorithms and software
that detect, analyze, and report any suspicious activity by sending information
to data centers. Hence, they are especially useful in dangerous situations
where human intervention is not feasible.

Conclusion
In conclusion, machine learning has become an essential part of the modern defense
system in comparison to conventional systems. Machine learning and artificial
intelligence enable military systems to handle a huge volume of data more efficiently
and improve combat systems with enhanced computing and decision-making
capabilities. AI and ML are being deployed across the entire defense industry.
Governments and tech companies are continuously investing money and effort to
increase ML involvement in the defense sector to ensure better security of their
countries inside and outside their borders.

Machine Learning Applications in Media
Nowadays, media is one of the most powerful and influential means of communication
in the world, and its applications have grown rapidly over the past decades. The term
covers all print, digital, and electronic means of communication. Content creation is
one of the areas where the media industry is transforming fastest: growing
competition is driving the need to reduce operating costs while simultaneously
generating more revenue from delivering content. With this evolution, the use of
machine learning and artificial intelligence technologies has also increased to a great
extent. AI and ML help the media industry in various ways, such as making visual
content more interactive, interesting, and user-friendly, and improving efficiency as
well.

In this topic, "Machine Learning Applications in Media", we will discuss various


machine learning applications that become essential for the media and
entertainment industry and growing businesses with more profit and revenue. So,
let's start with a quick introduction to machine learning in the media industry and
some popular machine learning applications required for the media & entertainment
industry.

Machine Learning in the Media industry


The media and entertainment industry is experiencing exponential growth globally.
With ML-enabled high-speed network systems and trending video streaming
platforms, users can access unlimited content continuously without any
interruption.

As per the information published by Statista, the value of the global entertainment
and media market from 2011 to 2025 has increased to a great extent.


As per the reports, the value of the worldwide entertainment and media market fell
to two trillion U.S. dollars in 2020. However, the forecast for 2021 suggested that
revenue would begin to rise once more and surpass pre-COVID levels, reaching about
2.2 trillion dollars. The rapid growth in the media industry is primarily because most
people are using online platforms like YouTube, Facebook, and Netflix instead of
classical channels such as cable and FM radio.

Machine Learning (ML) applications in Media
The applications of machine learning are driving rapid growth in the media industry
in many forms, such as showing & distributing visual and audio content (2-D and
3-D), digital advertisement, product recommendation, targeting audiences, content
classification and categorization, meta-tagging, automated transcription, virtual
personal chatbots, detecting and removing false information, sentiment analysis, etc.

Below are a few important applications of machine learning in media, each with an
example:

o Content personalization and Recommendations


Companies offer services that let users personalize audio and video content
according to their preferences and previous experience. All big companies like
YouTube, Netflix, Spotify, etc., provide this feature to make their services more
reliable and user-friendly.
Machine learning helps collect users' data, behaviors, and demographic
details, and accordingly recommends the content they are most likely to
enjoy based on what they liked in the past. Various ML algorithms and deep
learning methods help deliver more personalized content to users. In this way,
companies use ML technologies to grow their customer base with a better
customer-service experience than competitors in the market. (A tiny
recommendation sketch appears after this list.)
For example, Netflix is a USA-based application that provides various
entertainment services to its users. If you like action web series and have
searched for similar movies in the past, Netflix will automatically recommend
other similar series that match your interest. Likewise, while shopping on
Amazon, it automatically recommends similar products based on your interest.
o Digital Advertisement and Target audience
Digital advertisement is one of the easiest methods to drive revenue and
promote a business online. It plays a significant role in branding and business
promotion. Machine learning (ML) technologies help significantly in making
digital advertisements more precise and productive. Further, they also help in
building a target audience with higher conversion rates. The conversion rate
tells how many users purchased products/services through the advertisements
on your platform; for example, 50 purchases from 1,000 ad clicks is a 5%
conversion rate.
One of the most popular examples is Google AdSense, which shows
advertisements based on a user's past history and preferences. If a user has
searched for an Apple iPhone in their web browser or on e-commerce sites,
then similar categories of products start to appear on different websites.
Hence, Google also uses AI and ML technologies to help advertisers target the
right audience and get maximum output from their ads.
o Content classification and categorization:
Classifying and categorizing content based on user preference is one of the
important goals of media and entertainment platforms like YouTube, Amazon
Prime, and other OTT services. These platforms organize their different genres
of music videos, songs, movies, and web series using various ML algorithms.
Implementing ML technologies and algorithms in the media and entertainment
sector can automate the categorization and classification of content, creating
a more user-friendly environment.
o Meta Tagging Subtitles & Automated Transcription:
Content published in the media and entertainment industry needs to make
comprehensible to the audience. Hence, AI can help in identifying the videos
and other online content to classify them with meta tags and descriptions.
Further, apart from that, movies, music videos, and TV shows are transcribed
into different languages using AI-based technologies like natural language
processing through machine learning and deep learning. The voice of movies
is dubbed into various different languages with subtitles and audio
annotations to generate more customers globally.
o Personal virtual chatbots
Every business requires a personal virtual assistant to assist their customer in
solving their queries remotely. Machine learning and Artificial intelligence play
a vital role in training and developing virtual chatbots for the media and
entertainment industry and improving efficiency as well. That eventually helps
these companies in offering better services to their customers.
o Identifying fake information
Nowadays, a great deal of fake news and fake posts go viral on social media
and other platforms. Such fake news provokes the audience towards certain
events or social issues. ML-based technologies help identify and report such
content and remove it before it circulates widely.
Beyond text content, some users also create fake or edited videos using
deepfake technology. With the help of ML- and AI-based deepfake detection
services, these videos and images can be detected, removed, and reported.
Moreover, the platform owner can be notified to take appropriate action so
that the same thing cannot be done again in the future.
o Using social media for Sentiment Analysis
Sentiment analysis refers to the techniques used by various organizations to
analyze the content published on social media sites. This published data can
be collected and used by machine learning to develop ML models that analyze
the sentiments and feelings of people interacting with each other on social
media platforms.
For example, Facebook is the world's largest social media platform and
provides a free space to share views, content, and opinions on different
topics. Analyzing such discussions across different age groups, regions, or
demographics provides useful insights about different groups of people.
o Reporting automation
AI and ML technologies are used to automate a company's business reporting
and to help it make strategic business decisions. Big media platforms use
natural language processing and machine learning to generate channel
performance reports from the raw information shared by regulatory
authorities, which typically arrives in the form of large Excel sheets. Analyzing
these sheets by hand every week is very difficult for an analysis team, so
automated reporting helps turn them into meaningful, actionable reports.
o Streaming Quality
AI video enhancement software can help media and entertainment channels.
For example, Netflix improves video quality while letting devices consume less
mobile data during streaming; Netflix has upgraded its codebase with
embedded ML algorithms to improve streaming quality.
o Search optimization:
The main goal of every viewer is to find appropriate content on the internet,
and sometimes it is really tough to find exactly what we need. AI and ML help
make search results more relevant and accurate for the user's requirement.
Search optimization is one of the best and most popular ML applications used
in the media industry.
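
To make the recommendation idea above concrete, here is a minimal, purely
illustrative sketch of item-based recommendation using cosine similarity on a made-up
user-rating matrix; real platforms such as Netflix or Amazon use far larger data and
much more sophisticated models.

import numpy as np

# Hypothetical ratings: rows = users, columns = titles (0 = not rated).
titles = ["Action A", "Action B", "Comedy C", "Documentary D"]
ratings = np.array([
    [5, 4, 0, 1],   # user 0 mostly likes action
    [4, 5, 1, 0],   # user 1 mostly likes action
    [0, 1, 5, 4],   # user 2 prefers comedy/documentary
], dtype=float)

# Cosine similarity between item columns.
norms = np.linalg.norm(ratings, axis=0)
similarity = (ratings.T @ ratings) / np.outer(norms, norms)

watched = 0                                     # the user just watched "Action A"
ranked = np.argsort(similarity[watched])[::-1]  # most similar items first
recommendations = [titles[i] for i in ranked if i != watched][:2]
print("Because you watched", titles[watched], "->", recommendations)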

Conclusion
In this topic, we have seen how AI and ML are useful for the media and entertainment
industries. Media and entertainment companies are using AI/ML applications to
enhance their business and maximize profit. Big data also supports AI and ML by
providing the huge amounts of data needed to train ML models, because machine
learning needs a vast amount of data for training. The more effective the data an ML
model is trained on, the more efficient the results it will generate.

How can Machine Learning be used with Blockchain?
Machine learning is one of the most trending technologies, with amazing capabilities,
whereas blockchain is the heart of all cryptocurrencies. Blockchain technology is
becoming more popular day by day, as it allows any user to deal directly with others
through a highly secure, decentralized system without requiring any intermediary.
Machine learning can be applied together with blockchain technology to make it more
efficient and better. In this topic, we will see how machine learning and blockchain
can be combined to get the best results. Before starting, let's first understand the
basics of both technologies.
What is Blockchain?
Blockchain can be defined as a shared, immutable digital ledger that allows
storing transactions and tracking assets within a highly secure network. Here
the assets can be tangible (house, car, cash, land) or intangible (patents, copyright,
brandings, intellectual property). A blockchain is immutable, which means that once
data has been entered, it cannot be reversed.

Simply put, we can understand a blockchain as a type of distributed database that
can store any type of data and is very difficult to hack, change, or cheat. The main
difference between a conventional database and a blockchain is that a database
stores data in tables, whereas a blockchain stores data in blocks that are chained
together.

Blockchain is a decentralized system, which means it is not maintained by a
centralized entity (an individual, organization, or any group); rather, it is maintained
by a distributed network.


A blockchain can store different types of information, but mainly this technology is
used behind cryptocurrencies such as Bitcoin.

Components of Blockchain
o Blocks: Each blockchain is made up of several blocks, where each block has
three elements:
o Data
o Nonce
o Hash
o Miners: Miners are used to create new blocks through mining.

o Nodes: A node can be understood as a device that contains a copy of the
blockchain. For a complete transaction, there are different nodes, and each
node owns a copy of the blockchain.

How does Blockchain Work?


o Whenever a transaction occurs, it is stored as a block in the chain.
Whenever a new transaction occurs, it is saved as a block. The data block can
store information as per your choice, such as Who, What, When, Where, how
much, and any condition, such as the temperature of a food shipment.
o Each block is connected to the ones before and after it.
Blocks link together to form a chain as an asset moves from place to place or
ownership changes hands. Each block confirms the exact time of the
transaction and is connected in such a secure way that no block can be altered
or inserted between two existing blocks.
o Transactions are blocked together in an irreversible chain.
Each newly added block verifies the block before it, strengthening the security
of the whole blockchain. In this way, the blockchain becomes immutable, and
hence each transaction is irreversible.
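
The toy sketch below, which is only an illustration and not a production blockchain,
shows these ideas in a few lines of Python: each block stores data, a nonce, and the
hash of the previous block, so any tampering breaks the chain when it is verified.

import hashlib, json

def block_hash(block):
    # Hash a block's contents deterministically.
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def add_block(chain, data, nonce=0):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"data": data, "nonce": nonce, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)

chain = []
add_block(chain, {"from": "Alice", "to": "Bob", "amount": 5})
add_block(chain, {"from": "Bob", "to": "Carol", "amount": 2})

# Verify: each block must hash to its stored value and point at the previous hash.
valid = all(
    b["hash"] == block_hash({k: b[k] for k in ("data", "nonce", "prev_hash")})
    and (i == 0 or b["prev_hash"] == chain[i - 1]["hash"])
    for i, b in enumerate(chain)
)
print("chain valid:", valid)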

How did Machine Learning come into Play with Blockchain?
Machine learning can be understood as a technology that learns from past data
and improves its performance with new data. Hence, we can say it is a self-adaptive
technology, and we don't need to add new rules manually. We can understand it with
one of the popular examples of machine learning, spam detection: software that
automatically improves at detecting spam and junk emails over time. It does this with
the help of an underlying algorithm that learns from data and makes predictions on
new data; a minimal sketch of the idea follows.
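
A minimal sketch of this spam-detection idea, assuming scikit-learn and a handful of
made-up example messages, might look like this (real spam filters train on millions of
messages):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: a few labelled messages.
messages = [
    "win a free prize now", "cheap loans click here",
    "meeting moved to 3pm", "please review the attached report",
]
labels = ["spam", "spam", "ham", "ham"]

# Learn word counts and a Naive Bayes classifier from the examples.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["claim your free prize"]))   # expected: ['spam']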

When such capabilities of machine learning are combined with blockchain, it
generates great opportunities and benefits for its users.

By using ML to help govern the blockchain, the security of the chain can be enhanced
to a great extent. Moreover, as machine learning works better with lots of data, it
creates a great opportunity to build better models by taking advantage of the
decentralised nature of blockchains.

The combination of both technologies can be a game-changer for the finance and
insurance industries in identifying fraudulent transactions.

Machine Learning in Blockchain-Based Applications
1. Enhanced Customer Service
As customer satisfaction is one of the major challenges for each organization,
companies are using different ML techniques to enhance their customer services. By
combining Machine Learning with a blockchain-based application, customer services
can be enhanced to a great extent.

2. Surveillance System
Security is an important concern of the people because of the increasing crime rate
in the present scenario. Machine learning and Blockchain technology can be used for
surveillance, where blockchain can be used for managing continuous data, and ML
can be used for analyzing the data.

3. Smart Cities
Nowadays, Smart cities are evolving day by day and helping people to enhance their
living standards by making their life easy. A smart city also involves machine learning
and blockchain technologies that play a crucial role. For example, a smart home
enabled with blockchain and Machine learning algorithms can be monitored easily
and can provide device personalization to each individual.

4. Trading (Reinforcement Learning)


Blockchain is the key technology behind most popular cryptocurrencies such as
Bitcoin and Ethereum. Trading these cryptocurrencies is becoming popular among
retail investors and large financial institutions, and nowadays traditional trading
bots are embedded with powerful machine learning algorithms.

Reinforcement learning is a type of machine learning commonly used with complex
games and simulation programs. Reinforcement learning is a viable approach to
developing cryptocurrency trading strategies that are profitable and adaptive.

5. Optimizing Mining Strategies (Reinforcement Learning)
In the blockchain, the mining process plays a vital role. This process involves
guessing a set of values to solve a function on a blockchain through different
computer resources. The miner who solves the function can update the blockchain
with valid pending transactions.

Taotao Wang, Soung Chang Liew, and Shengli Zhang authored a research paper in
which they presented how reinforcement learning can be used to optimize the
blockchain mining strategy for cryptocurrencies such as Bitcoin. In this paper, the
authors show a way to use a multidimensional RL algorithm based on a Q-learning
technique for optimising cryptocurrency mining.
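
The sketch below is not the algorithm from that paper; it is only a generic tabular
Q-learning loop on a made-up two-state decision problem, included to illustrate the
reward-and-update idea that such reinforcement-learning approaches to mining and
trading build on.

import random

# Hypothetical setup: two states (e.g. market regimes) and two candidate actions.
states, actions = [0, 1], [0, 1]
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, epsilon = 0.1, 0.9, 0.1     # learning rate, discount, exploration

def step(state, action):
    # Made-up environment: matching the state pays off, otherwise a small loss.
    reward = random.gauss(1.0 if action == state else -0.5, 0.1)
    next_state = random.choice(states)
    return reward, next_state

state = 0
for _ in range(5000):
    if random.random() < epsilon:
        action = random.choice(actions)                     # explore
    else:
        action = max(actions, key=lambda a: Q[(state, a)])  # exploit
    reward, next_state = step(state, action)
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

print({k: round(v, 2) for k, v in Q.items()})   # higher values where action == state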

6. Tackling Cryptojacking (Deep Learning):


Another application of machine learning within the blockchain is for making it more
secure. As different computational resources are used to mine cryptocurrencies,
these can be targeted by cryptojackers, who hijack the computational
resources. Nowadays, such attacks have become common and hence demand stronger
security. Researchers have developed new methods of detecting the presence of
malicious programs that may hijack computer resources. One such method
is SiCaGCN.

SiCaGCN is a system created by researchers that identifies the similarities between a
pair of code samples. It combines neural-network components with different
techniques from deep learning and the broader ML domain.

Benefits of Combining Blockchain and Machine Learning Together
Combining Machine Learning and Blockchain together can generate enormous
benefits for various industries. Below are some popular benefits of combining
Blockchain and Machine Learning for the Organization:

o Enhancing Security
Data in a blockchain is much more secure because of the implicit encryption
of the system, making it well suited to storing highly sensitive personal data,
such as the inputs to personalized recommendations.
Although the blockchain itself is secure at its base, some applications or
additional layers built on top of it can be vulnerable. In such cases, we can
take advantage of machine learning: ML can help predict possible breaches or
security threats in blockchain apps.
o Managing the data Market
Different big companies such as Google, Facebook, LinkedIn, etc., have a
huge amount of data or large data pools, and this data can be very useful for
the AI processes. However, such data is not available to others.
But, by using Blockchain, various start-ups and small companies can access
the same data pool and same AI process.
o Optimizing Energy Consumption
Mining is a highly energy-consuming process, and this is one of the major
struggles for the industry. Google has tackled a similar problem with the help
of machine learning: by training its DeepMind AI, it reduced the energy used
for cooling its data centres by approximately 40%.
o Implementing Trustable Real-time Payment Processes
By combining blockchain and ML, a highly trustworthy real-time payment
process can be implemented in the blockchain environment.

Conclusion
With the above description, we can conclude that both Machine Learning and
Blockchain perfectly complement each other. Both these technologies can be used as
the pillars of future innovation.

Prerequisites to Learn Artificial Intelligence and Machine Learning
Machine Learning (ML) and Artificial Intelligence (AI) are the most popular
technologies in the 21st century. Most beginners and professionals want to make a
career in these fields as both are the most lucrative fields of the computer science
and engineering sector.

Artificial Intelligence (AI) is a field of computer science that deals with developing
intelligent machines that can behave like humans in tasks such as speech recognition,
learning and planning, and text recognition. Machine learning, on the other hand, is a
subset of artificial intelligence that enables machines to use past data or experience
to make predictions and to learn more accurately over time. Hence, both technologies
are very important for growing your skills and career in the current era. To get
started, you must know the primary requirements or prerequisites for entering the AI
and ML fields. Let's begin with a quick introduction to AI and ML along with their
important prerequisites.

What is Artificial Intelligence?


Artificial intelligence is the branch of computer science and engineering that helps us
develop human-like intelligent computers or machines. It is a field of study in which
we learn how the human brain thinks, learns, decides, and works to solve various
problems and then, based on those findings, develop intelligent software and systems.

Now, we will discuss some important prerequisites to learn Artificial Intelligence (AI).
Here is a list of some prerequisites as follows:


Prerequisites to learn Artificial Intelligence (AI)

o Strong Knowledge of Mathematics: Before getting started with AI, you must
have sound knowledge of various mathematical concepts such as probability,
statistics, algebra, matrix, calculus, etc. Mathematics is very important to
build logical capability that is widely used in developing software and systems.
o Good programming knowledge: To learn the fundamentals
of writing code, you must have sound knowledge of programming languages
like Python, R, LISP, Java, C++, Prolog, etc.
o Strong Analytical skills: Analytical skills refer to the ability to think critically,
analyze data, make decisions, and solve complex problems. These skills
involve taking in new information and mentally processing it in a productive
manner. Hence, if you are planning to jump into the AI domain, you must
build up your analytical skills to a great extent.
o Ability to understand complex algorithms: Artificial Intelligence is a field
that completely depends on various algorithms that tell computers how to
learn and take actions further. There are a few important algorithms that you
must know before getting started with AI as follows:
o Classification algorithms
o Regression algorithms
o Clustering algorithms
o Basic knowledge of Statistics and modelling: Statistical modelling is
the use of mathematical models and statistical assumptions to describe
training data and predict future outcomes. A statistical model is a collection
of probability distributions over the set of all possible outcomes of an
experiment. So anyone looking to learn AI must also strengthen their
knowledge of statistics and modelling.
With these points in mind, you are now aware of a few common prerequisites for
learning Artificial Intelligence and are ready to start your career in this domain.

Now, we will discuss machine learning and important prerequisites to learning ML.
So, let's start with a quick introduction to Machine Learning technology.

What is Machine Learning?


Machine Learning is the branch of Artificial Intelligence that deals with enabling
computers/machines to learn and predict results based on past experience or
historical data without much human intervention.

If artificial intelligence helps in making intelligent systems/software, then machine
learning enables them to learn from available sample data and predict outcomes
more accurately. Hence, we can say AI and ML complement one another in different
aspects.

Types of Machine Learning


Machine Learning is primarily categorized into 3 types. These are as follows:

o Supervised ML
o Unsupervised ML
o Reinforcement ML

Applications of Machine Learning


Machine Learning is one of the buzzwords of the 21st century. It is currently being
used in several applications in different industries such as healthcare, medicine,
transportation, social media, marketing, infrastructure, education, product
recommendation, self-driving cars, chatbots, etc.

All organizations, small as well as large, want to implement machine learning
techniques in their business to grow more smartly than their competitors. Image
recognition and personal virtual assistants such as Alexa, Siri, and Cortana are the
most common examples of ML applications.

Machine learning is a much in-demand technology in the IT sector. Many newcomers
want to make a career in this domain, and besides freshers, many experienced people
also want to move into the ML industry to grow their skills and build a career here. As
machine learning is a relatively new technology in the IT sector, ML experts observe
several problems, such as lack of knowledge, lack of trained resources, and lack of
experience. However, organizations are continuously working to overcome these
issues. Hence, if you are also planning to move your career into machine learning,
there are some key prerequisites that you should focus on first before getting started
with ML.

Prerequisites to Learn Machine Learning (ML)


Since we now have a basic understanding of machine learning and its associated
concepts, let's look at the primary requirements for learning ML. Below are a few
prerequisites for getting started with machine learning technology:

Strong Knowledge of Mathematics:


Similar to Artificial intelligence (AI), machine learning also requires in-depth
knowledge of various mathematical concepts such as statistics, calculus,
probability, and linear algebra.

This is one of the most important prerequisites to learning ML. If you have sound
knowledge of mathematical concepts, you can easily build your own logic and
implement them in developing intelligent software to predict accurately.
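
As a small, hypothetical illustration of why this matters, the NumPy snippet below
uses basic statistics on a feature and a linear-algebra view of a prediction (a design
matrix multiplied by a parameter vector); the numbers and parameters are made up.

import numpy as np

# Basic statistics of a hypothetical feature (heights in cm).
heights = np.array([150.0, 160.0, 170.0, 180.0])
print("mean:", heights.mean(), "standard deviation:", heights.std())

# Linear algebra: predictions as a design matrix times a parameter vector.
X = np.column_stack([np.ones_like(heights), heights])   # columns: [1, x]
params = np.array([-100.0, 1.0])                         # hypothetical model parameters
predictions = X @ params
print("predictions:", predictions)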

Good understanding of Programming Languages:


If you want to grow rapidly in this domain, then you must have a good
understanding of programming languages such as Python, R, Java, C++, etc., to
implement the process. Programming languages help you perform basic tasks such
as the following (a minimal sketch appears after this list):

o Defining and calling functions


o Collecting data
o Implementing loops with multiple variable iterators
o Implementing various conditional statements such as if, if-else, etc.
o String formatting and passing statements, etc.
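
The following short, self-contained Python sketch touches each of the basics listed
above with made-up data: defining and calling a function, collecting data, a loop with
multiple iterator variables, conditional statements, and string formatting.

def describe(scores):
    """Print each student's result and return the average score."""
    total = 0
    for name, score in scores.items():        # loop with multiple iterator variables
        total += score
        if score >= 50:                        # conditional statement
            status = "pass"
        else:
            status = "fail"
        print(f"{name}: {score} ({status})")   # string formatting
    return total / len(scores)

data = {"Aisha": 72, "Ravi": 48, "Meera": 90}  # collecting data
print(f"average = {describe(data):.1f}")       # calling the function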

Hence, if you are really planning to enter the ML domain, you should learn at least
one of the programming languages given above. This will not only help you in learning
ML but will also help you with data modelling and analytics.

Strong knowledge of Data Analytics & Modeling:


Data modelling refers to the study of the structure of data sets to find hidden
patterns inside them. Machine Learning is a technology that is completely based on
the use of data and predictive data modelling. Hence, you must have a broad
knowledge of data and its properties to identify the errors in the ML models.

Conclusion
Machine learning and artificial intelligence are currently the most popular
technologies, and in the upcoming decades these technologies will be at the core of
the IT sector. As a prerequisite, both AI and ML require a sound knowledge of basic
mathematics concepts to implement in software or systems. You must have a good
grasp of statistics, linear algebra, matrices, calculus, probability, programming
languages, and data modelling. If you are confident in these areas, you can go ahead
and build your career in these fields. In this topic, we have discussed a few important
prerequisites for learning AI and ML. Hopefully, after reading this, you have a clear
understanding of the first steps for entering this domain.

List of Machine Learning Companies in India
Machine learning is one of the most popular technologies in the IT world and has
become the first choice for most startups and other organizations. Companies want
to automate their business, and machine learning helps them do so by enabling
smart software and systems for predictions. Machine learning allows businesses to
use their data to create powerful solutions for specific requirements. In India, the use
of machine learning and its technologies is also continuously increasing, and small
as well as large established companies see a promising future in this domain.
Accordingly, organizations are shifting their business onto this technology, often
through outsourcing, and in the last decade India has seen a rise in the number of ML
organizations. In this topic, "List of Machine Learning Companies in India", we will
discuss some of the most promising companies that are adopting ML technologies
and applications to grow and automate their business across the country.

1. TRIGMA
Trigma has been a leading provider of custom software development and
consultancy services for 12+ years with 200+ IT professionals. It aims to take client
business to a worldwide audience through smart technology, deep expertise, and
intelligence.

Trigma provides various services with a super speciality in the following technologies
and tools:


o CMS
o AI and ML
o Infra and DevOps
o Cloud
o Mobility
o Quality assurance (QA)
o Web development
o IoT
o SEO, digital marketing & advertising
o Brand strategy consulting and brand design
o Custom software and support
o Social media and content creation, etc.

Overall rating:5.0

Global Location: Mohali, Vancouver, Las Vegas

Trigma India Address:

Plot No. 228,

JLPL Industrial Estate

Sector 82, Mohali

India - 140306

+919855885133

Clients:

Trigma deals with individuals as well as organizations with ambition and imagination
to unleash the power of IT for their business and ideas. Clients of Trigma
are Samsung, UNDP, Disney, Suzuki, British council, Whirlpool, Government of
India (GOI), Alera, Shell, History, Hero, Walmart, Pernod ricard, Abbott and
IcarAsia, etc.
Company Website: https://trigma.com/

2. Talentica Software
Talentica Software has primarily dealt with startups for two decades with 170+
technology products and 1000+ IT professionals. Talentica has a good track record in
providing custom software development services for data protection platforms.

Talentica helps you to choose technology, setup architecture, and leverage emerging
tools & trends in the following technologies:

o Artificial Intelligence and Machine Learning


o Blockchain
o IoT and connected devices
o Big data
o Augmented reality
o Mobile & wearable
o DevOps and Infrastructure
o UX/UI and Open source

Overall rating:4.8

Global Location: Company has 3 presences in Pune (India) and 1 in the USA.

Registered Office address:

B-7/8, Anmol Pride, Baner Road, Baner, Pune, Maharashtra 411045

Contact: +91-2040751111

Company Website: https://www.talentica.com/

Clients:

o Emtech
o Rupeek
o Rostify
o Citrus
o Wideorbit
o Mist
o Tailored Mail
o Realization
o Step Solutions, etc.

3. InApp
InApp is one of the leading companies that offer world-class mobile and web
application services for startups, SMBs and enterprises around the globe, with 300+
graduate & post-graduate engineers for 21 years.

InApp provides the services in various technologies and tools such as:

o Web application development


o Mobile app development
o Custom Software development
o Testing & QA
o DevOps
o AI and ML
o Manufacturing
o Retail and Ecommerce
o Education, etc.

Global Location:

o USA (California, North Carolina and Washington DC)


o Kanagawa (Japan)

India Location:

o Kerala (Trivandrum, Technopark Phase III) India


o Bangalore (India)

Overall rating:4.9

Clients:

o Align
o Axa
o Informatica
o Innotas by planview
o Pro unlimited
o MPulse, etc.

Company Website: https://inapp.com/

4. Prolitus
Since its inauguration in 2005, Prolitus has constantly been delivering cutting-edge
technology to its clients, developing the best enterprise solutions and transforming
their businesses.

The company is well known for its technology synergies, which have successfully
moderated challenges faced by its clients. It consists of more than 200 techno-
functional professionals who aim to build market-leading advanced services and
solutions to grow clients' businesses efficiently.

The company offers services in Blockchain Consulting, Blockchain Application
Development, Exchange Development, OTC Exchange Platform, Wallet Development,
Cryptocurrency Development Services, STO Solutions, and more.

Prolitus Partners:

o Amazon web services (AWS)


o Odoo
o Hyperledger
o Binance
o Solana

Prolitus clients

o Dalmia (Bros.) private limited and Dalmia healthcare


o Plus necessities
o Xeikon
o Modern coach factory Raebareli
o com
o Nasscom foundation
o tv
o Grocermax
o Apollo, etc.

Global Location: UAE, Qatar, and India

India Location: Stellar IT Park, Tower B, 5th Floor, Sector 62, Noida - 201309, Uttar
Pradesh, India.

+91 85952 04895

Company Website: https://www.prolitus.com/

5. Webtunix AI
Webtunix is a group of talented people who share a common aim: offering ML as a
service that uses data to help organizations solve complex business problems.

Webtunix AI is primarily a service-based company that provides services in various
technologies such as machine learning, artificial intelligence, data science, deep
learning, data annotation, data analytics, Python development, data
visualization, data scraping and cleaning, etc.

Webtunix AI works as an ML consulting company that deals with the most advanced
problems in data science and machine learning.

The main focus of this company is to automate business processes using deep
learning techniques, which rely on huge amounts of big data and ML libraries.

ML as a service features include:

o Sentiment analysis and automated classification of unstructured content.


o Behavioural analytics from frequency, past actions or actions of similar users.
o Recommendation system: Contextual information based on preferences, user
behaviours and content similarities
o Comparisons of Unstructured Content for end-user applications.
o Build Relationships between content items based on metadata, topics,
concepts, genres or entities (such as names of people, organizations and
locations)

Global location:
Webtunix AI is currently offering ML as a service in San Francisco, New York, Tampa,
Virginia, Dallas, Texas, Washington DC, USA, Ontario Canada, Denmark, UK, UAE,
Singapore, Germany, Netherlands, Italy, China, Nigeria, Bangalore, Delhi.

Why choose Webtunix AI


There are several reasons that make Webtunix AI best for your career development:

o Curiosity & Creativity


o Security
o Novel Ideas
o Privacy
o Infrastructure, etc.

Company Website: https://www.webtunix.com/

6. QBurst
QBurst is a leading software development and consulting organization that has
offered cognitive solutions and custom software development services to SMB
companies for 17 years. QBurst is currently present in 14 cities with 2500+ projects,
150+ active clients, and 2000+ employees globally.

QBurst provides services on various technologies, which include:

o Cloud enablement
o Data and AI (Machine Learning, Data Science, Big Data, Data visualization,
data engineering, Artificial intelligence, RPA, Computer vision, etc.)
o Digital marketing
o Digitalization
o End-to-end (UX/UI design, API management, Cybersecurity, QA Automation,
DevOps, Performance Monitoring, etc.)
o SaaS (Salesforce, Oracle, ServiceNow, SharePoint, Microsoft Solution, etc.)

Clients:Qburst has worked with so many clients in past decades. Some of the top
clients are Dell, Adani, Omron, Mercedes Benz, United Nations, Genesys, Airtel,
Concentrix, Qlik, Bajaj Allianz, Greenpeace, Spectrum brands, ABB, etc.

Global Location:
QBurst is currently serving in America, Europe, the Middle East, South Asia, East Asia
and Oceania.

India Location:

In India, QBurst operates at multiple locations, including Trivandrum, Cochin,
Koratty, Calicut, Chennai, and Bangalore.

Company Website:https://www.qburst.com/

7. ValueCoders
Valuecoders is an Indian software and consulting company established in 2004. It is
one of the top-rated and recognized software outsourcing companies with a team of
650+ IT professionals and 2500+ clients globally ranging from startups to Fortune
500 companies.

ValueCoders works with various technologies and platforms to lend flexibility to your
software development and outsourcing needs. The technologies ValueCoders works
with are given below:

o Machine Learning (Chatbot, AI, ML and Tensorflow)


o Backend (.Net, Java, PHP, Laravel, Python, Node)
o Frontend and Full Stack (Angular, Vue, DevOps, React JS, Mean and Mern)
o Blockchain (Ethereum, smart contract and Hyperledger)
o Mobility (Android, iOS, React Native)

Services Provided by ValueCoders:


Valuecoders provide software outsourcing services to multiple industries globally,
which includes services such as:

o Healthcare
o Banking & Finance
o Retail and Ecommerce
o Media & entertainment
o Education and E-learning
o ISVs & Product firms
Global Location: North America, Asia Pacific region, Europe, Middle East & Africa,
India.

Company Website: https://www.valuecoders.com/

8. PixelCrayons
Pixelcrayons is a SaaS-based software IT outsourcing company that provides
software product development, digital transformation services, e-commerce
development services across the globe. Pixelcrayons is a 16+ years old company that
is running a business in 38+ countries with 450+ employees and 11500+ projects.

Being a software development outsourcing company, Pixelcrayons serves multiple
industries across the world, such as publishing & advertising, travel & tourism,
education & e-learning, transportation, social networking solutions, and healthcare.

Why choose Pixelcrayons?


Pixelcrayons is a result-oriented Indian outsourcing company with a team of expert
software developers who complete clients' projects within the service level agreement
(SLA). It also has business analysts who analyze the needs of a software project and
then prepare the quote. The process covers documentation of the essentials,
prototype creation, product development, testing, market release, integration with
existing business practices, and ongoing technical support. Hence, it is one of the
most trusted outsourcing partners in India.

Company website: https://www.pixelcrayons.com/

Conclusion
Machine learning has become an essential part of today's technologies. Without ML
technologies and applications, no one can compete in the industry. Small and large
companies alike are hiring ML engineers and data scientists to deliver a seamless
consumer experience around the globe. India is also continuously growing in
developing IT companies with ML solutions. We have covered a few of the best-rated
ML and data science companies that have a good reputation in India as well as across
the world.

Mathematics Courses for Machine Learning
Machine learning is one of the advanced technologies in the IT world that requires
in-depth knowledge of mathematics. Knowledge of mathematics is essential to start a
career in the machine learning domain, because ML algorithms are built entirely on
mathematical concepts such as probability, statistics, linear algebra, and advanced
calculus. If you want to accelerate your career in ML, you must brush up on and
groom your mathematics skills as well. Although there are many courses available
online, the right guidance will lead you to the right place to achieve your goals.

Hence, in this topic, "Maths courses for Machine Learning", we will discuss a few of
the best courses available on the internet. By following these courses, you can build
the basic math skills required for entering the machine learning world. Below are the
criteria on which we based our suggestions for these mathematics courses for ML.

Criteria

o Course ratings given by students who benefitted


o Course coverage
o Trainer engagement
o Interesting lectures
o Reviews suggested by various aggregators and forums

Now, without wasting time, let's start discovering a few best online mathematics
courses for machine learning.
Best Online Mathematics courses for
Machine Learning
1. Mathematics for Machine Learning Specialization
2. Data Science Math Skills
3. Introduction to Calculus
4. Probabilistic Graphical Models Specialization
5. Statistics with R Specialization
6. Probability and Statistics
7. Mathematical Foundation for Machine Learning and AI

1. Mathematics for Machine Learning Specialization
As per various reviews, this is one of the best courses provided by Coursera for
building the mathematics skills needed for machine learning. It covers almost all the
mathematics topics required for ML, and it aims to fill the gap by building an intuitive
understanding of the mathematics.


This course is organised into three series, as follows:

o In the first series, you will learn important concepts of linear algebra, vectors,
matrices, and their relationship with data in ML.
o The second series focuses on multivariate calculus, which gives you in-depth
knowledge of optimizing fitting functions to get good fits to data.
o The third and last series is Dimensionality Reduction with Principal Component
Analysis, which enables you to apply all of this mathematics knowledge in
real scenarios.

After completing all series, you will feel confident enough to start a career in machine
learning.

Course description:

o Mathematics for Machine Learning: Linear Algebra


o Mathematics for Machine Learning: Multivariate Calculus
o Mathematics for Machine Learning: PCA

What you will learn:


This course will help you learn many important mathematics concepts such
as principal component analysis, multivariate calculus, linear algebra (basic and
advanced), vector calculus, gradient descent, Python, dimensionality reduction,
eigenvalues and eigenvectors, etc.
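
To give a flavour of how these topics fit together, here is a small, purely illustrative
NumPy sketch of principal component analysis via the eigenvalues and eigenvectors of
a covariance matrix; the data is randomly generated and the code is not part of the
course itself.

import numpy as np

# Generate correlated 2-D data (hypothetical).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
Xc = X - X.mean(axis=0)                        # centre the data

cov = np.cov(Xc, rowvar=False)                 # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)         # eigen-decomposition
order = np.argsort(eigvals)[::-1]              # largest variance first
components = eigvecs[:, order]

X_reduced = Xc @ components[:, :1]             # project onto the first principal component
print("explained variance ratio:", eigvals[order][0] / eigvals.sum())
print("reduced shape:", X_reduced.shape)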

Benefits of this course:


After completing this course, you will earn a shareable certificate and course
certificates. You will also get the entire course package, such as recorded video
lectures, class notes, practice theoretical & programming assignments, and graded
quizzes.

Pre-requisites for this course:


If you are enrolling on this course, you must have matrix level mathematics
knowledge with a basic understanding of Python and NumPy.

Course Rating- 4.6 out of 5

Source- Imperial College London

Course duration- 16 weeks

Important link: Click here to enrol and know more about this course.

2. Data Science Math Skills


This course is offered by Duke University Durham (North Carolina). This course helps
you in building core concepts of algebra required for machine learning, such as
vocabulary, notation, concepts, and algebra rules.

Topics included in this course (a small Bayes' theorem example follows this list):

o Set theory
o Venn diagrams
o Properties of the real number line
o Sigma notation, interval notation and quadratic equations
o Concepts of a Cartesian plane, slope, and distance formulas
o Functions and graphs
o Instantaneous rate of change and tangent lines to a curve
o Logarithmic functions
o Exponential functions
o Probability
o Bayes Theorem
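
As a quick taste of the Bayes' theorem topic, here is a tiny worked example in Python
with made-up numbers: the probability of having a disease given a positive test, when
the disease affects 1% of people and the test is 95% sensitive with a 10%
false-positive rate.

p_disease = 0.01                 # prior: 1% of people have the disease
p_pos_given_disease = 0.95       # sensitivity
p_pos_given_healthy = 0.10       # false-positive rate

# Total probability of a positive test, then Bayes' rule.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")   # about 0.088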

Benefits of this course:


You can earn a Shareable Certificate after successful completion of this course.

Pre-requisites:

To enrol on this course, you do not need a prior understanding of the maths
required for ML and Data Science.

Course Rating- 4.5 out of 5

Source- Duke University Durham (North Carolina)

Course duration- 13 hours

Important link: Click here to enrol and know more about this course package.

3. Introduction to Calculus
This is one of the highest-rated maths courses over the internet by David Easdown.
It covers the entire calculus concepts required for machine learning solutions.
Further, this course helps you to maintain a balance between theory and the
application of calculus.

This course is divided into 5-weeks plans as follows:

1st Week: Precalculus (Setting the scene)

2nd Week: Functions (Useful and important repertoire)

3rd Week: Introducing the differential calculus

4th Week: Properties and applications of the derivative

5th Week: Introducing the integral calculus

Benefits of this course:


Upon completion of this course, you will get an electronic Certificate on your
Accomplishments page.

Pre-requisites:
You must have a basic understanding of calculus and general mathematics concepts
to enrol on this course. This course is a good fit if you mainly want to master
calculus.

Rating- 4.8 out of 5

Course Provider- David Easdown (The University of Sydney)

Course Duration- 59 Hours

Important Link: Click here to enrol and know more about this course.

4. Probabilistic Graphical Models Specialization
This course is offered by Stanford University and provides a rich framework for
probability distributions over complex domains: joint (multivariate) distributions over
large numbers of random variables that interact with each other.

The course is designed to help you learn various important skills such as inference,
Bayesian networks, belief propagation, graphical models, Markov random fields,
Markov chain Monte Carlo (MCMC) algorithms, and the Expectation-Maximization (EM)
algorithm.

The complete course includes three specializations, which are as follows:

Course 1- Probabilistic Graphical Models 1: Representation

Course 2- Probabilistic Graphical Models 2: Inference

Course 3- Probabilistic Graphical Models 3: Learning

Benefits:

o The course provides a shareable specialization and certification after
successful completion of the course.
o Self-paced, adaptable, and flexible learning option
o 24*7 Availability of Course videos and readings.
o Different Practice Quizzes
o Assignments with Peer Feedback
o Quizzes with Feedback with Gradings
o Programming Assignments with a Grading system

Pre-requisites:
Before enrolling on this course, one should have a basic understanding of
mathematics and knowledge of at least one programming language.

Course Rating- 4.6/5

Course Provider- Daphne Koller (Stanford University)

Course duration- 4 Months (11 hours/week)

Important Link: Click here to enrol and know more information related to this
course.

5. Statistics with R Specialization


This course is offered by Duke University under the guidance of Mine Çetinkaya-
Rundel, David Banks, Colin Rundel, and Merlise A. Clyde.
This course teaches you to analyze and visualize data in R and create reproducible
data analysis reports. It helps you develop a conceptual understanding of the unified
nature of statistical inference and to perform frequentist and Bayesian statistical
inference and modelling in order to understand natural phenomena and make
data-based decisions. Further, it enables you to communicate statistical results
correctly, effectively, and in context without relying on statistical jargon, to critique
data-based claims and evaluate data-based decisions, and to wrangle and visualize
data with R packages for data analysis.

There are 5 Courses in this Specialization as follows:

o Introduction to Probability and Data with R


o Inferential Statistics
o Linear Regression and Modeling
o Bayesian Statistics
o Statistics with R Capstone

Extra Benefits:

o Shareable Specialization and Course Certificates


o Self-Paced Learning Option
o Course Videos & Readings
o Practice Quizzes
o Assignments with Peer Feedback & grades
o Quizzes with Feedback & grades
o Programming Assignments with Grades

Pre-requisites:
Before enrolling on this course, you should have prior knowledge of basic mathematics
concepts; a good interest in data analysis will be an advantage. No previous
programming knowledge is required to start this course.

Course rating: 4.6 out of 5

Course provider: Duke University

Course Duration: Approx. 7 months

Important Link: Click here to enrol and know more about this course.
6. Probability and Statistics
This course is offered by the University of London under the guidance of Dr James
Abdey. It covers probability, descriptive statistics, point and interval estimation of
means and proportions, etc. It helps build the essential skills for good decision-making
and for predicting future results.

This course includes various topics:

o Dealing with Uncertainty and Complexity in a Chaotic World


o Quantifying Uncertainty With Probability
o Describing The World The Statistical Way
o On Your Marks, Get Set, Infer!
o To p Or Not To p?
o Applications

Extra benefits:
You will be provided with a Shareable Certificate after completion of this course.
Further, you will also get the entire course agenda, such as recorded video lectures,
class notes, practice theoretical & programming assignments, Graded Quizzes, etc.

Pre-requisites:
This course is specially designed for beginners; hence no mathematics and
programming knowledge is required to start this course.

Course rating: 4.6 out of 5

Course provider: University of London

Course duration: 16 hours

Important Link: Click here to enrol and know more about this course

7. Mathematical Foundation for Machine Learning and AI
This course is designed by Eduonix Learning Solutions and offered on Udemy. It
enables you to learn the basic math concepts that are required for ML and also to
implement them in R and Python.

It provides you with detailed information on some important topics of Mathematics
such as linear algebra, multivariate calculus, probability theory, etc.

Mathematics is one of the key players to develop programming skills, and this course
is designed in the exact same way to help you to master the mathematical
foundation required for writing programs and algorithms for AI and ML.

Course content
This course is categorised into 3 sections:

1) Linear Algebra:

It helps in understanding the parameters and structures of different ML algorithms.
Further, it also gives a basic idea of neural networks. It includes the following topics:

o Scalars, Vectors, Matrices, Tensors


o Matrix Norms
o Special Matrices and Vectors
o Eigenvalues and Eigenvectors

2) Multivariate Calculus

It helps in understanding the learning part of ML. It is what is used to learn from
examples, update the parameters of different models and improve the performance.

It includes various topics as follows:

o Derivatives
o Integrals
o Gradients
o Differential Operators
o Convex Optimization

3) Probability Theory
Probability theory is one of the important concepts that helps us make assumptions
about the underlying data in deep learning and AI algorithms. It is important for us
to understand the key probability concepts.

It includes various topics as follows:

o Elements of Probability
o Random Variables
o Distributions
o Variance and Expectation
o Special Random Variables

Extra benefits:
Along with a certificate of completion, video lectures and online study materials, this
course also includes projects and quizzes that unlock with each section, which helps
you to solidify your knowledge. Further, this course not only helps you build your
own algorithms but also put them to use in your next projects.

Pre-requisites:
This course is designed for beginners as well as experienced learners. Further, basic
knowledge of Python is needed as the concepts are coded in Python and R.

Course rating: 4.5 out of 5

Course provider: Eduonix Learning Solutions, Eduonix-Tech

Course duration: 4.5 hours

Important link: Click here to enrol and know more about this course.

Conclusion
Mathematics is always a key player when entering the programming domain. Whatever the
programming language, Java, Python, R, Apex, C, etc., good mathematics knowledge is
required to build your logical concepts and algorithms. In this topic, we have
discussed a few important and best maths courses available online for learning
Machine Learning and AI. Hopefully, after reading this article, you will be able to
choose the best maths course to start your journey in ML and build your career in
the IT world.
Probability and Statistics Books for
Machine Learning
Probability and statistics are both essential concepts for Machine Learning.
Probability is about predicting the likelihood of future events, while statistics
involves the analysis of the frequency of past events.

Nowadays, Machine Learning has become one of the first choices for most freshers
and IT professionals. But, in order to enter this field, one must have some pre-
specified skills, and one of those skills is Mathematics. Yes, Mathematics is very
important to learn ML technology and develop efficient applications for business.
When talking about mathematics for Machine Learning, the focus is especially on
Probability and Statistics, which are the essential topics to get started with ML.
Probability and statistics are considered the foundation of ML and data science for
developing ML algorithms and building decision-making capabilities. Also, Probability
and statistics are the primary prerequisites to learn ML.

In this topic, we will discuss a few important books on Probability and statistics that
help you in making the ML process easy and implementing algorithms to business
scenarios too. Here, we will discuss some of the best books for Probability and
Statistics from basic to advanced levels.

Probability in Machine Learning


Probability is the bedrock of ML, which tells how likely an event is to occur. The
value of probability always lies between 0 and 1. It is the core concept as well as a
primary prerequisite to understanding ML models and their applications.

Probability can be calculated as the number of ways an event can occur divided
by the total number of possible outcomes. Suppose we toss a fair coin; then the
probability of getting a head as the outcome can be calculated with the formula below:

P(H) = Number of ways a head can occur / Total number of possible outcomes

P(H) = 1/2

P(H) = 0.5

Where;

P(H) = Probability of getting a head as the outcome while tossing a coin.
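As a small illustrative sketch (not part of the original text), the theoretical value above can be checked empirically with a short Python simulation; the 100,000 tosses and the helper function below are arbitrary choices made only for demonstration:

import random

def empirical_probability(event_count, total_trials):
    # Empirical probability = number of times the event occurred / total observations
    return event_count / total_trials

trials = 100_000
heads = sum(1 for _ in range(trials) if random.random() < 0.5)  # simulate fair coin tosses
print("Estimated P(H):", empirical_probability(heads, trials))   # should be close to 0.5

Running this repeatedly gives values close to the theoretical 0.5, which is exactly the relationship between empirical and theoretical probability described in the next section.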

Types of Probability
For a better understanding of probability, it can be further categorized into
different types as follows:

Empirical Probability: Empirical probability can be calculated as the number of
times the event occurs divided by the total number of incidents observed.

Theoretical Probability: Theoretical probability can be calculated as the number of
ways the particular event can occur divided by the total number of possible
outcomes.

Joint Probability: It tells the probability of two random events occurring
simultaneously.

P(A ∩ B) = P(A) . P(B)   (when events A and B are independent)


Where;

P(A ∩ B) = Probability of both events A and B occurring.

P (A) = Probability of event A

P (B) = Probability of event B

Conditional Probability: It is given by the probability of event A given that event B
has occurred.

The Probability of an event A conditioned on an event B is denoted and defined as;

P(A|B) = P(A∩B)/P(B)

Similarly, P(B|A) = P(A ∩ B)/P(A). We can write the joint probability of A and B as
P(A ∩ B) = P(A).P(B|A), which means: "The chance of both things happening is the
chance that the first one happens, and then the second one given that the first
thing happened."
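As a small illustrative sketch (the numbers below are invented purely for demonstration), these relationships can be checked in a few lines of Python:

# Hypothetical example values, chosen only for illustration
p_a = 0.6            # P(A)
p_b = 0.4            # P(B), assumed for the example
p_b_given_a = 0.5    # P(B|A)

# Joint probability via the chain rule: P(A and B) = P(A) * P(B|A)
p_a_and_b = p_a * p_b_given_a

# Conditional probability recovered from the joint: P(A|B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

print("P(A and B) =", p_a_and_b)   # 0.3
print("P(A|B)     =", p_a_given_b) # 0.75

The same chain-rule relationship is what the quoted sentence above expresses in words.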

Now that we have a basic understanding of the probability required for Machine
Learning, we will discuss a basic introduction to statistics for ML.

Statistics in Machine Learning


Statistics is also considered the base foundation of machine learning; it deals
with finding answers to the questions that we have about data. In general, we can
define statistics as:

Statistics is the part of applied Mathematics that deals with studying and developing
ways of gathering, analyzing, interpreting and drawing conclusions from empirical data.
It can be used to make better-informed business decisions.

Statistics can be categorized into 2 major parts. These are as follows:

o Descriptive Statistics
o Inferential Statistics

Use of Statistics in ML
Statistical methods are used to understand the training data as well as interpret the
results of testing different machine learning models. Further, statistics can be used to
make better-informed business and investing decisions.
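As a minimal sketch of the two branches mentioned above (the sample of model errors below is invented purely for illustration, and SciPy's one-sample t-test is just one possible inferential tool):

import numpy as np
from scipy import stats

# Hypothetical sample of model prediction errors, used only for illustration
errors = np.array([1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0])

# Descriptive statistics: summarize the observed data
print("mean:", errors.mean(), "std:", errors.std(ddof=1), "median:", np.median(errors))

# Inferential statistics: test whether the true mean error differs from 1.0
t_stat, p_value = stats.ttest_1samp(errors, popmean=1.0)
print("t =", t_stat, "p =", p_value)

Descriptive statistics only describe the sample at hand, while the t-test draws an inference about the underlying population, which is the distinction between the two categories listed above.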
Best Probability and Statistics books for
Machine Learning
Probability and statistics are both equally important for learning Machine Learning
technology, but the main question concerns the best books or sources for learning
Probability and statistics for ML. Although so many books are available over
the internet as well as in offline stores, choosing the most appropriate book is the main
problem for aspirants. A few of the best books on Probability and Statistics are
given as follows:

1. Probability for Statistics and Machine Learning


Author of the Book: Anirban DasGupta

Price (Amazon): $118.15

Star Ratings: 3.6/5

Overview: This book is written by Anirban DasGupta and includes all fundamental and
advanced topics of Probability and Statistics for ML. As per different reviews, this is
one of the best books available in both online and offline modes. This book mainly
consists of a unification of probability, statistics, and machine learning tools that
provides a complete background for self-study and future research in multiple areas.

Topics covered in this book:

o Review of Univariate Probability


o Multivariate Discrete Distributions
o Multidimensional Densities
o Advanced Distribution Theory
o Multivariate Normal and Related Distributions
o Finite Sample Theory of Order Statistics and Extremes
o Essential Asymptotics and Applications
o Characteristic Functions and Applications
o Asymptotic of Extremes and Order Statistics
o Markov Chains and Applications
o Random Walks
o Brownian Motion and Gaussian Processes
o Poisson Processes and Applications
o Discrete-Time Martingales and Concentration Inequalities
o Probability Metrics
o Empirical Processes and VC Theory
o Large Deviations
o The Exponential Family and Statistical Applications
o Simulation and Markov Chain Monte Carlo
o Useful Tools for Statistics and Machine Learning

2. Python for Probability, Statistics, and Machine Learning

Author of the Book: José Unpingco

Price (Amazon): $82.36

Star Ratings: 4.4/5

This book is available with the latest Python version 3.6+, which includes all essential
areas of Probability, Statistics, and ML illustrated using Python. This book gives you
exposure to various machine learning methods and examples using different
analytical methods and Python codes which help you in deploying your theoretical
concepts into real-time scenarios. It also provides detailed descriptions of various
important results using modern Python libraries such as Pandas, Scikit-learn,
TensorFlow, and Keras. Many abstract mathematical ideas, such as convergence in
probability theory, are developed and illustrated with numerical examples.

Topics covered in this book: This book is divided into 5 chapters as follows:

o Getting Started with Scientific Python


o Probability
o Statistics
o Machine Learning
o Correction to: Probability

3. An Introduction to Statistical Learning


Authors of the Book: Gareth James, Daniela Witten, Trevor Hastie and Rob Tibshirani
Price (Amazon): $29.22

Star Ratings: 4.5/5

Overview: An Introduction to Statistical Learning with Applications in R is published by
Springer in two editions. Statistics is one of the main toolkits for machine learning
and aspiring data scientists. This book provides a broad and less technical
treatment of key topics in statistical learning with the help of R. This book is suitable
for all readers who want good exposure to data analysis with statistical learning.

This book is available in various languages such as Chinese, Italian, Japanese,
Korean, Mongolian, Russian and Vietnamese.

The authors of this book, Gareth James, Daniela Witten, Trevor Hastie and Rob
Tibshirani, have divided it into two editions.

Topics covered in this book:

1st Edition of this book covers the following topics:

o Sparse methods for classification and regression


o Decision trees
o Boosting
o Support vector machines
o Clustering

2nd edition of this book covers the following topics:

o Deep learning
o Survival analysis
o Multiple testing
o Naive Bayes and generalized linear models
o Bayesian additive regression trees
o Matrix completion

This book is available in both online and offline modes. Either you can download a
PDF of this book or also order it on the Amazon marketplace site.

Get this book: Click here to order this book online.

4. The Elements of Statistical Learning


Authors of Book: Jerome Friedman, Trevor Hastie, and Robert Tibshirani

Price (Amazon): $84.95

Star Ratings: 4.6/5

Overview: This book illustrates important ideas in different fields such as medicine,
finance, and marketing within a common framework.

As the book takes a statistical approach, it mainly focuses on explaining the
concepts rather than the mathematics. It contains examples of each topic with
full-colour graphics.

This book is one of the best resources for Machine Learning professionals and one
who is interested in data mining concepts. The various concepts of the book range
from supervised to unsupervised learning.

It includes important topics such as neural networks, support vector machines,
classification trees and boosting. This book also contains a chapter on
methods for "wide" data (p bigger than n), along with multiple testing and false
discovery rates.

5. Probability and Statistical Inference


Authors: Robert V. Hogg, Elliot Tanis, and Dale Zimmerman

Price on Amazon: $181.99

Star Rating: 4.9/5

Overview: This book is written and designed by three popular statisticians named
Robert V. Hogg, Elliot Tanis, and Dale Zimmerman. The latest edition of this book is
the tenth edition, which focuses on the existence of variation in each process, and
also helps readers to understand this variation with the help of Probability and
Statistics.

The book includes the applied introduction to Probability and statistics that
reinforces the mathematical concepts with different real-world examples and
applications. These examples also illustrate relevance to the key concepts of statistics.
The book's syllabus is designed for two-semester courses, but it can be completed in
a one-semester course only.

There is no requirement to have prior knowledge of Probability and statistics to read
this book, but sound knowledge of calculus is required.
This book includes popular concepts of Probability and statistics such as Probability,
Conditional Probability, Bayes' Theorem, statistical hypotheses, standard chi-square
tests, analysis of variance including general factorial designs, and some procedures
associated with regression, correlation, and statistical quality control, etc.

Conclusion
Machine learning is a very broad technology with many concepts related to
mathematics and computer programming; based on these, ML can be used to build
intelligent software & systems for future prediction. If you are confident in
basic and advanced mathematics such as Probability and statistics, then you can
perform better in this industry. Hopefully, this topic will help you to select the best
books for Probability and statistics.

Risks of Machine Learning


Machine Learning is one of the most trending technologies for IT professionals as
well as business tycoons. Almost all small, as well as large-sized companies want to
run their business using machine learning technology. ML systems have various
disruptive capabilities in different sectors such as healthcare, finance, banking,
marketing, infrastructure, trading, IT, etc.

Although implementing machine learning technology in your business can be difficult
and challenging, having deep knowledge of machine learning concepts and their
algorithms makes you capable of implementing ML systems effectively.
Although machine learning has become an essential part of today's technology and
businesses, there are still many risks that data scientists and machine learning
professionals find while analyzing ML systems. These ML risks include security
risk, poor data quality, overfitting, data biasing, lack of strategy and experience, etc.
In this topic, "Risks of Machine Learning", we will discuss various risks associated
with Machine Learning systems and how we can assess machine learning risks. So,
let's start with a quick introduction to machine learning and then the important risks
associated with ML systems.

What is Machine Learning?


Machine Learning is defined as the sub-branch of artificial intelligence (AI) and
computer science that deals with making systems capable of automatically learning,
predicting, and improving from historical data. It makes machines more intelligent,
improving with new data without human intervention.


Types of Machine Learning


Machine Learning helps to solve different complex business problems, and based on
learning methods, it can be categorised into mainly four types. These are as follows:

o Supervised Machine Learning


o Unsupervised Machine Learning
o Semi-Supervised Machine Learning
o Reinforcement Learning

Applications of Machine Learning


Machine learning uses a huge amount of structured as well as unstructured data and
enables a computer system to accurately predict future events. Machine learning is a
broad term, applicable in various industries, and has many applications as
well. Below is a list of a few important ML applications:

o Healthcare and medicine


o Finance & banking
o Marketing and trading
o Personal virtual assistant
o Speech recognition, text recognition and image recognition
o Traffic prediction
o Product recommendation
o Self-driving cars
o Email spam and filtering
o Fraud detection
o Automatic language translation

Risks of Machine Learning


Nowadays, Machine Learning is playing a big role in helping organizations in
different aspects such as analyzing structured and unstructured data, detecting risks,
automating manual tasks, making data-driven decisions for business growth, etc. It
is capable of replacing a huge amount of human labour by applying automation
and providing insights to make better decisions for assessing, monitoring, and
reducing the risks for an organization.

Although machine learning can be used as a risk management tool, it also carries
many risks itself. While 49% of companies are exploring or planning to use machine
learning, only a small minority recognize the risks it poses: only 41% of
organizations in a global McKinsey survey say they can comprehensively identify and
prioritize machine learning risks. Hence, it is necessary to be aware of some of the
risks of machine learning and how they can be adequately evaluated and managed.

Below are a few risks associated with Machine Learning:

1. Poor Data

As we know, a machine learning model only works on the data that we provide to it;
in other words, it depends completely on the human-given training data. The output
reflects the input, so if we feed in poor data, the ML model will generate erratic
output. Poor data or dirty data includes errors in the training data, outliers, and
unstructured data, which cannot be adequately interpreted by the model.

2. Overfitting

Overfitting is commonly found in non-parametric and non-linear models, which are
more flexible in learning the target function.

An overfitted model fits the training data so perfectly that it also captures its noise
instead of the underlying pattern. It means it won't be able to generalize well when it
comes to testing on real data.
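A minimal sketch of how overfitting typically shows up in practice (the synthetic data and the unconstrained decision tree below are arbitrary illustrative choices): the model scores almost perfectly on its training data but noticeably worse on held-out data.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy classification data (illustrative only)
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# An unconstrained tree is flexible enough to memorize the training set
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("Training accuracy:", model.score(X_train, y_train))  # close to 1.0
print("Test accuracy:", model.score(X_test, y_test))        # clearly lower -> overfitting

A large gap between the two scores is the practical symptom of the problem described above.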

3. Biased data

Biased data means that human biases can creep into your datasets and spoil
outcomes. For instance, the popular selfie editor FaceApp was initially inadvertently
trained to make faces "hotter" by lightening the skin tone, a result of having been fed
a much larger quantity of photos of people with lighter skin tones.

4. Lack of strategy and experience:

Machine learning is a relatively new technology in the IT sector; hence, the limited
availability of trained and skilled resources is a big issue for industries. Further, a
lack of strategy and experience due to fewer resources leads to wasted time and money
and negatively affects the organization's production and revenue. According to
a survey of over 2,000 people, 860 reported a lack of clear strategy and 840
reported a lack of talent with appropriate skill sets. This survey shows how a lack of
strategy and relevant experience creates a barrier to the development of machine
learning in organizations.

5. Security Risks

Security of data is one of the major issues for the IT world. Security also affects the
production and revenue of organizations. When it comes to machine learning, various
types of security risks exist that can compromise machine learning algorithms and
systems. Data scientists and machine learning experts have reported 3 types of
attacks, primarily for machine learning models. These are as follows:

o Evasion attacks: These attacks commonly arise from adversarial input introduced
into the models; hence they are also known as adversarial attacks. An evasion
attack happens when the network is fed adversarial examples as input that can
influence the classifiers, i.e., disrupt ML models. Such a security violation
involves supplying malicious data that gets classified as genuine. A targeted
attack attempts to allow a specific intrusion or disruption, or alternatively to
create general mayhem.
Evasion attacks are the most dominant type of attack, where data is modified
in a way that makes it seem genuine. Evasion doesn't involve influence over
the data used to train a model, but it is comparable to the way spammers and
hackers obfuscate the content of spam emails and malware.
o Data Poisoning attacks: In data poisoning attacks, the source of the raw data used
to train the ML models is known, and the attacker strives to bias or "poison" that
data to compromise the resulting machine learning model's accuracy. The effects of
these attacks can be overcome by prevention and detection. Through proper
monitoring, we can protect ML models from data poisoning.
Model skewing is one of the most common types of data poisoning attack, in
which spammers trick the classifier into categorising bad input as good.
o Model Stealing: Model stealing is one of the most important security risks in
machine learning. Model stealing techniques are used to create a clone model based
on the information or data used in the training of a base model. Model stealing is
a major concern for ML experts because ML models are valuable intellectual
property of organizations and consist of sensitive user data such as account
details, transactions, financial information, etc. The attackers use the public API
and sample data of the original model and reconstruct another model having a
similar look and feel.

6. Data privacy and confidentiality

Data is one of the main key players in developing Machine learning models. We
know machine learning requires a huge amount of structured and unstructured data
for training models so they can predict accurately in future. Hence, to achieve good
results, we need to secure data by defining some privacy terms and conditions as
well as making it confidential. Hackers can launch data extraction attacks that can fly
under the radar, which can put your entire machine learning system at risk.

7. Third-party risks

These types of security risks are less common in industry, as the chances of their
occurrence are minimal. Third-party risks generally exist when a company outsources
its business to third-party service providers who may fail to properly govern a
machine learning solution. This can lead to various types of data breaches in the
ML industry.

8. Regulatory challenges

Regulatory challenges occur whenever a knowledge gap is found in an organization,
such as when teammates are not aware of how ML algorithms work and make
decisions. Hence, a lack of knowledge to justify decisions to regulators can also be a
major security risk for industries.

How can we assess Machine Learning Risks?
Machine learning is the hottest technology in the IT world. Although ML is being
used in every industry, it has some associated risks too. We can assess these
risks once an ML solution is implemented in your organization. Below are a few
important steps to assess machine learning risks in your organization. These are as
follows:

o Implement a machine learning risk management framework instead of a
general framework to identify the risks in real-time scenarios.
o Provide training to employees on ML technologies and give them the
knowledge to follow protocols for effective risk management in ML.
o Develop assessment criteria to identify and manage the risks in the business.
o Adapt the risk monitoring process and risk appetites regularly, based on past
experience and customer feedback.

Hence, machine learning risks can be identified and minimized through appropriate
talent, strategy and skilled resources throughout the organization.

Conclusion
There is no surprise if we say machine learning is a continuously growing technology
that is employed in many industries to make business automated and faster. But,
as we have seen, there are also some risks associated with machine learning
solutions. However, data scientists and ML experts are continuously researching ML
technology and developing new solutions for improving it. In this topic, we have
discussed a few important risks associated with ML solutions when implementing them
in your business, as well as steps to assess these risks. Hopefully, after reading this
topic, you have in-depth knowledge of the various risks associated with machine
learning.

Best Laptops for Machine Learning


If you are preparing to become a machine learning engineer, then apart from a good
knowledge of machine learning algorithms and concepts, it is also necessary to
choose the best suitable performance-oriented laptop/computer.

There are many popular brands in the market that claim their laptops are the best,
but you should not stop at their name and reputation alone. Instead,
you must do a bit of research before purchasing the best laptop for machine
learning applications. Further, apart from configuration and features, budget is also a
crucial factor when purchasing a laptop. In this article, ''Best Laptops for Machine
Learning'', we will discuss various laptops with their features and configurations of
GPU and RAM.

Before purchasing any laptop for machine learning, we must be aware of a few
important factors such as portability, RAM, CPU, GPU, etc. So, let's start with a
quick overview of these factors.

1. Portability: This is one of the most important factors when you are purchasing the
best suitable device for Machine Learning especially. However, if you do not have any
concern with portability, then you can go with a personal computer. Nowadays, all
companies are following remote working culture, so portability becomes a significant
factor when purchasing a device.


The higher the processing power, the heavier the laptop. This can mean several
things:

o More RAM leads to More Weight


o More Battery leads to More Weight
o Larger Screen Size leads to More Weight
o Higher Power leads to Lower Battery Life.

2. RAM (Random Access Memory): It is highly recommended to purchase at least a


16 GB RAM laptop for ML, but if you can afford more money, then always purchase a
32 GB RAM laptop. If RAM is less, there would be many problems when performing
multitasking.
3. CPU (Central Processing Unit): Always choose a more powerful, high-
performance processor, specifically for machine learning. It is
recommended to go with processors above an Intel Core i7 7th Generation.

4. GPU: This is one of the key factors required for solving complex matrix problems.
In machine learning and deep learning, there are various neural networks that are
computationally intensive. Hence, a GPU becomes important for enabling parallel
processing. Tasks that would otherwise take weeks or months can be completed
within a few hours with the help of a GPU.

5. Storage: Although storage matters when purchasing a laptop, if you find the
storage insufficient, you can opt for cloud storage options too. Further, a minimum of
1TB HDD is advised while purchasing any laptop, especially for Machine Learning.

6. Operating System (OS): When talking about operating systems, you can go with
Linux, Windows, or macOS.

List of Best Laptops for Machine Learning


1. Lambda TensorBook
2. GIGABYTE G5 GD
3. Apple MacBook Pro 15″
4. Acer Nitro 5 AN515
5. ASUS ROG Strix GL702VS
6. Acer Predator Helios 300
7. Razer Blade 15
8. MSI P65 Creator-654 15.6″

Now, we will discuss in brief all the above-listed laptops individually.

1. Lambda TensorBook
This is one of the best laptops with out-of-the-box functionality and comes pre-installed
with TensorFlow and PyTorch. This laptop is specially designed for deep learning
and ships with Lambda Stack, which includes frameworks like TensorFlow and
PyTorch. Lambda Stack also makes it easy to keep tools such as Ubuntu,
TensorFlow, PyTorch, Jupyter, NVIDIA CUDA, and cuDNN up to date.

Features and Specifications:

o GPU: RTX 3080 Super Max-Q (8 GB of VRAM).


o CPU: Intel Core i7-10870H (16 threads, 5.00 GHz turbo, and 16 MB cache).
o Memory: 64 GB of DDR4 SDRAM.
o Storage: 2 TB (1 TB NVMe SSD + 1 TB of SATA SSD).
o Operating system: Ubuntu 20.04 and/or Windows 10 Pro.
o Link to buy: Click here

2. GIGABYTE G5 GD
Gigabyte has always been a first choice for data scientists, machine learning
professionals, and gamers. This laptop comes in under $1000 and is available in
various online as well as offline stores. In this price range, this laptop comes with a
very nice set of specifications.

If you are really looking for the most affordable laptop for machine learning and
gaming, then this laptop will fulfill all your requirements.

Features and Specification:

o Processor: Intel Core i5 up to 4.5 GHz.


o Memory: 16 GB DDR4.
o Hard Drives: 512 GB NVMe SSD.
o GPU: NVIDIA GeForce RTX 3050 Ti 4 GB.
o Computing Power: 8.6
o Ports: 1x HDMI 2.0, 1x USB 3.1 Type-C, 2x USB 3.1, 1x USB 2.0.
o OS: Windows 10 Home.
o Weight: 4.80 lbs.
o Display: 15.6, 1920 x 1080.
o Connectivity: WiFi 802.11ax, Gigabit LAN (Ethernet), Bluetooth.
o Battery life: Average ~ 4 hours.
o Link to buy: Click here

3. Apple MacBook Pro 15″


This is also one of the most popular multi-tasking laptops for Machine Learning and
Deep Learning. This becomes special for Apple lovers who don't want to compromise
with the Apple brand.

It comes with both 14-inch and 16-inch displays and is also a great option
for machine learning professionals. The MacBook is made of aluminium, so it is
quite a bit more expensive than other laptops. The price range of the Apple MacBook Pro
varies from 2600 USD to 3000 USD.

Features and Specifications:

o Processor: 2.6GHz 6-core Intel Core i7


o Memory: 16GB of 2400MHz DDR4 onboard memory.
o Hard Drives: 256 GB SSD/512 GB SSD and configurable to 512GB, 1TB, 2TB, or
4TB SSD.
o GPU: Radeon Pro 555X with 4GB of GDDR5 - Intel UHD Graphics 630.
o OS: macOS
o Weight: 4.02 pounds (1.83 kg)
o Display: 13.30-inch and 2560x1600 pixels and 15.4″ Retina Display IPS
Technology 2880×1800.
o Battery life: Up to 10 hours wireless web.
o Other Features: Voice Control, VoiceOver, Zoom, Increase Contrast, Reduce
Motion, Siri and Dictation, Switch Control, Closed Captions, and Text to
Speech
o Link to buy: Click here

4. Acer Nitro 5 AN515


This laptop comes with a beautiful exterior and provides premium looks at an
affordable price. With an Nvidia RTX 3050 Ti GPU, the Acer Nitro 5 is great for PC
gaming on the go.

Although this laptop is specially designed for gaming professionals, it is also one
of the best choices among data scientists and machine learning professionals. The
price range of the Acer Nitro 5 AN515 varies between 1300 and 1400 USD, and it is
available in various online as well as offline stores.

Features and Specification:

o Processor: Intel Core i5 up to 4.5 GHz.


o Memory: 16 GB DDR4.
o Hard Drives: 512 GB NVMe SSD.
o GPU: NVIDIA GeForce RTX 3050 4 GB.
o Computing Power: 8.6
o Ports: 1x HDMI 2.0, 1x USB 3.1 Type-C, 2x USB 3.1, 1x USB 2.0.
o OS: Windows 10 Home.
o Weight: 4.85 lbs.
o Display: 15.6, 1920 x 1080.
o Connectivity: WiFi 802.11ax, Gigabit LAN (Ethernet), Bluetooth.
o Battery life: Average ~ 4 hours.
o Link to buy: Click here

5. ASUS ROG Strix GL702VS

This laptop looks like a gaming laptop but is one of the best laptops for AI and
Machine Learning; it is powered by some of AMD's finest desktop hardware at a low
price. This laptop comes with a big screen and is outfitted with a Pascal GPU and
Kaby Lake processor.

The price range of ASUS ROG Strix GL702VS is between 1600 USD to 1700 USD.

Features and Specification:

o Processor: 3GHz AMD Ryzen 7 1700 (8-core, 16MB cache)


o RAM: 16GB DDR4
o Storage: 256GB SanDisk SSD , 1TB hard disk
o Display: 17.3″, 1,920 x 1,080 non-touch IPS
o GPU: AMD Radeon RX 580 4GB
o Battery Life: Average ~ 3 hours.
o Operating System: 3GHz AMD Ryzen 7 1700 (8-core, 16MB cache)
o Weight: 2.9 Kg.

6. Acer Predator Helios 300

It is one of the best laptops under a $2K budget and ideal for ML professionals who
want Intel processors, excellent RAM size, and RTX 30X GPUs.

It comes with great features, including a Full-size Island-style RGB Backlit Keyboard
with Numeric Keypad, Dual front-facing stereo speakers with dual digital
microphones, 15.6" Full HD (1920 x 1080) Widescreen LED-backlit IPS Display with
16:9 aspect ratio, etc. It is best suited for Gaming, business, personal, and ML
projects.

Feature Specifications

o Screen Size: 15.6 Inches


o Operating System: Windows 10 Home
o Human Interface Input: Keypad, Keyboard
o CPU Manufacturer: Intel
o Color: Black
o Hard Disk Size: 1024 GB
o Memory: 64GB DDR4 3200MHz
o Link to buy: Click here

7. Razer Blade 15

The next best laptop for machine learning applications is Razer Blade 15 series
laptop. It is specifically used for multimedia, business purpose, gaming, and building
ML applications.

This laptop is available with a 15-inch screen in classic black colour and a beautiful
design. It is great for machine learning projects as it comes with an i7 core processor
and a dedicated graphics card.

Feature Specifications:

o Processor: Core i7
o Memory: 16 GB DDR4
o Display: Available in 15 inches Screen
o Weight: 3 kg 920 g
o Storage: 1 TB in Hard Disk Drive
o OS: Windows 11 Home
o Link to buy: Click here

8. MSI P65 Creator-654 15.6″

While looking for the best laptop for Machine learning, MSI P65 can't be ignored.
MSI is one of the popular brands that offer a great range of best laptops. It is also
known for providing the best laptops for Gaming. The best thing about this laptop is
its processor with high processing power and great performance with the impressive
screen.

Feature Specifications:

o Processor: Core i9
o Memory: 32GB RAM DDR4
o Display: Available in 15.6 inches Screen with 4K display
o Battery Life: 4 Kilowatt Hours
o Weight: 1 Kg 900g
o Storage: 1 TB in Hard Disk Drive
o OS: Windows 10 Pro
o Link to buy: Click here

Conclusion
In this topic, we have discussed various laptops suitable for machine learning
professionals as well as data scientists. However, choosing the best laptop depends
on your project as well as your budget; for example, if you are looking for a laptop with
high performance regardless of look and feel, then you can prefer the Apple MacBook
Pro 15.

Machine Learning in Finance


Machine learning is one of the most popular technologies in this digital era. It is a
subfield of artificial intelligence that allows machines to learn and predict accurately
without much human intervention. It is being used almost everywhere, in
sectors such as finance, marketing, trading, healthcare, banking, infrastructure,
education, etc. Every industry wants to move its business forward with machine
learning and associated technologies.

Due to the popularity of machine learning across the world, all organizations are
adopting this technology. Similar to other industries, the finance sector also has seen
exponential growth in the use cases of machine learning applications to get better
outcomes for both consumers and businesses. In this topic, "Machine Learning in
Finance", we will discuss various important concepts related to the finance industry
using machine learning algorithms, benefits of ML in finance, use cases of ML in
finance, etc. Before starting this topic, firstly, we will understand the basic
introduction to machine learning and its relation to the finance sector.

What is Machine Learning?


Machine Learning is a subset of artificial intelligence (AI) that allows computer
software or algorithms to learn and make accurate predictions about the future. It
enables machines to learn from past experience or old data and, based on that,
predict outcomes.

Types of Machine Learning


Based on various learning methods, machine learning is primarily categorized into 4
types. These are as follows:

o Supervised Learning
o Unsupervised Learning
o Semi-supervised Learning
o Reinforcement Learning

Machine Learning in Finance Industry


In recent few years, the use of machine learning in the finance sector has
exponentially increased, and it is considered the key factor in various financial
services and applications such as credit score calculation, personal loans, mortgage,
risk categorization of the customer, etc.

Initially, machine learning was adopted by very few financial service providers, but in
recent years, the use of machine learning and its applications has been seen in
several areas of the finance industry such as banks, fintech, banking regulators,
insurance, trading, etc.

Further, with the rise in big data, machine learning in finance has become more
prominent; hence leading banks and other financial services are deploying ML
technologies to optimize portfolios, streamline their business, and manage financial
assets across the globe.

Why Machine Learning in Finance?


Machine Learning is the technology that helps machines to be automated so that
they can learn and predict accurately. Also, with the integration of big data, it is
being used to handle large and complex volumes of data in the finance industry.

In the finance sector, machine learning algorithms are used to detect fraud and money
laundering activities, monitor trading activities, and provide various financial advisory
services to investors. They can analyze millions of data sets within a short time to
improve outcomes without being explicitly programmed.

Below are a few reasons to use Machine Learning in the finance industry:

o Enhanced revenue owing to better productivity and improved user experience.


o Since machine learning is an automated process, it requires a very low
operational cost.
o It provides improved, reinforced security and better compliance.

Use cases of Machine Learning in Finance


Machine Learning is being used in the finance industry to make businesses
automated and more secure.

Here are a few important use cases where ML algorithms are being used in the
finance industry as follows:

1. Financial Monitoring
2. Process automation
3. Secure transaction
4. Risk management
5. Algorithmic trading
6. Financial regulators and advisory
7. Customer data management
8. Decision making and investment prediction
9. Customer service improvement
10. Customer retention program
11. Marketing

1. Financial monitoring
Financial monitoring is the process by which financial analysts prevent money
laundering, enhance network security, detect red flags, etc. Hence, machine
learning helps analysts provide improved financial monitoring services to
clients.

2. Process automation
Machine Learning has replaced most of the manual work in the finance sector by
automating repetitive tasks through intelligent process automation for enhanced
business productivity. Further, with automation, organizations have achieved an
improved customer service experience at a reduced cost.

Chatbots, auto-fill forms, employee training gamification, etc., are a few popular
examples of process automation in the finance sector.

3. Secure transaction
Since most banking and finance activities now happen through digital payment
systems, the chances of transactional fraud have also increased in recent years.
Machine learning has reduced the risk of transactional fraud as well as the number of
false rejections.

4. Risk Management
The financial sector is one of the most sensitive industries and may involve many risky
situations if not managed well. It handles large volumes of cash and credit
transactions between institutions or banks and their customers, and for this reason
there are many chances of things being mishandled.

However, to reduce such risky situations, machine learning provides security to
institutions by analyzing huge volumes of data from multiple sources. To achieve this,
the ML system goes through different levels and also analyzes the personal
information of users to reduce the chance of risk.

We can understand ML in financial risk with the example of loan lending. A loan to an
individual or an organization first goes through a machine learning process, where
the system analyzes the applicant's previous data and personal information, which
can prevent fraudulent borrowers from obtaining the loan.

5. Algorithmic Trading
Algorithmic trading is one of the best use cases of Machine Learning in the Finance
sector. In fact, Algorithmic Trading (AT) has become a dominant force in the global
financial markets.

Machine learning allows the trading companies to make decisions after analyzing the
trade results and closely monitoring the funds and news in real-time. With real-time
monitoring, it can detect patterns of the stock market going up or down.
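As a purely illustrative sketch of this idea (synthetic prices and a deliberately simple logistic regression, not a real trading strategy), a classifier can be trained on recent returns to predict whether the next period moves up or down:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
prices = 100 * np.cumprod(1 + rng.normal(0, 0.01, 500))    # synthetic price series
returns = np.diff(prices) / prices[:-1]

# Features: the three previous returns; target: 1 if the next return is positive
X = np.array([returns[i:i + 3] for i in range(len(returns) - 3)])
y = (returns[3:] > 0).astype(int)

model = LogisticRegression().fit(X[:-50], y[:-50])          # train on older data
print("Directional accuracy on recent data:", model.score(X[-50:], y[-50:]))

On random data like this, the accuracy hovers around chance; the point is only to show the shape of a direction-prediction pipeline, not a profitable model.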

Some of the other advantages of algorithmic trading include:

o It enhances accuracy and reduces the chance of mistakes.
o ML solutions allow automatic and simultaneous checks of different market
conditions.
o It greatly reduces human errors.

6. Financial regulators and advisory


Machine Learning also powers various apps for customers in the financial sector,
which can help customers by offering advice and guidance.

ML algorithms used in these apps enable the customers to keep an eye on their daily
spending on the app and also allow them to analyze this data in finding their
spending patterns and areas where they can save their money.

One of the great examples of such ML apps is Robo-advisor, one of the rapidly
growing apps in this sector. These advisors work as regular advisors, and they
specifically focus on the target investors with limited resources who want to
efficiently manage their funds. These ML-based Robo-advisors use traditional data
processing techniques for building financial portfolios and solutions, for example,
trading, investments, retirement plans, etc., for their users.

7. Customer data management


For each bank and financial institution, data is one of the most crucial resources.
Efficient data management helps a business to be successful and achieve growth.

But nowadays, financial data has become very vast due to its different sources, such
as social media activities, transactional details, mobile transactions, and market data.
Hence it has become very difficult for financial specialists to manage such a huge
amount of data manually.

To solve this issue, different machine learning techniques can be integrated with
finance systems which can manage such large volumes of data and can offer the
benefit of extracting real intelligence from data. Different AI and ML tools, such as
NLP (Natural language processing), data mining, etc., can help to get insights from
data that make the business more profitable.

8. Decision making and investment prediction


Banking and financial institutions can analyze both structured and unstructured data
with the help of ML algorithms. The data involves customer requests, social media
interactions, different business processes internal to the company, etc. This data
analysis helps to discover trends for assessing risk and helps customers to make
informed decisions accurately.

9. Customer Service Improvement


Nowadays, intelligent chatbots are widely being used in almost every sector as they
enhance customer service and give benefits to their companies. In the finance sector,
with the help of these chatbots, customers can instantly get answers to most of their
queries, including finding their monthly expenses, loan eligibility, customer-specific
insurance plan, and many more.

Moreover, there are various ML-based applications related to a payment system,


which can analyze customers' accounts and let them know the ways to save and
grow their money.

Various ML algorithms help companies analyze customers' transaction behaviour and
can generate customized offers for specific customers. For example, suppose a
customer is planning to invest in some financial plan; then, with the help of ML
algorithms, companies can offer a personalized investment offer after analyzing the
customer's existing financial situation.

10. Customer Retention Program


Customer retention programs are applied by most companies to prevent their
customers from switching to competitors. Here too, ML has various applications.
For example, credit card companies use ML systems to predict at-risk customers and
specifically retain selected ones. On the basis of users' transaction
activities and past behaviours, they can easily design specific offers for these
customers.

A binary classification model is used to determine which customers are at risk, and it
is then followed by a recommender system.
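A minimal sketch of that first classification stage (all feature names, numbers, and the logistic regression choice below are invented purely for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical features: monthly spend, transactions per month, months since last use
X = np.array([[120, 14, 0], [15, 1, 6], [300, 22, 0], [40, 2, 4],
              [90, 10, 1], [10, 0, 9], [210, 18, 0], [25, 1, 7]])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])   # 1 = customer churned / at risk

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Probability of churn for a new customer; high scores feed the retention/recommender step
print("Churn probability:", model.predict_proba([[30, 2, 5]])[:, 1])

Customers whose predicted churn probability crosses a chosen threshold would then be passed to the recommender step mentioned above.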

11. Marketing
AI and Machine Learning models make better predictions on the basis of
past/historical data, which makes them excellent tools for marketing. These ML tools
use different algorithms that can help finance companies create a robust
marketing strategy by analyzing mobile app usage, web activity, responses to
previous ad campaigns, etc.

Conclusion
In this topic, we have seen how machine learning is currently being used in, and
benefiting, the finance industry. The value of ML applications in finance is increasing
day by day. However, the real long-term value will probably appear in the coming
years. Because of the many applications of ML tools in the finance sector, various banks
and financial institutions are investing billions in this technology. With these
investments, companies are getting various benefits, including reduced operational
costs, increased revenue, enhanced customer experiences, and many more.

Lead Generation using Machine Learning
Lead generation is a marketing term that is used to identify and cultivate potential
customers for a business. Whenever a company starts making new customers through
various sources, then this process is called lead generation for a particular product or
service in a company. Although lead generation can be made through various
sources such as Facebook, YouTube, advertisements on TV, etc., nowadays,
machine learning is also being used to generate leads for a business.

Besides making intelligent machines or computer software, machine learning is also
known to do cool and complex things, like targeting the audience automatically for a
particular product or service without using many resources. In this topic, "Lead
generation using machine learning", we will discuss what exactly lead generation
is and how machine learning is helpful in lead generation. So, let's start with a
definition of lead generation.

What is Lead Generation?


Lead generation is defined as the action or process of building potential
customers for a business. It generally consists of a record of the potential
customer, such as contact information (email address, phone number & fax) and
possibly additional attributes about the customer, like product preferences and
demographic data.

Lead generation requires a significant amount of time, money, and effort to build
potential customers.


Automation using Machine Learning


Machine learning technology can be used to automate business processes, including
automated mail, product recommendation, self-driving cars, chatbots, etc.
Similarly, it is also being used to generate leads in businesses, where
various ML algorithms are used to run campaigns, suggest products and services,
and collect information and demographic data of customers automatically.

In traditional marketing, we use various sources to approach customers, whereas with
machine learning, we do not need to worry as much about resources, time, money, and
extra effort.

How do Machine Learning (ML) algorithms help in Lead generation?
Machine Learning is one of the most popular technologies, which uses various
algorithms to solve complex business problems. It is successfully being used in
generating leads for businesses. There are a few important steps to successfully
generate a lead using ML algorithms as follows:

o Storing new leads: Machine learning helps to train machines to store data in a
database using past data. Whenever a new lead appears, it gets automatically
stored in the database on the basis of previous training data and classification
metrics.
o Lead analysis: Machine learning algorithms help to determine whether a lead
is valuable or not; lead analysis is done on the basis of demographic scores.
o Lead classification: Based on the demographic score, leads are automatically
classified in the system (a minimal scoring sketch is shown after this list).
Whenever the lead score is below the classification score, it gets neglected by
the system, and if the lead score is above the classification score, ML
algorithms wait for the next possible action of the lead.
o Behaviour analysis: Whenever a lead is successfully classified and takes the
next action, machine learning algorithms help to compute the sales threshold.
Based on this calculation, the system analyzes various details such as lead
revert time, link clicks, insights, acquisition, events, web visits, etc.
o Forwarding for the next targeted action: Whenever the system qualifies a
lead by crossing the benchmark sales threshold, it is forwarded to the next
level for further manual/targeted actions such as arranging a call or meeting
with the lead.
o Enhancing the counting functions: At this stage, the final output is again utilized
in training the sales threshold counting function and the demographic counting
function. This process ensures the continual refinement of the machine learning
algorithms.
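Here is the minimal lead-classification sketch referred to above; the demographic scores, threshold, and field names are all invented purely for illustration:

# Hypothetical leads with made-up demographic scores (0-100)
leads = [
    {"email": "a@example.com", "demographic_score": 82},
    {"email": "b@example.com", "demographic_score": 35},
    {"email": "c@example.com", "demographic_score": 64},
]

CLASSIFICATION_SCORE = 50   # assumed threshold; in practice it is tuned from past data

def classify_lead(lead):
    # Leads below the threshold are neglected; the rest wait for the next action
    if lead["demographic_score"] < CLASSIFICATION_SCORE:
        return "neglected"
    return "await next action"

for lead in leads:
    print(lead["email"], "->", classify_lead(lead))

In a real system, the threshold and the scoring function themselves would be refined over time, as described in the last step above.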

Methods of Lead generation using Machine Learning
Machine learning uses various algorithms to analyze the data and generate new
leads in the database. Multiple methods are used in the industry to create that list of
lead data. After analysing a few industrial tools, here are a few ways to provide input
to systems that engage or find potential leads.

o Contact creation: Whenever a customer visits a website and fills in the
required details to enquire about a product or service of your business, this
is called lead creation, and the visitor becomes a potential customer. In this process,
the customer enters a few basic details such as an email address and phone number.
Later, the tools render a list of the most accurate matches from the millions of
contacts they have. As the leads are verified and the contact details are dynamically
updated, this works efficiently.
o Automated mail: Machine learning technology is being used by various
organizations to automate their business. So, whenever a lead is generated, a
system-generated email is sent to the customer, which is tracked by ML
algorithms and tools. Further, based on the status of previous mails, new mail is
triggered. For example, if the first e-mail was opened by the lead and a link for
the service 'ML automation' was clicked, then the next email will target that
customer with a related link, which is more likely to nurture the lead.
Here are the metrics that ML algorithms track:

o Link clicks
o Open rate
o Replied

o Chatbots and chat histories

Personal virtual assistants or chatbots are among the best applications of machine
learning technology today. ML engineers are continuously focusing on developing
advanced chatbots for conversations with customers. ML tools are dedicated to
tracking the entire chat history along with geographical location, region, occurrence
frequency, text strings, etc. Further, if a customer returns again and again or shows
more interest, machine learning algorithms try to ask for contact details and save
them for you to contact later.

o Competitive technologies stack analysis:

There are millions of websites running on the internet, and many of them may have
the same technology that your company is targeting. If so, this analysis can solve many
issues in your business, such as ranking keywords, most-queried keywords, etc. Here,
machine learning algorithms and tools are also helpful in finding competitor or
similar websites.

o Website pixel trackers: Sometimes, a customer visits your website to check the
home page details but leaves without checking the service details page.
Machine learning allows you to identify such visitors and differentiate them from
your target audience, as those visitors may have come for other reasons instead of
buying services.

Advantages of Machine Learning in Lead Generation
Machine learning has played a vital role in lead generation for products and services
in businesses. For years, companies used various ways to create potential
customers, such as filling in a form through email or other sources, but now machine
learning has replaced this classical approach to generating leads with automation. You
don't need to call or meet customers individually to get a question answered; they
expect to get all the answers from your website.
There are a few important reasons that make machine learning beneficial for lead
generation in your business. These are as follows:

o Remove unwanted form filling: Most leads are generated through
smartphones, and a form is one of the most common ways to capture a lead on a
smartphone. Even with the auto-fill feature, no one wants to waste their time just to
access a post. Hence, machine learning helps customers access blogs without
filling in multi-field forms; they only need to browse at their own pace. Sometimes
customers are ready to provide their contact details but don't want to fill out
forms; in these cases, machine learning algorithms take care of these things
automatically.
o Develop a hyper-personalized experience: Machine learning helps to create
a truly personalized experience. You can create content and target your
audience manually, but without ML, it is very hard to deliver a hyper-personalized
experience to customers.
o Allow leads to self-nurture: Machine learning allows customers to self-
nurture before the sales and marketing team steps in. It allows the
customer to access the content at their own pace and informs them about
products and services through personalized content recommendations.
You can still retarget them with social advertisements, but on your site,
they can remain unrestricted by forms or pushy sales teams.

Conclusion
Machine Learning is one of the most popular technologies that is used in various
industries such as marketing, healthcare, finance, banking, infrastructure, digital
marketing, SEO, product recommendation, etc. Based on some research, it is found
that adding an AI engine with ML in lead generation strategy can deliver 51% more
lead conversions instantly. Machine learning is also useful to automate the lead
generation process through various tools across your website, such as adaptive
content hubs, self-nurturing landing pages, Personalized Exit-Intent Popups,
Human Lead Verification, etc. Hence, we can say lead generation is a complex
process when you have a large customer base, but machine learning has simplified this
process by narrowing down your target list, reducing the effort needed to
convert customers, and increasing business revenue.

Machine Learning and Data Science Certification
Machine Learning is one of the fastest-growing technologies of the 21st century. The
scope of machine learning technology and applications is rapidly increasing in all
industries such as healthcare, marketing, finance, banking, trading, education,
infrastructure, etc. Due to ML's popularity, the demand for ML engineers is also
exponentially increasing in companies. Everyone wants to implement ML
technologies into their business and make ML a pivotal product feature. ML
professionals are much in demand and are offered unexpected packages in their
careers.

Machine learning is the subset of artificial intelligence that makes machines capable of learning by using algorithms & statistical models through experience. Image recognition is one of the best examples of Machine learning: it helps differentiate between multiple images, grouping them based on their categories such as color, location, etc.

Further, data science is the field of study that helps us extract useful data from structured and unstructured formats. Later, this extracted data is used to train machine learning models. Hence, we can say data science is the study of cleaning, preparing, and analyzing the data, whereas Machine learning is a subfield of data science. When we talk about a career in data science and Machine learning, then yes, both these technologies have great future scope with tremendous jobs in the IT & software domain. Although various institutions and organizations offer many certification courses, we have listed a few reputed certification courses for ML and data science that will surely help you boost your career.

Best Machine learning and Data Science certifications
1. IBM Data Science Professional Certificate
One of the best IT companies, IBM, offers this course under different instructors. It
helps you jump-start your career in data science and machine learning. Further, it
helps you build data science skills, learn Python, SQL, and build ML models.


This course will help you to learn:

o Data science introduction, roles & responsibilities of data scientists, and methodology to think and work like a data scientist.
o Import and clean data sets, analyze and visualize data, and build and evaluate machine learning models and pipelines using Python.
o Theoretical and real-time project exposure based on tools, programming languages, libraries, etc.

Course offered in this certification:

o What is Data Science?
o Tools for Data Science
o Data Science Methodology
o Python for Data Science, AI & Development
o Python Project for Data Science
o Databases and SQL for Data Science with Python
o Data Analysis with Python
o Data Visualization with Python
o Machine Learning with Python
o Applied Data Science Capstone

Benefits of this certification:


o It provides various courses, lectures, and videos related to data science and
ML. Further, along with Professional Certificate from Coursera, you'll also
receive a digital Badge from IBM recognizing your proficiency in data science.
o This certification course is remotely available so that you can learn instantly on
your schedule.
o This course is available with various subtitles such as English, Arabic, French,
Portuguese (European), Italian, Vietnamese, German, Russian, Turkish, Spanish,
Persian, Korean.
o Hands-on projects that will help you to build a portfolio that showcases your
job readiness to potential employers

Link for the course: Click here

2. Data Science and Machine Learning Developer Certification
Udemy offers this course, taught by a team of experienced ML and data science experts. It helps you learn the powerful tools used in data science and ML to solve real-world problems. This certification helps you gain in-depth knowledge and skills in data science, ML, and deep learning. It covers all concepts related to these technologies, project requirements, and practice sets to implement your skills in real-time scenarios. The course covers all fundamental concepts needed to solve complex problems with the help of lectures and interactive online labs. The training uses open-source tools and helps you develop the judgment and intuition needed to address actual business needs and real-world challenges.

Prerequisites of this certification:

o Basic understanding of Python coding
o Beginner-level knowledge of mathematical concepts such as linear algebra (helpful but not mandatory)

Modules of courses are as follows:

1. Introduction to ML
2. Exploring and Using data sets
3. Review of Machine Learning Algorithms
4. Machine Learning with Scikit
5. Deep Learning with Keras and TensorFlow
6. Building a Machine Learning Pipeline

Benefits of this course:

o You will be provided a certificate of completion, which you can add to your resume when searching for jobs.
o You can access this course remotely from your mobile and your TV.

Who is this certification for?

o This is one of the best courses for anyone who wants to become a data scientist or machine learning engineer.
o This course gives you analytics skills so you can lead a team of analysts.
o Business analysts (BA) who want to learn data science and ML techniques.
o Information architects who need expertise in machine learning algorithms.
o Analytics professionals who work in machine learning or artificial intelligence.
o Graduates looking to build a career in data science and machine learning.

Link for the course: Click here

3. Data Science and Machine Learning: Making Data-Driven Decisions
This course is offered by one of the most popular learning platforms, "Great Learning," under the guidance of the best MIT faculty and mentorship from industry practitioners. This certification provides you with the skills and knowledge of ML and data science techniques that help in making data-driven decisions. The curriculum spans 12 weeks with 3 industry-relevant hands-on projects and 15+ case studies that you can showcase in your portfolio. The course helps you apply ML and data science concepts to real-world examples through practical applications and exposure. Further, it covers various programming languages and tools such as Python, NumPy, Keras, TensorFlow, Scikit-learn, Matplotlib, etc.

Week wise curriculum of the course:

o Weeks 1-2: Foundations of Data Science
o Week 3: Making Sense of Unstructured Data
o Week 4: Online MasterClass on Regression and Prediction
o Week 5: Regression and Prediction
o Week 6: Online MasterClass: Hands-on Machine Learning with Python
o Week 7: Classification and Hypothesis Testing
o Week 8: Deep Learning
o Week 9: Recommendation Systems
o Week 10: Online MasterClass: Hands-on Machine Learning with Python
o Week 11: Networking and Graphical Models
o Week 12: Predictive Analytics

Upon completing the 12-week syllabus, you will be provided a certificate of completion from the Massachusetts Institute of Technology (MIT) IDSS. This certification will help you to get a job in leading IT companies.

Who is this certification for?

o Data scientists, data analysts, and professionals who wish to turn large volumes of data into actionable insights.
o Early career professionals and senior managers, including technical managers, business intelligence analysts, IT practitioners, management consultants, and business managers.
o Those with some academic/professional training in applied mathematics/statistics. Participants without this background will have to put in extra work and will be given support by Great Learning.

Benefits of this certification:

o This course lets you learn from the best MIT faculty with recorded video
lectures to build industry-valued skills.
o This course also provides the facility of weekend support from other mentors
or experts in data science and ML.
o After completing this course, you will receive a certificate of completion from the Massachusetts Institute of Technology (MIT) IDSS.
o This course gives you hands-on exposure to 3 projects and 15+ case studies.

4. Machine Learning with TensorFlow on Google Cloud Platform Specialization
Google Cloud offers this course, which aims to teach ML with Google Cloud to solve complex real-world problems. It is designed to help you understand basic to advanced machine learning concepts and neural network use cases. Further, it focuses on various supervised learning methods, generalizable solutions using gradient descent, and the creation of datasets for ML models. Hence, this course gives you practical, end-to-end ML exposure for solving different types of ML problems.

This specialization incorporates hands-on labs using Google's Qwiklabs platform, which you can showcase in your CV to find jobs in leading organizations.

Upon completing this course and hands-on projects, you will be provided a
certificate that you can share with prospective employers and your professional
network.

This entire specialization is divided into 5 courses as follows:

o How Google does Machine Learning
o Launching into Machine Learning
o TensorFlow on Google Cloud
o Feature Engineering
o Art and Science of Machine Learning

Extra Benefits:

o This course gives you a Shareable Specialization and Course Certificate.
o Self-paced learning option with video lectures and notes.
o Practice quizzes and graded quizzes with feedback to boost your confidence
in the ML industry.
o Graded programming assignments with expert feedback

Registration: Every 2 months on Coursera

Course Duration: 5 months

Mode of Teaching: Online

Prerequisites: Before starting this course, you must have a computer science &
engineering background.

Link for the course: Click here

5. Harvard University Machine Learning
This course comprises the core concepts of machine learning algorithms, PCA, regularization techniques for a movie recommendation system, etc. Further, you will learn about training/sample data and how to use it in the training process to predict accurate outputs for future inputs. Also, you will learn how to use a data set to discover potentially predictive relationships. Further, this course also helps you learn about overfitting and techniques to avoid it, such as cross-validation.

When to register: You can register for this certification anytime through
the edx website.

Course Fee: Available at no cost.

Course Duration: Almost 8 weeks

Mode of Learning: Online

Prerequisites: This course requires a basic understanding of the Python programming language.
Key Benefits

Along with a globally valid certificate, you will also know various core areas of
machine learning, such as:

o Introduction to the basics of machine learning
o Knowledge of overfitting and how to avoid it using cross-validation
o Various popular machine learning algorithms and their applications in real-world examples
o How to build a recommendation system
o What regularization techniques are in machine learning, and why they are important

6. eCornell Machine Learning Certificate
This certification course is specially designed to give you exposure to various ML algorithms and how to deploy them using Python. Using a combination of math and intuition, students learn to frame machine learning problems and construct a mental model of how data scientists approach these problems programmatically. The course explores the implementation of various machine learning algorithms such as k-nearest neighbors, naive Bayes, regression trees, and others.

This program enables you to apply algorithms to live data and practice debugging and improving models with the help of SVMs (support vector machines) and ensemble methods. Moreover, it also covers the internal workings of neural networks and how to construct and adapt neural networks for different data types. The program uses Python and the NumPy library for code exercises and projects. Projects can be submitted and performed in Jupyter Notebooks.

Registration: throughout the year

Fee: $3,600 or $565/month

Course Duration: 3.5 months

Mode of Teaching: Online

Prerequisites: Python
What is Big Data and Machine Learning
Big Data and Machine Learning have become the reason behind the success of
various industries. Both these technologies are becoming popular day by day among
all data scientists and professionals. Big data is a term that is used to describe
large, hard-to-manage, structured, and unstructured voluminous data.
Whereas, Machine learning is a subfield of Artificial Intelligence that enables
machines to automatically learn and improve from experience/past data.

Both Machine learning and big data technologies are being used together by most
companies because it becomes difficult for the companies to manage, store, and
process the collected data efficiently; hence in such a case, Machine learning helps
them.

Before going deep into these two popular technologies, i.e., Big Data and Machine Learning, we will give a quick introduction to each. Further, we will discuss the relationship between big data and machine learning. So, let's start with the introduction to Big data and Machine Learning.

What is Big Data?


Big Data is defined as large or voluminous data that is difficult to store and cannot be handled with traditional database systems. It is a collection of structured as well as unstructured data.


Big data is a very vast field for anyone who is looking to make a career in the IT
industry.

Challenges in Big Data


Big data is growing tremendously as a collection of structured as well as unstructured data. Almost all companies use this technology to run their business and to store, process, and extract value from a bulk amount of data. Hence, it becomes a challenge for them to use the collected data in the most efficient way. A few challenges while using Big data are as follows:

o Capturing
o Curating
o Storing
o Searching
o Sharing
o Transferring
o Analyzing
o Visualization

5V's in Big Data


Big data is defined by the 5 V's, which refer to Volume, Variety, Velocity, Veracity, and Value. Let's discuss each term individually.

o Volume (Huge volume of data)

Data is the core of any technology, and the huge volume of data flowing into the system makes it necessary to use a dynamic storage system. Nowadays, data comes from various sources such as social media sites, e-commerce platforms, news sites, financial transactions, etc., and it has become mandatory to store this data in the most efficient manner. Storage costs are gradually decreasing over time, which permits storing the collected data. The weight that the term big data carries is largely because of its volume.
o Variety (Different formats of data from various sources)

Data can be structured as well as unstructured and comes from various sources. It can be audio, video, text, emails, transactions, and many more. Due to these varied formats, storing, managing, and organizing the data becomes a big challenge for organizations. Storing raw data is not difficult, but converting unstructured data into a structured format and making it accessible for business use is practically complex for IT teams.

o Velocity (velocity at which data is processed)

Rendering and sorting data is necessary to control data flows, and processing data with high accuracy and speed is necessary for storing, managing, and organizing it efficiently. Smart sensors, smart metering, and RFID tags make it necessary to deal with a huge data influx in almost real-time. Sorting, assessing, and storing such deluges of data in a timely fashion becomes necessary for most organizations.

o Veracity (Accuracy)

In general, Veracity refers to the accuracy of data sets. But when it comes to Big data, it is not limited to accuracy alone; it also tells us how trustworthy the data source is. Further, it determines the reliability of the data and how meaningful it is for analysis. In one line, we can say Veracity is the quality and consistency of data.

o Value (Meaningful data)

Value in Big Data refers to the meaningfulness or usefulness of stored data for your business. In big data, data is stored in structured as well as unstructured formats, but regardless of its volume, it is usually not meaningful on its own. Hence, we need to convert it into a useful format for the business requirements of organizations. For example, data with missing or corrupt values, or missing key structured elements, is not useful for companies to provide better customer service, create marketing campaigns, etc., and thus leads to reduced revenue and profit in their businesses.
Sources of data in Big Data

Big data can come in various formats, either structured or unstructured, and from many different sources. The main sources of big data are of the following types:

o Social Media

Data is collected from various social media platforms such as Facebook, Twitter,
Instagram, Whatsapp, etc. Although data collected from these platforms can be
anything like text, audio, video, etc., the biggest challenge is to store, manage and
organize these data in an efficient way.

o Online cloud platforms:

There are various online cloud platforms, such as Amazon AWS, Google Cloud, IBM
cloud, etc., that are also used as a source of big data for machine learning.

o Internet of things:

The Internet of Things (IoT) generates huge streams of device data and typically relies on cloud facilities for data storage and processing. Recently, cloud-based ML models have become popular: input data is sent from the client end, machine learning algorithms such as an artificial neural network (ANN) are processed on cloud servers, and the output is returned to the client.

o Online Web pages:

Nowadays, every second, thousands of web pages are created and uploaded over
the internet. These web pages can be in the form of text, images, videos, etc. Hence,
these web pages are also a source of big data.

What is Machine Learning?


Machine Learning is one of the most crucial subsets of Artificial Intelligence in the computer science field. It is the study of automated data processing or decision-making algorithms that improve themselves automatically based on experience or past data.

It makes systems capable of learning automatically and improving from experience without being explicitly programmed. The primary aim of a machine learning model is to develop computer programs that can access data and use it for learning purposes.
With the rise in Big Data, Machine Learning has become a key player in solving
problems in various areas such as:

o Image recognition
o Speech Recognition
o Healthcare
o Finance and Banking industry
o Computational Biology
o Energy production
o Automation
o Self-driven vehicle
o Natural Language Processing (NLP)
o Personal virtual assistance
o Marketing and Trading
o The education sector, etc.

Difference between Big Data and Machine Learning

With the rise of big data, the use of machine learning has also increased in all
industries. Below is the table to show the differences between machine learning and
big data as follows:

Machine Learning: Machine Learning is used to predict future outcomes based on applied inputs and past experience.
Big Data: Big Data is defined as large or voluminous data that is difficult to store and cannot be handled manually with traditional database systems.

Machine Learning: Machine Learning can be categorized mainly as supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
Big Data: Big Data can be categorized as structured, unstructured, and semi-structured data.

Machine Learning: It helps to analyze input datasets with the use of various algorithms.
Big Data: It helps in analyzing, storing, managing, and organizing huge volumes of unstructured data sets.

Machine Learning: It uses tools such as NumPy, Pandas, Scikit-learn, TensorFlow, and Keras.
Big Data: It uses tools such as Apache Hadoop and MongoDB.

Machine Learning: In machine learning, machines or systems learn from training data and are used to predict future results using various algorithms.
Big Data: Big data mainly deals with extracting raw data and looking for patterns that help to build strong decision-making ability.

Machine Learning: It works with limited-dimensional data; hence it is relatively easier to recognize features.
Big Data: It works with high-dimensional data; hence it shows complexity in recognizing features.

Machine Learning: An ideal machine learning model does not require human intervention.
Big Data: It requires human intervention because it mainly deals with a huge amount of high-dimensional data.

Machine Learning: It is useful for providing better customer service, product recommendations, personal virtual assistance, email spam filtering, automation, speech/text recognition, etc.
Big Data: It is also helpful in areas as diverse as stock market analysis, medicine & healthcare, agriculture, gambling, environmental protection, etc.

Machine Learning: The scope of machine learning is to make automated learning machines with improved quality of predictive analysis, faster decision making, cognitive analysis, and more robustness.
Big Data: The scope of big data is very vast, as it will not be limited to handling voluminous data; instead, it will be used for optimizing the data stored in a structured format to enable easy analysis.

Big data with Machine Learning


Big Data and Machine Learning both have their own advantages; they aren't competing concepts, and they aren't mutually exclusive. Although both are very useful individually, when combined they provide the opportunity to achieve some incredible results. When talking about the 5 V's of big data, machine learning models help to deal with them and predict accurate results. Similarly, while developing machine learning models, big data helps analytics teams extract high-quality training data and improve learning methods.

It is no secret that almost all large organizations, such as Google, Amazon, IBM, Netflix, etc., have already discovered the power of big data analytics enhanced by machine learning.

Machine Learning is a very crucial technology, and with big data, it has become more
powerful for data collection, data analysis, and data integration. All big organizations
use machine learning algorithms for running their business properly.

We can apply machine learning algorithms to every element of Big data operation,
including:

o Data Labeling and Segmentation
o Data Analytics
o Scenario Simulation

In machine learning algorithms, we need multiple varieties of data for training a machine and predicting accurate results. However, sometimes it becomes difficult to manage this bulk data, so managing and analyzing Big Data becomes a challenge. Further, this unstructured data is useless until it is well interpreted. Thus, to use the information, there is a need for talent, algorithms, and computing infrastructure.

Machine Learning enables machines or systems to learn from past experience, use the data received from big data, and predict accurate results. This leads to improved business operations and better customer relationship management. Big Data helps machine learning by providing a variety of data so machines can learn from more, and more varied, samples of training data.

In such ways, businesses can accomplish their dreams and get the benefit of big data
using ML algorithms. However, for using the combination of ML and big data,
companies need skilled data scientists.

How to apply Machine Learning in Big data
Machine Learning provides efficient and automated tools for data gathering, analysis, and integration. Combined with the power of cloud computing, machine learning brings agility to processing and integrating large amounts of data regardless of its source.

Machine learning algorithms can be applied to every element of Big Data operation,
including:
o Data Segmentation
o Data Analytics
o Simulation

All these stages are integrated to create the big picture out of Big Data, with insights and patterns that later get categorized and packaged into an understandable format.

Conclusion
In this article, we have discussed Big data and machine learning separately and the basic differences between the two technologies. We have also seen how machine learning and big data can be used together: machine learning models are trained on high-quality data extracted from huge amounts of structured as well as unstructured data. Further, we have seen some applications that use big data and machine learning together and provide amazing results.

How to Save a Machine Learning Model
While using the scikit-learn library for machine learning, it is often necessary to save and restore models so they can be reused later, compared with other models, or tested against new data. The process of saving data is referred to as serialization, while the process of restoring data is referred to as deserialization. We also handle different types and sizes of data: some datasets can be trained quickly (they take little time), but large datasets (more than 1 GB) may take a lot of time to train, even on a local computer with a GPU. To avoid losing that time, save the trained model so it can be reused in future projects.

Two Ways to Save a Model from scikit-learn:
1. Pickle string: The pickle module implements an efficient yet fundamental algorithm for serializing and deserializing Python object structures.

The pickle module offers the following functions:

o dumps: serializes an object hierarchy into a byte string.
o loads: deserializes such a byte string back into an object.

Example: Let's fit K Nearest Neighbors on the iris dataset, then save the model.

Code:

from sklearn.model_selection import train_test_split as tts

# Loading the dataset
from sklearn.datasets import load_iris as li
iris_1 = li()

A = iris_1.data
b = iris_1.target

# Here, we are splitting the dataset into train and test sets
A_train, A_test, b_train, b_test = tts(A, b, test_size=0.5, random_state=2020)

# Now, we are importing the KNeighborsClassifier model
from sklearn.neighbors import KNeighborsClassifier as KNNC
knn_1 = KNNC(n_neighbors=4)

# Training the model
knn_1.fit(A_train, b_train)

Output:
Now, we will save the above model to string using pickle -

Code:

import pickle as pkl

# Now, we are saving the trained model as a pickle string
saved_model1 = pkl.dumps(knn_1)

# Here, we are loading the pickled model
knn_from_pkl = pkl.loads(saved_model1)

# At last, we use the loaded pickled model for making predictions
knn_from_pkl.predict(A_test)

Output:

2. Pickled Model as File using joblib: joblib is often preferred over pickle for models that carry large NumPy arrays because it is faster for such objects. Its dump and load functions work with a file on disk rather than an in-memory string.

The joblib approach offers the following functions:

o dump: serializes an object hierarchy to a file.
o load: deserializes the data stream from a file.

Use joblib to save the model to a pickled file:

Example:

import joblib as jbl

# Now, we are saving the model as a pickle file
jbl.dump(knn_1, 'jtp.pkl')

# Here, we are loading the model from the file
knn_from_joblib1 = jbl.load('jtp.pkl')

# At last, we use the loaded model for making predictions
knn_from_joblib1.predict(A_test)

Output:

Machine Learning Model with Teachable Machine
Machine Learning and Artificial Intelligence are bringing new applications to the
table. Artificial Intelligence is a topic of great interest to many organizations. Artificial
Intelligence is built on machine learning. However, not everyone is familiar with machine learning and how to make models that can be used for intelligent applications. Non-coders, and coders who are not familiar with machine learning, can still create a machine learning model and integrate it into an application. It is possible, and it is happening now. You might be wondering how to do it.

This tutorial will show us how to create a machine-learning model without writing
code.

We will create a model to classify food items. We will use a Kaggle food dataset that
includes different food items such as Salads, Potatoes, and Cakes. You can download
the dataset from https://www.kaggle.com/cristeaioan/ffml-dataset.

Teachable Machine
Yes, it is possible with the help of Teachable Machine. Teachable Machine is a web-based tool that quickly and easily creates models. It can be used for image, sound, and pose recognition, and it is flexible: it can be taught to identify images and poses from uploaded images or a live webcam. It is free and well suited to students. Teachable Machine creates a TensorFlow model, which can be integrated with any website, Android application, or other platform. There is no need to create an account. It is that easy.


Let's Build a Model

Step 1: Go to Teachable Machine: https://teachablemachine.withgoogle.com/train

We will be directed to a screen that consists of three options: Image, Audio, and Pose.

Step 2: Choose an image project. We will see two options: Standard or Embedded. We aren't making this for micro-controllers, so we recommend choosing Standard; if you are targeting micro-controllers, select Embedded Image Model instead. Even if you choose Embedded, the process remains the same; only the exported model differs.
Clicking on Standard Image Project takes us to a screen where we can add classes to the model. We have two choices: upload images from a data bank or use the live camera to capture images.

Step 3: Now create classes and upload the images. We will only create three classes: Salad, Potato, and Cake. We have replaced class1 with Salad, class2 with Potato, and class3 with Cake. The user can make as many classes as they like.
Click on Train Model after the images have been uploaded. There are three options available: Batch Size, Epochs, and Learning Rate. Don't be alarmed if these options are new to you. It's important to play with them and determine which values give the best accuracy, so the model is as efficient as possible; a model is useless if it's not accurate. We can adjust the values to find the best model, but here we will use the default values.

Step 4: After the model has been trained, it is time to export it.
We will see several options when we click Export Model. The code snippets can help
integrate the model into our application. Tensorflow.js models are compatible with
all JavaScript libraries and frameworks. Some frameworks only support a specific type
of model. We will check to see if our library or framework supports this model.

The download of the model can take some time. This is how we create a machine-
learning model.
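
As a rough, hedged sketch: if you choose the TensorFlow -> Keras export option, Teachable Machine typically produces a keras_model.h5 file and a labels.txt file (the exact file names, the 224x224 input size, and the [-1, 1] normalization below are assumptions based on that export, not something shown in this tutorial). Loading and using the exported model in Python could then look roughly like this:

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the exported Keras model and the class labels (file names are assumptions).
model = tf.keras.models.load_model("keras_model.h5", compile=False)
labels = [line.strip() for line in open("labels.txt")]

# Preprocess a test image the same way the export's sample snippet typically does
# (resize to 224x224 and scale pixel values into the range [-1, 1]).
img = Image.open("test_food.jpg").convert("RGB").resize((224, 224))
x = np.asarray(img, dtype=np.float32) / 127.5 - 1.0
x = np.expand_dims(x, axis=0)

# Predict and print the most likely class (e.g. Salad, Potato, or Cake).
probabilities = model.predict(x)[0]
print(labels[int(np.argmax(probabilities))])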

We can also create models for audio and pose, similar to the image project. Let's see
what we can do.

Poses Model
To create a pose model, the Pose Project must be selected in Teachable Machine. We will create two classes, one for sitting and one for standing, and then upload the images.
After the training is complete, we can preview the model's Output by uploading any
image. This allows us to check the model's efficiency and Output before exporting it.
The below image shows that the Output from an image we uploaded to preview is
correct, i.e., sitting. This means that the model is doing well.

Audio Model
An audio project will create a model capable of detecting sound. We created three
classes: Background Noise, Clapping Rain, and Thunderstorm. In the preview section,
after training the model, we tested the model's efficiency using noise. In the Output
of the preview, we can see more background noise. We need to increase the number
of samples to improve the model's learning.
Data Structure for Machine Learning
Machine Learning is one of the hottest technologies used by data scientists and ML experts to deploy real-time projects. However, machine learning skills alone are not sufficient for solving real-world problems and designing a better product; you also need good exposure to data structures.

The data structure used for machine learning is quite similar to other software
development fields where it is often used. Machine Learning is a subset of
artificial intelligence that includes various complex algorithms to solve
mathematical problems to a great extent. Data structure helps to build and
understand these complex problems. Understanding the data structure also helps
you to build ML models and algorithms in a much more efficient way than other ML
professionals. In this topic, "Data Structure for Machine Learning", we will discuss
various concepts of data structure used in Machine Learning, along with the
relationship between data structure and ML. So, let's start with a quick overview of
Data structure and Machine Learning.

What is Data Structure?


The data structure is defined as the basic building block of computer
programming that helps us to organize, manage and store data for efficient
search and retrieval.

In other words, the data structure is the collection of data type 'values' which are
stored and organized in such a way that it allows for efficient access and
modification.


Types of Data Structure


The data structure is the ordered sequence of data, and it tells the compiler how a
programmer is using the data such as Integer, String, Boolean, etc.

There are two different types of data structures: Linear and Non-linear data
structures.
Now let's discuss popular data structures used for Machine Learning:

1. Linear Data structure:


The linear data structure is a special type of data structure that helps to organize and
manage data in a specific order where the elements are attached adjacently.

There are mainly 4 types of linear data structure as follows:

Array:
An array is one of the most basic and common data structures used in Machine
Learning. It is also used in linear algebra to solve complex mathematical problems.
You will use arrays constantly in machine learning, whether it's:

o To convert the column of a data frame into a list format in pre-processing analysis
o To order the frequency of words present in datasets.
o Using a list of tokenized words to begin clustering topics.
o In word embedding, by creating multi-dimensional matrices.

An array contains index numbers to represent an element starting from 0. The lowest
index is arr[0] and corresponds to the first element.

Let's take an example of a Python array used in machine learning. Although the Python array module is quite different from arrays in other programming languages, the Python list is more popular as it offers flexibility in data types and length. If you are using Python for ML algorithms, it's better to start your journey with arrays and lists.

Python list methods:

Method Description

append() Adds an element at the end of the list.

clear() Removes all elements from the list.

copy() Returns a copy of the list.

count() Returns the number of elements with the specified value.

extend() Adds the elements of another list to the end of the current list.

index() Returns the index of the first element with the specified value.

insert() Adds an element at a specific position using an index number.

pop() Removes an element from a specified position using an index number (the last element by default).

remove() Removes the first element with the specified value.

reverse() Reverses the order of the list.

sort() Sorts the list.
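
A quick sketch showing a few of these methods on a hypothetical list of tokenized words:

words = ["cat", "dog", "cat", "bird"]
words.append("fish")          # add an element at the end
words.insert(1, "horse")      # add an element at index 1
print(words.count("cat"))     # 2 -> frequency of a token
words.remove("bird")          # remove the first occurrence of a value
words.sort()                  # order the list alphabetically
print(words)                  # ['cat', 'cat', 'dog', 'fish', 'horse']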

Stacks:
Stacks are based on the concept of LIFO (Last In, First Out), also described as FILO (First In, Last Out). They are used, for example, in binary classification pipelines in deep learning. Although stacks are easy to learn and implement in ML models, having a good grasp of them helps with many computer science tasks such as parsing grammars.

Stacks are what enable the undo and redo buttons on your computer: they work like a stack of blog posts, where it makes no sense to add a new post at the bottom and we can only look at the most recent one that has been added. Addition and removal occur at the top of the stack.
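
A minimal sketch of a stack using a plain Python list, mirroring the undo example above (the action names are made up):

history = []                     # the "undo" history
history.append("type 'hello'")   # push: additions happen at the top
history.append("make text bold")
history.append("insert image")

print(history.pop())             # 'insert image' -> the most recent action is undone first
print(history.pop())             # 'make text bold'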

Linked List:
A linked list is a type of collection made of several separately allocated nodes. In other words, it is a collection of data elements where each node consists of a value and a pointer that points to the next node in the list.

In a linked list, insertion and deletion are constant-time operations and are very efficient, but accessing a value is slow and often requires scanning the list. So, a linked list is very useful in place of a dynamic array where shifting of elements would otherwise be required. Insertion of an element can be done at the head, middle, or tail position, although insertion in the middle is relatively costly. However, linked lists are easy to splice together and split apart. Also, the list can be converted to a fixed-length array for fast access.
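
A minimal sketch of a singly linked list in Python, illustrating the value-plus-pointer structure described above:

class Node:
    def __init__(self, value):
        self.value = value   # the data stored in this node
        self.next = None     # pointer to the next node in the list

# Build the list 1 -> 2 -> 3
head = Node(1)
head.next = Node(2)
head.next.next = Node(3)

# Insertion at the head is constant time: no elements need to be shifted.
new_head = Node(0)
new_head.next = head
head = new_head

# Accessing a value requires scanning from the head.
node = head
while node is not None:
    print(node.value)        # 0, 1, 2, 3
    node = node.next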

Queue:
A Queue is defined as "FIFO" (first in, first out). It is useful for modeling queuing scenarios in real-time programs, such as people waiting in line to withdraw cash at a bank. Hence, the queue is significant in a program where multiple lists of tasks need to be processed in order.

The queue data structure can also be used to record the split times of a car in F1 racing.
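
A minimal sketch of a FIFO queue using collections.deque, in the spirit of the bank-queue example above:

from collections import deque

queue = deque()
queue.append("customer 1")   # enqueue at the rear
queue.append("customer 2")
queue.append("customer 3")

print(queue.popleft())       # dequeue from the front -> 'customer 1'
print(queue.popleft())       # 'customer 2'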

2. Non-linear Data Structures


As the name suggests, in Non-linear data structures, elements are not arranged in
any sequence. All the elements are arranged and linked with each other in a
hierarchal manner, where one element can be linked with one or more elements.

1) Trees
Binary Tree:

The concept of a binary tree is very similar to a linked list; the only difference lies in the nodes and their pointers. In a linked list, each node contains a data value with a pointer that points to the next node in the list, whereas in a binary tree, each node has two pointers to subsequent nodes instead of just one.

Binary search trees keep their values sorted, so insertion and deletion operations can be done with O(log N) time complexity. Similar to the linked list, a binary tree can also be converted to an array on the basis of tree sorting.

In a binary search tree there are parent and child nodes, where the value of the left child node is always less than the value of the parent node, while the value of the right child node is always greater than the parent node. Hence, in such a tree structure, data sorting is done automatically, which makes insertion and deletion efficient.
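
A minimal sketch of a binary search tree with the left-smaller / right-larger ordering described above:

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None      # values smaller than this node
        self.right = None     # values larger than this node

def insert(root, value):
    # Walk down the tree and attach the new value in sorted position.
    if root is None:
        return TreeNode(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

def in_order(root):
    # An in-order traversal visits the values in sorted order.
    if root is None:
        return []
    return in_order(root.left) + [root.value] + in_order(root.right)

root = None
for v in [8, 3, 10, 1, 6]:
    root = insert(root, v)
print(in_order(root))   # [1, 3, 6, 8, 10]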

2) Graphs
A graph data structure is also very useful in machine learning, for example for link prediction. Graphs can be directed or undirected and consist of nodes connected by ordered or unordered pairs (edges). Hence, you should have good exposure to the graph data structure for machine learning and deep learning.
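
A tiny sketch of a graph stored as an adjacency list, with a naive common-neighbour heuristic of the kind used as a baseline for link prediction (the node names are made up):

# Undirected graph stored as a dict of node -> list of neighbours.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B"],
}

# Suggest candidate links: pairs that are not connected but share a neighbour.
for u in graph:
    for v in graph:
        if u < v and v not in graph[u]:
            common = set(graph[u]) & set(graph[v])
            if common:
                print(u, "-", v, "share neighbours:", common)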

3) Maps
Maps are the popular data structure in the programming world, which are mostly
useful for minimizing the run-time algorithms and fast searching the data. It stores
data in the form of (key, value) pair, where the key must be unique; however, the
value can be duplicated. Each key corresponds to or maps a value; hence it is named
a Map.

In different programming languages, core libraries have built-in maps or, rather,
HashMaps with different names for each implementation.
o In Java: Maps
o In Python: Dictionaries
o C++: hash_map, unordered_map, etc.

Python Dictionaries are very useful in machine learning and data science as various
functions and algorithms return the dictionary as an output. Dictionaries are also
much used for implementing sparse matrices, which is very common in Machine
Learning.
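
A short sketch of a Python dictionary used as a sparse representation of a row, as mentioned above:

# Only the non-zero entries of the row are stored as {column_index: value}.
dense_row = [0, 0, 3, 0, 0, 7, 0]
sparse_row = {i: v for i, v in enumerate(dense_row) if v != 0}

print(sparse_row)            # {2: 3, 5: 7}
print(sparse_row.get(5, 0))  # 7 -> lookup by key in roughly constant time
print(sparse_row.get(4, 0))  # 0 -> missing keys default to zero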

4) Heap data structure:


Heap is a hierarchically ordered data structure. A heap is very similar to a tree, but it is defined by vertical ordering rather than horizontal ordering.

Ordering in a heap is applied along the hierarchy but not across it: in a max-heap, the value of the parent node is always greater than that of its child nodes on both the left and the right side (a min-heap reverses this).

Here, insertion and deletion operations are performed on the basis of promotion. First, the element is inserted at the highest available position; after that, it is compared with its parent and promoted until it reaches the correct position. Most heap data structures can be stored in an array along with the relationships between the elements.
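
A short sketch using Python's heapq module; note that heapq maintains a min-heap (the parent is smaller than its children, the mirror image of the max-heap ordering described above):

import heapq

scores = [0.42, 0.91, 0.17, 0.66]
heapq.heapify(scores)            # rearrange the list into heap order in place
heapq.heappush(scores, 0.05)     # insert, then "promote" towards the root

print(scores[0])                 # 0.05 -> the root always holds the minimum
print(heapq.heappop(scores))     # removes and returns the root (0.05)
print(heapq.heappop(scores))     # next smallest value (0.17)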

Dynamic array data structure:


This is one of the most important types of data structures used in linear algebra to work with 1-D, 2-D, 3-D, and even 4-D arrays for matrix arithmetic. Working with them requires good exposure to Python libraries such as NumPy for programming in deep learning.
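
A small sketch of NumPy arrays used for basic matrix arithmetic of the kind referred to above:

import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])       # a 2-D array (matrix)
W = np.array([[0.5],
              [0.25]])           # a 2x1 weight matrix

print(X @ W)         # matrix multiplication -> shape (2, 1)
print(X.T)           # transpose
print(X.reshape(4))  # the same data viewed as a 1-D array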

How is Data Structure used in Machine Learning?
For a Machine learning professional, apart from machine learning skills, mastery of data structures and algorithms is also required.

When we use machine learning for solving a problem, we need to evaluate the
model performance, i.e., which model is fastest and requires the smallest amount of
space and resources with accuracy. Moreover, if a model is built using algorithms,
comparing and contrasting two algorithms to determine the best for the job is
crucial to the machine learning professional. For such cases, skills in data structures
become important for ML professionals.

With the knowledge of data structure and algorithms with ML, we can answer the
following questions easily:

o How much memory is required to execute?
o How long will it take to run?
o With the business case on hand, which algorithm will offer the best
performance?

Conclusion
In this article, we have discussed how data structures help in building Machine Learning algorithms. A data structure is a key player in the programming world for solving most computing problems, and knowledge of data structures combined with the best algorithm gives you the optimum solution for an ML problem. Further, a strong knowledge of data structures will help you build a strong foundation and use these skills to create better Machine Learning projects.

Hypothesis in Machine Learning


The hypothesis is a common term in Machine Learning and data science projects. As
we know, machine learning is one of the most powerful technologies across the
world, which helps us to predict results based on past experiences. Moreover, data
scientists and ML professionals conduct experiments that aim to solve a problem.
These ML professionals and data scientists make an initial assumption for the
solution of the problem.

This assumption in Machine learning is known as a Hypothesis. In Machine Learning, Hypothesis and Model are sometimes used interchangeably. However, a hypothesis is an assumption made by scientists, whereas a model is a mathematical representation that is used to test the hypothesis. In this topic, "Hypothesis in Machine Learning," we will discuss a few important concepts related to a hypothesis in machine learning and their importance. So, let's start with a quick introduction to the hypothesis.

What is Hypothesis?
A hypothesis is defined as a supposition or proposed explanation based on insufficient evidence or assumptions. It is just a guess based on some known facts but has not yet been proven. A good hypothesis is testable and results in either true or false.

Example: Let's understand the hypothesis with a common example. A scientist claims that ultraviolet (UV) light can damage the eyes, and we suppose it may also cause blindness.


In this example, a scientist just claims that UV rays are harmful to the eyes, but we
assume they may cause blindness. However, it may or may not be possible. Hence,
these types of assumptions are called a hypothesis.

Hypothesis in Machine Learning (ML)


The hypothesis is one of the commonly used concepts of statistics in Machine
Learning. It is specifically used in Supervised Machine learning, where an ML model
learns a function that best maps the input to corresponding outputs with the help of
an available dataset.

In supervised learning techniques, the main aim is to determine the possible hypothesis out of the hypothesis space that best maps inputs to the corresponding correct outputs.

There are some common terms used to describe the possible hypotheses, where the hypothesis space is represented by uppercase H and a hypothesis by lowercase h. These are defined as follows:

Hypothesis space (H):


Hypothesis space is defined as a set of all possible legal hypotheses; hence it is
also known as a hypothesis set. It is used by supervised machine learning
algorithms to determine the best possible hypothesis to describe the target function
or best maps input to output.

It is often constrained by choice of the framing of the problem, the choice of model,
and the choice of model configuration.

Hypothesis (h):
It is defined as the approximate function that best describes the target in supervised
machine learning algorithms. It is primarily based on data as well as bias and
restrictions applied to data.

Hence hypothesis (h) can be concluded as a single hypothesis that maps input to
proper output and can be evaluated as well as used to make predictions.
The hypothesis (h) can be formulated in machine learning as follows:

y = mx + b

Where,

y: range (output)

m: slope of the line that divides the test data, i.e., the change in y divided by the change in x

x: domain (input)

b: intercept (constant)
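
A minimal sketch of evaluating such a linear hypothesis for assumed values of m and b (the numbers are illustrative only):

m, b = 2.0, 1.0            # assumed slope and intercept

def h(x):
    # The hypothesis maps an input x from the domain to a predicted output y.
    return m * x + b

for x in [0.0, 1.5, 3.0]:
    print(x, "->", h(x))   # 1.0, 4.0, 7.0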

Example: Let's understand the hypothesis (h) and hypothesis space (H) with a two-dimensional coordinate plane showing the distribution of data.

Now, assume we have some test data by which ML algorithms predict the outputs for given inputs. If we divide this coordinate plane in such a way that it helps to predict the output, we obtain one result for the given test data. However, based on the data, the algorithm, and the constraints, the coordinate plane can also be divided in other ways.

With the above example, we can conclude that:

Hypothesis space (H) is the set of all legal, best possible ways to divide the coordinate plane so that inputs are mapped to their proper outputs.

Each individual best possible way is called a hypothesis (h).
Hypothesis in Statistics
Similar to the hypothesis in machine learning, it is also considered an assumption of
the output. However, it is falsifiable, which means it can be failed in the presence of
sufficient evidence.

Unlike machine learning, we cannot simply accept a hypothesis in statistics, because it is just an assumed result based on probability. Before starting work on an experiment, we must be aware of two important types of hypotheses, as follows:

o Null Hypothesis: A null hypothesis is a type of statistical hypothesis which states that there is no statistically significant effect in the given set of observations. It is also known as a conjecture and is used in quantitative analysis to test theories about markets, investment, and finance to decide whether an idea is true or false.
o Alternative Hypothesis: An alternative hypothesis is a direct contradiction of
the null hypothesis, which means if one of the two hypotheses is true, then the
other must be false. In other words, an alternative hypothesis is a type of
statistical hypothesis which tells that there is some significant effect that exists
in the given set of observations.

Significance level
The significance level is the primary thing that must be set before starting an experiment. It defines the tolerance for error and the level at which an effect can be considered significant. During the testing process in an experiment, a 95% confidence level is commonly used, which means the remaining 5% is the significance (error) level. The significance level also determines the critical or threshold value. For example, if the confidence level in an experiment is set to 98%, then the significance threshold is 0.02.

P-value
The p-value in statistics is defined as the evidence against a null hypothesis. In other words, the p-value is the probability of obtaining the observed data, or something rarer, purely by random chance under the null hypothesis.

The smaller the p-value, the stronger the evidence against the null hypothesis, and vice versa; a sufficiently small p-value means the null hypothesis can be rejected in testing. It is always expressed in decimal form, such as 0.035.

Whenever a statistical test is carried out on the population and sample to find out P-
value, then it always depends upon the critical value. If the p-value is less than the
critical value, then it shows the effect is significant, and the null hypothesis can be
rejected. Further, if it is higher than the critical value, it shows that there is no
significant effect and hence fails to reject the Null Hypothesis.
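
A hedged sketch of how a p-value is computed and compared against a threshold in practice, using SciPy's one-sample t-test (the sample values and the hypothesised mean of 5.0 are made up):

import numpy as np
from scipy import stats

# Null hypothesis: the population mean equals 5.0.
sample = np.array([5.1, 4.8, 5.3, 5.0, 4.7, 5.4, 5.2, 4.9])
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

alpha = 0.05   # significance threshold
if p_value < alpha:
    print("Reject the null hypothesis (significant effect), p =", p_value)
else:
    print("Fail to reject the null hypothesis, p =", p_value)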

Conclusion
In the series of mapping instances of inputs to outputs in supervised machine learning, the hypothesis is a very useful concept that helps to approximate a target function. It appears across analytics domains and is also considered one of the important factors in checking whether a change should be introduced or not. It is evaluated over the entire training data set and relates to both the efficiency and the performance of the models.

Hence, in this topic, we have covered various important concepts related to the
hypothesis in machine learning and statistics and some important parameters such
as p-value, significance level, etc., to understand hypothesis concepts in a better way.

Gaussian Discriminant Analysis


There are two types of supervised learning algorithms used in Machine Learning for classification.

1. Discriminative Learning Algorithms
2. Generative Learning Algorithms

Logistic Regression and the Perceptron are examples of discriminative learning algorithms. These algorithms attempt to determine a boundary between classes during the learning process. For instance, a discriminative learning algorithm might be used to solve a classification problem that determines whether a patient has malaria. A new example is then checked against this boundary to compute P(y|X), i.e., given a feature set X, what is the probability that it belongs to the class "y".

Generative Learning Algorithms, on the other hand, take a different approach. They try to capture each class distribution separately rather than finding a boundary between classes. A Generative Learning Algorithm, as mentioned, will examine the distribution of infected and healthy patients separately and attempt to learn each distribution's features individually. When a new example is presented, it is compared to both distributions, and it is assigned the class it most closely resembles, using P(X|y) together with P(y), where P(y) is known as the class prior.

Generative learning algorithms make their predictions using Bayes' theorem: P(y|X) = P(X|y) P(y) / P(X).


By analysing only P(X|y) and P(y) for each class, we can determine P(y|X), i.e., considering the characteristics of a sample, how likely it is that it belongs to class "y".

Gaussian Discriminant Analysis is a Generative Learning Algorithm that aims to determine the distribution of every class. It fits a Gaussian distribution to each category of data separately. Under such a generative model, the likelihood of a sample belonging to a class is very high near the centre of the contour corresponding to that class, and it diminishes as we move away from the centre of the contour. This is, in practice, what distinguishes Generative Learning Algorithms from Discriminative ones.

Let's take a look at the case of a binary classification problem in which all data samples are IID (independently and identically distributed). To determine P(X|y), we can use a Multivariate Gaussian Distribution to compute a probability density for each particular class. To determine P(y), the class prior for each class, we can use the Bernoulli distribution, since every label in binary classification is either 0 or 1.

So the probability distribution of a sample, as well as its class prior, can be determined using the general model of Gaussian and Bernoulli distributions:
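
The equations themselves did not survive extraction; in the standard GDA formulation (a sketch using the usual notation, which the surrounding text appears to follow), the class prior is Bernoulli and each class-conditional density is a multivariate Gaussian sharing a single covariance matrix Σ:

P(y) = φ^y (1 − φ)^(1 − y)   ... (Equation 1)
P(x | y = 0) = N(x; μ0, Σ)   ... (Equation 2)
P(x | y = 1) = N(x; μ1, Σ)   ... (Equation 3)

where N(x; μ, Σ) = (2π)^(−d/2) |Σ|^(−1/2) exp(−(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ)) is the multivariate Gaussian density for d-dimensional features.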

To express the probability distributions in terms of the above parameters, we can formulate the likelihood, which is the product of the probability distribution and the class prior over every data sample (taking the product is reasonable since all data samples are assumed to be IID).
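
Written out under these assumptions (again a sketch of the standard form), the likelihood of the parameters over m IID samples (x⁽ⁱ⁾, y⁽ⁱ⁾) is:

L(φ, μ0, μ1, Σ) = ∏ᵢ P(x⁽ⁱ⁾, y⁽ⁱ⁾) = ∏ᵢ P(x⁽ⁱ⁾ | y⁽ⁱ⁾; μ0, μ1, Σ) · P(y⁽ⁱ⁾; φ)   ... (Equation 4)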

In accordance with the principle of Maximum Likelihood Estimation, we select the parameters so as to maximize the likelihood function shown in Equation 4. Instead of maximizing the likelihood function directly, we can maximize the log-likelihood function, which is a strictly increasing function of it.
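
Maximizing the log-likelihood with respect to each parameter gives the usual closed-form estimates (a sketch of the standard result, not reproduced from the lost figures):

φ = (1/m) Σᵢ 1{y⁽ⁱ⁾ = 1}
μ0 = ( Σᵢ 1{y⁽ⁱ⁾ = 0} x⁽ⁱ⁾ ) / ( Σᵢ 1{y⁽ⁱ⁾ = 0} )
μ1 = ( Σᵢ 1{y⁽ⁱ⁾ = 1} x⁽ⁱ⁾ ) / ( Σᵢ 1{y⁽ⁱ⁾ = 1} )
Σ = (1/m) Σᵢ (x⁽ⁱ⁾ − μ_y⁽ⁱ⁾)(x⁽ⁱ⁾ − μ_y⁽ⁱ⁾)ᵀ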
In the above equations, "1{condition}" is the indicator function, which returns 1 if the condition holds and 0 otherwise. For instance, 1{y = 1} returns 1 only if the class of the data sample is 1, and similarly, 1{y = 0} returns 1 only if the class of the sample is 0.

The derived parameters can be plugged into equations 1, 2, and 3 to obtain the probability distribution and class prior for all the data samples. These values can then be multiplied to evaluate the likelihood function shown in Equation 4. As previously mentioned, the likelihood is the product of P(X|y) and P(y), which is plugged into the Bayes formula to calculate P(y|X) (i.e., to determine the class 'y' of a data sample given its features 'X').

Thus, Gaussian Discriminant Analysis works extremely well with a limited volume of data (say, a few thousand examples) and may be more robust than Logistic Regression if our fundamental assumptions about the data distribution are correct.
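
To make the idea concrete, here is a rough NumPy sketch of estimating GDA parameters on toy data and using them for prediction (the toy data, class means, and helper function are made up for illustration; this is not code from the article):

import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))  # class 0 samples
X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2))  # class 1 samples
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
m = len(y)

phi = np.mean(y == 1)                        # class prior P(y = 1)
mu0 = X[y == 0].mean(axis=0)                 # mean of class 0
mu1 = X[y == 1].mean(axis=0)                 # mean of class 1
centered = X - np.where(y[:, None] == 1, mu1, mu0)
Sigma = centered.T @ centered / m            # shared covariance matrix

def predict(x):
    # Compare log P(x|y) + log P(y) for both classes (the shared normalizing
    # constant cancels) and pick the larger one.
    inv = np.linalg.inv(Sigma)
    def log_joint(mu, prior):
        d = x - mu
        return -0.5 * d @ inv @ d + np.log(prior)
    return int(log_joint(mu1, phi) > log_joint(mu0, 1 - phi))

print(predict(np.array([0.1, -0.2])))   # expected: 0
print(predict(np.array([2.1, 1.8])))    # expected: 1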

How Machine Learning is used by Famous Companies
Machine Learning has become the technology of the future! Some people believe
this technology will end the world. Others think it can make our lives easier. It is not
surprising that nearly all companies use this technology to attract customers and
provide personalized customer experiences. There has been a 270% increase in
companies that have adopted ML in the past four years.

It is easier for large tech companies to invest in Machine Learning or Artificial Intelligence. This tutorial will focus on the fascinating ways ML is used in companies such as Google and Pinterest. Let's take a look at these companies and the different ways they use Machine Learning.

Google
Instead of asking, "Which Google apps use ML?" we should ask, "Do any Google applications not use ML?" The answer is probably no! Google has invested heavily in Machine Learning research and plans to eventually integrate it into all of its products. Google's flagship products, Google Search and Google Translate, already use ML.

Google Search uses RankBrain, a deep neural net that helps provide relevant search results. RankBrain makes intelligent guesses when a search contains unique or ambiguous words or phrases, for example working out what we mean when we search for "Tim Cook". Google Translate, meanwhile, analyses millions of documents and is able to identify the most common patterns and vocabulary.


Google Photos uses image recognition. Deep Learning is used by Google Photos to
sort millions upon millions of images online in order to classify them better. Google
Assistant uses Image Recognition and Natural Language Processing, which allows it
to be multi-talented and answer our questions.

Facebook
Facebook is where we should go if we want to see our friends, watch celebrities, or
look at cat photos. Facebook has 2.41 billion Monthly Active Users! Machine Learning
is the only way to achieve this level of popularity. Facebook uses Machine Learning in
all aspects of its News Feed, including Targeted Advertising.

Facebook utilizes Facial recognition to recognize our friends and suggests their
names. A Machine Learning System analyses the pixels of an image to generate
unique templates for every face. The facial fingerprint can be used to identify the
face and suggest tags.

Targeted advertising on Facebook uses a deep neural network to analyse our location, age, gender, page likes, and interests in order to identify groups of users and show them ads targeted at these groups. Facebook now uses chatbots to provide human-like customer service interactions. These chatbots interact with users using ML and NLP and appear almost human.

Twitter
Twitter is the best place to find interesting tweets, intelligent debates, and more!
Twitter is the best place to find out about current politics, global warming dangers,
and smart comments from celebrities. Guess how all those tweets are managed?
Machine Learning is the answer!

Twitter uses an ML algorithm to organize our tweets. Tweets based on what we like and tweets from family and friends are given a higher priority and appear higher in our feed. Tweets that receive a large number of retweets or likes have a higher chance of getting noticed; these tweets can be found in the "In case you missed it" section. Previously, tweets were arranged in reverse chronological order, which is what some people want back. Twitter currently uses the Natural Language Processing capabilities of IBM Watson to find and delete abusive tweets.

Twitter uses deep learning to determine what's happening in the live feed. This is
achieved by training the neural network using tags to recognize images in videos.
Let's say we add the tags "Puppy", "Animal", "Poodle", "Husky" etc. The algorithm will
identify a dog in our video and use that information to identify other dogs in our
videos.

Baidu
Baidu: the Google of China! While this may not be entirely accurate, Baidu is the Chinese search engine most often compared to Google. Like Google, it uses Machine Learning in many of its applications, such as Baidu Search and DuerOS, Baidu's voice assistant, as well as the Xiaoyu Zaijia (Little Fish) home robot, which is similar to Alexa.

Baidu's search engine is its main focus, as about 75% of Chinese internet users use it. Machine Learning algorithms are used for image recognition and voice recognition, which allows Baidu to offer the best possible (and smarter!) service. Baidu has also made significant investments in natural language processing, which is evident in DuerOS.

DuerOS, Baidu's voice assistant, makes use of natural language processing, image recognition, and voice recognition to build an intelligent system that can hold an entire conversation while sounding human. The voice assistant uses ML to understand the complexity of human speech and reproduce it flawlessly. Baidu's NLP expertise is also applied to the Little Fish home robot, which is similar to Alexa but with a twist: it can turn its head to "listen" to a voice coming from another direction and then respond accordingly.

Pinterest
The users might have heard of Pinterest, whether they are a regular pinner or a
beginner. Pinterest allows us to pin images, videos, and GIFs we are interested in.
Since this app relies on images being saved from the internet, it makes sense that its
most important feature is to identify images.
Machine Learning is the answer! Pinterest uses image recognition algorithms to identify patterns in the images we pin so that similar images can be displayed when we search. Imagine we pin a green shirt: image recognition will then let us browse images of similar green shirts. Pinterest can't guarantee that these green shirts will be fashionable, though!

Pinterest offers more personalized recommendations based on our pinning history. This is in contrast to the ML algorithms used by social networking apps, which also consider our age, gender, and friends.

Introduction to Transfer Learning in ML
Humans are extremely skilled at transferring knowledge from one task to another.
This means that when we face a new problem or task, we immediately recognize it
and use the relevant knowledge we have gained from previous learning experiences.
This makes it easy to complete our tasks quickly and efficiently. A good example is a user who can ride a bicycle and is asked to ride a motorbike: their experience with balancing and steering the bicycle will help in this situation and will make the task much easier than it would be for a complete beginner. Such transferred lessons are extremely useful in real life because they let us build on experience we have already gained.

The same approach inspired transfer learning in machine learning, which involves using knowledge gained on a source task to solve a problem in a target task. Although most machine learning algorithms are designed for a single task, there is ongoing interest in developing transfer learning algorithms.

Why Transfer Learning?


One curious feature shared by many deep neural networks trained on images is that their early layers learn to detect edges, colours, intensity variations, and other low-level features. These features are not specific to any particular task or dataset: whether we are detecting lions or cars, the same low-level features must be detected, regardless of the exact image data or cost function. Features learned in one task, such as detecting lions, can therefore be reused, for example, to detect humans. This is exactly what transfer learning is. Nowadays, it is rare to find people who train whole convolutional neural networks from scratch. Instead, it is common to take a model pre-trained on a large collection of images for a similar task, such as ImageNet (1.2 million images with 1000 categories), and then reuse its features to solve a new task.
Block Diagram:

Transfer learning is characterized by the freezing of layers. When a layer is not available for training, it is called a "frozen layer"; it can be either a CNN layer or a hidden layer. Layers that have not been frozen undergo regular training, while training does not update the weights of the frozen layers.


Transfer learning is a method of solving problems using knowledge from a pre-trained model. There are two ways to make use of this knowledge. The first is to freeze some layers of the pre-trained model and train the remaining layers on our new dataset. The second is to extract the features produced by a layer of the pre-trained model and use them in a new model. In both cases, some of the previously learned features are reused while the rest of the model is trained to adapt to the new dataset.
Frozen and Trainable Layers:

One might wonder how to decide which layers to freeze and which to train. It is easy to see that layers must be frozen if we wish to inherit their features from a pre-trained model. Suppose a model trained to detect certain flower species needs to be adapted to detect new species. The new dataset will contain many features similar to those the model has already learned, so we freeze more layers (and train fewer) to make the most of that model's knowledge. Consider another example: if a model that detects people in images is already trained and we want to use this knowledge to detect cars, it is not a good idea to freeze many layers, because high-level features such as noses, eyes, and mouths are useless for the new dataset (car detection). In that case, we reuse only the low-level features of the base network and train the rest of the network on the new dataset.
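As a concrete illustration of layer freezing, here is a minimal sketch assuming Keras/TensorFlow and an ImageNet-pretrained MobileNetV2; the base model, the 5-class head, and the input size are illustrative choices, not part of the discussion above.

# A minimal layer-freezing sketch with Keras (assumes tensorflow is installed)
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False,   # drop the original classifier head
                                         weights="imagenet")
base.trainable = False                    # freeze every layer of the pre-trained base

model = tf.keras.Sequential([
    base,                                 # frozen feature extractor
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax")   # new head for a hypothetical 5-class task
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(new_train_ds, epochs=5)       # train only the new head on the new dataset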

Let's look at the scenarios where the size and nature of the target dataset differ from those of the base network's data.

o The target dataset is small and similar to the base network's data: Because the target dataset is so small, fine-tuning the whole pre-trained network on it could lead to overfitting. There may also be a different number of classes in the target task. In such cases, we can remove the fully connected layers from the end, add a new fully connected layer matching the new number of classes, and then freeze the rest of the model and train only the newly added layers.
o The target dataset is large and similar to the base training dataset: If the dataset is large enough, there is little risk of overfitting the pre-trained model. Here, the last fully connected layer is removed, a new fully connected layer with the correct number of classes is added, and the entire model is then trained on the new dataset. This allows the model to be tuned on the large new dataset while keeping the architecture unchanged.
o The target dataset is small and different from the base network's data: Because the target dataset is different, the pre-trained model's high-level features will not be useful. We can remove most of the layers from the end of the pre-trained model, add layers matching the number of classes in the new dataset, and then reuse the low-level features of the pre-trained model while training the remaining layers to adapt to the new dataset. Sometimes it is beneficial to train the entire network after adding a new layer at the end.
o The target dataset is large and different from the base network's data: As the target task is complex and diverse, it is best to remove the final layers from the pre-trained network, add layers matching the number of classes, and then train the entire network without freezing any layers.

Conclusion
Transfer learning can be a quick and effective way to solve a problem. It gives us a sensible direction to start from, and many of the best results are achieved with this method.

Linear Discriminant Analysis (LDA) in Machine Learning
Linear Discriminant Analysis (LDA) is one of the commonly used dimensionality
reduction techniques in machine learning to solve more than two-class
classification problems. It is also known as Normal Discriminant Analysis
(NDA) or Discriminant Function Analysis (DFA).

This can be used to project the features of higher dimensional space into lower-
dimensional space in order to reduce resources and dimensional costs. In this topic,
"Linear Discriminant Analysis (LDA) in machine learning”, we will discuss the LDA
algorithm for classification predictive modeling problems, limitation of logistic
regression, representation of linear Discriminant analysis model, how to make a
prediction using LDA, how to prepare data for LDA, extensions to LDA and much
more. So, let's start with a quick introduction to Linear Discriminant Analysis (LDA) in
machine learning.
Note: Before starting this topic, it is recommended to learn the basics of Logistic
Regression algorithms and a basic understanding of classification problems in machine
learning as a prerequisite

What is Linear Discriminant Analysis (LDA)?
Although the logistic regression algorithm is limited to two-class problems, Linear Discriminant Analysis is applicable to classification problems with more than two classes.

Linear Discriminant Analysis is one of the most popular dimensionality reduction techniques used for supervised classification problems in machine learning. It is also considered a pre-processing step for modelling class differences in ML and pattern classification applications.


Whenever there is a requirement to separate two or more classes having multiple features efficiently, the Linear Discriminant Analysis model is considered the most common technique to solve such classification problems. For example, suppose we have two classes with multiple features and need to separate them efficiently. If we classify them using a single feature, the classes may overlap.

To overcome this overlapping issue in the classification process, we must increase the number of features.
Example:
Let's assume we have to classify two different classes having two sets of data points in a 2-dimensional plane, as shown in the image below:

It may be impossible to draw a straight line in the 2-D plane that separates these data points efficiently, but using Linear Discriminant Analysis we can reduce the 2-D plane to a 1-D line. Using this technique, we can also maximize the separability between multiple classes.

How does Linear Discriminant Analysis (LDA) work?
Linear Discriminant Analysis is used as a dimensionality reduction technique in machine learning, with which we can easily transform a 2-D or 3-D feature space into a 1-dimensional space.

Let's consider an example where we have two classes in a 2-D plane with an X-Y axis, and we need to classify them efficiently. As we saw in the example above, LDA enables us to draw a straight line that can completely separate the two classes of data points. Here, LDA uses the X-Y plane to create a new axis, separating the classes with a straight line and projecting the data onto that new axis.
Hence, we can maximize the separation between these classes and reduce the 2-D plane to 1-D.

To create a new axis, Linear Discriminant Analysis uses the following criteria:

o It maximizes the distance between means of two classes.


o It minimizes the variance within the individual class.

Using the above two conditions, LDA generates a new axis in such a way that it maximizes the distance between the means of the two classes and minimizes the variation within each class.

In other words, we can say that the new axis will increase the separation between the
data points of the two classes and plot them onto the new axis.
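As a rough illustration, here is a minimal sketch using scikit-learn's LinearDiscriminantAnalysis on the Iris data; the dataset is our own illustrative choice, not part of the example above.

# Minimal LDA sketch: project the 4-D Iris features onto a single discriminant axis
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=1)   # reduce to one new axis
X_1d = lda.fit_transform(X, y)                     # supervised: uses the class labels

print(X_1d.shape)          # (150, 1) - each sample projected onto the new axis
print(lda.score(X, y))     # classification accuracy on the training data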

Why LDA?
o Logistic Regression is one of the most popular classification algorithms that
perform well for binary classification but falls short in the case of multiple
classification problems with well-separated classes. At the same time, LDA
handles these quite efficiently.
o LDA can also be used in data pre-processing to reduce the number of
features, just as PCA, which reduces the computing cost significantly.
o LDA is also used in face detection algorithms. In Fisherfaces, LDA is used to
extract useful data from different faces. Coupled with eigenfaces, it produces
effective results.

Drawbacks of Linear Discriminant Analysis (LDA)
Although LDA is specifically used to solve supervised classification problems with two or more classes, which is not possible using logistic regression, it also fails in some cases, such as when the means of the class distributions are shared. In this case, LDA cannot create a new axis that makes the classes linearly separable.

To overcome such problems, we use non-linear discriminant analysis in machine learning.

Extensions to Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis is one of the simplest and most effective methods for solving classification problems in machine learning. It has many extensions and variations, as follows:

1. Quadratic Discriminant Analysis (QDA): For multiple input variables, each class deploys its own estimate of variance.
2. Flexible Discriminant Analysis (FDA): It is used when non-linear combinations of inputs are used, such as splines.
3. Regularized Discriminant Analysis (RDA): This uses regularization in the estimate of the variance (actually covariance) and hence moderates the influence of different variables on LDA.

Real-world Applications of LDA


Some of the common real-world applications of Linear discriminant Analysis are
given below:

o Face Recognition
Face recognition is a popular application of computer vision in which each face is represented as a combination of a large number of pixel values. Here, LDA is used to reduce the number of features to a manageable number before the classification step. It generates a new template in which each dimension is a linear combination of pixel values. If the linear combination is obtained using Fisher's linear discriminant, the result is called a Fisherface.
o Medical
In the medical field, LDA is widely used to classify a patient's disease as mild, moderate, or severe on the basis of various parameters of the patient's health and the ongoing medical treatment. This classification helps doctors decide whether to increase or decrease the pace of the treatment.
o Customer Identification
LDA is also applied to customer identification. With the help of LDA, we can identify and select the features that characterise the group of customers who are most likely to purchase a specific product in a shopping mall.
o For Predictions
LDA can also be used for making predictions and, hence, in decision making. For example, "will you buy this product?" gives a predicted result of one of two possible classes: buying or not buying.
o In Learning
Nowadays, robots are being trained to learn and talk in order to simulate human work, and this can also be treated as a classification problem. In this case, LDA builds similar groups on the basis of different parameters, including pitch, frequency, sound, tune, etc.

Difference between Linear Discriminant Analysis and PCA
Below are some basic differences between LDA and PCA:

o PCA is an unsupervised algorithm that does not care about classes and labels
and only aims to find the principal components to maximize the variance in
the given dataset. At the same time, LDA is a supervised algorithm that aims
to find the linear discriminants to represent the axes that maximize separation
between different classes of data.
o LDA is much more suitable for multi-class classification tasks than PCA. However, PCA can perform better when the sample size is comparatively small.
o Both LDA and PCA are used as dimensionality reduction techniques; PCA is often applied first, followed by LDA.

How to Prepare Data for LDA


Below are some suggestions that one should always consider while preparing the
data to build the LDA model:

o Classification Problems: LDA is mainly applied to classification problems with a categorical output variable. It is suitable for both binary and multi-class classification problems.
o Gaussian Distribution: The standard LDA model assumes a Gaussian distribution of the input variables. One should review the univariate distribution of each attribute and transform it into a more Gaussian-looking distribution, e.g., using log or root transforms for exponential distributions and the Box-Cox transform for skewed distributions.
o Remove Outliers: It is good to first remove outliers from the data, because outliers can skew the basic statistics used to separate classes in LDA, such as the mean and the standard deviation.
o Same Variance: As LDA assumes that all input variables have the same variance, it is always better to standardize the data before fitting an LDA model, so that each variable has a mean of 0 and a standard deviation of 1.

Stacking in Machine Learning


There are many ways to ensemble models in machine learning, such as bagging, boosting, and stacking. Stacking is one of the most popular ensemble machine learning techniques; it combines the predictions of multiple models to build a new model and improve performance. Stacking enables us to train multiple models to solve similar problems and, based on their combined output, builds a new model with improved performance.
In this topic, "Stacking in Machine Learning", we will discuss a few important
concepts related to stacking, the general architecture of stacking, important key
points to implement stacking, and how stacking differs
from bagging and boosting in machine learning. Before starting this topic, first,
understand the concepts of the ensemble in machine learning. So, let's start with the
definition of ensemble learning in machine learning.

What is Ensemble learning in Machine Learning?
Ensemble learning is one of the most powerful machine learning techniques; it uses the combined output of two or more models/weak learners to solve a particular computational intelligence problem. For example, a Random Forest is an ensemble of various decision trees combined.

Ensemble learning is primarily used to improve model performance in tasks such as classification, prediction, function approximation, etc. In simple words, we can summarise ensemble learning as follows:


"An ensembled model is a machine learning model that combines the


predictions from two or more models.”

There are 3 most common ensemble learning methods in machine learning. These
are as follows:

o Bagging
o Boosting
o Stacking

However, we will mainly discuss Stacking on this topic.

1. Bagging
Bagging is a method of ensemble modeling, which is primarily used to solve
supervised machine learning problems. It is generally completed in two steps as
follows:

o Bootstrapping: This is a random sampling method used to derive samples from the data using sampling with replacement. In this method, random data samples are first fed to the primary model, and then a base learning algorithm is run on the samples to complete the learning process.
o Aggregation: This step involves combining the outputs of all base models and, based on their outputs, predicting an aggregate result with greater accuracy and reduced variance.

Example: In the Random Forest method, predictions from multiple decision trees are ensembled in parallel. In regression problems, we use the average of these predictions to get the final output, whereas in classification problems, the majority-voted class is selected as the predicted class.
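As an illustration, a minimal bagging sketch with scikit-learn follows; the base estimator, dataset, and parameters are our own illustrative choices.

# Bagging sketch: bootstrap-sample the data and aggregate the trees' votes
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
bagger = BaggingClassifier(DecisionTreeClassifier(),
                           n_estimators=50,       # 50 bootstrap samples / base models
                           bootstrap=True,
                           random_state=0)
bagger.fit(X, y)
print(bagger.predict(X[:3]))                      # aggregated (majority-vote) predictions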

2. Boosting
Boosting is an ensemble method that enables each member to learn from the
preceding member's mistakes and make better predictions for the future. Unlike the
bagging method, in boosting, all base learners (weak) are arranged in a sequential
format so that they can learn from the mistakes of their preceding learner. Hence, in
this way, all weak learners get turned into strong learners and make a better
predictive model with significantly improved performance.
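A comparable boosting sketch, again using scikit-learn with AdaBoost as one common choice (the dataset and parameters are illustrative assumptions), would be:

# Boosting sketch: each new weak learner focuses on the previous learners' mistakes
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

booster = AdaBoostClassifier(n_estimators=100, random_state=0)  # sequential weak learners
booster.fit(X_tr, y_tr)
print("Test accuracy:", booster.score(X_te, y_te))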

We now have a basic understanding of ensemble techniques in machine learning and their two common methods, bagging and boosting. Now, let's discuss a different paradigm of ensemble learning, i.e., stacking.

3. Stacking
Stacking is one of the popular ensemble modelling techniques in machine learning. Various weak learners are ensembled in parallel in such a way that, by combining them with a meta-learner, we can make better predictions for the future.

This ensemble technique works by feeding the combined predictions of multiple weak learners as input to a meta-learner so that a better output prediction model can be achieved.

In stacking, an algorithm takes the outputs of sub-models as input and attempts to learn how to best combine the input predictions to make a better output prediction.

Stacking is also known as stacked generalization and is an extended form of the Model Averaging Ensemble technique, in which all sub-models participate according to their performance weights to build a new model with better predictions. This new model is stacked on top of the others, which is why it is named stacking.

Architecture of Stacking
The architecture of the stacking model is designed in such a way that it consists of two or more base (learner) models and a meta-model that combines the predictions of the base models. The base models are called level-0 models, and the meta-model is known as the level-1 model. So, the stacking ensemble method includes original (training) data, primary-level models, primary-level predictions, a secondary-level model, and the final prediction. The basic architecture of stacking is shown in the image below.
o Original data: This data is divided into n folds and serves as the training and test data.
o Base models: These models are also referred to as level-0 models. They use the training data and provide their predictions (level-0 predictions) as output.
o Level-0 Predictions: Each base model is trained on part of the training data and produces different predictions, which are known as level-0 predictions.
o Meta Model: The architecture of the stacking model consists of one meta-
model, which helps to best combine the predictions of the base models. The
meta-model is also known as the level-1 model.
o Level-1 Prediction: The meta-model learns how to best combine the
predictions of the base models and is trained on different predictions made
by individual base models, i.e., data not used to train the base models are fed
to the meta-model, predictions are made, and these predictions, along with
the expected outputs, provide the input and output pairs of the training
dataset used to fit the meta-model.

Steps to implement Stacking models:


There are some important steps to implementing stacking models in machine
learning. These are as follows:

o Split the training dataset into n folds using RepeatedStratifiedKFold, as this is the most common approach to preparing training datasets for meta-models.
o Fit the base model on the first n-1 folds and make predictions for the nth fold.
o Add the predictions made in the above step to the x1_train list.
o Repeat steps 2 and 3 for the remaining folds, so that x1_train covers all n parts of the data.
o Now train the model on all n parts and make predictions for the sample (test) data.
o Add these predictions to the y1_test list.
o In the same way, find x2_train, y2_test, x3_train, and y3_test by using Model 2 and Model 3 for training, respectively.
o Now train the meta-model on these level-1 predictions, where the base models' predictions are used as features for the meta-model.
o Finally, the meta-learner can be used to make predictions on test data in the stacking model.
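The steps above are roughly what scikit-learn's StackingClassifier automates internally. A minimal sketch (the base models, meta-model, and dataset below are illustrative choices, not taken from the steps above) is:

# Stacking sketch: level-0 base models + a level-1 meta-model
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_models = [('rf', RandomForestClassifier(random_state=0)),
               ('svc', SVC(probability=True, random_state=0))]
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(),  # meta-model (level-1)
                           cv=5)                # out-of-fold predictions feed the meta-model
stack.fit(X_tr, y_tr)
print("Test accuracy:", stack.score(X_te, y_te))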

Stacking Ensemble Family


There are some other ensemble techniques that can be considered the forerunner of
the stacking method. For better understanding, we have divided them into the
different frameworks of essential stacking so that we can easily understand the
differences between methods and the uniqueness of each technique. Let's discuss a
few commonly used ensemble techniques related to stacking.

Voting ensembles:
This is one of the simplest stacking ensemble methods, which uses different
algorithms to prepare all members individually. Unlike the stacking method, the
voting ensemble uses simple statistics instead of learning how to best combine
predictions from base models separately.

It is well suited to regression problems, where we predict the mean or median of the predictions from the base models. It is also helpful in various classification problems, where the prediction is based on the total votes received. Predicting the label with the highest number of votes is referred to as hard voting, whereas predicting the label that receives the largest sum of predicted probabilities is referred to as soft voting.

The voting ensemble differs from the stacking ensemble in that it does not weigh models based on each member's performance; here, all models are considered to have the same skill level.

Member Assessment: In the voting ensemble, all members are assumed to have the
same skill sets.

Combine with Model: Instead of using combined prediction from each member, it
uses simple statistics to get the final prediction, e.g., mean or median.
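A hard-voting sketch along these lines, using scikit-learn's VotingClassifier (the member models and dataset are illustrative assumptions), looks like this:

# Voting ensemble sketch: every member gets an equal vote
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
voter = VotingClassifier(estimators=[('lr', LogisticRegression(max_iter=1000)),
                                     ('nb', GaussianNB()),
                                     ('dt', DecisionTreeClassifier(random_state=0))],
                         voting='hard')    # 'soft' would sum predicted probabilities instead
voter.fit(X, y)
print(voter.predict(X[:3]))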
Weighted Average Ensemble
The weighted average ensemble is considered the next level of the voting ensemble,
which uses a diverse collection of model types as contributing members. This
method uses some training datasets to find the average weight of each ensemble
member based on their performance. An improvement over this naive approach is to
weigh each member based on its performance on a hold-out dataset, such as a
validation set or out-of-fold predictions during k-fold cross-validation. Furthermore,
it may also involve tuning the coefficient weightings for each model using an
optimization algorithm and performance on a holdout dataset.

Member Assessment: The weighted average ensemble method weighs members based on their performance on the training dataset.

Combine With Model: It considers the weighted average of prediction from each
member separately.

Blending Ensemble:
Blending is a similar approach to stacking with a specific configuration: instead of using k-fold cross-validation, it prepares out-of-sample predictions for the meta-model on a holdout set. In this method, the training dataset is first split into a training set and a validation set, and the learner models are trained on the training set. Predictions are then made on the validation set and the test set; the validation predictions are used as features to build a new model, which is later used to make final predictions on the test set using the prediction values as features.

Member Predictions: The blending stacking ensemble uses out-of-sample predictions on a validation set.

Combine With Model: Linear model (e.g., linear regression or logistic regression).

Super Learner Ensemble:


This method is quite similar to blending, which has a specific configuration of a
stacking ensemble. It uses out-of-fold predictions from learner models and prepares
a meta-model. However, it is considered a modified form of blending, which only
differs in the selection of how out-of-sample predictions are prepared for the meta
learner.

Summary of Stacking Ensemble


Stacking is an ensemble method that enables a meta-model to learn how to best combine the predictions given by the learner models and prepare a final model with accurate predictions. The main benefit of a stacking ensemble is that it can harness the capabilities of a range of well-performing models to solve classification and regression problems. Further, it helps to prepare a better model with better predictions than any of the individual models. In this topic, we have learned various ensemble techniques and their definitions, the stacking ensemble method, the architecture of stacking models, and the steps to implement stacking models in machine learning.

Complement Naive Bayes (CNB) Algorithm
Naive Bayes algorithms are one of a number of highly popular and commonly
utilized Machine Learning algorithms used for classification. There are numerous
ways that the Naive Bayes algorithm is applied, such as Gaussian Naive Bayes,
Multinomial Naive Bayes, and so on.

Complement Naive Bayes is a modification of the standard Multinomial Naive Bayes algorithm. Multinomial Naive Bayes does not perform very well on imbalanced data. Imbalanced datasets are datasets in which the number of instances belonging to one class is much greater than the number belonging to the other classes, meaning the spread of examples is uneven. This kind of data can be difficult to work with, as models can easily overfit it in favour of the class with more instances.

How CNB Works:


Complement Naive Bayes is particularly suited to dealing with imbalanced data. In Complement Naive Bayes, instead of calculating the probability of an item belonging to a specific class, we calculate the probability of the item belonging to all the other classes, i.e., the complement of that class. That is the literal sense of the word complement, and hence the name Complement Naive Bayes.

A step-by-step overview of the algorithm (without any maths involved):


o For each class, calculate the likelihood that the instance does not belong to that class.
o After calculating this for all classes, review the calculated values and pick the smallest one.
o The smallest value (lowest chance) is chosen because it represents the lowest probability that the instance does not belong to that class. That means the instance is most likely to belong to that class, which is why that class is chosen.

Let's consider an example. Suppose there are two classes, Apples and Bananas, and we need to determine whether a sentence is related to apples or bananas, given the frequencies of particular words in it. Here is a table-based representation of the simple dataset:

S. No. Round Red Long Yellow Soft Class

1 2 1 1 0 0 Apples

2 1 1 3 9 6 Bananas

3 3 4 0 0 1 Apples

4 2 3 1 1 0 Apples

Total word count in class 'Apples' = (2+1+1) + (3+4+1) + (2 + 3 + 1 + 1) = 19

Total word count in class 'Bananas' = (1 + 1 + 3 + 9 + 6) = 20

So, the Probability of a sentence to belong to the class, 'Apples':

Likewise, the probability of a sentence to belong to the class, 'Bananas',


The table above represents a simple dataset. The columns indicate how many times each word is used in the sentence, and the last column gives the category the sentence belongs to. Before we go further with the data, we must first recall Bayes' Theorem.

Bayes' Theorem can be used to calculate the probability of an event given that another event has taken place. The formula is:
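The original formula image is not included in this copy; in standard notation it reads:

P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}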

Here, A and B are two events, P(A) is the probability of A occurring, and P(A|B) is the probability of A occurring given that B has already occurred. P(B) cannot be zero, since the event B is known to have already happened.

Now let's look at the way Naive Bayes is used and how Complement Naive Bayes
operates. The standard Naive Bayes algorithm works:

where "fi" is frequency of some attribute. For instance, the number of times specific
words appear in the same sentence.

For Complement Naive Bayes, the formula is:
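The formula images are likewise missing here. Written out, roughly following the notation used in the scikit-learn documentation (an assumption on our part, not taken from the original figures), standard multinomial Naive Bayes predicts the class with the largest score,

\hat{y} = \arg\max_{c}\Big[\log P(c) + \sum_{i} f_i \log \hat{\theta}_{ci}\Big],

while Complement Naive Bayes estimates the word statistics \hat{\theta}_{\tilde{c}i} from the complement of each class (all samples not in class c) and predicts the class with the smallest score,

\hat{y} = \arg\min_{c}\sum_{i} f_i \log \hat{\theta}_{\tilde{c}i}.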

If we examine the formulae closely, we notice that Complement Naive Bayes is essentially the opposite of regular Naive Bayes. In Naive Bayes, the class with the largest value produced by the formula is the predicted class. In Complement Naive Bayes, since the formula is reversed, the class with the lowest value produced by the CNB formula is the predicted class.

Now, let's take a new example sentence and attempt to classify it using CNB and our data:

Round Red Long Yellow Soft Class

2 2 0 1 1 ?
We need to evaluate the CNB value for each class and choose the class with the lower value. We do this for both Apples and Bananas and select the class whose value is lowest: if the value for (y = Apples) is lower, the prediction is Apples; if the value for (y = Bananas) is lower than the value for (y = Apples), the class is predicted as Bananas.

Using this formula, we compute the Complement Naive Bayes value for both classes.

Here, since 5.601 < 75.213, the predicted class is Apples.

We don't employ the class with the highest value since higher values mean there is a
higher probability the sentence that contains these words is not related to that class.
This is the reason this algorithm is referred to as "complement" Naive Bayes.

When should we use CNB?


o If the dataset on which classification is to be performed is imbalanced, Multinomial and Gaussian Naive Bayes may yield low accuracy, whereas Complement Naive Bayes can be quite effective and offer much higher accuracy.
o For text classification: Complement Naive Bayes often outperforms both Gaussian Naive Bayes and Multinomial Naive Bayes in text classification tasks.
Implementation of CNB within Python:
In this case, we will be using the wine dataset, which is slightly imbalanced. It identifies the origin of a wine based on different chemical parameters. To learn more about this dataset, follow the link.

To assess our model, we'll verify the accuracy of the test set as well as the report on
the classification of the classifier. We will utilize the scikit-learn library for
implementing our Complement Naive Bayes algorithm.

Code:

# First import all the required modules
from sklearn.datasets import load_wine as LW
from sklearn.model_selection import train_test_split as TTS
from sklearn.metrics import accuracy_score as ACC_SC
from sklearn.metrics import classification_report as CLA_RE
from sklearn.naive_bayes import ComplementNB as CNB

# Now we will load the dataset
dataset = LW()
X = dataset.data
y = dataset.target

# Here, we will split the data into train and test sets
X_train, X_test, y_train, y_test = TTS(X, y, test_size = 0.25, random_state = 62)

# Now, we will create and train the CNB classifier
classifier = CNB()
classifier.fit(X_train, y_train)

# We will evaluate the classifier here
prediction1 = classifier.predict(X_test)
prediction1_train = classifier.predict(X_train)

print(f"Accuracy of Training Set: {ACC_SC(y_train, prediction1_train) * 100} %\n")
print(f"Accuracy of Test Set: {ACC_SC(y_test, prediction1) * 100} %\n\n")
print(f"Classifier Report: \n\n {CLA_RE(y_test, prediction1)}")
Output:

Accuracy of Training Set: 65.41353383458647 %

Accuracy of Test Set: 60.0 %

Classifier Report :

precision recall f1-score support

0 0.67 0.92 0.77 13


1 0.56 0.88 0.68 17
2 0.00 0.00 0.00 15

accuracy 0.60 45
macro avg 0.41 0.60 0.49 45
weighted avg 0.40 0.60 0.48 45

We get an accuracy of 65.41 percent on the training set and 60.00 percent on the test set. These values are close to each other and reasonably good given the nature of the data, which is known to be difficult to classify with simple classifiers like the one we have applied here. So, the accuracy is acceptable.

Conclusion
We now know the basics of Complement Naive Bayes classifiers and how they work. Whenever we find ourselves with an imbalanced dataset, it is worth trying Complement Naive Bayes.

Deploy a Machine Learning Model using Streamlit Library
Machine Learning:
Machine learning is the ability of a computer to learn from experience without being explicitly programmed. Machine Learning is a hot field right now, and many top companies around the globe are using it to improve their products and services. A Machine Learning model that only lives in our Jupyter Notebook is of little use; we need to make these models available so that everyone can use them.

In this tutorial we will train an Iris species classifier and then deploy the model with Streamlit, an open-source app framework that allows us to deploy ML models easily.

Streamlit Library:
Streamlit allows us to create apps for our machine-learning project with simple
Python scripts. Hot reloading is also supported, so our app can be updated live while
we edit and save our file. Streamlit API allows us to create an app in a few lines of
code (as we'll see below). Declaring a variable is the same thing as adding a widget.
We don't need to create a backend, handle HTTP requests or define different routes.
It's easy to set up and maintain.

First, we will train the model. Since training is not the primary purpose of this tutorial, we will not be doing much pre-processing.


Required Modules and Libraries:


First, we must install the following:

!pip3 install pandas
!pip3 install numpy
!pip3 install sklearn
!pip3 install streamlit

Dataset:

import pandas as pnd
import numpy as nmp

dataframe1 = pnd.read_csv('iris.csv')
dataframe1.head()

Output:
We will now drop the Id column as it is not necessary for Iris species classification.
Next, we will divide the data into a training and testing set and use a Random Forest
Classifier. Any other classifier can also be used, such as logistic
regression or support vector machine.

Code:

# Now, we will be dropping the Id column
dataframe1.drop('Id', axis = 1, inplace = True)
dataframe1.head()

# Here, we will map the target column to numbers to aid training of the model
dataframe1['Species'] = dataframe1['Species'].map({'Iris-setosa': 1, 'Iris-versicolor': 2, 'Iris-virginica': 3})

# Now, we will split the data into the feature columns to be trained on (A) and the target column (b)
A = dataframe1.iloc[:, :-1]
b = dataframe1.iloc[:, -1]

# Split the data into training and testing data, with 30 % of the data as test data
from sklearn.model_selection import train_test_split as tts
A_train, A_test, b_train, b_test = tts(A, b, test_size = 0.3, random_state = 0)

# Now, we will import the random forest classifier model and train it on the dataset
from sklearn.ensemble import RandomForestClassifier as RFC
classifier1 = RFC()
classifier1.fit(A_train, b_train)

# Here, we will predict on the testing dataset
b_pred = classifier1.predict(A_test)

# At last, we will find out the accuracy of our prediction
from sklearn.metrics import accuracy_score as a_s
score = a_s(b_test, b_pred)
print("Our Prediction Accuracy is: ", score)

Output:

Our Prediction Accuracy is: 0.9777777777777777

We got an accuracy of 97.77%, which is quite good.

To use this model for predicting unknown data, we must save it. A pickle is a tool
that serializes and deserializes a Python object structure.

Code:

# Here, we will pickle the model
import pickle as pkl
pickle_out1 = open("classifier1.pkl", "wb")
pkl.dump(classifier1, pickle_out1)
pickle_out1.close()

A new file called "classifier1.pkl", will be created in the same directory. We can now
use Streamlit to deploy our model -

Copy the code below into another Python file.

Code:

import pandas as pnd
import numpy as nmp
import pickle as pkl
import streamlit as smt
from PIL import Image as img

# loading in the model to predict on the data
pickle_in1 = open('classifier1.pkl', 'rb')
classifier1 = pkl.load(pickle_in1)

def welcome():
    return 'welcome you all'

# here, we will define the function which will make the prediction using the
# data which the user has entered
def prediction1(sepal_length1, sepal_width1, petal_length1, petal_width1):
    # the text inputs arrive as strings, so we convert them to floats before predicting
    prediction1 = classifier1.predict(
        [[float(sepal_length1), float(sepal_width1), float(petal_length1), float(petal_width1)]])
    print(prediction1)
    return prediction1

# Here, this is the main function in which we will be defining our webpage
def main():
    # Now, we will give the title to our web page
    smt.title("Iris Flower Prediction")

    # Now, we will be defining some of the front-end elements of our web page,
    # like the background colour, the fonts and font size, the padding and
    # the text to be displayed
    html_temp = """
    <div style = "background-color: #FFFF00; padding: 16px">
    <h1 style = "color: #000000; text-align: center;"> Streamlit Iris Flower Classifier ML App </h1>
    </div>
    """

    # Now, this line will allow us to display the front-end aspects we
    # defined earlier
    smt.markdown(html_temp, unsafe_allow_html = True)

    # Here, the following lines will create the text boxes in which the user can
    # enter the data which is required for making the prediction
    sepal_length1 = smt.text_input("Sepal Length ", " Type Here")
    sepal_width1 = smt.text_input("Sepal Width ", " Type Here")
    petal_length1 = smt.text_input("Petal Length ", " Type Here")
    petal_width1 = smt.text_input("Petal Width ", " Type Here")
    result = " "

    # here, the lines below ensure that whenever the button named 'Predict'
    # is clicked, the prediction function defined earlier is called to make
    # the prediction, and its output is stored in the variable result
    if smt.button("Predict"):
        result = prediction1(sepal_length1, sepal_width1, petal_length1, petal_width1)
        smt.success('The output of the above is {}'.format(result))

if __name__ == '__main__':
    main()

The app can be launched by entering the following command into the terminal:

streamlit run app1.py

Output:
app1.py is where the Streamlit code was written.

After the website opens in our browser, we can then test it. We can also use this
method to deploy deep learning and machine-learning models.

Different Types of Methods for Clustering Algorithms in ML
Clustering algorithms come in many varieties. They do not all use the same cluster models and are therefore not easily categorized. Since more than 100 clustering algorithms have been published, in this tutorial we will describe only the most popular families of clustering methods.

Distribution Based Methods:


This clustering model groups data points based on the likelihood that they belong to the same probability distribution, which is typically normal (Gaussian). With a fixed number of Gaussian distributions, all incoming data points are assigned to the distributions in such a way that the likelihood of the data is maximized. This results in the grouping illustrated in the figure below:

Additionally, distribution-based clustering generates clusters that rely on concisely specified mathematical models of the data, which is a strong assumption for some data distributions. This model works well with synthetic data and with clusters of diverse sizes. However, it can run into problems if constraints are not applied to limit the complexity of the model.

For example: The expectation-maximization algorithm, which uses multivariate normal distributions, is one of the popular examples of this approach.

Centroid Based Methods:
This is the most basic family of iterative clustering algorithms, in which clusters are formed based on the proximity of data points to the cluster centre. Here, the cluster centre, i.e., the centroid, is positioned so that the distance between the data points and the centre is minimal. This optimization is an NP-hard problem, so solutions are usually constructed over several trials.

For example: K-means is one of the most popular instances of this approach.

The main issue with this algorithm is that we have to define K before the process starts. It also has trouble with clusters whose structure is defined by density.

Connectivity Based Methods:


The fundamental idea behind the connectivity-based model is similar to the centroid-based model: it defines clusters on the basis of the distance between data points. The model is based on the idea that data points that are closer to each other share more similar behaviour than data points that are further away.

The choice of distance function is a matter of preference. These methods do not produce a single partitioning of the dataset; rather, they provide an extensive hierarchy of clusters that merge at specific distances. They are easy to interpret, but they do not scale well.

For example: Hierarchical algorithm and its variations.

Density Models:
This clustering model searches the data space for regions with varying densities of data points. It separates regions of different density and groups the data points within a dense region into a cluster.

For example: DBSCAN and OPTICS.
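To make the families above more concrete, here is a small scikit-learn sketch (the toy data and parameters are illustrative assumptions) that runs a centroid-based, a density-based, and a distribution-based method side by side:

# Comparing a centroid-based, a density-based, and a distribution-based method
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)   # centroid-based
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)                     # density-based
gmm_labels = GaussianMixture(n_components=3, random_state=42).fit_predict(X)      # distribution-based (EM)

print(kmeans_labels[:10], dbscan_labels[:10], gmm_labels[:10])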


Subspace Clustering Method:
Subspace clustering is an unsupervised method that seeks to group data points into clusters such that all the data points in one cluster lie in a low-dimensional linear subspace. It can be seen as an extended form of feature selection. Like feature selection, subspace clustering requires a search technique and evaluation criteria; however, subspace clustering limits the range of evaluation criteria. Subspace clustering algorithms localize the search to relevant dimensions, which allows them to identify clusters that exist in multiple subspaces. Subspace clustering was initially designed to solve specific computer vision problems in which the subspace structure is present in the data, but it is gaining more attention in the machine learning community. It is used in movie and social network recommendations as well as in biological datasets. Subspace clustering raises questions regarding data privacy, since many of these applications work with sensitive data. Data points are assumed to be incoherent, since this protects only the privacy of each individual attribute of a user rather than the complete profile of the database user.

There are two types of subspace clustering based on their search strategies.

o Top-down algorithms identify an initial clustering over the entire set of dimensions and then evaluate the subspaces of each cluster.
o Bottom-up methods find dense regions in low-dimensional spaces and then merge these regions to create clusters.

Conclusion
In this tutorial, we have discussed the different types of methods used by clustering algorithms, which can be used to group data points according to their attribute values.

EM Algorithm in Machine Learning


The EM algorithm is a latent variable approach for finding the local maximum likelihood parameters of a statistical model; it was proposed by Arthur Dempster, Nan Laird, and Donald Rubin in 1977. The EM (Expectation-Maximization) algorithm is one of the most commonly used methods in machine learning for obtaining maximum likelihood estimates in models whose variables are partly observable and partly unobservable; the unobservable ones are also called latent variables. It has various real-world applications in statistics, including obtaining the mode of the posterior marginal distribution of parameters in machine learning and data mining applications.

In most real-life applications of machine learning, it is found that several relevant learning features are available, but only some of them are observable while the rest are unobservable. If a variable is observable, its value can be predicted directly from the data instances. For variables that are latent, i.e., not directly observable, the Expectation-Maximization (EM) algorithm plays a vital role in predicting their values, provided that the general form of the probability distribution governing those latent variables is known to us. In this topic, we will discuss a basic introduction to the EM algorithm, a flow chart of the EM algorithm, its applications, and the advantages and disadvantages of the EM algorithm.

What is an EM algorithm?
The Expectation-Maximization (EM) algorithm is defined as the combination of
various unsupervised machine learning algorithms, which is used to determine
the local maximum likelihood estimates (MLE) or maximum a posteriori
estimates (MAP) for unobservable variables in statistical models. Further, it is a
technique to find maximum likelihood estimation when the latent variables are
present. It is also referred to as the latent variable model.

A latent variable model consists of both observable and unobservable variables, where the observable variables can be predicted while the unobservable ones are inferred from the observed variables. These unobservable variables are known as latent variables.


Key Points:

o It is known as the latent variable model to determine MLE and MAP parameters for latent variables.
o It is used to predict values of parameters in instances where data is missing or unobservable for learning, and this is done until convergence of the values occurs.

EM Algorithm
The EM algorithm is the combination of various unsupervised ML algorithms, such as
the k-means clustering algorithm. Being an iterative approach, it consists of two
modes. In the first mode, we estimate the missing or latent variables. Hence it is
referred to as the Expectation/estimation step (E-step). Further, the other mode is
used to optimize the parameters of the models so that it can explain the data more
clearly. The second mode is known as the maximization-step or M-step.
o Expectation step (E - step): It involves the estimation (guess) of all missing
values in the dataset so that after completing this step, there should not be
any missing value.
o Maximization step (M - step): This step involves the use of estimated data in
the E-step and updating the parameters.
o Repeat E-step and M-step until the convergence of the values occurs.

The primary goal of the EM algorithm is to use the available observed data of the
dataset to estimate the missing data of the latent variables and then use that data to
update the values of the parameters in the M-step.

What is Convergence in the EM algorithm?


Convergence is defined intuitively in terms of probability: if two random variables have a very small difference in their probability, they are said to have converged. In other words, whenever the values of the given variables match each other from one iteration to the next, it is called convergence.

Steps in EM Algorithm
The EM algorithm is completed mainly in 4 steps, which include Initialization Step,
Expectation Step, Maximization Step, and convergence Step. These steps are
explained as follows:
o 1st Step: The very first step is to initialize the parameter values. Further, the
system is provided with incomplete observed data with the assumption that
data is obtained from a specific model.

o 2nd Step: This step is known as Expectation or E-Step, which is used to


estimate or guess the values of the missing or incomplete data using the
observed data. Further, E-step primarily updates the variables.
o 3rd Step: This step is known as Maximization or M-step, where we use the complete data obtained from the 2nd step to update the parameter values. Further, the M-step primarily updates the hypothesis.
o 4th step: The last step is to check if the values of latent variables are
converging or not. If it gets "yes", then stop the process; else, repeat the
process from step 2 until the convergence occurs.

Gaussian Mixture Model (GMM)


The Gaussian Mixture Model, or GMM, is defined as a mixture model that combines several probability distribution functions whose parameters are not specified in advance. GMM therefore requires estimates of statistics such as the mean and standard deviation, i.e., its parameters. It is used to estimate the parameters of the probability distributions that best fit the density of a given training dataset. Although there are plenty of techniques available to estimate the parameters of the Gaussian Mixture Model (GMM), Maximum Likelihood Estimation is one of the most popular among them.

Let's consider a case where we have a dataset with data points generated by two different processes. Both processes produce similar Gaussian probability distributions, and the data is combined, so it is very difficult to tell which distribution a given point belongs to.

The processes used to generate the data points represent a latent (unobservable) variable. In such cases, the Expectation-Maximization algorithm is one of the best techniques to estimate the parameters of the Gaussian distributions. In the EM algorithm, the E-step estimates the expected value of each latent variable, whereas the M-step optimizes the parameters using Maximum Likelihood Estimation (MLE). This process is repeated until a good set of latent values and a maximum likelihood fit of the data are achieved.
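A compact NumPy sketch of these E- and M-steps for a two-component, one-dimensional Gaussian mixture is shown below; the synthetic data, initial guesses, and iteration count are all illustrative assumptions, and NumPy and SciPy are assumed to be installed.

# Minimal EM sketch for a 1-D mixture of two Gaussians
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1.5, 200)])  # two hidden processes

# initial guesses for the parameters (means, std devs, mixing weights)
mu, sigma, pi = np.array([1.0, 4.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

for _ in range(50):
    # E-step: responsibility of each component for each point
    dens = np.vstack([pi[k] * norm.pdf(data, mu[k], sigma[k]) for k in range(2)])
    resp = dens / dens.sum(axis=0)

    # M-step: re-estimate the parameters from the responsibilities
    Nk = resp.sum(axis=1)
    mu = (resp * data).sum(axis=1) / Nk
    sigma = np.sqrt((resp * (data - mu[:, None]) ** 2).sum(axis=1) / Nk)
    pi = Nk / len(data)

print("means:", mu, "std devs:", sigma, "weights:", pi)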

Applications of EM algorithm
The primary aim of the EM algorithm is to estimate the missing data in the latent
variables through observed data in datasets. The EM algorithm or latent variable
model has a broad range of real-life applications in machine learning. These are as
follows:

o The EM algorithm is applicable in data clustering in machine learning.


o It is often used in computer vision and NLP (Natural language processing).
o It is used to estimate the value of the parameter in mixed models such as the Gaussian Mixture Model and quantitative genetics.
o It is also used in psychometrics for estimating item parameters and latent
abilities of item response theory models.
o It is also applicable in the medical and healthcare industry, such as in image
reconstruction and structural engineering.
o It is used to determine the Gaussian density of a function.

Advantages of EM algorithm
o The two basic steps, the E-step and the M-step, are easy to implement for many
machine learning problems.
o The likelihood is guaranteed not to decrease after each iteration.
o The M-step often has a closed-form solution.
Disadvantages of EM algorithm
o The convergence of the EM algorithm is very slow.
o It converges only to a local optimum.
o It takes both forward and backward probabilities into consideration, in contrast
to numerical optimization, which considers only forward probabilities.

Conclusion
In real-world machine learning applications, the expectation-maximization (EM)
algorithm plays a significant role in finding local maximum likelihood estimates (MLE)
or maximum a posteriori (MAP) estimates for unobservable variables in statistical
models, i.e., it estimates latent variables from the observed data in a dataset. It
alternates between two important steps, the Expectation step (E-step) and the
Maximization step (M-step): the E-step estimates the missing data, and the M-step
updates the parameters once the complete data has been produced by the E-step.
Further, the importance of the EM algorithm can be seen in various applications such
as data clustering, natural language processing (NLP), computer vision, image
reconstruction, structural engineering, etc.

Machine Learning Pipeline


What is Machine Learning Pipeline?
A Machine Learning pipeline is a way of automating the workflow of a
complete machine learning task. It chains a sequence of data transformations
together so that the data flows through them into a model whose output can then be
analyzed. A typical pipeline includes raw data input, features, outputs, model
parameters, ML models, and predictions. An ML pipeline contains multiple sequential
steps that cover everything from data extraction and pre-processing to model training
and deployment in a modular way: each step is designed as an independent module,
and all these modules are tied together to produce the final result.

In Apache Spark, for example, the ML pipeline is a high-level API for MLlib in the
"spark.ml" package. A typical pipeline contains various stages, but there are two
main types of pipeline stage:
1. Transformer: takes a dataset as input and produces an augmented dataset as
output. For example, a tokenizer is a Transformer that takes a text dataset and
transforms it into a dataset of tokenized words.
2. Estimator: an algorithm that fits on the input dataset to produce a model, which
is itself a Transformer. For example, logistic regression is an Estimator that
trains on a dataset with labels and features and produces a logistic regression
model.
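
The following minimal sketch (assuming a PySpark installation; the column names and toy data are purely illustrative) shows a Tokenizer acting as a Transformer and LogisticRegression acting as an Estimator inside a spark.ml Pipeline:

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("PipelineExample").getOrCreate()

# Illustrative training data: (id, text, label)
training = spark.createDataFrame([
    (0, "spark is great for big data", 1.0),
    (1, "I dislike slow jobs", 0.0),
    (2, "machine learning with spark", 1.0),
], ["id", "text", "label"])

# Transformers: Tokenizer splits text into words, HashingTF turns words into features
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashing_tf = HashingTF(inputCol="words", outputCol="features")

# Estimator: LogisticRegression fits on the features and produces a model (a Transformer)
lr = LogisticRegression(maxIter=10, regParam=0.01)

# Tie the stages together into a single pipeline and fit everything in one call
pipeline = Pipeline(stages=[tokenizer, hashing_tf, lr])
model = pipeline.fit(training)

predictions = model.transform(training)
predictions.select("id", "prediction").show()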

Importance of Machine Learning Pipeline


To understand the importance of a Machine learning pipeline, let's first understand a
typical workflow of an ML task:

A typical workflow consists of data ingestion, data cleaning, data pre-processing,
modelling, and deployment.

In this workflow, all the steps are run with the same script: the same script is used to
extract the data, clean it, build the model, and deploy it. However, this may cause
issues when trying to scale an ML model. These issues include:

o If we need to deploy multiple versions of the same model, we have to run the
complete workflow cycle multiple times, even when the very first steps, i.e.,
ingestion and preparation, are exactly the same for every model.
o If we want to extend our model, we need to copy and paste code from the
beginning of the process, which is an inefficient and bad software-development
practice.
o If we want to change the configuration of any part of the workflow, we have to
do it manually, which is much more time-consuming.

To solve all the above problems, we can use a machine learning pipeline. In an ML
pipeline, each part of the workflow acts as an independent module, so whenever we
need to change any part, we can pick just that module and adapt it as required.

We can understand this with an example. Building any ML model requires a huge
amount of data to train the model. As data is collected from different sources, it is
necessary to clean and pre-process it, which is one of the crucial steps of an ML
project. However, whenever a new dataset arrives, we need to perform the same
pre-processing steps before using it for training, which becomes a time-consuming
and complex process for ML professionals.

To solve such issues, ML pipelines can be used, which can remember and automate
the complete pre-processing steps in the same order.
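
For instance, a minimal scikit-learn sketch (assuming scikit-learn is installed; the dataset and the chosen steps are purely illustrative) bundles pre-processing and training into one pipeline, so the same steps are re-applied in the same order to any new data:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each step is an independent module; the pipeline remembers the order
pipe = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="mean")),    # data cleaning
    ("scale", StandardScaler()),                   # feature scaling
    ("model", LogisticRegression(max_iter=1000)),  # model training
])

pipe.fit(X_train, y_train)                # pre-processing + training in one call
print("Test accuracy:", pipe.score(X_test, y_test))

# New data automatically goes through the same pre-processing steps
new_predictions = pipe.predict(X_test[:5])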

Machine Learning Pipeline Steps


Depending on the use case of the ML model and the requirements of the
organization, each machine learning pipeline may differ to some extent. However,
every pipeline follows the general machine learning workflow, and there are common
stages that each ML pipeline includes. Each stage takes the output of the preceding
stage as its input. A typical ML pipeline includes the following stages:

1. Data Ingestion
Each ML pipeline starts with the data ingestion step. In this step, the data is brought
into a well-organized format suitable for the subsequent steps. No feature engineering
is performed here; instead, this step may version the incoming data.

2. Data Validation
The next step is data validation, which needs to be performed before training a new
model. Data validation focuses on the statistics of the new data, e.g., its range, the
number of categories, the distribution of categories, etc. In this step, data scientists
can detect whether any anomalies are present in the data. There are various data
validation tools that let us compare different datasets to detect anomalies.
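
As a very simple sketch of such a check (assuming pandas is available; the column names, data, and rule are made up), a new batch can be compared against the statistics of a reference dataset:

import pandas as pd

def validate_new_batch(reference: pd.DataFrame, new_batch: pd.DataFrame, column: str) -> bool:
    """Flag the new batch if its values fall outside the reference range
    (illustrative check only; real tools perform many more tests)."""
    ref_min, ref_max = reference[column].min(), reference[column].max()
    out_of_range = (new_batch[column] < ref_min) | (new_batch[column] > ref_max)

    if out_of_range.any():
        print(f"Anomaly: {out_of_range.sum()} values outside [{ref_min}, {ref_max}]")
        return False
    return True

# Hypothetical usage
reference = pd.DataFrame({"age": [18, 25, 40, 65]})
new_batch = pd.DataFrame({"age": [30, 120]})   # 120 looks suspicious
validate_new_batch(reference, new_batch, "age")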

3. Data Pre-processing
Data pre-processing is one of the most crucial steps of the ML lifecycle as well as of
the pipeline. We cannot directly feed the collected data to the model for training
without pre-processing it, as doing so may produce misleading results.

The pre-processing step involves preparing the raw data and making it suitable for
the ML model. The process includes different sub-steps, such as Data cleaning,
feature scaling, etc. The product or output of the data pre-processing step becomes
the final dataset that can be used for model training and testing. There are different
tools in ML for data pre-processing that can range from simple Python scripts to
graph models.

4. Model Training & Tuning


The model training step is the core of each ML pipeline. In this step, the model is
trained to take the input (the pre-processed dataset) and predict an output with the
highest possible accuracy.

However, difficulties may arise with larger models or with large training datasets,
which require the model training or tuning to be distributed efficiently.

Pipelines can address this issue of the model training stage because they are
scalable, so a large number of models can be trained and tuned concurrently.

5. Model Analysis
After model training, we need to determine the optimal set of parameters using loss
and accuracy metrics. Beyond that, an in-depth analysis of the model's performance
is crucial for the final version of the model. This in-depth analysis includes calculating
other metrics such as precision, recall, AUC, etc. It also helps us determine how much
the model depends on the features used in training and explore how the model's
predictions would change if we altered the features of a single training example.
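
A small sketch of such an analysis with scikit-learn (assuming scikit-learn is installed; the dataset and model are only illustrative) computes precision, recall, and AUC in addition to plain accuracy:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_scores = model.predict_proba(X_test)[:, 1]

# In-depth analysis beyond accuracy: precision, recall, and AUC
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("AUC:      ", roc_auc_score(y_test, y_scores))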

6. Model Versioning
The model versioning step keeps track of which model, set of hyperparameters, and
dataset have been selected as the next version to be deployed. In many situations, a
significant difference in model performance can come simply from using more or
better training data, without changing any model parameter. Hence, it is important to
document all the inputs to a new model version and track them.

7. Model Deployment
After training and analyzing the model, it's time to deploy the model. An ML model
can be deployed in three ways, which are:

o Using a model server
o In a browser
o On an edge device

However, the most common way to deploy a model is to use a model server. Model
servers allow hosting multiple model versions simultaneously, which makes it possible
to run A/B tests between models and provides valuable feedback for model
improvement.

8. Feedback Loop
Each pipeline forms a closed loop to provide feedback. With this closed loop, data
scientists can determine the effectiveness and performance of the deployed models.
This step can be automated or manual depending on the requirement. Except for the
two manual review steps (model analysis and the feedback step), the entire pipeline
can be automated.

Benefits of Machine Learning Pipelines


Some of the benefits of using pipelines for the ML workflows are as follows:

o Unattended runs
The pipeline allows different steps to be scheduled to run in parallel in a reliable
and unattended way, so you can focus on other tasks while data preparation and
modelling are running.
o Easy debugging
With a pipeline there is a separate function for each task (such as different
functions for data cleaning and data modelling). Therefore, it becomes easy to
debug the complete code and locate issues in a particular step.
o Easy tracking and versioning
We can use a pipeline to explicitly name and version the data sources, inputs,
and outputs rather than manually tracking data and outputs on every iteration.
o Fast execution
As discussed above, each part of the workflow in an ML pipeline acts as an
independent element, which allows the software to run faster and produce
efficient, high-quality output.
o Collaboration
Using pipelines, data scientists can collaborate on each phase of the ML design
process and can also work on different pipeline steps simultaneously.
o Reusability
We can create pipeline templates for particular scenarios and reuse them as
required, for example a template for retraining or for batch scoring.
o Heterogeneous compute
We can use multiple pipelines that are reliably coordinated across heterogeneous
compute resources and different storage locations. This makes efficient use of
resources by running separate pipeline steps on different computing resources,
e.g., GPUs, Data Science VMs, etc.

Considerations while building a Machine Learning Pipeline
o Create each step as a reusable component:
We should consider all the steps involved in the ML workflow when creating an
ML model. Start building the pipeline with how data is collected and pre-
processed, and continue to the end. It is recommended to limit the scope of each
component so that it is easier to understand and iterate on.
o Always codify tests into components:
Testing should be considered an inherent part of the pipeline. If, in a manual
process, you perform checks on what the input data and the model predictions
should look like, you should codify these checks into the pipeline.
o Put all the steps together:
We must put all the steps together and define the order in which the components
of the workflow are processed, including how inputs and outputs run through the
pipeline.
o Automate as needed:
Creating a pipeline already automates the workflow, since it manages and runs
the different steps without human intervention. However, many teams aim to
automate the complete pipeline so that it is triggered when specific criteria are
met. For example, you may monitor model drift in production to trigger a
re-training run, or simply re-train periodically, e.g., daily.

ML Pipeline Tools
There are different tools in Machine Learning for building a pipeline. Some are given
below, grouped by the step of the pipeline they support:

o Obtaining the data - Managing the database: PostgreSQL, MongoDB, DynamoDB,
MySQL. Distributed storage: Apache Hadoop, Apache Spark/Apache Flink.
o Scrubbing / cleaning the data - Scripting languages: SAS, Python, and R.
Distributed processing: MapReduce/Spark, Hadoop. Data wrangling tools: R,
Python Pandas.
o Exploring / visualizing the data to find patterns and trends - Python, R, MATLAB,
and Weka.
o Modeling the data to make predictions - Machine learning algorithms: supervised,
unsupervised, semi-supervised, and reinforcement learning. Important libraries:
Python (scikit-learn) / R (caret).
o Interpreting the result - Data visualization tools: ggplot, Seaborn, D3.js,
Matplotlib, Tableau.

Exploitation and Exploration in Machine Learning
Exploitation and exploration are key concepts in Reinforcement Learning which help
the agent make better online decisions. Reinforcement learning is a machine learning
method in which an intelligent agent (computer program) learns to interact with the
environment and take actions to maximize rewards in a specific situation. This ML
method is currently used in many industries such as automotive, healthcare, medicine,
education, etc.
In reinforcement learning, the agent is not initially aware of the different states, the
actions available in each state, the associated rewards, or the transitions to the next
state; it learns them by exploring the environment. Because the agent's knowledge of
states, actions, rewards, and resulting states is only partial, this leads to the
Exploration-Exploitation Dilemma. In this topic, "Exploitation and Exploration in
Machine Learning," we will discuss both terms in detail with suitable examples. But
before starting the topic, let's first understand reinforcement learning in ML.

What is Reinforcement Learning?


Unlike supervised and unsupervised learning, reinforcement learning is a feedback-
based approach in which an agent learns by performing actions and observing their
outcomes. Based on whether an action is good or bad, the agent receives positive or
negative feedback: it is rewarded for each positive feedback and penalized for each
negative one.

Key points in Reinforcement Learning

o Reinforcement learning does not require any labeled data for the learning
process. The agent learns through the feedback on the actions it performs.
Moreover, in reinforcement learning, agents also learn from past experiences.
o Reinforcement learning methods are used to solve tasks where decision-making
is sequential and the goal is long-term, e.g., robotics, online chess, etc.
o Reinforcement learning aims to obtain the maximum positive feedback so that
the agent can improve its performance.
o Reinforcement learning involves taking an action, the state changing (or staying
the same), and receiving feedback. Based on these interactions, the agent learns
and explores the environment.

Hence, we can define reinforcement learning as:


"Reinforcement learning is a type of machine learning technique, where an intelligent


agent (computer program) interacts with the environment, explore it by itself, and
makes actions within that."

What are Exploration and Exploitation in Reinforcement Learning?
Before a more technical description of exploration and exploitation in machine
learning, let's first understand these terms in simple words. In reinforcement learning,
whenever the agent faces a difficult choice between continuing with what it already
knows works or trying something new at a specific time, this situation results in the
Exploration-Exploitation Dilemma, because the agent's knowledge about states,
actions, rewards, and resulting states is always partial.

Now we will discuss exploitation and exploration in technical terms.

Exploitation in Reinforcement Learning


Exploitation is a greedy approach in which the agent tries to collect more reward by
relying on its current estimated values rather than the true (unknown) values. With
this technique, the agent makes the best decision based on the information it
currently has.

Exploration in Reinforcement Learning


Unlike exploitation, with exploration the agent primarily focuses on improving its
knowledge about each action rather than collecting immediate reward, so that it can
gain long-term benefit. With this technique, the agent gathers more information in
order to make the best overall decision.

Examples of Exploitation and Exploration in Machine Learning
Let's understand exploitation and exploration with some interesting real-world
examples.

Coal mining:
Suppose two people, A and B, are digging in a coal mine in the hope of finding a
diamond. Person B finds a diamond before person A and walks off happily. Seeing
this, person A gets a bit greedy and thinks he too might find a diamond by digging
in the same place where person B was digging. This action by person A is called a
greedy action, and the corresponding policy is known as a greedy policy. What
person A does not know is that a much bigger diamond is buried in the place where
he was originally digging, so the greedy policy would fail in this situation.

In this example, person A only has knowledge of the place where person B was
digging, and no knowledge of what lies at other depths or places. In the actual
scenario, the diamond might be buried where he was digging initially, or somewhere
else entirely. With such partial knowledge about how to obtain more rewards, a
reinforcement learning agent faces a dilemma: should it exploit its partial knowledge
to receive some reward, or should it explore unknown actions that could yield a much
larger reward?

However, both techniques cannot be fully pursued at the same time; this trade-off
can be managed using the Epsilon Greedy Policy (explained below).

There are a few other examples of Exploitation and Exploration in Machine Learning
as follows:

Example 1: Consider choosing a restaurant for an online food order, where you have
two options. The first option is to order from your favourite restaurant, the one you
have ordered from in the past; this is exploitation, because you are relying only on
what you already know about one specific restaurant. The second option is to try a
new restaurant to explore new varieties and tastes of food; this is exploration. The
food quality may well be better with the first option, but it is also possible that the
food at the new restaurant is even more delicious.

Example 2: Suppose there is a game-playing platform where you can play chess
against a robot. To win, you have two choices: play the move you believe is best, or
play an experimental move. The move you believe is best may be good, but a new
move might turn out to be even more strategic. Here the first choice is exploitation,
where you rely on your existing knowledge of the game, and the second choice is
exploration, where you try a new move in the hope of winning the game.

Epsilon Greedy Policy


The epsilon-greedy policy is a technique for maintaining a balance between
exploitation and exploration. A very simple way to choose between them is to do so
randomly, exploiting most of the time with a little exploration mixed in.

In the epsilon-greedy strategy, an exploration rate, or epsilon (denoted ε), is initially
set to 1. The exploration rate is the probability that the agent explores the
environment rather than exploiting it, so starting with ε = 1 ensures that the agent
begins by exploring.

As the agent learns more about the environment, epsilon is decreased at some
defined rate, so exploration becomes less and less likely as the agent learns more
and more about the environment. The agent thus becomes increasingly greedy about
exploiting what it has learned.
To decide between exploration and exploitation at each step, we generate a random
number between 0 and 1 and compare it with epsilon. If the random number is
greater than ε, the next action is chosen by exploitation; otherwise it is chosen by
exploration. In the case of exploitation, the agent takes the action with the highest
Q-value for the current state.

if random_number > epsilon:
    # choose the next action via exploitation
else:
    # choose the next action via exploration

Examples-

We can understand the above concept with the roll of a die. Suppose the agent
explores whenever the die lands on 1 and exploits otherwise. This is an epsilon-greedy
action with ε = 1/6, the probability of rolling a 1. It can be expressed as follows:

A_t = argmax_a Q_t(a) with probability 1 − ε, or a random action with probability ε

That is, the action selected at attempt t is the greedy action (exploit) with probability
1 − ε, or a random action (explore) with probability ε.
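
Putting these pieces together, a minimal runnable sketch of epsilon-greedy action selection with a decaying exploration rate might look like this (NumPy assumed; the number of actions, reward values, and decay schedule are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)

n_actions = 4
q_values = np.zeros(n_actions)       # estimated value of each action
counts = np.zeros(n_actions)         # how often each action was taken
true_rewards = np.array([1.0, 1.5, 0.5, 2.0])   # hidden, used only to simulate feedback

epsilon = 1.0        # start by exploring the environment
epsilon_min = 0.05
decay = 0.99

for step in range(1000):
    if rng.random() > epsilon:
        action = int(np.argmax(q_values))        # exploitation: best known action
    else:
        action = int(rng.integers(n_actions))    # exploration: random action

    reward = true_rewards[action] + rng.normal(0, 0.1)   # noisy feedback

    # Incremental update of the action-value estimate
    counts[action] += 1
    q_values[action] += (reward - q_values[action]) / counts[action]

    # Exploration becomes less likely as the agent learns more
    epsilon = max(epsilon_min, epsilon * decay)

print("Estimated Q-values:", np.round(q_values, 2))
print("Best action found:", int(np.argmax(q_values)))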

Notion of Regret
Whenever we do something and the outcome is not what we hoped for, we regret
our decision. Recall the earlier restaurant example of exploitation and exploration: if
we choose a new restaurant instead of our favourite one, and the food quality and
overall experience turn out to be poor, we regret the decision and treat what we paid
as a loss. Moreover, if we keep ordering from that restaurant, the regret grows along
with the number of losses. Reinforcement learning methods aim to reduce both the
amount of loss and the level of regret.
Regret in Reinforcement Learning

Before defining regret in reinforcement learning, we must know the optimal action
a*, which is the action with the highest expected reward. It is given as follows:

a* = argmax_a E[r | a]

The regret in reinforcement learning is then the difference between the expected
reward of the optimal action a* accumulated over T steps and the sum of the
expected rewards of the actions actually taken from step 1 to T. It can be expressed
as follows:

Regret: L_T = T · E[r | a*] − Σ_{t=1}^{T} E[r | a_t]
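
To make the definition concrete, here is a tiny sketch (the expected reward values and chosen actions are made up) that computes the regret of a sequence of actions against the optimal action:

expected_reward = {"a1": 1.0, "a2": 2.5, "a3": 0.5}   # E[r | a] for each action
optimal = max(expected_reward.values())               # expected reward of the optimal action a*

chosen_actions = ["a1", "a3", "a2", "a2", "a1"]       # actions taken over T = 5 steps
T = len(chosen_actions)

regret = T * optimal - sum(expected_reward[a] for a in chosen_actions)
print("Total regret L_T =", regret)    # 5 * 2.5 - (1.0 + 0.5 + 2.5 + 2.5 + 1.0) = 5.0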

Conclusion
Exploitation and exploration techniques in reinforcement learning influence several
important aspects of training, such as performance, learning speed, and the quality of
decision making, all of which matter when training agents with reinforcement learning
methods. A drawback of these techniques is that both must be tuned to these aspects
as well as to the specific environment, which may require more supervision of the
reinforcement learning agents. This topic covered some of the most widely used
exploration techniques in reinforcement learning. From the examples above, we can
conclude that exploration methods should be preferred in order to reduce regret and
make the learning process faster and more effective.
