
Machine Learning

From Roots To Networks

BS Thesis
by
Muhammad Waqar Zain
Arfa Farooq

CIIT/SP20-BSM-018/LHR
CIIT/SP20-BSM-028/LHR

COMSATS University Islamabad


Pakistan

Fall 2023
Machine Learning From Roots To Networks

Dr. Adeel Farooq


COMSATS University Islamabad

In partial fulfillment
of the requirement for the degree of

Bachelor of Science
in
Mathematics

by
Muhammad Waqar Zain
Arfa Farooq
CIIT/SP20-BSM-018/LHR
CIIT/SP20-BSM-028/LHR

Department of Mathematics
Faculty of Science

COMSATS University Islamabad


Pakistan

Fall 2023

Machine Learning From Roots To Networks

This thesis is submitted to the Department of Mathematics in partial fulfillment
of the requirement for the award of the degree of Bachelor of Science in
Mathematics

Student’s name Registration Number

M. Waqar Zain CIIT/SP20-BSM-018/LHR

Arfa Farooq CIIT/SP20-BSM-028/LHR

Supervisory Committee

Supervisor Member

Dr. Adeel Farooq Dr.


Associate Professor Associate Professor
Department of Mathematics Department of Mathematics
COMSATS University Islamabad (CUI) COMSATS University Islamabad (CUI)
Lahore Campus Lahore Campus

Member Member

Name Dr.
Associate Professor Associate Professor
Department of Mathematics Department of Mathematics
COMSATS University Islamabad (CUI) COMSATS University Islamabad (CUI)
Lahore Campus Lahore Campus

Certificate of Approval
Machine Learning From Roots To Networks


By
M. Waqar Zain
CIIT/SP20-BSM-018/LHR
Arfa Farooq
CIIT/SP20-BSM-028/LHR
Has been approved

For COMSATS University Islamabad, Lahore Campus.

External Examiner:

Dr. External Examiner


University Name

Supervisor:

Dr. Adeel Farooq


Department of Mathematics, (CUI) Lahore Campus

Head of Department:

Prof. Dr. Muhammad Hussain


Department of Mathematics, (CUI) Lahore Campus

Declaration

We, M. Waqar Zain (SP20-BSM-018) and Arfa Farooq (SP20-BSM-028), hereby declare
that we have produced the work presented in this thesis during the scheduled period of
study. We also declare that we have not taken any material from any source except where
due reference is made, and that the amount of plagiarism is within an acceptable range. If
a violation of HEC rules on research has occurred in this thesis, we shall be liable to
punishable action under the plagiarism rules of the HEC.

Date:

Muhammad Waqar Zain


CIIT/SP20-BSM-018/LHR
Arfa Farooq
CIIT/SP20-BSM-028/LHR

Certificate

It is certified that M. Waqar Zain CIIT/SP20-BSM-018/LHR and Arfa Farooq CIIT/SP20-
BSM-028/LHR have carried out all the work related to this thesis under my supervision at
the Department of Mathematics, COMSATS University Islamabad, Lahore Campus, and
that the work fulfills the requirements for the award of the BS degree.

Date:

Supervisor

Dr. Adeel Farooq


Associate Professor,
Mathematics,
COMSATS University Islamabad
Lahore Campus

Dedication

We dedicate our thesis to our parents, honourable teachers and


to each other.

Acknowledgements

Praise be to ALLAH, the Cherisher and Lord
of the Worlds, Most Gracious and Most Merciful

First and foremost, we would like to thank ALLAH Almighty (the most beneficent and
most merciful) for giving us the strength, knowledge, ability and opportunity to undertake
this research study and to persevere and complete it satisfactorily. Without the countless
blessings of ALLAH Almighty, this achievement would not have been possible. May His
peace and blessings be upon His messenger Hazrat Muhammad (PBUH), upon his family,
companions and whoever follows him. Our heartfelt gratitude to Hazrat Muhammad (PBUH),
who is forever a source of guidance and knowledge for humanity as a whole. In our journey
towards this degree, we have found a teacher, an inspiration, a role model and a pillar of
support in our life.

M. Waqar Zain
Arfa Farooq
CIIT/SP20-BSM-018/LHR
CIIT/SP20-BSM-028/LHR

Abstract

Machine Learning From Roots To Networks


By
M. Waqar Zain
Arfa Farooq

This work provides a foundational exploration into the realm of machine learning, covering
essential concepts and methodologies. The study begins with an introduction to machine
learning, delineating its significance in contemporary technology. It then delves into the
classification of machine learning into various types. The investigation includes an in-depth
analysis of prominent supervised learning algorithms, such as regression, decision trees,
support vector machines (SVM), and neural networks. Each algorithm is scrutinized for
its characteristics, applications, and underlying principles. By comprehensively addressing
these core components, this work aims to furnish a solid understanding of the fundamental
aspects of machine learning and its diverse applications.

Table of Contents

1 The Machine Learning Landscape

1.1 Machine learning:
1.2 Components of Machine Learning
1.3 Why Use Machine Learning?
1.4 Data Mining

2 Machine Learning And Its Types

2.1 Supervised Machine Learning
2.1.1 Implementation of Linear Regression:
2.1.2 Real Life Example Of Linear Regression:
2.2 Unsupervised Machine Learning
2.3 Semi-supervised Machine Learning
2.4 Self-supervised learning:
2.5 Reinforcement Learning
2.6 Batch Learning
2.7 Online Learning
2.8 Instance-based learning
2.9 Lazy Learning
2.10 Model-based learning
2.11 Conclusion

3 Algorithms In Machine Learning

3.1 Regression
3.1.1 Simple Linear Regression
3.1.2 Objectives of Simple Linear Regression
3.2 Use of Linear Regression in Machine Learning
3.3 Multilinear Regression
3.3.1 Applications of Multilinear Regression
3.4 Decision Trees
3.4.1 Components of Decision Trees
3.5 Implementation of Decision Trees
3.6 Conclusion

4 Support Vector Machine

4.1 Definition:
4.2 Aim of Support Vector Machine:
4.3 Components of Support Vector Machine:
4.3.1 Characteristics of Hyperplane:
4.4 Types of Support Vector Machines:
4.5 Working of Support Vector Machine:
4.6 Implementation of Support Vector Machine:
4.7 Advantages of Support Vector Machine:
4.8 Disadvantages of Support Vector Machine:
4.9 Conclusion:

5 Neural Networks

5.1 Definition:
5.2 Applications of Neural Networks:
5.3 Structure of neural networks:
5.3.1 Components of neural networks:
5.4 Working of neural networks:
5.4.1 Forward propagation:
5.4.2 Backward propagation:
5.5 Implementation of neural networks:
5.6 Learning in neural networks:
5.7 Types of neural networks:
5.8 Advantages of neural networks:
5.9 Disadvantages of neural networks:
5.10 Conclusion:

6 Problem Solving

6.1 Problem 01
6.2 Problem 02
6.3 Problem 03
6.4 Problem 04
6.5 Problem 05

References
List of Figures

Figure 1.1 Email Spamming

Figure 2.1 Types of Supervised Learning
Figure 2.2 Linear Regression Code In Python
Figure 2.3 Real life example of Linear regression in python
Figure 2.4 Types of Unsupervised Learning
Figure 2.5 Reinforcement Learning

Figure 3.1 Multilinear Regression
Figure 3.2 Multilinear Regression Code in Python
Figure 3.3 Decision tree representation
Figure 3.4 Decision Trees Code in Python

Figure 4.1 Support Vector Machine
Figure 4.2 Hyperplanes
Figure 4.3 Support Vectors
Figure 4.4 Margins
Figure 4.5 Hard and Soft Margins
Figure 4.6 1D TO 2D
Figure 4.7 2D TO 3D
Figure 4.8 Result

Figure 5.1 Simple neural network
Figure 5.2 Components

Figure 6.1 SVR Model
Figure 6.2 SVR Model
Figure 6.3 SVR Model
Figure 6.4 Results
Figure 6.5 GridsearchCV Model
Figure 6.6 GridsearchCV Model
Figure 6.7 Results
Figure 6.8 Pipeline Model
Figure 6.9 Pipeline Model
Figure 6.10 Results
Figure 6.11 Single Pipeline
Figure 6.12 Single Pipeline
Figure 6.13 Single Pipeline
Figure 6.14 Single Pipeline
Figure 6.15 Results
Figure 6.16 Model for GridsearchCV
Figure 6.17 Model for GridsearchCV
Figure 6.18 Results
List of Tables

Table 2.1 Teacher's Information
Chapter 1
The Machine Learning Landscape
1.1 Machine learning:

Machine learning is the science and art of programming computers so they can learn from
data.
Here is a slightly more general definition:
Machine learning is the field of study that gives computers the ability to learn without being
explicitly programmed.
Mathematically, machine learning is defined as:
Let S be the training dataset, where

S = {(x1, y1), (x2, y2), . . . , (xn, yn)} (1.1.1)

represents n training examples. Here, xi denotes the input or features of the i-th example,
and yi represents the corresponding output or target value. The goal of machine learning
is to find a function or model f that maps inputs to outputs, such that f (xi ) ≈ yi for all
(xi, yi) ∈ S.
Machine learning is about extracting knowledge from data. It is a research field at the
intersection of statistics and computer science. It is also known as predictive analytics or
statistical learning.
Example:
A spam filter, whose job is to move inappropriate incoming email messages to a spam
folder. We could make up a blacklist of words that would result in an email being marked
as spam.

Figure 1.1: Email Spamming

From automatic recommendations of which movies to watch, to what food to order


or which products to buy and recognizing people in photos, many modern websites and
devices have machine learning algorithms at their core. When we look at a complex website
like Facebook, Amazon, or Netflix, it is very likely that every part of the site contains
multiple machine learning models.

1.2 Components of Machine Learning

Training:
Training sets are the examples that the system uses to learn. The training set is a fundamental
concept in machine learning and artificial intelligence: it is the collection of data used to
train a machine learning model. Training sets are like textbooks and exercise books for AI models.
When we want to teach a model to perform a task, we expose it to lots of examples. These
are the training sets. Each example consists of input data and the correct output, and the
model learns to make predictions based on this input-output relationship. It’s like a teacher
showing a student a bunch of math problems along with the correct answers, so the student
learns how to solve similar problems on their own.
Samples:
Each training example is called a training instance or sample. A sample is essentially an
individual data entry within the training set. It could be an image, a text document, a set of
features, or any other unit of data depending on the type of problem. During the training
process, the machine learning model learns patterns and relationships from these samples

in the training set. Once trained, the model can then make predictions or classifications on
new, unseen data. The quality and representativeness of the training set are crucial factors
in the performance of the trained model.
Model:
The part of a machine learning system that learns and makes predictions is called a model.
A model is essentially a mathematical representation or framework that captures patterns,
relationships, and structures within the data. The purpose of a machine learning model is to
make predictions or decisions without being explicitly programmed. A machine learning
model is a computational representation of patterns in data that has been trained to make
predictions or decisions. There are various types of machine learning models, including
linear regression, decision trees, support vector machines, neural networks, and many oth-
ers. The choice of model depends on the nature of the data.
Example:
Neural networks and random forests.

1.3 Why Use Machine Learning?

Machine learning is used to solve diverse problems, enhance decision-making processes,


and automate tasks, contributing to advancements and efficiencies across various industries.
Machine learning is great for the following:
• Problems for which existing solutions require a lot of fine-tuning or long lists of rules.
• Complex problems for which using a traditional approach yields no good solution.
• Fluctuating environments.
• Getting insights about complex problems and large amounts of data.

1.4 Data Mining

Digging into large amounts of data to discover hidden patterns is called data mining. Data
mining is a crucial component of machine learning, and it involves the process of discover-
ing patterns, trends, and valuable information from large datasets. It is more about extract-
ing knowledge from data. Data mining is a complementary process to machine learning,

providing the foundational steps of exploring, preprocessing, and extracting knowledge
from data. The insights gained through data mining inform the selection of appropriate
machine learning models and features, contributing to the overall success of the predictive
or decision-making tasks.

Chapter 2
Machine Learning And Its Types
There are so many different types of machine learning systems that it is useful to classify
them in broad categories, based on the following criteria:
• How they are supervised during training. It includes supervised, unsupervised, semi-
supervised, self-supervised, and others.
• Whether or not they can learn incrementally on the fly. It includes online and batch learn-
ing.
• Whether they work by simply comparing new data points to known data points, or in-
stead by detecting patterns in the training data and building a predictive model. It includes
instance-based and model-based learning.

2.1 Supervised Machine Learning

Supervised learning is the type of machine learning in which machines are trained using
well labelled training data, and based on that data, machines predict the output. In super-
vised learning, the algorithm is provided with a training set, which consists of input-output
pairs. The algorithm learns from this training data by adjusting its parameters or weights
based on the input-output relationships. The training process involves iteratively refining
the model until it can generalize well to new, unseen data. There are two main types of
supervised machine learning:

Figure 2.1: Types of Supervised Learning

Regression:
In regression, the algorithm learns to predict a continuous output variable. For example,
predicting house prices based on features like square footage, number of bedrooms, and
location.
Classification:
In classification tasks, the algorithm learns to assign input data to discrete classes or cat-
egories. Examples include spam detection in emails, image classification, or predicting
whether a customer will return or not. Supervised learning is widely used in various
fields, such as finance, health care, natural language processing, computer vision, and many
others, due to its ability to make predictions or classifications based on labeled data. In
supervised learning, the training set we feed to the algorithm includes the desired solutions,
called labels. Labelled data means that some input data is already tagged with the correct
output.

2.1.1 Implementation of Linear Regression:

Figure 2.2: Linear Regression Code In Python

2.1.2 Real Life Example Of Linear Regression:

This linear regression example predicts the salaries of teachers on the basis of their
experience.

Figure 2.3: Real life example of Linear regression in python
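Since the code in Figures 2.2 and 2.3 appears only as images, the following is a minimal
sketch of the same idea, assuming scikit-learn is available; the experience and salary
numbers here are made up purely for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience and salary in Rs
experience = np.array([[3], [4], [6], [10], [15]])   # features must be 2-D
salary = np.array([42000, 50000, 58000, 72000, 95000])

model = LinearRegression()
model.fit(experience, salary)        # learns the best-fitting line

print(model.predict([[8]]))          # predicted salary for 8 years of experience

After fitting, model.coef_ and model.intercept_ hold the learned slope and intercept of the line.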

2.2 Unsupervised Machine Learning

In unsupervised learning, the training data is unlabeled. The system tries to learn without
a teacher. Unsupervised learning is a type of machine learning where the algorithm is
given input data without explicit output labels. The goal of unsupervised learning is to
explore the inherent structure or patterns within the data. Unlike supervised learning, there
is no provided ”teacher” or labeled target for the algorithm to learn from. Instead, the
algorithm tries to find hidden patterns or relationships on its own. There are two main
types of unsupervised learning tasks:

Figure 2.4: Types of Unsupervised Learning

Example:
Consider a dataset containing information about teachers’ salaries, including features such
as teacher ID, age, years of experience, and income. In unsupervised learning, the goal is
to explore the data, uncover patterns, and extract meaningful insights without relying on
labeled information or predefined objectives.

Teacher’s ID Age Experience (in years) Gender Income (Rs)


1 28 4 M 55000
2 37 10 F 70000
3 26 3 M 42000
4 51 15 M 94000
5 33 6 F 42000

Table 2.1: Teacher’s Information

Clustering:
In clustering, the algorithm groups similar data points together based on certain features or

characteristics, forming clusters. The goal is to identify natural groupings in the data with-
out any prior knowledge of the categories. Common clustering algorithms include k-means
clustering and hierarchical clustering.
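As a minimal sketch of clustering, assuming scikit-learn, k-means can group a handful of
hypothetical 2-D points into two clusters:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming two rough groups
X = np.array([[1, 2], [1, 4], [2, 3],
              [8, 8], [9, 10], [8, 9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster assigned to each point
print(kmeans.cluster_centers_)   # coordinates of the two centroids

Note that no labels are supplied; the algorithm discovers the two groups on its own.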
Dimensionality Reduction:
In dimensionality reduction techniques, the aim is to reduce the number of features or vari-
ables in the data while preserving its important information. It is useful for simplifying
complex datasets and extracting the most relevant features. Principal Component Anal-
ysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) are examples of
dimensionality reduction methods. Unsupervised learning is often used in scenarios where
labeled data is scarce or expensive to obtain. It helps to discover patterns, associations,
and structures in the data that may not be apparent initially. Some applications of unsuper-
vised learning include customer segmentation, anomaly detection, and feature extraction
for subsequent supervised learning tasks. It is important to note that the distinction be-
tween supervised and unsupervised learning is not always strict, and some methods, like
semi-supervised learning, combine elements of both paradigms.
Example: Visualization algorithm.
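As a minimal sketch of the dimensionality reduction methods mentioned above, assuming
scikit-learn and using randomly generated data purely for illustration, PCA can compress
three features down to two:

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical dataset: 100 samples with 3 features
X = np.random.RandomState(0).rand(100, 3)

pca = PCA(n_components=2)        # keep the 2 most informative directions
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                  # (100, 2)
print(pca.explained_variance_ratio_)    # variance captured by each component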

2.3 Semi-supervised Machine Learning

Semi-supervised learning is a machine learning paradigm that combines elements of both


supervised and unsupervised learning. In semi-supervised learning, the algorithm is trained
on a dataset that contains both labeled and unlabeled examples. This approach is partic-
ularly useful when obtaining a fully labeled dataset is expensive or time-consuming, but
there is some labeled data available. The training process typically involves using the la-
beled data to guide the learning process and make predictions, while also leveraging the
unlabeled data to discover additional patterns or structure within the dataset. The idea is
that the algorithm can learn from the labeled examples to make predictions on similar un-
labeled examples and improve its overall performance. Semi-supervised learning methods
can be applied to both classification and regression tasks. The labeled data provides explic-
it information about certain patterns or relationships, while the unlabeled data allows the

algorithm to explore the data more broadly and potentially discover hidden structures.
There are various techniques used in semi-supervised learning, including:
Self-training:
The model is initially trained on the labeled data, and then it makes predictions on the
unlabeled data. The confident predictions on the unlabeled data are then added to the labeled
dataset, and the model is retrained (see the sketch at the end of this section).
Co-training:
The algorithm is trained on multiple views or representations of the data. Each view might
have a different set of features, and the model is trained on one set of features at a time.
The agreement between predictions on the labeled data helps in improving performance.
Multi-view learning:
Like co-training, but the different views of the data are considered independently, and mod-
els are trained on each view separately. Semi-supervised learning is applied in scenarios
where acquiring labeled data is challenging, such as in medical imaging, where labeling re-
quires expert annotation, or in natural language processing, where labeling large amounts
of text data can be time-consuming. It aims to take advantage of both labeled and unlabeled
data to build more robust and accurate models.
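As a minimal sketch of the self-training technique described above, assuming scikit-learn,
where the label -1 marks an unlabeled example:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Hypothetical 1-D feature; the last two examples are unlabeled (-1)
X = np.array([[1.0], [1.2], [5.0], [5.3], [1.1], [5.1]])
y = np.array([0, 0, 1, 1, -1, -1])

model = SelfTrainingClassifier(LogisticRegression(), threshold=0.8)
model.fit(X, y)    # confident predictions become pseudo-labels, then retrain

print(model.predict([[1.05], [5.2]]))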

2.4 Self-supervised learning:

Another approach to machine learning involves generating a fully labeled data set from a
fully unlabeled one. Once the whole data set is labeled, any supervised learning algorithm
can be used. This approach is called self-supervised learning. Self-supervised learning is a
machine learning paradigm where the algorithm learns from the data itself without explicit
external labels. In self-supervised learning, the algorithm generates its own supervisory
signal or labels from the input data, creating a surrogate or proxy task that the model aims
to solve. The goal is to leverage the inherent structure or relationships within the data to en-
able learning without the need for human-annotated labels. Key features of self-supervised
learning include:

Proxy Tasks:
Instead of relying on external labels, self-supervised learning involves defining proxy tasks
that are derived from the input data. These tasks are designed to be solvable by the model
using the inherent information present in the data.
Data Augmentation:
Self-supervised learning often involves creating variations or augmentations of the input
data. The model is then trained to predict or generate the original data from its augmented
versions. Self-supervised learning has gained popularity due to its ability to learn useful
representations from large amounts of unlabeled data. It is widely used in computer vi-
sion, natural language processing, and other domains where obtaining labeled data can be
expensive or impractical. The learned representations can then be fine-tuned on specific
downstream tasks using smaller amounts of labeled data.

2.5 Reinforcement Learning

Reinforcement learning is a very different beast. The learning system, called an agent, can
observe the environment, select and perform actions, and get rewards in return or penalties
in the form of negative rewards. It must then learn by itself what is the best strategy,
called a policy, to get the most reward over time. A policy defines what action the agent
should choose when it is in each situation. Reinforcement Learning (RL) is a type of
machine learning paradigm where an agent learns to make decisions by interacting with an
environment. The agent receives feedback in the form of rewards or penalties based on the
actions it takes, and its goal is to learn a strategy or policy that maximizes the cumulative
reward over time. Key components of reinforcement learning include:

Figure 2.5: Reinforcement Learning

Agent:
The entity that makes decisions and takes actions in the environment. The objective of the
agent is to learn the optimal strategy to maximize cumulative rewards.
Environment:
It is the external system or context with which the agent interacts. The environment defines
the state of the system and provides feedback to the agent in the form of rewards or penal-
ties.
State:
A state is the representation of the current situation or configuration of the environment.
The state contains all the relevant information which is needed to make decisions.
Action:
The action is the set of possible moves or decisions that the agent can take in each state.
These actions influence the subsequent state of the environment.
Reward:
A reward is the numerical value that the agent receives as feedback from the environment
after taking a particular action in a specific state. The goal of the agent is to maximize the
cumulative reward over time.
Policy:
The policy is the strategy or set of rules that the agent uses to determine its actions in dif-
ferent states. The objective is to learn an optimal policy that leads to the highest cumulative
reward. Reinforcement learning involves the agent interacting with the environment over

multiple time steps, learning from its experiences, and adjusting its policy to improve per-
formance. Popular reinforcement learning algorithms include Q-learning, Deep Q Network
(DQN), Policy Gradient methods, and more recently, algorithms based on deep neural net-
works, such as Proximal Policy Optimization (PPO) and Trust Region Policy Optimization
(TRPO).
Reinforcement learning has applications in various domains, including robotics, game play-
ing, autonomous systems, finance, and more. It has been notably successful in training
agents to play complex games like Go and Poker, as well as in optimizing control strategies
for robotics and autonomous vehicles.
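As a minimal sketch of the Q-learning algorithm mentioned above (the environment sizes,
reward, and hyperparameters here are hypothetical):

import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # table of action values
alpha, gamma = 0.1, 0.9               # learning rate and discount factor

def update(state, action, reward, next_state):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])

update(state=0, action=1, reward=1.0, next_state=2)
print(Q[0])

Repeated over many interactions, such updates let the agent's value estimates, and hence
its policy, improve toward the maximum cumulative reward.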

2.6 Batch Learning

Batch learning, also known as offline learning or batch training, is a machine learning
paradigm where a model is trained on a complete dataset at once. In batch learning, the
entire dataset, including both input features and corresponding labels, is used to update the
parameters of the model in a single iteration. In batch learning, the system is incapable
of learning incrementally, that is, it must be trained using all the available data. This will
generally take a lot of time and computing resources, so it is typically done offline. First
the system is trained, and then it is launched into production and runs without learning any-
more. It just applies what it has learned. This is called offline learning. Key characteristics
of batch learning include:
Training on the Entire Dataset:
In batch learning, the model sees the entire dataset during each iteration of the training
process. The model computes the gradients and updates its parameters using the average
of the gradients calculated over the entire dataset.
Offline Processing:
Batch learning is often used in offline or batch processing scenarios, where the entire
dataset is available before training begins. This is common in scenarios where data can
be collected and processed in bulk, rather than in a streaming or real-time fashion.
Computationally Intensive:

Training on the entire data set at once can be computationally intensive, especially when
dealing with large datasets. However, it may also be more efficient in terms of hardware
utilization, as the processing can be optimized for matrix operations.
Iterative Optimization:
Batch learning typically involves multiple iterations over the entire dataset. The parame-
ters of the model are updated iteratively to minimize a predefined loss function until con-
vergence. Batch learning is suitable for scenarios where the dataset fits into memory, and
computational resources are sufficient to process the entire dataset in one go. It is com-
monly used in tasks such as model training for offline analytics, where the focus is on
optimizing the model based on the available historical data. While batch learning has its
advantages, it may not be well-suited for applications where data is constantly streaming
in, as it requires periodic retraining with updated datasets. In contrast, online learning is
more suitable for scenarios where the model needs to adapt to changing data over time.
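As a minimal sketch of the iterative optimization step described above, here is full-batch
gradient descent for a one-feature linear model on hypothetical data; every parameter
update uses the gradient averaged over the entire dataset:

import numpy as np

rng = np.random.default_rng(0)
X = rng.random(100)
y = 3 * X + 2 + 0.1 * rng.standard_normal(100)   # hypothetical data

w, b, lr = 0.0, 0.0, 0.5
for epoch in range(500):            # each pass sees the WHOLE dataset
    error = w * X + b - y
    w -= lr * (error * X).mean()    # gradient averaged over all examples
    b -= lr * error.mean()

print(w, b)    # should approach 3 and 2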

2.7 Online Learning

Online learning, also known as incremental learning, streaming learning, or online machine
learning, is a machine learning paradigm where a model is updated continuously as new
data becomes available. In online learning, the model processes data one observation at
a time, updating its parameters incrementally. In online learning, we train the system in-
crementally by feeding it with data instances sequentially, either individually or in small
groups called mini batches. Each learning step is fast and cheap, so the system can learn
about new data on the fly, as it arrives. Key characteristics of online learning include:
Sequential Processing:
Data is processed in a sequential manner, with the model learning from one observation at
a time. This allows the model to adapt to new patterns or changes in the data as they occur.
Continuous Update:
The model is updated incrementally with each new data point. The updates can be per-
formed iteratively, with the model adjusting its parameters to minimize a predefined
loss function based on the most recent data.

Adaptability to Changing Data:
Online learning is particularly useful in scenarios where the underlying patterns in the data
may change over time. The model can continuously adapt to new trends and patterns with-
out the need to retrain the entire dataset.
Real-time Processing:
Online learning is well-suited for applications that require real-time decision-making or
analysis, as the model can be updated with each new piece of data as it arrives.
Online learning algorithms include:
Stochastic Gradient Descent (SGD):
It is a popular optimization algorithm used in online learning. It updates the model param-
eters based on the gradient of the loss function computed for each individual data point
(see the sketch at the end of this section).
Online Passive-Aggressive Algorithms:
These algorithms are designed for online learning and are particularly useful in classifica-
tion tasks.
Perceptron Algorithm:
A simple online learning algorithm used for binary classification. Online learning is applied
in various domains, including fraud detection, recommendation systems, anomaly detec-
tion, and other applications where data is continuously generated, and the model needs to
adapt to changes in the underlying patterns. It is particularly beneficial in situations where
the data is too large to fit into memory for batch processing or when the model needs to
respond quickly to new information.
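As a minimal sketch of online learning with stochastic gradient descent, assuming
scikit-learn's SGDRegressor; the stream of mini batches is simulated with hypothetical data:

import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(learning_rate='constant', eta0=0.01)

rng = np.random.default_rng(0)
for step in range(1000):            # data arriving one mini batch at a time
    X = rng.random((10, 2))         # hypothetical mini batch of 10 samples
    y = 4 * X[:, 0] - 2 * X[:, 1]
    model.partial_fit(X, y)         # incremental update; no full retraining

print(model.coef_)    # should approach [4, -2]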

2.8 Instance-based learning

In instance-based learning, the system learns the examples by heart, then generalizes to new
cases by using a similarity measure to compare them to the learned examples. Instance-
based learning, also known as memory-based learning or lazy learning, is a type of machine
learning where the model is trained on the entire dataset and makes predictions for new in-
stances based on the similarity to instances in the training data. Instead of learning a general
model during training, instance-based learning stores the training examples and uses them

directly for prediction. Key characteristics of instance-based learning include:
Memory-Intensive:
Instance-based learning methods store the entire training dataset in memory. During pre-
diction, the algorithm identifies the most similar instances in the training set to the new
input and uses their information to make predictions.
No Explicit Model:
Instance-based learning does not build an explicit model during training. It relies on the
stored instances for making predictions.
Similarity Measure:
The choice of similarity measure is crucial in instance-based learning. Common similarity
measures include Euclidean distance, cosine similarity, or other metrics that quantify the
similarity between instances.

2.9 Lazy Learning

Instance-based learning is referred to as lazy learning because it delays the processing or


learning phase until a prediction is required. The learning is performed at the time of
prediction, based on the specific instance to be classified or predicted. K-Nearest Neighbors
(K-NN) is a well-known instance-based learning algorithm. In K-NN, the prediction for
a new instance is determined by the class labels of its k nearest neighbors in the training
data. The algorithm calculates distances or similarities between the new instance and all
instances in the training set, selects the k closest ones, and assigns the class label based on
majority voting. Instance-based learning can be effective in scenarios where the decision
boundary is complex and not easily captured by a simple model. However, it may be
computationally expensive, especially with large datasets, as predictions involve comparing
the new instance to all stored instances in the training set.
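As a minimal sketch of K-NN, assuming scikit-learn and a handful of hypothetical
training instances:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[1, 1], [1, 2], [2, 1],
                    [6, 6], [6, 7], [7, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3 nearest neighbours
knn.fit(X_train, y_train)    # "lazy": this mostly just stores the data

print(knn.predict([[2, 2], [6.5, 6.5]]))    # majority vote among 3 neighbours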

2.10 Model-based learning

Another way to generalize from a set of examples is to build a model of these examples and
then use that model to make predictions. This is called model-based learning. Model-based
learning is a machine learning paradigm in which a model is trained to make predictions or
decisions based on a given dataset. In contrast to instance-based learning, where the entire
dataset is stored for later use, model-based learning involves constructing a general model
during the training phase. This model is then used to make predictions on new, unseen data.
Key characteristics of model-based learning include:
Generalization:
Model-based learning aims to learn general patterns or relationships in the data that can be
applied to make predictions on new instances not seen during training. The goal is to create
a model that generalizes well to unseen data.
Parameterization:
The model is typically defined by a set of parameters that are learned from the training
data. The learning process involves adjusting these parameters to minimize a predefined
loss function, representing the difference between the model’s predictions and the actual
outcomes in the training data.
Explicit Representation:
The trained model provides an explicit representation of the underlying patterns in the data.
This representation can be used to make predictions or gain insights into the relationships
between input features and output predictions.
Computational Efficiency:
Once the model is trained, making predictions on new instances is usually computationally
efficient, as the model encapsulates the learned patterns without needing to store the entire
training dataset.
Common types of model-based learning include:
Linear Models:
Linear regression and logistic regression are examples of model-based learning algorithms
that use linear relationships between input features and output predictions.

Decision Trees:
Decision tree-based models, such as Random Forests or Gradient Boosted Trees, are con-
structed during training to capture non-linear relationships and complex decision bound-
aries.
Neural Networks:
Deep learning models, such as artificial neural networks, use layered architectures to learn
hierarchical representations of data.
Support Vector Machines (SVM):
SVM is a model-based learning algorithm that finds the optimal hyperplane to separate
different classes in the feature space. Model-based learning is widely used in various ap-
plications, and the choice of a specific model depends on the characteristics of the data and
the nature of the problem being addressed.

2.11 Conclusion

In conclusion, machine learning can be broadly categorized into three main types: su-
pervised learning, unsupervised learning, and reinforcement learning. Supervised learn-
ing involves training a model on labeled data to make predictions or classifications. Un-
supervised learning explores patterns and relationships in unlabeled data, often through
clustering or dimensionality reduction. Reinforcement learning, inspired by behavioral
psychology, focuses on an agent learning to make decisions through trial and error, with
rewards or penalties shaping its behavior. Each type serves distinct purposes and applica-
tions, collectively contributing to the diverse and powerful landscape of machine learning.
As technology advances, the integration of these approaches continues to drive innova-
tion across various fields, promising a future where machines can adapt, reason, and learn
autonomously.

Chapter 3
Algorithms In Machine Learning
Machine Learning (ML) algorithms are the core components of machine learning systems.
Machine learning involves the use of algorithms to enable computers to learn patterns,
make decisions, and improve their performance on a task over time without being explicitly
programmed. There are various algorithms in machine learning, and they can be broadly
categorized into different types based on the nature of the task they are designed to solve.
Here are some common types of machine learning algorithms:
• Regression
• Support Vector Machines (SVM)
• Decision Trees
• Neural Networks
These are just a few examples, and there are many other algorithms and variations within
each category. The choice of algorithm depends on the specific task, the nature of the data,
and the goals of the machine learning project. Additionally, the field of machine learning
is dynamic, with ongoing research leading to the development of new algorithms and im-
provements to existing ones. Here, we'll discuss a few of them.

3.1 Regression

Regression means to predict, forecast and to assess the relationship between a dependent
and independent variable. More generally:
A statistical technique that relates a dependent variable to one or more independent vari-
ables. The goal of regression analysis is to understand the nature of the relationship between
variables and make predictions based on that understanding. A regression model is able to
show whether changes observed in the dependent variable are associated with changes in
one or more of the explanatory variables. Regression algorithms are used when there is a
relationship between the input and output variables; regression deals in numerical values. There are
different types of regression analysis, and two major categories are simple regression and
multiple regression.

3.1.1 Simple Linear Regression

If a single independent variable is used to predict the value of a numerical dependent vari-
able, then such a linear regression algorithm is called simple linear regression. Simple
regression involves modeling the relationship between a dependent variable and a single
independent variable. The key point in linear regression is that the dependent variable must
be a continuous real value. The most common form is linear regression, where the relation-
ship is assumed to be a straight line. The equation for simple linear regression is:

y = mx + b (3.1.1)

where y is the dependent variable, x is the independent variable, m is the slope, and b is the
intercept.
Equivalently, writing the intercept as b0 and the slope as b1 , it can be expressed as:

y = b0 + b1 x + ε (3.1.2)
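As a minimal sketch, the slope and intercept can be computed directly from data with the
standard least-squares formulas m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m x̄; the
numbers below are hypothetical:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])   # roughly y = 2x

m = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b = y.mean() - m * x.mean()
print(m, b)    # close to 2 and 0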

3.1.2 Objectives of Simple Linear Regression

Simple linear regression has mainly two objectives:


• Model the relationship between two variables such as the relationship between experience
and salary etc.
• Forecasting new observations such as weather forecasting according to temperature, rev-
enue of a company according to investment in a year, etc.

3.2 Use of Linear Regression in Machine Learning

Linear regression is a fundamental and widely used algorithm in machine learning, partic-
ularly in the field of supervised learning. It is employed for tasks that involve predicting a

continuous outcome based on one or more input features. Here’s a brief overview of how
linear regression is used in machine learning:
Problem Formulation:
Linear regression is suitable for problems where the relationship between the input and the
output is assumed to be linear. The goal is to find the best-fitting straight line that mini-
mizes the difference between the predicted and actual values.
Training the Model:
The training process involves finding the values for the model parameters that minimize
the difference between the predicted and actual output values.
Evaluation:
Once the model is trained, it needs to be evaluated on new unseen data to assess its per-
formance. Common metrics for regression problems include mean squared error (MSE),
mean absolute error (MAE), and R-squared.
Prediction:
After successful training and evaluation, the model can be used to make predictions on new
data by inputting the features into the trained model.
Applications:
Linear regression is used in various fields for tasks such as predicting house prices, sales
forecasts, stock prices, and many other scenarios where a linear relationship is assumed
between input features and the target variable.
Assumptions:
Linear regression assumes that the relationship between variables is linear, the residuals are
normally distributed, and the variance of residuals is constant. Linear regression serves as a
foundational building block for more complex models, and its simplicity and interpretabil-
ity make it a valuable tool in many machine learning applications.

3.3 Multilinear Regression

Multilinear regression, often referred to as multiple linear regression, is an extension of linear


regression to accommodate multiple input features. Here’s an overview of multiple linear

regression:

Figure 3.1: Multilinear Regression

Model Representation:
The multiple linear regression model can be represented as:

Y = b0 + b1 x1 + b2 x2 + . . . + bn xn + ε (3.3.1)

where Y is the target variable, b0 , b1 , . . . , bn are the coefficients for each feature, and ε
represents the error term.
Training the Model:
The training process involves finding the values for the coefficients that minimize the dif-
ference between the predicted and actual output values.
Matrix Notation:
The multiple linear regression equation can be written in matrix notation as:

Y = Xβ + ε (3.3.2)

where Y is a vector of target values, X is a matrix of input features, β is a vector of
coefficients, and ε is a vector of errors.
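As a minimal sketch, β can be estimated from the normal equations X^T X β = X^T Y; the
data below is randomly generated purely for illustration:

import numpy as np

rng = np.random.default_rng(0)
features = rng.random((50, 2))
X = np.column_stack([np.ones(50), features])   # column of 1s gives b0
true_beta = np.array([1.0, 2.0, -3.0])
Y = X @ true_beta + 0.01 * rng.standard_normal(50)

beta = np.linalg.solve(X.T @ X, X.T @ Y)       # solve the normal equations
print(beta)    # approximately [1, 2, -3]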
Implementation Of Multilinear Regression:

Figure 3.2: Multilinear Regression Code in Python

Assumptions:
Like simple linear regression, multiple linear regression assumes that the relationship be-
tween variables is linear, there is little or no multicollinearity, the residuals are normally
distributed, and the variance of residuals is constant.
Interpretability:
Coefficients represent the change in the target variable for a one-unit change in the corre-
sponding input feature, assuming other features are held constant.
Feature Scaling:
It is often recommended to scale or normalize the input features to ensure that all features
contribute equally to the model and to facilitate convergence during the training process.

3.3.1 Applications of Multilinear Regression

Multiple linear regression is applied in various domains for tasks such as predicting sales
based on advertising spending in multiple channels, predicting a person’s income based on
education, experience, and other factors, and many other scenarios where multiple features
contribute to the target variable. Multiple linear regression is a powerful tool for modeling
complex relationships between multiple variables and is commonly used in data analysis
and predictive modeling when dealing with datasets that have multiple influencing factors.

3.4 Decision Trees

A decision tree is a popular machine learning algorithm used for both classification and
regression tasks. It works by recursively partitioning the data into subsets based on the
values of input features, creating a tree-like structure of decisions.

Figure 3.3: Decision tree representation

3.4.1 Components of Decision Trees

Root Node:
The topmost node in the tree, which represents the entire dataset. It is split into two or more
child nodes based on the most significant feature.
Decision Nodes:
Nodes represent a decision based on the value of a particular feature. Each decision node
leads to further nodes or leaves.
Leaves:
Nodes that do not split further and represent the final output or decision. In a classification
tree, each leaf corresponds to a class label, while in a regression tree, it represents a numeric
value.
Splitting:
The process of dividing a node into two or more child nodes based on a chosen feature
and a splitting criterion (e.g., Gini impurity for classification or mean squared error for
regression).
Decision Criteria:
The criteria for splitting at each node are determined by the algorithm during the training
process. For example, in a classification tree, it could be based on which feature and

threshold lead to the best separation of classes.
Recursive Process:
The process of splitting is applied recursively to each child node until a stopping criterion is
met. This could be a predefined depth of the tree, a minimum number of samples in a node,
or other criteria. Decision trees have several advantages, including interpretability, ease
of understanding, and the ability to handle both numerical and categorical data. However,
they can be prone to overfitting, especially when the tree is deep and captures noise in the
training data. Techniques like pruning and using ensemble methods (e.g., Random Forests)
can help mitigate this issue.

3.5 Implementation of Decision Trees

Figure 3.4: Decision Trees Code in Python
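Since the code in Figure 3.4 appears only as an image, the following is a minimal sketch
of training a decision tree, assuming scikit-learn and its built-in Iris dataset:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)  # limit depth against overfitting
tree.fit(iris.data, iris.target)

print(export_text(tree, feature_names=iris.feature_names))  # the learned splits
print(tree.predict(iris.data[:3]))    # class predictions for three samples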

3.6 Conclusion

In conclusion, machine learning encompasses a diverse set of algorithms, each designed


for specific tasks. Linear regression serves as a foundational method for modeling relation-
ships between variables, while multilinear regression extends this capability to multiple
predictors. Decision trees, with their hierarchical structure, excel in classification tasks by
iteratively partitioning data based on feature conditions. These algorithms collectively con-
tribute to the versatility and adaptability of machine learning, providing effective tools for
modeling and decision-making across various domains.

Chapter 4
Support Vector Machine
4.1 Definition:

Support vector machine is a supervised machine learning algorithm used to solve classi-
fication and regression problems, but it is mainly used in classification tasks. Its main purpose
is to draw a boundary line between two sets of data, so that it is easy to tell which
object belongs to which data set.

Figure 4.1: Support Vector Machine

Figure 4.1 shows two data sets, a set of triangles and a set of squares, and between
them a boundary line separating the two classes of data.

4.2 Aim of Support Vector Machine:

The initial aim of a support vector machine is to create a boundary line, or decision boundary,
between two classes of data. Its other purpose is to form a hyperplane and to maximize
the margin from the boundary line to the nearest data points.

4.3 Components of Support Vector Machine:

• Hyperplanes:
A hyperplane is a decision boundary, a subspace of the feature space, which helps to separate
the data points into different classes. There are numerous decision boundaries in the feature
space, but the best decision boundary that classifies the data points is known as the hyperplane.
The hyperplane is always formed where the margin between the data points is maximum.

4.3.1 Characteristics of Hyperplane:

• N Dimensional Hyperplanes:
Usually we see support vector machines in the 2-D plane, but hyperplanes exist in N-dimensional
spaces, where N is the number of features or dimensions in the dataset. In 2-D, a hyperplane
is a line, while in 3-D it is a plane.
• A hyperplane can also be represented in linear equation form:

w^T x + b = 0 (4.3.1)

where w represents the weight vector, x represents the data points, and b is the bias term or
intercept, which shifts the hyperplane away from the origin.
• Optimal Hyperplane:
In SVM, the optimal hyperplane is the one that has the maximum margin while satisfying the
constraint that all data points are correctly classified.

Figure 4.2: Hyperplanes

• Support Vectors:
These are the data points closest to the hyperplane, and they are critical in determining it.
They are used to determine the position of the hyperplane and to define the margins. These
points have a direct impact on the calculation of the margins.

Figure 4.3: Support Vectors

• Margins:
The margin is the distance between the hyperplane and the nearest data points from either
class on each side of the decision boundary. A support vector machine maximizes the margin
because a larger margin leads to better generalization and a lower risk of overfitting, as
shown in the figure below:

Figure 4.4: Margins

• Hard and Soft Margins:


A hard margin is a hyperplane with the maximum margin that classifies the data points
correctly without any misclassification. If the data points cannot be separated correctly,
SVM has the soft margin technique, which relaxes the margin restriction and allows a few
violations or misclassifications. In short, it slightly widens the margin while tolerating
some violations.

Figure 4.5: Hard and Soft Margins

• Kernel Function:
The kernel function is useful for both linearly separable and nonlinearly separable data. Its
basic function is to convert non-separable data from a lower-dimensional space to a
higher-dimensional space, making it easily separable by a linear hyperplane.

Figure 4.6: 1D TO 2D

Figure 4.7: 2D TO 3D

Figure 4.6 illustrates how the kernel works to separate inseparable data by increasing the
dimension. Similarly, figure 4.7 illustrates the conversion of inseparable data in the 2-D
plane to the 3-D space.
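As a minimal sketch of the idea behind figure 4.6, the 1-D points below (hypothetical)
cannot be separated by a single threshold, but become linearly separable after the simple
polynomial feature map x → (x, x²):

import numpy as np

x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
labels = np.array([1, 1, 0, 0, 0, 1, 1])    # class 0 sits between class 1

mapped = np.column_stack([x, x ** 2])       # lift the data from 1-D to 2-D
# In the (x, x^2) plane the horizontal line x^2 = 2 separates the classes:
print((mapped[:, 1] > 2).astype(int))       # reproduces the labels exactly

In practice, kernel functions such as the polynomial or RBF kernel let SVM use such
mappings implicitly, without computing the higher-dimensional coordinates.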

4.4 Types of Support Vector Machines:

Mainly, SVM has two types; their significance is described below:
• Linear Support Vector Machine:
In a linear support vector machine, a single straight line can divide the data into separate
classes. Linear SVM is suitable when the data is already linearly separable, meaning that
in the 2-D plane a single straight line differentiates the classes. The hyperplane which
maximizes the margin between the classes is called the decision boundary.
• Non Linear Support Vector Machine:
In a nonlinear support vector machine, the data is not separated; rather, it is in clustered
form, and it is nearly impossible to separate the data with a straight line, as all the data
sets are mixed together. So, in this case, we use a kernel function, which separates the data
by converting it from a lower dimension into a higher dimension, as described in figure 4.7.

4.5 Working of Support Vector Machine:

Let's try to understand the working of SVM with an example. Suppose we see a strange cat
with some features resembling a dog, and we want to know whether it is a cat or a dog. In
this case, we can use a support vector machine to classify it accurately. For prediction, we
train our model on an input containing many images of cats and dogs, through which the
model learns by analyzing the features of both animals; we then test it on the object we
want to classify. SVM creates a decision boundary between the two classes, analyzes the
new data thoroughly against the features of both, and then, with the help of the support
vectors, predicts that it is a cat and not a dog. SVM is also used in various areas such as
detecting faces, recognizing images and categorizing text.

4.6 Implementation of Support Vector Machine:

import pandas as pd
import matplotlib.pyplot as plt

# Assumes the Iris dataset as a CSV file; the file name here is an assumption.
df = pd.read_csv('Iris.csv')

x = df['SepalLengthCm']
y = df['PetalLengthCm']

# The first 50 rows belong to one species, the rest to the other
rose_x = x[:50]
rose_y = y[:50]

tulip_x = x[50:]
tulip_y = y[50:]

plt.figure(figsize=(7, 5))
plt.scatter(rose_x, rose_y, marker='+', color='green')
plt.scatter(tulip_x, tulip_y, marker='_', color='red')
plt.show()
OUTPUT:

Figure 4.8: Result

4.7 Advantages of Support Vector Machine:

• Effectiveness In Higher Dimensions:
High-dimensional data is data in which the number of features is greater than the number
of samples given to the model. SVM is effective in handling such high-dimensional data.
• Versatility:
Support vector machines cover a vast field and solve many complex problems. They are
applicable to both regression and classification problems and have numerous functions to
tackle complex tasks.
• Handling of nonlinear data:
Support vector machines can handle nonlinear data easily through the kernel trick, by
changing the dimensions of the data.
• Effectiveness of the kernel function:
The kernel function is effective when we are dealing with nonlinear data which is not easily
separable. It helps in converting the data from lower dimensions to higher dimensions,
making it linearly separable.

4.8 Disadvantages of Support Vector Machine:

• Scalability:
It is difficult to train the model when the samples number in the millions; this is often
impractical due to memory shortages and computational restrictions.
• Computational cost:
When dealing with large data sets, training costs a lot computationally: learning takes a
lot of time, and the memory requirement grows steadily.
• Lack of accuracy:
When working with large data sets, support vector machines do not always show accurate
or good results.

4.9 Conclusion:

In conclusion, support vector machines are a very versatile machine learning algorithm, widely used to solve regression and classification problems. SVM's main purpose is to draw the optimal hyperplane, the one which maximizes the margin between the data points of the two classes and separates the data into accurate classes. Its versatility lies in dealing with nonlinear data by using the kernel trick to change the dimensions. Despite all this, SVM lacks accuracy when dealing with large datasets; still, it remains one of the most powerful tools in machine learning.

Chapter 5
Neural Networks
5.1 Definition:

A neural network is a machine learning model that sits at the intersection of artificial intelligence and the study of the human brain. The model is inspired by the way the human brain processes information: it is a complex network of interconnected neurons, or nodes, which solves difficult problems by learning from training data, then testing and producing output.

Figure 5.1: Simple neural network

5.2 Applications of Neural Networks:

Nowadays, neural networks work behind many complex algorithms, for example:
• Detecting faces and understanding spoken language in applications like voice assistants.
• Language recognition.
• Handwriting recognition, by analyzing how you write different letters and numbers.
• Predicting stock rates and detecting fraud.
• In medicine, diagnosing diseases and reading reports and images such as X-rays, CT scans, etc.

5.3 Structure of neural networks:

As discussed earlier, neural networks are complex networks of interconnected neurons or nodes. A network has input and output layers, and information passes between the nodes through interconnected links. These links carry different weights, with hidden layers between the input and output layers, as shown in Figure 5.2.

5.3.1 Components of neural networks:

• Neurons/nodes
• Input layer
• Output layer
• Weights
• Hidden layers
• Loss calculations
• Adjusting weights
• Activation function
• Training
• Gradient descent

Figure 5.2: Components

5.4 Working of neural networks:

Let us take a real-world example, spam email detection, to see how a neural network works. As discussed above, the network has input and output layers. Every email first goes through the input layer, which reads the sender's information, the subject, and the content. After training on this data, the network decides whether the email is spam or not: with the help of a binary activation function, the output layer detects whether the email is legitimate. The network is loosely based on a few features our brain has. It has three types of layers: an input layer, an output layer, and hidden layers. Mainly, this process comprises two stages:
forward propagation and backward propagation.

5.4.1 Forward propagation:

In forward propagation, the network passes input data in the forward direction through the hidden layers, and each hidden layer feeds its output to the successive layer. It has the following components, with a small numerical sketch after the list:
• Neurons/nodes:
These are data-processing units which pass data to one another.
• Input layer:
It consists of different nodes which receive the data and pass it to the hidden layers.

• Output layer:
It gives us the output after all the input data has been processed through the hidden layers.
• Weights:
A weight tells us how strong the connection between two nodes is. The weight assigned to each connection is not constant; it varies throughout training.
• Hidden layers:
These are the most important components of neural networks. They help the network learn from complex input data and give accurate results. Their number depends on how complex the problem is.
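As mentioned above, here is a minimal numerical sketch of a single forward pass through one hidden layer; the input values and weight shapes are hypothetical, and the sigmoid is used as the activation function.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical input: one sample with three features
x = np.array([0.5, 1.0, 1.5])

# Hypothetical weights: 3 inputs -> 4 hidden nodes -> 1 output node
W1 = np.random.random((3, 4))
W2 = np.random.random((4, 1))

hidden = sigmoid(x @ W1)       # input layer -> hidden layer
output = sigmoid(hidden @ W2)  # hidden layer -> output layer
print(output)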

5.4.2 Backward propagation:

In backward propagation, we move backward from the output node towards the input nodes, tracing the errors. It is very helpful in increasing the accuracy of our predictions. Its components are given below:
• Loss calculations:
It tells us the difference between the targeted output and the predicted output. In regression problems, we call it the mean squared error. Mathematically, it is defined as

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{5.4.1}$$

• Adjusting weights:
We adjust the weight of every connection between two nodes by using backward propagation across our neural network.
• Training:
We have a large sample of data during training. Throughout the process, forward propagation, backward propagation, and loss calculation occur again and again so that our network learns the data pattern.
• Activation function:
The activation function applies a nonlinear transformation to the input so that the network can learn and solve complex problems; otherwise the model would reduce to a simple linear regression.
• Gradient descent:
This step reduces the inaccuracy and loss of the output by changing the weights; each weight is changed by taking the derivative of the loss with respect to that weight.
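In symbols, each weight w is updated in the direction that reduces the loss L, scaled by a learning rate η (a hyperparameter not fixed by the text):

$$w \leftarrow w - \eta \, \frac{\partial L}{\partial w}$$

Here ∂L/∂w is the derivative computed during backward propagation.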

5.5 Implementation of neural networks:

import numpy as np

# Training data: three samples with three features each
X = np.array([[0.5, 1.0, 1.5],
              [1.5, 2.0, 0.5],
              [1.0, 2.5, 1.5]])
# Target values as a column vector
Y = np.array([[0.2, 0.1, 0.3]]).T

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Weights initialized randomly in [-1, 1]
sigm = 2 * np.random.random((3, 3)) - 1  # input-to-hidden weights
delt = 2 * np.random.random((3, 1)) - 1  # hidden-to-output weights

# Training loop: forward pass, backward pass, weight update
for j in range(100):
    # Forward propagation
    l1 = sigmoid(np.dot(X, sigm))   # hidden-layer activations
    l2 = sigmoid(np.dot(l1, delt))  # output-layer activation
    # Backward propagation: error times derivative of the sigmoid
    m1 = (Y - l2) * l2 * (1 - l2)
    m2 = m1.dot(delt.T) * l1 * (1 - l1)
    # Adjusting weights
    delt = delt + l1.T.dot(m1)
    sigm = sigm + X.T.dot(m2)

# Print the hidden-layer output after training
output = sigmoid(np.dot(X, sigm))
print(output)

Output:
 
[[0.99753435 0.99754312 0.99757271]
 [0.9996661  0.99966809 0.99967704]
 [0.99995482 0.99995515 0.99995639]]

5.6 Learning in neural networks:

• Supervised learning
• Unsupervised learning
• Reinforcement learning

Supervised learning in neural networks:
As we know, in supervised learning the instructor gives both input and output data to the network. The network then makes predictions for the outputs on the basis of the inputs fed to it. After getting the outputs, we compare them with those the instructor gave. If errors occur, we reduce them until the output matches the desired outcomes.
Unsupervised learning in neural networks:
In unsupervised learning, the data is unlabeled and only inputs are given to the network. With the help of this unlabeled data, the network builds a model to carry out predictions. Unlike supervised learning, there is no instructor present to train the data; the network itself creates the model, trains and tests on the data, and makes predictions.
Reinforcement learning in neural networks:
In reinforcement learning, the model has to train itself and learn on its own. The model directly receives feedback in terms of penalties and rewards. It has some similarity with supervised learning, but differs in that it performs a deep analysis of the model's behaviour rather than simply matching desired outputs.

5.7 Types of neural networks:

• Recurrent neural networks (RNN):
This type of neural network is designed for sequential data. RNNs are useful in time-series prediction, text sentences, and natural language processing. One important point about recurrent neural networks is that they only receive a new input after the previous inputs have already been received and passed into the hidden layers. However, it is not necessary that every input in the first layer reaches the output.
• Convolutional neural networks (CNN):
Convolutional neural networks are the most successful among all other neural networks. They play an important role in object detection, image analysis, and text processing. This network trains itself by recognizing the features of an object, and on that basis it tells us which object, or whose image, it is.
• Feedforward networks:
This is the simplest neural network, which passes input data from the input layer to the output in a single direction. It is useful in a number of applications like pattern recognition and regression.
• Long short-term memory (LSTM):
This is a type of recurrent neural network which helps remove inaccuracy from the network, usually during the training process. It can also read, write, and erase information by using memory cells.
• Multilayer perceptron (MLP):
This is a globally recognized algorithm initially used for image recognition. The word perceptron is derived from perception, the human faculty of recognizing and perceiving images. It is a type of feedforward neural network and has three or more layers: an input layer, an output layer, and one or more hidden layers.

5.8 Advantages of neural networks:

• Multiple processing:
Neural networks have the ability to perform multiple tasks, which means they can solve more than one problem at a time.
• Fault tolerance:
Neural networks can tolerate faults. If one or more nodes are faulty, it does not affect the working of the whole network model.
• Non-linearity:
Non-linear activations let the model learn and solve more complex problems easily.
• Processing of unorganized data:
Neural networks can sort and categorize large amounts of data by processing it.
• Pattern recognition:
Neural networks are very effective in image recognition, natural language processing, and analyzing many other data patterns.
• Versatility:
Neural networks are very fast and efficient in learning new data. They are useful where the links between inputs and outputs are not well defined.

5.9 Disadvantages of neural networks:

• Requirement of large data:
Neural networks require large amounts of labeled data for effective training; otherwise the performance of the network may be affected.
• Inaccuracy in outputs:
Neural networks require thorough training of the model. If the model is not trained properly, we may get inaccurate results.
• Black box nature:
Because a neural network behaves like a black box, it is difficult to understand how predictions are made and how the data is categorized.
• Hardware dependence:
Neural networks require good processors to make the model reliable and its performance smooth, which shows that they are highly dependent on hardware.

5.10 Conclusion:

In conclusion, neural networks are among the most powerful tools in machine learning, inspired by the human brain's ability to learn and adapt. They have been very successful in many areas, from image recognition to natural language processing, which shows how widespread they have become in a short time. They have a very important role in shaping artificial intelligence in the future, thanks to their ability to solve complex problems across various fields.

Chapter 6
Problem Solving
These exercise problems are taken from Chapter 2 of the book "Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow" by Aurelien Geron.

6.1 Problem 01

Try a Support Vector Machine regressor (sklearn.svm.SVR) with various hyperparameters, such as kernel="linear" (with various values for the C hyperparameter) or kernel="rbf" (with various values for the C and gamma hyperparameters). Don't worry about what these hyperparameters mean for now. How does the best SVR predictor perform?
Solution:

Figure 6.1: SVR Model

Figure 6.2: SVR Model

Figure 6.3: SVR Model

Figure 6.4: Results
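The screenshots above show the full solution; a minimal sketch of the core search is given below. The variables housing_prepared and housing_labels are assumed to come from the book's Chapter 2 preparation pipeline, and the grid values are illustrative.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Hypothetical grid: a linear kernel with several C values, and an RBF
# kernel with several C and gamma values.
param_grid = [
    {"kernel": ["linear"], "C": [10.0, 30.0, 100.0, 300.0, 1000.0]},
    {"kernel": ["rbf"], "C": [1.0, 10.0, 100.0, 1000.0],
     "gamma": [0.01, 0.1, 1.0]},
]

grid_search = GridSearchCV(SVR(), param_grid, cv=5,
                           scoring="neg_mean_squared_error")
grid_search.fit(housing_prepared, housing_labels)  # assumed prepared data
print(grid_search.best_params_)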

6.2 Problem 02

Try replacing GridSearchCV with RandomizedSearchCV.


Solution:

Figure 6.5: RandomizedSearchCV Model

Figure 6.6: RandomizedSearchCV Model

Figure 6.7: Results
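A minimal sketch of the randomized variant is given below; instead of an exhaustive grid, hyperparameters are sampled from distributions (the distributions shown are illustrative), again assuming housing_prepared and housing_labels from the book's Chapter 2.

from scipy.stats import expon, reciprocal
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVR

# Hypothetical distributions to sample hyperparameters from
param_distribs = {
    "kernel": ["linear", "rbf"],
    "C": reciprocal(20, 200000),   # log-uniform over a wide range
    "gamma": expon(scale=1.0),
}

rnd_search = RandomizedSearchCV(SVR(), param_distributions=param_distribs,
                                n_iter=20, cv=5,
                                scoring="neg_mean_squared_error",
                                random_state=42)
rnd_search.fit(housing_prepared, housing_labels)  # assumed prepared data
print(rnd_search.best_params_)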

6.3 Problem 03

Try adding a transformer in the preparation pipeline to select only the most important attributes.

Solution:

Figure 6.8: Pipeline Model

Figure 6.9: Pipeline Model

Figure 6.10: Results
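A minimal sketch of such a transformer is given below; it assumes feature_importances has already been obtained (for example, from a trained random forest) and keeps only the k highest-scoring feature columns.

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class TopFeatureSelector(BaseEstimator, TransformerMixin):
    # Keeps only the k features with the highest importance scores.
    def __init__(self, feature_importances, k):
        self.feature_importances = feature_importances
        self.k = k

    def fit(self, X, y=None):
        # Indices of the k largest importance scores
        self.feature_indices_ = np.argsort(self.feature_importances)[-self.k:]
        return self

    def transform(self, X):
        return X[:, self.feature_indices_]

Such a transformer can then be appended to the preparation pipeline as one more step.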

6.4 Problem 04

Try creating a single pipeline that does the full data preparation plus the final prediction.
Solution:

Figure 6.11: Single Pipeline

Figure 6.12: Single Pipeline

Figure 6.13: Single Pipeline

Figure 6.14: Single Pipeline

Figure 6.15: Results
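A minimal sketch of this idea is given below; it assumes the fitted preparation pipeline full_pipeline, the tuned SVR from Problem 01's grid search, and the raw housing DataFrame are already available.

from sklearn.pipeline import Pipeline

# One pipeline: raw data in, predictions out
full_pipeline_with_predictor = Pipeline([
    ("preparation", full_pipeline),        # assumed fitted earlier
    ("svr", grid_search.best_estimator_),  # assumed tuned earlier
])

some_data = housing.iloc[:4]  # a few raw rows for illustration
print(full_pipeline_with_predictor.predict(some_data))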

6.5 Problem 05

Automatically explore some preparation options using GridSearchCV.


Solution:

Figure 6.16: Model for GridSearchCV

Figure 6.17: Model for GridSearchCV

Figure 6.18: Results
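A minimal sketch of searching over preparation options is given below; the pipeline name prepare_select_and_predict_pipeline and the double-underscore parameter paths are hypothetical and depend on how the pipeline steps are actually named.

from sklearn.model_selection import GridSearchCV

# Hypothetical grid over preparation choices: the imputer strategy inside
# the numerical sub-pipeline, and k for the feature selector from Problem 03
param_grid = [{
    "preparation__num__imputer__strategy":
        ["mean", "median", "most_frequent"],
    "feature_selection__k": list(range(1, 5)),
}]

grid_search_prep = GridSearchCV(prepare_select_and_predict_pipeline,
                                param_grid, cv=5,
                                scoring="neg_mean_squared_error")
grid_search_prep.fit(housing, housing_labels)  # assumed raw data and labels
print(grid_search_prep.best_params_)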

References
[1] Neural Networks and Deep Learning by Charu C. Aggarwal

[2] Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurelien Geron

[3] https://www.techtarget.com/searchenterpriseai/definition/neural-network

[4] https://www.analyticsvidhya.com/blog/2022/01/introduction-to-neural-networks/#h-how-does-a-neural-network-work

[5] https://www.geeksforgeeks.org/neural-networks-a-beginners-guide/

[6] https://www.baeldung.com/cs/hidden-layers-neural-network

[7] https://towardsdatascience.com/forward-propagation-in-neural-networks-simplified-math-and-code-version-bbcfef6f9250

[8] https://www.sciencedirect.com/topics/computer-science/artificial-neural-network

[9] https://www.techtarget.com/searchenterpriseai/definition/backpropagation-algorithm

[10] https://towardsdatascience.com/loss-functions-and-their-use-in-neural-networks-a470e703f1e9

[11] https://www.javatpoint.com/unsupervised-artificial-neural-networks

[12] https://towardsdatascience.com/multilayer-perceptron-explained-with-a-real-life-example-and-python-code-sentiment-analysis-cb408ee93141

[13] https://www.geeksforgeeks.org/support-vector-machine-algorithm

[14] https://uedufy.com/calculate-multiple-linear-regression-using-spss

[15] https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm
