Cs601pc - Machine Learning Unit - 2-1

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 148

B.Tech.

III YEAR II Semester

MACHINE LEARNING
CS601PC: MACHINE LEARNING

DR DV RAMANA,
DATA STRATEGIST –CONSULTANT
AND

DVR
ACADEMIC ADVISOR
MACHINE LEARNING

MACHINE LEARNING
Prerequisites

Data Structures
Knowledge on statistical methods
.

DVR
MACHINE LEARNING

MACHINE LEARNING
Course Objective

This course explains machine learning techniques such as decision tree learning, Bayesian learning etc

To understand computational learning theory.


.
To study the pattern comparison techniques

DVR
MACHINE LEARNING

MACHINE LEARNING
Course Outcomes

Understand the concepts of computational intelligence like machine learning

Ability to get the skill to apply machine learning techniques to address the real time problems in different areas
.
Understand the Neural Networks and its usage in machine learning application.

DVR
MACHINE LEARNING

MACHINE LEARNING
References

TEXT BOOK
Machine Learning – Tom M. Mitchell, - MGH
.
REFERENCE BOOK:
Machine Learning: An Algorithmic Perspective, Stephen Marshland, Taylor & Francis

DVR
MACHINE LEARNING

MACHINE LEARNING
Agenda – Unit- II
Artificial Neural Networks - 1

Introduction - Artificial Neural Networks

Neural network representation


Appropriate problems for neural network learning

Perceptions
Multilayer networks and the back-propagation algorithm

Artificial Neural Networks - 2

Remarks on the Back-Propagation algorithm

An illustrative example: Face recognition

Advanced topics in artificial neural networks.

DVR
MACHINE LEARNING

MACHINE LEARNING
Agenda – Unit- II
Evaluation Hypotheses

Motivation

Estimation hypothesis accuracy


Basics of sampling theory

A general approach for deriving confidence intervals


Difference in error of two hypotheses
Comparing learning algorithms

DVR
MACHINE LEARNING
Artificial Neural Networks - 1

DVR
MACHINE LEARNING
INTRODUCTION

Artificial neural networks (ANNs) provide a general, practical method


for learning real-valued, discrete-valued, and vector-valued target
functions from examples.

DVR
9
Introduction - Artificial Neural Networks

MACHINE LEARNING
DVR
MACHINE LEARNING
Biological Motivation

• The study of artificial neural networks (ANNs) has been inspired by the
observation that biological learning systems are built of very complex webs of
interconnected Neurons
• Human information processing system consists of brain neuron: basic building
block cell that communicates information to and from various parts of body
• Simplest model of a neuron: considered as a threshold unit –a processing element
(PE)
• Collects inputs & produces output if the sum of the input exceeds an internal
threshold value

DVR
11
12

DVR MACHINE LEARNING


13

DVR MACHINE LEARNING


14

DVR MACHINE LEARNING


15

DVR MACHINE LEARNING


16

DVR MACHINE LEARNING


Introduction - Artificial Neural Networks

MACHINE LEARNING
DVR
MACHINE LEARNING
Facts of Human Neurobiology

• Number of neurons ~ 1011


• Connection per neuron ~ 10 4 – 5
• Neuron switching time ~ 0.001 second or 10 -3
• Scene recognition time ~ 0.1 second
• 100 inference steps doesn’t seem like enough
• Highly parallel computation based on distributed representation

DVR
18
Neuron

19

DVR MACHINE LEARNING


Neuron

20

DVR MACHINE LEARNING


Neuron

21

DVR MACHINE LEARNING


Neuron

22

DVR MACHINE LEARNING


MACHINE LEARNING
NEURAL NETWORK REPRESENTATIONS

DVR
23
24

DVR MACHINE LEARNING


MACHINE LEARNING
• A prototypical example of ANN learning is provided by Pomerleau's (1993)
system ALVINN, which uses a learned ANN to steer an autonomous vehicle
driving at normal speeds on public highways.

• The input to the neural network is a 30x32 grid of pixel intensities obtained from
a forward-pointed camera mounted on the vehicle.

• The network output is the direction in which the vehicle is steered.

DVR
25
Introduction - Artificial Neural Networks

MACHINE LEARNING
DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
DVR
DVR MACHINE LEARNING
Introduction - Artificial Neural Networks

MACHINE LEARNING
Artificial neural networks (ANNs) provide a general, practical method for learning

Real-valued

Discrete-valued, and

Vector-valued functions from examples

Algorithms such as Back Propagation and Gradient Descent to tune network


parameters to best fit a training set of input-output pairs

ANN learning is robust to errors in the training data and has been successfully
applied to problems such as

Interpreting visual scenes

Speech recognition, and

Learning robot control strategies

DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
ANNs are computational models which work similar to the functioning of a human nervous
system.

There are several kinds of ANNs.

These type of networks are implemented based on the mathematical operations and a set
of parameters required to determine the output

NNs can be

Biological models

Artificial models

Desire to produce artificial systems capable of sophisticated computations similar to the


human brain

DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
Brain vs Digital Computers

Computers require hundreds of cycles to simulate a firing of a neuron

The brain can fire all the neurons in a single step

Parallelism

Serial computers require billions of cycles to perform some tasks but the
brain takes less than a second

Example
Face Recognition

DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
Human Brain
• 1011 Neurons
• On average, each connected to 104 others
• The fastest neuron switching times are known to be
on the order of 10-3 seconds--quite slow compared to
computer switching speeds of 10-10 seconds.
• It requires approximately 10-1 seconds to visually
recognize mother.
• Highly parallel processes on distributed environment.

DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
• ANNs are loosely motivated by biological neurons.
• ANNs whose individual units output a single constant value, whereas
biological neurons output a complex time series of spikes.
• One set of researchers are using ANNs to study biological learning
processes.
• Other set of researchers are using ANNs to build Machine Learning
algorithms without considering of mirroring of biological process.

DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
Connectionist Models

Consider Humans:

Neuron Switching time ~.001

Number of neurons ~1010

Connections per neuron ~ 104-5

Scene recognition time ~.1 sec

100 inference steps does'nt seem like enough

DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
Properties of Neural Networks

Many neuron-like threshold switching units

Many weighted interconnections among units

Highly parallel , distributed process

Emphasis on tuning weights automatically

Input is a high-dimensional discrete or real-valued (Example: SENSOR INPUT)

DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
When to consider Neural Networks?

Input is a high-dimensional discrete or real-valued(e.g Sensor Input)

Output is discrete or real-valued

Output is a vector of values

Possibly noisy data

Form of target function is unknown

Human readability of result is unimportant

Examples:

Speech phoneme recognition

Image classification

Financial Position

DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
Neuron
Neurons, also known as nerve cells, send and receive signals from your brain
While neurons have a lot in common with other types of cells, they're
structurally and functionally unique
Specialized projections called axons allow neurons to transmit electrical
and chemical signals to other cell

Parts of a Neuron

DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
Characteristics of Artificial Neural Network

Large number of very simple processing neuron-like processing elements

Larger number of weighted connections between the elements

Distributed representation of knowledge over the connections

Knowledge is acquired by network through a learning process.

DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
Application of ANN

Controlling the movements of a robot based on self- perception and other


information
Deciding the category of potential food items in an artificial world

Recognizing a visual object

Predicting where a moving object goes , when robot wants to catch it

DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
Strengths of Artificial Neural Network

1. Greatest power of Neural Networks is that is endowed with a finite number of


hidden units, can yet approximate any continuous function to any desired degree
of accuracy.

This has been commonly referred to as the property of


universal approximate.

2. No prior knowledge of the data generating process is needed for


Implementing Neural Network(NN)

3. Problem of model misspecification does not occur

4. No prior knowledge of the data generating process is needed for


Implementing Neural Network(NN)

Adaptive Learning

An ability to learn how to do tasks based on the data given for training or initial

DVR
experience
Introduction - Artificial Neural Networks

MACHINE LEARNING
Strengths of Artificial Neural Network

Self Organization

An Artificial Neural Networks(ANN) can create its own organization or


representation of the information it receives during learning time

DVR
Introduction - Artificial Neural Networks

MACHINE LEARNING
Weakness of Artificial Neural Network

1. The addition of too many hidden units incites the problem of overfitting the data

2. The construction of the Neural Networks(NN) model can be a time consuming process

DVR
Neural Network Representation

MACHINE LEARNING
Neural networks are a biologically-inspired algorithm that attempt to mimic the functions of neurons
in the brain

Each neuron acts as a computational unit, accepting input from the dendrites and outputting signal
through the axon terminals

Actions are triggered when a specific combination of neurons are activated

Single layer Neural Network is a Neural network with one input layer, one hidden layer and the output
layer, which is a single node layer, and it is responsible for generating the predicted value y^

DVR
Neural Network Representation

MACHINE LEARNING
We have the following parts of the neural network:

x1,x2 and x3 are inputs of a Neural Network. These elements are scalars and they
are stacked vertically. This also represents an input layer

Variables in a hidden layer are not seen in the input set. Thus, it is called a hidden
layer

The output layer consists of a single neuron only and y^ is the output of the neural
network

DVR
Neural Network Representation

MACHINE LEARNING
We have the following parts of the neural network:

x1,x2 and x3 are inputs of a Neural Network. These elements are scalars and they
are stacked vertically. This also represents an input layer

Variables in a hidden layer are not seen in the input set. Thus, it is called a hidden
layer

The output layer consists of a single neuron only and y^ is the output of the neural
network

In the training set we see what the inputs are and we see what the output should be. But the things
in the hidden layer are not seen in the training set, so the name hidden layer just means you don’t

DVR
see it in the training set.
NEURAL NETWORK REPRESENTATIONS

MACHINE LEARNING
DVR
Appropriate problems for neural network learning

MACHINE LEARNING
ANN learning is well-suited to problems in which the training data corresponds to
noisy, complex sensor data, such as inputs from cameras and microphones
The BACKPROPAGATION algorithm is the most commonly used ANN learning
technique
ANN learning technique is appropriate for problems with the following
characteristics:
Instances are represented by many attribute-value pairs

The target function output may be discrete-valued, real-valued, or a


vector of several real- or discrete-valued attributes

The training examples may contain errors.

The ability of humans to understand the learned target function is not


important.

DVR
Appropriate problems for neural network learning

MACHINE LEARNING
Instances are represented by many attribute-value pairs

The target function to be learned is defined over instances that can be described by
a vector of predefined features, such as the pixel values in the ALVINN example

These input attributes may be highly correlated or independent of one another


Input values can be any real values.

DVR
Appropriate problems for neural network learning

MACHINE LEARNING
The target function output may be discrete-valued, real-valued, or a vector of several real-
or discrete-valued attributes

The value of each output is some real number between 0 and 1, which in this case
corresponds to the confidence in predicting the corresponding steering direction

We can also train a single network to output both the steering command and
suggested acceleration, simply by concatenating the vectors that encode these two
output predictions

DVR
Appropriate problems for neural network learning

MACHINE LEARNING
Instances are represented by many attribute-value pairs

DVR
Appropriate problems for neural network learning

MACHINE LEARNING
The training examples may contain errors.

ANN learning methods are quite robust to noise in the training data

Long training times are acceptable

Network training algorithms typically require longer training times than,


say, decision tree learning algorithms

Training times can range from a few seconds to many hours, depending on
factors such as the number of weights in the network, the number of
training examples considered, and the settings of various learning
algorithm parameters

DVR
Appropriate problems for neural network learning

MACHINE LEARNING
Fast evaluation of the learned target function may be required

Although ANN learning times are relatively long, evaluating the learned
network, in order to apply it to a subsequent instance, is typically very fast

Example

ALVINN applies its neural network several times per second to continually
update its steering command as the vehicle drives forward

The ability of humans to understand the learned target function is not important.

The weights learned by neural networks are often difficult for


humans to interpret

Learned neural networks are less easily communicated to humans than


learned rules

DVR
Perceptron

MACHINE LEARNING
A Perceptron is an Artificial Neuron

Perceptron is the simplest possible Neural Network Neural Networks are the building blocks of
Machine Learning

A perceptron is a simple model of a biological neuron in an artificial neural network

Perceptron is also the name of an early algorithm for supervised learning of binary classifiers

A Perceptron is a neural network unit that does certain computations to detect features or
business intelligence in the input data

Perceptron is a function that maps its input “x,” which is multiplied by the learned weight
coefficient, and generates an output value ”f(x)

Perceptrons could simulate brain principles, with the ability to learn and make decisions

DVR
Perceptron

MACHINE LEARNING
The original Perceptron was designed to take a number of binary inputs, and produce one binary
output (0 or 1)

The idea was to use different weights to represent the importance of each input, and that the
sum of the values should be greater than a threshold value before making a decision like true
or false (0 or 1).

DVR
Perceptron

MACHINE LEARNING
Perceptron Example
Imagine a perceptron (in your brain)
The perceptron tries to decide if you should go to a
concert

Is the artist good? Is the weather good?

What weights should these facts have?

DVR
Perceptron

MACHINE LEARNING
The Perceptron Algorithm Frank Rosenblatt suggested this algorithm:
Set a threshold value

Multiply all inputs with its weights

Sum all the results

Activate the output


1. Set a threshold value:
Threshold = 1.5
2. Multiply all inputs with its weights:
x1 * w1 = 1 * 0.7 = 0.7
x2 * w2 = 0 * 0.6 = 0
x3 * w3 = 1 * 0.5 = 0.5
x4 * w4 = 0 * 0.3 = 0
x5 * w5 = 1 * 0.4 = 0.4
3. Sum all the results:
0.7 + 0 + 0.5 + 0 + 0.4 = 1.6 (The Weighted Sum)
4. Activate the Output:

DVR
Return true if the sum > 1.5 ("Yes I will go to the Concert")
Perceptron

DVR MACHINE LEARNING


Perceptron

DVR MACHINE LEARNING


Perceptron

DVR MACHINE LEARNING


Reprentational Power of Perceptron's

MACHINE LEARNING
DVR
Reprentational Power of Perceptron's

MACHINE LEARNING
DVR
The Perceptron Training Rule

MACHINE LEARNING
The learning problem is to determine a weight vector that causes the perceptron to produce
the correct +1 or -1 output for each of the given training examples

To learn an acceptable weight vector:

Begin with random weights, then iteratively apply the perceptron to each training example,
modifying the perceptron weights whenever it misclassifies an example

This process is repeated, iterating through the training examples as many times as needed
until the perceptron classifies all training examples correctly.

Weights are modified at each step according to the perceptron training rule, which revises the
weight wi associated with input xi according to the rule

DVR
The Perceptron Training Rule

MACHINE LEARNING
DVR
Limitations of the perceptron

MACHINE LEARNING
While the perceptron classified the instances of AND/OR/NOT well, the model has
limitations.

Linear models like the perceptron with a Heaviside activation function are not universal
function approximators ; they cannot represent some functions.

Specifically, linear models can only learn to approximate the functions for linearly
separable datasets.

The linear classifiers that we have examined find a hyperplane that separates the
positive classes from the negative classes;

if no hyperplane exists that can separate the classes, the problem is not linearly
separable.

DVR
69
Limitations of the perceptron

MACHINE LEARNING
DVR
70
Multilayer networks and the back-propagation algorithm

MACHINE LEARNING
Multilayer networks solve the classification problem for non linear sets by employing hidden layers, whose neurons
are not directly connected to the output.

The additional hidden layers can be interpreted geometrically as additional hyper-planes, which enhance the
separation capacity of the network

Backpropagation (backward propagation) is an important mathematical tool for improving the accuracy of
predictions in data mining and machine learning. Essentially,

Backpropagation is an algorithm used to calculate derivatives quickly.

The backpropagation algorithm performs learning on a multilayer feed-forward neural network

Backpropagation algorithm iteratively learns a set of weights for prediction of the class label of tuples

A multilayer feed-forward neural network consists of an input layer, one or more hidden layers, and an output
layer

Backpropagation is a common method for training a neural network.

Backpropagation is a method to calculate the gradient of the loss function with respect to the weights in an
artificial neural network.

Backpropagation is commonly used as a part of algorithms that optimize the performance of the network by

DVR
adjusting the weights, for example in the gradient descent algorithm.
Back propagation algorithm

MACHINE LEARNING
The Back propagation algorithm in neural network computes the gradient of the loss function for a single weight
by the chain rule

Back propagation algorithm efficiently computes one layer at a time, unlike a native direct computation

Back propagation algorithm computes the gradient, but it does not define how the gradient is used

Back-propagation learning algorithm computes the mean-squared error of the difference between the desired
output and the real output value and adjusts the weight and bias coefficients until the mean-squared error
function reaches a predetermined minimum value

Steps involved in Backpropagation:


Step – 1: Forward Propagation
Step – 2: Backward Propagation
Step – 3: Putting all the values together and calculating the updated weight value.

DVR
Back propagation algorithm

MACHINE LEARNING
Backpropagation is an essential mechanism by which neural networks get trained

Backpropagation is a mechanism used to fine-tune the weights of a neural network (otherwise referred to as a
model in this article) in regards to the error rate produced in the previous iteration

Backpropagation is similar to a messenger telling the model if the net made a mistake or not as soon as it
predicted

Backpropagation is a mechanism used to fine-tune the weights of a neural network (otherwise referred to as a
model in this article) in regards to the error rate produced in the previous iteration

Backpropagation in neural networks is about the transmission of information and relating this information to the
error generated by the model when a guess was made

DVR
Backpropagation method seeks to reduce the error, which is otherwise referred to as the loss function
How does backpropagation work

MACHINE LEARNING
Back propagation works. It has four layers: input layer, hidden layer, hidden layer II and final output layer.

Main three layers are: Input layer


Hidden layer
Output layer
Each layer has its own way of working and its own way to take action such that we are able to get the desired results
and correlate these scenarios to our conditions.
This image summarizes the functioning of the
backpropagation approach.
Input layer receives x
Input is modeled using weights w
Each hidden layer calculates the output and data is ready at
the output layer
Difference between actual output and desired output is
known as the error
Go back to the hidden layers and adjust the weights so that
this error is reduced in future runs
This process is repeated till we get the desired output.

DVR
The training phase is done with supervision. Once the model is stable, it is used in production.
Back Propagation Algorithm

MACHINE LEARNING
The Backpropagation algorithm learns the weights for a multilayer network, given a network with a
fixed set of units and interconnections

Backpropagation employs gradient descent to attempt to minimize the squared error between the
network output values and the target values for these outputs

Because we are considering networks with multiple output units rather than single units as before,
we begin by redefining E to sum the errors over all of the network output units

Where outputs is the set of output units in the network, and tkd and okd are the target and output
values associated with the kth output unit and training example d

DVR
Back Propagation Algorithm

MACHINE LEARNING
DVR
Back Propagation Algorithm

MACHINE LEARNING
DVR
Back Propagation Algorithm

MACHINE LEARNING
DVR
Back Propagation Algorithm

MACHINE LEARNING
DVR
Back Propagation Algorithm

MACHINE LEARNING
DVR
Back Propagation Algorithm

MACHINE LEARNING
DVR
Back Propagation Algorithm

MACHINE LEARNING
DVR
Back Propagation Algorithm

MACHINE LEARNING
DVR
Back Propagation Algorithm

MACHINE LEARNING
DVR
Remarks on the Back-Propagation algorithm

MACHINE LEARNING
Convergence and Local Minima

Representational power of feed forward networks

Hypothesis space search and inductive bias

Hidden Layer representation

Generalization Overfitting and stopping criterion

Functions

Boolean

Continuous

Arbitrary

DVR
Remarks on the Back-Propagation algorithm

MACHINE LEARNING
Convergence and Local Minima

1.The Backpropagation multilayer networks is only guaranteed to converge towards some local
minimum in E and not necessarily to the global minimum error.

Despite the lack of assured convergence to the global minimum error, Backpropagation is a
highly effective function approximation method in practice

Local minima can be gained by considering the manner in which network weights evolve as the
number of trainings iterations increases

Backpropagation over multilayer networks is only guaranteed to converge towards some local
minimum in "E" and not necessarily to the global minimum error
When gradient descent falls into a local minimum with respect to one of these weights, it will
not necessarily be in a local minimum with respect to other weights

A 2nd perspective on local minima can be gained by considering the manner in which n/w weights
evolve as the number of training iterations increases

DVR
Back-Propagation algorithm - Advantages

MACHINE LEARNING
Backpropagation is fast, simple and easy to program
Backpropagation has no parameters to tune apart from the numbers of input

Backpropagation is a flexible method as it does not require prior knowledge about the network

DVR
Back-Propagation algorithm - Disadvantages

MACHINE LEARNING
Back Propagation Algorithm relies on input to perform on a specific problem
Sensitive to complex/noisy data

Back Propagation Algorithm needs the derivatives of activation functions for the network
design time

DVR
Back-Propagation algorithm - Applications

MACHINE LEARNING
Improving the accuracy of predictions in data mining and machine learning
Sensitive to complex/noisy data

Back Propagation Algorithm needs the derivatives of activation functions for the network
design time

DVR
MACHINE LEARNING
Artificial Neural Networks - 2

DVR
An illustrative example: Artificial Neural Networks-Face Recognition

MACHINE LEARNING
The main benefit of the neural network in facial recognition is the ability to train a system to
capture a complex class of facial patterns
The neural networks are non-linear in the network, so it is a widely used technique for facial
recognition

The OpenCV method is a common method in face detection

OpenCV method firstly extracts the feature images into a large sample set by extracting the face
Haar features in the image and then uses the AdaBoost algorithm as the face detector

Facial recognition is a technology that can recognize a person only by looking at them
Facial recognition uses machine learning techniques to identify, collect, store, and evaluate face
characteristics so that they can be matched to photos of people in a database

DVR
An illustrative example: Artificial Neural Networks-Face Recognition

MACHINE LEARNING
AI face recognition work

Facial recognition software reads the geometry of your face

Key factors include the distance between your eyes and the distance from
forehead to chin

The software identifies facial landmarks — one system identifies 68 of them —


that are key to distinguishing your face

The result: your facial signature

Biometric technology use unique traits of particular part of the body comprising facial
recognition, iris scan, fingerprint scan, etc

AI transform these informative distinctive traits into codes or make them more
complex and easy to understand by the system

DVR
An illustrative example: Artificial Neural Networks-Face Recognition

MACHINE LEARNING
DVR
An illustrative example: Artificial Neural Networks-Face Recognition

MACHINE LEARNING
Face Detection:

The main function of this step is to determine

1. Whether human faces appear in a given image, and

2. Where these faces are located at.


• The expected outputs of this step are patches containing each face in the input image. In
order to make further face recognition system more robust and easy to design, face
alignment are performed to justify the scales and orientations of these patches.
• Besides serving as the pre-processing for face recognition, face detection could be used for
region-of-interest detection, retargeting, video and image classification, etc.

DVR
An illustrative example: Artificial Neural Networks-Face Recognition

MACHINE LEARNING
Feature Extraction:
• After the face detection step, human-face patches are extracted from images.
• Directly using these patches for face recognition have some disadvantages.
• First, each patch usually contains over 1000 pixels, which are too large to build a robust recognition system1 .
• Second, face patches may be taken from different camera alignments, with different face expressions, illuminations,
and may suffer from occlusion and clutter.
• To overcome these drawbacks, feature extractions are performed to do information packing, dimension reduction,
salience extraction, and noise cleaning.
• After this step, a face patch is usually transformed into a vector with fixed dimension or a set of fiducial points and
their corresponding locations.

DVR
An illustrative example: Artificial Neural Networks-Face Recognition

MACHINE LEARNING
Face Recognition:
• After formulizing the representation of each face, the last step is to recognize the identities of these faces.
• In order to achieve automatic recognition, a face database is required to build. For each person, several
images are taken and their features are extracted and stored in the database.
• Then when an input face image comes in, we perform face detection and feature extraction, and compare its
feature to each face class stored in the database.

DVR
An illustrative example: Artificial Neural Networks-Face Recognition

MACHINE LEARNING
Face detection algorithms are classified into four categories:
• Knowledge-based
• Feature invariant
• Template matching
• Appearance-based method

1.Knowledge-based methods :
• These rule-based methods encode human knowledge of what constitutes a typical face.
• Usually, the rules capture the relationships between facial features.
• These methods are designed mainly for face localization, which aims to determine the image position of a
single face.

DVR
An illustrative example: Artificial Neural Networks-Face Recognition

MACHINE LEARNING
Face detection algorithms are classified into four categories:
• Knowledge-based
• Feature invariant
• Template matching
• Appearance-based method

2. Feature invariant approaches:


• These algorithms aim to find structural features that exist even when the pose, viewpoint, or lighting
conditions vary, and then use these to locate faces. These methods are designed mainly for face
localization.
• To distinguish from the knowledge-based methods, the feature invariant approaches start at feature
extraction process and face candidates finding, and later verify each candidate by spatial relations
among these features, while the knowledge-based methods usually exploit information of the whole
image and are sensitive to complicated backgrounds and other factors

DVR
An illustrative example: Artificial Neural Networks-Face Recognition

MACHINE LEARNING
Face detection algorithms are classified into four categories:
• Knowledge-based
• Feature invariant
• Template matching
• Appearance-based method
3 Template Matching :
Template Matching method uses pre-defined or parameterized face templates to locate or detect the
faces by the correlation between the templates and input images
Example:- a human face can be divided into eyes, face contour, nose, and mouth
Also, a face model can be built by edges just by using edge detection method
This approach is simple to implement, but it is inadequate for face detection
However, deformable templates have been proposed to deal with these problems

DVR
An illustrative example: Artificial Neural Networks-Face Recognition

MACHINE LEARNING
Face detection algorithms are classified into four categories:
• Knowledge-based
• Feature invariant
• Template matching
• Appearance-based method
4. Appearance-based methods:

In contrast to template matching, the models (or templates) are learned from a set of training images which
should capture the representative variability of facial appearance.These learned models are then used for
detection.
The appearance-based method depends on a set of delegate training face images to find out face models
The appearance-based approach is better than other ways of performance
In general appearance-based method rely on techniques from statistical analysis and machine learning to
find the relevant characteristics of face images
This method also used in feature extraction for face recognition

DVR
An illustrative example: Artificial Neural Networks-Face Recognition

MACHINE LEARNING
How the Face Detection Works:

There are many techniques to detect faces, with the help of these techniques, we can identify faces with
higher accuracy

These techniques have an almost same procedure for Face Detection such as OpenCV, Neural Networks,
Matlab, etc

The face detection work as to detect multiple faces in an image


In general appearance-based method rely on techniques from statistical analysis and machine learning to
find the relevant characteristics of face images
This method also used in feature extraction for face recognition

DVR
Neural Nets for Face Recognition
(http://www.cs.cmu.edu/tom/faces.html)

• Training images : 20 different persons with 32 images per person.


• After 260 training images, the network achieves an accuracy of
90% over test set.
• Algorithm parameters : η=0.3, α=0.3
Alternative Error Functions

• Penalize large weights: (weight decay) : Reducing the risk of


overfitting

• Train on target slopes as well as values:

• Minimizing the cross entropy : Learning a probabilistic output


function (chapter 6)

  t d log od  (1  t d ) log(1  od )
d∈ D
Recurrent Networks

(a) (b) (c)

(a) Feedforward network


(b) Recurrent network
(c) Recurrent network unfolded
in time
Dynamically Modifying Network Structure

• To improve generalization accuracy and training


efficiency
• Cascade-Correlation algorithm (Fahlman and Lebiere 1990)
– Start with the simplest possible network (no hidden units) and
add complexity
• Lecun et al. 1990
– Start with the complex network and prune it as we find that
certain connectives are inessential.
Evaluation Hypotheses
Hypothesis

Hypothesis is a supposition or proposed explanation made on the basis of limited evidence as a starting
point for further investigation

Hypothesis can be defined as the “claim” about the truth related to something that exists in the world

Hypothesis is the “claim”

Until the “claim” is proven to be true, it is called the hypothesis. Once the claim is proved, it becomes the
new truth about the thing

Example
A claim is made that students studying for more than 6 hours a day gets more than 90% of marks in their
examination

Now, this is just a claim and not the truth in the real world

In order for the claim to become the truth for widespread adoption, it needs to be proved using pieces of
evidence
In order to prove the claim or reject this claim, one needs to do some empirical analysis by gathering data
samples and evaluating the claim
Hypothesis

The process of gathering data and evaluating the claim with the goal to reject or failing to reject the
claim is termed hypothesis testing. Note the wordings – “failing to reject”

It means that we don’t have enough evidence to reject the claim


until the time that new evidence comes up, the claim can be considered the truth

There are different techniques to test the claim in order to reach the conclusion of whether the
hypothesis can be used to represent the truth of the world

A hypothesis is a statement of a relationship between two or more variables


Hypothesis

• A hypothesis is an explanation for something.


• It is a provisional idea, an educated guess that requires some evaluation.
• A good hypothesis is testable; it can be either true or false.
• In science, a hypothesis must be falsifiable, meaning that there exists a test whose outcome could mean
that the hypothesis is not true. The hypothesis must also be framed before the outcome of the test is k
nown.
• A good hypothesis fits the evidence and can be used to make predictions about new observations or new
situations.
• The hypothesis that best fits the evidence and can be used to make predictions is called a theory, or is p
art of a theory.
• Hypothesis in Science: Provisional explanation that fits the evidence and can be confirmed or disproved.
Hypothesis

• Statistical hypothesis tests are techniques used to calculate a critical value called an “effect.”
• The critical value can then be interpreted in order to determine how likely it is to observe the effect if
a relationship does not exist.
• If the likelihood is very small, then it suggests that the effect is probably real.
• If the likelihood is large, then we may have observed a statistical fluctuation, and the effect is
probably not real.
• One hypothesis is that there is no difference between the population means, based on the data samples.
• This is a hypothesis of no effect and is called the null hypothesis and is used as the statistical
hypothesis test to either reject this hypothesis, or fail to reject (retain) it.
• It is not said “accept” because the outcome is probabilistic and could still be wrong, just with a very low
probability.
Hypothesis

• If the null hypothesis is rejected, then we assume the alternative hypothesis that there exists some di
fference between the means.
• Null Hypothesis (H0): Suggests no effect.
• Alternate Hypothesis (H1): Suggests some effect.
• Statistical hypothesis tests don’t comment on the size of the effect, only the likelihood of the presenc
e or absence of the effect in the population, based on the observed samples of data.
• Hypothesis in Statistics: Probabilistic explanation about the presence of a relationship between observ
ations.
Hypothesis in Machine Learning

• Machine learning, specifically supervised learning, can be described as the desire to use available data to
learn a function that best maps inputs to outputs.
• Technically, this is a problem called function approximation, where we are approximating an unknown target
function (that we assume exists) that can best map inputs to outputs on all possible observations from the
problem domain.
• An example of a model that approximates the target function and performs mappings of inputs to outputs
is called a hypothesis in machine learning.
• The choice of algorithm (e.g. neural network) and the configuration of the algorithm (e.g. network topology
and hyperparameters) define the space of possible hypothesis that the model may represent.
• Learning for a machine learning algorithm involves navigating the chosen space of hypothesis toward the be
st or a good enough hypothesis that best approximates the target function
• A common notation is used where lowercase-h (h) represents a given specific hypothesis and uppercase-h
(H) represents the hypothesis space that is being searched.
• h (hypothesis): A single hypothesis, e.g. an instance or specific candidate model that maps inputs to output
s and can be evaluated and used to make predictions.
• H (hypothesis set): A space of possible hypotheses for mapping inputs to outputs that can be searched, of
ten constrained by the choice of the framing of the problem, the choice of model and the choice of model
configuration.
Motivation

MACHINE LEARNING
Evaluating the performance of learning systems is important because

Learning systems are usually designed to predict the class of “future” unlabeled data points
In some cases, evaluating hypotheses is an integral part of the learning process (example, when
pruning a decision tree)

DVR
Motivation

MACHINE LEARNING
Underfitting
A statistical model or a machine learning algorithm is said to have under fitting when it cannot
capture the underlying trend of the data.
It’s just like trying to fit undersized pants!

Underfitting destroys the accuracy of our machine learning model


Underfitting occurrence simply means that our model or the algorithm does not fit the data
well enough
Underfitting usually happens when we have fewer data to build an accurate model and also when
we try to build a linear model with fewer non-linear data

In such cases, the rules of the machine learning model are too easy and flexible to be applied
on such minimal data and therefore the model will probably make a lot of wrong predictions
Underfitting can be avoided by using more data and also reducing the features by feature
selection

DVR
Motivation

MACHINE LEARNING
Techniques to reduce Underfitting

Increase model complexity


Increase the number of features, performing feature engineering
Remove noise from the data.
Increase the number of epochs or increase the duration of training to get better results

DVR
Motivation

MACHINE LEARNING
Overfitting

A statistical model is said to be overfitted when we train it with a lot of data

just like fitting ourselves in oversized pants!


When a model gets trained with so much data, it starts learning from the noise and inaccurate
data entries in our data set

Then the model does not categorize the data correctly, because of too many details and noise

The causes of overfitting are the non-parametric and non-linear methods because these types
of machine learning algorithms have more freedom in building the model based on the dataset
and therefore they can really build unrealistic models

A solution to avoid overfitting is using a linear algorithm if we have linear data or using the
parameters like the maximal depth if we are using decision trees

DVR
Motivation

MACHINE LEARNING
Techniques to reduce Overfitting

Increase training data

Reduce model complexity


Early stopping during the training phase (have an eye over the loss over the training period as
soon as loss begins to increase stop training)
Ridge Regularization and Lasso Regularization
Use dropout for neural networks to tackle overfitting.

DVR
Motivation – Underfitting and Overfitting

MACHINE LEARNING
DVR
Motivation – Underfitting and Overfitting

MACHINE LEARNING
DVR
Motivation

MACHINE LEARNING
In many cases it is important to evaluate the performance of learned hypotheses as precisely
as possible
One reason is simply to understand whether to use the hypothesis

Example:

When learning from a limited-size database indicating the effectiveness of different


medical treatments, it is important to understand as precisely as possible the accuracy of
the learned hypotheses

A second reason is that evaluating hypotheses is an integral component of many learning


methods

Example:

in post-pruning decision trees to avoid overfitting, we must evaluate the impact of possible
pruning steps on the accuracy of the resulting decision tree

DVR
Difficulties in Evaluating Hypotheses when only limited data are available

MACHINE LEARNING
It is important to understand the likely errors inherent in estimating the accuracy of the
pruned and unpruned tree
Estimating the accuracy of a hypothesis is relatively straightforward when data is plentiful

When we must learn a hypothesis and estimate its future accuracy given only a limited set of
data, two key difficulties arise:

Bias in the estimate

First, the observed accuracy of the learned hypothesis over the training examples is often a
poor estimator of its accuracy over future examples
Because the learned hypothesis was derived from these examples, they will typically provide an
optimistically biased estimate of hypothesis accuracy over future examples
This is especially likely when the learner considers a very rich hypothesis space, enabling it to
overfit the training examples
To obtain an unbiased estimate of future accuracy, we typically test the hypothesis on some
set of test examples chosen independently of the training examples and the hypothesis

DVR
Difficulties in Evaluating Hypotheses when only limited data are available

MACHINE LEARNING
Variance in the estimate

First, the observed accuracy of the learned hypothesis over the training examples is often a
poor estimator of its accuracy over future examples
Second, even if the hypothesis accuracy is measured over an unbiased set of test examples
independent of the training examples, the measured accuracy can still vary from the true
accuracy, depending on the makeup of the particular set of test examples

The smaller the set of test examples, the greater the expected variance

DVR
Estimation hypothesis accuracy

MACHINE LEARNING
When evaluating a learned hypothesis we are most often interested in estimating the accuracy
with which it will classify future instances.
At the same time, we would like to know the probable error in this accuracy estimate (i.e., what
error bars to associate with this estimate)
Example

We consider the following setting for the learning problem

There is some space of possible instances X (e.g., the set of all people) over which various
target functions may be defined (e.g., people who plan to purchase new skis this year).

We assume that different instances in X may be encountered with different frequencies

A convenient way to model this is to assume there is some unknown probability distribution
D that defines the probability of encountering each instance in X (e-g., 23 might assign a
higher probability to encountering 19-year-old people than 109-year-old people)

DVR
Questions Considered

MACHINE LEARNING
• Given the observed accuracy of a hypothesis over a limited sample of
data, how well does this estimate its accuracy over additional
examples?
• Given that one hypothesis outperforms another over some sample
data, how probable is it that this hypothesis is more accurate, in
general?
• When data is limited what is the best way to use this data to both learn
a hypothesis and estimate its accuracy?

DVR
124
Estimation hypothesis accuracy

MACHINE LEARNING
Example

We consider the following setting for the learning problem

Notice 23 says nothing about whether x is a positive or negative example; it only detennines
the probability that x will be encountered

The learning task is to learn the target concept or target function f by considering a space
H of possible hypotheses

We assume that different instances in X may be encountered with different frequencies

Training examples of the target function f are provided to the learner by a trainer who
draws each instance independently, according to the distribution D, and who then forwards
the instance x along with its correct target value f (x) to the learner

DVR
Estimation hypothesis accuracy

MACHINE LEARNING
To illustrate

Consider learning the target function "people who plan to purchase new skis this year," given
a sample of training data collected by surveying people as they arrive at a ski resort

In this case the instance space X is the space of all people, who might be described by
attributes such as their
Age

Occupation,

How many times they skied last year, etc

The distribution D specifies for each person x the probability that x will be encountered as
the next person arriving at the ski resort

The target function f : X + {O,1) classifies each person according to whether or not they
plan to purchase skis this year.

DVR
Estimation hypothesis accuracy

MACHINE LEARNING
To illustrate

Within this general setting we are interested in the following two questions:

1. Given a hypothesis h and a data sample containing n examples drawn at random


according to the distribution D, what is the best estimate of the accuracy of h over
future instances drawn from the same distribution?

2. What is the probable error in this accuracy estimate?

DVR
Sample Error and True Error

MACHINE LEARNING
Sample Error represents the fraction of the sample which is misclassified
Sample Error is used to estimate the errors of the sample

The sample error (denoted errors(h)) of hypothesis h with respect to target function f and data
sample S is:
errors(h)= 1/n xS(f(x),h(x))
where n is the number of examples in S, and the quantity (f(x),h(x)) is 1 if f(x)  h(x), and 0,
otherwise.

DVR
128
Sample Error and True Error

MACHINE LEARNING
True error represents the probability that a random sample from the population is misclassified
True error is used to estimate the error of the population

The true error (denoted errorD(h)) of hypothesis h with respect to target function f and
distribution D, is the probability that h will misclassify an instance drawn at random according to D.
errorD(h)= PrxD[f(x)  h(x)]

DVR
129
Sample Error and True Error

MACHINE LEARNING
True Error is the error Calculated with respect to Data Distribution D
Sample Error represents the fraction of the sample which is misclassified
Sample Error is 0.2 (i.e 1 out of 5) from the OUR SAMPLE circle
True Error is 0.5 (i.e 10 out of 20) from all the DATA DISTRIBUTION

DVR
130
Advanced topics in artificial neural networks.

MACHINE LEARNING
The multimodal neurons are one of the most advanced neural networks to date.

The researchers have found these advanced neurons can respond to a cluster of abstract concepts
centered around a common high-level theme rather than a specific visual feature.

A recurrent neural network (RNN) is a type of artificial neural network which uses sequential data or time series
data.

RNN's are mainly used for


Sequence Classification

Sentiment Classification & Video

Classification

Sequence Labelling — Part of speech tagging & Named entity


recognition

Recurrent Neural Networks(RNN) are a type of Neural Network where the output from the previous
step is fed as input to the current step

DVR
Basics of sampling theory

MACHINE LEARNING
Random Variable

A random variable can be viewed as the name of an experiment with a probabilistic


outcome. Its value is the outcome of the experiment.

Random variable value is the outcome of the experiment.

Probability Distribution

A probability distribution for a random variable Y specifies the probability Pr(Y = yi)
that Y will take on the value yi, for each possible value yi.

Expected value

The expected value, or mean, of a random variable Y is E[Y] = Ciyi P r(Y = yi). The
symbol p)i~s commonly used to represent E[Y]

Variance of a Random Variable

The variance of a random variable is Var(Y) = E[(Y - p~)~]T.he variance characterizes

DVR
the width or dispersion of the distribution about its mean
Basics of sampling theory

MACHINE LEARNING
Standard Deviation

The standard deviation of Y is SQRT(Var(Y)).

The symbol σy is often used to represent the standard deviation of Y

Binomial Distribution

The Binomial distribution gives the probability of observing r heads in a series of n


independent coin tosses, if the probability of heads in a single toss is p

Normal Distribution

The Normal distribution is a bell-shaped probability distribution that covers many


natural phenomena

Central Limit Theorem

The Central Limit Theorem is a theorem stating that the sum of a large number of
independent, identically distributed random variables approximately follows a Normal
distribution

DVR
Basics of sampling theory

MACHINE LEARNING
Estimator

An estimator is a random variable Y used to estimate some parameter p of an


underlying population

Estimation Bias

The estimation bias of Y as an estimator for p is the quantity (E[Y] - p).

An unbiased estimator is one for which the bias is zero.

Confidence Interval

A N% confidence interval estimate for parameter p is an interval that includes p with


probability N%

Hypothesis

A N% confidence interval estimate for parameter p is an interval that includes p with


probability N%

DVR
BASICS OF SAMPLING THEORY

MACHINE LEARNING
Error Estimation and Estimating Binomial Proportions

Collect a random sample S of n independently drawn instances from the distribution D, and then
measure the sample error errors(h).

Repeat this experiment many times, each time drawing a different random sample Si of size n, we
would expect to observe different values for the various errorsi(h), depending on random differences
in the makeup of the various Si. We say that errorsi(h), the outcome of the ith such experiment, is a
random variable

DVR
BASICS OF SAMPLING THEORY

MACHINE LEARNING
 Imagine that we were to run k random experiments, measuring the random variables errors1(h), errors2(h) . .
. errorssk(h) and plotted a histogram displaying the frequency with which each possible error value is
observed.
 As k grows, the histogram would approach a particular probability distribution called the Binomial
distribution which is shown in below figure.

DVR
BASICS OF SAMPLING THEORY

MACHINE LEARNING
DVR
A general approach for deriving confidence intervals

MACHINE LEARNING
The general process includes the following steps:
1. Identify the underlying population parameter "p" to be estimated
Example:
errorD(h)

2. Define the estimator Y(e.g. errorS(h).


It is desirable to choose a minimum variance, unbiased estimator

3. Determine the probability distribution Dy that governs the


estimator Y, including its mean and variance

4. Determine the N% confidence interval by finding thresholds "L"


and "V" such that N% of mass in the probability distribution Dy falls
b/w "L" and "V".

DVR
A general approach for deriving confidence intervals

MACHINE LEARNING
Central Limit Theorem :

Central Theorem states that "The probability distribution governing Yn approaches a normal distribution as
n--- ,regardless of distribution that governs the underlying random variables Yi“
ȣ

Further more, the mean of distribution governing Yn approaches "u"


and Standard deviation approaches

3. Determine the probability distribution Dy that governs the


ȣ
estimator Y, including its mean and variance

4. Determine the N% confidence interval by finding thresholds "L"


and "V" such that N% of mass in the probability distribution Dy falls
b/w "L" and "V".

DVR
Difference in error of two hypotheses

MACHINE LEARNING
A statistical hypothesis test is a procedure for deciding between two possible
statements about a population

The phrase significance test means the same thing as the phrase "Hypothesis test"

A hypothesis test is a statistical method that uses sample data to evaluate a


hypothesis about a population

The general goal of a hypothesis test is to rule out chance as a plausible explanation
for the results from a research study

The goal in hypothesis testing is to analyze a sample in an attempt to distinguish


between population characteristics that are likely to occur and population
characteristics that are unlikely to occur

DVR
Difference in error of two hypotheses

MACHINE LEARNING
Consider the case where we have two hypotheses hl and h2 for some discrete valued target function.

Hypothesis hl has been tested on a sample S1 containing nl randomly drawn examples, and h2 has
been tested on an independent sample S2 containing n2 examples drawn from the same
distribution.

Suppose we wish to estimate the difference d between the true errors of these two hypotheses

gives the unbiased estimate of d,that is E[ ]=d

DVR
Difference in Error of two Hypotheses

MACHINE LEARNING
● Let h1 and h2 be two hypotheses for some discrete-valued target function. H1 has been
tested on a sample S1 containing n1 randomly drawn examples, and h2 has been tested on
an independent sample S2 containing n2 examples drawn from the same distribution.
● Let’s estimate the difference between the true errors of these two hypotheses, d, by
computing the difference between the sample errors:
● dˆ = errorS1(h1)-errorS2(h2)
● The approximate N% confidence interval for d is:
● d^ ± ZN√errorS1(h1)(1-errorS1(h1))/n1 + errorS2(h2)(1-errorS2(h2))/n2

DVR
142
Comparing Learning Algorithms

MACHINE LEARNING
● Which of LA and LB is the better learning method on average for learning some particular
target function f ?
● To answer this question, we wish to estimate the expected value of the difference in their
errors: ES⊂D [errorD(LA(S))-errorD(LB(S))]
● Of course, since we have only a limited sample D0 we estimate this quantity by dividing D0
into a training set S0 and a testing set T0 and measure:
errorT0(LA(S0))-errorT0(LB(S0))
● Problem: We are only measuring the difference in errors for one training set S0 rather than
the expected value of this difference over all samples S drawn from D
Solution: k-fold Cross-Validation

DVR
143
k-Fold Cross-Validation

MACHINE LEARNING
1. Partition the available data D0 into k disjoint
subsets T1, T2, …, Tk of equal size, where this size
is at least 30.
2. For i from 1 to k, do
use Ti for the test set, and the remaining data for
training set Si
• Si <- {D0 - Ti}
• hA <- LA(Si)
• hB <- LB(Si)
• δi <- errorTi(hA)-errorTi(hB)
3. Return the value avg(δ), where
. avg(δ) = 1/k Σi=1k δi

DVR
144
Confidence of the k-fold Estimate

MACHINE LEARNING
● The approximate N% confidence interval for estimating
ES⊂D0 [errorD(LA(S))-errorD(LB(S))] using avg(δ), is given by:
avg(δ)±tN,k-1savg(δ)
where tN,k-1 is a constant similar to ZN (See [Mitchell, Table 5.6]) and
savg(δ) is an estimate of the standard deviation of the distribution
governing avg(δ)
savg(δ))=√1/k(k-1) Σi=1k (δi -avg(δ))2

DVR
145
MACHINE LEARNING

MACHINE LEARNING
Assignment - 2

Please complete the following UNIT-2 videos and prepare the notes on the following videos
Introduction to Artificial Neural Networks & their Representation of Neural
Networks |ML| https://www.youtube.com/watch?v=WtUfElRD3rU
An Introduction to Artificial Neural Networks - Big Data AnalyticsTutorial by
Mahesh Huddar https://www.youtube.com/watch?v=5glV6PQAekY
https://www.coursera.org/lecture/neural-networks-deep-
learning/neural-networks-overview-qg83v
https://www.youtube.com/watch?v=MnKPzItvqQw
Neural Networks Representation in Machine Learning https://www.youtube.com/watch?v=uGWV4Mbf0PU
Remarks On Back Propagation Algorithm |ML| https://www.youtube.com/watch?v=RQmaFw7Pc_k
Neural Network Representation | Deep Learning https://www.youtube.com/watch?v=MnKPzItvqQw
What Is Backpropagation? — A Step-By-Step Guide To Backpropagation https://medium.com/edureka/backpropagation-bd2cf8fdde81
Back Propagation Algorithm With Example Part-1 |ML| https://www.youtube.com/watch?v=3oOTnFM-3lg
6 Basics Of Sampling Theory |ML| https://www.youtube.com/watch?v=WdYZkv2rtCo

Evaluating The Hypothesis - Motivation, Estimating Hypothesis Accuracy |ML| https://www.youtube.com/watch?v=qXX5xyZj31s


The Perceptron and The Perceptron training rule |ML| https://www.youtube.com/watch?v=LbLcEiSfA-c
Neural Networks Representation in Machine Learning https://www.youtube.com/watch?v=uGWV4Mbf0PU
Difference in Error Of Two Hypothesis - Hypothesis Testing - Type 1& Type 2
Errors |ML| https://www.youtube.com/watch?v=XMT3QMVXadA

I will check each and every one Notes of the above Videos. This is your Assignment.

Unit-1 Introduction to Decision Trees https://www.youtube.com/watch?v=FuJVLsZYkuE


Unit-1 Overfitting https://www.youtube.com/watch?v=y6SpA2Wuyt8

DVR
DVR MACHINE LEARNING
MACHINE LEARNING
DR DV RAMANA,
DATA STRATEGIST –CONSULTANT
AND
CHIEF ACADEMIC ADVISOR
MAIL ADDRESS: drdvramana@pallaviengineeringcollege.ac.in
TO CONTACT: +91 9959423084

DVR

You might also like