ML Unit 2


Artificial Neural Networks-1 – Introduction, neural network
representation, appropriate problems for neural network learning,
perceptrons, multilayer networks and the back-propagation algorithm.
Artificial Neural Networks-2 – Remarks on the back-propagation
algorithm, an illustrative example: face recognition, advanced topics
in artificial neural networks.
Evaluating Hypotheses – Motivation, estimating hypothesis accuracy,
basics of sampling theory, a general approach for deriving
confidence intervals, difference in error of two hypotheses,
comparing learning algorithms.
The term "Artificial Neural Network" is derived from Biological neural
networks that develop the structure of a human brain.
Similar to the human brain that has neurons interconnected to one another,
artificial neural networks also have neurons that are interconnected to one another
in various layers of the networks.
These neurons are known as nodes.
ANN is divided into 3 Parts:
1. Input Layer
2. Hidden Layer
3. Output Layer
Artificial Neural Networks
• The “building blocks” of neural networks are the neurons.
• In technical systems, we also refer to them as units or nodes.
• Basically, each neuron
  – receives input from many other neurons,
  – changes its internal state (activation) based on the current input,
  – sends one output signal to many other neurons, possibly including its input
    neurons (recurrent network).
• Information is transmitted as a series of electric impulses, so-called spikes.
• The frequency and phase of these spikes encode the information.
• In biological systems, one neuron can be connected to as many as 10,000 other
neurons.
Usually, a neuron receives its information from other neurons in a confined
area, its so-called receptive field.
Humans perform complex tasks like vision, motor control, or language
understanding very well.
The brain is a highly complex, non-linear, and parallel computer, composed of
some 10^11 neurons that are densely connected (~10^4 connections per neuron).
One way to build intelligent machines is to try to imitate the (organizational
principles of) human brain.
Computational models inspired by the human brain:
Massively parallel, distributed system, made up of simple processing units
(neurons)
Synaptic connection strengths among neurons are used to store the acquired
knowledge.
Knowledge is acquired by the network from its environment through a learning
process.
How do ANNs work?
An artificial neural network (ANN) is either a hardware implementation or a
computer program which strives to simulate the information processing capabilities of
its biological exemplar.
ANNs are typically composed of a great number of interconnected artificial neurons. The
artificial neurons are simplified models of their biological counterparts.
ANN is a technique for solving problems by constructing software that works like our
brains.
How do our brains work?
 The Brain is a massively parallel information processing system.
 Our brains are a huge network of processing elements.
 A typical brain contains a network of 10 billion neurons.
How do ANNs work? A node has two parts:
1. Summation function
2. Activation function

An artificial neuron is an imitation of a human neuron


How do ANNs work? Model of an artificial neuron: the unit receives inputs
x1, x2, ..., xm, and the processing step simply sums them:
y = x1 + x2 + ... + xm
Output: y
How do ANNs work? Not all inputs are equal: each input xi is first multiplied by a
weight wi, and the processing step sums the weighted inputs:
y = x1w1 + x2w2 + ... + xmwm
Output: y
How do ANNs work? Finally, the weighted sum vk = x1w1 + x2w2 + ... + xmwm is passed
through a transfer function (activation function) f:
y = f(vk)
Output: y
The output is a function of the inputs, as shaped by the weights and the
transfer function.
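As a rough sketch of the two-part node described above (summation function followed by activation function), the following Python fragment is illustrative; the input values, weights, and the choice of tanh as the transfer function are assumptions, not part of the original example.

```python
import numpy as np

def neuron_output(x, w, activation=np.tanh):
    """One artificial neuron: weighted sum of the inputs, then a transfer function."""
    v = np.dot(w, x)        # summation part: v = x1*w1 + x2*w2 + ... + xm*wm
    return activation(v)    # activation part: y = f(v)

# Illustrative inputs and weights
x = np.array([1.0, 0.5, -0.2])
w = np.array([0.4, -0.3, 0.8])
print(neuron_output(x, w))
```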
Artificial Neural Networks
An ANN can:
1. compute any computable function, by the appropriate selection of the network topology
and weight values.
2. learn from experience!
 Specifically, by trial‐and‐error
Learning by trial‐and‐error
Continuous process of:
Trial:
Processing an input to produce an output (In terms of ANN: Compute the output function of a given
input)

Evaluate:
Evaluating this output by comparing the actual output with the expected output.
Adjust:
Adjust the weights.
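A minimal sketch of this trial/evaluate/adjust cycle, assuming a single threshold unit learning the logical AND function with a perceptron-style update; the learning rate, number of epochs, and training data are illustrative assumptions.

```python
import numpy as np

def step(v):
    """Threshold activation: 1 if the weighted sum is non-negative, else 0."""
    return 1.0 if v >= 0 else 0.0

# Illustrative training data: the logical AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 0, 0, 1], dtype=float)

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, size=2)   # random initial weights
b = 0.0
eta = 0.1                        # learning rate (assumed value)

for epoch in range(50):
    for x, t in zip(X, targets):
        o = step(np.dot(w, x) + b)   # Trial: compute the output for this input
        error = t - o                # Evaluate: compare actual with expected output
        w += eta * error * x         # Adjust: update the weights
        b += eta * error

print("learned weights:", w, "bias:", b)
```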
An example of ANN learning: ALVINN
A prototypical example of ANN learning is provided by Pomerleau's (1993) system
ALVINN, which uses a learned ANN to steer an autonomous vehicle driving at normal
speeds on public highways.
The input to the neural network is a 30 x 32 grid of pixel intensities obtained from a
forward-pointed camera mounted on the vehicle.
The network output is the direction in which the vehicle is steered.
The ANN is trained to mimic the observed steering commands of a human driving the
vehicle for approximately 5 minutes.
ALVINN has used its learned networks to successfully drive at speeds up to 70 miles per
hour and for distances of 90 miles on public highways.
Figure 4.1: Neural network learning to steer an autonomous vehicle. The ALVINN
system uses BACKPROPAGATION to learn to steer an autonomous vehicle (photo at
top) driving at speeds up to 70 miles per hour.
The network is shown on the left side of the figure, with the input camera image depicted below it.
Each node (i.e., circle) in the network diagram corresponds to the output of a single network unit,
and the lines entering the node from below are its inputs.
There are four units that receive inputs directly from all of the 30 x 32 pixels in the image. These
are called "hidden" units because their output is available only within the network and is not
available as part of the global network output.
Each of these four hidden units computes a single real-valued output based on a weighted
combination of its 960 inputs.
These hidden unit outputs are then used as inputs to a second layer of 30 "output" units. Each output
unit corresponds to a particular steering direction, and the output values of these units determine
which steering direction is recommended most strongly.
The diagrams on the right side of the figure depict the learned weight values associated with one of
the four hidden units in this ANN.
The large matrix of black and white boxes on the lower right depicts the weights from the 30 x 32
pixel inputs into the hidden unit.
Here, a white box indicates a positive weight, a black box a negative weight, and the size of the box
indicates the weight magnitude.
The BACKPROPAGATION algorithm is the most commonly used ANN
learning technique. It is appropriate for problems with the following
characteristics:
1. Instances are represented by many attribute-value pairs.
2. The target function output may be discrete-valued, real-valued, or a vector of
several real- or discrete-valued attributes.
3. The training examples may contain errors or missing values.
4. Long training times are acceptable.
5. Fast evaluation of the learned target function may be required.
6. The ability of humans to understand the learned target function is not important.
The perceptron training rule updates each weight as
w_i <- w_i + Δw_i, where Δw_i = η (t − o) x_i.
Here t is the target output for the current training example,
o is the output generated by the perceptron, and
η is a positive constant called the learning rate.
The role of the learning rate is to moderate the degree to which weights
are changed at each step.
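For instance, with an assumed learning rate η = 0.1, a target output t = 1, a perceptron output o = 0, and an input x_i = 0.8, the update is Δw_i = 0.1 × (1 − 0) × 0.8 = 0.08, so w_i is increased slightly.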
VISUALIZING THE HYPOTHESIS SPACE
Single perceptrons can only express linear decision surfaces.
In contrast, the kind of multilayer networks learned by the Backpropagation algorithm are capable of expressing a
rich variety of nonlinear decision surfaces.
Multiple layers of cascaded linear units still produce only linear functions, and we prefer networks capable of
representing highly nonlinear functions.
What we need is a unit whose output is a nonlinear function of its inputs, but whose output is also a
differentiable function of its inputs.
One solution is the sigmoid unit-a unit very much like a perceptron, but based on a smoothed, differentiable
threshold function.
The sigmoid unit is illustrated in Figure 4.6. Like the perceptron, the sigmoid unit first computes a linear
combination of its inputs, then applies a threshold to the result.
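A small sketch of the sigmoid unit in Python, assuming the standard logistic function σ(y) = 1 / (1 + e^(-y)); the convenient property exploited by Backpropagation is that its derivative can be written as σ(y)(1 − σ(y)).

```python
import numpy as np

def sigmoid(y):
    """Smoothed, differentiable threshold: sigma(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + np.exp(-y))

def sigmoid_unit(x, w):
    """Linear combination of the inputs followed by the sigmoid threshold."""
    net = np.dot(w, x)
    return sigmoid(net)

def sigmoid_derivative(y):
    """d(sigma)/dy = sigma(y) * (1 - sigma(y)); cheap to compute during training."""
    s = sigmoid(y)
    return s * (1.0 - s)
```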
In practice, the network weights are initialized to small random values (for example,
in the range −1 to +1, or between 0 and 1), and training continues until the error
falls below a chosen threshold (for example, 5%).
Remarks on the Backpropagation algorithm:
Convergence and Local Minima – gradient descent minimizes the error over the
network weights, but may settle in a local rather than a global minimum.
Representational Power of Feedforward Networks – Boolean functions,
continuous functions, arbitrary functions.
Hypothesis Space Search and Inductive Bias
Hidden Layer Representations
Generalization, Overfitting, and Stopping Criterion.
Task:
The learning task here involves classifying camera images of faces of various people in
various poses.
Images of 20 different people were collected, including approximately 32 images per
person, varying the person's expression (happy, sad, angry, neutral), the direction in which
they were looking (left, right, straight ahead, up), and whether or not they were wearing
sunglasses.
There is also variation in the background behind the person, the clothing worn by the
person, and the position of the person's face within the image.
In total, 624 greyscale images were collected, each with a resolution of 120 x 128, with each
image pixel described by a greyscale intensity value between 0 (black) and 255 (white).
Design Choices:
After training on a set of 260 images, classification accuracy over a separate test set is 90%.
In contrast, the default accuracy achieved by randomly guessing one of the four possible
face directions is 25%.
Input encoding:
Given that the ANN input is to be some representation of the image, one key
design choice is how to encode this image.
For example, we could preprocess the image to extract edges, regions of
uniform intensity, or other local image features, then input these features to the
network
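One simple encoding, sketched below under the assumption that the 120 x 128 image is reduced to a coarse 30 x 32 grid of mean intensities scaled to the range 0–1 (giving 960 inputs); the grid size and the scaling are assumptions about the preprocessing, not a prescribed design.

```python
import numpy as np

def encode_image(img, out_rows=30, out_cols=32):
    """Downsample a 120 x 128 greyscale image to a coarse grid and scale to [0, 1].

    img: 2-D array of intensities in 0..255.
    Returns a flat vector of out_rows * out_cols values, usable as ANN inputs.
    """
    rows, cols = img.shape
    r_step, c_step = rows // out_rows, cols // out_cols
    cropped = img[:out_rows * r_step, :out_cols * c_step]
    coarse = cropped.reshape(out_rows, r_step, out_cols, c_step).mean(axis=(1, 3))
    return (coarse / 255.0).ravel()   # e.g. 30 * 32 = 960 values in [0, 1]

# Illustrative usage with a random "image"
img = np.random.default_rng(0).integers(0, 256, size=(120, 128))
print(encode_image(img).shape)   # (960,)
```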
Output encoding:
ANN must output one of four values indicating the direction in which the
person is looking (left, right, up, or straight).
We could encode this four-way classification using a single output unit,
assigning outputs of, say, 0.2, 0.4, 0.6, and 0.8 to encode these four possible
values.
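A tiny sketch of that single-output-unit scheme; the direction-to-value mapping mirrors the 0.2/0.4/0.6/0.8 codes above, and the nearest-code rule for reading off a prediction is an assumption.

```python
# Single output unit: each face direction is encoded as one target value
DIRECTION_CODE = {"left": 0.2, "right": 0.4, "up": 0.6, "straight": 0.8}

def decode(output_value):
    """Interpret the unit's real-valued output as the direction with the closest code."""
    return min(DIRECTION_CODE, key=lambda d: abs(DIRECTION_CODE[d] - output_value))

print(decode(0.55))   # -> "up"
```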
Bias: if S is the training set, error_S(h) is optimistically biased.
For an unbiased estimate, h and S must be chosen independently.
Variance: even with an unbiased S, error_S(h) may still vary from error_D(h).

Bias in the estimate: The observed accuracy of the learned hypothesis over the training
examples is a poor estimator of its accuracy over future examples ==> we test the hypothesis
on a test set chosen independently of the training set and the hypothesis.
Variance in the estimate: Even with a separate test set, the measured accuracy can vary from
the true accuracy, depending on the makeup of the particular set of test examples. The
smaller the test set, the greater the expected variance.
When evaluating a learned hypothesis we are most often interested in estimating the
accuracy with which it will classify future instances.
At the same time, we would like to know the probable error in this accuracy estimate.
The target function f : X -> {0, 1} classifies each person as Yes or No.
1. Given a hypothesis h and a data sample containing n examples drawn at random
according to the distribution D, what is the best estimate of the accuracy of h over
future instances drawn from the same distribution?
2. What is the probable error in this accuracy estimate?
Sample error – the error measured over a sample of the data.
True error – the error over the entire distribution of instances.
f(x) – the target function; h(x) – the learned hypothesis (the predicted function).
Example: from a population of 60 instances, a sample of 30 is drawn; each
classification outcome is a binary event (correct/incorrect, i.e. 0 or 1), like a
coin toss (heads/tails).
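A brief sketch of the sample error in Python, assuming the target function f and the hypothesis h are available as callables; the functions and the sample S below are placeholders, not data from the example.

```python
def sample_error(h, f, S):
    """error_S(h): fraction of instances in the sample S that h misclassifies."""
    mistakes = sum(1 for x in S if h(x) != f(x))
    return mistakes / len(S)

# Illustrative usage: h misclassifies 3 of 30 sampled instances -> error_S(h) = 0.1
f = lambda x: x >= 5          # placeholder target function
h = lambda x: x >= 8          # placeholder hypothesis
S = list(range(30))
print(sample_error(h, f, S))  # 0.1
```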
The general process includes the following steps:
1. Identify the underlying population parameter p to be estimated, for example,
error_D(h).
2. Define the estimator Y (e.g., error_S(h)). It is desirable to choose a minimum-
variance, unbiased estimator.
3. Determine the probability distribution D_Y that governs the estimator Y, including
its mean and variance.
4. Determine the N% confidence interval by finding thresholds L and U such that
N% of the mass in the probability distribution D_Y falls between L and U.
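Applying this process to error_D(h), a commonly used approximate N% interval is error_S(h) ± z_N * sqrt(error_S(h)(1 − error_S(h)) / n); the sketch below assumes this normal approximation and uses standard two-sided z values.

```python
import math

# Two-sided z values for common confidence levels (normal approximation)
Z = {0.90: 1.64, 0.95: 1.96, 0.99: 2.58}

def confidence_interval(error_s, n, level=0.95):
    """Approximate N% confidence interval for error_D(h),
    given error_S(h) measured on n independently drawn test examples."""
    sigma = math.sqrt(error_s * (1.0 - error_s) / n)   # estimated std. dev. of the estimator
    z = Z[level]
    return error_s - z * sigma, error_s + z * sigma

# Illustrative usage: 12 errors on a 100-example test set
print(confidence_interval(0.12, 100))   # roughly (0.056, 0.184)
```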
Difference in error of two hypotheses: test h1 on sample S1 and test h2 on sample S2,
then apply the same four-step process:
1. Pick the parameter to estimate.
2. Choose an estimator.
3. Determine the probability distribution that governs the estimator.
4. Find thresholds L and U such that N% of the probability mass falls in the
interval (L, U).
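A sketch of this recipe for the difference in error of two hypotheses; the estimator is d = error_S1(h1) − error_S2(h2), and approximating the variance of the difference by the sum of the individual variances (under the normal approximation) is the stated assumption.

```python
import math

Z = {0.90: 1.64, 0.95: 1.96, 0.99: 2.58}

def error_difference_interval(e1, n1, e2, n2, level=0.95):
    """Approximate N% confidence interval for d = error_D(h1) - error_D(h2),
    estimated by testing h1 on S1 (n1 examples) and h2 on S2 (n2 examples)."""
    d_hat = e1 - e2
    sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    z = Z[level]
    return d_hat - z * sigma, d_hat + z * sigma

# Illustrative usage: h1 errs 10% on 100 examples, h2 errs 20% on 100 examples
print(error_difference_interval(0.10, 100, 0.20, 100))
```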
