Lect 2
HNS Cont.
• Handling fuzzy, probabilistic, noisy and inconsistent data is possible
with computer programs only under specific circumstances:
• it requires highly sophisticated programming, and the context of such
data must have been analyzed in detail.
• We, by contrast, have a natural ability to handle uncertainty.
• The biological processing unit, the brain, is highly parallel, small,
compact and requires little power.
16/1/2023 U of K: Dr. Hiba Hassan 4
Cont.
4) Use different sets of data: the network is trained on a set of
training data, and its generalization ability is tested on new
testing data. Often, a separate validation set is used to tune the
hyperparameters.
5) If the network doesn’t perform well enough, go back to stage 3
and try harder.
6) If the network still doesn’t perform well enough, go back to stage
2 and try harder.
7) If the network still doesn’t perform well enough, go back to stage
1 and try harder.
8) Problem solved – move on to next problem.
Cont.
• There are two important aspects of the network’s operation to
consider:
• Learning: The network must learn decision boundaries from a set
of training patterns so that these training patterns are classified
correctly.
• Generalization: After training, the network must also be able to
generalize, i.e. correctly classify test patterns it has never seen
before.
• Usually we want the neural network to learn in a way that produces
good generalization.
Cont.
• Sometimes, the training data may contain errors (e.g.,
noise in the experimental determination of the input values,
or incorrect classifications).
• In this case, learning the training data perfectly may lead to
poor generalization. There is an important tradeoff
between learning and generalization that arises quite
often.
Neuron Models
• When the input is a vector, each input element is multiplied by its
corresponding weight (a dot product) and the weighted values are fed
to the summing junction.
• Then the output y is given by:

  y = f(Σ_i w_i x_i) = f(w · x) = f(wᵀ x)
• The neuron model is represented as follows:
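The neuron model above can be sketched in a few lines of Python (NumPy is assumed; the weights, inputs, and activation below are made-up illustrative values, not from the lecture):

```python
import numpy as np

def neuron(w, x, f):
    """Single neuron: y = f(w . x) = f(w^T x)."""
    return f(np.dot(w, x))

# Illustrative weights and inputs
w = np.array([0.5, -0.2, 0.1])
x = np.array([1.0, 2.0, 3.0])
y = neuron(w, x, f=lambda z: z)   # identity activation: y = w . x
```

Any activation f can be plugged in; the hard limit and sigmoid functions of the later slides fit the same shape.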
A layer of neurons:
TRANSFER (ACTIVATION) FUNCTIONS
1- Linear neurons
• These are simple but computationally limited.
• If we can make them learn we may gain insight and proceed into
more complicated neurons.
  y = b + Σ_i x_i w_i

[Figure: the output y is a straight-line (identity) function of the
weighted input sum.]
2- Binary threshold neurons
• First compute the weighted sum, then output 1 if it reaches the
threshold θ = −b:

  z = b + Σ_i x_i w_i

  y = 1 if z ≥ 0, and 0 otherwise
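A minimal sketch of the binary threshold neuron, with illustrative weights and threshold chosen here for the example:

```python
import numpy as np

def threshold_neuron(w, x, b):
    """z = b + sum_i x_i w_i; fire (output 1) when z >= 0,
    i.e. when the weighted sum reaches the threshold theta = -b."""
    z = b + np.dot(x, w)
    return 1 if z >= 0 else 0

# With w = [1, 1] and theta = 1.5 (b = -1.5), the neuron
# fires only when both inputs are on.
w, b = np.array([1.0, 1.0]), -1.5
on_on  = threshold_neuron(w, np.array([1.0, 1.0]), b)   # z = 0.5  -> 1
on_off = threshold_neuron(w, np.array([1.0, 0.0]), b)   # z = -0.5 -> 0
```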
3- Rectified linear neurons
• They compute the same weighted sum z, but the output is linear above
zero:

  y = z if z > 0, and 0 otherwise
4- Sigmoid neurons
• They give a real-valued output that is a smooth and bounded function
of their total input.
• Typically they use the logistic function:

  z = b + Σ_i x_i w_i

  y = 1 / (1 + e^(−z))

• They have positive derivatives, which makes learning easy.

[Figure: y rises smoothly from 0 to 1, passing through 0.5 at z = 0.]
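The logistic function and its derivative can be sketched directly (the evaluation points below are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: smooth, bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """The derivative sigma(z) * (1 - sigma(z)) is always positive,
    which is what makes gradient-based learning easy."""
    s = sigmoid(z)
    return s * (1.0 - s)

mid_point = sigmoid(0.0)          # 0.5, the midpoint shown in the figure
slope = sigmoid_derivative(0.0)   # the maximum slope, 0.25
```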
• In classification, the network has one output neuron per
class, and the target values are 1 for the correct class, and 0
otherwise.
Cont.
• Each output neuron will produce a value between 0 & 1, for example
0.3, 0.7, 0.8, 0.9, …; note that these values need not sum to 1.
• To solve this problem, a generalization of the logistic sigmoid was
developed, the softmax activation function.
• The softmax function converts a vector of numbers into a vector of
probabilities proportional to the scale of each vector element.
• Hence, it normalizes the output vector.
  a_j(n) = exp(n_j) / Σ_{i∈group} exp(n_i)   (softmax)

  a(n) = exp(−n²)   (radial basis)

  a(n) = 1 − |n|, if −1 ≤ n ≤ 1
         0, otherwise   (triangular basis)
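The softmax normalization can be sketched as follows, reusing the example output values from the earlier slide:

```python
import numpy as np

def softmax(n):
    """a_j = exp(n_j) / sum_i exp(n_i); subtracting the max first is a
    standard trick for numerical stability and does not change the result."""
    e = np.exp(n - np.max(n))
    return e / e.sum()

# The raw output-neuron values from the earlier slide
scores = np.array([0.3, 0.7, 0.8, 0.9])
probs = softmax(scores)   # a proper probability vector: sums to 1
```

The largest raw score keeps the largest probability, so the predicted class is unchanged; only the scale is normalized.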
Learning Algorithm
• The learning algorithm is a prescribed set of well-defined rules for
the solution of a learning problem.
• In every learning algorithm, we must specify the cost function.
• Cost function - a way of using your training data to determine the
parameter values that produce an output function that is as accurate
as possible.
• The Learning paradigm is a model of the environment in which
the neural network operates.
• There are three major learning paradigms.
1- Supervised Learning
• A teacher is present during the learning process & the desired
output is presented.
• Every input pattern is used to train the network.
• The cost function is given by the difference between the
network’s computed output and the expected output (the error).
2- Unsupervised Learning
• There is no teacher.
• No expected output is presented to the network.
• The system undergoes self learning by discovering and adapting to
the structural features in the input patterns.
• The cost function is determined by the task formulation.
• Most applications fall within the domain of estimation problems such
as statistical modeling, compression, filtering, blind source separation
and clustering.
3- Reinforcement Learning
• There is a teacher.
• There is no expected outcome presented to the network.
• The teacher helps by indicating whether a computed output is right or
wrong.
• A reward is given for the right one & a penalty is given for the
wrong one.
• Data is usually not given, but generated by an agent's interactions
with the environment.
Cont.
• Tasks that fall within the paradigm of
reinforcement learning are control problems,
games and other sequential decision making
tasks.
Example
• "Given this data, a friend has a house of 650 square feet - how much
can they expect to get for it?"
There are different approaches that can be used to solve this:
• A straight line through the data
• Maybe $150 000
• A second-order polynomial
• Maybe $200 000
• Each of these approaches represents a way of doing supervised
learning.
Cont.
• So, training data are provided in which the actual price of each
house is known.
• The algorithm uses this to learn to predict the prices of houses in
any other set of data.
• We call this a regression problem because:
• it predicts a continuous-valued output (price);
• the output is not restricted to a discrete set of values.
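The two approaches can be compared directly; the (size, price) pairs below are invented for illustration, since the lecture's dataset is not reproduced here:

```python
import numpy as np

# Hypothetical (size in sq ft, price in $) training pairs; illustrative only.
sizes  = np.array([500.0, 750.0, 1000.0, 1250.0, 1500.0])
prices = np.array([120e3, 160e3, 215e3, 250e3, 310e3])

line = np.polyfit(sizes, prices, deg=1)   # a straight line through the data
quad = np.polyfit(sizes, prices, deg=2)   # a second-order polynomial

est_line = np.polyval(line, 650.0)        # prediction for the 650 sq ft house
est_quad = np.polyval(quad, 650.0)
```

Each fit is one hypothesis about how price depends on size; supervised learning consists of choosing and fitting such a hypothesis from labeled examples.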
Example 2
• The graph shows that we have 5 tumors of each kind.
• We want to find a way to classify whether a tumor is benign or
malignant according to our trained network!
• This is an example of a classification problem
• Classify data into one of two discrete classes - malignant or not.
• In classification problems, we may have a discrete number of
possible values for the output, e.g. 0 - benign, 1 - type 1, 2 - type
2, 3 - type 3.
• In classification problems we can plot data in different ways.
Cont.
• Based on that data, you can try and define separate classes by,
• Drawing a straight line between the two groups
• Using a more complex function to define the two groups.
• Then, when you have an individual with a specific tumor size and
who is a specific age, you can use that information to place them
into one of your classes.
• You might have many features to consider
• Clump thickness, Uniformity of cell size, Uniformity of cell
shape…etc.
Review Questions
1. The amount of rain that falls is usually measured in either (mm)
or inches. Suppose you use a learning algorithm to predict how
much rain will fall next week. Is this a classification or a
regression problem?
2. Suppose you are working on stock market prediction. Typically
tens of millions of shares of Microsoft stock are traded (i.e.,
bought/sold) each day. You would like to predict the number of
Microsoft shares that will be traded tomorrow. Is this a
classification or a regression problem?
Cont.
• The number of training examples is 47 & the first four are shown
below:
Cont.
• The training set is passed to the learning algorithm.
• This algorithm accepts an input, such as the size of a new house, &
outputs a hypothesis, denoted by h, that estimates the value of y.
• The hypothesis h is given by:
• h_θ(x) = θ0 + θ1·x, written h(x) for short
• Note that y is a linear function of x
• θi are the parameters
• θ0 is the intercept (the value at x = 0)
• θ1 is the gradient (slope)
• This kind of function is a linear regression with one variable.
• Also called univariate linear regression
Cont.
• Therefore, we need to optimize our system by solving
a minimization problem, i.e. minimize the difference between h(x)
and y for each example.
• One of the most commonly used cost functions is the squared error
cost function, given by the following:

  J(θ0, θ1) = (1 / 2m) Σ_{i=1}^{m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
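The hypothesis and the squared error cost can be sketched together; the toy data below is invented so that the correct parameters are obvious:

```python
import numpy as np

def h(theta0, theta1, x):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def J(theta0, theta1, X, Y):
    """Squared error cost: (1 / 2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(X)
    return np.sum((h(theta0, theta1, X) - Y) ** 2) / (2 * m)

# Toy data lying exactly on y = x, so theta0 = 0, theta1 = 1 gives J = 0.
X = np.array([1.0, 2.0, 3.0])
Y = np.array([1.0, 2.0, 3.0])
perfect = J(0.0, 1.0, X, Y)
off     = J(0.0, 0.0, X, Y)   # (1 + 4 + 9) / (2 * 3)
```

Minimizing J over (θ0, θ1) is exactly the optimization problem described above.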
Supervised Learning
• A programmer specifies the number of units in each layer and the
connectivity between units, so the only unknown is the set of weights
associated with the connections.
Learning Rules:
• A learning rule is a procedure for modifying the weights and biases
of a network, in order to train the network to perform a
particular task.
Learning Rules
• These learning types may use different learning rules, such as:
• Hebbian,
• Gradient descent,
• Competitive,
• Stochastic.
• Hence, the learning types are categorized even further according
to the rule used.
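As a minimal sketch of one of these rules, the Hebbian update strengthens each weight in proportion to the product of its input and the neuron's output; the learning rate eta and the values below are illustrative, not from the lecture:

```python
import numpy as np

def hebbian_update(w, x, eta=0.1):
    """Hebbian rule for a linear neuron: delta_w = eta * y * x,
    so weights on active inputs that drive the output grow."""
    y = np.dot(w, x)          # linear neuron output
    return w + eta * y * x

w0 = np.array([0.5, 0.5])
x  = np.array([1.0, 0.0])     # only the first input is active
w1 = hebbian_update(w0, x)    # only the first weight changes
```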
Perceptrons (cont.)
• It is made up of only input neurons and output neurons
• Input neurons, usually, have two states: ON and OFF
• A simple threshold activation function is used for the output
neurons.
• It uses supervised training
• Example:
Cont.
• Based on that simple example, we can now develop the learning
rule for a perceptron.
• The perceptron, usually, uses a hard limit activation function as
shown in the following figures.
Perceptrons
One perceptron neuron
A layer of Perceptrons
Multilayer Perceptron
Cont.
• The perceptron learning rule (e.g. learnp in Matlab) calculates desired
changes to the perceptron's weights and biases given an input vector p,
and the associated error e.
• The target vector t must contain values of either 0 or 1, as perceptrons
(with hardlim transfer functions) can only output such values.
• With each additional epoch (i.e. each execution of learnp), the
perceptron has a better chance of getting closer to the target
values, & hence of converging.
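The rule can be sketched as a plain training loop; this mirrors the update learnp computes (e = t − a, then w ← w + e·p and b ← b + e), with an OR gate as made-up training data:

```python
import numpy as np

def hardlim(z):
    """Hard-limit transfer function: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def train_perceptron(P, T, epochs=10):
    """For each pair (p, t): e = t - hardlim(w.p + b),
    then w <- w + e*p and b <- b + e."""
    w, b = np.zeros(P.shape[1]), 0.0
    for _ in range(epochs):
        for p, t in zip(P, T):
            e = t - hardlim(np.dot(w, p) + b)
            w, b = w + e * p, b + e
    return w, b

# OR gate: linearly separable, so the rule is guaranteed to converge.
P = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 1])
w, b = train_perceptron(P, T)
preds = [hardlim(np.dot(w, p) + b) for p in P]
```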
Example
• The decision boundary is given by:

  n = p1 + p2 − 1 = 0

• Set p1 = 0, then p2 = 1
• Set p2 = 0, then p1 = 1
• Consider the input p = [2, 0]ᵀ
The decision boundary, in blue, is orthogonal to the weight vector 1w.
That means that our classes are linearly separable.
Perceptron Implementation
• Orthogonal means that the weight vector is at a 90° angle to the
decision boundary.
• Example: implement an AND logic gate.
• Answer: It has the following input/target pairs:
Cont.
• First we need to select a decision boundary.
• Then, we choose a weight vector orthogonal to the
decision boundary.
• Then we choose any weight vector that lies along this orthogonal
direction, for example:
Cont.
• The process of finding new weights (and biases) can be repeated
until there are no errors.
• Note that the perceptron learning rule is guaranteed to converge in
a finite number of steps for all problems that can be solved by a
perceptron.
• These include all classification problems that are "linearly
separable".
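Following the recipe above, one valid hand-picked solution for the AND gate (not the only one) takes the weight vector [2, 2], orthogonal to the decision boundary 2·p1 + 2·p2 − 3 = 0, with bias b = −3; these particular numbers are a choice made for this sketch:

```python
import numpy as np

w = np.array([2.0, 2.0])   # orthogonal to the chosen decision boundary
b = -3.0                   # places the boundary between (1,1) and the rest

def perceptron(p):
    """Hard-limit perceptron: 1 if w.p + b >= 0, else 0."""
    return 1 if np.dot(w, p) + b >= 0 else 0

inputs  = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
outputs = [perceptron(np.array(p)) for p in inputs]   # AND truth table
```

Since AND is linearly separable, the perceptron learning rule would also find such weights automatically in a finite number of steps.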