
University Of Khartoum

Department Of Electronics & Electrical Engineering
Software & Control Engineering

EEE52511: NEURAL NETWORKS & FUZZY SYSTEMS
By: Dr. Hiba Hassan Sayed
Lecture 2
16/1/2023 U of K: Dr. Hiba Hassan 2

Complexity of Human Neural System


• Biological information processing is robust and fault-tolerant:
• We have our greatest number of neurons early in life, and thousands
of them are lost every day thereafter. Nevertheless, we continue to
function for many years without an associated decline in our
capabilities.
• Biological information processors are flexible:
• We do not need to be reprogrammed when we go into a new
environment; we adapt to the new environment, i.e. we learn.

HNS Cont.
• Computer programs can handle fuzzy, probabilistic, noisy and inconsistent
data the way we do only under specific circumstances:
• they require highly sophisticated programming, and the context of such
data must have been analyzed in detail.
• We have a natural ability to handle uncertainty.
• The biological processing unit, the brain, is highly parallel, small,
compact and requires little power.

Neural Networks Approach


• How to formulate neural network solutions:
1) Understand and specify your problem in terms of given inputs
and desired outputs.
2) Take the simplest form of network you think might be used to
solve your problem, e.g. a simple Perceptron.
3) Try to find appropriate connection weights (including neuron
thresholds) so that the network produces the right outputs for
each input in its training data.

Cont.
4) Use separate data sets: the network is trained on a training set,
and its generalization ability is tested on a new test set. A
validation set is often also used to tune the hyperparameters.
5) If the network doesn’t perform well enough, go back to stage 3
and try harder.
6) If the network still doesn’t perform well enough, go back to stage
2 and try harder.
7) If the network still doesn’t perform well enough, go back to stage
1 and try harder.
8) Problem solved – move on to next problem.
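As a rough illustration of step 4, the sketch below shuffles a data set and splits it into training, validation and test portions. It is written in Python with NumPy (an assumption; the lecture does not prescribe a tool), and the function name `split_data` and the 70/15/15 proportions are hypothetical choices.

```python
import numpy as np

def split_data(X, y, train=0.7, val=0.15, seed=0):
    """Shuffle (X, y) and split into training, validation and test sets.

    The proportions are illustrative assumptions; whatever remains after
    the training and validation fractions becomes the test set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))              # random order of example indices
    n_train = int(train * len(X))
    n_val = int(val * len(X))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

X = np.arange(100).reshape(50, 2)   # 50 toy examples with 2 features each
y = np.arange(50)                   # toy targets
(train_X, train_y), (val_X, val_y), (test_X, test_y) = split_data(X, y)
print(len(train_X), len(val_X), len(test_X))   # 35 7 8
```

The test set is never used for training or for tuning, so its error is an estimate of how well the network generalizes to unseen data.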

Cont.
• There are two important aspects of the network’s operation to
consider:
• Learning: The network must learn decision boundaries from a set
of training patterns so that these training patterns are classified
correctly.
• Generalization: After training, the network must also be able to
generalize, i.e. correctly classify test patterns it has never seen
before.
• Usually we want the neural network to learn in a way that produces
good generalization.

Cont.
• Sometimes, the training data may contain errors (e.g.,
noise in the experimental determination of the input values,
or incorrect classifications).
• In this case, learning the training data perfectly may lead to
poor generalization. There is an important tradeoff
between learning and generalization that arises quite
often.
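A small synthetic experiment can make this tradeoff concrete. The sketch assumes noisy samples of the line y = 2x + 1 (generated in the code, not real data) and compares a straight-line fit with a degree-9 polynomial, fitted with NumPy's `Polynomial.fit`, that passes through every noisy training point. The high-degree fit drives the training error to essentially zero; its error on the clean test points is typically larger than the straight line's, which is the poor generalization described above.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + 1 + rng.normal(0, 0.2, size=10)   # noisy training targets
x_test = np.linspace(0.05, 0.95, 50)
y_test = 2 * x_test + 1                                    # noise-free "truth"

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

errors = {}
for degree in (1, 9):
    model = Polynomial.fit(x_train, y_train, degree)       # least-squares fit
    errors[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(degree, "train/test MSE:", errors[degree])
```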

Neuron Models
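• When the input is a vector, each input element is multiplied by its
corresponding weight (together, a dot product), and the weighted values
are fed to the summing junction.
• The sum is then passed through the transfer function f, so the output y is given by:

$$y = f\left(\sum_i w_i x_i\right) = f(\mathbf{w} \cdot \mathbf{x}) = f(\mathbf{w}^T \mathbf{x})$$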
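A minimal Python/NumPy sketch of this computation, assuming the transfer function f is passed in as an ordinary function (the name `neuron` and the numeric values are hypothetical):

```python
import numpy as np

def neuron(x, w, f):
    """Single neuron: weighted sum (dot product) followed by transfer function f."""
    return f(np.dot(w, x))       # y = f(w^T x)

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.25])
print(neuron(x, w, lambda n: n))                       # identity f: 0.5 - 2 + 0.75 = -0.75
print(neuron(x, w, lambda n: 1.0 / (1.0 + np.exp(-n))))  # logistic f
```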
• The neuron model is represented as follows:

Neuron model with vector input



A layer of neurons:

Multiple Layers Neurons



General FeedForward Artificial Neural Network (FFANN) Architecture
• FeedForward ANNs allow signals to travel one way only, from input
to output.
• FeedForward ANNs tend to be straightforward networks that
associate inputs with outputs. They are extensively used in pattern
recognition.
• The figure above shows the architecture of a multi-layer feedforward
neural network of log-sigmoid neurons; it is equivalent to a
Multi-Layer Perceptron (MLP) network.

Multiple Layers Neurons (cont.)

• The above example has R1 inputs, S1 neurons in the first layer, S2
neurons in the second layer, etc. It is common for different layers to
have different numbers of neurons. The output of the network in the
previous figure is defined as follows:

$$\mathbf{a}^3 = f^3\left(\mathbf{LW}^{3,2} f^2\left(\mathbf{LW}^{2,1} f^1\left(\mathbf{IW}^{1,1}\mathbf{p} + \mathbf{b}^1\right) + \mathbf{b}^2\right) + \mathbf{b}^3\right)$$


• The layer that produces the network output is the output layer and
all the middle layers are called the hidden layers.
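The nested expression for a3 can be evaluated layer by layer. This sketch assumes log-sigmoid transfer functions in every layer (matching the figure's description), random weights, and arbitrary example sizes R1 = 4, S1 = 5, S2 = 3, S3 = 2; the variable names mirror the symbols IW1,1, LW2,1, LW3,2 and b1 to b3 above.

```python
import numpy as np

def logsig(n):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-n))

def forward(p, IW11, b1, LW21, b2, LW32, b3, f1=logsig, f2=logsig, f3=logsig):
    a1 = f1(IW11 @ p + b1)     # first (hidden) layer: S1 outputs
    a2 = f2(LW21 @ a1 + b2)    # second (hidden) layer: S2 outputs
    a3 = f3(LW32 @ a2 + b3)    # output layer: S3 outputs
    return a3

rng = np.random.default_rng(0)
R1, S1, S2, S3 = 4, 5, 3, 2                        # example sizes (assumed)
p = rng.normal(size=R1)                            # input vector
IW11, b1 = rng.normal(size=(S1, R1)), rng.normal(size=S1)
LW21, b2 = rng.normal(size=(S2, S1)), rng.normal(size=S2)
LW32, b3 = rng.normal(size=(S3, S2)), rng.normal(size=S3)
a3 = forward(p, IW11, b1, LW21, b2, LW32, b3)
print(a3.shape)   # (2,)
```

Each weight matrix has as many rows as its layer has neurons and as many columns as the previous layer has outputs, which is why different layers can have different sizes.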

TRANSFER (ACTIVATION) FUNCTIONS

1- Linear neurons
• These are simple but computationally limited
• If we can make them learn we may gain insight and proceed into
more complicated neurons.

$$y = b + \sum_i x_i w_i$$
(Figure: output y plotted against the total input b + Σᵢ xᵢwᵢ.)
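A one-line sketch of a linear neuron in plain Python (the function name is hypothetical). Because the output is linear in the inputs, doubling the inputs (with b = 0) doubles the output, which is one way to see why such neurons are computationally limited.

```python
def linear_neuron(x, w, b):
    """Linear neuron: y = b + sum_i x_i * w_i (no non-linearity)."""
    return b + sum(xi * wi for xi, wi in zip(x, w))

print(linear_neuron([1, 2], [3, 4], 0.5))   # 0.5 + 3 + 8 = 11.5
```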

2- Binary threshold neurons

(Figure: output versus weighted input; the output is 0 below the threshold and 1 above it.)

• Developed by McCulloch and Pitts (1943); also called the hard limiter
transfer function.

2- Binary threshold neurons (cont.)


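• There are two equivalent ways to write the
equations for a binary threshold neuron:
• With an explicit threshold θ: $z = \sum_i x_i w_i$, and $y = 1$ if $z \ge \theta$, $y = 0$ otherwise.
• With a bias b: $z = b + \sum_i x_i w_i$, and $y = 1$ if $z \ge 0$, $y = 0$ otherwise.
• The two forms are equivalent when $\theta = -b$.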
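The following sketch checks the equivalence of the two forms on the four binary input patterns, assuming example weights w = [1, 1] and threshold θ = 1.5 (so b = -1.5); these values are illustrative only.

```python
def threshold_neuron_theta(x, w, theta):
    z = sum(xi * wi for xi, wi in zip(x, w))       # z = sum_i x_i w_i
    return 1 if z >= theta else 0

def threshold_neuron_bias(x, w, b):
    z = b + sum(xi * wi for xi, wi in zip(x, w))   # z = b + sum_i x_i w_i
    return 1 if z >= 0 else 0

w, theta = [1, 1], 1.5
patterns = ([0, 0], [0, 1], [1, 0], [1, 1])
for x in patterns:
    # With b = -theta, both forms produce the same output
    print(x, threshold_neuron_theta(x, w, theta), threshold_neuron_bias(x, w, -theta))
```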

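3- Rectified Linear Neurons (linear threshold neurons)
• They compute a linear weighted sum of their inputs.
• The output is a non-linear function of the total input:

$$z = b + \sum_i x_i w_i, \qquad y = \begin{cases} z & \text{if } z > 0 \\ 0 & \text{otherwise} \end{cases}$$

(Figure: y plotted against z; y = 0 for z ≤ 0 and y = z for z > 0.)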
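A minimal NumPy sketch of a rectified linear neuron; the weights, bias and inputs are arbitrary illustrative values.

```python
import numpy as np

def relu_neuron(x, w, b):
    z = b + np.dot(x, w)         # linear weighted sum of the inputs
    return np.maximum(z, 0.0)    # y = z if z > 0, else 0

x = np.array([1.0, 2.0])
print(relu_neuron(x, np.array([0.5, 0.5]), -1.0))    # z = 0.5, so y = 0.5
print(relu_neuron(x, np.array([-0.5, 0.5]), -1.0))   # z = -0.5, so y = 0
```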
4- Sigmoid neurons
• They give a real-valued output that is a smooth and bounded
function of their total input.
• Typically they use the logistic function:

$$z = b + \sum_i x_i w_i, \qquad y = \frac{1}{1 + e^{-z}}$$

• They have positive derivatives, which make learning easy.
(Figure: y plotted against z; y rises smoothly from 0 to 1, passing through 0.5 at z = 0.)
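The sketch below implements the logistic function and its derivative. It uses the standard identity dy/dz = y(1 - y), which is not stated on the slide but follows from differentiating y = 1/(1 + e^(-z)); since 0 < y < 1, the derivative is always positive.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_derivative(z):
    y = logistic(z)
    return y * (1.0 - y)       # dy/dz = y(1 - y), always positive

print(logistic(0.0))              # 0.5
print(logistic_derivative(0.0))   # 0.25
```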

5- Softmax Transfer Function

• When we have several independent binary attributes by which to
classify the data, we need to use a network with multiple logistic
outputs.
• Then we have n output neurons, each one corresponding to one
class, and the target values are 1 for the correct class, and 0
otherwise.

Cont.
• Each output neuron independently produces a value between 0 and 1, for
example 0.3, 0.7, 0.8, 0.9, …, so the outputs need not sum to 1 and cannot
be read directly as class probabilities.
• To solve this problem, a generalization of the logistic sigmoid was
developed: the softmax activation function.
• The softmax function converts a vector of numbers into a vector of
probabilities, each proportional to the exponential of the corresponding element.
• Hence, it normalizes the output vector: the outputs are positive and sum to 1.

Softmax Transfer Function (cont.)


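$$y_i = \frac{e^{z_i}}{\sum_{j \in \text{group}} e^{z_j}}$$

• Where z_i is the total input to output neuron i, and the sum runs over
all neurons j in the softmax group.
• A suitable cost (error) function to use with softmax
is the negative log probability of the right answer.
• This is called the cross-entropy cost function, and it is
given by,

$$C = -\sum_j t_j \log y_j$$

where t_j is the target value (1 for the correct class, 0 otherwise).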
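A NumPy sketch of softmax followed by the cross-entropy cost. Subtracting max(z) before exponentiating is a common numerical-stability choice (an implementation detail, not part of the formula); it leaves the result unchanged because the common factor cancels in the ratio. The input values are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability; ratios are unchanged
    return e / e.sum()          # y_i = e^{z_i} / sum_j e^{z_j}

def cross_entropy(t, y):
    return -np.sum(t * np.log(y))   # C = -sum_j t_j log y_j

z = np.array([2.0, 1.0, 0.1])       # total inputs to three output neurons
t = np.array([1.0, 0.0, 0.0])       # target: class 0 is the correct class
y = softmax(z)
print(y, y.sum())
print(cross_entropy(t, y))          # equals -log(y[0])
```

With a one-hot target, the cost reduces to the negative log probability assigned to the correct class, as stated above.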

6. Radial Basis and Triangular Basis transfer functions:
• Radial basis: $a(n) = e^{-n^2}$
• Triangular basis: $a(n) = \begin{cases} 1 - |n|, & \text{if } -1 \le n \le 1 \\ 0, & \text{otherwise} \end{cases}$
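The two basis functions translated directly into NumPy. The names `radbas` and `tribas` echo the MATLAB transfer-function names, but these are independent Python re-implementations, not the toolbox functions.

```python
import numpy as np

def radbas(n):
    return np.exp(-n ** 2)                                 # a(n) = exp(-n^2)

def tribas(n):
    return np.where(np.abs(n) <= 1, 1 - np.abs(n), 0.0)    # 1 - |n| on [-1, 1], else 0

n = np.array([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0])
print(radbas(n))
print(tribas(n))
```

Both functions peak at n = 0 and are symmetric, so a neuron using them responds most strongly when its net input is near zero.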

Learning in Artificial Neural Network:

• Learning in the context of neural networks is defined as follows:
• Learning is a process by which the free parameters of a neural
network are adapted through a process of stimulation by the
environment in which the network is embedded. The type of
learning is determined by the manner in which the parameter
changes take place.

Learning Algorithm
• The learning algorithm is a prescribed set of well-defined rules for
the solution of a learning problem.
• In every learning algorithm, we must specify the cost function.
• Cost function: a quantity computed from the training data that measures
how well the network performs; the parameters are chosen to minimize it,
so that the network's output function is as accurate as possible.
• The learning paradigm is a model of the environment in which
the neural network operates.
• There are three major learning paradigms.

1- Supervised Learning
• A teacher is present during the learning process & the desired
output is presented.
• Every input pattern is used to train the network.
• The cost function is given by the difference between the
network’s computed output and the expected output (the error).

2- Unsupervised Learning
• There is no teacher.
• No expected output is presented to the network.
• The system undergoes self-learning by discovering and adapting to
the structural features in the input patterns.
• The cost function is determined by the task formulation.
• Most applications fall within the domain of estimation problems such
as statistical modeling, compression, filtering, blind source separation
and clustering.

2- Unsupervised learning (cont.)

• Unsupervised or self-organised learning: the neural
network is presented with input data only; there is no target.
• Unsupervised learning tends to follow the neuro-biological
organisation of the brain.

3- Reinforcement Learning
• There is a teacher.
• There is no expected outcome presented to the network.
• The teacher helps by indicating whether a computed output is right or
wrong.
• A reward is given for the right one & a penalty is given for the
wrong one.
• Data is usually not given, but generated by an agent's interactions
with the environment.

3- Reinforcement Learning (cont.)
• At each point in time, the agent performs an action and the environment
generates an observation and the instantaneous cost according to some
dynamics.
• The aim is to discover a policy for selecting actions that minimizes some
measure of a long-term cost, i.e. the expected cumulative cost.
• That is, the goal is to map situations to actions so as to maximize a
numerical reward signal.
• The environment's dynamics and the long-term cost for each policy are
usually unknown, but can be estimated.

Cont.
• Tasks that fall within the paradigm of
reinforcement learning are control problems,
games and other sequential decision making
tasks.

Two types of supervised learning


• Each training case consists of an input vector x and a target output
t.
• Regression: The target output is a real number or a whole vector of
real numbers.
• The price of a stock in 6 months' time.
• The temperature at noon tomorrow.
• Classification: The target output is a class label.
• The simplest case is a choice between 1 and 0.
• We can also have multiple alternative labels.

Supervised Learning Example


• Here is an example of a Regression supervised
learning problem.

Example
• "Given this data, a friend has a house of 650 square feet; how much
can they expect to get for it?"
There are different approaches that can be used to solve this:
• A straight line through the data
• Maybe $150 000
• A second-order polynomial
• Maybe $200 000
• Each of these approaches represents a way of doing supervised
learning.

Cont.
• So, training data is provided in which the actual price of each
house is known.
• The algorithm uses this to learn to predict the prices of other houses
that are not in the training set.
• We call this a regression problem because
• it predicts a continuous-valued output (price);
• the output is not restricted to a set of discrete values.

Example 2

• The following graph shows the number of times a
breast tumor is benign or malignant vs its tumor
size:

Example 2 (cont.)
• The graph shows that we have 5 tumors of each kind.
• We want to find a way to classify whether a tumor is benign or
malignant according to our trained network!
• This is an example of a classification problem
• Classify data into one of two discrete classes - malignant or not.
• In classification problems, we may have a discrete number of
possible values for the output, e.g. 0 – benign, 1 - type 1, 2 - type
2, 3 - type 3.
• In classification problems we can plot data in different ways.

Classification Example (cont.)


• Notice that only the size attribute was used there.
• Other attributes, such as age, may also be used.

Cont.
• Based on that data, you can try and define separate classes by,
• Drawing a straight line between the two groups
• Using a more complex function to define the two groups.
• Then, when you have an individual with a specific tumor size and
who is a specific age, you can use that information to place them
into one of your classes
• You might have many features to consider
• Clump thickness, Uniformity of cell size, Uniformity of cell
shape…etc.

Review Questions
1. The amount of rain that falls is usually measured in either millimetres
(mm) or inches. Suppose you use a learning algorithm to predict how
much rain will fall next week. Is this a classification or a
regression problem?
2. Suppose you are working on stock market prediction. Typically
tens of millions of shares of Microsoft stock are traded (i.e.,
bought/sold) each day. You would like to predict the number of
Microsoft shares that will be traded tomorrow. Is this a
classification or a regression problem?

Linear Regression Example


• Consider the training algorithm for the housing price data
example given earlier.
• The data set is our training set.
• We define the following variables:
• m = number of training examples
• x's = input variables / features
• y's = output variables ("target" variables)
• (x, y) - a single training example
• (xi, yi) - a specific example (the ith training example);
i is an index into the training set

Cont.
• The number of training examples is 47 & the first four are shown
below:

Cont.
• The training set is passed to the learning algorithm.
• This algorithm accepts an input, such as the size of a new house, &
makes a hypothesis, denoted by h, that outputs the estimated value of y.
• The hypothesis h is given by,
• hθ(x) = θ0 + θ1x, written h(x) for short
• Note that the prediction is a linear function of x
• θi are parameters
• θ0 is the intercept (the value of h at x = 0)
• θ1 is the slope (gradient)
• This kind of function is a linear regression with one variable.
• Also called univariate linear regression

Linear Regression Implementation - Cost Function
• Cost function - a way of using your training data to determine
values for the θ parameters that make the hypothesis as accurate as
possible.
• Different values for θi (parameters) give you different functions.
• If θ0 is 1.5 and θ1 is 0, we get a horizontal straight line (parallel
to the x-axis) at y = 1.5.
• If θ1 is > 0 then we get a positive slope.
• Hence, we need to choose these parameters so that hθ(x) is as close to
y as possible.
• Think of hθ(x) as a "y imitator" - it tries to convert the x into y, and since
we know y, it is possible to evaluate how well hθ(x) does this.

Cont.
• Therefore, we need to optimize our system by solving
a minimization problem, i.e. minimize the difference between h(x)
and y for each example.
• One of the most commonly used cost functions is the squared error
cost function, given by the following:

$$J(\theta_0, \theta_1) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x_i) - y_i\right)^2$$
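A plain-Python sketch of the hypothesis and of J(θ0, θ1) as written above, using the factor 1/2. The tiny data set is made up for illustration (it is not the 47-example housing data), and the function names are hypothetical.

```python
def h(theta0, theta1, x):
    return theta0 + theta1 * x      # h_theta(x) = theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    # J(theta0, theta1) = 1/2 * sum_i (h(x_i) - y_i)^2
    return 0.5 * sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0]       # toy data lying exactly on y = x
ys = [1.0, 2.0, 3.0]
print(cost(0.0, 1.0, xs, ys))   # 0.0: a perfect fit
print(cost(0.0, 0.5, xs, ys))   # 0.5 * (0.25 + 1 + 2.25) = 1.75
```

Many texts divide by 2m instead of 2; this only rescales J and does not change which θ0, θ1 minimize it.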

House cost prediction Example (Cont.)


• To minimize the above cost function, we compute the values
of θ0 and θ1 that give the minimal overall deviation of hθ(x) from
y when we use those parameters in our hypothesis function.
• Hence, we need an algorithm that will help us determine those θ0
& θ1 values.
• There are several learning algorithms, such as Hebbian, Gradient
Descent, Competitive & Stochastic.
• For supervised learning, usually either Gradient Descent or
Stochastic is used, while the other two are used for unsupervised
learning.
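Gradient descent can be sketched for this cost. Differentiating J = ½ Σ (θ0 + θ1xi - yi)² gives ∂J/∂θ0 = Σ (hθ(xi) - yi) and ∂J/∂θ1 = Σ (hθ(xi) - yi)xi, and each step moves the parameters a small amount against the gradient. The learning rate α = 0.05, the number of steps and the toy data are assumptions chosen so that this example converges.

```python
def gradient_descent(xs, ys, alpha=0.05, steps=2000):
    theta0, theta1 = 0.0, 0.0                            # initial guess
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]   # h(x_i) - y_i
        grad0 = sum(errors)                              # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs))   # dJ/dtheta1
        theta0 -= alpha * grad0                          # step against the gradient
        theta1 -= alpha * grad1
    return theta0, theta1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]          # toy data lying on y = 1 + 2x
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)              # approximately 1.0 and 2.0
```

If α is too large the updates overshoot and diverge; if it is too small, convergence is slow.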

Supervised Learning
• A programmer specifies the number of units in each layer and the connectivity
between units, so the only unknowns are the weights associated with
the connections.

Supervised Learning (Cont.)


Algorithm:
• Initialize the weights in the network (usually with random values).
• Repeat until stopping criterion is met.
• For each example in training set do:
• O = neural network output
• T = desired output (Teacher or Target)
• Update weights based on the error T - O
Note: Each pass through all of the training examples is called an
epoch.
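The algorithm above can be sketched in Python with NumPy. Here the "network" is assumed to be a single linear neuron, and the weight update is a per-example gradient step on the squared error (one instance of the stochastic approach mentioned earlier); the learning rate, tolerance, epoch cap and toy data are illustrative assumptions.

```python
import numpy as np

def train(X, T, alpha=0.1, max_epochs=5000, tol=1e-12, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])   # initialize weights randomly
    b = 0.0
    for epoch in range(max_epochs):              # one epoch = one pass over the training set
        sq_error = 0.0
        for x, t in zip(X, T):                   # for each example in the training set
            o = w @ x + b                        # O: network output (linear neuron)
            e = t - o                            # error T - O
            w += alpha * e * x                   # update weights (per-example gradient step)
            b += alpha * e
            sq_error += e ** 2
        if sq_error < tol:                       # stopping criterion
            break
    return w, b, epoch + 1

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
T = X @ np.array([2.0, -1.0]) + 0.5              # targets from a known linear rule
w, b, epochs = train(X, T)
print(w, b, epochs)                              # w close to [2, -1], b close to 0.5
```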

Learning Rules:

• A learning rule, also known as a training algorithm, is defined as a
procedure for modifying the weights and biases of a network.
• The learning rule is applied to train the network to perform some
particular task.

Learning Rules
• These learning types may use different learning rules, such as:
• Hebbian,
• Gradient descent,
• Competitive,
• Stochastic.
• Hence, the learning types are categorized even further according
to the rule used.

Perceptrons – the first NNs


• They are the first neural networks, introduced in the 1950s by Frank
Rosenblatt along with other researchers.
• They were developed to perform pattern recognition; hence a perceptron
is a classifier.
• It is a fast and reliable network.
• It can be single-layer or multilayer.
• It has limited applications.

Perceptrons (cont.)
• It is made up of only input neurons and output neurons
• Input neurons, usually, have two states: ON and OFF
• A simple threshold activation function is used for the output
neurons.
• It uses supervised training
• Example:

Cont.
• Based on that simple example, now we can develop the learning
rule for a perceptron.
• The perceptron, usually, uses a hard limit activation function as
shown in the following figures.

Perceptrons
One perceptron neuron

One Perceptron layer



A layer of Perceptrons

Multilayer Perceptron

Perceptron Learning Rule


• First, we define the perceptron error e:

$$e = t - a$$

where t = target, a = output.
• Hence, we update the weights via the following rule:

$$\mathbf{w}^{new} = \mathbf{w}^{old} + e\,\mathbf{p} = \mathbf{w}^{old} + (t - a)\,\mathbf{p}$$

For the bias: $b^{new} = b^{old} + e$
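A single application of this rule, sketched in NumPy with a hard-limit output. The starting weights and the training pair are arbitrary illustrative values, and `perceptron_update` is a hypothetical helper name.

```python
import numpy as np

def hardlim(n):
    return 1 if n >= 0 else 0           # hard-limit transfer function

def perceptron_update(w, b, p, t):
    a = hardlim(w @ p + b)              # current output
    e = t - a                           # perceptron error e = t - a
    return w + e * p, b + e             # w_new = w_old + e p,  b_new = b_old + e

w, b = np.array([1.0, -1.0]), 0.0       # example starting weights (assumed)
p, t = np.array([1.0, 2.0]), 1          # one training pair
w, b = perceptron_update(w, b, p, t)    # a = hardlim(-1) = 0, so e = 1
print(w, b)                             # w becomes [2, 1] and b becomes 1
```

Because e is 0 when the output already matches the target, a correctly classified pair leaves w and b unchanged.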



Perceptron Learning Rule: (Convergence Theorem)


• Perceptrons are trained on examples of desired behavior. The
desired behavior can be summarized by a set of input/output pairs,
where p is a network input & t is the corresponding target. The
objective is to reduce the error e = t – a between the neuron response a
& the target vector t.

Cont.
• The perceptron learning rule (e.g. learnp in Matlab) calculates desired
changes to the perceptron's weights and biases given an input vector p,
and the associated error e.
• The target vector t must contain values of either 0 or 1, as perceptrons
(with hardlim transfer functions) can only output such values.
• By increasing the number of training epochs (passes through the
training set), the perceptron has a better chance of getting closer to the
target values, & hence converging.

The Decision Boundary


• The decision boundary is a line in the input space (vector space); on
one side of the line, the network output is 0, while on the other side,
the network output is 1.
• Decision boundary example: Suppose that we have a 2-input
perceptron with one neuron, as shown in the next figure, & we want to
calculate its decision boundary.
• The decision boundary is determined by the input vectors for which
the net input n is zero:

$$n = {}_1\mathbf{w}^T\mathbf{p} + b = w_{1,1}p_1 + w_{1,2}p_2 + b = 0$$

Example

• We assume the following values for the weights and bias:

$$w_{1,1} = 1, \quad w_{1,2} = 1, \quad b = -1$$

Example(cont.)
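• Then,

$$n = p_1 + p_2 - 1 = 0$$

• Set p1 = 0: then p2 = 1, so the boundary crosses the p2 axis at (0, 1).
• Set p2 = 0: then p1 = 1, so the boundary crosses the p1 axis at (1, 0).
• Now, we can test one point to determine which side
of the boundary corresponds to a decision of 1.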

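• Consider the input $\mathbf{p} = [2 \;\; 0]^T$:

$$a = \text{hardlim}(p_1 + p_2 - 1) = \text{hardlim}(1) = 1$$

so the side of the boundary containing p corresponds to an output of 1.
(Figure: the decision boundary, in blue, is orthogonal to the weight vector 1w. Because a single straight line separates the two classes, they are linearly separable.)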
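The example can be checked numerically. The sketch uses w = [1, 1] and b = -1, the values implied by n = p1 + p2 - 1, classifies a few test points with the hard-limit rule, and confirms that w is orthogonal to the boundary direction from (1, 0) to (0, 1). Points exactly on the boundary (n = 0) give 1, since hardlim(0) = 1.

```python
import numpy as np

w = np.array([1.0, 1.0])   # weights implied by n = p1 + p2 - 1
b = -1.0

def classify(p):
    n = w @ p + b              # net input
    return 1 if n >= 0 else 0  # hardlim

for p in ([2, 0], [0, 0], [0, 1], [1, 0], [0.2, 0.3]):
    print(p, classify(np.array(p, dtype=float)))

d = np.array([0.0, 1.0]) - np.array([1.0, 0.0])   # direction along the boundary
print(w @ d)                                      # 0.0: w is orthogonal to the boundary
```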

Perceptron Implementation
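• Orthogonal means that the weight vector makes a 90° angle with the
decision boundary.
• Example: implement an AND logic gate.
• Answer: It has the following input/target pairs:

$$\{\mathbf{p}_1 = [0\;0]^T, t_1 = 0\},\; \{\mathbf{p}_2 = [0\;1]^T, t_2 = 0\},\; \{\mathbf{p}_3 = [1\;0]^T, t_3 = 0\},\; \{\mathbf{p}_4 = [1\;1]^T, t_4 = 1\}$$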

Cont.
• First we need to select a decision boundary.
• Then, we choose a weight vector orthogonal to the
decision boundary.
• Any vector along this direction, pointing toward the region where the
output should be 1, can be used, for example:

• That leads us to this graph.
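One possible weight choice can be verified in code. The values below are an assumption for illustration, not necessarily the ones on the slide: w = [2, 2] points toward [1 1]^T and is orthogonal to the boundary p1 + p2 = 1.5, and b = -3 places that boundary through the point (0.75, 0.75).

```python
import numpy as np

w = np.array([2.0, 2.0])   # assumed choice: a positive multiple of [1, 1]
b = -3.0                   # chosen so that w @ [0.75, 0.75] + b = 0

def hardlim(n):
    return 1 if n >= 0 else 0

pairs = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]   # AND gate
outputs = [hardlim(w @ np.array(p, dtype=float) + b) for p, _ in pairs]
for (p, t), a in zip(pairs, outputs):
    print(p, "output:", a, "target:", t)
```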



Perceptron Learning Rule (Summary)


1. Choose initial weights randomly.
2. Present a randomly chosen pattern x.
3. Update the weights using the delta rule:

$$w_{ij}(t+1) = w_{ij}(t) + e_i\,x_j$$

where $e_i = \text{target}_i - \text{output}_i$.
4. Repeat steps 2 and 3 until the stopping criterion (convergence or a
maximum number of iterations) is reached.

Cont.
• The process of finding new weights (and biases) can be repeated
until there are no errors.
• Note that the perceptron learning rule is guaranteed to converge in
a finite number of steps for all problems that can be solved by a
perceptron.
• These include all classification problems that are "linearly
separable".
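The summary steps can be put together into a complete training loop. This sketch trains a single perceptron on the AND gate from random initial weights, presents the patterns in a random order each epoch, and stops after an epoch with no errors; since AND is linearly separable, the convergence result above guarantees this happens after finitely many updates. The seed and the `max_epochs` cap are illustrative assumptions.

```python
import numpy as np

def hardlim(n):
    return 1 if n >= 0 else 0

def train_perceptron(P, T, max_epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1, 1, size=P.shape[1])     # 1. random initial weights
    b = rng.uniform(-1, 1)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for i in rng.permutation(len(P)):       # 2. present patterns in random order
            e = T[i] - hardlim(w @ P[i] + b)    # e = target - output
            w = w + e * P[i]                    # 3. delta-rule update
            b = b + e
            errors += abs(e)
        if errors == 0:                         # 4. stop after an error-free epoch
            return w, b, epoch
    return w, b, max_epochs

P = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 0, 0, 1])                      # AND gate targets
w, b, epochs = train_perceptron(P, T)
print(w, b, epochs)
```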
