Lect 2
HNS Cont.
• Handling fuzzy, probabilistic, noisy and inconsistent data is possible
with computer programs only under specific circumstances:
• it requires highly sophisticated programming, and the context of such
data must have been analyzed in detail.
• We, by contrast, have a natural ability to handle uncertainty.
• The biological processing unit, the brain, is highly parallel, small,
compact and requires little power.
16/1/2023 U of K: Dr. Hiba Hassan 4
Cont.
4) Use different sets of data: the network is trained on a set of
training data, and its generalization ability is tested on new
testing data. Often, a separate validation set is used to tune the
hyperparameters.
5) If the network doesn’t perform well enough, go back to stage 3
and try harder.
6) If the network still doesn’t perform well enough, go back to stage
2 and try harder.
7) If the network still doesn’t perform well enough, go back to stage
1 and try harder.
8) Problem solved – move on to next problem.
Cont.
• There are two important aspects of the network’s operation to
consider:
• Learning: The network must learn decision boundaries from a set
of training patterns so that these training patterns are classified
correctly.
• Generalization: After training, the network must also be able to
generalize, i.e. correctly classify test patterns it has never seen
before.
• Usually we want the neural network to learn in a way that produces
good generalization.
Cont.
• Sometimes, the training data may contain errors (e.g.,
noise in the experimental determination of the input values,
or incorrect classifications).
• In this case, learning the training data perfectly may lead to
poor generalization. There is an important tradeoff
between learning and generalization that arises quite
often.
Neuron Models
• When the input is a vector, each input element is multiplied by its
corresponding weight (a dot product) and the weighted values are fed
to the summing junction.
• Then the output y is given by:

  y = f(Σ_i w_i x_i) = f(w · x) = f(wᵀ x)
• The neuron model is represented as follows:
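The neuron model above can be sketched in a few lines of Python (NumPy is assumed; the weights, inputs, and activation below are made-up illustrative values, not from the lecture):

```python
import numpy as np

def neuron(w, x, f):
    """Single neuron: y = f(w . x) = f(w^T x)."""
    return f(np.dot(w, x))

# Illustrative weights and inputs
w = np.array([0.5, -0.2, 0.1])
x = np.array([1.0, 2.0, 3.0])
y = neuron(w, x, f=lambda z: z)   # identity activation: y = w . x
```

Any activation f can be plugged in; the hard limit and sigmoid functions of the later slides fit the same shape.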
A layer of neurons:
TRANSFER (ACTIVATION) FUNCTIONS
1- Linear neurons
• These are simple but computationally limited.
• If we can make them learn we may gain insight and proceed into
more complicated neurons.
  y = b + Σ_i x_i w_i

[Figure: the output y is a straight-line (identity) function of the
weighted input sum.]
2- Binary threshold neurons
• First compute the weighted sum, then output 1 if it reaches the
threshold θ = −b:

  z = b + Σ_i x_i w_i

  y = 1 if z ≥ 0, and 0 otherwise
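A minimal sketch of the binary threshold neuron, with illustrative weights and threshold chosen here for the example:

```python
import numpy as np

def threshold_neuron(w, x, b):
    """z = b + sum_i x_i w_i; fire (output 1) when z >= 0,
    i.e. when the weighted sum reaches the threshold theta = -b."""
    z = b + np.dot(x, w)
    return 1 if z >= 0 else 0

# With w = [1, 1] and theta = 1.5 (b = -1.5), the neuron
# fires only when both inputs are on.
w, b = np.array([1.0, 1.0]), -1.5
on_on  = threshold_neuron(w, np.array([1.0, 1.0]), b)   # z = 0.5  -> 1
on_off = threshold_neuron(w, np.array([1.0, 0.0]), b)   # z = -0.5 -> 0
```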
3- Rectified linear neurons
• They compute the same weighted sum z, but the output is linear above
zero:

  y = z if z > 0, and 0 otherwise
4- Sigmoid neurons
• They give a real-valued output that is a smooth and bounded function
of their total input.
• Typically they use the logistic function:

  z = b + Σ_i x_i w_i

  y = 1 / (1 + e^(−z))

• They have positive derivatives, which makes learning easy.

[Figure: y rises smoothly from 0 to 1, passing through 0.5 at z = 0.]
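The logistic function and its derivative can be sketched directly (the evaluation points below are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: smooth, bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """The derivative sigma(z) * (1 - sigma(z)) is always positive,
    which is what makes gradient-based learning easy."""
    s = sigmoid(z)
    return s * (1.0 - s)

mid_point = sigmoid(0.0)          # 0.5, the midpoint shown in the figure
slope = sigmoid_derivative(0.0)   # the maximum slope, 0.25
```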
• In classification, the network has one output neuron per
class, and the target values are 1 for the correct class, and 0
otherwise.
Cont.
• Each output neuron will produce a value between 0 & 1, for example
0.3, 0.7, 0.8, 0.9, …; note that these values need not sum to 1.
• To solve this problem, a generalization of the logistic sigmoid was
developed, the softmax activation function.
• The softmax function converts a vector of numbers into a vector of
probabilities proportional to the scale of each vector element.
• Hence, it normalizes the output vector.
  a_j(n) = exp(n_j) / Σ_{i∈group} exp(n_i)   (softmax)

  a(n) = exp(−n²)   (radial basis)

  a(n) = 1 − |n|, if −1 ≤ n ≤ 1
         0, otherwise   (triangular basis)
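The softmax normalization can be sketched as follows, reusing the example output values from the earlier slide:

```python
import numpy as np

def softmax(n):
    """a_j = exp(n_j) / sum_i exp(n_i); subtracting the max first is a
    standard trick for numerical stability and does not change the result."""
    e = np.exp(n - np.max(n))
    return e / e.sum()

# The raw output-neuron values from the earlier slide
scores = np.array([0.3, 0.7, 0.8, 0.9])
probs = softmax(scores)   # a proper probability vector: sums to 1
```

The largest raw score keeps the largest probability, so the predicted class is unchanged; only the scale is normalized.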
Learning Algorithm
• The learning algorithm is a prescribed set of well-defined rules for
the solution of a learning problem.
• In every learning algorithm, we must specify the cost function.
• Cost function - a way of using your training data to determine the
parameter values that produce an output function that is as accurate
as possible.
• The Learning paradigm is a model of the environment in which
the neural network operates.
• There are three major learning paradigms.
1- Supervised Learning
• A teacher is present during the learning process & the desired
output is presented.
• Every input pattern is used to train the network.
• The cost function is given by the difference between the
network’s computed output and the expected output (the error).
2- Unsupervised Learning
• There is no teacher.
• No expected output is presented to the network.
• The system undergoes self learning by discovering and adapting to
the structural features in the input patterns.
• The cost function is determined by the task formulation.
• Most applications fall within the domain of estimation problems such
as statistical modeling, compression, filtering, blind source separation
and clustering.
3- Reinforcement Learning
• There is a teacher.
• There is no expected outcome presented to the network.
• The teacher helps by indicating whether a computed output is right or
wrong.
• A reward is given for the right one & a penalty is given for the
wrong one.
• Data is usually not given, but generated by an agent's interactions
with the environment.
Cont.
• Tasks that fall within the paradigm of
reinforcement learning are control problems,
games and other sequential decision making
tasks.
Example
• "Given this data, a friend has a house of 650 square feet - how much
can they expect to get for it?"
There are different approaches that can be used to solve this:
• A straight line through the data
• Maybe $150 000
• A second-order polynomial
• Maybe $200 000
• Each of these approaches represents a way of doing supervised
learning.
Cont.
• So, training data are provided in which the actual price of each
house is known.
• The algorithm uses this to learn to predict the prices of houses in
any other set of data.
• We call this a regression problem because:
• it predicts a continuous-valued output (price);
• the output is not restricted to a discrete set of values.
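The two approaches can be compared directly; the (size, price) pairs below are invented for illustration, since the lecture's dataset is not reproduced here:

```python
import numpy as np

# Hypothetical (size in sq ft, price in $) training pairs; illustrative only.
sizes  = np.array([500.0, 750.0, 1000.0, 1250.0, 1500.0])
prices = np.array([120e3, 160e3, 215e3, 250e3, 310e3])

line = np.polyfit(sizes, prices, deg=1)   # a straight line through the data
quad = np.polyfit(sizes, prices, deg=2)   # a second-order polynomial

est_line = np.polyval(line, 650.0)        # prediction for the 650 sq ft house
est_quad = np.polyval(quad, 650.0)
```

Each fit is one hypothesis about how price depends on size; supervised learning consists of choosing and fitting such a hypothesis from labeled examples.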
Example 2
• The graph shows that we have 5 tumors of each kind.
• We want to find a way to classify whether a tumor is benign or
malignant according to our trained network!
• This is an example of a classification problem
• Classify data into one of two discrete classes - malignant or not.
• In classification problems, we may have a discrete number of
possible values for the output, e.g. 0 - benign, 1 - type 1, 2 - type
2, 3 - type 3.
• In classification problems we can plot data in different ways.
Cont.
• Based on that data, you can try and define separate classes by,
• Drawing a straight line between the two groups
• Using a more complex function to define the two groups.
• Then, when you have an individual with a specific tumor size and
who is a specific age, you can use that information to place them
into one of your classes.
• You might have many features to consider
• Clump thickness, Uniformity of cell size, Uniformity of cell
shape…etc.
Review Questions
1. The amount of rain that falls is usually measured in either (mm)
or inches. Suppose you use a learning algorithm to predict how
much rain will fall next week. Is this a classification or a
regression problem?
2. Suppose you are working on stock market prediction. Typically
tens of millions of shares of Microsoft stock are traded (i.e.,
bought/sold) each day. You would like to predict the number of
Microsoft shares that will be traded tomorrow. Is this a
classification or a regression problem?
Cont.
• The number of training examples is 47 & the first four are shown
below:
Cont.
• The training set is passed to the learning algorithm.
• This algorithm accepts an input, such as the size of a new house, &
outputs a hypothesis, denoted by h, that estimates the value of y.
• The hypothesis h is given by:
• h_θ(x) = θ0 + θ1·x, written h(x) for short
• Note that y is a linear function of x
• θi are the parameters
• θ0 is the intercept (the value at x = 0)
• θ1 is the gradient (slope)
• This kind of function is a linear regression with one variable.
• Also called univariate linear regression
Cont.
• Therefore, we need to optimize our system by solving
a minimization problem, i.e. minimize the difference between h(x)
and y for each example.
• One of the most commonly used cost functions is the squared error
cost function, given by the following:

  J(θ0, θ1) = (1 / 2m) Σ_{i=1}^{m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
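The hypothesis and the squared error cost can be sketched together; the toy data below is invented so that the correct parameters are obvious:

```python
import numpy as np

def h(theta0, theta1, x):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def J(theta0, theta1, X, Y):
    """Squared error cost: (1 / 2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(X)
    return np.sum((h(theta0, theta1, X) - Y) ** 2) / (2 * m)

# Toy data lying exactly on y = x, so theta0 = 0, theta1 = 1 gives J = 0.
X = np.array([1.0, 2.0, 3.0])
Y = np.array([1.0, 2.0, 3.0])
perfect = J(0.0, 1.0, X, Y)
off     = J(0.0, 0.0, X, Y)   # (1 + 4 + 9) / (2 * 3)
```

Minimizing J over (θ0, θ1) is exactly the optimization problem described above.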
Supervised Learning
• A programmer specifies the number of units in each layer and the
connectivity between units, so the only unknown is the set of weights
associated with the connections.
Learning Rules:
• A learning rule is a procedure for modifying the weights and biases
of a network, in order to train the network to perform a
particular task.
Learning Rules
• These learning types may use different learning rules, such as:
• Hebbian,
• Gradient descent,
• Competitive,
• Stochastic.
• Hence, the learning types are categorized even further according
to the rule used.
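As a minimal sketch of one of these rules, the Hebbian update strengthens each weight in proportion to the product of its input and the neuron's output; the learning rate eta and the values below are illustrative, not from the lecture:

```python
import numpy as np

def hebbian_update(w, x, eta=0.1):
    """Hebbian rule for a linear neuron: delta_w = eta * y * x,
    so weights on active inputs that drive the output grow."""
    y = np.dot(w, x)          # linear neuron output
    return w + eta * y * x

w0 = np.array([0.5, 0.5])
x  = np.array([1.0, 0.0])     # only the first input is active
w1 = hebbian_update(w0, x)    # only the first weight changes
```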
Perceptrons (cont.)
• It is made up of only input neurons and output neurons
• Input neurons, usually, have two states: ON and OFF
• A simple threshold activation function is used for the output
neurons.
• It uses supervised training
• Example:
Cont.
• Based on that simple example, we can now develop the learning
rule for a perceptron.
• The perceptron, usually, uses a hard limit activation function as
shown in the following figures.
Perceptrons
One perceptron neuron
A layer of Perceptrons
Multilayer Perceptron
Cont.
• The perceptron learning rule (e.g. learnp in Matlab) calculates desired
changes to the perceptron's weights and biases given an input vector p,
and the associated error e.
• The target vector t must contain values of either 0 or 1, as perceptrons
(with hardlim transfer functions) can only output such values.
• With each additional epoch (i.e. each execution of learnp), the
perceptron has a better chance of getting closer to the target
values, & hence of converging.
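The rule can be sketched as a plain training loop; this mirrors the update learnp computes (e = t − a, then w ← w + e·p and b ← b + e), with an OR gate as made-up training data:

```python
import numpy as np

def hardlim(z):
    """Hard-limit transfer function: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def train_perceptron(P, T, epochs=10):
    """For each pair (p, t): e = t - hardlim(w.p + b),
    then w <- w + e*p and b <- b + e."""
    w, b = np.zeros(P.shape[1]), 0.0
    for _ in range(epochs):
        for p, t in zip(P, T):
            e = t - hardlim(np.dot(w, p) + b)
            w, b = w + e * p, b + e
    return w, b

# OR gate: linearly separable, so the rule is guaranteed to converge.
P = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 1])
w, b = train_perceptron(P, T)
preds = [hardlim(np.dot(w, p) + b) for p in P]
```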
Example
• The decision boundary is given by:

  n = p1 + p2 − 1 = 0

• Set p1 = 0, then p2 = 1
• Set p2 = 0, then p1 = 1
• Consider the input p = [2, 0]ᵀ
The decision boundary, in blue, is orthogonal to the weight vector 1w.
That means that our classes are linearly separable.
Perceptron Implementation
• Orthogonal means that the weight vector is at a 90° angle to the
decision boundary.
• Example: implement an AND logic gate.
• Answer: It has the following input/target pairs:
Cont.
• First we need to select a decision boundary.
• Then, we choose a weight vector orthogonal to the
decision boundary.
• Then we choose any weight vector that lies along this orthogonal
direction, for example:
Cont.
• The process of finding new weights (and biases) can be repeated
until there are no errors.
• Note that the perceptron learning rule is guaranteed to converge in
a finite number of steps for all problems that can be solved by a
perceptron.
• These include all classification problems that are "linearly
separable".
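Following the recipe above, one valid hand-picked solution for the AND gate (not the only one) takes the weight vector [2, 2], orthogonal to the decision boundary 2·p1 + 2·p2 − 3 = 0, with bias b = −3; these particular numbers are a choice made for this sketch:

```python
import numpy as np

w = np.array([2.0, 2.0])   # orthogonal to the chosen decision boundary
b = -3.0                   # places the boundary between (1,1) and the rest

def perceptron(p):
    """Hard-limit perceptron: 1 if w.p + b >= 0, else 0."""
    return 1 if np.dot(w, p) + b >= 0 else 0

inputs  = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
outputs = [perceptron(np.array(p)) for p in inputs]   # AND truth table
```

Since AND is linearly separable, the perceptron learning rule would also find such weights automatically in a finite number of steps.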