Supervised Learning Networks
$z = \sum_i w_i x_i - \theta \ge 0$
Finally, we can replace $-\theta$ with $b$ to indicate
“bias”, move the $b$ to the front, and we obtain:
$z = b + \sum_i w_i x_i \ge 0$
How Does a Perceptron Work?
Now, the bias $b$ can be learned along with
the weights for the input values.
If you omit the “bias term”, the perceptron won’t
be able to learn solutions whose decision boundary
does not pass through the origin of the R-dimensional input space.
The threshold function for the perceptron is
defined as:
$\hat{y} = f(z) = \begin{cases} +1, & \text{if } z > 0 \\ -1, & \text{otherwise} \end{cases}$
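As a concrete illustration, here is a minimal sketch of the aggregation and threshold steps in Python; the weights, bias, and input values are made-up numbers, not from the slides:

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Aggregate the inputs, then apply the hard-limit threshold."""
    z = b + np.dot(w, x)        # z = b + sum_i w_i * x_i
    return 1 if z > 0 else -1   # f(z): +1 if z > 0, -1 otherwise

# Made-up example values
w = np.array([0.5, -0.3])
b = 0.1
x = np.array([1.0, 2.0])
print(perceptron_predict(x, w, b))  # z = 0.1 + 0.5 - 0.6 = 0.0, so output is -1
```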
Perceptron Learning Procedure
• Here we describe what Rosenblatt defined as an error-corrective reinforcement learning procedure:
– Compute the mismatch between the obtained value and the
expected value for the training example
– If the obtained and expected values match, do nothing
– If the obtained and expected values do not match, compute
the difference or delta between those values
– Use the delta value to update the weights of the network
Formally,
$w_{k+1} = w_k + \Delta w_k$
The $\Delta w_k$ is computed as:
$\Delta w_k = \eta \, (y - \hat{y}) \, x_k$
where:
$w_k$ is the weight vector for case $k$
$\eta$ is the learning rate
$y$ is the actual value (true class label)
$\hat{y}$ is the predicted value (predicted class label)
$x_k$ is the vector of inputs for case $k$
The learning rate facilitates the training process by
scaling the delta used to update the weights.
This means that instead of applying the full correction
$(y - \hat{y})x_k$ at once, we incorporate only a proportion
of the error into each update, which makes the learning
process more stable.
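A minimal sketch of this learning procedure in Python, assuming NumPy and ±1 class labels; the dataset (logical AND) and the learning rate are made-up illustrations:

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, epochs=10):
    """Rosenblatt's error-corrective rule: w <- w + eta * (y - y_hat) * x."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_k, y_k in zip(X, y):
            y_hat = 1 if b + np.dot(w, x_k) > 0 else -1
            delta = eta * (y_k - y_hat)  # zero when the prediction is correct
            w += delta * x_k             # weights change only on errors
            b += delta                   # bias is learned like any other weight
    return w, b

# Made-up linearly separable data: logical AND with -1/+1 labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(w, b)  # a separating line for AND
```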
Schematic Representation of the Perceptron with the Learning Procedure
Multilayer Perceptron
Limitations of Perceptron Model
• The output of a perceptron can only be a
binary value (+1 or −1) due to the hard-limit
transfer function.
• The perceptron can only classify linearly
separable sets of input vectors; XOR, for example,
is not linearly separable, as the sketch below illustrates.
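A hedged sketch, reusing the update rule from above, showing the perceptron failing to converge on XOR (the learning rate and epoch cap are made-up choices):

```python
import numpy as np

# XOR with -1/+1 labels is not linearly separable,
# so the perceptron keeps making errors in every epoch.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

w, b, eta = np.zeros(2), 0.0, 0.1
for epoch in range(50):
    errors = 0
    for x_k, y_k in zip(X, y):
        y_hat = 1 if b + np.dot(w, x_k) > 0 else -1
        if y_hat != y_k:
            w += eta * (y_k - y_hat) * x_k
            b += eta * (y_k - y_hat)
            errors += 1
    if errors == 0:
        break  # never reached for XOR
print(errors)  # always >= 1: no separating line exists
```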
ADALINE (Adaptive Linear Neuron)
• The ADALINE was introduced shortly after Rosenblatt’s
perceptron by Bernard Widrow and Ted Hoff.
• The main difference between the perceptron and the
ADALINE is that the latter works by minimizing the
mean squared error of the predictions of a linear
function.
• This means that the learning procedure is based on
the outcome of a linear function rather than on the
outcome of a threshold function as in the perceptron.
• Mathematically, learning from the output of a
linear function enables the minimization of a
continuous cost or loss function.
• The cost function is a measure of the overall
badness (or goodness) of the network
prediction.
ADALINE: Mathematical Formalization
• Mathematically, the ADALINE is described
by:
– A linear function that aggregates the input signal
– A learning procedure to adjust connection weights
• The linear aggregation function is the same as
in the perceptron:
$\hat{y} = b + \sum_i w_i x_i$
ADALINE: Threshold Decision Function
When dealing with a binary classification
problem, we still use a threshold function, as in the
perceptron, by taking the sign of the linear
function as:
$\hat{y}' = f(\hat{y}) = \begin{cases} +1, & \text{if } \hat{y} > 0 \\ -1, & \text{otherwise} \end{cases}$
where $\hat{y}$ is the output of the linear function.
Perceptron Vs. ADALINE
• The perceptron updates the weights by computing
the difference between the expected and predicted
class labels.
• In other words, the perceptron always compares
+1 or -1 (predicted values) to +1 or -1 (expected
values).
• An important consequence of this is that the
perceptron only learns when errors are made.
• In contrast, the ADALINE computes the
difference between the expected class value $y$ (+1
or −1) and the continuous output value $\hat{y}$ from the
linear function, which can be any real number.
• This means that the ADALINE can learn even
when no classification mistake has been made.
• Since the ADALINE learns on every case while the
perceptron learns only after errors, the ADALINE
will typically find a solution faster than the perceptron
for the same problem.
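A minimal sketch of the ADALINE update (the Widrow-Hoff/LMS rule) under the same assumptions as the perceptron sketch above; note that the delta uses the continuous linear output, so the weights move even on correctly classified cases:

```python
import numpy as np

def train_adaline(X, y, eta=0.1, epochs=100):
    """LMS rule: w <- w + eta * (y - y_lin) * x, where y_lin = b + w.x."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_k, y_k in zip(X, y):
            y_lin = b + np.dot(w, x_k)   # continuous output, not thresholded
            delta = eta * (y_k - y_lin)  # nonzero even when sign(y_lin) == y_k
            w += delta * x_k
            b += delta
    return w, b

# Same made-up AND dataset as before
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_adaline(X, y)
print(np.where(X @ w + b > 0, 1, -1))  # thresholded predictions: [-1 -1 -1  1]
```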
The ADALINE Error Computation
• In a single iteration, the error in the ADALINE
is calculated as $(y - \hat{y})^2$, in words, by squaring
the difference between the expected value and
the predicted value.
• This process of comparing the expected and
predicted values is repeated for all cases, $j = 1$ to
$j = n$, in a given dataset.
• Once we add the squared differences for the
entire dataset and divide by the total number of cases $n$,
we obtain the mean squared error (MSE):
$MSE = \frac{1}{n} \sum_{j=1}^{n} \left( y_j - \hat{y}_j \right)^2$
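A small sketch of the MSE computation, with made-up expected values and linear outputs:

```python
import numpy as np

y_true = np.array([1, -1, 1, -1])         # expected class values
y_lin = np.array([0.8, -0.4, 0.3, -1.2])  # continuous linear outputs (made up)

mse = np.mean((y_true - y_lin) ** 2)      # (1/n) * sum_j (y_j - y_hat_j)^2
print(mse)  # 0.2325
```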
The ADALINE Error Surface