1 The current weight vectors for a multi-class perceptron are:

W1 = [2, −6, 9], W2 = [3, −5, 9], W3 = [9, 9, −3].

Update the weight vectors with a new training instance with f(x) = (−3, 2, 5), y* = 2.
What are the weight vectors W1, W2, and W3 after the update?

Step 1: Calculate the Output for Each Class

The output (score) for a class is the dot product of that class's weight vector and the
feature vector f(x).

Output for W1: 2×(−3) + (−6)×2 + 9×5

Output for W2: 3×(−3) + (−5)×2 + 9×5

Output for W3: 9×(−3) + 9×2 + (−3)×5


Step 2: Identify the Predicted Class

The predicted class is the one with the highest output. After calculating, we found:

Output for W1: 27

Output for W2: 26

Output for W3: −24

The highest output is from W1, so the predicted class is 1.

Step 3: Update the Weights

The predicted class (1) does not match the true class y* = 2, so the perceptron
subtracts f(x) from the weight vector of the predicted class (W1) and adds f(x) to the
weight vector of the true class (W2); W3 is left unchanged.

Answer

Final Updated Weight Vectors

W1=[5,−8,4]

W2=[0,−3,14]

W3=[9,9,−3]

These are the weight vectors after updating them with the new training instance.
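As a sanity check, here is a minimal Python sketch of the update rule used above (class
indices are 0-based here, so the true class y* = 2 corresponds to index 1):

```python
# Multi-class perceptron update: if the predicted class differs from the
# true class, subtract f(x) from the predicted class's weights and add
# f(x) to the true class's weights; other weight vectors are unchanged.

def multiclass_perceptron_update(weights, fx, y_true):
    """weights: list of weight vectors, fx: feature vector, y_true: 0-based true class."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, fx)) for w in weights]
    y_pred = max(range(len(weights)), key=lambda c: scores[c])
    if y_pred != y_true:
        weights[y_pred] = [w_i - x_i for w_i, x_i in zip(weights[y_pred], fx)]
        weights[y_true] = [w_i + x_i for w_i, x_i in zip(weights[y_true], fx)]
    return weights

W = [[2, -6, 9], [3, -5, 9], [9, 9, -3]]
print(multiclass_perceptron_update(W, [-3, 2, 5], y_true=1))
# [[5, -8, 4], [0, -3, 14], [9, 9, -3]]
```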
2 Given the following information, compute a perceptron weight update for the second
weight (new value for the second weight after this data point is processed during the
learning of the model):
Eta (learning rate): 0.1
Weights: [-1, 1, 0.1, 0.9]
Current training data point: [3, 1, -0.9, 1]
True label: 1

In the question, it is asked to compute a perceptron weight update.

Let's see how the perceptron weight update is generally done.
Suppose we are given
Learning rate = a
Current Weights = [w1, w2, w3, w4, ..., wn]
Current training data point = [d1, d2, d3, d4, ..., dn]
True label = y
Predicted label = p
Then each weight is updated with the following rule (note that the old weight is part
of the update):
wi (updated) = wi + a (y − p) di, for i = 1, ..., n
In the question the given values are as shown below:
Eta (learning rate) = 0.1
=> a = 0.1
Current Weights = [−1, 1, 0.1, 0.9]
There are 4 elements in Current Weights, which means
w1 = −1
w2 = 1
w3 = 0.1
w4 = 0.9
Current training data point = [3, 1, −0.9, 1]
There are 4 elements in the Current training data point, which means
d1 = 3
d2 = 1
d3 = −0.9
d4 = 1
From the Current Weights and Current training data point we can calculate the
Activation as follows:
Activation = w1·d1 + w2·d2 + w3·d3 + w4·d4
Activation = (−3) + 1 + (−0.09) + 0.9
Activation = −1.19
From the Activation we can determine the predicted label using the following condition:
if the Activation is greater than zero, the predicted label is 1; otherwise the
predicted label is 0.
As you can see, the Activation is −1.19, which means the predicted label is 0.
That means p = 0.
In the question, it is given that the True label is 1, which means y = 1.
Now we have all the information needed to compute a perceptron weight update.
In the question, it is asked what the updated value of the second weight is.
The second weight is updated using the following method:
w2 (updated) = w2 + a (y − p) d2 .................... (1)
Given w2 = 1, a = 0.1, y = 1, and d2 = 1.
We computed the value of p as 0.
By putting all these values into equation (1) we get the result shown below:
w2 (updated) = 1 + 0.1 (1 − 0) (1)
w2 (updated) = 1.1
Hence the updated value of the second weight is 1.1.
Explanation:
We solved the problem and found that the updated value of the second weight is 1.1,
which is the answer to the question.
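A minimal Python sketch of this computation, under the same 0/1 prediction convention
used above:

```python
# Binary perceptron weight update for the values worked out above.
eta = 0.1
w = [-1, 1, 0.1, 0.9]
x = [3, 1, -0.9, 1]
y = 1  # true label

activation = sum(w_i * x_i for w_i, x_i in zip(w, x))  # -1.19
p = 1 if activation > 0 else 0                          # predicted label: 0

w_updated = [w_i + eta * (y - p) * x_i for w_i, x_i in zip(w, x)]
print(round(w_updated[1], 4))  # 1.1
```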

3
Given this Perceptron with weights as shown (w_0 = 0.1, w_1 = 0.12, w_2 = −0.97,
w_3 = −0.36, w_4 = −0.54), after using this training example while training with the
Perceptron training rule (weight update rule), what would the new weight of w_2 be?
Note that eta is 0.1.

x_1 (sepal length) | x_2 (sepal width) | x_3 (petal length) | x_4 (petal width) | target (species)
5.1 | 3.5 | 1.4 | 0.2 | +1 (setosa)

[Figure: perceptron diagram with bias input x_0 = 1 and inputs x_1 ... x_4 feeding the
output unit through the weights listed above.]

What would the new weight of w_2 be?

Frank Rosenblatt introduced the "Perceptron rule" in 1957. The perceptron learning rule
was based on the MCP neuron.

MCP neurons, or the McCulloch-Pitts neuron model, is a theory developed in 1943, in
which Warren McCulloch and Walter Pitts proposed a network model based on the
biological human neuron, consisting of axons, soma, etc. These are mapped to the
functioning of the neural network.
The perceptron algorithm changes the weights of the input signals to attain a linear
decision boundary, whose output is +1 or −1. So until the "target" value is reached, the
weights have to be modified while the inputs remain the same.

As you can see, the inputs "x" and weights "w" are fed to a summation function, which
sums up all the input values multiplied by their weights.

Suppose there are n inputs; then the weighted sum is

w0x0 + w1x1 + ... + wnxn

and the output y at each step of the perceptron learning rule is

y = +1 if w0x0 + w1x1 + ... + wnxn + b > 0

y = −1 otherwise

** here b is a "bias" that shifts the boundary away from the origin without depending on
the input values ** (this value is optional)

Now with the values provided the sum is

(1 × 0.1) + (5.1 × 0.12) + (3.5 × −0.97) + (1.4 × −0.36) + (0.2 × −0.54) = −3.295,

which is negative, so the output y after this first learning step is −1.
But our target output is t = +1. So in order to attain the target value t, we have to
modify the weights with the perceptron training rule

Δw_i = η (t − o) x_i

where η is the learning rate or step size, whose value is provided as η = 0.1.

So for w_2 (with x_2 = 3.5, t = +1, o = −1):

Δw_2 = 0.1 × (1 − (−1)) × 3.5 = 0.7

So for the next step the new weight is

w_2 = −0.97 + 0.7 = −0.27

The weight's magnitude has decreased, so that overall the output will move towards the
positive target.
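A small sketch verifying this update, assuming the ±1 output convention described above:

```python
# w_2 update under the perceptron training rule, with output o = -1
# (since the weighted sum -3.295 < 0), target t = +1, and eta = 0.1.
eta, t, o = 0.1, 1, -1
x2, w2 = 3.5, -0.97

delta_w2 = eta * (t - o) * x2   # 0.1 * 2 * 3.5 = 0.7
w2_new = w2 + delta_w2          # -0.27
print(w2_new)
```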

4 The following questions use the gradient descent algorithm. Use a step size of 0.1, and
ϵ = 0.001. a) Consider the function f(x, y) = x² + y². Use the starting value (x0, y0) = (4, 3).
Determine the value of (x2, y2). b) What is the final (x, y) value that minimizes the function
f(x, y)?
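Since the gradient of f(x, y) = x² + y² is (2x, 2y), each step scales the point by 0.8. A
minimal sketch of the first two iterations (the ϵ = 0.001 stopping test is noted in a
comment rather than implemented):

```python
# Gradient descent on f(x, y) = x^2 + y^2 with gradient (2x, 2y),
# step size 0.1, starting at (4, 3).
x, y, step = 4.0, 3.0, 0.1
for i in range(2):
    x, y = x - step * 2 * x, y - step * 2 * y
print(round(x, 2), round(y, 2))  # (x2, y2) = (2.56, 1.92)
# Each iteration multiplies (x, y) by 0.8, so the iterates shrink toward
# the minimizer (0, 0); with eps = 0.001 the loop would stop once the
# change in (x, y) (or the gradient norm) falls below that threshold.
```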
5 Apply the gradient descent algorithm to the following function for 3 iterations, assuming
a learning rate alpha of 0.01 and starting values x = 0 and y = 0:
f(x, y) = (x − 7)² + (y − 3)²
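The same kind of loop works here; a minimal sketch for this function, whose gradient is
(2(x − 7), 2(y − 3)):

```python
# Gradient descent on f(x, y) = (x - 7)^2 + (y - 3)^2,
# alpha = 0.01, starting at (0, 0), 3 iterations.
x, y, alpha = 0.0, 0.0, 0.01
for i in range(3):
    x, y = x - alpha * 2 * (x - 7), y - alpha * 2 * (y - 3)
    print(round(x, 6), round(y, 6))
# Iterates: (0.14, 0.06), (0.2772, 0.1188), (0.411656, 0.176424)
```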
6 Given the following predictions and true labels, calculate the binary cross-entropy loss.
(Use the natural logarithm (log x := ln x) in the calculation of the BCE. For the definition
see Lecture 2.) Predictions: [0.7, 0.1, 0.8, 0.2] True labels: [1, 0, 0, 1]
Answer
Given the provided predictions and true labels, the calculated Binary Cross-Entropy loss
is approximately 0.9202.
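A minimal sketch of the BCE computation (averaged over the four examples, natural log),
which reproduces this value:

```python
import math

# Binary cross-entropy: -(1/N) * sum of y*ln(p) + (1-y)*ln(1-p).
preds = [0.7, 0.1, 0.8, 0.2]
labels = [1, 0, 0, 1]

bce = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
           for p, y in zip(preds, labels)) / len(preds)
print(round(bce, 4))  # 0.9202
```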
7

Consider a classification problem with two features x_1 and x_2. In this question, we will
walk through a single step of stochastic gradient descent on logistic regression with
cross-entropy loss. Say that for a given example the target value Y = 1 is given and the
two feature values are x_1 = 3 and x_2 = 2. Assume that the initial weights and bias (we
call them together a vector w with three dimensions w1, w2, and b) are set to 0 and the
learning rate is alpha = 0.1: w1 = w2 = b = 0, alpha = 0.1 (learning rate). What would the
vector w = [w1, w2, b] be after one step of update of w?
- [w1=0, w2=0, b=0]
- [w1=0.15, w2=0.1, b=0.05]
- [w1=-1.5, w2=-1.0, b=-0.5]
- [w1=-0.15, w2=-0.1, b=-0.05]
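A minimal sketch of this single SGD step: with all-zero weights the sigmoid outputs 0.5,
and the cross-entropy gradient is (ŷ − y)·x for the weights and (ŷ − y) for the bias,
which selects the option [w1=0.15, w2=0.1, b=0.05]:

```python
import math

# One SGD step for logistic regression with cross-entropy loss,
# under the setup above: y = 1, x = (3, 2), w1 = w2 = b = 0, alpha = 0.1.
alpha, y = 0.1, 1
x1, x2 = 3, 2
w1 = w2 = b = 0.0

y_hat = 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b)))  # sigmoid(0) = 0.5
# Gradient of the loss: (y_hat - y) * x_i for weights, (y_hat - y) for bias.
w1 -= alpha * (y_hat - y) * x1  # 0.15
w2 -= alpha * (y_hat - y) * x2  # 0.10
b -= alpha * (y_hat - y)        # 0.05
print(w1, w2, b)  # [w1=0.15, w2=0.1, b=0.05]
```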
8

9

10

Backpropagation. The following neural network has 3 layers. The activation function in
the hidden layers is ReLU and the activation function in the output layer is sigmoid. (a)
Compute the output values of all nodes in forward propagation when the network is
given the input x1 = 0, x2 = 2, with the desired output y = 0. All the weights are also
provided above.
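The figure carrying the weights did not survive extraction, so they cannot be reproduced
here; the sketch below only illustrates the shape of the forward pass for such a network,
with clearly hypothetical placeholder weights:

```python
import math

# Forward propagation sketch: ReLU hidden layer, sigmoid output.
# NOTE: the weight values below are hypothetical placeholders; the
# original figure's weights are not recoverable from the text.
def relu(a):
    return max(0.0, a)

def sigmoid(a):
    return 1 / (1 + math.exp(-a))

x1, x2 = 0.0, 2.0
W_hidden = [[0.1, 0.2], [0.3, -0.4]]  # placeholder hidden weights
b_hidden = [0.0, 0.0]                 # placeholder hidden biases
w_out, b_out = [0.5, -0.6], 0.0       # placeholder output weights/bias

h = [relu(W_hidden[j][0] * x1 + W_hidden[j][1] * x2 + b_hidden[j]) for j in range(2)]
y_hat = sigmoid(sum(wj * hj for wj, hj in zip(w_out, h)) + b_out)
print(h, y_hat)  # hidden node outputs, then the sigmoid output
```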
11

Use the backpropagation method to update the weights and biases of the following
neural network with one input, one hidden layer of two neurons, and one output. You are
required to do the forward pass and the backward pass four times by applying the inputs
one by one (i.e. one complete epoch where you apply each input-output pair once) to
update the network weights. The activation function of the hidden and output neurons is
f(a) = 0.5a and the learning rate is 0.2 (η = 0.2). Report the local gradients.

w₁ = 0.5, w₂ = 0.2, w₃ = 0.1, w₄ = 0.3; b₁ = 0.1, b₂ = 0.2, b₃ = 0.1

Input (i) | Output (o)
1 | 0
1 | 1
0 | 1
0 | 0

[Figure: network with input i feeding hidden neurons h1 and h2 (weights w1, w2; biases
b1, b2), which feed the output o (weights w3, w4; bias b3).]
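A sketch of one epoch under the values reconstructed above. The question does not state
the loss function; a squared-error loss E = ½(target − o)² is assumed here, which is a
common choice for this kind of exercise:

```python
# One epoch (four forward/backward passes) for the reconstructed network:
# input i -> hidden h1, h2 (weights w1, w2; biases b1, b2) -> output o
# (weights w3, w4; bias b3), activation f(a) = 0.5a, so f'(a) = 0.5.
# ASSUMPTION: squared-error loss E = 0.5 * (target - o)**2.
eta = 0.2
w1, w2, w3, w4 = 0.5, 0.2, 0.1, 0.3
b1, b2, b3 = 0.1, 0.2, 0.1

def train_step(i, target):
    global w1, w2, w3, w4, b1, b2, b3
    # Forward pass
    h1 = 0.5 * (w1 * i + b1)
    h2 = 0.5 * (w2 * i + b2)
    o = 0.5 * (w3 * h1 + w4 * h2 + b3)
    # Backward pass: local gradients (deltas)
    delta_o = (o - target) * 0.5   # output local gradient
    delta_h1 = delta_o * w3 * 0.5  # hidden local gradients
    delta_h2 = delta_o * w4 * 0.5
    # Weight and bias updates
    w3 -= eta * delta_o * h1; w4 -= eta * delta_o * h2; b3 -= eta * delta_o
    w1 -= eta * delta_h1 * i; w2 -= eta * delta_h2 * i
    b1 -= eta * delta_h1;     b2 -= eta * delta_h2
    return o, (delta_o, delta_h1, delta_h2)

# Apply the four input-output pairs once, as reconstructed above.
for i, target in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    o, deltas = train_step(i, target)
    print(f"i={i} target={target} o={o:.4f} local gradients={deltas}")
```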
12

Given the following predictions and true labels, calculate the binary cross-entropy loss.
(Use the natural logarithm (log x := ln x) in the calculation of the BCE. For the definition
see Lecture 2.) Predictions: [0.7, 0.1, 0.8, 0.2] True labels: [1, 0, 0, 1]

Explanation:
Binary Cross-Entropy (BCE) loss measures the difference between the true binary labels
and the predicted probabilities. In the given data, we compare each prediction's
likelihood with the actual label to compute the overall loss; here the BCE loss is
approximately 0.9202, indicating how accurate the given predictions are compared to
the true labels.

Answer
Given the provided predictions and true labels, the calculated Binary Cross-Entropy loss
is approximately 0.9202.

13 Consider a multi-class classification problem with four classes, A, B, C, and D. To solve the
problem, you train a multi-class classifier, which outputs the probabilities of each class. Consider
a data example whose true class is B, and the probabilities produced by the classifier when
applied to this data example are P(A) = 0.1, P(B) = 0.5, P(C) = 0.2, and P(D) = 0.2. What is the
cross-entropy loss for this data example?
-(log(0.1)+log(0.5)+log(0.2)+log(0.2))
0.5
-log(0.5)
-log(0.1)
Step 1
Understand the Problem
You have a multi-class classification problem with four classes: A, B, C, and D. The true
class for a specific data example is B. The classifier outputs probabilities for each class:
P(A)=0.1
P(B)=0.5
P(C)=0.2
P(D)=0.2
Explanation:
The cross-entropy loss measures the dissimilarity between the predicted and true
probability distributions. It penalizes deviation from the actual class probabilities. The
formula sums, over the classes, the product of the true probabilities and the logarithm of
the predicted probabilities, with a negative sign: L = −Σᵢ yᵢ log(pᵢ).

Step 2
Define True Probability Distribution Vector
Since the true class is B, the true probability distribution vector y is [0,1,0,0]. This is
known as "one-hot encoding," where the element corresponding to the true class is 1,
and the others are 0.
Explanation:
For the data example with true class B, represent the true probability distribution vector y
as [0, 1, 0, 0]. Given the classifier outputs P(A) = 0.1, P(B) = 0.5, P(C) = 0.2, P(D) = 0.2,
these are the predicted probabilities pᵢ.

Step 3
Compute the Loss
With a one-hot y, every term in −Σᵢ yᵢ log(pᵢ) vanishes except the one for the true class,
so the loss is −log(0.5) ≈ 0.693. The correct option is therefore −log(0.5).
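A minimal sketch confirming that with a one-hot target the cross-entropy reduces to
−log of the true class's probability:

```python
import math

# Cross-entropy for a one-hot true distribution: only the true class's
# term survives, so the loss is -log(p_true).
y = [0, 1, 0, 0]          # one-hot vector for true class B
p = [0.1, 0.5, 0.2, 0.2]  # classifier probabilities

ce = -sum(yi * math.log(pi) for yi, pi in zip(y, p))
print(round(ce, 4))  # 0.6931, i.e. -log(0.5)
```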
14

15
