Summary Lecture 12


Benedikt Willecke
IES19372

If we want a likelihood for multiple classes, we can use the SoftMax activation function. The equations for that are given in the lecture. Basically, we compute the exponential function with the score s as exponent and divide by the sum of the exponentiated scores of all classes.
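A minimal sketch of that computation in Python (the use of NumPy and the function name are my own choice, not from the lecture):

```python
import numpy as np

def softmax(scores):
    # Subtracting the maximum score keeps the exponentials numerically stable
    # without changing the resulting probabilities.
    exps = np.exp(scores - np.max(scores))
    # Each exponentiated score is divided by the sum over all classes.
    return exps / np.sum(exps)

# Example: three class scores mapped to probabilities that sum to 1.
print(softmax(np.array([2.0, 1.0, 0.1])))  # roughly [0.659, 0.242, 0.099]
```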
The Hinge Loss penalizes all wrong scores and doesn’t penalize the right ones. For this we take the maximum of 0 and 1 - y*s.
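As a small sketch, assuming labels y in {-1, +1} and a raw score s (my own convention, not stated in the lecture):

```python
def hinge_loss(y, s):
    # No penalty once the correct side is scored with a margin of at least 1,
    # otherwise the loss grows linearly.
    return max(0.0, 1.0 - y * s)

print(hinge_loss(+1, 2.0))   # 0.0 -> correct and beyond the margin, no penalty
print(hinge_loss(+1, 0.3))   # 0.7 -> correct but inside the margin
print(hinge_loss(-1, 0.3))   # 1.3 -> wrong side, penalized
```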
The Log Likelihood Loss also gives us a probability for each of the possibilities, but in its gradient everything is 0 except for the entry of the maximum probability. The Cross-entropy Loss is basically the same as the Log Likelihood Loss, with the distinction that the gradients of the other probabilities are (mostly) not 0. This means that all probabilities get penalized, which can give us more consistent results: we don’t only maximize the probability of the correct class, we also minimize the wrong probabilities.
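A sketch of the Cross-entropy Loss on top of the SoftMax probabilities (the concrete scores and the one-hot target are my own example):

```python
import numpy as np

scores = np.array([2.0, 1.0, 0.1])
target = np.array([1.0, 0.0, 0.0])   # one-hot: class 0 is the correct one

probs = np.exp(scores - scores.max())
probs /= probs.sum()                 # SoftMax probabilities

# The loss value itself only uses the probability of the correct class ...
loss = -np.sum(target * np.log(probs))
print(loss)

# ... but its gradient with respect to the scores touches every class,
# which is the "all probabilities get penalized" effect described above.
grad = probs - target
print(grad)
```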
For the regression loss we just take the square of the true value minus our prediction.
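As a tiny sketch with made-up numbers:

```python
def squared_error(y_true, y_pred):
    # Square of the solution minus our prediction.
    return (y_true - y_pred) ** 2

print(squared_error(3.0, 2.5))  # 0.25
```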
In neural networks we have several layers of weights and biases with activation functions in between. There are several activation functions to choose from, with ReLU being the most popular.
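A minimal sketch of a two-layer forward pass with ReLU in between (the layer sizes and random weights are placeholders I chose, not values from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # ReLU: keep positive values, set negative ones to 0.
    return np.maximum(0.0, x)

# Two layers of weights and biases with an activation function in between.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

x = np.array([0.5, -1.0, 2.0])      # input
hidden = relu(W1 @ x + b1)          # first layer + ReLU
output = W2 @ hidden + b2           # output layer (scores)
print(output)
```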
In order to optimize all these variables, we use Backpropagation. For this we need the gradients. Because one layer depends on the previous one, we use the chain rule to compute them. Let’s say we have layers A, B and C, where C is our output layer. If we want to see how A affects the output, we calculate dC/dB * dB/dA. This has to be done for every variable in every layer except the input variables. Because this is very resource intensive, we compute it on just a sample of the data, a batch, whose size is the batch size.
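A sketch of the chain rule idea with the layers A, B, C from above, written for single numbers so the product dC/dB * dB/dA is easy to see (the concrete functions are my own toy example):

```python
# Toy chain: A -> B -> C with B = 3*A and C = B**2.
def forward(a):
    b = 3.0 * a
    c = b ** 2
    return b, c

a = 2.0
b, c = forward(a)

# Local gradients of each step with respect to its input.
dC_dB = 2.0 * b          # derivative of B**2
dB_dA = 3.0              # derivative of 3*A

# Chain rule: how A affects the output C.
dC_dA = dC_dB * dB_dA
print(dC_dA)             # 36.0, matching C = 9*A**2, so dC/dA = 18*A = 36
```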
