In other words, we're saying for all the cases where you already
know the answer, because the answer was given to you,
whatever you have here, the loss should be minimal.
You want them to be very close to each other.
So we are going to write Y_ai minus X_ai.
And I'm going to be using here, again,
the same as in regression, the squared loss.
So this term tells you I want to be very
close to what was given to me.
Plus, in addition to it, I want to do regularization.
The same way as we've done in linear regression,
where we don't want our parameters to become unruly large,
we're keeping their norm close to 0.
So in this case, this is our hyperparameter, lambda.
And we are going to go through all the entries of the matrix X,
look at their squares, and we
want this norm to be minimal.
So if it helps you, you can also write it like this.
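Written out (a sketch, using the 1/2 conventions that appear at the end of this passage), that is:

J(X) = (1/2) * sum over (a,i) in D of (Y_ai - X_ai)^2 + (lambda/2) * sum over all (a,i) of X_ai^2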
So this is our objective that we will try to minimize.
So we go through all possible matrices X and we
want to find the one which actually makes this empirical risk
the smallest.
So how can we do it?
Again, exactly the same way as we've
done in linear regression.
First of all, I want to note that, the way I formulated
the problem, every single entry here
is independent of the others.
Whenever I'm deciding about the preference of the first user
for the first movie and the second movie,
there is no connection between them.
So actually I don't have to keep this sum.
I can independently estimate each one of those X_ai's.
And the difference here would be the following--
whether this X_ai is actually part of this D or not.
Because for all those that are part of the set D,
we will need to look at these two terms.
For those which are not in D, we would only look at this term.
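Concretely (a sketch following the notation above), the per-entry objective splits into two cases:

for (a,i) in D:      J_ai(X_ai) = (Y_ai - X_ai)^2 / 2 + (lambda/2) * X_ai^2
for (a,i) not in D:  J_ai(X_ai) = (lambda/2) * X_ai^2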
So let's just start.
We will make the assumption that the particular X_ai
that I am looking at belongs to D.
So in this case, as I've said, what we are going to do
is take J of X for that a,i and do the same thing
we've done previously, which is just differentiate with respect
to this X_ai.
And there is no sum here, again, because we are
looking at one X_ai at a time.
So we will take (Y_ai minus X_ai) squared divided by 2,
plus lambda over 2 times X_ai squared.
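Carrying that differentiation through (a sketch; setting the derivative to zero gives the closed form):

d/dX_ai [ (Y_ai - X_ai)^2 / 2 + (lambda/2) * X_ai^2 ] = -(Y_ai - X_ai) + lambda * X_ai = 0

which solves to X_ai = Y_ai / (1 + lambda), so each observed entry is pulled toward its given rating but shrunk by the regularizer. A minimal numerical sketch of this per-entry minimization, assuming a NumPy array Y of ratings and a boolean mask marking the entries in D (both names are hypothetical):

import numpy as np

def minimize_per_entry(Y, mask, lam):
    # Each X_ai can be solved for independently, as argued above.
    X = np.zeros_like(Y, dtype=float)
    # For (a,i) in D: -(Y_ai - X_ai) + lam * X_ai = 0  =>  X_ai = Y_ai / (1 + lam)
    X[mask] = Y[mask] / (1.0 + lam)
    # For (a,i) not in D: only (lam/2) * X_ai^2 remains, minimized at X_ai = 0.
    return X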