
Empirical Risk Minimization

Dr. Jainendra Shukla


jainendra@iiitd.ac.in
B-501, R & D Block
Computer Science and Engineering
The Problem
Learning Process
Three components:

1. A generator of random vectors x, drawn i.i.d. from a fixed but
unknown distribution P(x).

2. A supervisor (an oracle/astrologer) which returns an output vector y
for every input vector x, according to the conditional distribution
P(y|x), also fixed but unknown.

3. A learning machine capable of implementing a set of functions
f(x, w), w ∈ W.
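A minimal sketch of this setup, assuming for illustration a scalar x, a
noisy linear supervisor, and a class of linear functions f(x, w) = w·x
(none of these specific choices come from the slides):

import numpy as np

rng = np.random.default_rng(0)

# 1. Generator: draws x i.i.d. from a fixed but unknown P(x)
#    (assumed standard normal purely for illustration).
def generator(n):
    return rng.normal(size=n)

# 2. Supervisor: returns y for each x according to P(y|x)
#    (assumed y = 2x + noise; the learner never sees this rule).
def supervisor(x):
    return 2.0 * x + rng.normal(scale=0.5, size=x.shape)

# 3. Learning machine: implements the set of functions f(x, w) = w * x,
#    indexed by the parameter w in W (here W = the real line).
def f(x, w):
    return w * x

x = generator(5)
y = supervisor(x)
print(list(zip(np.round(x, 2), np.round(y, 2))))  # training samples (x_i, y_i)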
Learning Process
● The learning problem is to choose, from the given set of
functions, the one which best approximates the supervisor’s
response.

● The selection is based on the training samples

(xᵢ, yᵢ), i = 1, 2, …, l
Loss Functions
● To choose the best function, it makes sense to minimize a loss
(or cost, or discrepancy) between the response of the supervisor
and the response of the learning machine on an input-output pair (x, y):

L(y, f(x))

● We want to minimize this loss over all training samples

(xᵢ, yᵢ), i = 1, 2, …, l
Loss Examples
Loss functions: L1 Loss
Loss functions: Squared Loss (L2 Loss)
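The slides name the L1 and squared (L2) losses; a small sketch of both,
using NumPy only for vectorized illustration:

import numpy as np

def l1_loss(y, y_pred):
    # L1 (absolute) loss: L(y, f(x)) = |y - f(x)|
    return np.abs(y - y_pred)

def l2_loss(y, y_pred):
    # Squared (L2) loss: L(y, f(x)) = (y - f(x))^2
    return (y - y_pred) ** 2

y      = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 1.0])
print(l1_loss(y, y_pred))   # [0.5 0.  2. ]
print(l2_loss(y, y_pred))   # [0.25 0.   4.  ]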
Loss Functions - Classification
● Binary classification with equal weights on misclassification
○ Minimize a 0-1 loss

● Classification with unequal weights on misclassification
○ Minimize a weighted loss, e.g. a 0-107-500 loss (0 for a correct
label, 107 for one type of error, 500 for the other); see the sketch below
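A brief sketch of the equal-weight 0-1 loss and a cost-sensitive variant;
the costs 107 and 500 follow the slide’s 0-107-500 example, and the
binary 0/1 label encoding is an assumption:

import numpy as np

def zero_one_loss(y, y_pred):
    # 0-1 loss: 0 for a correct label, 1 for any misclassification.
    return (y != y_pred).astype(float)

def weighted_loss(y, y_pred, cost_fp=107.0, cost_fn=500.0):
    # Cost-sensitive loss for labels in {0, 1}:
    # 0 if correct, cost_fp for a false positive, cost_fn for a false negative.
    loss = np.zeros_like(y, dtype=float)
    loss[(y == 0) & (y_pred == 1)] = cost_fp
    loss[(y == 1) & (y_pred == 0)] = cost_fn
    return loss

y      = np.array([0, 1, 1, 0])
y_pred = np.array([0, 0, 1, 1])
print(zero_one_loss(y, y_pred))  # [0. 1. 0. 1.]
print(weighted_loss(y, y_pred))  # [  0. 500.   0. 107.]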
Risk Minimization
● The risk of a classification/regression function f is its expected loss:

R(f) = E[L(y, f(x))] = ∫∫ L(y, f(x)) p(x, y) dx dy

● This aggregates, over all possible points/examples, the probability of
observing that example multiplied by the corresponding loss.
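Because p(x, y) is unknown in practice, R(f) cannot be computed directly;
the sketch below only makes the integral concrete by sampling from a
hypothetical joint distribution and averaging the squared loss (a Monte
Carlo approximation, not part of the slides):

import numpy as np

rng = np.random.default_rng(1)

def f(x, w=1.5):
    return w * x  # one candidate function from the learning machine

# Hypothetical joint distribution p(x, y): x ~ N(0, 1), y = 2x + noise.
n = 100_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)

# R(f) = E[L(y, f(x))] ≈ average squared loss over many samples from p(x, y).
risk_estimate = np.mean((y - f(x)) ** 2)
print(risk_estimate)  # ≈ (2 - 1.5)^2 * Var(x) + 0.5^2 = 0.5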
Empirical Risk Minimization Principle

● Since p(x, y) is unknown, the risk functional is replaced by the
empirical risk functional, i.e. the average loss over the l training samples:

R_emp(f) = (1/l) Σᵢ L(yᵢ, f(xᵢ))

● Learning theory asks the following questions:

○ What are the conditions for consistency of the ERM principle?
○ How fast is the rate of convergence?
○ How can one control the generalization ability (on unseen
examples drawn from P(x, y))?
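A minimal sketch of the ERM principle under the same hypothetical linear
setup: the expected risk is replaced by the average loss on the l training
samples, which is then minimized over w (grid search over a hypothetical
parameter set W is used only to keep the sketch short):

import numpy as np

rng = np.random.default_rng(2)

# l training samples (x_i, y_i) drawn from the unknown P(x, y)
# (simulated here as x ~ N(0, 1), y = 2x + noise, purely for illustration).
l = 50
x = rng.normal(size=l)
y = 2.0 * x + rng.normal(scale=0.5, size=l)

def empirical_risk(w):
    # R_emp(w) = (1/l) * sum_i L(y_i, f(x_i, w)) with squared loss.
    return np.mean((y - w * x) ** 2)

# ERM: pick the w in W that minimizes the empirical risk.
candidates = np.linspace(-5, 5, 1001)
w_hat = candidates[np.argmin([empirical_risk(w) for w in candidates])]
print(w_hat, empirical_risk(w_hat))  # w_hat should land close to the true 2.0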
