Linear Regression by Maximum Likelihood, Maximum A Posteriori and Bayesian Learning


CS480/680

Lecture 5: May 22, 2019


Linear Regression by Maximum
Likelihood, Maximum A Posteriori and
Bayesian Learning
[B] Sections 3.1 – 3.3, [M] Chapt. 7

University of Waterloo, CS480/680 Spring 2019, Pascal Poupart


Noisy Linear Regression
• Assume $y$ is obtained from $x$ by a deterministic function $f$ that has been perturbed (i.e., a noisy measurement):
$$y = f(x; w) + \epsilon$$

• Gaussian noise: $\epsilon \sim N(0, \sigma^2)$
$$\Pr(y \mid \bar{x}, w, \sigma) = N(y;\, w^T \bar{x},\, \sigma^2)$$
$$\Pr(\mathbf{y} \mid \bar{X}, w, \sigma) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-\frac{(y_n - w^T \bar{x}_n)^2}{2\sigma^2}}$$
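As a concrete illustration of this noise model, here is a minimal NumPy sketch that samples a dataset with $y = w^T \bar{x} + \epsilon$ and evaluates the Gaussian log-likelihood. The weights, noise level, sample size, and variable names (w_true, X_bar, etc.) are made-up illustration values, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up illustration values: true weights (last entry = bias), noise std, sample size.
w_true = np.array([1.0, -2.0, 0.5])
sigma = 0.3
N = 50

X = rng.uniform(-1, 1, size=(N, 2))
X_bar = np.hstack([X, np.ones((N, 1))])          # augmented inputs x̄ = (x, 1)
y = X_bar @ w_true + rng.normal(0.0, sigma, N)   # y = w^T x̄ + ε,  ε ~ N(0, σ²)

def log_likelihood(w, X_bar, y, sigma):
    """log ∏_n N(y_n; w^T x̄_n, σ²) for the Gaussian noise model."""
    resid = y - X_bar @ w
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - resid**2 / (2 * sigma**2))

print(log_likelihood(w_true, X_bar, y, sigma))
```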



Maximum Likelihood
• Possible objective: find the best $w^*$ by maximizing the likelihood of the data
$$w^* = \arg\max_w \Pr(\mathbf{y} \mid \bar{X}, w, \sigma)$$
$$= \arg\max_w \prod_n \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-\frac{(y_n - w^T \bar{x}_n)^2}{2\sigma^2}}$$
$$= \arg\max_w \sum_n -\frac{(y_n - w^T \bar{x}_n)^2}{2\sigma^2}$$
$$= \arg\min_w \sum_n (y_n - w^T \bar{x}_n)^2$$

• We arrive at the original least squares problem! (a short NumPy sketch follows below)
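Since maximizing this likelihood is the same as minimizing the sum of squared residuals, the ML estimate can be obtained with an ordinary least-squares solver. A sketch continuing the synthetic example above (it reuses the hypothetical X_bar and y defined there):

```python
# Maximum likelihood = ordinary least squares: minimize ∑_n (y_n - w^T x̄_n)².
w_ml, *_ = np.linalg.lstsq(X_bar, y, rcond=None)
print("ML / least-squares weights:", w_ml)

# Equivalent closed form via the normal equations: w = (X̄ᵀ X̄)⁻¹ X̄ᵀ y
w_normal = np.linalg.solve(X_bar.T @ X_bar, X_bar.T @ y)
```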


Maximum A Posteriori
• Alternative objective: find the $w^*$ with the highest posterior probability
• Consider a Gaussian prior: $\Pr(w) = N(0, \Sigma)$

• Posterior:
$$\Pr(w \mid \bar{X}, \mathbf{y}) \propto \Pr(w)\, \Pr(\mathbf{y} \mid \bar{X}, w)$$
$$= \frac{1}{Z_1}\, e^{-\frac{1}{2} w^T \Sigma^{-1} w}\;\; \frac{1}{Z_2}\, e^{-\sum_n \frac{(y_n - w^T \bar{x}_n)^2}{2\sigma^2}}$$



Maximum A Posteriori
• Optimization:
$$w^* = \arg\max_w \Pr(w \mid \bar{X}, \mathbf{y}, \sigma)$$
$$= \arg\max_w -\sum_n (y_n - w^T \bar{x}_n)^2 - w^T \Sigma^{-1} w$$
$$= \arg\min_w \sum_n (y_n - w^T \bar{x}_n)^2 + w^T \Sigma^{-1} w$$
• Let $\Sigma^{-1} = \lambda I$, then
$$= \arg\min_w \sum_n (y_n - w^T \bar{x}_n)^2 + \lambda \|w\|_2^2$$
• We arrive at the original regularized least squares problem! (a closed-form ridge sketch follows below)
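With $\Sigma^{-1} = \lambda I$ the MAP objective is exactly ridge regression, which again has a closed form. A sketch continuing the example above; the value of λ is an arbitrary illustration:

```python
# MAP with isotropic Gaussian prior: argmin_w ∑_n (y_n - w^T x̄_n)² + λ ||w||²
lam = 1.0                                   # illustrative regularization strength
d = X_bar.shape[1]
w_map = np.linalg.solve(X_bar.T @ X_bar + lam * np.eye(d), X_bar.T @ y)
print("MAP / ridge weights:", w_map)
```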



Expected Squared Loss
• Even though we use a statistical framework, it is
interesting to evaluate the expected squared loss
$$E[L] = \int_{x,y} \Pr(x, y)\,\big(y - f(x)\big)^2 \, dx\, dy$$
$$= \int_{x,y} \Pr(x, y)\,\big(y - h(x) + h(x) - f(x)\big)^2 \, dx\, dy$$
$$= \int_{x,y} \Pr(x, y)\,\Big[\big(y - h(x)\big)^2 + 2\big(y - h(x)\big)\big(h(x) - f(x)\big) + \big(h(x) - f(x)\big)^2\Big] \, dx\, dy$$

The expectation of the cross term with respect to $y$ is 0, since $h(x) = E[y \mid x]$.

$$E[L] = \underbrace{\int_{x,y} \Pr(x, y)\,\big(y - h(x)\big)^2 \, dx\, dy}_{\text{noise (constant)}} \;+\; \underbrace{\int_x \Pr(x)\,\big(h(x) - f(x)\big)^2 \, dx}_{\text{error (depends on } f)}$$
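A quick Monte Carlo check of this split, under the synthetic setup from earlier: $h(x)$ is the true regression function $w_{\text{true}}^T \bar{x}$ and $f$ is any fixed predictor (here the MAP fit w_map). The first term estimates the irreducible noise $\sigma^2$, the second the error of $f$; all names and sizes are illustrative.

```python
# Monte Carlo estimate of E[L] = noise + error on a large fresh sample.
M = 100_000
X_test = rng.uniform(-1, 1, size=(M, 2))
Xt_bar = np.hstack([X_test, np.ones((M, 1))])
h = Xt_bar @ w_true                          # h(x) = E[y | x] under the generating model
y_test = h + rng.normal(0.0, sigma, M)
f = Xt_bar @ w_map                           # a fixed predictor f(x)

expected_loss = np.mean((y_test - f) ** 2)
noise = np.mean((y_test - h) ** 2)           # ≈ σ² (constant, irreducible)
error = np.mean((h - f) ** 2)                # depends on f
print(expected_loss, noise + error)          # the two should nearly coincide
```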



Expected Squared Loss
• Let's focus on the error part, which depends on $f$:
$$E_x\big[(h(x) - f(x))^2\big] = \int_x \Pr(x)\,\big(h(x) - f(x)\big)^2 \, dx$$
• But the choice of $f$ depends on the dataset $D$
• Instead, consider the expectation with respect to $D$:
$$E_D\big[(h(x) - f_D(x))^2\big]$$
where $f_D$ is the predictor given by the weight vector $w_D$ obtained from $D$



Bias-Variance Decomposition
• Decompose the squared loss:
$$E_D\big[(h(x) - f_D(x))^2\big]$$
$$= E_D\Big[\big(h(x) - E_D[f_D(x)] + E_D[f_D(x)] - f_D(x)\big)^2\Big]$$
$$= E_D\Big[\big(h(x) - E_D[f_D(x)]\big)^2 + 2\big(h(x) - E_D[f_D(x)]\big)\big(E_D[f_D(x)] - f_D(x)\big) + \big(E_D[f_D(x)] - f_D(x)\big)^2\Big]$$

The expectation of the middle cross term with respect to $D$ is 0.

$$= \underbrace{\big(h(x) - E_D[f_D(x)]\big)^2}_{\text{bias}^2} \;+\; \underbrace{E_D\Big[\big(E_D[f_D(x)] - f_D(x)\big)^2\Big]}_{\text{variance}}$$
Bias-Variance Decomposition
• Hence:
$$E[\text{loss}] = \text{bias}^2 + \text{variance} + \text{noise}$$
• Picture:
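Below is a simulation sketch of the decomposition: draw many datasets $D$, fit $f_D$ by least squares on each, and compare the averaged squared error against bias² + variance at a grid of test points. It continues the synthetic example above; the dataset size and number of repetitions are arbitrary illustration choices.

```python
# Bias-variance decomposition, estimated by repeatedly resampling the training set D.
R, N_train = 500, 20
X_grid = rng.uniform(-1, 1, size=(200, 2))
Xg_bar = np.hstack([X_grid, np.ones((200, 1))])
h_grid = Xg_bar @ w_true                      # h(x) at the test points

preds = np.empty((R, len(h_grid)))
for r in range(R):
    Xd = rng.uniform(-1, 1, size=(N_train, 2))
    Xd_bar = np.hstack([Xd, np.ones((N_train, 1))])
    yd = Xd_bar @ w_true + rng.normal(0.0, sigma, N_train)
    w_d, *_ = np.linalg.lstsq(Xd_bar, yd, rcond=None)   # f_D fitted on dataset D
    preds[r] = Xg_bar @ w_d

mean_pred = preds.mean(axis=0)                # E_D[f_D(x)]
bias2 = np.mean((h_grid - mean_pred) ** 2)
variance = np.mean(preds.var(axis=0))
total = np.mean((h_grid[None, :] - preds) ** 2)
print(total, bias2 + variance)                # E_D[(h - f_D)²] = bias² + variance
```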



Bias-Variance Decomposition
• Example



Bayesian Linear Regression
• We don't know if $w^*$ is the true underlying $w$
• Instead of making predictions according to $w^*$, compute the weighted average prediction according to $\Pr(w \mid \bar{X}, \mathbf{y})$:

$$\Pr(w \mid \bar{X}, \mathbf{y}) = \frac{1}{Z}\, e^{-\frac{1}{2} w^T \Sigma^{-1} w}\, e^{-\sum_n \frac{(y_n - w^T \bar{x}_n)^2}{2\sigma^2}}$$
$$= \frac{1}{Z'}\, e^{-\frac{1}{2}(w - \bar{w})^T A\, (w - \bar{w})}$$
$$= N(w;\, \bar{w},\, A^{-1})$$
where $\bar{w} = \sigma^{-2} A^{-1} \bar{X} \mathbf{y}$ and $A = \sigma^{-2} \bar{X} \bar{X}^T + \Sigma^{-1}$
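A sketch of this posterior computation, continuing the synthetic example: it assumes an isotropic prior Σ = I (an illustrative choice) and stacks the $\bar{x}_n$ as columns of $\bar{X}$, matching the slide's convention (the transpose of the X_bar array used earlier).

```python
# Posterior over w: N(w_bar, A^{-1}) with
#   A     = σ^{-2} X̄ X̄ᵀ + Σ^{-1}
#   w_bar = σ^{-2} A^{-1} X̄ y
Xbar_cols = X_bar.T                          # d x N: each column is one augmented input x̄_n
Sigma_prior = np.eye(Xbar_cols.shape[0])     # illustrative prior covariance Σ = I

A = Xbar_cols @ Xbar_cols.T / sigma**2 + np.linalg.inv(Sigma_prior)
A_inv = np.linalg.inv(A)
w_bar = A_inv @ Xbar_cols @ y / sigma**2
print("posterior mean:", w_bar)
```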



Bayesian Learning





Bayesian Prediction
• Let $\bar{x}_*$ be the input for which we want a prediction and $y_*$ be the corresponding prediction

$$\Pr(y_* \mid \bar{x}_*, \bar{X}, \mathbf{y}) = \int_w \Pr(y_* \mid \bar{x}_*, w)\, \Pr(w \mid \bar{X}, \mathbf{y})\, dw$$
$$= \frac{1}{Z} \int_w e^{-\frac{(y_* - w^T \bar{x}_*)^2}{2\sigma^2}}\, e^{-\frac{1}{2}(w - \bar{w})^T A (w - \bar{w})}\, dw$$
$$= N\big(y_*;\; \bar{w}^T \bar{x}_*,\; \sigma^2 + \bar{x}_*^T A^{-1} \bar{x}_*\big)$$

(a sketch computing this predictive distribution follows below)
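A sketch of the predictive distribution, reusing w_bar and A_inv from the posterior computation above; the query point $\bar{x}_*$ is an arbitrary illustration:

```python
# Predictive distribution: Pr(y* | x̄*, X̄, y) = N(w̄ᵀ x̄*,  σ² + x̄*ᵀ A⁻¹ x̄*)
x_star = np.array([0.2, -0.4, 1.0])          # arbitrary query (last entry = bias feature)
pred_mean = w_bar @ x_star
pred_var = sigma**2 + x_star @ A_inv @ x_star
print(f"y* ~ N({pred_mean:.3f}, {pred_var:.3f})")
```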

