
Advanced Machine Learning

(COMP 5328)
Sparse Coding and Regularisation

Tongliang Liu
Quiz results



Review

Dictionary learning
What is a dictionary in machine learning?
Let $x \in \mathbb{R}^d$ and $D \in \mathbb{R}^{d \times k}$,

$$\alpha^* = \arg\min_{\alpha \in \mathbb{R}^k} \|x - D\alpha\|^2.$$

Note that $\|x\| = \sqrt{x^\top x}$ is the $\ell_2$ norm.

Given $x_1, \ldots, x_n \in \mathbb{R}^d$,

$$\{D^*, \alpha_1^*, \ldots, \alpha_n^*\} = \arg\min_{D \in \mathbb{R}^{d \times k},\, \alpha_1, \ldots, \alpha_n \in \mathbb{R}^k} \frac{1}{n} \sum_{i=1}^n \|x_i - D\alpha_i\|^2.$$


Dictionary learning

Note that

$$\frac{1}{n} \sum_{i=1}^n \|x_i - D\alpha_i\|^2 = \frac{1}{n} \|X - DR\|_F^2,$$

where $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ and $R = [\alpha_1, \alpha_2, \ldots, \alpha_n] \in \mathbb{R}^{k \times n}$.

$$\|X\|_F = \sqrt{\mathrm{trace}(X^\top X)} = \sqrt{\sum_{i=1}^d \sum_{j=1}^n X_{i,j}^2}$$

is the Frobenius norm of $X$.
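A quick numerical check of this identity in NumPy (a minimal sketch; the matrices here are random, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 8, 5, 100
X = rng.standard_normal((d, n))   # data matrix, columns are samples x_i
D = rng.standard_normal((d, k))   # dictionary
R = rng.standard_normal((k, n))   # codes, columns are alpha_i

# Left-hand side: average of per-sample squared ell-2 errors.
lhs = np.mean([np.sum((X[:, i] - D @ R[:, i]) ** 2) for i in range(n)])

# Right-hand side: squared Frobenius norm of the residual matrix, divided by n.
rhs = np.linalg.norm(X - D @ R, "fro") ** 2 / n

print(np.isclose(lhs, rhs))  # True
```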


Dictionary learning

Note that

$$\arg\min_{D \in \mathcal{D},\, R \in \mathcal{R}} \|X - DR\|_F^2,$$

where $\mathcal{D}$ and $\mathcal{R}$ are some specific domains for $D$ and $R$.
Optimisation
Objective:

$$\min_{D \in \mathcal{D},\, R \in \mathcal{R}} \|X - DR\|_F^2$$

The objective is convex with respect to either $R$ or $D$, but not with respect to both jointly.

Fix $R$, solve for $D$:

$$\min_{D \in \mathcal{D}} \|X - DR\|_F^2$$

Fix $D$, solve for $R$:

$$\min_{R \in \mathcal{R}} \|X - DR\|_F^2$$
Engan, Kjersti, Sven Ole Aase, and J. Hakon Husoy. "Method of optimal directions for frame design." Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 5. IEEE, 1999.
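The alternating scheme can be sketched in a few lines of NumPy. This is a minimal illustration rather than the full method of optimal directions: both updates are unconstrained least squares (a real sparse coder would constrain R; see the OMP sketch later):

```python
import numpy as np

def alternating_dictionary_learning(X, k, n_iters=50, eps=1e-8):
    """Minimise ||X - D R||_F^2 by alternating least squares.

    A minimal sketch: neither update is constrained, so this only
    illustrates the alternating structure of the optimisation.
    """
    d, n = X.shape
    rng = np.random.default_rng(0)
    D = rng.standard_normal((d, k))
    for _ in range(n_iters):
        # Fix D, solve for R: least squares for each column of R.
        R = np.linalg.lstsq(D, X, rcond=None)[0]
        # Fix R, solve for D: D = X R^T (R R^T)^{-1} (MOD-style update).
        D = X @ R.T @ np.linalg.pinv(R @ R.T + eps * np.eye(k))
        # Normalise dictionary atoms to unit ell-2 norm (common convention).
        D /= np.linalg.norm(D, axis=0, keepdims=True) + eps
    R = np.linalg.lstsq(D, X, rcond=None)[0]  # final codes for the normalised D
    return D, R
```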


Dictionary learning application

Dimension reduction.

[Figure: the data matrix X is factorised into a dictionary and lower-dimensional codes.]


Sparse Coding
Why sparse?
Many signals are sparse in some transform domain.

Image credit: https://datacommandnet.blogspot.com/p/periodic-analog-signals.html



Why sparse?
Many signals are sparse in some transform domain.

Image credit: http://www.princeton.edu/~yc5/ele538b_sparsity/lectures/sparse_representation.pdf



Why sparse?
Many signals are sparse in some transform domain.

[Figures: a musical instrument spectrum and a speech spectrogram.]
Why sparse?
Many signals are sparse in some transform domain.

[Figure: a natural image.]
Image credit: Zaouali, Bouzidi, and Zagrouba: Review of multiscale geometric decompositions in a remote sensing context



Dictionary learning

Note that

$$\arg\min_{D \in \mathcal{D},\, R \in \mathcal{R}} \|X - DR\|_F^2,$$

where $\mathcal{D}$ and $\mathcal{R}$ are some specific domains for $D$ and $R$.
Sparse coding

Note that

$$\arg\min_{D \in \mathcal{D},\, R \in \mathcal{R}} \|X - DR\|_F^2$$

[Figure: in sparse coding, the dictionary $D$ is overcomplete and the code matrix $R$ is sparse.]
$\ell_p$ norm

$$\|\alpha\|_p = \left( \sum_{j=1}^k |\alpha_j|^p \right)^{1/p}, \quad \text{where } \alpha \in \mathbb{R}^k.$$

In other words, $\|\alpha\|_p^p = \sum_{j=1}^k |\alpha_j|^p$.
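As a quick numerical check (a minimal sketch; the example vector is arbitrary):

```python
import numpy as np

alpha = np.array([3.0, 0.0, -4.0, 0.0])

for p in (1, 2):
    print(p, np.sum(np.abs(alpha) ** p) ** (1 / p))  # ell_1 = 7.0, ell_2 = 5.0

print(np.count_nonzero(alpha))  # the ell_0 "norm" counts non-zeros: 2
```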


K-means
K-means clustering:

$$\min_{D \in \mathcal{D},\, R \in \mathcal{R}} \|X - DR\|_F^2$$

[Figure: data points grouped around three cluster centres, the dictionary atoms D:1, D:2, D:3.]

Special requirement: each column of $R$ is a one-hot vector, i.e., $\|R_i\|_0 = 1$ and $\|R_i\|_1 = 1$.
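Viewed this way, Lloyd's algorithm is exactly alternating minimisation of the objective above under the one-hot constraint. A minimal NumPy sketch (random initialisation from the data; columns of X are samples, as before):

```python
import numpy as np

def kmeans_dictionary(X, k, n_iters=20):
    """K-means as dictionary learning: columns of R are one-hot."""
    d, n = X.shape
    rng = np.random.default_rng(0)
    D = X[:, rng.choice(n, size=k, replace=False)]  # atoms = initial centres
    for _ in range(n_iters):
        # Fix D, solve for R: nearest atom for each sample (one-hot code).
        dists = ((X[:, None, :] - D[:, :, None]) ** 2).sum(axis=0)  # (k, n)
        labels = dists.argmin(axis=0)
        R = np.zeros((k, n))
        R[labels, np.arange(n)] = 1.0
        # Fix R, solve for D: each atom is the mean of its assigned samples.
        for j in range(k):
            if R[j].sum() > 0:
                D[:, j] = X[:, R[j] > 0].mean(axis=1)
    return D, R
```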


K-SVD

K-SVD:

$$\min_{D \in \mathcal{D},\, R \in \mathcal{R}} \|X - DR\|_F^2$$

Special requirement: each column of $R$ is a sparse vector, i.e., $\|R_i\|_0 \le k' \ll k$.
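The distinctive part of K-SVD is its dictionary update: each atom and the nonzero entries of its code row are refitted together by a rank-one SVD of the residual, restricted to the samples that use that atom (so the sparsity pattern is preserved). A minimal sketch of that single-atom step (the full algorithm alternates a sparse-coding stage, e.g. the OMP sketched later, with this update over all atoms):

```python
import numpy as np

def ksvd_atom_update(X, D, R, j):
    """Update atom D[:, j] and row R[j, :] by a rank-one SVD (K-SVD step)."""
    omega = np.flatnonzero(R[j, :])       # samples that currently use atom j
    if omega.size == 0:
        return D, R
    # Residual of those samples with atom j's contribution removed.
    E = X[:, omega] - D @ R[:, omega] + np.outer(D[:, j], R[j, omega])
    # Best rank-one fit: atom = leading left singular vector.
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, j] = U[:, 0]
    R[j, omega] = s[0] * Vt[0, :]
    return D, R
```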


Sparse coding applications
Image Compression: Results for 820 bytes per image

Image credit: Compression of facial images using the K-SVD algorithm, J. Vis. Commun. Image Represent.


Data pre-processing for K-SVD

Facial feature points.

Image credit: Low Bit-Rate Compression of Facial Images, IEEE Transactions on Image Processing


Data pre-processing for K-SVD

(Top) Input images and (bottom) their canonically aligned versions.

Image credit: Low Bit-Rate Compression of Facial Images, IEEE Transactions on Image Processing


Data pre-processing for K-SVD

Uniform slicing
Image credit: Compression of facial images using the K-SVD algorithm, J. Vis. Commun. Image Represent.


Sparse coding applications
Inpainting:
70% missing samples: DCT (RMSE = 0.04), Haar (RMSE = 0.045), K-SVD (RMSE = 0.03)

90% missing samples: DCT (RMSE = 0.085), Haar (RMSE = 0.07), K-SVD (RMSE = 0.06)

Image credit: http://www.cs.technion.ac.il/~ronrubin/Talks/K-SVD.ppt

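Inpainting with a learned dictionary reduces to sparse coding with a mask: only the observed pixels constrain the code, and the full atoms then fill in the gaps. A minimal sketch of the per-patch step (a plain masked least-squares fit on a fixed support, for brevity; in practice the support is chosen by a sparse coder such as OMP):

```python
import numpy as np

def inpaint_patch(x_obs, mask, D, support):
    """Fill missing entries of a patch from its observed entries.

    x_obs:   patch vector, with arbitrary values at missing positions
    mask:    boolean array, True where the pixel is observed
    D:       dictionary (patch_dim x k)
    support: indices of the atoms assumed active (fixed here for brevity)
    """
    # Fit the code using the observed rows of the dictionary only.
    alpha, *_ = np.linalg.lstsq(D[mask][:, support], x_obs[mask], rcond=None)
    # Reconstruct the full patch; copy the observed pixels back in.
    x_hat = D[:, support] @ alpha
    x_hat[mask] = x_obs[mask]
    return x_hat
```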


Sparse coding applications
Inpainting:
[Figures: inpainting results at 20%, 50%, and 80% missing samples.]

Image credit: http://www.cs.technion.ac.il/~ronrubin/Talks/K-SVD.ppt



Sparse coding applications
Inpainting for text removal:

Image credit: Sparse representation for color image restoration. IEEE Transactions on Image Processing.



Measure of Sparsity
The objective function:

$$\min_{D \in \mathcal{D},\, R \in \mathcal{R}} \underbrace{\|X - DR\|_F^2}_{\text{data fitting}} + \underbrace{\lambda\, \Omega(R)}_{\text{sparse regularisation}}$$

Question: how can we design the regularisation $\Omega$ to make $R$ sparse?
Measure of Sparsity: $\ell_0$ norm

$$\|\alpha\|_p^p = \sum_{j=1}^k |\alpha_j|^p.$$

As $p \to 0$, $\|\alpha\|_p^p$ tends to a count of the non-zeros in the vector.

So we can employ $\|\alpha\|_0$ to measure sparsity.

However, $\ell_0$ minimisation is not easy: the problem is combinatorial (NP-hard in general). What can we do instead?
Measure of Sparsity: $\ell_1$ norm

2D example (compared with the $\ell_2$ norm):

$$\min_{\alpha} \frac{1}{2} \|x - \alpha\|_2^2 \ \text{ s.t. } \ \|\alpha\|_1 \le \mu \qquad \text{vs.} \qquad \min_{\alpha} \frac{1}{2} \|x - \alpha\|_2^2 \ \text{ s.t. } \ \|\alpha\|_2 \le \mu$$

[Figure: in the $(\alpha_1, \alpha_2)$ plane, the $\ell_1$ ball of radius $\mu$ is a diamond whose corners lie on the axes, so the constrained solution tends to land on an axis (sparse); the $\ell_2$ ball of radius $\mu$ is a circle and does not favour sparse solutions.]


Sparse coding learning algorithms

The $\ell_0$ norm based approaches:

• $\min_{D, R} \|X - DR\|_F^2 \ \text{ s.t. } \ \forall i,\ \|\alpha_i\|_0 \le k'$.

• $\min_{\alpha} \|\alpha\|_0 \ \text{ s.t. } \ \|x - D\alpha\|_2 \le \epsilon$.

Greedy algorithms (see the OMP sketch below):
• Orthogonal matching pursuit (OMP) (Y. Pati, et al. 1993; J. Tropp 2004).
• Subspace pursuit (SP) (W. Dai and O. Milenkovic 2009), CoSaMP (D. Needell and J. Tropp 2009).
• Iterative hard thresholding (IHT) (T. Blumensath and M. Davies 2009).
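A minimal NumPy sketch of OMP for a single signal (assuming unit-norm dictionary atoms, and stopping after k_prime atoms have been selected):

```python
import numpy as np

def omp(x, D, k_prime):
    """Orthogonal matching pursuit: greedy ell_0-constrained sparse coding."""
    k = D.shape[1]
    support = []
    residual = x.copy()
    for _ in range(k_prime):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Refit the coefficients on the current support by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha = np.zeros(k)
    alpha[support] = coef
    return alpha
```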
Sparse coding learning algorithms

The $\ell_1$ norm based approaches (see the ISTA sketch below):

• $\min_{\alpha} \|\alpha\|_1 \ \text{ s.t. } \ \|x - D\alpha\|_2 \le \epsilon$.

• $\min_{\alpha} \|x - D\alpha\|_2^2 + \lambda \|\alpha\|_1$.

Bayesian approach:
• Relevance vector machine (RVM) (M. Tipping 2001).
• Bayesian compressed sensing (BCS) (S. Ji, et al. 2008).
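The second (Lasso-style) problem can be solved by proximal gradient descent, commonly known as ISTA: a gradient step on the data-fitting term followed by soft thresholding, the proximal operator of the $\ell_1$ norm. A minimal sketch (the step size is set from the spectral norm of D; names are illustrative):

```python
import numpy as np

def ista(x, D, lam, n_iters=200):
    """ISTA for min_alpha ||x - D alpha||_2^2 + lam * ||alpha||_1."""
    alpha = np.zeros(D.shape[1])
    eta = 1.0 / (2 * np.linalg.norm(D, 2) ** 2)  # step size from the Lipschitz constant
    for _ in range(n_iters):
        grad = 2 * D.T @ (D @ alpha - x)          # gradient of the quadratic term
        z = alpha - eta * grad
        alpha = np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)  # soft threshold
    return alpha
```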


Regularisation and
algorithmic stability
No-Free-Lunch Theorem

• Sparse algorithms are not stable!

• A learning algorithm is said to be stable if slight


perturbations in the training data result in
small changes in the output of the algorithm,
and these changes vanish as the data set grows
bigger and bigger.

Xu, H., Caramanis, C., & Mannor, S. (2012). Sparse algorithms are not stable: A no-free-lunch theorem. IEEE transactions on pattern analysis
and machine intelligence, 34(1), 187-193.



Algorithmic Stability
We have two different training samples:

$$S = \{(X_1, Y_1), \ldots, (X_{i-1}, Y_{i-1}), (X_i, Y_i), (X_{i+1}, Y_{i+1}), \ldots, (X_n, Y_n)\}$$

$$S^i = \{(X_1, Y_1), \ldots, (X_{i-1}, Y_{i-1}), (X'_i, Y'_i), (X_{i+1}, Y_{i+1}), \ldots, (X_n, Y_n)\}$$

They differ in only one training example.

An algorithm is uniformly stable if for any example $(X, Y)$,

$$|\ell(X, Y, h_S) - \ell(X, Y, h_{S^i})| \le \epsilon(n).$$

Note that $\epsilon(n)$ will vanish as $n$ goes to infinity.
Generalisation error

$$\begin{aligned}
R(h_S) - \min_{h \in H} R(h) &= R(h_S) - R(h^*) \\
&= R(h_S) - R_S(h_S) + R_S(h_S) - R_S(h^*) + R_S(h^*) - R(h^*) \\
&\le R(h_S) - R_S(h_S) + R_S(h^*) - R(h^*) \\
&\le |R(h_S) - R_S(h_S)| + |R(h^*) - R_S(h^*)| \\
&\le \sup_{h \in H} |R(h) - R_S(h)| + \sup_{h \in H} |R(h) - R_S(h)| \\
&= 2 \sup_{h \in H} |R(h) - R_S(h)|,
\end{aligned}$$

where the first inequality uses $R_S(h_S) - R_S(h^*) \le 0$, which holds when $h_S$ minimises the empirical risk. In particular,

$$R(h_S) - R_S(h_S) \le \sup_{h \in H} |R(h) - R_S(h)|.$$


Algorithmic Stability

A good stable algorithm will have a good generalisation ability:

$$\begin{aligned}
\mathbb{E}[R(h_S) - R_S(h_S)]
&= \mathbb{E}_S \left[ \mathbb{E}_{X,Y}[\ell(X, Y, h_S)] - \frac{1}{n} \sum_{i=1}^n \ell(X_i, Y_i, h_S) \right] \\
&= \mathbb{E}_S \left[ \mathbb{E}_{S'} \left[ \frac{1}{n} \sum_{i=1}^n \ell(X'_i, Y'_i, h_S) \right] - \frac{1}{n} \sum_{i=1}^n \ell(X_i, Y_i, h_S) \right] \\
&= \mathbb{E}_{S,S'} \left[ \frac{1}{n} \sum_{i=1}^n \left( \ell(X'_i, Y'_i, h_S) - \ell(X_i, Y_i, h_S) \right) \right] \\
&= \mathbb{E}_{S,S'} \left[ \frac{1}{n} \sum_{i=1}^n \left( \ell(X'_i, Y'_i, h_S) - \ell(X'_i, Y'_i, h_{S^i}) \right) \right] \\
&\le \epsilon'(n),
\end{aligned}$$

where $S' = \{(X'_1, Y'_1), \ldots, (X'_n, Y'_n)\}$ is an independent copy of $S$. The last equality holds because swapping $(X_i, Y_i)$ with $(X'_i, Y'_i)$ turns $h_S$ into $h_{S^i}$ without changing the expectation. $\epsilon'(n)$ would be small if the learning algorithm is stable (because $h_S$ and $h_{S^i}$ are similar). This implies that stable algorithms will have small expected generalisation errors.
Regularisation and Stability

$\ell_2$ norm regularisation will make learning algorithms stable if the employed surrogate loss function is convex:

$$h_S = \arg\min_{h \in H} \frac{1}{n} \sum_{i=1}^n \ell(X_i, Y_i, h) + \lambda \|h\|_2^2.$$

If the convex surrogate loss function is $L$-Lipschitz continuous w.r.t. $h$ and $\|X\|_2 \le B$, we have

$$|\ell(X, Y, h_S) - \ell(X, Y, h_{S^i})| \le \frac{2L^2 B^2}{\lambda n}.$$
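A small empirical illustration of this bound (a hedged sketch, not part of the original slides: ridge-regularised least squares stands in for the abstract $h_S$; the squared loss is only locally Lipschitz, so this is indicative only). We train on $S$ and on $S^i$ with one example replaced; the loss difference at a fixed test point should shrink roughly like $1/(\lambda n)$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.1

def ridge(X, y, lam):
    # h_S = argmin (1/n) * sum_i (y_i - <h, x_i>)^2 + lam * ||h||^2
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

x_test, y_test = rng.standard_normal(5), 0.0
for n in (100, 1000, 10000):
    X, y = rng.standard_normal((n, 5)), rng.standard_normal(n)
    Xi, yi = X.copy(), y.copy()
    Xi[0], yi[0] = rng.standard_normal(5), rng.standard_normal()  # replace one example
    h_S, h_Si = ridge(X, y, lam), ridge(Xi, yi, lam)
    diff = abs((y_test - x_test @ h_S) ** 2 - (y_test - x_test @ h_Si) ** 2)
    print(n, diff)  # shrinks as n grows, consistent with O(1 / (lam * n))
```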


Proof (optional)

A surrogate loss function is $L$-Lipschitz continuous w.r.t. $h$ if for any $(X, Y)$ in the domain and any $h, h'$,

$$|\ell(X, Y, h) - \ell(X, Y, h')| \le L |h(X) - h'(X)|.$$


Proof (optional)

A function $f$ is $\mu$-strongly convex if

$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\mu}{2} \|x - y\|^2, \quad \forall x, y,$$

or equivalently (for twice-differentiable $f$)

$$\mu I \preceq \nabla^2 f(x), \quad \forall x.$$

[Figure: at every point $(x, f(x))$, the graph of $f$ lies above the tangent $f(x) + \langle \nabla f(x), y - x \rangle$, and even above the tangent plus the quadratic $\frac{\mu}{2} \|x - y\|^2$.]
Proof (optional)

Note that the function $\ell(X, Y, h) + \lambda \|h\|^2$ is always $\lambda$-strongly convex w.r.t. $h$.

Let

$$R_{S,\lambda}(h) = \frac{1}{n} \sum_{i=1}^n \ell(X_i, Y_i, h) + \lambda \|h\|^2.$$

Since $h_S$ minimises $R_{S,\lambda}$ (so $\nabla R_{S,\lambda}(h_S) = 0$) and $h_{S^i}$ minimises $R_{S^i,\lambda}$, we have

$$R_{S,\lambda}(h_{S^i}) \ge R_{S,\lambda}(h_S) + \langle \nabla R_{S,\lambda}(h_S), h_{S^i} - h_S \rangle + \frac{\lambda}{2} \|h_{S^i} - h_S\|^2 = R_{S,\lambda}(h_S) + \frac{\lambda}{2} \|h_{S^i} - h_S\|^2,$$

$$R_{S^i,\lambda}(h_S) \ge R_{S^i,\lambda}(h_{S^i}) + \langle \nabla R_{S^i,\lambda}(h_{S^i}), h_S - h_{S^i} \rangle + \frac{\lambda}{2} \|h_S - h_{S^i}\|^2 = R_{S^i,\lambda}(h_{S^i}) + \frac{\lambda}{2} \|h_S - h_{S^i}\|^2.$$
Proof (optional)

Summing the two inequalities, we have

$$\begin{aligned}
\lambda \|h_{S^i} - h_S\|^2 &\le R_{S^i,\lambda}(h_S) - R_{S,\lambda}(h_S) + R_{S,\lambda}(h_{S^i}) - R_{S^i,\lambda}(h_{S^i}) \\
&\le \frac{1}{n} |\ell(X'_i, Y'_i, h_S) - \ell(X'_i, Y'_i, h_{S^i})| + \frac{1}{n} |\ell(X_i, Y_i, h_S) - \ell(X_i, Y_i, h_{S^i})| \\
&\le \frac{L}{n} |h_S(X'_i) - h_{S^i}(X'_i)| + \frac{L}{n} |h_S(X_i) - h_{S^i}(X_i)| \\
&= \frac{L}{n} |\langle h_S - h_{S^i}, X'_i \rangle| + \frac{L}{n} |\langle h_S - h_{S^i}, X_i \rangle| \qquad \text{(assume } h(X) = \langle h, X \rangle\text{)} \\
&\le \frac{L}{n} \|h_S - h_{S^i}\| \|X'_i\| + \frac{L}{n} \|h_S - h_{S^i}\| \|X_i\| \\
&\le \frac{2LB}{n} \|h_S - h_{S^i}\|,
\end{aligned}$$

where the second-to-last inequality holds because of the Cauchy-Schwarz inequality: $\langle a, b \rangle \le \|a\| \|b\|$.
Proof (optional)

Thus,

$$\|h_S - h_{S^i}\| \le \frac{2LB}{\lambda n}.$$

We then have

$$|\ell(X, Y, h_S) - \ell(X, Y, h_{S^i})| \le L |h_S(X) - h_{S^i}(X)| \le L \|h_S - h_{S^i}\| \|X\| \le \frac{2L^2 B^2}{\lambda n}.$$
