Processing Big Data: Kernel PCA


Processing Big Data

Kernel PCA
Dimensionality reduction
• Linear
  • Principal Component Analysis: SVD-based
  • Compressed sensing
  • Matrix sketching

• Non-linear
  • Kernel PCA
  • Isometric mapping
High dimensionality can be good
Transforming data to lie in higher-dimensional spaces can be a good thing!

Statistical learning theory aims to answer:

Under what conditions are high-dimension representations good?

Sometimes

Higher dimension → increased classification power
Example
Let’s go 3D

φ : R^2 → R^3
(x_1, x_2) → (x_1, x_2, x_1^2 + x_2^2)

or, more concisely,

x → (x, ‖x‖^2)
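
A tiny numeric illustration of this lift (a NumPy sketch, not from the slides): points inside and outside the unit circle, which no line in R^2 separates, become separable by a plane after x → (x, ‖x‖^2).

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
inside = np.linalg.norm(X, axis=1) < 1.0      # two classes: inside / outside the unit circle

Z = np.c_[X, np.sum(X**2, axis=1)]            # the lift (x1, x2) -> (x1, x2, x1^2 + x2^2)

# in R^3 the plane z = 1 separates the two classes exactly
print(np.all((Z[:, 2] < 1.0) == inside))      # True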
Problems with high-dimensionality
Product feature extractor with monomials of degree d = 2:

φ : R^2 → R^4
(x_1, x_2) → (x_1^2, x_2^2, x_1 x_2, x_2 x_1)

For m-dimensional data, the number of monomials of degree d is

(m + d − 1 choose d) = (d + m − 1)! / (d! (m − 1)!)

e.g. 16 × 16 pixel images with d = 5 will lead to about 10^10 monomials
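
A quick check of this count (Python sketch; the 16 × 16 image example corresponds to m = 256):

from math import comb

m, d = 256, 5                  # 16 x 16 pixel images, monomials of degree 5
print(comb(m + d - 1, d))      # 9525431552, i.e. about 10^10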
Can we have the best of two worlds?
Yes, we can!

Given any algorithm that only requires dot products, we can work in the
high-dimensional space without ever mapping into that space!
The kernel trick
Given any algorithm that only requires dot products, we can construct
different nonlinear versions of it by setting different kernel functions

k(x, x′) = ⟨φ(x), φ(x′)⟩

Let’s apply the kernel trick to PCA…
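
A minimal sketch of the trick (NumPy; the degree-2 polynomial kernel and the explicit monomial map are illustrative choices, not the slides' running example): an algorithm that only needs dot products can evaluate k(x, y) = (x·y)^2 and never build φ explicitly.

import numpy as np

def phi(x):
    # explicit degree-2 monomial feature map for 2-D inputs
    return np.array([x[0]**2, x[1]**2, x[0]*x[1], x[1]*x[0]])

def k_poly2(x, y):
    # kernel evaluation: one dot product in the original space
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(np.dot(phi(x), phi(y)), k_poly2(x, y))   # both print 1.0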


PCA
Notation: a_i ∈ R^m, A ∈ R^{n×m}

Centered data: Aᵀ1 = 0, i.e. ∑_i a_i = 0

Find principal components by diagonalizing C = (1/n) ∑_i a_i a_iᵀ

Cv = λv
Using PCA

1. Find the principal components by computing the eigenvectors, sorted in
   order of decreasing eigenvalue, and establish a cutoff;

2. Project test points onto the principal components;

3. Use the coefficients for visualization, classification, regression, etc.
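
A minimal sketch of these steps (NumPy only; the function and variable names are mine, not the lecture's):

import numpy as np

def pca(A, k):
    A = A - A.mean(axis=0)                 # centered data: sum_i a_i = 0
    C = A.T @ A / A.shape[0]               # covariance matrix C = (1/n) sum_i a_i a_i^T
    lam, V = np.linalg.eigh(C)             # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:k]      # sort by decreasing eigenvalue, cut off at k
    V = V[:, order]
    return A @ V, V                        # coefficients (projections) and principal components

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
coeffs, components = pca(A, k=2)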


PCA in dot products

Principal directions lie in the span of the data points:

v = ∑_i α_i a_i
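
A one-line justification of this fact (assuming λ ≠ 0; this step is implicit in the slides), in LaTeX:

Cv = \lambda v
\;\Rightarrow\;
v = \frac{1}{\lambda} C v
  = \frac{1}{n\lambda} \sum_i a_i \,(a_i^\top v)
  = \sum_i \alpha_i a_i,
\qquad
\alpha_i = \frac{a_i^\top v}{n\lambda}.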
Kernel PCA
If we send the data to a higher-dimensional space, x ↦ φ(x),

and have it centered, ∑_i φ(a_i) = 0,

the covariance is C = (1/n) ∑_i φ(a_i) φ(a_i)ᵀ
Kernel PCA
C = (1/n) ∑_i φ(a_i) φ(a_i)ᵀ is diagonalizable by Cv = λv

Finding the principal directions amounts to finding the coefficients of

v = ∑_i α_i φ(a_i)

We do that by solving an eigenvector problem

Kα = λ̃α
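
How the matrix eigenproblem arises (a short derivation assuming centered features and K_{ij} = ⟨φ(a_i), φ(a_j)⟩; the slides state only the result), in LaTeX:

\text{Substituting } v = \sum_j \alpha_j \phi(a_j) \text{ into } Cv = \lambda v
\text{ and taking inner products with each } \phi(a_k):
\quad
\frac{1}{n} \sum_{i,j} K_{ki} K_{ij} \alpha_j = \lambda \sum_j K_{kj} \alpha_j
\;\Longleftrightarrow\;
\frac{1}{n} K^2 \alpha = \lambda K \alpha
\;\Longrightarrow\;
K \alpha = \tilde{\lambda} \alpha, \qquad \tilde{\lambda} = n\lambda .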
Kernel PCA
Last step: find the projection of a new point a onto the principal components

⟨φ(a), v_j⟩ = ∑_i α_{ji} k(a, a_i)
Normalizing the feature space
Centering the features, φ̃(a) = φ(a) − (1/n) ∑_i φ(a_i),

corresponds to the centered kernel matrix

K̃ = JKJ,   J = I − (1/n) 1_n 1_nᵀ
kPCA: algorithm

1. Choose your kernel

2. Construct the centered kernel matrix K̃ = JKJ

3. Solve the eigenvalue problem K̃α = λ̃α

4. For any data point x, represent it by y_j = ∑_i α_{ji} k(x, x_i)
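
A minimal sketch of these four steps (NumPy; the Gaussian kernel and the 1/√λ scaling of α are choices of this sketch, not dictated by the slides):

import numpy as np

def kernel_pca(A, k, gamma=1.0):
    # 1. choose a kernel: Gaussian/RBF, k(x, x') = exp(-gamma * ||x - x'||^2)
    sq = np.sum(A**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * A @ A.T))
    # 2. construct the centered kernel matrix K~ = J K J, J = I - (1/n) 1 1^T
    n = A.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    # 3. solve the eigenvalue problem K~ alpha = lambda~ alpha
    lam, alpha = np.linalg.eigh(Kc)
    order = np.argsort(lam)[::-1][:k]
    lam, alpha = lam[order], alpha[:, order]
    alpha = alpha / np.sqrt(np.maximum(lam, 1e-12))   # scale so the implicit v_j have unit norm
    # 4. represent each training point by y_j = sum_i alpha_ji k(x, x_i)
    Y = Kc @ alpha
    return Y, alpha

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 3))
Y, alpha = kernel_pca(A, k=2)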
Kernel function

A real-valued function of two arguments

When symmetric and non-negative, it can be interpreted as a similarity measure

It is a way to introduce prior knowledge about the data


Gaussian kernel

k(x, x′) = exp(−(1/2) (x − x′)ᵀ Σ⁻¹ (x − x′))
Cosine similarity

k(x, x′) = xᵀx′ / (‖x‖ ‖x′‖)
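
A small sketch evaluating these two kernels on a batch of points (NumPy; the identity Σ is just a convenient choice, not from the slides):

import numpy as np

def gaussian_kernel(X, Sigma):
    # k(x, x') = exp(-1/2 (x - x')^T Sigma^{-1} (x - x'))
    P = np.linalg.inv(Sigma)
    D = X[:, None, :] - X[None, :, :]                       # pairwise differences
    return np.exp(-0.5 * np.einsum('ijk,kl,ijl->ij', D, P, D))

def cosine_kernel(X):
    # k(x, x') = x^T x' / (||x|| ||x'||)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

X = np.random.default_rng(1).normal(size=(5, 3))
print(gaussian_kernel(X, np.eye(3)).shape, cosine_kernel(X).shape)   # (5, 5) (5, 5)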
Mercer kernels
Not all functions allow the kernel trick, only Mercer kernels

A Mercer kernel is a positive definite kernel

Mercer theorem: if K is positive definite, we can compute an EVD

K = UᵀΛU

k_ij = (Λ^{1/2} U_{:,i})ᵀ (Λ^{1/2} U_{:,j})

and define φ(a_i) = Λ^{1/2} U_{:,i}

K can be computed by inner products of feature vectors implicitly defined by U
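
A quick numeric check of this construction (NumPy sketch; the Gaussian kernel matrix is just a convenient positive semi-definite example):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
K = np.exp(-0.5 * np.sum((A[:, None] - A[None, :])**2, axis=-1))   # a PSD kernel matrix

lam, V = np.linalg.eigh(K)        # K = V diag(lam) V^T
U = V.T                           # match the slide's convention K = U^T Lambda U
Phi = np.sqrt(np.maximum(lam, 0))[:, None] * U   # column i is Lambda^{1/2} U_{:,i} = phi(a_i)

print(np.allclose(Phi.T @ Phi, K))    # True: k_ij = phi(a_i)^T phi(a_j)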


Kernel PCA: properties

The kernel matrix is n × n

Can give a good re-encoding of the data when it lies along a non-linear manifold

Can have n positive eigenvalues (not necessarily a dimensionality reduction)

Unlike PCA, some directions in feature space might not have preimages
in input space.
Note on Mercer kernels
“Sigmoid kernel”: k(x, x′) = tanh(γ xᵀx′ + r)

Not a sigmoid, nor a Mercer kernel

Inspired by the multi-layer perceptron, but there is no good reason to use it…
Note on Mercer kernels
True NN kernel (Murphy, 2012):

k(x, x′) = (2/π) sin⁻¹( 2 x̃ᵀ Σ x̃′ / √((1 + 2 x̃ᵀ Σ x̃)(1 + 2 x̃′ᵀ Σ x̃′)) )

with x̃ = (1, x); Σ is the covariance matrix of a Gaussian
Streaming kPCA

Ghashami, Perry, Phillips, Streaming kernel PCA, AISTATS, 2016

Builds on frequent directions


Vanilla kPCA
1. Provides a set of basis vectors capturing nonlinear structures within
   large datasets;

2. Potentially a great tool for data analysis and learning…

But

To allow for nonlinear relations, typically a full n × n kernel matrix is
constructed over the n data points

This requires too much space and time for large values of n

Vanilla kPCA: how much is too much?

Storing the matrix K: O(n^2)

Computing the decomposition of K: O(n^3)

Time to evaluate the kernel function for an arbitrary test vector
with respect to all training examples: O(nm)
How to alleviate this?

Sample points, to obtain a smaller K
Williams and Seeger. "Using the Nyström method to speed up kernel machines." NIPS. 2001.

Approximate but explicit embedding of the lifting into Euclidean space,
using a random mapping
Rahimi, Ali, and Benjamin Recht. "Random features for large-scale kernel machines." NIPS. 2008.
Randomized feature maps

Z : R^m → R^d

For any shift-invariant kernel k(x, y) = k(x − y):

E[⟨Z(x), Z(y)⟩] = k(x, y)

and with d = O(ε⁻² log(n/δ)),  P(|⟨Z(x), Z(y)⟩ − k(x, y)| ≤ ε) ≥ 1 − δ

Instead of implicitly lifting data points to the high-dimensional feature space
by the kernel trick, the maps explicitly embed the data into a low-dimensional
Euclidean inner-product subspace
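
A minimal sketch of such a map Z for the Gaussian/RBF kernel (random Fourier features, NumPy; the sampling scheme follows Rahimi and Recht, and the particular gamma and dimensions are arbitrary choices of this sketch):

import numpy as np

def rff_map(X, d, gamma=0.5, seed=0):
    # Z : R^m -> R^d for the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(m, d))   # frequencies ~ Fourier transform of k
    b = rng.uniform(0, 2 * np.pi, size=d)                   # random phases
    return np.sqrt(2.0 / d) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
Z = rff_map(X, d=2000)
K_exact = np.exp(-0.5 * np.sum((X[:, None] - X[None, :])**2, axis=-1))
print(np.max(np.abs(Z @ Z.T - K_exact)))   # small; shrinks as d grows, since E[<Z(x), Z(y)>] = k(x, y)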
Streaming kPCA

Phase 1: before the data arrives

Generate random feature functions (f_1, ..., f_d)

When data arrives

use random feature functions to map each data point to an approximate
feature vector a_i ∈ R^m → z_i ∈ R^d

Phase 2: feed feature vector to frequent directions algorithm,
computing an approximate set of singular vectors W ∈ R^{d×ℓ}

Ghashami, Perry, Phillips, Streaming kernel PCA, AISTATS, 2016
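As a concrete illustration of the two phases, here is a minimal NumPy sketch (an illustration under stated assumptions, not the authors' implementation): feature_map stands for the random feature functions built in Phase 1 (one concrete construction is given on the random Fourier feature slide below), and FrequentDirections is a standard doubled-buffer variant of the frequent directions sketch; the class name, the ell parameter, and the buffer size are illustrative choices, and d ≥ 2ℓ is assumed (consistent with ℓ ≪ d).

import numpy as np

class FrequentDirections:
    # Minimal doubled-buffer Frequent Directions sketch: stream rows z_i in R^d,
    # maintain B (2*ell x d) whose Gram matrix B^T B approximates Z^T Z.
    def __init__(self, d, ell):
        self.d, self.ell = d, ell
        self.B = np.zeros((2 * ell, d))
        self.next_row = 0

    def append(self, z):
        if self.next_row == 2 * self.ell:
            self._shrink()
        self.B[self.next_row] = z
        self.next_row += 1

    def _shrink(self):
        # SVD of the buffer, then subtract the ell-th largest squared singular
        # value so that at least half of the rows become zero and can be reused.
        _, s, Vt = np.linalg.svd(self.B, full_matrices=False)
        delta = s[self.ell - 1] ** 2
        shrunk = np.sqrt(np.maximum(s[: self.ell] ** 2 - delta, 0.0))
        self.B[: self.ell] = shrunk[:, None] * Vt[: self.ell]
        self.B[self.ell:] = 0.0
        self.next_row = self.ell

    def top_directions(self):
        # Approximate top-ell right singular vectors of the streamed feature
        # matrix, returned as the d x ell matrix W of the slide.
        _, _, Vt = np.linalg.svd(self.B, full_matrices=False)
        return Vt[: self.ell].T

def streaming_kernel_pca(stream, feature_map, d, ell):
    # Phase 2: map each arriving point a_i to z_i = feature_map(a_i) in R^d
    # and feed it to the frequent directions sketch; return W (d x ell).
    fd = FrequentDirections(d, ell)
    for a in stream:
        fd.append(feature_map(a))
    return fd.top_directions()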


Streaming kPCA

Because ℓ ≪ d, the ℓ-dimensional row space of W encodes a lower-dimensional subspace
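Continuing the illustrative sketch above (names are assumptions, not from the slides): a new point x is embedded into this ℓ-dimensional subspace by applying the same random feature map and projecting onto W.

def embed(x, feature_map, W):
    # Map x to z(x) in R^d, then take coordinates along the ell approximate
    # kernel principal directions (columns of W); returns an ell-vector.
    return feature_map(x) @ W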
Random Fourier feature maps

For any positive definite shift-invariant kernel k(x, y) = k(x − y):

Generate d functions f_i(x) = cos(r_i^T x + γ_i)

where r_i ∈ R^m is a uniform sample of the Fourier transform of k

and γ_i ∼ Unif(0, 2π]

z(x)_i = √(2/d) · f_i(x)
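For example, for the Gaussian kernel k(x − y) = exp(−‖x − y‖²/(2σ²)) the Fourier transform is again Gaussian, so the r_i can be sampled from N(0, σ⁻² I). A minimal NumPy sketch of such a map follows (this is the kind of feature_map assumed in the earlier streaming sketch; parameter names are illustrative), with a quick check that z(x)ᵀ z(y) concentrates around k(x, y) as d grows:

import numpy as np

def make_rff_map(m, d, sigma=1.0, seed=None):
    # Random Fourier feature map for the Gaussian kernel
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); its Fourier transform is the
    # Gaussian N(0, sigma^{-2} I), from which the r_i are sampled.
    rng = np.random.default_rng(seed)
    R = rng.normal(scale=1.0 / sigma, size=(d, m))   # row i holds r_i in R^m
    gamma = rng.uniform(0.0, 2.0 * np.pi, size=d)    # gamma_i ~ Unif(0, 2pi]

    def z(x):
        # z(x)_i = sqrt(2/d) * cos(r_i^T x + gamma_i)
        return np.sqrt(2.0 / d) * np.cos(R @ x + gamma)

    return z

feature_map = make_rff_map(m=5, d=4000, sigma=1.0, seed=0)
x, y = np.ones(5), np.zeros(5)
print(feature_map(x) @ feature_map(y))             # approximates k(x, y)
print(np.exp(-np.linalg.norm(x - y) ** 2 / 2.0))   # exact kernel value, ~0.082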
Complexity

d = O((1/ε²) log n)

ℓ = O(1/ε)

Space: O((m log n)/ε² + (log n)/ε³)

Train time: O((n log n)/ε² + (n log n)/ε³)

Test time: O((m + 1/ε)/ε² · log n)
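To get a feel for these bounds (ignoring constants and taking natural logs): with ε = 0.1 and n = 10⁶ points, they suggest roughly d ≈ (1/ε²) ln n ≈ 100 × 13.8 ≈ 1.4 × 10³ random features and ℓ ≈ 1/ε = 10 retained directions, so the sketch W holds about d · ℓ ≈ 1.4 × 10⁴ numbers, independent of n.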
