Chazelle


So Much Data

So Little Time

Bernard Chazelle
Princeton University
So Many Slides
So Little Time
(before lunch)

[Diagram: math, algorithms, experimentation, computation]
Computers have two problems
1. They don’t have steering wheels
2. End of Moore’s Law

Party’s over!
[Diagram: algorithms, experimentation, computation]
This is not me

32
x 17
224
32
= 544
FFT

RSA
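The 32 × 17 example above is worked by hand, one partial product per digit of the multiplier. A minimal Python sketch of that schoolbook method for non-negative integers in base 10 (the function name `schoolbook_multiply` is a hypothetical illustration, not something from the talk):

```python
def schoolbook_multiply(a, b):
    """Multiply non-negative integers a and b the way it is done by hand:
    one partial product per decimal digit of b, shifted by that digit's
    place value, then summed. Returns (product, partial_products)."""
    partials = []
    place = 1  # place value of the current digit of b: 1, 10, 100, ...
    while b > 0:
        digit = b % 10
        partials.append(a * digit * place)
        b //= 10
        place *= 10
    return sum(partials), partials


result, partials = schoolbook_multiply(32, 17)
print(result, partials)  # 544 [224, 320]
```

The second partial product appears as 320 here because the sketch stores it already shifted, while the hand layout writes 32 one column to the left.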
unevenly priced, noisy, big, low entropy, uncertain

Sloan Digital Sky Survey: 4 petabytes (~1MG)
10 petabytes/yr
Biomedical imaging: 150 petabytes/yr
My A(9,9)-th paper

Collected works of Micha Sharir


[Figure: massive input → output]
Sample a tiny fraction


Sublinear Algorithms
Shortest Paths [C-Liu-Magen ’03]
[Figure: shortest path from New York to Delphi]

Ray Shooting
- Volume
- Intersection
- Point location
Approximate MST [C-Rubinfeld-Trevisan ’01]
Reduces to counting connected components
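The reduction is usually stated through an identity (given here from general knowledge, not from the slides): for a connected graph on n vertices with integer edge weights in 1..w, MST weight = n − w + Σ_{i=1}^{w−1} c_i, where c_i is the number of connected components of the subgraph keeping only edges of weight ≤ i. A minimal Python sketch that checks the identity by computing each c_i exactly with union-find; a sublinear algorithm would instead estimate each c_i. All names are illustrative.

```python
def count_components(n, edges):
    """Number of connected components of a graph on vertices 0..n-1,
    computed with union-find (path halving)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components


def mst_weight_from_components(n, weighted_edges, w):
    """MST weight of a connected graph whose edge weights are integers in
    1..w, via n - w + sum_{i=1}^{w-1} c_i, where c_i counts the components
    of the subgraph that keeps only edges of weight <= i."""
    total = n - w
    for i in range(1, w):
        light = [(u, v) for u, v, wt in weighted_edges if wt <= i]
        total += count_components(n, light)
    return total


edges = [(0, 1, 1), (1, 2, 3), (0, 2, 2), (2, 3, 1)]
print(mst_weight_from_components(4, edges, 3))  # 4, same as Kruskal
```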
E[X] = no. connected components
var[X] << (no. connected components)^2
whp, X is a good estimator of # connected components
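The sampling idea behind such an estimator: every vertex u contributes 1/n_u, where n_u is the size of its component, and these contributions sum to the number of components. A minimal Python sketch under that reading, sampling random vertices and running a BFS that stops after about 2/ε vertices. It is a simplified version of the idea, not the exact estimator of the C-Rubinfeld-Trevisan paper, which differs in its details; the function names and the parameter `eps` are illustrative.

```python
import math
import random
from collections import deque


def capped_component_size(adj, u, cap):
    """BFS from u, stopping as soon as `cap` vertices have been reached.
    Returns min(size of u's component, cap)."""
    seen = {u}
    queue = deque([u])
    while queue and len(seen) < cap:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
                if len(seen) >= cap:
                    break
    return min(len(seen), cap)


def estimate_components(adj, eps, samples, rng=random):
    """Estimate the number of connected components of `adj` (a list of
    neighbor lists) from `samples` uniformly random vertices.

    Capping n_u at ceil(2/eps) shifts each term 1/n_u by at most eps/2,
    so the capped sum over all vertices is within eps*n/2 of the true
    count; sampling then adds statistical error on top of that."""
    n = len(adj)
    cap = max(1, math.ceil(2 / eps))
    total = sum(1.0 / capped_component_size(adj, rng.randrange(n), cap)
                for _ in range(samples))
    return n * total / samples


# Components {0, 1, 2}, {3, 4}, {5}: three in total.
adj = [[1], [0, 2], [1], [4], [3], []]
print(estimate_components(adj, eps=0.1, samples=20000, rng=random.Random(1)))
```

Each BFS touches at most about 2/ε vertices, so the work depends on `eps` and `samples` rather than on the size of the graph.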
[Figure: input space, worst case vs. average case (uniform)]
average case = actuarial view
“OK, if you elect NOT to have the surgery, the insurance company offers 6 days and 7 nights in Barbados.”
arbitrary, unknown random source

Self-Improving Algorithms
Yes! This could be YOU, too!
time T1, time T2, time T3, time T4, …
E[Tk] → optimal expected time for the random source
Clustering [ Ailon-C-Liu-Comandur ’05 ]

K-median over Hamming cube: minimize sum of distances

[Kumar-Sabharwal-Sen ’04]: COST ≤ (1 + ε) OPT
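As a concrete reading of "minimize sum of distances": the sketch below computes the k-median cost of a set of centers over {0,1}^d and, for k = 1 only, the exact optimum by a coordinate-wise majority vote. It does not implement the Kumar-Sabharwal-Sen (1 + ε)-approximation the slide relies on for general k; the function names are illustrative.

```python
from itertools import product


def hamming(a, b):
    """Number of coordinates where bit vectors a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def kmedian_cost(points, centers):
    """k-median objective: each point pays its Hamming distance to the
    nearest center; the costs are summed."""
    return sum(min(hamming(p, c) for c in centers) for p in points)


def hamming_1median(points):
    """Exact 1-median in {0,1}^d. The cost splits coordinate by coordinate,
    so the majority bit in each coordinate is optimal (ties broken to 0)."""
    n, d = len(points), len(points[0])
    return tuple(1 if 2 * sum(p[i] for p in points) > n else 0 for i in range(d))


points = [(0, 0, 1), (0, 1, 1), (1, 0, 1)]
center = hamming_1median(points)
# Brute force over all 2^d candidate centers confirms the majority vote.
best = min(kmedian_cost(points, [c]) for c in product((0, 1), repeat=3))
print(center, kmedian_cost(points, [center]), best)  # (0, 0, 1) 2 2
```

The per-coordinate split is what makes k = 1 easy; with several centers, each point picks its nearest center, the cost no longer separates by coordinate, and the majority trick no longer applies.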
How to achieve linear limiting time?

Input space {0,1}^(dn)

Identify core
Use KSS
Tail: prob < O(dn)/KSS

Store sample of precomputed KSS
Nearest neighbor
Incremental algorithm
Main difficulty: How to spot the tail?
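The slides do not say how the nearest-neighbor step is carried out. Purely as an illustration, assuming inputs are bit strings compared by Hamming distance and a hypothetical `store` dict that maps each sampled input to a solution precomputed for it (for example by KSS), a brute-force lookup could look like this. It is not the data structure from the paper.

```python
def hamming(a, b):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_precomputed(query, store):
    """Brute-force nearest neighbor over `store` (hypothetical): a dict from
    sampled inputs (bit tuples) to solutions precomputed for them.
    Returns (stored_input, solution, distance); raises ValueError if empty."""
    key = min(store, key=lambda k: hamming(query, k))
    return key, store[key], hamming(query, key)


store = {(0, 0, 0, 0): "solution A", (1, 1, 1, 1): "solution B"}
print(nearest_precomputed((1, 1, 0, 1), store))  # ((1, 1, 1, 1), 'solution B', 1)
```

The brute-force scan costs one distance computation per stored input; it only shows the shape of the lookup, not an efficient way to do it.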
[Figure: encode → decode]
Data inaccessible before noise

What makes you think it’s wrong?
Data must satisfy some property (e.g., convex, bipartite) but does not quite.
[Figure: query x → data → f(x); f = access function]
But life being what it is…


f(x) = ?

Humans O( )

Define distance from any object to data class
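The slides leave the distance unspecified. For the monotone functions [n] → R that appear on a later slide, one common choice (assumed here) is the number of values that must be changed to make the function monotone. For a sequence, that equals n minus the length of its longest non-decreasing subsequence, which this Python sketch computes in O(n log n) with the standard patience-sorting technique. Function names are illustrative.

```python
from bisect import bisect_right


def longest_nondecreasing(values):
    """Length of the longest non-decreasing subsequence.
    tails[k] holds the smallest possible last value of such a
    subsequence of length k + 1; it stays sorted."""
    tails = []
    for v in values:
        i = bisect_right(tails, v)  # first tail strictly greater than v
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)


def distance_to_monotone(values):
    """Minimum number of entries to change so the sequence is non-decreasing."""
    return len(values) - longest_nondecreasing(values)


print(distance_to_monotone([1, 3, 2, 4]))     # 1
print(distance_to_monotone([5, 1, 2, 3, 4]))  # 1
```

Using `bisect_right` rather than `bisect_left` lets equal values extend a subsequence, which matches non-decreasing (not strictly increasing) monotonicity.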


f(x) = ?

filter: x → x1, x2, …
g(x) ← f(x1), f(x2), …

g is access function for:


Online Data Reconstruction
Monotone function: [n] → R
Filter requires polylog(n) lookups

[ Ailon-C-Liu-Comandur ’04 ]
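To make the filter interface concrete: the filter receives a query x, looks up f at some points, and returns g(x), where g must be monotone. The simplest filter that guarantees this is a prefix maximum, sketched here in Python as a baseline for illustration only (the name `naive_monotone_filter` is hypothetical). It uses x lookups rather than polylog(n), and it can drift far from f: if f(1) is huge and the rest is increasing, f is one change away from monotone, yet g overwrites every later value. The filter cited on the slide is designed to avoid both problems; it is not what is shown here.

```python
def naive_monotone_filter(f, x):
    """Answer g(x), for x in 1..n, from lookups f(1), ..., f(x).

    g(x) = max(f(1), ..., f(x)) is non-decreasing in x, and g == f whenever
    f is already non-decreasing. Each answer is computed independently,
    so queries can arrive online in any order."""
    return max(f(i) for i in range(1, x + 1))


data = [1, 3, 2, 4]
g = [naive_monotone_filter(lambda i: data[i - 1], x) for x in range(1, len(data) + 1)]
print(g)  # [1, 3, 3, 4]
```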
Convex polygon

Filter requires : lookups

[C-Comandur ’06 ]
Convex terrain

Filter requires :
lookups
Iterated (weak) planar separator theorem in sublinear time!
reconstruct

Using epsilon-nets in spaces of unbounded VC dimension


bipartite graph
k-connectivity
expander
denoising low-dim attractor sets

Priced computation & accuracy

- spectrometry/cloning/gene chip
- Linear programming
- PCR/hybridization/chromatography
- gel electrophoresis/blotting
Factoring is easy. Here’s why…
Gaussian mixture sample: 00100101001001101010101

Pricing data
Collaborators: Nir Ailon, Seshadri Comandur, Ding Liu
Avner Magen, Ronitt Rubinfeld, Luca Trevisan
