Professional Documents
Culture Documents
Chapter 4
Chapter 4
M L
D M
Chapter 4
Non-Parameter Estimation
Introduction
Parzen Windows
K-Nearest-Neighbor Estimation
Classification Techniques
The Nearest-Neighbor rule(1-NN)
The Nearest-Neighbor rule(k-NN)
Distance Metrics
p( x | i ) p(i )
P(i | x)
p ( x)
To compute the posterior probability, we need to
know the prior probability and the likelihood.
Case I: p( x | i ) has certain parametric form
Maximum-Likelihood Estimation
Bayesian Parameter Estimation
Problems:
The assumed parametric form may not fit the ground-
truth density encountered in practice, e.g., assumed
parametric form: unimodal; ground-truth: multimodal
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 3
Non-Parameter Estimation MIMA
p ( x | i )
P(i | x)
p(x)VR PR
x+
Given n examples (i.i.d.) {x1, x2,…, xn}, let K R
denote the random variable representing
number of samples falling into R, K will take
Binomial distribution:
n k
K ~ B (n, PR ) P( K k ) PR (1 PR ) n k
k n samples
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 6
Density Estimation MIMA
p(x)VR PR
x+
p(x)VR PR R
E[ K ] / n
p ( x)
E[ K ] nPR VR
Let kR denote the actual kR / n
p ( x)
number of samples in R VR n samples
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 7
Density Estimation MIMA
kR / n
p ( x)
VR
Use subscript n to take sample size into account
kn / n
pn ( x)
Vn
We hope that: lim pn ( x) p ( x)
n
lim k n
n
lim k n / n 0
n
n samples
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 8
Density Estimation MIMA
1
It defines a unit hypercube 1
centered at the origin.
x 1 | x | hn / 2
hn
hn 0 otherwise
hn
hn
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 11
Window function MIMA
x x 1 | x j xj | hn / 2
hn 0 otherwise
1 means that x’ falls within the hypercube of
volume vn centered at x.
kn: # samples inside the hypercube
centered at x, x hn
n
x xi hn
k n hn
i 1 hn
1 n 1 x xi
pn (x)
n i 1 Vn hn
1 n 1 x xi 1 n 1 x xi 1 n
n dx dx u du
i 1 Vn hn n i 1 Vn hn n i 1
1 n 1 x xi
pn (x)
n i 1 Vn hn
1 x 1 n
n (x) pn ( x) n ( x - x i )
Vn hn n i 1
The effect of hn
1 x 1 x
n (x) d
Vn hn hn hn
1 x 1 n
n (x) pn ( x) n ( x - x i )
Vn hn n i 1
1 x 1 n
n (x) pn ( x) n ( x - x i )
Vn hn n i 1
Convergence conditions
To ensure convergence, i.e.,
d
lim (u) ui 0 lim nVn
||u|| n
i 1
1 u 2 / 2
(u) e
2
1 n 1 x xi
pn (x)
n i 1 hn hn
hn h1 / n
1 u 2 / 2
(u) e
2
1 n 1 x xi
pn (x)
n i 1 hn hn
hn h1 / n
1 n 1 x xi
pn (x) 2
n i 1 hn hn
hn h1 / n
Vn V1 / n
The value of initial volume V1 is important.
In some cases, a cell volume is proper for one
region but unsuitable in a different region.
kn / n
pn ( x)
Vn
Fix kn and then determine Vn
To estimate p(x), we can center a cell about x
and let it grow until it captures kn samples,
kn is some specified function of n, e.g.,
lim k n
kn n
n
Principled rule to
choose kn lim Vn 0
n
Pn(i|x)=?
p n ( x , i ) ki
Pn (i | x) c
kn
p ( x, )
j 1
n j
ki / n
x
p n ( x , i )
Vn
c
kn / n
j 1
p n ( xn , j )
Vn
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 28
Estimation of A Posterior probability MIMA
Pn(i|x)=?
p n ( x , i ) ki
Pn (i | x) c
kn
p ( x, )
j 1
n j
ki / n
p n ( x , i )
Vn The value of Vn or kn can be
c determined base on Parzen window
kn / n
j 1
p n ( xn , j )
Vn
or kn-nearest-neighbor technique.
Decision Boundaries
The voronoi diagram
Given a set of points, a Voronoi diagram describes
the areas that are nearest to any given point.
These areas can be viewed as zones of control.
Decision boundary is
formed by only retaining
these line segment
separating different classes.
The more training examples
we have stored, the more
complex the decision
boundaries can become.
Validation can be viewed as another name for testing, but the name testing is
typically reserved for final evaluation purpose, whereas validation is mostly used
for model selection purpose.
Nonnegativity
D(a, b) 0
Reflexivity
D(a, b) 0 iff a b
Symmetry
D(a, b) D(b, a)
Triangle Inequality
D(a, b) D(b, c) D(a, c)
2. L2 norm
1/ 2
d
2
Euclidean L2 (a, b) || a b ||2 | ai bi |
distance i 1
3. L norm
1/
Chessboard L (a, b) || a b || | ai bi |
d
max(| ai bi |)
distance i 1
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 47
Minkowski Metric (Lp Norm) MIMA
1/ p
d
p
|| x || p | x |
i 1
1. L1 norm
d
Manhattan or city L1 (a, b) || a b ||1 | ai bi |
block distance i 1
2. L2 norm
1/ 2
d
2
Euclidean L2 (a, b) || a b ||2 | ai bi |
distance i 1
3. L norm
1/
Chessboard L (a, b) || a b || | ai bi |
d
max(| ai bi |)
distance i 1
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 48
Summary MIMA
Parzen Windows
Fix Vn and then determine kn
1 n 1 x xi
pn (x)
n i 1 Vn hn
kn-Nearest-Neighbor
Fix kn and then determine Vn
Any Question?