
Vector Quantization

Vector quantization

- Aim and principle
- What is a vector quantizer
- Distance measures
- Lloyd's principle and algorithm
- Initialization

Aim of Vector Quantization


To reduce the size of the database


[Diagram: a table of P vectors × N features is reduced to a table of Q vectors × N features, with Q < P]

Principles of vector quantization


Project a continuous input space onto a discrete output space

Principles of vector quantization


Define zones in the space; the set of points contained in each zone is projected onto a representative vector (centroid). Example: a 2-dimensional space.

What is a vector quantizer?


Vector quantizer =

1. A codebook (set of centroids, or codewords): m = { y_j, 1 ≤ j ≤ Q }
2. A quantization function q: x_i → q(x_i) = y_j

Usually, q is defined by the nearest-neighbor rule (according to some distance measure).

Two-dimensional minimum-distortion partition. The four circles are the codewords of a two-dimensional codebook. The Voronoi regions are the quadrants containing the circles.

The x's were produced by a training sequence of twelve two-dimensional Gaussian vectors. Each input vector is mapped into the nearest-neighbor codeword, that is, the circle in the same quadrant. x = training vectors, o = codewords, P_i = region encoded into codeword i.
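To make the definition concrete, here is a minimal Python/NumPy sketch (not from the original slides) of a quantizer q defined by the nearest-neighbor rule; the codebook values mimic the four-quadrant figure above, and the names quantize / codebook are illustrative.

```python
import numpy as np

def quantize(x, codebook):
    """Nearest-neighbor quantization: map x to the closest codeword y_j."""
    dists = np.sum((codebook - x) ** 2, axis=1)   # squared distance to each codeword
    j = int(np.argmin(dists))                     # index of the nearest codeword
    return codebook[j], j

# Codebook of Q = 4 two-dimensional codewords, one per quadrant.
codebook = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])
x = np.array([0.7, 1.8])
y, j = quantize(x, codebook)                      # y = [1., 1.] (same quadrant), j = 0
```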

Distance measures

Least square error

r-norm error

Distance measures

Weighted least square error

- If W = I: least-square error
- If W = Σ^-1, where Σ is the covariance matrix of the inputs: Mahalanobis distance

If W is symmetric, it can be factored as W = P^T P (so the weighted distance reduces to an ordinary least-square distance on the transformed data Px).
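For illustration only (not from the slides), a small Python/NumPy sketch of the distance measures above; estimating the covariance Σ from sample data for the Mahalanobis case is an assumption made for the example.

```python
import numpy as np

def squared_error(x, y):
    """Least-square (squared Euclidean) error."""
    return np.sum((x - y) ** 2)

def r_norm_error(x, y, r=1):
    """r-norm error: sum over components of |x_k - y_k|^r."""
    return np.sum(np.abs(x - y) ** r)

def weighted_squared_error(x, y, W):
    """Weighted least-square error (x - y)^T W (x - y).
    W = I                                      -> ordinary least-square error.
    W = inv(Sigma), Sigma the input covariance -> Mahalanobis distance."""
    d = x - y
    return d @ W @ d

# Example: Mahalanobis distance between two inputs, with Sigma estimated from data.
data = np.random.default_rng(0).normal(size=(100, 2))
W = np.linalg.inv(np.cov(data, rowvar=False))
d_mahalanobis = weighted_squared_error(data[0], data[1], W)
```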

Lloyd's principle

Three properties:

1. The first one gives the best encoder, once the decoder is known.
2. The second one gives the best decoder, once the encoder is known.
3. There is no point on the borders between Voronoi regions (probability = 0).

Optimal quantizer: these properties are necessary, but not sufficient.

Lloyd: property #1

For a given decoder, the best encoder is given by the nearest-neighbor rule.

Lloyd: property #2

For a given encoder, the best decoder is given by the center-of-gravity rule.
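The formulas for these two rules did not survive extraction from the slides; in standard notation (V_j denoting the Voronoi region of codeword y_j, squared-error distortion for the second rule) they read:

```latex
% Property #1 (best encoder for a given decoder): nearest-neighbor rule
q(x_i) = y_j \quad \text{with} \quad j = \arg\min_k \, d(x_i, y_k)

% Property #2 (best decoder for a given encoder): center-of-gravity rule
y_j = \frac{1}{|V_j|} \sum_{x_i \in V_j} x_i
```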

Lloyd: property #3

The probability of finding a point x_i on a border (between Voronoi regions) is zero.

Lloyd's algorithm
1. Choose an initial codebook.
2. All points x_i are encoded; the distortion E_VQ is evaluated.
3. If E_VQ is small enough, then stop.
4. All centroids y_j are replaced by the center of gravity of the data x_i associated with y_j in step 2.
5. Back to step 2.
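A minimal Python/NumPy sketch of these five steps (not part of the original slides): the random initialization, the tolerance test used for "E_VQ small enough", and the name lloyd are illustrative assumptions.

```python
import numpy as np

def lloyd(data, Q, max_iter=100, tol=1e-6, seed=None):
    """Batch Lloyd's algorithm: returns a codebook of Q centroids."""
    rng = np.random.default_rng(seed)
    # Step 1: initial codebook = Q randomly chosen data points.
    codebook = data[rng.choice(len(data), size=Q, replace=False)].copy()
    prev_error = np.inf
    for _ in range(max_iter):
        # Step 2: encode every x_i with the nearest-neighbor rule.
        dists = np.sum((data[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
        labels = np.argmin(dists, axis=1)
        e_vq = np.mean(dists[np.arange(len(data)), labels])   # distortion E_VQ
        # Step 3: stop when E_VQ no longer decreases significantly.
        if prev_error - e_vq < tol:
            break
        prev_error = e_vq
        # Step 4: replace each centroid by the center of gravity of its region.
        for j in range(Q):
            members = data[labels == j]
            if len(members) > 0:
                codebook[j] = members.mean(axis=0)
        # Step 5: back to step 2 (next loop iteration).
    return codebook, labels, e_vq

# Usage: quantize 200 two-dimensional points with Q = 4 codewords.
data = np.random.default_rng(0).normal(size=(200, 2))
codebook, labels, e_vq = lloyd(data, Q=4, seed=0)
```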

Lloyd: example
1. Initialization of the codebook

[Figure: x_i = data points, y_j = centroids]

Lloyd: example
2. Encoding (nearest-neighbor)

[Figure: x_i = data points, y_j = centroids]

Lloyd: example
4. Decoding (center-of-gravity)

Lloyd: example
2. Encoding (nearest-neighbor) new borders

Lloyd: example
4. Decoding (center-of-gravity) new positions of centroids

Lloyd: example
2. Encoding (nearest-neighbor) new borders

Lloyd: example
4. Decoding (center-of-gravity) new positions of centroids

Lloyd: example
2. Encoding (nearest-neighbor) new borders

Lloyd's algorithm: the names


- Lloyd's algorithm
- Generalized Lloyd's algorithm
- Linde-Buzo-Gray (LBG) algorithm
- k-means
- ISODATA
All based on the same principle.

Lloyd's algorithm: properties


- The codebook is modified only after the presentation of the whole dataset.
- The mean square error (E_VQ) decreases at each iteration.
- The risk of getting trapped in local minima is high.
- The final quantizer depends on the initial one.

How to initialize Lloyd's algorithm?


1. Randomly in the input space

2. The first Q data points x_i


Use the first 2^R vectors (codebook vectors) in the training sequence as the initial codebook.

3. Q randomly chosen data points xi


Select several widely spaced codewords from the training sequence. This approach is sometimes called random code generation.

How to initialize Lloyd's algorithm? (Cont.)


4. Product codes: the product of scalar quantizers

Take a collection of m codebooks C_i, each consisting of M_i vectors. The product codebook C is then defined as the collection of all M_1 × M_2 × ... × M_m possible concatenations of m words drawn successively from the m codebooks C_i.
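A small Python sketch (not from the slides) of a product codebook built from two scalar codebooks; the codebook values are arbitrary example levels.

```python
import numpy as np
from itertools import product

# Two scalar codebooks C_1 and C_2, with M_1 = 3 and M_2 = 2 levels.
C1 = np.array([-1.0, 0.0, 1.0])
C2 = np.array([-0.5, 0.5])

# Product codebook: all M_1 * M_2 = 6 concatenations of one word from each C_i.
product_codebook = np.array(list(product(C1, C2)))    # shape (6, 2)
```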
5. Growing

initial set:

- A first centroid y_1 is randomly chosen (in the data set).
- A second centroid y_2 is randomly chosen (in the data set); if d(y_1, y_2) > threshold, y_2 is kept.
- A third centroid y_3 is randomly chosen (in the data set); if d(y_1, y_3) > threshold and d(y_2, y_3) > threshold, y_3 is kept.
- And so on, until Q centroids are kept.
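A possible Python sketch of this growing procedure; the helper name growing_init, the squared-error distance, and the cap on the number of random draws are assumptions added for the example.

```python
import numpy as np

def growing_init(data, Q, threshold, max_tries=10000, seed=None):
    """Grow an initial codebook: a randomly drawn data point is kept as a new
    centroid only if it is farther than `threshold` (squared-error distance)
    from every centroid already kept."""
    rng = np.random.default_rng(seed)
    centroids = [data[rng.integers(len(data))]]            # first centroid y_1
    tries = 0
    while len(centroids) < Q and tries < max_tries:
        tries += 1
        candidate = data[rng.integers(len(data))]          # random data point
        dists = [np.sum((candidate - y) ** 2) for y in centroids]
        if min(dists) > threshold:                         # far enough from all kept centroids
            centroids.append(candidate)
    return np.array(centroids)
```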

How to initialize Lloyd's algorithm? (Cont.)


6. Pairwise nearest neighbor:
- A first codebook is built with all data points x_i.
- The two centroids nearest to one another are merged (center of gravity); merging is repeated until Q centroids remain.
- Variant: the increase of distortion (ΔE_VQ) is evaluated for the merge of each pair of centroids y_j; the pair giving the lowest increase is merged.
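A possible Python sketch of the basic pairwise-nearest-neighbor initialization (not the distortion-increase variant); the brute-force pair search and the count-weighted center-of-gravity merge are choices made for this example.

```python
import numpy as np

def pnn_init(data, Q):
    """Pairwise nearest neighbor: start with one centroid per data point and
    repeatedly merge the two closest centroids (replacing them by the center
    of gravity of the data they represent) until only Q centroids remain."""
    centroids = [x.astype(float) for x in data]
    counts = [1] * len(centroids)            # number of data points per centroid
    while len(centroids) > Q:
        # Brute-force search for the closest pair of centroids.
        best = (np.inf, 0, 1)
        for a in range(len(centroids)):
            for b in range(a + 1, len(centroids)):
                d = np.sum((centroids[a] - centroids[b]) ** 2)
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the pair into the center of gravity of their combined data.
        merged = (counts[a] * centroids[a] + counts[b] * centroids[b]) / (counts[a] + counts[b])
        centroids[a], counts[a] = merged, counts[a] + counts[b]
        del centroids[b], counts[b]
    return np.array(centroids)
```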

How to initialize Lloyd's algorithm?


7. Splitting:
- A first centroid y_1 is randomly chosen.
- A second centroid y_1 + ε is created; Lloyd's algorithm is applied to the new codebook.
- Two new centroids are created by perturbing the two existing ones; Lloyd's algorithm is applied to this 4-centroid codebook.
- And so on, doubling the codebook at each step.
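A possible Python sketch of the splitting initialization, assuming Q is a power of two; the perturbation size eps and the helper lloyd_refine (a few Lloyd iterations, as in the algorithm above) are illustrative assumptions.

```python
import numpy as np

def lloyd_refine(data, codebook, n_iter=10):
    """A few Lloyd iterations: nearest-neighbor encoding, center-of-gravity update."""
    for _ in range(n_iter):
        dists = np.sum((data[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(len(codebook)):
            members = data[labels == j]
            if len(members) > 0:
                codebook[j] = members.mean(axis=0)
    return codebook

def splitting_init(data, Q, eps=1e-3, seed=None):
    """Splitting (LBG-style): start from one centroid and repeatedly double the
    codebook by perturbing each centroid by +/- eps, re-running Lloyd after each split."""
    rng = np.random.default_rng(seed)
    codebook = data[[rng.integers(len(data))]].astype(float)        # first centroid y_1
    while len(codebook) < Q:
        codebook = np.concatenate([codebook - eps, codebook + eps])  # split every centroid
        codebook = lloyd_refine(data, codebook)
    return codebook
```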

Splitting
