
Image Processing Primer

David Doria
daviddoria@gmail.com

Tuesday 16th December, 2008

Contents

1 Prerequisites 5

2 What is Image Processing? 5

3 Color Models 5
3.1 Additive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1.1 RGB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1.2 HSV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.2 Subtractive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2.1 CMYK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

4 Image Basics 8
4.1 Pixel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.2 Binary Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.3 Greyscale Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.4 Color Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.5 Pseudocolor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

5 Thresholding 8
5.1 Thresholding by Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
5.2 Otsu’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
5.2.1 Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

6 Filtering Terminology 9

7 Morphological Operations 9
7.1 Erosion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
7.1.1 Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
7.1.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
7.1.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7.2 Dilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7.2.1 Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7.2.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7.2.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

7.3 Combined Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7.3.1 Opening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7.3.2 Closing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
7.4 Thickening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
7.4.1 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
7.4.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
7.5 Thinning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
7.5.1 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
7.5.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

8 Edge Detectors/Filters 13
8.1 Differential Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
8.1.1 Prewitt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
8.1.2 Sobel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
8.1.3 Canny . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
8.2 Zero Crossing Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
8.2.1 Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
8.2.2 Laplacian of Gaussian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
8.2.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

9 Corner Detection 17
9.1 Hit or Miss Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
9.1.1 Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
9.1.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
9.1.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
9.2 Harris Corner Detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
9.2.1 Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

10 Point Operations 18
10.1 Histogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
10.2 Contrast Stretching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
10.2.1 Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
10.2.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
10.3 Histogram Equalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
10.3.1 Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
10.3.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

11 Image Enhancement 20
11.1 Point Spread Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
11.2 Deconvolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
11.3 Sharpening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
11.3.1 Edge Enhancement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
11.3.2 Unsharp Masking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

12 Algebraic Operations 21
12.1 Motion (Change) Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
12.2 Noise Removal by Image Averaging . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

13 Image Noise 21
13.1 Salt and Pepper Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
13.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
13.3 Gaussian Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
13.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

14 Spatial Transformations 22
14.1 Averaging Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
14.2 Median Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

15 Geometric Operations 23
15.1 Grey-Level Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
15.1.1 Nearest Neighbor (Zero Order) Interpolation . . . . . . . . . . . . . . . . . . 23
15.1.2 Bilinear Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
15.1.3 Bicubic Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
15.2 Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
15.2.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
15.3 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
15.3.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
15.4 Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
15.4.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
15.4.2 Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
15.5 Compound Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
15.5.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

16 Transforms 25
16.1 Hough Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
16.1.1 Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
16.1.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
16.1.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
16.1.4 Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
16.2 Skeletonization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
16.2.1 Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
16.2.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
16.2.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
16.3 Medial Axis Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
16.3.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
16.4 Distance Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
16.5 Radon Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
16.5.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
16.6 Fourier Slice Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
16.6.1 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
16.7 Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
16.7.1 Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
16.7.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
16.7.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
16.8 Discrete Cosine Transform (DCT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
16.8.1 Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
16.8.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

16.8.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
16.9 Hadamard Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
16.10 KL, Principal Component, Hotelling, and Eigenvector Transforms . . . . . . . . . 33

17 Compression 34
17.1 Lossless . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
17.1.1 Run Length Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
17.1.2 Lempel-Ziv Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
17.1.3 Huffman Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
17.2 Lossy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
17.2.1 Transform Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
17.3 Theoretical Compression Limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

18 Segmentation 35
18.1 Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
18.2 Connected Components Labeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
18.2.1 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
18.3 Region Growing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
18.3.1 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
18.4 Region Splitting and Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
18.5 Watershed Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

19 JPEG 37

20 TV Standards 37
20.1 Color . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
20.2 NTSC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
20.3 PAL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

1 Prerequisites
This paper assumes a working knowledge of pattern recognition, probability, and basic linear algebra.

2 What is Image Processing?

Image processing is the manipulation and analysis of digital images, either to improve their quality (enhancement, noise removal) or to extract useful information from them (edges, corners, regions). The sections that follow survey the standard techniques.
3 Color Models
There are two ways to represent a color C. The first is to ask “Which colors of light would I have to combine to produce C?”. This is called the additive color model. The second is to ask “Which colors of ink should be combined so that they absorb the colors I don’t want from incident white light?”. This is called the subtractive color model.

3.1 Additive

Figure 1: Additive Color

3.1.1 RGB
Red, green, and blue are the three additive primary colors. By combining these three colors of light, any color can be produced. R, G, and B are specified as relative amounts, which describe how much of each color to combine (e.g. [1, 0, 0] is pure red, [1, 1, 0] means to combine red and green in equal quantities, etc.). These combinations can be represented as a cube.

Figure 2: RGB Cube

3.1.2 HSV
H, S, and V correspond to the hue, saturation, and value of the color. These values describe colors
as points in a cylinder with the following properties.

• The angle around the axis corresponds to hue. This is the actual “color” of the color.

• The distance from the axis corresponds to saturation. Near the axis a blue is pale and washed out (“light blue”), while at the edge of the cylinder it is vivid and fully saturated.

• The central axis ranges from black at the bottom to white at the top. The distance along the axis corresponds to value (also called lightness or brightness, as in HSL and HSB).

Figure 3: HSV Cylinder

3.2 Subtractive

Figure 4: Subtractive Color

The subtractive model describes which inks must be applied to a white background so that the reflected light produces a particular color.

3.2.1 CMYK
Cyan, Magenta, Yellow, and blacK. With these four colors of ink any color can be produced. Since these colors are the exact inverse of the additive color model, the two systems can be interchanged with

\begin{pmatrix} C \\ M \\ Y \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - \begin{pmatrix} R \\ G \\ B \end{pmatrix}

In theory black is not needed: CMY alone should cover the entire range of possible colors. In practice, however, it is much better to use a fourth ink, black. Some reasons are as follows:

• It is cheaper to apply 1 ink (black) than 3 inks (CMY)

• The paper gets wet if too much ink is applied, which often happens when C, M, and Y are
applied. This is inefficient because it adds drying time to the printing process.

• Text is often black. Since text requires very fine detail, it should be easy to produce this detail in black. If it were produced with CMY, the C, M, and Y print heads would have to be very accurately aligned, which is much more difficult than simply using a fourth ink.

4 Image Basics
4.1 Pixel
A single element of a digital image. The word pixel is short for Picture (abbreviated “pix”) Element (“el”).

4.2 Binary Image


A binary image is a two dimensional array of binary pixels. If the value is 0, the pixel is black. If the value is 1, the pixel is white.

4.3 Greyscale Image


A greyscale image is a two dimensional array of values indicating the brightness at each point. The brightness values are generally stored as a value between 0 (black) and 255 (white). Values in between are different shades of grey.

4.4 Color Image


A color image can be viewed in two equivalent ways. The first is as a two dimensional array of pixels, just like a greyscale image, but instead of a brightness value, each pixel has a specific color given by an (R, G, B) triple. The alternative view is that the image is composed of three separate 2D arrays of pixels (one for red, one for green, and one for blue), where each element contains the amount of only that layer’s color present in the image at that point. Each of these 2D arrays is called a layer. When the layers are overlaid, the color image is produced.

4.5 Pseudocolor
A map can be produced from a grey level to a color spectrum. This can help to visually identify
different intensities more easily.

5 Thresholding
Choose a value (the threshold) below which one action will be taken, and above which a different action will be taken.

5.1 Thresholding by Clustering


Use K-Means with K = 2 to try to divide the histogram values into foreground and background
pixels (white and black).

5.2 Otsu’s Method


This method is generally used to reduce a greyscale image to a binary image. In this case, pixels
above the threshold will be set to white, and pixels below the threshold will be set to black. Otsu’s
method produces the optimal threshold because it separates the pixels so that their within-class
variance is minimized.

5.2.1 Details
The within-class variance is defined as the weighted sum of the variances of each cluster:

\sigma^2_{within}(T) = n_B(T)\,\sigma^2_B(T) + n_W(T)\,\sigma^2_W(T)

where T is the threshold and n_B(T) = \sum_{i=0}^{T-1} p(i).
Computing this directly for every candidate T is very computationally expensive. An equivalent method is to maximize the between-class variance: the variance of the combined distribution minus the within-class variance,

\sigma^2_{between}(T) = \sigma^2 - \sigma^2_{within}(T)

This can be expressed as

\sigma^2_{between}(T) = n_B(T)\,n_W(T)\,[\mu_B(T) - \mu_W(T)]^2

This expression is very simple, involving only the means of each cluster weighted by the number of points in each cluster. Otsu’s method updates these quantities incrementally as T is swept across the grey levels, instead of recomputing the entire expression at each step, which makes it very fast.
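A minimal sketch of this method in Python with NumPy, assuming an 8-bit greyscale image stored as a 2D array. For clarity it recomputes the class statistics for every candidate T rather than updating them incrementally; the selected threshold is the same.

import numpy as np

def otsu_threshold(image):
    """Return the threshold T that maximizes the between-class variance."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                        # normalized histogram p(i)
    levels = np.arange(256, dtype=float)
    best_T, best_var = 0, -1.0
    for T in range(1, 256):
        n_B, n_W = p[:T].sum(), p[T:].sum()      # class weights n_B(T), n_W(T)
        if n_B == 0 or n_W == 0:
            continue
        mu_B = (levels[:T] * p[:T]).sum() / n_B  # class means
        mu_W = (levels[T:] * p[T:]).sum() / n_W
        var_between = n_B * n_W * (mu_B - mu_W) ** 2
        if var_between > best_var:
            best_T, best_var = T, var_between
    return best_T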

6 Filtering Terminology
A linear filter computes each output pixel as a weighted sum of the input pixels in a neighborhood:

g(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t)\, f(x + s, y + t)

where a = (m - 1)/2 and b = (n - 1)/2 for an m x n filter w.
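A direct sketch of this formula in Python with NumPy, assuming an odd-sized filter; border pixels, where the window would fall off the image, are left at zero for simplicity.

import numpy as np

def apply_filter(f, w):
    """Correlate image f with the odd-sized filter w."""
    m, n = w.shape
    a, b = (m - 1) // 2, (n - 1) // 2
    g = np.zeros(f.shape, dtype=float)
    for x in range(a, f.shape[0] - a):
        for y in range(b, f.shape[1] - b):
            # g(x, y) = sum over s, t of w(s, t) * f(x + s, y + t)
            g[x, y] = (w * f[x - a:x + a + 1, y - b:y + b + 1]).sum()
    return g

In practice a library routine such as scipy.ndimage.correlate would be used instead; it also handles the border properly.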

7 Morphological Operations
A morphological operation is one which changes the shape of a region in an image. Morphological
operations operate on binary images. Because of this, we can define a “background” pixel to be
black, and a “foreground” pixel to be white. This type of operation is used to remove extraneous
information before further processing.
These operations require the notion of the “neighborhood” of a pixel. A structuring element (S) is a small grid of pixels (e.g. 3x3 or 5x5). In procedures besides morphological operations, the same concept is called a kernel or a filter. We require that a filter have odd x odd size, so that it can be centered exactly at a particular pixel.

7.1 Erosion
7.1.1 Idea
Areas of foreground pixels shrink, holes within foreground areas grow. This is commonly used to
separate touching objects so that they can be counted.

7.1.2 Procedure
For each pixel p in the image, if S centered at p overlaps ONLY foreground pixels, p remains a foreground pixel; otherwise p is set to background. This makes objects smaller. It can break a single object into multiple objects.

7.1.3 Example

Figure 5: Erosion

7.2 Dilation
7.2.1 Idea
Gradually enlarge the boundaries of regions of foreground pixels. This makes objects larger. It can
merge multiple objects into one, and remove holes inside objects. Dilation can be used for edge
detection. The dilated image is subtracted from the original, resulting in the edge image.

7.2.2 Procedure
For each pixel p, if S centered at p overlaps ANY foreground pixel, p is labeled a foreground pixel; otherwise p is labeled background.
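A sketch of both erosion and dilation in Python with NumPy, assuming a boolean image (True = foreground) and a boolean structuring element S; the border is padded with background pixels.

import numpy as np

def _windows(img, S):
    """Yield (x, y, neighborhood) for every pixel, padding the border with False."""
    a, b = S.shape[0] // 2, S.shape[1] // 2
    padded = np.pad(img, ((a, a), (b, b)), constant_values=False)
    for x in range(img.shape[0]):
        for y in range(img.shape[1]):
            yield x, y, padded[x:x + S.shape[0], y:y + S.shape[1]]

def erode(img, S):
    out = np.zeros_like(img)
    for x, y, win in _windows(img, S):
        out[x, y] = np.all(win[S])   # S overlaps ONLY foreground pixels
    return out

def dilate(img, S):
    out = np.zeros_like(img)
    for x, y, win in _windows(img, S):
        out[x, y] = np.any(win[S])   # S overlaps ANY foreground pixel
    return out

Opening and closing (next section) are then simply erode-then-dilate and dilate-then-erode.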

7.2.3 Example

Figure 6: Dilation

7.3 Combined Operations


7.3.1 Opening
Idea

Remove small islands and thin filaments of object pixels. The effect is basically the same as
erosion, but less destructive. The effect can be visualized by “rolling” the structuring element
around the inner boundary of the foreground regions.

Procedure

Erosion followed by dilation.

Example

Figure 7: Opening

7.3.2 Closing
Idea

Remove islands and thin filaments of background pixels. The effect can be visualized by “rolling”
the structuring element around the outer boundary of the foreground regions.

Procedure

Dilation followed by erosion.

Example

Figure 8: Closing

7.4 Thickening
Uses:

• Finding the convex hull of an object.

7.4.1 Procedure
The hit-and-miss transform is added to the original image.

7.4.2 Example

Figure 9: Original Image

Figure 10: Thickened Image

7.5 Thinning
Uses:
• Reduce the output of an edge detector to single pixel wide edges while preserving the length
of the edges.

• Produce a skeleton.

7.5.1 Procedure
The hit-and-miss transform is subtracted from the original image. Consider all pixels on the boundaries of foreground regions (i.e. foreground points that have at least one background neighbor).
Delete any such point that has more than one foreground neighbor, as long as doing so does not
locally disconnect (i.e. split into two) the region containing that pixel. Iterate until convergence.

7.5.2 Example

Figure 11: Thinning

8 Edge Detectors/Filters
The goal is to find edges in the image. An edge is a place in the image with a strong intensity contrast. Edge detection significantly reduces the amount of data (from millions of pixels to only hundreds of edge pixels). This filtering out of unimportant information, while preserving the important structural properties, is generally a very helpful first step in the analysis of an image. Edges are often used in segmentation because they generally occur at natural object boundaries.

Figure 12: Original Image

Figure 13: Edge Image

This is generally performed by creating a mask (a.k.a. kernel or filter) which outputs the desired information (usually something about the gradient). The constant in front of the mask simply normalizes the output values.

8.1 Differential Family


Looks for values in the first derivative above a threshold. If you use a vertical and horizontal mask pair, producing gradient components G_x and G_y, then the magnitude of the gradient can be calculated as \sqrt{G_x^2 + G_y^2}.

8.1.1 Prewitt
The gradient in the x direction is approximated by

\frac{[f(x+1) - f(x)] + [f(x) - f(x-1)]}{2} = \frac{f(x+1) - f(x-1)}{2}

This is the average of the change to the right and to the left of the current pixel. Therefore, if we multiply the pixel to the right of p by 1, multiply the pixel to the left of p by -1, sum the results, and divide by 2, we have an approximation of the gradient.

Simple Vertical Edge Detector

By applying the filter

\frac{1}{2}\begin{pmatrix} -1 & 0 & 1 \end{pmatrix}

at each pixel, we get an approximation of the gradient at that pixel. If the gradient is high, we can infer there is an edge at this location. However, to be more robust to noise, this measurement is averaged over the other dimension (i.e. while looking for vertical edges, average the gradient estimate in the y direction). This is done simply by stacking several (-1 0 1) rows together, and normalizing by the sum of the absolute values of the filter coefficients.

Vertical Edge Detector

\frac{1}{6}\begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix}

The factor of 1/6 is there because the sum of the absolute values of the filter coefficients is 6.

Horizontal Edge Detector

This is exactly the same concept as the vertical edge detector.

\frac{1}{6}\begin{pmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}

8.1.2 Sobel
This is a Prewitt filter, but with approximate Gaussian averaging instead of uniform averaging over the dimension that is not being differentiated. Since 3x3 filters are shown, the Gaussian approximation is very coarse ([1 2 1] does not look very Gaussian!), but if a 20x20 filter were used, the filter coefficients should resemble a sampled Gaussian function.

Vertical Edge Detector

\frac{1}{8}\begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}

Horizontal Edge Detector

\frac{1}{8}\begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix}
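A sketch of Sobel edge detection with NumPy and SciPy (assumed available); the two kernels above produce the gradient components, and the magnitude is thresholded to give a binary edge map. The threshold value here is an illustrative assumption.

import numpy as np
from scipy import ndimage

def sobel_edges(image, threshold=0.1):
    """Binary edge map from the Sobel gradient magnitude."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]) / 8.0   # vertical edges
    ky = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]]) / 8.0   # horizontal edges
    gx = ndimage.correlate(image.astype(float), kx)
    gy = ndimage.correlate(image.astype(float), ky)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)   # sqrt(Gx^2 + Gy^2)
    return magnitude > threshold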

8.1.3 Canny
The Canny edge detector generally gives the best results. This comes at the cost of significantly more complexity. First, the image is blurred with a Gaussian kernel. Then Sobel edge detection is performed. After that, pixels that were labeled as edge pixels are partitioned into horizontal, right diagonal, vertical, or left diagonal edges. Double thresholding is done on each of these partitions: a value below the bottom threshold is definitely not an edge, and a value above the top threshold definitely is an edge. The pixels between the thresholds are checked to see if there is a path of adjacent pixels which have the same edge orientation and were labeled as “definitely an edge”. This has the effect of connecting the edges, which overcomes the main drawback of most of the other types of edge detection.
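In practice one would rarely implement Canny from scratch. A sketch using scikit-image, assuming the library is available; the sigma and threshold values are illustrative only.

import numpy as np
from skimage import feature

image = np.random.rand(128, 128)          # stand-in for a real greyscale image
edges = feature.canny(image,
                      sigma=2.0,          # width of the Gaussian blur
                      low_threshold=0.1,  # below this: definitely not an edge
                      high_threshold=0.3) # above this: definitely an edge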

8.2 Zero Crossing Family
8.2.1 Laplacian
The second derivative in x can be approximated with

[f(x+1) - f(x)] - [f(x) - f(x-1)] = f(x+1) - 2f(x) + f(x-1)

This can be read “the difference between A and B” where A is the difference between the current
pixel and the one to the right of it, and B is the difference between the current pixel and the one
to the left of it.
Combine this with the second derivative in y to obtain

\frac{1}{8}\begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix}

or

\frac{1}{16}\begin{pmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{pmatrix}

8.2.2 Laplacian of Gaussian


Blur the image using a Gaussian kernel. Then use the Laplacian kernel. The blurring is to prevent
responses to noise instead of an actual edge. Another method to achieve the same result is to first
convolve the Laplacian filter with the Gaussian filter, and then apply the result to the image.

LoG(x, y) = -\frac{1}{\pi\sigma^4}\left[1 - \frac{x^2 + y^2}{2\sigma^2}\right] e^{-\frac{x^2 + y^2}{2\sigma^2}}
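A sketch that samples this formula into a discrete kernel with NumPy; the kernel size and sigma are illustrative choices.

import numpy as np

def log_kernel(size=9, sigma=1.4):
    """Sample the Laplacian of Gaussian on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x ** 2 + y ** 2
    k = -(1.0 / (np.pi * sigma ** 4)) * (1 - r2 / (2 * sigma ** 2)) \
        * np.exp(-r2 / (2 * sigma ** 2))
    return k - k.mean()   # force zero total response on flat regions

Convolving the image with this kernel and then locating zero crossings of the result yields the edge map.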

8.2.3 Example

Figure 14: Laplacian Of Gaussian Kernel

9 Corner Detection
9.1 Hit or Miss Transform
9.1.1 Idea
Find sections of the image that identically match a particular foreground/background configuration.

9.1.2 Procedure
A filter is created that contains 1’s and 0’s as its coefficients. We run this filter over the image,
looking for positions where the foreground/background configuration of the mask exactly matches
that of the image. We can detect four different corner orientations using the four filters below.

Figure 15: Hit Or Miss Corner Filters

9.1.3 Example

Figure 16: Hit Or Miss Corner Detection

9.2 Harris Corner Detector


Looks for corners in an image by examining the autocorrelation function in a neighborhood of each
pixel.

9.2.1 Details
The autocorrelation function is given by

c(x, y) = \sum_{w} \left[ I(x_i, y_i) - I(x_i + \Delta x, y_i + \Delta y) \right]^2

where w is some choice of window function.


After approximating the shifted image function with a first order Taylor polynomial, c(x, y) can be written as

c(x, y) \approx \begin{pmatrix} \Delta x & \Delta y \end{pmatrix} H(x, y) \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}

The matrix H captures the intensity structure of the local neighborhood. The eigenvalues of H are considered. There are three cases.

1. Both λ1 and λ2 are small. This means the autocorrelation function is flat, and therefore there
are no edges or corners.

2. One of the eigenvalues is large. This means shifts in one direction cause little change in the
autocorrelation function. This means that there is an edge present.

3. Both eigenvalues are large. This means that shifts in both directions cause large changes in
the autocorrelation function. This means that there is a corner present.
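A sketch of the eigenvalue test in NumPy/SciPy (assumed available). H is built from products of the image gradients averaged over a Gaussian window; the window width sigma is an illustrative choice.

import numpy as np
from scipy import ndimage

def harris_eigenvalues(image, sigma=1.0):
    """Per-pixel eigenvalues (lam_small, lam_large) of the 2x2 matrix H."""
    Ix = ndimage.sobel(image.astype(float), axis=1)   # gradient in x
    Iy = ndimage.sobel(image.astype(float), axis=0)   # gradient in y
    # entries of H, averaged over the window w
    Ixx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Iyy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Ixy = ndimage.gaussian_filter(Ix * Iy, sigma)
    # closed-form eigenvalues of the symmetric matrix [[Ixx, Ixy], [Ixy, Iyy]]
    half_trace = (Ixx + Iyy) / 2
    root = np.sqrt(((Ixx - Iyy) / 2) ** 2 + Ixy ** 2)
    return half_trace - root, half_trace + root

A pixel is declared a corner when both returned eigenvalues are large (case 3 above).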

10 Point Operations
This is also called contrast enhancement, contrast stretching, or greyscale transformation. The new grey level of each pixel depends only on the old grey level of that same pixel; no neighborhood information is used.

10.1 Histogram
Represents the relative frequency of occurrence of the various grey levels in an image.

Figure 17: Histogram

10.2 Contrast Stretching


10.2.1 Idea
Improve the contrast in an image by stretching the range of intensity values it contains. This is
done with a linear scaling function. This can help to bring out detail in an image which has mostly
light pixels or mostly dark pixels.

10.2.2 Procedure

P_{out} = (P_{in} - c)\left(\frac{b - a}{d - c}\right) + a

a is the lower limit after the stretching (usually 0) and b is the upper limit (usually 255). c is the lowest intensity value in the input image, and d is the highest intensity value.
Outlier rejection is very important: c and d can instead be chosen by first discarding the lower and upper five percent of the input histogram and then selecting the lowest and highest remaining values.
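A sketch with NumPy, using the percentile form of outlier rejection described above (the five percent figure follows the text; a and b default to the usual 0 and 255).

import numpy as np

def stretch_contrast(image, a=0, b=255, clip_percent=5):
    """Linearly map the robust input range [c, d] onto [a, b]."""
    c = np.percentile(image, clip_percent)          # robust darkest value
    d = np.percentile(image, 100 - clip_percent)    # robust brightest value
    out = (image.astype(float) - c) * (b - a) / (d - c) + a
    return np.clip(out, a, b).astype(np.uint8)      # clamp the rejected outliers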

10.3 Histogram Equalization
10.3.1 Idea
Produce an output histogram with a uniform distribution of intensities. This is essentially contrast
stretching with a nonlinear function.

10.3.2 Procedure
We try to find a transformation T that maps grey levels to different grey levels such that the result is spread uniformly over the entire range of grey levels. We construct T by looking at the cumulative distribution function of the image intensities (the integral of the histogram):

c(i) = \sum_{j=0}^{i} p(x_j)

The transformation is simply y_i = c(i). Be careful to scale the output histogram so that it is in the desired range. This is done with a simple normalization.
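A sketch for an 8-bit image with NumPy; the CDF is normalized so that the output again spans 0 to 255, and the mapping is applied as a lookup table.

import numpy as np

def equalize(image):
    """Map grey levels through the normalized CDF of the image histogram."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf /= cdf[-1]                          # normalize so that c(255) = 1
    lut = np.round(cdf * 255).astype(np.uint8)
    return lut[image]                       # y_i = c(i) applied per pixel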

Figure 18: Histogram Equalization

Figure 19: Unequalized Image

Figure 20: Equalized Image

11 Image Enhancement
11.1 Point Spread Function
The point spread function (PSF) is the response of the camera to a point source of light. If a point of light exists in the world, we would like it to occupy a very small region (maybe only a pixel) in the image. However, optical systems are not perfect, and this point of light usually gets blurred across many pixels in the image. Often, a Gaussian function is a good approximation to this blurring.

Figure 21: Point Spread Function

11.2 Deconvolution
If the PSF is known, we can multiply by a (regularized) inverse filter in the frequency domain (Wiener filtering) to remove the effect of the blur.

11.3 Sharpening
11.3.1 Edge Enhancement
Idea

Boosting the high frequency content in an image to make its edges appear sharper.

Procedure

Filter the image with an edge filter and then add the edge image to the original image.

11.3.2 Unsharp Masking


Idea

Boost the high frequency content in an image to make its edges appear sharper.

Procedure

First, blur the image. Then, subtract the blurred image from the original. Add the difference
back to the original.

12 Algebraic Operations
12.1 Motion (Change) Detection
The goal is to identify the set of pixels that are “significantly different” between two images. These
pixels are called the “change mask”. It is not assumed that the two pictures are taken at the
same time, so the change mask should not include nuisance forms of change such as changes in
illumination.
The simplest method is known as “simple differencing”. The two images are subtracted, and if the difference at any pixel is greater than a threshold τ, that pixel is labeled as a changed pixel.
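A sketch of simple differencing with NumPy; the threshold τ is application dependent, and the value used here is an arbitrary assumption.

import numpy as np

def change_mask(image1, image2, tau=30):
    """Label pixels whose absolute difference exceeds the threshold tau."""
    diff = np.abs(image1.astype(int) - image2.astype(int))
    return diff > tau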

12.2 Noise Removal by Image Averaging


The assumption is that the noise is random. If we can align multiple images of the same scene, we can then simply average the values at pixel (i, j) across the images to obtain an image with less noise (the noise is “averaged out”).

13 Image Noise
Just as in signal processing, unwanted noise is often present in an image. The two main noise
models are described in the following.

13.1 Salt and Pepper Noise


This type of noise is modeled as randomly occurring white or black pixels.

13.2 Example

Figure 22: Salt and Pepper Noise

This type of noise can be removed with a median filter (described below).

13.3 Gaussian Noise


Noise in the more usual sense, where each pixel value is perturbed by a Gaussian random variable.

13.4 Example

Figure 23: Gaussian Noise

14 Spatial Transformations
Performed on a local neighborhood of image pixels. Generally the image is convolved with a small (e.g. 3x3 or 5x5) filter (a.k.a. mask or kernel).

14.1 Averaging Filter


The value at each pixel p is set to the average of the pixels which overlap the mask centered at p.
This has a smoothing effect, and is often called a low pass filter.

14.2 Median Filter


Each pixel is given the median value of the pixels in some neighborhood. This has a smoothing
effect. This is an excellent way to remove salt and pepper noise.
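A sketch using scipy.ndimage (assumed available); a 3x3 neighborhood is a typical choice for salt and pepper noise.

import numpy as np
from scipy import ndimage

noisy = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # stand-in image
denoised = ndimage.median_filter(noisy, size=3)  # median over each 3x3 window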

15 Geometric Operations
Geometric operations are applied globally to an image. Typical transformations include scaling,
rotation, and reflection.

15.1 Grey-Level Interpolation


When a geometric operation is applied to an image, each pixel in the input image does not necessarily
map to an integer pixel in the output image. Therefore, we must choose a method for calculating
the intensity for each pixel in the new image.

15.1.1 Nearest Neighbor (Zero Order) Interpolation


Use the grey value of the nearest pixel. This is very computationally easy, but the resulting image
may be very blocky.

Example

Figure 24: Nearest Neighbor Interpolation

15.1.2 Bilinear Interpolation


Interpolate the values of the 4 closest pixels to the desired output pixel and assign the new pixel
this value. These 4 pixels can be weighted based on their distance to the desired location. This is
more computationally expensive, but produces better results than Nearest Neighbor Interpolation.
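A sketch of the interpolation at a single non-integer location (x, y) with NumPy, assuming (x, y) lies strictly inside the image and using the indexing convention image[x, y].

import numpy as np

def bilinear(image, x, y):
    """Interpolate the image intensity at the non-integer location (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    # weight the 4 surrounding pixels by their closeness to (x, y)
    return ((1 - dx) * (1 - dy) * image[x0, y0]
            + dx * (1 - dy) * image[x0 + 1, y0]
            + (1 - dx) * dy * image[x0, y0 + 1]
            + dx * dy * image[x0 + 1, y0 + 1])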

Example

Figure 25: Bilinear Interpolation

15.1.3 Bicubic Interpolation


Interpolate the values of the 16 closest pixels to the desired output pixel and assign the new pixel
this value. These 16 pixels are weighted based on their distance to the desired location. This is more
computationally expensive, but produces better results than Bilinear Interpolation. This method
is the best trade off between quality and computational expense.

Example

Figure 26: Bicubic Interpolation

15.2 Translation
To translate the image, multiply by the following matrix:

\begin{pmatrix} x_{new} \\ y_{new} \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & x_0 \\ 0 & 1 & y_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}

15.2.1 Example
!!!

15.3 Scaling

\begin{pmatrix} x_{new} \\ y_{new} \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{c} & 0 & 0 \\ 0 & \frac{1}{d} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}

15.3.1 Example

Figure 27: Scaled Image

15.4 Rotation

\begin{pmatrix} x_{new} \\ y_{new} \\ 1 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}

15.4.1 Example

Figure 28: Rotated Image

15.4.2 Details
This is derived by drawing a coordinate axes and a new coordinate axes rotated by θ:

x' = x\cos\theta + y\sin\theta
y' = -x\sin\theta + y\cos\theta

(Rotating the axes by θ is equivalent to rotating the points by -θ, hence the sign difference from the matrix above.)

15.5 Compound Transformations


To rotate around a point other than the origin, first translate the image so the point you want to
rotate around is at the origin. Then rotate the image by the desired amount. Then perform the
inverse translation (translate the image back to where it was originally).
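A sketch composing these homogeneous matrices with NumPy to rotate about an arbitrary point (cx, cy):

import numpy as np

def rotate_about(cx, cy, theta):
    """Homogeneous matrix that rotates by theta around the point (cx, cy)."""
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    rotate = np.array([[np.cos(theta), -np.sin(theta), 0],
                       [np.sin(theta),  np.cos(theta), 0],
                       [0, 0, 1]])
    back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    return back @ rotate @ to_origin   # applied right to left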

15.5.1 Example
!!!

16 Transforms
16.1 Hough Transform
16.1.1 Idea
The Hough transform is generally used to find lines in an image, although it can easily be modified to find other simple shapes. It is very robust to noise and to discontinuities in the appearance of the object being searched for. The transform does not make any hard decisions about the location of the object, but rather provides a grid of values in the parameter space (which could be interpreted as a probability density if normalized) which indicates the likelihood of the object appearing in the form described by the parameters.
The idea is that each point in the edge image could possibly have come from an infinite number of lines.

16.1.2 Procedure
First, use an edge detector. For each edge point, we increment every cell in the accumulator space corresponding to a line the point could have come from. Cells with very high counts indicate that many points in the image fell on the same line, indicating that there is in fact a line in the image. Since there are infinitely many lines through each point in the image, to implement this transform we must pick an angular resolution and, for each angle in the now finite set, solve for the corresponding r value of the line that goes through the point.
Generally, the transform is thresholded to find the maxima.
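A sketch of the accumulator loop in NumPy, using the (θ, r) parameterization from the Details section below; the angular resolution is an illustrative choice.

import numpy as np

def hough_lines(edges, n_theta=180):
    """Accumulate votes in (theta, r) space for a binary edge image."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    r_max = int(np.ceil(np.hypot(*edges.shape)))
    acc = np.zeros((n_theta, 2 * r_max + 1), dtype=int)  # r in [-r_max, r_max]
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        for i, theta in enumerate(thetas):
            r = int(round(x * np.cos(theta) + y * np.sin(theta)))
            acc[i, r + r_max] += 1   # one vote per line through (x, y)
    return acc, thetas

Thresholding acc and reading off the surviving (θ, r) pairs gives the detected lines.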

16.1.3 Example

Figure 29: Original Image

Figure 30: Edge Image

Figure 31: Hough Transform

The bright regions in this image represent the points in the parameter space which the most edge
pixels contributed to. These points correspond to all of the lines in the original image.

16.1.4 Details
The parametric form of a line,

x \cos\theta + y \sin\theta = r

is generally used to prevent numerical problems (division by zero) in the case of vertical lines. The accumulator space is generally θ vs. r, where θ is the angle of the line through the origin perpendicular to the line in question, and r is the perpendicular distance from the origin to the line.

Figure 32: Hough Transform Line Parameterization

16.2 Skeletonization
16.2.1 Idea
Remove most of the foreground pixels while preserving the size and connectedness of the original
image.

16.2.2 Procedure
There are two ways to produce a skeleton.

• Perform successive erosions without changing the connectedness.

• Find all circles of any size that are tangent to the boundary in at least two places. The centers
of these circles are the skeleton.

Figure 33: Skeletonization With Bi-Tangent Circles

16.2.3 Example

Figure 34: Original Image

Figure 35: Skeleton

16.3 Medial Axis Transform


This is skeletonization, but rather than producing a binary image of the skeleton, a greyscale image
is produced with the intensity of each pixel on the skeleton representing the distance to a boundary
in the original image.

16.3.1 Example

Figure 36: Original Image

Figure 37: Medial Axis Transform

16.4 Distance Transform


Produce an image that looks like the original image, but instead of binary pixels, each foreground pixel p takes the grey value of the distance from p to the closest boundary.

Figure 38: Original Image

Figure 39: Distance Transform

16.5 Radon Transform


The Radon space is parameterized the same way as the Hough space (with an angle and a perpendicular distance). However, in the Radon transform, rather than incrementing many accumulator bins for each point in the image, you simply take the line integral of the image over the line given by the current parameter pair and assign that value to the location (θ, r).

16.5.1 Example
!!!

16.6 Fourier Slice Theorem


Reconstruct an area by collecting projections onto multiple lines.

16.6.1 Procedure
We shoot rays (usually x-rays) through the object at every translation of a fixed angle. Each ray integrates the absorption properties of the object along its path (a line integral). This set of line integral values forms a 1D function (the projection onto the line perpendicular to the current angle). The Fourier transform of this function is a single slice of the full 2D Fourier transform of the object. We rotate the angle and repeat the process at the desired angular resolution. After the multiple 1D Fourier transforms have been assembled, we can take the inverse 2D Fourier transform to obtain an image of the object.

16.7 Fourier Transform
16.7.1 Idea
Decompose an image into a sum of complex exponential functions. This is a direct extension of the
1D DFT into 2D.

16.7.2 Procedure
M −1 N −1
1 XX ux vy
F (u, v) = f (x, y)e−j2π( M + N )
M N x=0 y=0
M −1 N −1
X X ux vy
f (x, y) = F (u, v)ej2π( M + N )
u=0 v=0
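In practice the 2D DFT is computed with the FFT. A sketch with NumPy; the log scaling of the magnitude is a common display convention, not part of the transform.

import numpy as np

image = np.random.rand(128, 128)          # stand-in for a real greyscale image
F = np.fft.fft2(image)                    # forward 2D DFT
F_centered = np.fft.fftshift(F)           # zero frequency moved to the center
magnitude = np.log1p(np.abs(F_centered))  # log scale for display
recovered = np.fft.ifft2(F).real          # inverse transform recovers the image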

16.7.3 Example

Figure 40: Original Image

Figure 41: Fourier Transform

16.8 Discrete Cosine Transform (DCT)


16.8.1 Idea
Just like the Fourier transform, but the output is real valued. The DCT is also much faster than the DFT. It is often used in compression: the very high frequency components can be discarded, resulting in less information to store.

16.8.2 Procedure
C(k, n) = \alpha(k, n) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} f(i, j) \cos\left( \frac{(2i+1)k\pi}{2N} \right) \cos\left( \frac{(2j+1)n\pi}{2N} \right)

where \alpha(k, n) = \frac{1}{N} for k, n = 0 and \alpha(k, n) = \frac{2}{N} for k, n = 1, 2, \ldots, N-1.

16.8.3 Example
!!!

16.9 Hadamard Transform


The transform matrix contains only 1’s and -1’s: square-wave basis functions instead of sinusoidal ones!

16.10 KL, Principal Component, Hotelling, and Eigenvector Transforms

These are all names for the same transform. The new basis is an orthogonal set of eigenvectors (the eigenvectors of the covariance matrix of the data).

17 Compression
Compression is a way to represent information in a more compact way, for a variety of reasons
including storage capacity restrictions, or transfer rate requirements. There are two main divisions
of compression techniques, lossless and lossy, which are described below.
To measure how much compression has been achieved, we consider the compression ratio:

R = \frac{\text{number of bits before}}{\text{number of bits after}}
An entire field, information theory, is dedicated to precisely defining what is meant by “infor-
mation”. Because of this, it is wise to leave the word “information” at the door when talking about
compression. Instead, we consider the data to be compressed as a list of “symbols”. Sometimes the
system is binary, in which case the symbols are ’0’ and ’1’. The symbols could also be the digits
0-9, or the letters a-z.

17.1 Lossless
Compression techniques from which the original image can be recovered exactly.

17.1.1 Run Length Encoding


Used for images of few grey levels. The coding is line by line. It stores the grey level and how many
adjacent pixels are the same level.
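A sketch of run length encoding a single scanline in plain Python; the decoder simply expands each pair back into a run.

def run_length_encode(row):
    """Encode a sequence of grey levels as (value, run length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1           # extend the current run
        else:
            runs.append([value, 1])    # start a new run
    return runs

# A mostly flat scanline compresses well:
print(run_length_encode([0, 0, 0, 255, 255, 0]))   # [[0, 3], [255, 2], [0, 1]]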

Example

!!!

17.1.2 Lempel-Ziv Coding


Single symbols are assigned a code and placed in a table. When a string not already in the table
occurs, it is stored in the table along with the code assigned to it.

Example

!!!

17.1.3 Huffman Coding


Huffman coding is an excellent compression method; however, it requires prior knowledge of the rate of occurrence of each symbol.

Procedure

Build a tree by repeatedly joining the two nodes with the lowest probability of occurrence. Label the top branch 0 and the bottom branch 1. Continue until a single node remains. The code for each character can then be read from the tree directly.
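A sketch of the tree-building procedure in Python using heapq, assuming the symbol probabilities are known in advance; each heap entry carries the codes of the symbols under it, and every merge prepends one bit.

import heapq

def huffman_codes(probs):
    """Build prefix codes from {symbol: probability} by merging the two rarest."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)          # avoids comparing dicts on equal probabilities
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)   # lowest probability
        p1, _, codes1 = heapq.heappop(heap)   # second lowest
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

print(huffman_codes({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}))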

Example

!!!

17.2 Lossy
Compression techniques from which the original image cannot be recovered exactly.

17.2.1 Transform Coding


One of many transforms (DFT, DCT, eigenvector, etc.) can be used, after which the least important information in the image is discarded. This loss of information clearly yields compression, but the compression is irreversible.

17.3 Theoretical Compression Limit


This is called the Shannon limit: the entropy of the source gives the maximum possible lossless compression.

18 Segmentation
18.1 Idea
Group pixels into regions based on connectedness.

18.2 Connected Components Labeling


18.2.1 Procedure
Scan the image pixel by pixel (from top to bottom and left to right) to identify connected pixel regions, i.e. regions of adjacent pixels which share the same set of intensity values V. For the following we assume binary input images and 8-connectivity. Move along each row until reaching a white point p (where p denotes the pixel to be labeled at any stage in the scanning process). Examine the four neighbors of p which have already been encountered in the scan (i.e. the left neighbor, the top neighbor, and the two upper diagonals). Based on this information, the labeling of p occurs as follows:
If all four neighbors are black, assign a new label to p.
If only one neighbor is white, assign its label to p.
If more than one of the neighbors is white, assign one of their labels to p and record that the labels involved are equivalent (equivalent labels are merged in a second pass).
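In practice one typically calls a library routine. A sketch using scipy.ndimage (assumed available), with a 3x3 structuring element of ones to get 8-connectivity:

import numpy as np
from scipy import ndimage

binary = np.random.rand(64, 64) > 0.7    # stand-in binary image
eight = np.ones((3, 3), dtype=int)       # 8-connectivity
labels, num_regions = ndimage.label(binary, structure=eight)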

Example

!!!

18.3 Region Growing


18.3.1 Procedure
Start with a seed pixel. Check the neighboring pixels and add them to the region if they are similar
to the seed. Repeat for each newly added pixel. Stop if no more pixels can be added.

This is better than any type of histogram segmentation because it considers the connectedness of the grey levels, not just the similarity of intensity. It does not work well with heavily textured images because the grey levels vary too quickly.
Seed point selection is extremely important, and is heavily application dependent. As an example, if the application is to segment lit regions in an image from dark regions, a point from the highest range of the histogram may be selected as a seed point.
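A sketch with NumPy using a breadth-first search. Here “similar to the seed” means within a fixed tolerance of the seed's grey level; the tolerance and the 4-connectivity are illustrative assumptions, since the similarity test is application dependent.

import numpy as np
from collections import deque

def region_grow(image, seed, tol=10):
    """Grow a region from seed, adding 4-neighbors within tol of the seed value."""
    region = np.zeros(image.shape, dtype=bool)
    seed_value = int(image[seed])
    region[seed] = True
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if (0 <= nx < image.shape[0] and 0 <= ny < image.shape[1]
                    and not region[nx, ny]
                    and abs(int(image[nx, ny]) - seed_value) <= tol):
                region[nx, ny] = True
                queue.append((nx, ny))
    return region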

Example

!!!

18.4 Region Splitting and Merging


Treat the entire image as a region. Decide if all pixels contained in the region satisfy a similarity
constraint. If they do, then this is labeled as a region. If not, split the region into four equal
sub-regions and perform the same test on each sub-region.
This arbitrary choice of how to subdivide a non-uniform region often results in adjacent regions
which should not be separate. To remedy this, a merging test is performed after each split to decide
if adjacent regions should be re-combined.

Example

!!!

18.5 Watershed Segmentation


We view a greyscale image as a topographic surface. We then “flood” the surface starting at its minima, preventing the waters coming from different sources from merging. The resulting segmentation regions are called “catchment basins”. The dividing lines between the basins are called the “watersheds”: the points from which water would run down to either side if poured from above (the peaks of a mountain ridge).
An alternate procedure is to find the downstream path from each pixel to a local minimum. A
catchment basin is then defined as the set of pixels for which their downstream paths end up at the
same minimum.

Figure 42: 1D Watershed Segmentation

19 JPEG
1. Decompose the RGB image into YCbCr (luminance/chrominance) form.

2. Optionally downsample the color information.

3. Split the image into 8x8 blocks.

4. Perform the DCT on each block.

5. Quantize the DCT coefficients, then Huffman-code them in a zigzag order.

20 TV Standards
20.1 Color
Backwards compatibility is obtained by sending the luminance (brightness) information in the same place that the black and white signal used to be. The chrominance information is added to the signal.

20.2 NTSC
US standard: 525 lines (496 visible; the rest carry closed captioning, synchronization info, and vertical retrace), 30 fps (actually 29.97 = 30/1.001), 4:3 aspect ratio (to be compatible with early film). The picture is interlaced, drawn in two “fields”, so the effective field rate is 60 Hz. This matches the 60 Hz AC on the power lines, which avoids the interference that produces rolling bars.

20.3 PAL
European standard: 625 lines (576 visible), 25 fps; the European power grid is 50 Hz.
