Convolutional Neural Networks in Computer Vision: Jochen Lang

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 94

Convolutional Neural Networks in Computer

Vision
Jochen Lang

jlang@uottawa.ca

Faculté de génie | Faculty of Engineering


Jochen Lang, EECS
jlang@uOttawa.ca
Convolution

• Image processing and filtering


– Point processing
– Linear filters
– Correlation and convolution
• Convolutional neural networks (CNNs)
– “Classic layers”: Convolutional, pooling and fully-
connected layers
– Visualizing CNNs

Jochen Lang, EECS


jlang@uOttawa.ca
Image Processing

• A common class of image processing methods can be


characterized as processing individual pixels one pixel at
a time or more generally a single point in the image.
• Other methods work on multiple neighboring pixels at a
time (filtering) while others change the geometric
relationship between the pixel and the image
(geometric transformations).
• We use point processing often in pre-processing our
images before learning

Many slides by Alexei Efros

Jochen Lang, EECS


jlang@uOttawa.ca
What is an image?

• We can think of an image as a function, , from to :


gives the intensity at position
– Realistically, we expect the image only to be defined
over a rectangle, with a finite range:

• A color image is just three functions pasted together.


We can write this as a “vector-valued” function:

Source: Alexei Efros


Jochen Lang, EECS
jlang@uOttawa.ca
Images as functions

Source:
Alexei
Efros

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Imaging Process

Richard Szeliski, Computer Vision: Algorithms and Applications, 2010.

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Gamma Correction

• The individual pixel values are modified with an


exponential function (power law) with

Jochen Lang, EECS


jlang@uOttawa.ca
Contrast Enhancement

• The current pixel values are remapped to other values


stretching and compressing some ranges

Jochen Lang, EECS


jlang@uOttawa.ca
Image Histogram

• Build a histogram of all pixel values in an image.


• Image histograms can inform pixel processing.
• Sometime (e.g. in histogram equalization) we like to
obtain the cumulative histogram or distribution.

where is the cumulative distribution and the


current histogram bin or distribution. are the number of
pixels in the image

Jochen Lang, EECS


jlang@uOttawa.ca
RGB Histogram

Jochen Lang, EECS


jlang@uOttawa.ca
Histogram Equalization

• Derives a mapping for pixel values with the goal to have


a flat histogram
– Calculate a cumulative distribution of pixel values
– Pixel values are remapped based on the CDF

• Global histogram equalization (over the whole image)


may produce a flat appearance and local variants may
produce more attractive looking images

Jochen Lang, EECS


jlang@uOttawa.ca
Global vs Local Histogram
Equalization

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Limitations of Point Processing

• Q: What happens if I reshuffle all pixels within the


image?

• A: It’s histogram won’t change. No point processing will


be affected…

Source: Alexei Efros

Jochen Lang, EECS


jlang@uOttawa.ca
Sampling and Reconstruction

Source: Alexei Efros

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Sampled Representations

• Values at image pixels can be viewed as samples of a


continuous function.
• Reconstruction may be needed, e.g., in 1D:

[FvDFH fig.14.14b / Wolberg]


Jochen Lang, EECS
jlang@uOttawa.ca
Linear Filtering

• Transformations on signals; e.g.:


– blurring/sharpening operations
– smoothing/noise reduction
• Key properties
– linearity:
– shift invariance: behavior invariant to shifting the
input, i.e., not dependent on the location in the
image
• Can be modeled mathematically by convolution

Source: Alexei Efros

Jochen Lang, EECS


jlang@uOttawa.ca
Moving Average

• Basic idea: define a new function by averaging over a


sliding window
• a simple example to start off: smoothing

Source: Alexei Efros

Jochen Lang, EECS


jlang@uOttawa.ca
Moving Average

• Can add weights to our moving average


• Weights […, 0, 1, 1, 1, 1, 1, 0, …] / 5

Source: Alexei Efros

Jochen Lang, EECS


jlang@uOttawa.ca
Cross-correlation

• Let be the image, be the kernel (of size


), and be the output image.
• Can think of as the dot product between local
neighborhood and kernel for each pixel

is called the cross-correlation

Source: Alexei Efros

Jochen Lang, EECS


jlang@uOttawa.ca
Simple 2D Filter: Box Filter

1 1 1

1 1 1

1 1 1

Source: David Lowe

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
1 1 1

Box Filter 1 1 1

1 1 1

0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0
0 0
0 0
0 90
90 0
0 90
90 90
90 90
90 0
0 0
0

0
0 0
0 0
0 90
90 90
90 90
90 90
90 90
90 0
0 0
0

0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0

0
0 0
0 90
90 0
0 0
0 0
0 0
0 0
0 0
0 0
0

0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0

Source: Steve Seitz


Jochen Lang, EECS
jlang@eecs.uOttawa.ca
1 1 1

Box Filter 1 1 1

1 1 1

0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 10

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 0 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 0 0 0 0 0 0 0

0 0 90 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0

Source: Steve Seitz


Jochen Lang, EECS
jlang@eecs.uOttawa.ca
1 1 1

Box Filter 1 1 1

1 1 1

0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 10 20

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 0 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 0 0 0 0 0 0 0

0 0 90 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0

Source: Steve Seitz


Jochen Lang, EECS
jlang@eecs.uOttawa.ca
1 1 1

Box Filter 1 1 1

1 1 1

0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 10 20 30

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 0 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 0 0 0 0 0 0 0

0 0 90 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0

Source: Steve Seitz


Jochen Lang, EECS
jlang@eecs.uOttawa.ca
1 1 1

Box Filter 1 1 1

1 1 1

0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 10 20 30 30

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 0 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 0 0 0 0 0 0 0

0 0 90 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0

Source: Steve Seitz


Jochen Lang, EECS
jlang@eecs.uOttawa.ca
1 1 1

Box Filter 1 1 1

1 1 1

0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 10 20 30 30

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 0 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0
?
0 0 0 0 0 0 0 0 0 0

0 0 90 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0

Source: Steve Seitz


Jochen Lang, EECS
jlang@eecs.uOttawa.ca
1 1 1

Box Filter 1 1 1

1 1 1

0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 10 20 30 30

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 90 90 90 90 0 0

0 0 0 90 0 90 90 90 0 0
?
0 0 0 90 90 90 90 90 0 0 50

0 0 0 0 0 0 0 0 0 0

0 0 90 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0

Source: Steve Seitz


Jochen Lang, EECS
jlang@eecs.uOttawa.ca
1 1 1
1 1 1

Box Filter 1 1 1

0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 10 20 30 30 30 20 10

0 0 0 90 90 90 90 90 0 0 0 20 40 60 60 60 40 20

0 0 0 90 90 90 90 90 0 0 0 30 60 90 90 90 60 30

0 0 0 90 90 90 90 90 0 0 0 30 50 80 80 90 60 30

0 0 0 90 0 90 90 90 0 0 0 30 50 80 80 90 60 30

0 0 0 90 90 90 90 90 0 0 0 20 30 50 50 60 40 20

0 0 0 0 0 0 0 0 0 0 10 20 30 30 30 30 20 10

0 0 90 0 0 0 0 0 0 0 10 10 10 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0

Source: Steve Seitz


Jochen Lang, EECS
jlang@eecs.uOttawa.ca
Box Filter

What does it do?


• Replaces each pixel with an
average of its neighborhood
1 1 1
• Smoothing effect (removes sharp
features) 1 1 1

1 1 1

Source: David Lowe

Jochen Lang, EECS


jlang@uOttawa.ca
Example of Linear Filters: Box

1 1 1
1 1 1
1 1 1
=
Original Blur (with a mean
filter)
Source: David Lowe

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Identity Filter

0 0 0
0 1 0
0 0 0

Original Filtered
(no change)
Source: David Lowe

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Shift Left

0 0 0
0 0 1
0 0 0

Original Shifted left


By 1 pixel
Source: David Lowe

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Sobel Edge Filter

1 0 -1
2 0 -2
1 0 -1

Sobel
Vertical Edge
(absolute value) Source: David Lowe
Jochen Lang, EECS
jlang@uOttawa.ca
Sobel Edge Filter

1 2 1
0 0 0
-1 -2 -1

Sobel
Horizontal Edge
(absolute value) Source: David Lowe

Jochen Lang, EECS


jlang@uOttawa.ca
Sharpening Filter

-
0 0 0 1 1 1
0
0
2
0
0
0
1
1
1
1
1
1 =
Sharpening filter
Original (accentuates edges)

Source: David Lowe

Jochen Lang, EECS


jlang@uOttawa.ca
Sharpening

Source: David Lowe

Jochen Lang, EECS


jlang@uOttawa.ca
Gaussian Blur
• Weight contributions of neighboring pixels by nearness

0.003 0.013 0.022 0.013 0.003


0.013 0.059 0.097 0.059 0.013
0.022 0.097 0.159 0.097 0.022
0.013 0.059 0.097 0.059 0.013
0.003 0.013 0.022 0.013 0.003

5 x 5,  = 1

Source: Christopher Rasmussen

Jochen Lang, EECS


jlang@uOttawa.ca
Box vs. Gaussian Filter

Source: Steve Seitz

Jochen Lang, EECS


jlang@uOttawa.ca
Gaussian filters

Source: Alexei Efros

Jochen Lang, EECS


jlang@uOttawa.ca
Be Aware: Kernel Size

• The Gaussian function has infinite support, but discrete


filters use finite kernels

Source: Kirsten Grauman

Jochen Lang, EECS


jlang@uOttawa.ca
Cross-correlation vs. Convolution

• Cross-correlation is

• Convolution is cross-correlation where the filter is


flipped both horizontally and vertically before being
applied to the image:

• It is written:
• Convolution is commutative and associative
Source: Steve Seitz

Jochen Lang, EECS


jlang@uOttawa.ca
Properties of Convolution
• Convolution is a multiplication-like operation
– commutative
– associative
– distributes over addition
– scalars factor out
– identity: unit impulse
• Conceptually no distinction between filter and signal
• Usefulness of associativity
– often apply several filters one after another:
1 2 3

– this is equivalent to applying one filter: 1 2


3
Source: Alexei Efros
Jochen Lang, EECS
jlang@uOttawa.ca
Gaussian and Convolution
• Removes high-frequency components from the image (low-
pass filter)
• Convolution with self is another Gaussian

=
– Convolving twice with Gaussian kernel of width
convolving once with kernel of width

Source: Kirsten Grauman

Jochen Lang, EECS


jlang@uOttawa.ca
Sampling and Reconstruction
Source: Alexei Efros
• Simple example: a sign wave

Jochen Lang, EECS


jlang@uOttawa.ca
Undersampling
Source: Alexei Efros
• What if we “missed” things between the samples?
• Simple example: undersampling a sine wave
– unsurprising result: information is lost
– surprising result: indistinguishable from lower
frequency

Jochen Lang, EECS


jlang@uOttawa.ca
Undersampling
Source: Alexei Efros
• What if we “missed” things between the samples?
• Simple example: undersampling a sine wave
– unsurprising result: information is lost
– surprising result: indistinguishable from lower
frequency
– also, was always indistinguishable from higher
frequencies
– aliasing: signals “traveling in disguise” as other
frequencies

Jochen Lang, EECS


jlang@uOttawa.ca
Anti-aliasing
Source: Alexei Efros
• Introduce lowpass filters:
– remove high frequencies leaving only safe, low
frequencies
– choose lowest frequency in reconstruction
(disambiguate)

Jochen Lang, EECS


jlang@uOttawa.ca
Aliasing
Plot as image:
Input signal:

Aliasing!
Not enough samples

Source: Alexei Efros

Jochen Lang, EECS


jlang@uOttawa.ca
Undersampling in Learning-Based
Methods
• In learning-based methods, we often like to reduce the
image size
– loss of information
– undersampling creates aliasing
• What can we do about aliasing?
– Get rid of high frequencies
• Information loss
• But does keep low frequency content intact

Jochen Lang, EECS


jlang@uOttawa.ca
Down-sizing

• Image is too large for


screen or for CNN
• Reduce size
• Throw away every
other row and column
to create a size
image
- image sub-
sampling

Jochen Lang, EECS


jlang@uOttawa.ca
Image Sub-sampling

1/8

1/4

1/2 Source: Steve Seitz

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Aliasing Due to Sub-sampling

1/2 1/4 (2x zoom) 1/8 (4x zoom)


Source: Steve Seitz

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Sampling an Image – No
Aliasing

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Sampling an Image - Aliasing

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Gaussian Pre-filtering

G 1/8
G 1/4

Gaussian 1/2 Source: Steve Seitz

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Subsampling with Gaussian
pre-filtering Source: Steve Seitz

Gaussian 1/2 G 1/4 G 1/8


Jochen Lang, EECS
jlang@eecs.uOttawa.ca
Compare with...

1/2 1/4 (2x zoom) 1/8 (4x zoom)

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Source: Steve Seitz
Gaussian Pyramid

• Gaussian Pyramid in computer vision [Burt and Adelson,


1983] or mip-mapping in computer graphics [Williams,
1983]
– Color for illustration only (image size is power of 2)

Size of image
, ,

Jochen Lang, EECS


jlang@uOttawa.ca
Gaussian Pyramid Construction
• Repeatedly filter and subsample until minimum
resolution reached
– Relative increase in filter size compared to original
– But can reuse same filter with images in the pyramid
– Total size is only 4/3 of the original image size

filter mask

Drawing by Steve Seitz

Jochen Lang, EECS


jlang@uOttawa.ca
A bar in the
big images is a
hair on the
zebra’s nose;
in smaller
images, a
stripe; in the
smallest, the
animal’s nose

Figure from David Forsyth

Jochen Lang, EECS


jlang@uOttawa.ca
Some Uses of Image Pyramids

• Improve Search
– Search over translations
• Classic coarse-to-fine strategy
– Search over scale
• Template matching
• E.g. find a face at different scales, e.g., MTCNN
Zhang et al. (2016).

Jochen Lang, EECS


jlang@uOttawa.ca
Discrete Partial Derivatives in Image

• Idea: Calculate finite differences with a filter


– The filter is convolved with the image
– The image is discrete
– Our filters are discrete
( , ) , ,
• Reminder: Forward difference
• Images are two-dimensional “functions”
– Calculate partial derivative in x and y
– Can express the gradient with a magnitude and a
direction

Jochen Lang, EECS


jlang@uOttawa.ca
Traditional Finite Difference Filters
-1 0 1 1 1 1
• Prewitt
-1 0 1 0 0 0
-1 0 1 -1 -1 -1
• Sobel
1 2 1 1 2 1
0 0 0 0 0 0
-1 -2 -1 -1 -2 -1

• Roberts 0 1 1 0
-1 0 0 -1

• Other approximations of derivative filters exist


Jochen Lang, EECS
jlang@uOttawa.ca
Partial derivatives of an image

-1 1
-1 1 1
or
-1
Source: Alexei Efros
Jochen Lang, EECS
jlang@eecs.uOttawa.ca
Image Gradient

• Two-dimensional vector

• Can be express as a vector magnitude and an angle wrt


the x-axis

  , ,
, ,
,

Jochen Lang, EECS


jlang@uOttawa.ca
Image Gradient

Source: Alexei Efros

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Effects of Noise

• Consider a single row or column of the image


– Plotting intensity as a function of x position gives a
signal

Source: Steve Seitz

Jochen Lang, EECS


jlang@uOttawa.ca
Solution: Smooth First

To find edges,
look for peaks

in

Source: Steve Seitz

Jochen Lang, EECS


jlang@uOttawa.ca
Derivative Theorem of Convolution

• This saves us one


operation:

Source: Steve Seitz

Jochen Lang, EECS


jlang@uOttawa.ca
Derivative of Gaussian Filter

x-direction y-direction

Source: Alexei Efros

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Example

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Compute Gradients (DoG)

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Orientation at Every Pixel

• Threshold at minimum magnitude (here 0.18)


• Color coded result

Jochen Lang, EECS


jlang@uOttawa.ca
Size of Padding

• Size of output
– Maximal sizeof(f) + 2*sizeof(g) – 2
– Typically padding but sizeof(output) = sizeof(f)
– No padding sizeof(f) - 2*sizeof(g) + 2
g g
g g
g g

f f f

g g
g g
g g Image source S. Lazebnik

Jochen Lang, EECS


jlang@uOttawa.ca
Type of Padding

• Behaviour at edge
– Filter window beyond
the edge of the image
– Need to extrapolate
– Methods:
• clip filter (black)
• wrap around
• copy edge
• reflect across edge

Source: Steve Marschner

Jochen Lang, EECS


jlang@uOttawa.ca
Review: Smoothing vs. Derivative
filters
• Smoothing Filters
– Gaussian: remove “high-frequency” components;
“low-pass” filter
– Filter values should sum to one: constant regions
are not affected by the filter
• Derivative Filters
– Derivatives of Gaussian
– Filter values should sum to zero: no response in
constant regions
– High absolute value at points of high contrast

Source: Alexei Efros

Jochen Lang, EECS


jlang@uOttawa.ca
Template matching

• Goal: find in image

• Main challenge: What is a good


similarity or distance measure
between two patches?
– Correlation
– Zero-mean correlation
– Sum Square Difference
– Normalized Cross Correlation

Side by Derek Hoiem

Jochen Lang, EECS


jlang@uOttawa.ca
Matching with filters

• Goal: find in image


• Method 0: filter the image with eye patch

f = image
h = filter

What went wrong?


Filtered Image

Side by Derek Hoiem

Jochen Lang, EECS


jlang@uOttawa.ca
Matching with Filters
Side by Derek Hoiem
• Goal: find in image
• Method 1: filter the image with zero-mean eye

True
detections

False
detections

Input Filtered Image (scaled) Thresholding


Jochen Lang, EECS
jlang@uOttawa.ca
Matching with Filters: SSD
Side by Derek Hoiem
• Goal: find in image
• Method 2: SSD

True
detections

Input 1- sqrt(SSD) Thresholding


Jochen Lang, EECS
jlang@uOttawa.ca
SSD as a Filter

• SSD as a filter

Jochen Lang, EECS


jlang@uOttawa.ca
SSD without Normalization

• Goal: find in image


• Method 2: SSD

Side by Derek Hoiem


Input 1- sqrt(SSD)
Jochen Lang, EECS
jlang@uOttawa.ca
Matching with filters

• Goal: find in image


• Method 3: Normalized cross-correlation
mean template mean image patch

Side by Derek Hoiem

Jochen Lang, EECS


jlang@uOttawa.ca
Matching with Filters: NCC
Side by Derek Hoiem
• Goal: find in image
• Method 3: Normalized cross-correlation NCC

True
detections

Input NCC Thresholding


Jochen Lang, EECS
jlang@uOttawa.ca
Matching with Filters: NCC –
Intensity Invariance
Side by Derek Hoiem
• Goal: find in image
• Method 3: Normalized cross-correlation

True detections

Input NCC Thresholding


Jochen Lang, EECS
jlang@uOttawa.ca
Comparison Template Matching

• Zero-mean filter: fastest but not a great matcher


• SSD: next fastest, sensitive to overall intensity
• Normalized cross-correlation: slowest, invariant to local
average intensity and contrast

Side by Derek Hoiem

Jochen Lang, EECS


jlang@uOttawa.ca
Denoising

Gaussian
Filter

Additive Gaussian Noise


Jochen Lang, EECS
jlang@eecs.uOttawa.ca
Reducing Gaussian noise

Smoothing with
larger standard
deviations
suppresses
noise, but also
blurs the image

Source: S. Lazebnik

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Reducing Salt-and-Pepper
Noise by Gaussian Smoothing

3x3 5x5 7x7

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Alternative idea: Median filtering

• A median filter operates over a window by selecting the


median intensity in the window

Source: Kirsten Grauman

Jochen Lang, EECS


jlang@uOttawa.ca
Median filter

• What advantage does


median filtering have
over Gaussian filtering?
– Robustness to outliers

Source: Kirsten Grauman

Jochen Lang, EECS


jlang@uOttawa.ca
Median filter
Salt-and-pepper noise Median filtered

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Source: M. Hebert
Median vs. Gaussian filtering

Gaussian

3x3 5x5 7x7

Median

Jochen Lang, EECS


jlang@eecs.uOttawa.ca
Summary

• Correlation and Convolution


• Gaussian Image Pyramid
• Basic Filtering Operations
– Smoothing
– Edge Detection
– Template matching
• Linear vs. non-linear operations

Jochen Lang, EECS


jlang@uOttawa.ca

You might also like