FSAN/ELEG815: Statistical Learning: Gonzalo R. Arce

FSAN/ELEG815: Statistical Learning
Gonzalo R. Arce
Department of Electrical and Computer Engineering
University of Delaware
X: Convolutional Neural Networks

FSAN/ELEG815
To discover from images what is present in the world, where things are, what
actions are taking place, to predict and anticipate events in the world.
The Rise and Impact of Computer Vision
Impact: Facial Detection & Recognition
Impact: Self-Driving Cars
Impact: Medicine, Biology, Healthcare
Impact: Accessibility
What Computers See: Images are Numbers
An image is just a matrix of numbers in [0, 255], e.g., 1080 × 1082 × 3 for an RGB image.
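As a minimal sketch of this idea, the snippet below stores a tiny, made-up 2 × 3 RGB image as nested Python lists and reads off its dimensions. The pixel values and the helper names `image_shape` and `to_grayscale` are illustrative assumptions, not part of the lecture; plain lists are used instead of an array library so the example runs without dependencies.

```python
# A hypothetical 2 x 3 RGB image: image[row][col] is an [R, G, B] triple,
# and every channel value is an integer in [0, 255].
image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[0, 0, 0], [128, 128, 128], [255, 255, 255]],
]


def image_shape(img):
    """Return (height, width, channels) of a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))


def to_grayscale(img):
    """Average the channels of each pixel (a simple, unweighted choice)."""
    return [[sum(pixel) // len(pixel) for pixel in row] for row in img]


shape = image_shape(image)
gray = to_grayscale(image)

# Number of values in the 1080 x 1082 x 3 image mentioned on the slide.
n_values = 1080 * 1082 * 3

print(shape)
print(gray)
print(n_values)
```

Every later operation in the deck (filtering, pooling, classification) acts on arrays of this kind.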
Tasks in Computer Vision
- Regression: the output variable takes a continuous value
- Classification: the output variable takes a class label. Can produce the probability of belonging to a particular class
Tasks in Computer Vision
Let’s identify key features in each image category
Nose, Eyes, Mouth
Wheels, License Plate, Headlights
Door, Windows, Steps
Manual Feature Extraction
Problems?
Learning Feature Representations
Can we learn a hierarchy of features directly from data instead of hand engineering?
Low-level features: lines and edges
Mid-level features: eyes, nose, and ears
High-level features: facial structure
Fully Connected Neural Network
Fully Connected
Input:
- 2D image
- Vector of pixel values
How can we use spatial structure in the input to inform the architecture of
the network?
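To make the question concrete, here is a small sketch of what happens when a 2D image is flattened into a vector for a fully connected layer. The 3 × 3 image, the helper names, and the layer sizes (a 100 × 100 input feeding 1000 hidden units) are assumptions chosen for illustration only.

```python
def flatten(img):
    """Flatten a 2D image (list of rows) into a 1D vector of pixel values."""
    return [value for row in img for value in row]


def dense_parameter_count(n_inputs, n_hidden):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_hidden + n_hidden


image = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
vector = flatten(image)

# Pixels (0, 1) and (1, 1) are vertical neighbors in the image,
# but they end up 3 positions apart in the flattened vector.
idx_top = 0 * 3 + 1
idx_center = 1 * 3 + 1

# Hypothetical sizes: 100 x 100 grayscale input, 1000 hidden units.
n_params = dense_parameter_count(100 * 100, 1000)

print(vector)
print(idx_center - idx_top)
print(n_params)
```

The flattened vector carries no record of which pixels were adjacent, and the parameter count grows with the product of input size and hidden-layer size.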
Using Spatial Structure
Connect a patch in the input layer to a single neuron in the subsequent layer. Use a sliding window to define connections. How can we weight the patch to detect particular features?
Feature Extraction with Convolution
- Filter of size 4 × 4: 16 different weights
- Apply this same filter to 4 × 4 patches in the input
- Shift by 2 pixels for the next patch

1. Apply a set of weights - a filter - to extract local features
2. Use multiple filters to extract different features
3. Spatially share parameters of each filter
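The sliding-window scheme above can be sketched by listing where a 4 × 4 filter lands when it is shifted by 2 pixels. The 8 × 8 input size and the helper name `patch_origins` are assumptions for illustration; no padding is applied.

```python
def patch_origins(height, width, filter_size=4, stride=2):
    """Top-left corners of every patch the sliding window visits (no padding)."""
    rows = range(0, height - filter_size + 1, stride)
    cols = range(0, width - filter_size + 1, stride)
    return [(r, c) for r in rows for c in cols]


# Hypothetical 8 x 8 input image.
origins = patch_origins(8, 8)
n_neurons = len(origins)

# With parameter sharing, every patch reuses the same 16 weights.
shared_weights = 4 * 4
# Without sharing, each of the neurons would need its own 16 weights.
unshared_weights = n_neurons * 4 * 4

print(origins)
print(n_neurons, shared_weights, unshared_weights)
```

Each origin corresponds to one neuron in the next layer, and sharing the filter keeps the weight count independent of how many patches there are.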
X or X?
An image is represented as a matrix of pixel values... and computers are literal!
We want to be able to classify an X as an X even if it is shifted, shrunk, rotated, or deformed.
Features of X
Filters to Detect X Features
The Convolution Operation

Suppose we want to compute the convolution of a 5 × 5 image and a 3 × 3 filter:
We slide the 3 × 3 filter over the input image, element-wise multiply, and add the outputs.
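A minimal sketch of this slide-multiply-add procedure follows. The pixel and filter values are made up for illustration (the slide's own numbers are not in the text), and `convolve2d` is a hypothetical helper. As is common in CNNs, the filter is applied without flipping, which is strictly a cross-correlation.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image`, multiply element-wise, and sum (no padding)."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    output = []
    for p in range(out_h):
        row = []
        for q in range(out_w):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += kernel[i][j] * image[p * stride + i][q * stride + j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 5 x 5 binary image and 3 x 3 filter.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

With stride 1 and no padding, the 5 × 5 input and 3 × 3 filter give a 3 × 3 feature map, one entry per filter position.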
Producing Feature Maps
Feature Extraction with Convolution
1. Apply a set of weights - a filter - to extract local features
2. Use multiple filters to extract different features
3. Spatially share parameters of each filter
Convolutional Neural Networks for Classification
1. Convolution: Apply filters to generate feature maps
2. Non-linearity: Often ReLU
3. Pooling: Downsampling operator on each feature map
Convolutional Layers: Local Connectivity
For a neuron in a hidden layer:
- Take inputs from a patch
- Compute the weighted sum
- Apply a bias

4 × 4 filter: matrix of weights $w_{i,j}$

$$\sum_{i=1}^{4} \sum_{j=1}^{4} w_{i,j}\, x_{i+p,\,j+q} + b$$

for the neuron at position $(p, q)$.

1) applying a window of weights
2) computing linear combinations
3) activation with non-linear functions
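The weighted sum for a single hidden neuron can be sketched as below. The 6 × 6 input, the averaging filter, the bias value, and the function name `neuron_preactivation` are assumptions for illustration; indices are 0-based here, whereas the formula counts from 1.

```python
def neuron_preactivation(x, w, b, p, q):
    """Weighted sum of the patch at offset (p, q) plus bias, 0-based indices."""
    n = len(w)
    return sum(w[i][j] * x[i + p][j + q] for i in range(n) for j in range(n)) + b


# Hypothetical 6 x 6 input whose entry at (r, c) is 6 * r + c.
x = [[6 * r + c for c in range(6)] for r in range(6)]
# Hypothetical 4 x 4 filter that averages its patch (16 equal weights).
w = [[1.0 / 16] * 4 for _ in range(4)]
b = 0.5

z = neuron_preactivation(x, w, b, p=1, q=2)
print(z)
```

The value `z` is what a non-linear activation would then be applied to in step 3.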
CNNs: Spatial Arrangement of Output Volume
Layer Dimensions: h × w × d, where h and w are spatial dimensions and d (depth) = number of filters
Stride: filter step size
Receptive Field: locations in the input image that a node is path connected to
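As a rough sketch of how these quantities interact, the helpers below compute the output volume of a convolutional layer and the receptive field of a node after several stacked layers. The formulas are the standard ones for square filters; the `padding` argument, the function names, and the example sizes are assumptions not stated on the slide.

```python
def conv_output_shape(h, w, filter_size, n_filters, stride=1, padding=0):
    """Output volume (h, w, d) of a conv layer with square filters."""
    out_h = (h + 2 * padding - filter_size) // stride + 1
    out_w = (w + 2 * padding - filter_size) // stride + 1
    return (out_h, out_w, n_filters)


def receptive_field(filter_sizes, strides):
    """Side length of the input region that one node in the last layer sees."""
    rf, jump = 1, 1
    for k, s in zip(filter_sizes, strides):
        rf += (k - 1) * jump  # each layer widens the view by (k - 1) input steps
        jump *= s             # strides multiply the spacing between nodes
    return rf


# 5 x 5 input, one 3 x 3 filter, stride 1.
small = conv_output_shape(5, 5, filter_size=3, n_filters=1)
# Hypothetical 32 x 32 input, eight 4 x 4 filters, stride 2.
strided = conv_output_shape(32, 32, filter_size=4, n_filters=8, stride=2)
# Two stacked 3 x 3 layers with stride 1.
rf_two_layers = receptive_field([3, 3], [1, 1])

print(small, strided, rf_two_layers)
```

The depth of the output equals the number of filters, while stride and filter size set the spatial dimensions.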
Introducing Non-Linearity
- Apply after every convolution operation (i.e., after convolutional layers)
- ReLU (Rectified Linear Unit): pixel-by-pixel operation that replaces all negative values with zero. Non-linear operation!

$g(z) = \max(0, z)$
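A short sketch of ReLU applied element-wise to a feature map is given below. The feature-map values and the helper names are made up for illustration.

```python
def relu(z):
    """g(z) = max(0, z)."""
    return max(0, z)


def relu_map(feature_map):
    """Apply ReLU to every entry of a 2D feature map."""
    return [[relu(v) for v in row] for row in feature_map]


# Hypothetical feature map with positive and negative responses.
feature_map = [
    [-3, 2, 0],
    [5, -1, -7],
]
activated = relu_map(feature_map)
print(activated)

# ReLU is not linear: g(a) + g(b) differs from g(a + b) in general.
print(relu(-1) + relu(1), relu(-1 + 1))
```

The last line shows why the operation is non-linear: the sum of the outputs is 1, while the output of the sum is 0.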
Pooling
How else can we downsample and preserve spatial invariance?
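One common pooling choice is max pooling, sketched below with a 2 × 2 window and stride 2. The slide text does not say which variant it illustrates, so the window size, stride, input values, and the name `max_pool2d` are all assumptions.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window (no padding)."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[p * stride + i][q * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for q in range(out_w)
        ]
        for p in range(out_h)
    ]


# Hypothetical 4 x 4 feature map.
feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool2d(feature_map)
print(pooled)
```

The 4 × 4 map shrinks to 2 × 2, and small shifts of a strong response within a window leave the pooled value unchanged. Other answers to the question include average pooling or a larger convolution stride.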
Representing Learning in Deep CNNs
Conv Layer 1: low-level features (lines and edges)
Conv Layer 2: mid-level features (eyes, nose, and ears)
Conv Layer 3: high-level features (facial structure)
CNNs for Classification: Feature Learning
1. Learn features in the input image through convolution
2. Introduce non-linearity through an activation function (real-world data is non-linear!)
3. Reduce dimensionality and preserve spatial invariance with pooling
CNNs for Classification: Class Probability
1. CONV and POOL layers output high-level features of the input
2. Fully connected layers use these features for classifying the input image
3. Express the output as the probability of the image belonging to a particular class

$$\mathrm{softmax}(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$$
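The softmax step can be sketched as follows. The three class scores are made up, and subtracting the maximum score before exponentiating is an added numerical-stability measure that leaves the result unchanged; it is not something the slide specifies.

```python
import math


def softmax(scores):
    """Turn raw class scores y_i into probabilities e^{y_i} / sum_j e^{y_j}."""
    shift = max(scores)  # cancels in the ratio, but prevents overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical outputs of the last fully connected layer for 3 classes.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted_class = probs.index(max(probs))

print(probs)
print(predicted_class)
```

The outputs are positive and sum to 1, so each entry can be read as the probability of the image belonging to that class.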
An Architecture for Many Applications
Classification: Breast Cancer Screening
McKinney, S.M. et al. International evaluation of an AI system for breast cancer screening. Nature (2020)
Object Detection
Semantic Segmentation: Fully Convolutional Networks
FCN (Fully Convolutional Network): a network designed with all convolutional layers, with downsampling and upsampling operators
Semantic Segmentation: Biomedical Image Analysis
End-to-End Framework for Autonomous Navigation
Deep Learning for Computer Vision: Impact
Acknowledgement
Alexander Amini and Ava Soleimany, MIT 6.S191: Introduction to Deep Learning, IntroToDeepLearning.com