FSAN/ELEG815: Statistical Learning: Gonzalo R. Arce

FSAN/ELEG815: Statistical Learning

Gonzalo R. Arce
Department of Electrical and Computer Engineering
University of Delaware

X: Convolutional Neural Networks



To discover from images what is present in the world, where things are, and what
actions are taking place, and to predict and anticipate events in the world.


The Rise and Impact of Computer Vision


Impact: Facial Detection & Recognition


Impact: Self-Driving Cars


Impact: Medicine, Biology, Healthcare


Impact: Accessibility


What Computers See: Images are Numbers

An image is just a matrix of numbers in the range [0, 255],
e.g., 1080 × 1082 × 3 for an RGB image.
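
As a minimal sketch of this idea, assuming NumPy and a synthetic (randomly generated) image rather than a real photo, an RGB image can be held as a height × width × channels array of 8-bit integers. The names `image` and `rng` are illustrative only.

```python
import numpy as np

# Synthetic stand-in for an RGB photo (assumption: random pixels, not real data).
rng = np.random.default_rng(0)
height, width, channels = 1080, 1082, 3
image = rng.integers(0, 256, size=(height, width, channels), dtype=np.uint8)

print(image.shape)   # (1080, 1082, 3)
print(image.dtype)   # uint8, so every value lies in [0, 255]
print(image[0, 0])   # the three numbers (R, G, B) of the top-left pixel
```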


Tasks in Computer Vision

- Regression: the output variable takes a continuous value.
- Classification: the output variable takes a class label. The model can also
  produce the probability of belonging to a particular class.


Tasks in Computer Vision

Let’s identify key features in each image category

Key features, one line per image category:

- Nose, Eyes, Mouth
- Wheels, License Plate, Headlights
- Door, Windows, Steps


Manual Feature Extraction

Problems?


Learning Feature Representations

Can we learn a hierarchy of features directly from data instead of hand
engineering?

- Low-level features: lines and edges
- Mid-level features: eyes, nose, and ears
- High-level features: facial structure


Fully Connected Neural Network


Fully Connected

Input:
- 2D image
- Vector of pixel values

How can we use spatial structure in the input to inform the architecture of
the network?
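
To make the fully connected setup concrete, here is a rough sketch, assuming a hypothetical 28 × 28 grayscale image and a hypothetical hidden layer of 100 neurons (neither number comes from the slides). Flattening discards the 2D neighborhood structure, and the layer needs one weight per pixel per neuron.

```python
import numpy as np

# Hypothetical sizes chosen only for illustration.
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)
n_hidden = 100

# A fully connected layer sees the image as a flat vector of pixel values.
x = image.reshape(-1)                     # shape (784,)

# One weight per (pixel, neuron) pair, plus one bias per neuron.
rng = np.random.default_rng(0)
W = rng.normal(size=(n_hidden, x.size)).astype(np.float32)
b = np.zeros(n_hidden, dtype=np.float32)
hidden = W @ x + b                        # shape (100,)

n_params = W.size + b.size
print(x.shape, hidden.shape, n_params)    # (784,) (100,) 78500

# Vertically adjacent pixels (0, 0) and (1, 0) sit 28 positions apart in x.
offset = int(np.where(x == image[1, 0])[0][0])
print(offset)                             # 28
```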


Using Spatial Structure


Connect a patch in the input layer to a single neuron in the subsequent layer.
Use a sliding window to define connections. How can we weight the patch to
detect particular features?


Feature Extraction with Convolution

- Filter of size 4 × 4: 16 different weights
- Apply this same filter to 4 × 4 patches in the input
- Shift by 2 pixels for the next patch

1. Apply a set of weights (a filter) to extract local features.
2. Use multiple filters to extract different features.
3. Spatially share the parameters of each filter.
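
The sliding-window description above can be sketched as follows, assuming a hypothetical 8 × 8 input and random filter weights (the slides give neither). The helper name `apply_filter` is illustrative, not taken from any library.

```python
import numpy as np

def apply_filter(x, w, stride):
    """Slide filter w over x with the given stride (no padding)."""
    fh, fw = w.shape
    out_h = (x.shape[0] - fh) // stride + 1
    out_w = (x.shape[1] - fw) // stride + 1
    out = np.zeros((out_h, out_w))
    for p in range(out_h):
        for q in range(out_w):
            r, c = p * stride, q * stride
            patch = x[r:r + fh, c:c + fw]   # 4 x 4 patch of the input
            out[p, q] = np.sum(patch * w)   # the same 16 weights at every position
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))   # hypothetical input
w = rng.normal(size=(4, 4))   # one 4 x 4 filter
feature_map = apply_filter(x, w, stride=2)

print(w.size)              # 16
print(feature_map.shape)   # (3, 3)
```

With a shift of 2 pixels, the 4 × 4 window fits at 3 positions along each axis of the 8 × 8 input, so the same 16 weights produce 9 outputs.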


X or X?

An image is represented as a matrix of pixel values... and computers are literal!

We want to be able to classify an X as an X even if it's shifted, shrunk,
rotated, or deformed.


Features of X


Filters to Detect X Features


The Convolution Operation


Suppose we want to compute the convolution of a 5 × 5 image and a 3 × 3
filter:

We slide the 3 × 3 filter over the input image, element-wise multiply, and add
the outputs.
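
A minimal sketch of this slide, multiply, and add procedure follows. The 5 × 5 image and 3 × 3 filter values are hypothetical, chosen only for illustration, and `conv2d_valid` is an illustrative helper name. As is common in CNN practice, the filter is not flipped, so strictly speaking this computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide kernel over image (stride 1, no padding), multiply and sum."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for p in range(out_h):
        for q in range(out_w):
            patch = image[p:p + kh, q:q + kw]
            out[p, q] = np.sum(patch * kernel)   # element-wise multiply, then add
    return out

# Hypothetical binary image and filter.
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

feature_map = conv2d_valid(image, kernel)
print(feature_map)
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```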

Producing Feature Maps


Feature Extraction with Convolution

1. Apply a set of weights (a filter) to extract local features.
2. Use multiple filters to extract different features.
3. Spatially share the parameters of each filter.
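
To illustrate the three points, here is a small sketch, assuming a hypothetical 6 × 6 image containing one vertical edge and two hand-picked 3 × 3 edge filters (in a CNN the filter weights would be learned). Each filter yields its own feature map, and the maps are stacked along the depth axis.

```python
import numpy as np

def conv2d_valid(x, w):
    """Stride-1, no-padding sliding-window sum of products."""
    fh, fw = w.shape
    out = np.zeros((x.shape[0] - fh + 1, x.shape[1] - fw + 1))
    for p in range(out.shape[0]):
        for q in range(out.shape[1]):
            out[p, q] = np.sum(x[p:p + fh, q:q + fw] * w)
    return out

# Hypothetical image: dark left half, bright right half (a vertical edge).
x = np.zeros((6, 6))
x[:, 3:] = 1.0

# Two hand-picked filters (assumption: illustrative, not learned).
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)
horizontal_edge = vertical_edge.T
filters = [vertical_edge, horizontal_edge]

# One feature map per filter, stacked along the depth axis.
feature_maps = np.stack([conv2d_valid(x, w) for w in filters], axis=-1)
n_weights = sum(w.size for w in filters)

print(feature_maps.shape)     # (4, 4, 2)
print(feature_maps[0, :, 0])  # vertical-edge filter responds: [0. 3. 3. 0.]
print(feature_maps[0, :, 1])  # horizontal-edge filter stays at zero
print(n_weights)              # 18 weights in total, shared across all positions
```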


Convolutional Neural Networks for Classification

1. Convolution: Apply filters to generate feature maps
2. Non-linearity: Often ReLU
3. Pooling: Downsampling operator on each feature map


Convolutional Layers: Local Connectivity

For a neuron in a hidden layer:
- Take inputs from a patch
- Compute the weighted sum
- Apply the bias

4 × 4 filter: matrix of weights w_{i,j}. The output of neuron (p, q) in the
hidden layer is

$$\sum_{i=1}^{4} \sum_{j=1}^{4} w_{i,j}\, x_{i+p,\,j+q} + b$$

The layer therefore works in three steps:
1) applying a window of weights
2) computing linear combinations
3) activation with non-linear functions
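
The weighted sum for a single neuron can be sketched directly, assuming a hypothetical 6 × 6 input, an averaging filter, and a bias of 0.5 (none of these values come from the slides). The code uses 0-based indices, so (p, q) is the top-left corner of the patch, and ReLU stands in for the unspecified non-linear activation.

```python
import numpy as np

def neuron_output(x, w, b, p, q):
    """Weighted sum over the 4 x 4 patch starting at (p, q), plus bias, then ReLU."""
    z = 0.0
    for i in range(4):          # 0-based version of the sum over i = 1..4
        for j in range(4):      # 0-based version of the sum over j = 1..4
            z += w[i, j] * x[i + p, j + q]
    z += b
    return max(0.0, z)          # assumption: ReLU as the non-linearity

x = np.arange(36, dtype=float).reshape(6, 6)   # hypothetical input
w = np.full((4, 4), 1.0 / 16.0)                # hypothetical averaging filter
b = 0.5

a = neuron_output(x, w, b, p=1, q=2)
print(a)   # 19.0: the patch mean (18.5) plus the bias (0.5)
```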


CNNs: Spatial Arrangement of Output Volume

Layer Dimensions: h × w × d, where h and w are the spatial dimensions and
d (depth) = number of filters.

Stride: filter step size.

Receptive Field: locations in the input image that a node is path-connected to.
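
How the stride and the number of filters determine h × w × d can be sketched as below, assuming no padding (the slides do not discuss padding). The function names and the 32 × 32 example are hypothetical.

```python
def conv_output_size(n, f, stride=1):
    """Number of positions a length-f filter fits along a length-n axis (no padding)."""
    return (n - f) // stride + 1

def conv_output_shape(h, w, f, stride, n_filters):
    """Output volume h x w x d of a conv layer with square f x f filters."""
    return (conv_output_size(h, f, stride),
            conv_output_size(w, f, stride),
            n_filters)                       # depth d = number of filters

print(conv_output_size(5, 3))                # 3: a 3 x 3 filter on a 5 x 5 image
print(conv_output_shape(32, 32, f=4, stride=2, n_filters=8))   # (15, 15, 8)
```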


Introducing Non-Linearity

- Apply after every convolution operation (i.e., after convolutional layers).
- Rectified Linear Unit (ReLU): a pixel-by-pixel operation that replaces all
  negative values with zero. Non-linear operation!

g(z) = max(0, z)
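
A short sketch of g(z) applied element-wise with NumPy, using a hypothetical 2 × 2 feature map. The last lines check non-linearity: g(a + b) differs from g(a) + g(b) for a = 1, b = -1.

```python
import numpy as np

def relu(z):
    """g(z) = max(0, z), applied to every element independently."""
    return np.maximum(0, z)

feature_map = np.array([[3.0, -1.5],
                        [-0.2, 0.0]])   # hypothetical values
activated = relu(feature_map)           # negatives become 0, the rest is unchanged
print(activated)

# Non-linearity check: relu(a + b) is not relu(a) + relu(b) in general.
a, b = 1.0, -1.0
print(relu(a + b), relu(a) + relu(b))   # 0.0 versus 1.0
```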


Pooling

How else can we downsample and preserve spatial invariance?
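
The slide's figure is not recoverable here. As a hedged illustration, max pooling is a common pooling operator and average pooling is one alternative way to downsample. The sketch below assumes 2 × 2 windows with stride 2 and a hypothetical 4 × 4 feature map; `pool_2x2` is an illustrative helper name.

```python
import numpy as np

def pool_2x2(x, reduce_fn):
    """Downsample by applying reduce_fn to each non-overlapping 2 x 2 block."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]              # drop an odd trailing row or column
    blocks = x.reshape(h // 2, 2, w // 2, 2)   # blocks[a, i, b, j] = x[2a + i, 2b + j]
    return reduce_fn(blocks, axis=(1, 3))

feature_map = np.array([[1, 1, 2, 4],
                        [5, 6, 7, 8],
                        [3, 2, 1, 0],
                        [1, 2, 3, 4]])   # hypothetical values

max_pooled = pool_2x2(feature_map, np.max)    # [[6, 8], [3, 4]]
avg_pooled = pool_2x2(feature_map, np.mean)   # [[3.25, 5.25], [2.0, 2.0]]
print(max_pooled)
print(avg_pooled)
```

Both variants halve each spatial dimension, so a small shift of a feature within a 2 × 2 block often leaves the pooled output unchanged.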


Representing Learning in Deep CNNs

- Conv Layer 1: low-level features (lines and edges)
- Conv Layer 2: mid-level features (eyes, nose, and ears)
- Conv Layer 3: high-level features (facial structure)


CNNs for Classification: Feature Learning

1. Learn features in the input image through convolution.
2. Introduce non-linearity through an activation function (real-world data is
   non-linear!).
3. Reduce dimensionality and preserve spatial invariance with pooling.

CNNs for Classification: Class Probability

1. CONV and POOL layers output high-level features of the input.
2. Fully connected layers use these features to classify the input image.
3. Express the output as the probability of the image belonging to a
   particular class:

$$\mathrm{softmax}(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$$
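
A minimal sketch of the softmax step, assuming three hypothetical class scores. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(y):
    """softmax(y_i) = exp(y_i) / sum_j exp(y_j)."""
    shifted = y - np.max(y)      # stability only; the ratios are unaffected
    e = np.exp(shifted)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # hypothetical fully connected outputs
probs = softmax(scores)

print(probs)             # three positive numbers
print(probs.sum())       # sums to 1 (up to floating-point rounding)
print(probs.argmax())    # 0: the class with the largest score
```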

An Architecture for Many Applications


Classification: Breast Cancer Screening

McKinney, S.M. et al. International evaluation of an AI system for breast cancer screening. Nature (2020)


Object Detection


Semantic Segmentation: Fully Convolutional Networks

FCN (Fully Convolutional Network): a network designed with all convolutional
layers, with downsampling and upsampling operators.


Semantic Segmentation: Biomedical Image Analysis


End-to-End Framework for Autonomous Navigation


Deep Learning for Computer Vision: Impact


Acknowledgement

Alexander Amini and Ava Soleimany, MIT 6.S191: Introduction to Deep
Learning, IntroToDeepLearning.com
