
Computer Vision

Lab 9: Common Architectures

Based on the Lecture


Prepared by
Amjad Dife
2023 / 2024

Motivation

[Figures: motivating examples from the lecture; the images did not survive this export.]
Outline

❑ Common Architectures.
▪ AlexNet
▪ VGGNet
▪ GoogleNet
▪ Inception v2, v3, and v4
▪ ResNet
➢ Residual Block
➢ ResNet v1, v2

Common Architectures

[Figure: comparison of common architectures.]
▪ The size of each ball corresponds to the model complexity.
AlexNet | ILSVRC 2012 winner

▪ First layer: 96 11x11 filters applied at stride 4
▪ Max pooling, ReLU nonlinearity
▪ 60M params
▪ GPU implementation (50x speedup over CPU)
▪ Trained on two GPUs for a week
▪ Dropout regularization

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
Let’s Code
AlexNet
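
The notebook code for this part is not included in the export; below is a minimal PyTorch sketch of AlexNet, assuming the channel sizes from the paper (96, 256, 384, 384, 256). Names such as AlexNet and num_classes here are mine, not the lab's.

import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),  # 96 filters of 11x11 at stride 4
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),                 # dropout regularization, as on the slide
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNet()
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])

With these channel sizes the fully connected layers dominate the parameter count, which is what puts the model near the 60M figure quoted above.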

VGGNet | ILSVRC 2014 2nd place

▪ A sequence of increasingly deeper networks.
▪ Large receptive fields are replaced by successive layers of 3x3 convolutions (with ReLU in between).
▪ Fewer parameters (see the comparison below).
▪ Experimented with 1x1 convolutions.
VGGNet | (design principle) More depth is better for the same receptive field

▪ Suppose that:
  ▪ the input is H x W x C (C is the depth of the input), and
  ▪ we use convolutions with C filters (the depth of the output feature map) to preserve depth (stride 1, padding to preserve H, W).

Then compare one 7x7 convolution with three stacked 3x3 convolutions:

                     One CONV with 7x7 filters     Three CONVs with 3x3 filters
Number of weights    (7 x 7 x C) x C = 49C²        3 x (C x (3 x 3 x C)) = 27C²

→ The stacked 3x3 convolutions cover the same 7x7 receptive field with less compute and more nonlinearity (better).
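
The weight counts can be checked directly in PyTorch (a quick sketch; C = 64 is an arbitrary choice of mine):

import torch.nn as nn

C = 64
one_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)
three_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(one_7x7), 49 * C**2)    # 200704 200704
print(count(three_3x3), 27 * C**2)  # 110592 110592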

Let’s Code
VGG
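
Again, the lab's own code is not in the export; this is a minimal PyTorch sketch of VGG-16. The cfg list and helper names are mine; the layer configuration follows the published VGG-16 design.

import torch
import torch.nn as nn

# VGG-16 configuration: numbers are output channels, 'M' marks a 2x2 max pool
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

def make_vgg_features(cfg, in_channels=3):
    layers = []
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(2, 2))
        else:
            # 3x3 convolutions throughout, padding 1 to preserve H and W
            layers += [nn.Conv2d(in_channels, v, 3, padding=1), nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)

class VGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = make_vgg_features(cfg)
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

model = VGG16()
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])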

GoogleNet | 2015

[Figures: the Inception module and the overall GoogleNet architecture.]

The Inception module:
▪ Parallel paths with different receptive field sizes and operations are meant to capture sparse patterns of correlations in the stack of feature maps.

C. Szegedy et al., Going deeper with convolutions, CVPR 2015
GoogleNet | 2015

▪ Note that a 1x1 convolution changes only the depth of the output feature map.
▪ Remember that the depth of the output feature map equals the number of filters.
▪ Remember that the depth of each filter equals the depth of the input.

[Figure: the Inception module.]
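
A quick illustration of that depth-only effect (a sketch; the tensor sizes are just example values of mine):

import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)              # depth 256, spatial size 28x28
conv1x1 = nn.Conv2d(256, 64, kernel_size=1)  # 64 filters, each of depth 256
print(conv1x1(x).shape)                      # torch.Size([1, 64, 28, 28]) -- only the depth changed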
GoogleNet | Inception module with dimensionality reduction

The main idea:
• Use 1x1 convolutions for dimensionality reduction before the expensive 3x3 and 5x5 convolutions.

[Figure: the Inception module with 1x1 reduction layers.]

C. Szegedy et al., Going deeper with convolutions, CVPR 2015
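
A PyTorch sketch of this module (class and argument names are mine; the channel counts mirror the worked example on the next slide, not any specific GoogleNet stage):

import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, ch1x1, 1)            # 1x1 path
        self.branch2 = nn.Sequential(                        # 1x1 reduce, then 3x3
            nn.Conv2d(in_ch, ch3x3red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3red, ch3x3, 3, padding=1),
        )
        self.branch3 = nn.Sequential(                        # 1x1 reduce, then 5x5
            nn.Conv2d(in_ch, ch5x5red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5red, ch5x5, 5, padding=2),
        )
        self.branch4 = nn.Sequential(                        # 3x3 max pool, then 1x1 projection
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1),
        )

    def forward(self, x):
        # Concatenate the four parallel paths along the depth dimension
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

m = InceptionModule(256, 128, 64, 192, 64, 96, 64)
print(m(torch.randn(1, 256, 28, 28)).shape)  # torch.Size([1, 480, 28, 28])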
GoogleNet | naive module vs. module with 1x1 reductions

• Computational complexity!

Naive Inception module                    Inception module with 1x1 reductions
Conv ops:                                 Conv ops:
[1x1 conv, 128] 28x28x128x1x1x256         [1x1 conv, 64]  28x28x64x1x1x256
[3x3 conv, 192] 28x28x192x3x3x256         [1x1 conv, 64]  28x28x64x1x1x256
[5x5 conv, 96]  28x28x96x5x5x256          [1x1 conv, 128] 28x28x128x1x1x256
                                          [3x3 conv, 192] 28x28x192x3x3x64
                                          [5x5 conv, 96]  28x28x96x5x5x64
                                          [1x1 conv, 64]  28x28x64x1x1x256
Total: 854M ops                           Total: 358M ops
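
These per-layer products can be sanity-checked with a tiny helper that counts multiplications as output positions times filter volume (a rough sketch; published totals depend on exactly which operations are counted):

# Multiplications for one conv layer: H_out x W_out x C_out x K x K x C_in
def conv_ops(h, w, c_out, k, c_in):
    return h * w * c_out * k * k * c_in

naive = (conv_ops(28, 28, 128, 1, 256)
         + conv_ops(28, 28, 192, 3, 256)
         + conv_ops(28, 28, 96, 5, 256))
print(f"naive: {naive / 1e6:.0f}M")  # naive: 854M -- matches the total above

# The 1x1 reductions shrink the dominant 3x3 and 5x5 terms by 256/64 = 4x:
# 28x28x192x3x3x64 ≈ 87M instead of ≈ 347M, and 28x28x96x5x5x64 ≈ 120M instead of ≈ 482M.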
GoogleNet

[Figure: the full GoogleNet architecture, built by stacking Inception modules.]
Inception v2, v3 (2016)

▪ Regularize training with batch normalization, reducing the importance of auxiliary classifiers.
▪ More variants of the Inception module, with aggressive factorization of filters (illustrated in the sketch below).

[Figures: the v2 and v3 module variants.]

C. Szegedy et al., Rethinking the inception architecture for computer vision, CVPR 2016
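
Filter factorization, illustrated in PyTorch (a sketch; the channel count and spatial size are arbitrary choices of mine). A 5x5 convolution can be replaced by two stacked 3x3 convolutions, and an nxn convolution by a 1xn followed by an nx1:

import torch
import torch.nn as nn

x = torch.randn(1, 64, 17, 17)

# Factor a 5x5 conv into two 3x3 convs (same 5x5 receptive field, fewer weights)
two_3x3 = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 3, padding=1),
)

# Asymmetric factorization: a 7x7 conv as 1x7 followed by 7x1
asym = nn.Sequential(
    nn.Conv2d(64, 64, (1, 7), padding=(0, 3)), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, (7, 1), padding=(3, 0)),
)

print(two_3x3(x).shape, asym(x).shape)  # both torch.Size([1, 64, 17, 17])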
Inception v4

C. Szegedy et al., Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv 2016

ResNet | Residual Block

The residual module:
▪ A 152-layer model for ImageNet.
▪ Introduces skip (shortcut) connections, which existed in various forms in earlier literature.
▪ Makes it easy for network layers to represent the identity mapping: the block computes y = F(x) + x, so the stacked layers only need to drive the residual F(x) toward zero to pass x through unchanged.
▪ The shortcut needs to skip at least two layers: with a single layer the block stays essentially linear, and the authors observed no benefit.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
ResNet | v1 & v2

[Figure: comparison of the v1 (original) and v2 (pre-activation) residual block designs.]
Let’s Code
ResNet
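
The lab's ResNet code is not in the export; here is a minimal PyTorch sketch of the v1 basic block (3x3 -> 3x3 with an identity shortcut). The class name and the channel/stride values in the demo are mine:

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # When the shape changes, project the shortcut with a 1x1 conv
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # y = F(x) + x

block = BasicBlock(64, 128, stride=2)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 128, 28, 28])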

Thank You
