
Computer Vision

Lab 9: Common Architectures

Based on the Lecture


Prepared by
Amjad Dife
2023 / 2024

Motivation

[Figures: motivating examples from the lecture; the images did not survive this export.]
Outline

❑ Common Architectures.
▪ AlexNet
▪ VGGNet
▪ GoogleNet
▪ Inception v2, v3, and v4
▪ ResNet
➢ Residual Block
➢ ResNet v1, v2

Common Architectures

[Figure: comparison of common architectures.]
▪ The size of each ball corresponds to the model complexity.
AlexNet | ILSVRC 2012 winner

▪ First layer: 96 11x11 filters applied at stride 4
▪ Max pooling, ReLU nonlinearity
▪ 60M params
▪ GPU implementation (50x speedup over CPU)
▪ Trained on two GPUs for a week
▪ Dropout regularization

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
Let’s Code
AlexNet
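
The notebook code for this part is not included in the export; below is a minimal PyTorch sketch of AlexNet, assuming the channel sizes from the paper (96, 256, 384, 384, 256). Names such as AlexNet and num_classes here are mine, not the lab's.

import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),  # 96 filters of 11x11 at stride 4
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),                 # dropout regularization, as on the slide
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNet()
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])

With these channel sizes the fully connected layers dominate the parameter count, which is what puts the model near the 60M figure quoted above.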

VGGNet | ILSVRC 2014 2nd place

▪ A sequence of increasingly deeper networks.
▪ Large receptive fields are replaced by successive layers of 3x3 convolutions (with ReLU in between).
▪ Fewer parameters (see the comparison below).
▪ Experimented with 1x1 convolutions.
VGGNet | (design principle) More depth is better for the same receptive field

▪ Suppose that:
  ▪ the input is H x W x C (C is the depth of the input), and
  ▪ we use convolutions with C filters (the depth of the output feature map) to preserve depth (stride 1, padding to preserve H, W).

Then compare one 7x7 convolution with three stacked 3x3 convolutions:

                     One CONV with 7x7 filters     Three CONVs with 3x3 filters
Number of weights    (7 x 7 x C) x C = 49C²        3 x (C x (3 x 3 x C)) = 27C²

→ The stacked 3x3 convolutions cover the same 7x7 receptive field with less compute and more nonlinearity (better).
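
The weight counts can be checked directly in PyTorch (a quick sketch; C = 64 is an arbitrary choice of mine):

import torch.nn as nn

C = 64
one_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)
three_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(one_7x7), 49 * C**2)    # 200704 200704
print(count(three_3x3), 27 * C**2)  # 110592 110592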

Let’s Code
VGG
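
Again, the lab's own code is not in the export; this is a minimal PyTorch sketch of VGG-16. The cfg list and helper names are mine; the layer configuration follows the published VGG-16 design.

import torch
import torch.nn as nn

# VGG-16 configuration: numbers are output channels, 'M' marks a 2x2 max pool
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

def make_vgg_features(cfg, in_channels=3):
    layers = []
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(2, 2))
        else:
            # 3x3 convolutions throughout, padding 1 to preserve H and W
            layers += [nn.Conv2d(in_channels, v, 3, padding=1), nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)

class VGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = make_vgg_features(cfg)
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

model = VGG16()
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])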

GoogleNet | 2015

[Figures: the Inception module and the overall GoogleNet architecture.]

The Inception module:
▪ Parallel paths with different receptive field sizes and operations are meant to capture sparse patterns of correlations in the stack of feature maps.

C. Szegedy et al., Going deeper with convolutions, CVPR 2015
GoogleNet | 2015

▪ Note that a 1x1 convolution changes only the depth of the output feature map.
▪ Remember that the depth of the output feature map equals the number of filters.
▪ Remember that the depth of each filter equals the depth of the input.

[Figure: the Inception module.]
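
A quick illustration of that depth-only effect (a sketch; the tensor sizes are just example values of mine):

import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)              # depth 256, spatial size 28x28
conv1x1 = nn.Conv2d(256, 64, kernel_size=1)  # 64 filters, each of depth 256
print(conv1x1(x).shape)                      # torch.Size([1, 64, 28, 28]) -- only the depth changed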
GoogleNet | Inception module with dimensionality reduction

The main idea:
• Use 1x1 convolutions for dimensionality reduction before the expensive 3x3 and 5x5 convolutions.

[Figure: the Inception module with 1x1 reduction layers.]

C. Szegedy et al., Going deeper with convolutions, CVPR 2015
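
A PyTorch sketch of this module (class and argument names are mine; the channel counts mirror the worked example on the next slide, not any specific GoogleNet stage):

import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, ch1x1, 1)            # 1x1 path
        self.branch2 = nn.Sequential(                        # 1x1 reduce, then 3x3
            nn.Conv2d(in_ch, ch3x3red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3red, ch3x3, 3, padding=1),
        )
        self.branch3 = nn.Sequential(                        # 1x1 reduce, then 5x5
            nn.Conv2d(in_ch, ch5x5red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5red, ch5x5, 5, padding=2),
        )
        self.branch4 = nn.Sequential(                        # 3x3 max pool, then 1x1 projection
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1),
        )

    def forward(self, x):
        # Concatenate the four parallel paths along the depth dimension
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

m = InceptionModule(256, 128, 64, 192, 64, 96, 64)
print(m(torch.randn(1, 256, 28, 28)).shape)  # torch.Size([1, 480, 28, 28])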
GoogleNet | naive module vs. module with 1x1 reductions

• Computational complexity!

Naive Inception module                    Inception module with 1x1 reductions
Conv ops:                                 Conv ops:
[1x1 conv, 128] 28x28x128x1x1x256         [1x1 conv, 64]  28x28x64x1x1x256
[3x3 conv, 192] 28x28x192x3x3x256         [1x1 conv, 64]  28x28x64x1x1x256
[5x5 conv, 96]  28x28x96x5x5x256          [1x1 conv, 128] 28x28x128x1x1x256
                                          [3x3 conv, 192] 28x28x192x3x3x64
                                          [5x5 conv, 96]  28x28x96x5x5x64
                                          [1x1 conv, 64]  28x28x64x1x1x256
Total: 854M ops                           Total: 358M ops
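
These per-layer products can be sanity-checked with a tiny helper that counts multiplications as output positions times filter volume (a rough sketch; published totals depend on exactly which operations are counted):

# Multiplications for one conv layer: H_out x W_out x C_out x K x K x C_in
def conv_ops(h, w, c_out, k, c_in):
    return h * w * c_out * k * k * c_in

naive = (conv_ops(28, 28, 128, 1, 256)
         + conv_ops(28, 28, 192, 3, 256)
         + conv_ops(28, 28, 96, 5, 256))
print(f"naive: {naive / 1e6:.0f}M")  # naive: 854M -- matches the total above

# The 1x1 reductions shrink the dominant 3x3 and 5x5 terms by 256/64 = 4x:
# 28x28x192x3x3x64 ≈ 87M instead of ≈ 347M, and 28x28x96x5x5x64 ≈ 120M instead of ≈ 482M.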
GoogleNet

[Figure: the full GoogleNet architecture, built by stacking Inception modules.]
Inception v2, v3 (2016)

▪ Regularize training with batch normalization, reducing the importance of auxiliary classifiers.
▪ More variants of the Inception module, with aggressive factorization of filters (illustrated in the sketch below).

[Figures: the v2 and v3 module variants.]

C. Szegedy et al., Rethinking the inception architecture for computer vision, CVPR 2016
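
Filter factorization, illustrated in PyTorch (a sketch; the channel count and spatial size are arbitrary choices of mine). A 5x5 convolution can be replaced by two stacked 3x3 convolutions, and an nxn convolution by a 1xn followed by an nx1:

import torch
import torch.nn as nn

x = torch.randn(1, 64, 17, 17)

# Factor a 5x5 conv into two 3x3 convs (same 5x5 receptive field, fewer weights)
two_3x3 = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 3, padding=1),
)

# Asymmetric factorization: a 7x7 conv as 1x7 followed by 7x1
asym = nn.Sequential(
    nn.Conv2d(64, 64, (1, 7), padding=(0, 3)), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, (7, 1), padding=(3, 0)),
)

print(two_3x3(x).shape, asym(x).shape)  # both torch.Size([1, 64, 17, 17])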
Inception v4

C. Szegedy et al., Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv 2016

ResNet | Residual Block

The residual module:
▪ A 152-layer model for ImageNet.
▪ Introduces skip (shortcut) connections, which existed in various forms in earlier literature.
▪ Makes it easy for network layers to represent the identity mapping: the block computes y = F(x) + x, so the stacked layers only need to drive the residual F(x) toward zero to pass x through unchanged.
▪ The shortcut needs to skip at least two layers: with a single layer the block stays essentially linear, and the authors observed no benefit.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
ResNet | v1 & v2

[Figure: comparison of the v1 (original) and v2 (pre-activation) residual block designs.]
Let’s Code
ResNet
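
The lab's ResNet code is not in the export; here is a minimal PyTorch sketch of the v1 basic block (3x3 -> 3x3 with an identity shortcut). The class name and the channel/stride values in the demo are mine:

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # When the shape changes, project the shortcut with a 1x1 conv
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # y = F(x) + x

block = BasicBlock(64, 128, stride=2)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 128, 28, 28])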

Thank You
