
SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size

Presented by: Anowarul Kabir


Target and Motivation

What do we want?
1. Reduce the number of parameters and the model size.
2. Maintain the same accuracy level as AlexNet.
Benefits
1. Small models train faster and need less communication in distributed training.
2. Less overhead when exporting new models to clients.
3. Feasible FPGA and embedded deployment.

SqueezeNet Architectural Design Strategy

1. Replace most 3x3 filters with 1x1 filters.
A 1x1 filter has 9x fewer parameters than a 3x3 filter (see the sketch after this list).
2. Decrease the number of input channels to the remaining 3x3 filters,
via squeeze layers.
3. Downsample late in the network so that convolution layers have large activation
maps.
Large activation maps may lead to higher classification accuracy.
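To make strategy 1 concrete, here is a short Python sketch (my own illustration; the 256-in/256-out channel counts are assumed, not taken from the slides) that counts the weights of a single convolution layer:

def conv_params(k, in_ch, out_ch):
    # Weights of a k x k convolution layer, ignoring the bias term.
    return k * k * in_ch * out_ch

print(conv_params(3, 256, 256))  # 589,824 parameters with 3x3 filters
print(conv_params(1, 256, 256))  # 65,536 parameters with 1x1 filters: 9x fewer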

Fire Module

● A squeeze convolution layer (1x1 filters only).
● An expand layer (a mix of 1x1 and 3x3 filters).
● The number of squeeze filters should be less than the total number of expand
filters, so the squeeze layer limits the input channels fed to the 3x3 filters.
● Hyperparameters: the number of filters in the squeeze and expand layers
(a code sketch follows this list).
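A minimal sketch of the fire module in PyTorch (my choice of framework; the slides do not prescribe one). The filter counts in the usage lines correspond to fire2 of SqueezeNet:

import torch
import torch.nn as nn

class Fire(nn.Module):
    # A 1x1 squeeze layer followed by parallel 1x1 and 3x3 expand layers.
    def __init__(self, in_channels, squeeze, expand1x1, expand3x3):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, squeeze, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze, expand1x1, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze, expand3x3, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        # Concatenate the two expand branches along the channel axis.
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

fire2 = Fire(96, squeeze=16, expand1x1=64, expand3x3=64)
out = fire2(torch.randn(1, 96, 55, 55))  # output shape: (1, 128, 55, 55)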

[Figure: fire module with squeeze and expand layers]

SqueezeNet Architectural Dimensions

[Figure: per-layer dimensions of the SqueezeNet architecture]

Some Intuitions

Metaparameters that define the number of squeeze and expand filters:

● base_e: the number of expand filters in the first fire module.
● incr_e: after every freq fire modules, the number of expand filters is increased by incr_e.
● pct_3x3: the ratio of 3x3 filters in the expand layers.
● Squeeze ratio: SR = (number of squeeze filters) / (number of expand filters).
● The number of expand filters in fire module i: e_i = base_e + incr_e * floor(i / freq).
● The authors use base_e = 128, incr_e = 128, pct_3x3 = 0.5, freq = 2, SR = 0.125
(a worked example follows).
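A short Python sketch (my own illustration, using the metaparameter values above) that derives the filter counts of all eight fire modules, fire2 through fire9:

base_e, incr_e, pct_3x3, freq, SR = 128, 128, 0.5, 2, 0.125

for i in range(8):  # SqueezeNet has eight fire modules (fire2..fire9)
    e_i = base_e + incr_e * (i // freq)  # expand filters in module i
    e_3x3 = int(e_i * pct_3x3)           # 3x3 share of the expand filters
    e_1x1 = e_i - e_3x3                  # remaining 1x1 expand filters
    s_i = int(SR * e_i)                  # squeeze filters
    print(f"fire{i + 2}: squeeze={s_i}, expand1x1={e_1x1}, expand3x3={e_3x3}")

# fire2 prints squeeze=16, expand1x1=64, expand3x3=64, matching the fire module
# example above; fire9 prints squeeze=64, expand1x1=256, expand3x3=256.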

Results

[Figure: results comparing SqueezeNet to AlexNet in accuracy and model size]

Deep Compression

3 stages of weight compression:

Pruning: weights below a threshold are masked out, and the surviving weights are
stored in compressed sparse row (CSR) format instead of a dense matrix.
Quantization: k-means clustering computes centroids of the weights, and each weight
is replaced by its centroid (weight sharing). The centroids are then fine-tuned:
the gradients of the weights assigned to a centroid are accumulated and subtracted
from that centroid.
Huffman encoding: the biased distribution of the weight values is exploited by
encoding frequently occurring values with fewer bits and rare values with more bits.
A sketch of the first two stages follows.
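A minimal NumPy sketch (my own illustration; the threshold, cluster count, and
random weights are assumptions) of the pruning and k-means quantization stages.
CSR storage, Huffman coding, and gradient fine-tuning of centroids are omitted:

import numpy as np

def prune(weights, threshold):
    # Magnitude pruning: zero out weights whose absolute value is below threshold.
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def kmeans_quantize(weights, n_clusters=16, n_iters=20):
    # 1-D k-means over the surviving weights; each weight maps to a centroid index.
    w = weights[weights != 0]
    centroids = np.linspace(w.min(), w.max(), n_clusters)  # linear initialization
    for _ in range(n_iters):
        assign = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(assign == c):
                centroids[c] = w[assign == c].mean()
    return centroids, assign

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 64))
w_pruned, mask = prune(w, threshold=0.05)
centroids, assign = kmeans_quantize(w_pruned, n_clusters=16)
# Storage after these stages: a 4-bit index per surviving weight plus 16 centroids.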

Other Architectures with Bypass

[Figure: SqueezeNet variants with simple and complex bypass connections]
