
1/7/2019 machine learning - Intuitive understanding of 1D, 2D, and 3D Convolutions in Convolutional Neural Networks - Stack Overflow

Intuitive understanding of 1D, 2D, and 3D Convolutions in Convolutional Neural Networks

Can anyone please clearly explain the difference between 1D, 2D, and 3D convolutions in CNN (Deep Learning) with examples?

Tags: machine-learning, deep-learning, signal-processing, conv-neural-network, convolution

asked Mar 19 '17 at 6:20
edited Oct 12 '18 at 2:13 by kmario23

1 Answer

I want to explain this with a picture from C3D.

In a nutshell, the convolution direction and the output shape are what matter!


↑↑↑↑↑ 1D Convolutions - Basic ↑↑↑↑↑

just 1 direction (time axis) to calculate the conv

input = [W], filter = [k], output = [W]
ex) input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output =
output-shape is 1D array
example) graph smoothing
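To make the smoothing example concrete, here is a minimal pure-Python sketch. The helper name `conv1d_same` is hypothetical, and it assumes zero padding at both ends (so the output keeps length W) and an odd filter length. Like `tf.nn.conv1d`, it does not flip the filter.

```python
def conv1d_same(signal, kernel):
    """Slide `kernel` along `signal` in one direction, zero-padding the ends."""
    pad = len(kernel) // 2  # assumes an odd kernel length
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]

smoothing = [0.25, 0.5, 0.25]

flat = conv1d_same([1, 1, 1, 1, 1], smoothing)
spike = conv1d_same([0, 0, 4, 0, 0], smoothing)

print(flat)   # [0.75, 1.0, 1.0, 1.0, 0.75]
print(spike)  # [0.0, 1.0, 2.0, 1.0, 0.0]
```

The spike gets spread over its neighbours, which is the smoothing effect. The 0.75 at both ends of the flat signal comes only from the assumed zero padding.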

tf.nn.conv1d - Toy Example

import tensorflow as tf
import numpy as np

sess = tf.Session()

ones_1d = np.ones(5)
weight_1d = np.ones(3)
strides_1d = 1

in_1d = tf.constant(ones_1d, dtype=tf.float32)

filter_1d = tf.constant(weight_1d, dtype=tf.float32)

in_width = int(in_1d.shape[0])
filter_width = int(filter_1d.shape[0])

input_1d = tf.reshape(in_1d, [1, in_width, 1])

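kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
# padding='SAME' is an assumption (the source line is cut off); it matches output = [W]
output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))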


↑↑↑↑↑ 2D Convolutions - Basic ↑↑↑↑↑

2-direction (x,y) to calculate conv

output-shape is 2D Matrix
input = [W, H], filter = [k,k], output = [W,H]
example) Sobel Edge Filter
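As a rough illustration of the Sobel case, the sketch below slides a 3x3 kernel in two directions over a tiny image. The helper `conv2d_same` and the test image are made up for this example, and pixels outside the image are assumed to be zero so the output keeps the input shape.

```python
def conv2d_same(image, kernel):
    """Slide a k x k `kernel` over `image` in two directions (rows and columns)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:  # outside the image counts as zero
                        total += image[yy][xx] * kernel[dy][dx]
            out[y][x] = total
    return out

sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# 4 x 6 image: dark on the left, bright on the right (a vertical edge)
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

edges = conv2d_same(image, sobel_x)
for row in edges:
    print(row)
# [0, 0, 3, 3, 0, -3]
# [0, 0, 4, 4, 0, -4]
# [0, 0, 4, 4, 0, -4]
# [0, 0, 3, 3, 0, -3]
```

The response is large only around the dark-to-bright step. The negative values in the last column are a side effect of the assumed zero padding at the right border, not a real edge.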

tf.nn.conv2d - Toy Example

ones_2d = np.ones((5,5))
weight_2d = np.ones((3,3))
strides_2d = [1, 1, 1, 1]

in_2d = tf.constant(ones_2d, dtype=tf.float32)

filter_2d = tf.constant(weight_2d, dtype=tf.float32)

in_width = int(in_2d.shape[0])
in_height = int(in_2d.shape[1])

filter_width = int(filter_2d.shape[0])
filter_height = int(filter_2d.shape[1])

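input_2d = tf.reshape(in_2d, [1, in_height, in_width, 1])

kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])

# padding='SAME' is an assumption (the source line is cut off); it matches output = [W,H]
output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))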



↑↑↑↑↑ 3D Convolutions - Basic ↑↑↑↑↑

3-direction (x,y,z) to calculate conv

output-shape is 3D Volume
input = [W,H,L], filter = [k,k,d], output = [W,H,M]
d < L is important for making a volume output
example) C3D
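Why d < L matters can be checked with a small sketch. The helpers below (`conv3d_valid`, `ones`, `shape`) are hypothetical, and no padding is used here (unlike a 'SAME' setup) so that the shrinking of the depth axis is visible. Arrays are indexed as [depth][height][width].

```python
def conv3d_valid(volume, kernel):
    """Slide `kernel` over `volume` in three directions, without padding."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    d, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    return [[[sum(volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                  for dz in range(d) for dy in range(h) for dx in range(w))
              for x in range(W - w + 1)]
             for y in range(H - h + 1)]
            for z in range(D - d + 1)]

def ones(d, h, w):
    return [[[1] * w for _ in range(h)] for _ in range(d)]

def shape(a):
    return (len(a), len(a[0]), len(a[0][0]))

volume = ones(5, 5, 5)  # L = 5 along the depth axis

out_small = conv3d_valid(volume, ones(3, 3, 3))  # d = 3 < L
out_full = conv3d_valid(volume, ones(5, 3, 3))   # d = 5 == L

print(shape(out_small), out_small[0][0][0])  # (3, 3, 3) 27
print(shape(out_full), out_full[0][0][0])    # (1, 3, 3) 45
```

With d < L the filter has room to move along the depth axis, so the output is still a volume. With d = L there is only one depth position, and the output collapses to a single 2D slice.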

tf.nn.conv3d - Toy Example

ones_3d = np.ones((5,5,5))
weight_3d = np.ones((3,3,3))
strides_3d = [1, 1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)

filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_depth = int(in_3d.shape[2])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_depth = int(filter_3d.shape[2])

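input_3d = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])

kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])

# padding='SAME' is an assumption (the source line is cut off); it matches output = [W,H,M]
output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))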



↑↑↑↑↑ 2D Convolutions with 3D input - LeNet, VGG, ..., ↑↑↑↑↑


Even though the input is 3D, ex) 224x224x3, 112x112x32

output-shape is not a 3D Volume, but a 2D Matrix
because filter depth = L must match input channels = L
2-direction (x,y) to calculate conv! not 3D
input = [W,H,L], filter = [k,k,L], output = [W,H]
output-shape is 2D Matrix
what if we want to train N filters (N is the number of filters)?
then the output shape is (stacked 2D) 3D = 2D x N matrix.
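The points above can be sketched in plain Python. The function name `conv2d_multichannel` is hypothetical, and no padding is assumed, so a 5x5 input with a 3x3 window gives a 3x3 map. Each filter covers all L channels and yields one 2D map; N filters yield N stacked maps.

```python
def conv2d_multichannel(image, filters):
    """image: [H][W][L]; filters: list of N kernels, each [k][k][L]. No padding."""
    H, W, L = len(image), len(image[0]), len(image[0][0])
    k = len(filters[0])
    out = []
    for y in range(H - k + 1):
        row = []
        for x in range(W - k + 1):
            # one value per filter: sum over the k x k window AND over all L channels
            row.append([
                sum(image[y + dy][x + dx][c] * f[dy][dx][c]
                    for dy in range(k) for dx in range(k) for c in range(L))
                for f in filters
            ])
        out.append(row)
    return out  # [H'][W'][N]

L = 3
image = [[[1] * L for _ in range(5)] for _ in range(5)]       # 5 x 5 x 3
one_filter = [[[1] * L for _ in range(3)] for _ in range(3)]  # 3 x 3 x 3

single = conv2d_multichannel(image, [one_filter])
stacked = conv2d_multichannel(image, [one_filter] * 4)        # N = 4 filters

print(len(single), len(single[0]), len(single[0][0]))     # 3 3 1
print(len(stacked), len(stacked[0]), len(stacked[0][0]))  # 3 3 4
print(single[0][0])                                       # [27]
```

The filter never moves along the channel axis, which is why this still counts as a 2D convolution even though both input and filter are 3D.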

conv2d - LeNet, VGG, ... for 1 filter

in_channels = 32 # 3 for RGB, 32, 64, 128, ...

ones_3d = np.ones((5,5,in_channels)) # input is 3d, in
# filter must have 3d-shape with in_channels
weight_3d = np.ones((3,3,in_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)

filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])

input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])

# padding='SAME' is an assumption (the source line is cut off); it matches output = [W,H]
output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))


conv2d - LeNet, VGG, ... for N filters

in_channels = 32 # 3 for RGB, 32, 64, 128, ...

out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in
# filter must have 3d-shape x number of filters = 4D
weight_4d = np.ones((3,3,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)

filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

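input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])

kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

#output stacked shape is 3D = 2D x N matrix
# padding='SAME' is an assumption (the source line is cut off)
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')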

↑↑↑↑↑ Bonus 1x1 conv in CNN - GoogLeNet, ..., ↑↑↑↑↑

1x1 conv is confusing when you think of it as a 2D image filter like Sobel

for 1x1 conv in CNN, the input has a 3D shape, as above

it calculates depth-wise filtering (a weighted sum over the L channels at each pixel)
input = [W,H,L], filter = [1,1,L], output = [W,H]
with N filters, the output stacked shape is 3D = 2D x N matrix.
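One way to see what a 1x1 conv does: at every pixel it is just a small matrix multiply from L input channels to N output channels, with no spatial neighbourhood involved. The sketch below uses a hypothetical helper `conv1x1` and made-up numbers.

```python
def conv1x1(image, weights):
    """image: [H][W][L]; weights: [L][N]. Returns [H][W][N]."""
    n_out = len(weights[0])
    return [[[sum(pixel[c] * weights[c][n] for c in range(len(pixel)))
              for n in range(n_out)]
             for pixel in row]
            for row in image]

# 2 x 2 image with L = 3 channels per pixel
image = [[[1, 2, 3], [4, 5, 6]],
         [[7, 8, 9], [0, 0, 1]]]

# L = 3 input channels -> N = 2 output channels
weights = [[1, 0],
           [1, 0],
           [1, 2]]

out = conv1x1(image, weights)
print(out)  # [[[6, 6], [15, 12]], [[24, 18], [1, 2]]]
```

The spatial size stays 2x2 and only the channel count changes (3 to 2), which is why 1x1 convs are commonly used to shrink or expand the number of channels.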

tf.nn.conv2d - special case 1x1 conv

in_channels = 32 # 3 for RGB, 32, 64, 128, ...

out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in
# filter must have 3d-shape x number of filters = 4D; a 1x1 conv uses a 1x1 kernel
weight_4d = np.ones((1,1,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)

filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

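input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])

kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

#output stacked shape is 3D = 2D x N matrix
# padding='SAME' is an assumption (the source line is cut off)
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')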

Animation (2D Conv with 3D-inputs)


- Original Link : LINK

- The author: Martin Görner
- Twitter: @martin_gorner
- Google +:

Bonus 1D Convolutions with 2D input

↑↑↑↑↑ 1D Convolutions with 1D input ↑↑↑↑↑

↑↑↑↑↑ 1D Convolutions with 2D input ↑↑↑↑↑

Even though the input is 2D, ex) 20x14

output-shape is not 2D, but 1D Matrix
because filter height = L must match input height = L
1-direction (x) to calculate conv! not 2D
input = [W,L], filter = [k,L], output = [W]
output-shape is 1D Matrix
what if we want to train N filters (N is the number of filters)?
then the output shape is (stacked 1D) 2D = 1D x N matrix.
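The same idea in a small sketch, using the 20x14 input mentioned above. The helper `conv1d_multichannel`, the filter width k = 3, and N = 8 are assumptions chosen for illustration, and no padding is used, so the output length is W - k + 1.

```python
def conv1d_multichannel(signal, filters):
    """signal: [W][L]; filters: list of N kernels, each [k][L]. No padding."""
    W, L = len(signal), len(signal[0])
    k = len(filters[0])
    return [
        [sum(signal[x + dx][c] * f[dx][c] for dx in range(k) for c in range(L))
         for f in filters]
        for x in range(W - k + 1)
    ]  # [W'][N]

W, L, k = 20, 14, 3
signal = [[1] * L for _ in range(W)]  # 20 x 14 input
kernel = [[1] * L for _ in range(k)]  # 3 x 14 filter: spans the full height L

one = conv1d_multichannel(signal, [kernel])
many = conv1d_multichannel(signal, [kernel] * 8)  # N = 8 filters

print(len(one), len(one[0]))    # 18 1
print(len(many), len(many[0]))  # 18 8
print(one[0])                   # [42]
```

Because the filter already covers the whole height L, it can only slide along x, so each filter produces a 1D output and N filters produce a 1D x N stack.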

Bonus C3D

in_channels = 32 # 3, 32, 64, 128, ...

out_channels = 64 # 3, 32, 64, 128, ...
ones_4d = np.ones((5,5,5,in_channels))
weight_5d = np.ones((3,3,3,in_channels,out_channels))
strides_3d = [1, 1, 1, 1, 1]

in_4d = tf.constant(ones_4d, dtype=tf.float32)

filter_5d = tf.constant(weight_5d, dtype=tf.float32)

in_width = int(in_4d.shape[0])
in_height = int(in_4d.shape[1])
in_depth = int(in_4d.shape[2])
filter_width = int(filter_5d.shape[0])
filter_height = int(filter_5d.shape[1])
filter_depth = int(filter_5d.shape[2])

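input_4d = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])

kernel_5d = tf.reshape(filter_5d, [filter_depth, filter_height, filter_width,
                                   in_channels, out_channels])

# padding='SAME' is an assumption (the source line is cut off)
output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=strides_3d, padding='SAME')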



Input & Output in Tensorflow
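As a rough summary of the tensor shapes used in the toy examples, the sketch below builds the input, filter, and output shapes for 1D, 2D, and 3D convs. The helper names are hypothetical. It assumes TensorFlow's default channels-last layouts (batch first, channels last) and the output-size rules documented for 'SAME' and 'VALID' padding.

```python
import math

def conv_output_shape(spatial, kernel, strides, padding):
    """Spatial output size per axis under 'SAME' or 'VALID' padding."""
    out = []
    for size, k, s in zip(spatial, kernel, strides):
        if padding == "SAME":
            out.append(math.ceil(size / s))
        elif padding == "VALID":
            out.append(math.ceil((size - k + 1) / s))
        else:
            raise ValueError("padding must be 'SAME' or 'VALID'")
    return out

def conv_tensor_shapes(spatial, kernel, in_channels, out_channels,
                       strides=None, padding="SAME", batch=1):
    """Shapes of the input, filter, and output tensors (channels-last layout)."""
    strides = strides or [1] * len(spatial)
    return {
        "input": [batch, *spatial, in_channels],
        "filter": [*kernel, in_channels, out_channels],
        "output": [batch, *conv_output_shape(spatial, kernel, strides, padding),
                   out_channels],
    }

print(conv_tensor_shapes([5], [3], 1, 1))
# {'input': [1, 5, 1], 'filter': [3, 1, 1], 'output': [1, 5, 1]}
print(conv_tensor_shapes([5, 5], [3, 3], 32, 64))
# {'input': [1, 5, 5, 32], 'filter': [3, 3, 32, 64], 'output': [1, 5, 5, 64]}
print(conv_tensor_shapes([5, 5, 5], [3, 3, 3], 32, 64, padding="VALID"))
# {'input': [1, 5, 5, 5, 32], 'filter': [3, 3, 3, 32, 64], 'output': [1, 3, 3, 3, 64]}
```

In every case the filter's second-to-last dimension must equal the input's channel count, and the filter's last dimension becomes the output's channel count.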



edited Nov 23 '17 at 4:27

answered Jun 19 '17 at 10:22


5 Considering your labor and clarity in the explanations, upvotes of 8 are too less. – user3282777 Sep 19 '17 at 13:21

1 The 2d conv with 3d input is a nice touch. I would suggest an edit to include 1d conv with 2d input (e.g. a multi-channel array) and compare the difference thereof with a 2d conv with 2d input. – SumNeuron Nov 12 '17 at 18:24

Thank you for your comment. ^^ I updated! – runhani Nov 23 '17 at 4:28

1 Amazing answer! – Ben Jan 30 '18 at 13:49

Why is the conv direction in 2d ↲. I have seen sources that claim that the direction is → for row 1 , then → for row 1+stride . Convolution itself is shift invariant, so why does the direction of convolution matter? – Minh Triet Mar 19 '18 at