Differences Between 1D, 2D, 3D Conv
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
Can anyone please clearly explain the difference between 1D, 2D, and 3D convolutions in
convolutional neural networks (in deep learning) with the use of examples?
edited Mar 17, 2020 at 0:20 by nbro; asked Mar 19, 2017 at 6:20 by xlax
3 I’m voting to close this question because Machine learning (ML) theory questions are off-topic on Stack
Overflow - gift-wrap candidate for Cross-Validated – Daniel F Feb 10, 2021 at 13:52
4 Answers
https://stackoverflow.com/questions/42883547/intuitive-understanding-of-1d-2d-and-3d-convolutions-in-convolutional-neural-n 1/18
4/21/23, 11:20 AM machine learning - Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks - Stack Overflow
output shape is a 1D array

import tensorflow as tf
import numpy as np

sess = tf.Session()

ones_1d = np.ones(5)
weight_1d = np.ones(3)
strides_1d = 1

# wrap the NumPy arrays as tensors so the shapes below are defined
in_1d = tf.constant(ones_1d, dtype=tf.float32)
filter_1d = tf.constant(weight_1d, dtype=tf.float32)

in_width = int(in_1d.shape[0])
filter_width = int(filter_1d.shape[0])
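To make the shapes concrete, here is a quick pure-NumPy sketch of the same 1D "valid" convolution over a width-5 input with a width-3 kernel (the loop-based implementation is mine, not part of the original snippet):

```python
import numpy as np

ones_1d = np.ones(5)      # input signal, width 5
weight_1d = np.ones(3)    # kernel, width 3
stride = 1

# Slide the kernel along the one direction it can move:
# output width = (in_width - filter_width) // stride + 1
out_width = (len(ones_1d) - len(weight_1d)) // stride + 1
out_1d = np.array([
    np.sum(ones_1d[i:i + len(weight_1d)] * weight_1d)
    for i in range(0, out_width * stride, stride)
])
print(out_1d)  # each window sums three ones -> [3. 3. 3.]
```

Note that the output is a 1D array, regardless of how the windows are computed internally.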
output shape is a 2D matrix

ones_2d = np.ones((5,5))
weight_2d = np.ones((3,3))
strides_2d = [1, 1, 1, 1]

in_2d = tf.constant(ones_2d, dtype=tf.float32)
filter_2d = tf.constant(weight_2d, dtype=tf.float32)

in_width = int(in_2d.shape[0])
in_height = int(in_2d.shape[1])
filter_width = int(filter_2d.shape[0])
filter_height = int(filter_2d.shape[1])
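The same check in pure NumPy: a 2D "valid" convolution of a 5x5 all-ones input with a 3x3 all-ones kernel, where the kernel now moves in two directions (the nested-loop implementation is mine, for illustration only):

```python
import numpy as np

ones_2d = np.ones((5, 5))
weight_2d = np.ones((3, 3))

in_h, in_w = ones_2d.shape
f_h, f_w = weight_2d.shape
out_h, out_w = in_h - f_h + 1, in_w - f_w + 1  # 'valid' padding, stride 1

out_2d = np.zeros((out_h, out_w))
for i in range(out_h):          # kernel moves down ...
    for j in range(out_w):      # ... and across
        out_2d[i, j] = np.sum(ones_2d[i:i + f_h, j:j + f_w] * weight_2d)

print(out_2d.shape)  # (3, 3); every entry is 9.0 (a 3x3 window of ones)
```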
output shape is a 3D volume (example: C3D)

ones_3d = np.ones((5,5,5))
weight_3d = np.ones((3,3,3))
strides_3d = [1, 1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_depth = int(in_3d.shape[2])
filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_depth = int(filter_3d.shape[2])
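And once more in pure NumPy for the 3D case, where the kernel moves in three directions through a 5x5x5 volume (again, a hand-rolled sketch rather than the TF implementation):

```python
import numpy as np

ones_3d = np.ones((5, 5, 5))
weight_3d = np.ones((3, 3, 3))

# 'valid' output shape along every axis: in - filter + 1
out_shape = tuple(i - f + 1 for i, f in zip(ones_3d.shape, weight_3d.shape))
out_3d = np.zeros(out_shape)
for i in range(out_shape[0]):
    for j in range(out_shape[1]):
        for k in range(out_shape[2]):
            out_3d[i, j, k] = np.sum(ones_3d[i:i+3, j:j+3, k:k+3] * weight_3d)

print(out_3d.shape)  # (3, 3, 3); every entry is 27.0 (a 3x3x3 window of ones)
```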
output shape is a 2D matrix (2D convolution with a 3D input)

With a 3D input of shape (height, width, channels), a 2D convolution uses a filter that spans all input channels, so the kernel still moves in only 2 directions and the output per filter is a 2D matrix:

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])

With multiple output channels the filter tensor becomes 4D (filter_height, filter_width, in_channels, out_channels):

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])
A 1x1 convolution is confusing if you think of it as a 2D image filter like Sobel; it is better thought of as a per-pixel projection across channels.
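A 1x1 convolution does no spatial sliding at all: each pixel's channel vector is projected through the same small matrix. Here is a minimal NumPy sketch (the toy input and weights are my own, chosen only to show the shapes):

```python
import numpy as np

# A feature map with height 2, width 2, and 3 input channels
x = np.arange(12, dtype=np.float32).reshape(2, 2, 3)

# A 1x1 "filter" is just a matrix across channels: 3 in-channels -> 2 out-channels
w = np.array([[1., 0.],
              [1., 0.],
              [1., 1.]])  # shape (in_channels, out_channels)

# Each pixel's channel vector is projected independently; no neighbours involved
out = x @ w
print(out.shape)  # (2, 2, 2): same spatial size, new channel count
```

This is why 1x1 convolutions are used to change the channel count cheaply (e.g. in Inception-style bottlenecks) rather than to detect spatial patterns.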
Image credit: Martin Görner (Twitter: @martin_gorner, Google+: plus.google.com/+MartinGorne)
Bonus: C3D-style 3D convolution over a multi-channel input; the output shape is a 3D volume

in_4d = np.ones((5, 5, 5, 3))          # (width, height, depth, in_channels)
filter_5d = np.ones((3, 3, 3, 3, 2))   # last two axes: in_channels, out_channels

in_width = int(in_4d.shape[0])
in_height = int(in_4d.shape[1])
in_depth = int(in_4d.shape[2])
filter_width = int(filter_5d.shape[0])
filter_height = int(filter_5d.shape[1])
filter_depth = int(filter_5d.shape[2])
in_channels = int(in_4d.shape[3])
out_channels = int(filter_5d.shape[4])

input_4d = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])
kernel_5d = tf.reshape(filter_5d,
                       [filter_depth, filter_height, filter_width, in_channels, out_channels])

output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=[1, 1, 1, 1, 1], padding='SAME')

sess.close()
Summary
edited Aug 17, 2021 at 14:47 by rayryeng; answered Jun 19, 2017 at 10:22 by runhani
Considering your labor and the clarity of the explanations, 8 upvotes are far too few. – Ashok K Harnal Sep 19, 2017 at 13:21
The 2D conv with 3D input is a nice touch. I would suggest an edit to include 1D conv with 2D input (e.g. a multi-channel array) and compare the difference with a 2D conv with 2D input. – SumNeuron Nov 12, 2017 at 18:24
Why is the conv direction in 2D ↲? I have seen sources that claim that the direction is → for row 1, then → for row 1+stride. Convolution itself is shift-invariant, so why does the direction of convolution matter? – Minh Triet Mar 19, 2018 at 14:11
Thank you for your question. Yes, convolution itself is shift-invariant, so for the calculation the conv direction does not matter (you can compute a 2D conv as one big matrix multiplication; the Caffe framework already does this). But for understanding, it's better to explain with a conv direction, because a 2D conv with a 3D input is confusing without one. ^^ – runhani Mar 29, 2018 at 10:38
Following the answer from @runhani, I am adding a few more details to make the explanation a bit clearer, and will try to explain this a bit more (and of course with examples from TF1 and TF2).
Emphasis on applications
Usage of tf.Variable
1D Convolution
Here's how you might do 1D convolution using TF 1 and TF 2.
TF1 example

import tensorflow as tf
import numpy as np

# graph-mode sketch: define the input, kernel, and conv op, mirroring the TF2 example below
inp = tf.placeholder(tf.float32, [None, 5, 1])
kernel = tf.Variable(tf.glorot_uniform_initializer()([5, 1, 4]), dtype=tf.float32)
out = tf.nn.conv1d(inp, kernel, stride=1, padding='SAME')
TF2 Example
import tensorflow as tf
import numpy as np
inp = np.array([[[0],[1],[2],[3],[4]],[[5],[4],[3],[2],[1]]]).astype(np.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5, 1, 4]), dtype=tf.float32)
out = tf.nn.conv1d(inp, kernel, stride=1, padding='SAME')
print(out)
It's way less work with TF2, as TF2 needs no Session or variable initializers, for example.
So let's understand what this is doing using a signal-smoothing example. On the left you have the original, and on the right you have the output of a 1D convolution with 3 output channels.
Multiple channels are basically multiple feature representations of an input. In this example you
have three representations obtained by three different filters. The first channel is the equally-
weighted smoothing filter. The second is a filter that weights the middle of the filter more than
the boundaries. The final filter does the opposite of the second. So you can see how these
different filters bring about different effects.
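The three filters described above can be sketched with `np.convolve` on a toy signal. The exact weights here are my own guesses in the spirit of the description (equal weights, centre-heavy, and boundary-heavy), not the values from the lost figure:

```python
import numpy as np

signal = np.array([0., 0., 1., 1., 1., 0., 0.])  # a toy "square pulse"

equal  = np.array([1/3, 1/3, 1/3])  # equally-weighted smoothing filter
center = np.array([1/4, 1/2, 1/4])  # weights the middle more than the boundaries
edges  = np.array([1/2, 0.,  1/2])  # the opposite: weights the boundaries

for f in (equal, center, edges):
    # mode='same' keeps the output the same length as the input
    print(np.convolve(signal, f, mode='same'))
```

Running this shows how each filter reshapes the same pulse differently: the equal filter flattens it, the centre-heavy filter preserves the peak better, and the boundary-heavy filter smears it outward.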
1D convolution has been successfully used for the sentence classification task.
2D Convolution
Off to 2D convolution. If you are a deep learning person, the chances that you haven't come across 2D convolution are … well, about zero. It is used in CNNs for image classification, object detection, etc., as well as in NLP problems that involve images (e.g. image caption generation).
Let's try an example. Here the convolution kernel holds the following filters:

Image (black and white) - [batch_size, height, width, 1] (e.g. [1, 340, 371, 1])
Output (aka feature maps) - [batch_size, height, width, out_channels] (e.g. [1, 340, 371, 3])
TF1 Example,
import tensorflow as tf
import numpy as np
from PIL import Image

im = np.array(Image.open(<some image>).convert('L'))#/255.0
x = np.expand_dims(np.expand_dims(im, 0), -1).astype(np.float32)
kernel_init = np.array(
    [
     [[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]],
     [[[-1, 1.0/9, -1]],[[8, 1.0/9, 5]],[[-1, 1.0/9, -1]]],
     [[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]]
    ])
kernel = tf.Variable(kernel_init, dtype=tf.float32)
out = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(out)
TF2 Example

import tensorflow as tf
import numpy as np
from PIL import Image

im = np.array(Image.open(<some image>).convert('L'))#/255.0
x = np.expand_dims(np.expand_dims(im, 0), -1).astype(np.float32)
kernel_init = np.array(
    [
     [[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]],
     [[[-1, 1.0/9, -1]],[[8, 1.0/9, 5]],[[-1, 1.0/9, -1]]],
     [[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]]
    ])
kernel = tf.Variable(kernel_init, dtype=tf.float32)
out = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME')
Here you can see the output produced by the above code. The first image is the original and, going clockwise, you have the outputs of the 1st, 2nd, and 3rd filters.
In the context of 2D convolution, it is much easier to understand what these multiple channels mean. Say you are doing face recognition. You can think of each filter as representing an eye, mouth, nose, etc. (this is a very unrealistic simplification, but it gets the point across), so that each feature map would be a binary representation of whether that feature is present in the image you provided. I don't think I need to stress that for a face recognition model those are very valuable features.
More information in this article.
CNNs (convolutional neural networks) use the 2D convolution operation for almost all computer vision tasks (e.g. image classification, object detection, video classification).
3D Convolution
Now it becomes increasingly difficult to illustrate what's going on as the number of dimensions increases. But with a good understanding of how 1D and 2D convolution work, it's very straightforward to generalize that understanding to 3D convolution. So here goes.
3D data (LIDAR) - [batch_size, height, width, depth, in_channels] (e.g. [1, 200, 200, 200, 1])
Output - [batch_size, height, width, depth, out_channels] (e.g. [1, 200, 200, 200, 3])
TF1 Example

import tensorflow as tf
import numpy as np

tf.reset_default_graph()
# graph-mode sketch mirroring the TF2 example below
x = tf.placeholder(tf.float32, [None, 200, 200, 200, 1])
kernel = tf.Variable(tf.glorot_uniform_initializer()([5, 5, 5, 1, 3]), dtype=tf.float32)
out = tf.nn.conv3d(x, kernel, strides=[1, 1, 1, 1, 1], padding='SAME')
TF2 Example
import tensorflow as tf
import numpy as np
x = np.random.normal(size=(1,200,200,200,1)).astype(np.float32)  # cast to match the kernel dtype
kernel = tf.Variable(tf.initializers.glorot_uniform()([5,5,5,1,3]), dtype=tf.float32)
out = tf.nn.conv3d(x, kernel, strides=[1,1,1,1,1], padding='SAME')
3D convolution has been used when developing machine learning applications involving LIDAR (Light Detection and Ranging) data, which is three-dimensional in nature.
Understanding stride

If you stride across a corridor, you get there faster, in fewer steps. But it also means that you observe less of your surroundings than if you had walked the whole way. Let's now reinforce our understanding with a pretty picture too, via 2D convolution.
When you use tf.nn.conv2d, for example, you need to set the stride as a vector of 4 elements. There's no reason to be intimidated by this. It just contains the strides in the following order.

2D Convolution - [batch stride, height stride, width stride, channel stride]. Here, you just set batch stride and channel stride to one (I've been implementing deep learning models for 5 years and never had to set them to anything except one). So that leaves you with only 2 strides to set.
3D Convolution - [batch stride, height stride, width stride, depth stride, channel stride]. Here you worry about the height/width/depth strides only.
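The corridor analogy can be stated as a one-line formula. This is a small sketch of my own (the helper function is not from TF) showing how a larger stride gives fewer output positions along one spatial axis, under 'VALID'-style (no-padding) convolution:

```python
def conv_out_len(in_len, filter_len, stride):
    # 'VALID' (no padding): floor((in_len - filter_len) / stride) + 1 positions
    return (in_len - filter_len) // stride + 1

# Walking a width-7 "corridor" with a width-3 kernel:
print(conv_out_len(7, 3, 1))  # 5 steps when striding by 1
print(conv_out_len(7, 3, 2))  # 3 steps when striding by 2: faster, but coarser
```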
Understanding padding
Now, you notice that no matter how small your stride is (i.e. 1), there is an unavoidable dimension reduction happening during convolution (e.g. the width is 3 after convolving a 4-unit-wide image). This is undesirable, especially when building deep convolutional neural networks. This is where padding comes to the rescue. The two most commonly used padding types are SAME and VALID.
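The difference between the two padding types is easiest to see on output sizes. A small sketch of my own (helper names are mine), using the 4-unit-wide example from above with a width-2 filter:

```python
def valid_out(in_len, f, stride=1):
    # VALID: no padding, the output shrinks
    return (in_len - f) // stride + 1

def same_out(in_len, f, stride=1):
    # SAME: zero-padding is added so output length = ceil(in_len / stride)
    return -(-in_len // stride)

print(valid_out(4, 2))  # 3: width shrinks from 4 to 3, as described above
print(same_out(4, 2))   # 4: padding preserves the width
```

With stride 1, SAME padding keeps the spatial size constant through every layer, which is why it is the default choice in deep CNN stacks.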
Final word: If you are very curious, you might be wondering: we just made a big deal of automatic dimension reduction, and now we are talking about different strides that reduce dimensions on purpose. But the best thing about stride is that you control when, where, and how the dimensions get reduced.
In summary:

In a 1D CNN, the kernel moves in 1 direction. The input and output data of a 1D CNN are 2-dimensional. Mostly used on time-series data.

In a 2D CNN, the kernel moves in 2 directions. The input and output data of a 2D CNN are 3-dimensional. Mostly used on image data.

In a 3D CNN, the kernel moves in 3 directions. The input and output data of a 3D CNN are 4-dimensional. Mostly used on 3D image data (MRI, CT scans).
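This summary can be checked against array shapes directly. A small NumPy sketch (the concrete sizes are arbitrary, mine) showing that per-sample data dimensionality is always "spatial directions the kernel moves in, plus one channel axis":

```python
import numpy as np

# One batched, channelled input for each case, channels-last convention:
x_1d = np.zeros((8, 100, 1))         # (batch, steps, channels): 2D data per sample
x_2d = np.zeros((8, 32, 32, 1))      # (batch, h, w, channels):  3D data per sample
x_3d = np.zeros((8, 16, 16, 16, 1))  # (batch, h, w, d, channels): 4D data per sample

for x, moves in ((x_1d, 1), (x_2d, 2), (x_3d, 3)):
    # per-sample ndim (excluding batch) = kernel directions + 1 channel axis
    assert x.ndim - 1 == moves + 1
    print(x.shape, '-> kernel moves in', moves, 'direction(s)')
```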
Maybe important to mention that oftentimes in CNN architectures intermediate layers will have 2D outputs even if the input is only 1D to begin with. – dmedine Feb 16, 2021 at 5:53
1. CNN 1D, 2D, or 3D refers to the convolution direction, rather than the input or filter dimension.
2. For a 1-channel input, CNN2D equals CNN1D if kernel length = input length (one conv direction).
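Point 2 above can be demonstrated in NumPy: when the kernel's height equals the input's height, there is no room for the kernel to move vertically, so the 2D convolution degenerates to one conv direction (the concrete sizes here are my own example):

```python
import numpy as np

x = np.arange(20, dtype=float).reshape(4, 5)  # 1-channel 2D input: height 4, width 5
k = np.ones((4, 3))                           # kernel height = input height

out_h = x.shape[0] - k.shape[0] + 1           # = 1: no vertical movement possible
out_w = x.shape[1] - k.shape[1] + 1           # = 3: kernel slides horizontally only
out = np.array([[np.sum(x[i:i+4, j:j+3] * k) for j in range(out_w)]
                for i in range(out_h)])
print(out.shape)  # (1, 3): effectively a 1D convolution along the width
```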
edited Jul 17, 2019 at 20:54 by Jon; answered Jul 15, 2019 at 15:58 by Jerry Liu