Convolutions: Transposed and Deconvolution

Mars Xiang
Published in The Startup · 5 min read · Jul 18, 2020

Convolutional Neural Networks

Traditional convolutional neural networks (CNNs) are frequently used in computer vision problems, where a computer must work with an image and extract features and patterns. They gained a lot of popularity after AlexNet won the ImageNet challenge in 2012, and CNNs went on to dominate facial recognition and object detection.

CNNs are also used in video processing, although this is a bit trickier because videos have a temporal dimension. A CNN is sometimes combined with a sequence model to handle this.

Architecture

The goal of a CNN is to transform the input image into concise abstract representations of the original input. Each convolutional layer tries to find more complex patterns in the previous layer's observations. The logic is that, for example, 10 curved lines would form two ellipses, which would make an eye.

To do this, each layer uses a kernel, usually a 2x2 or 3x3 matrix, that slides across the previous layer's output to generate a new output. The word convolve, from which convolution derives, means to roll or slide. At each position, the dot product between the kernel and the part of the input it covers is stored in the output.

In this example, a 3x3 kernel slides over a 4x4 input to give a 2x2 output. All images are from user vdumoulin on GitHub.

Figure: A convolutional layer with the input on the bottom and the output on the top

Upsampling and Downsampling

A downsampling convolutional neural network attempts to compress the input, while an upsampling one tries to expand it. Convolutional neural networks are downsampling by nature, as convolution leaves the output with fewer rows and columns than the input.

To control the extent to which the input is compressed or expanded, additional upsampling and downsampling techniques are applied.
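To make the shrinking concrete, here is a minimal NumPy sketch of the sliding dot product described above: a 3x3 kernel moving over a 4x4 input with a stride of 1 and no padding. The function name `conv2d_valid` and the input and kernel values are illustrative assumptions, not something taken from the original figures.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Slide `kernel` over `x` (stride 1, no padding) and store the
    dot product of the kernel with each covered patch of the input."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1   # rows lost: kernel height - 1
    out_w = x.shape[1] - kw + 1   # columns lost: kernel width - 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical data: a 4x4 input holding 0..15 and an all-ones 3x3 kernel.
x = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3))

y = conv2d_valid(x, kernel)
print(y.shape)  # (2, 2)
print(y)
```

The loop is written for readability rather than speed; deep learning libraries compute the same thing with far more efficient routines.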
The most common of these techniques are padding, strides, and dilations.

• To increase output dimensions, padding is usually used. The edges of the input are filled with 0's, which do not affect the dot product but give the kernel more space to slide.

Figure: Padding

• Strides control how many units the kernel slides at a time. A high stride value can be used to further compress the output. The stride is usually set to 1 by default.

Figure: Strides (2,2)

• Dilations can be used to control the output size, but their main purpose is to expand the range of what a kernel can see, so that it captures larger patterns. In a dilation, the edge pieces of the kernel are pushed further away from the center piece.

Figure: Dilation of 2

Transposed Convolution

The transposed convolutional layer, unlike the convolutional layer, is upsampling in nature. Transposed convolutions are usually used in autoencoders and GANs, or more generally in any network that must reconstruct an image.

The word transpose means to cause two or more things to switch places with each other, and in the context of convolutional neural networks, this causes the input and the output dimensions to switch. In a transposed convolution, instead of the input being larger than the output, the output is larger. An easy way to think of it is to picture the input being padded until the kernel, placed at a corner, can just barely reach the corner of the input.

Figure: Transposed Convolution

Downsampling and Upsampling... In Reverse

When downsampling and upsampling techniques are applied to transposed convolutional layers, their effects are reversed. This lets a network use convolutional layers to compress an image, then use transposed convolutional layers with the exact same downsampling and upsampling settings to reconstruct it.

• When padding is "added" to the transposed convolutional layer, it is as if padding is removed from the input, and the resulting output becomes smaller.
Figure: Without padding, the output is 7x7, but with padding on both sides, it is 5x5.

• When strides are used, they affect the input instead of the output: the input values are spaced apart, which enlarges the output.

Figure: Strides (2,2) increases the output dimension from 3x3 to 5x5.

Transposed Convolution vs Deconvolution

Deconvolution is a term that floats around alongside transposed convolution, and the two are often confused with each other. Many sources use the two interchangeably, and while deconvolutions do exist, they are not very popular in the field of machine learning.

A deconvolution is a mathematical operation that reverses the effect of convolution. Imagine passing an input through a convolutional layer and collecting the output. Now pass that output through the deconvolutional layer, and you get back the exact same input. It is the inverse of the multivariate convolutional function.

On the other hand, a transposed convolutional layer only reconstructs the spatial dimensions of the input. In theory, this is fine in deep learning, as the layer can learn its own parameters through gradient descent. However, it does not give back the same values as the original input.

Summary

• A convolutional layer extracts features from its input and downsamples it.

• Upsampling and downsampling techniques such as padding, strides, and dilation control the size of the output.

• A transposed convolutional layer attempts to reconstruct the spatial dimensions of the convolutional layer's input and reverses the downsampling and upsampling techniques applied to it.

• A deconvolution is a mathematical operation that reverses a convolutional layer.
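The size arithmetic and the shape-only reconstruction discussed in this article can be checked with a short NumPy sketch. The helper names (`conv_out_size`, `transposed_out_size`, `conv2d`, `transposed_conv2d`) and the sample data are hypothetical. The size formulas follow the convention commonly documented by deep learning libraries, with no output padding, and are assumed here to match the figures.

```python
import numpy as np

def conv_out_size(n, k, stride=1, pad=0, dilation=1):
    """Output length along one axis of an ordinary convolution."""
    return (n + 2 * pad - dilation * (k - 1) - 1) // stride + 1

def transposed_out_size(n, k, stride=1, pad=0, dilation=1):
    """Output length along one axis of a transposed convolution."""
    return (n - 1) * stride - 2 * pad + dilation * (k - 1) + 1

def conv2d(x, kernel):
    """Ordinary convolution: stride 1, no padding."""
    kh, kw = kernel.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def transposed_conv2d(y, kernel):
    """Transposed convolution (stride 1, no padding): each input value
    scatters a scaled copy of the kernel into the larger output."""
    kh, kw = kernel.shape
    out = np.zeros((y.shape[0] + kh - 1, y.shape[1] + kw - 1))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            out[i:i + kh, j:j + kw] += y[i, j] * kernel
    return out

# Sizes quoted in the article, for a 3x3 kernel.
print(conv_out_size(4, 3))                         # 4x4 input -> 2
print(transposed_out_size(5, 3))                   # no padding -> 7
print(transposed_out_size(5, 3, pad=1))            # padding 1 -> 5
print(transposed_out_size(3, 3, stride=2, pad=1))  # strides (2,2) -> 5

# Hypothetical data: the shape comes back, the values do not.
x = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3))
x_hat = transposed_conv2d(conv2d(x, kernel), kernel)
print(x_hat.shape == x.shape)   # True
print(np.allclose(x_hat, x))    # False
```

The kernel here is fixed rather than learned, so the mismatch in values is expected. In a trained network the transposed layer learns its own kernel, but it still is not a true inverse of the convolution.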