ML Report-Image Segmentation
Submitted by
HIMANSHU-(ENG17CS0093)
KARAN SINGH-(ENG17CS0105)
KASHVI SHAH-(ENG17CS0106)
MIHIR TULI-(ENG17CS0127)
MOHIT BHAGWANANI-(ENG17CS0130)
DAYANANDA SAGAR UNIVERSITY
School of Engineering, Kudlu Gate, Bangalore-560068
CERTIFICATE
Date:
Signature of the faculty in-charge
Signature of Chairman
Department of Computer Science & Engineering
DECLARATION
HIMANSHU - (ENG17CS0093)
KARAN SINGH - (ENG17CS0105)
KASHVI SHAH - (ENG17CS0106)
MIHIR TULI - (ENG17CS0127)
MOHIT BHAGWANANI - (ENG17CS0130)
TABLE OF CONTENTS
Contents
Problem Statement
Introduction
Objective
Literature Survey
Methodology
Requirements
Results
Conclusion
References
ABSTRACT
Image segmentation is the task of identifying regions in an image and assigning each of them to a
class. Automatic image segmentation is a major and currently active research area, and new
models for better segmentation in computer vision are proposed frequently. Computer vision
makes human life easier by automating tasks that people used to do manually. In this survey we
compare various image segmentation techniques and explain the merits and demerits of each
technique in detail. Each methodology is analysed on the basis of several parameters, which
provide a comparison among the methods discussed in our work. Our focus is on techniques that
can be optimized and improved beyond the existing ones. This survey emphasizes the importance
of the applications of image segmentation techniques and aims to make them more useful in
daily life. It will enable us to take full advantage of this technology in monitoring
time-consuming, repetitive activities, since doing such tasks manually is cumbersome and
increases the possibility of errors.
1. Problem Statement
The goal of this project is to create a model that performs image segmentation on an input image
using a Convolutional Neural Network (CNN). Beyond building a model that segments any image
passed to it, the major goal of the proposed system is to understand Convolutional Neural
Networks and apply them to image segmentation.
2. Introduction
Machine learning is one of the most valuable skills of this digital age. To understand how a
machine learns to classify, we dissect the process by which it obtains the inputs, or raw
materials, needed for learning the specifics of the desired task. Features or attributes form the
basis of what we feed into the learning algorithm. Machine learning plays a vital role in image
processing and object identification, and many techniques are available for such tasks.
Image segmentation is a key topic in image processing and computer vision with applications
such as
● scene understanding
● robotic perception
● video surveillance
● augmented reality
● image compression
We will discuss various methods that can be used to achieve such tasks. In this world of
digitization, images play a very important role in various areas of life, including scientific
computing and visual persuasion tasks. Image segmentation assigns a class label to every pixel of
an image, a task that is also called dense prediction. Suppose an image contains various objects
such as cars, trees, signals and animals. Image segmentation will then classify all the trees as a
single class, and all the animals and signals into their respective classes. One important point is
that semantic segmentation treats two objects of the same type as a single class; objects of the
same type can be differentiated using instance segmentation. CNNs are used very frequently for
segmenting images in pattern recognition and object identification.
We have successfully implemented this project and were able to identify the algorithm best
suited to image segmentation.
3. Objective
● Instance segmentation identifies each instance of each object in an image. It differs from
semantic segmentation in that it doesn’t categorize every pixel.
● For example, if there are three cars in an image, semantic segmentation assigns all the
cars to one class, while instance segmentation identifies each individual car.
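The difference between the two outputs can be illustrated with a small sketch. It assumes a hypothetical binary "car" mask of the kind a semantic segmentation model might produce, and separates it into instances with plain connected-component labeling. This is only an illustration of the output format: real instance segmentation models are learned, and connected components would merge cars that touch each other. The function name and the toy mask are assumptions made for this example.

```python
from collections import deque

def label_instances(mask):
    """Label 4-connected regions of 1s in a binary mask with ids 1, 2, 3, ..."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1 and labels[r][c] == 0:
                # Found an unlabeled object pixel: start a new instance.
                next_id += 1
                labels[r][c] = next_id
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
    return labels, next_id

# Semantic output: every car pixel carries the same class value (1).
semantic_mask = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 0, 0],
]

# Instance output: each separate car receives its own id.
instance_labels, num_cars = label_instances(semantic_mask)
print(num_cars)
for row in instance_labels:
    print(row)
```

In the semantic mask all three cars share the value 1, while in the labeled result each car has a distinct id.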
4. Literature survey
1. S. Ji, W. Xu, M. Yang and K. Yu, "3D Convolutional Neural Networks for Human
Action Recognition," in IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 35, no. 1, pp. 221-231, Jan. 2013. doi: 10.1109/TPAMI.2012.59
“We consider the automated recognition of human actions in surveillance videos. Most current
methods build classifiers based on complex handcrafted features computed from the raw inputs.
Convolutional neural networks (CNNs) are a type of deep model that can act directly on the raw
inputs. However, such models are currently limited to handling 2D inputs. In this paper, we
develop a novel 3D CNN model for action recognition. This model extracts features from both
the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the
motion information encoded in multiple adjacent frames. The developed model generates
multiple channels of information from the input frames, and the final feature representation
combines information from all channels. To further boost the performance, we propose
regularizing the outputs with high-level features and combining the predictions of a variety of
different models. We apply the developed models to recognize human actions in the real-world
environment of airport surveillance videos, and they achieve superior performance in
comparison to baseline methods.”
Motivated by the above paper, we use the U-Net architecture to predict the class of each pixel.
U-Net is a Fully Convolutional Network (FCN) modified so that it yields better segmentation in
medical imaging. Compared to FCN-8, the two main differences are that (1) U-Net is symmetric and
(2) the skip connections between the downsampling path and the upsampling path apply a
concatenation operator instead of a sum. These skip connections provide local information to the
global information during upsampling. Because of its symmetry, the network has a large number
of feature maps in the upsampling path, which allows it to transfer information. By comparison,
the basic FCN architecture has only as many feature maps as there are classes in its upsampling
path.
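The difference between a concatenating and a summing skip connection can be shown on toy data. The sketch below uses plain nested lists indexed as [row][column][channel] rather than tensors, and the function names are hypothetical; it only illustrates how the channel count changes, not how either network is implemented.

```python
def concat_skip(upsampled, skip):
    """U-Net style: join the channel vectors of each pixel (channel counts add up)."""
    return [[u_px + s_px for u_px, s_px in zip(u_row, s_row)]
            for u_row, s_row in zip(upsampled, skip)]

def sum_skip(upsampled, skip):
    """FCN style: add channel vectors element-wise (channel count unchanged)."""
    return [[[u + s for u, s in zip(u_px, s_px)]
             for u_px, s_px in zip(u_row, s_row)]
            for u_row, s_row in zip(upsampled, skip)]

# Two 1x2 feature maps with 2 channels each.
upsampled = [[[1.0, 2.0], [3.0, 4.0]]]
skip = [[[10.0, 20.0], [30.0, 40.0]]]

concatenated = concat_skip(upsampled, skip)
summed = sum_skip(upsampled, skip)

print(len(concatenated[0][0]), concatenated[0][0])  # channels after concatenation
print(len(summed[0][0]), summed[0][0])              # channels after summation
```

Concatenation keeps the encoder features separate from the decoder features, so the following layer can learn how to combine them, at the cost of a wider feature map.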
2. Ahmed Bassiouny, Motaz El-Saban, "Semantic segmentation as image representation
for scene recognition", 2014 IEEE International Conference on Image Processing
(ICIP).
“We introduce a novel approach towards scene recognition using semantic segmentation maps
as image representation. Given a set of images and a list of possible categories for each image,
our goal is to assign a category from that list to each image. Our approach is based on
representing an image by its semantic segmentation map, which is a mapping from each pixel to
a predefined set of labels. Among similar high-level approaches, ours has the capability of not
only representing what semantic labels the scene contains, but also their shapes, sizes and
locations. We also investigate the effect of varying experiment parameters, including varying
labels used, semantic segmentation technique, and semantic training source.”
From the above paper, we adopt semantic segmentation, since our topic is image segmentation,
whose goal is to label each pixel of an image with the class of what is being represented.
Because we are predicting a class for every pixel in the image, this task is commonly referred to
as dense prediction.
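Dense prediction can be sketched as follows, assuming the model outputs one score per class for every pixel. The scores below are made-up values for a 2x2 image with three classes, and `predict_mask` is a hypothetical helper, not part of any library.

```python
def predict_mask(scores):
    """Return the index of the highest-scoring class for every pixel."""
    return [[max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
            for row in scores]

# scores[row][column] holds one score per class (3 classes here).
scores = [
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
    [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]],
]

mask = predict_mask(scores)
print(mask)
```

The resulting mask has the same height and width as the input, with exactly one class label per pixel.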
5. METHODOLOGY
5.1. Architecture
The model being used here is a modified U-Net. A U-Net consists of an encoder (downsampler)
and a decoder (upsampler). In order to learn robust features and reduce the number of trainable
parameters, a pretrained model can be used as the encoder. Thus, the encoder for this task is a
pretrained MobileNetV2 model, whose intermediate outputs are used as skip connections.
● For the decoder (upsampler) we use multiple transposed convolutional layers, each with batch
normalization, optional dropout and a ReLU activation function.
1. Import libraries
2. Download datasets
3. Normalize and augment the images in both datasets
4. Set batch size = 64, buffer size = 1000, and the training size
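Steps 3 and 4 can be sketched as below. The report does not state which normalization or augmentation was used, so scaling pixel values to [0, 1] and a horizontal flip are assumptions, as are the function names and the example training size of 1000. Plain lists are used to keep the sketch free of dependencies; in a training pipeline the flip would be drawn at random per example.

```python
BATCH_SIZE = 64     # from step 4
BUFFER_SIZE = 1000  # from step 4 (size of the shuffle buffer)

def normalize(image):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0] (assumed scheme)."""
    return [[value / 255.0 for value in row] for row in image]

def augment(image, mask, flip):
    """Mirror image and mask together so labels stay aligned with pixels."""
    if flip:
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    return image, mask

def steps_per_epoch(train_size, batch_size=BATCH_SIZE):
    """Number of full batches in one pass over the training set."""
    return train_size // batch_size

image = [[0, 255], [51, 102]]  # toy 2x2 grayscale image
mask = [[0, 1], [1, 1]]        # toy per-pixel labels

flipped_image, flipped_mask = augment(normalize(image), mask, flip=True)
print(flipped_image)
print(flipped_mask)
print(steps_per_epoch(1000))   # hypothetical training size
```

Only the image is normalized; the mask holds class labels and must receive the same geometric transformation as the image but no value scaling.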
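->UP SAMPLING
import tensorflow as tf

def upsample(filters, size, norm_type='batchnorm', apply_dropout=False):
    # The signature, initializer and Conv2DTranspose call were missing from the
    # extracted text and are reconstructed here as assumptions.
    initializer = tf.random_normal_initializer(0., 0.02)
    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2DTranspose(filters, size, strides=2, padding='same',
                                        kernel_initializer=initializer, use_bias=False))
    if norm_type.lower() == 'batchnorm':
        result.add(tf.keras.layers.BatchNormalization())
    elif norm_type.lower() == 'instancenorm':
        result.add(InstanceNormalization())  # custom layer, assumed defined elsewhere
    if apply_dropout:
        result.add(tf.keras.layers.Dropout(0.5))
    result.add(tf.keras.layers.ReLU())
    return result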
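->DOWN SAMPLING
def downsample(filters, size, norm_type='batchnorm', apply_norm=True):
    # Requires `import tensorflow as tf`. The signature, initializer and Conv2D call
    # were missing from the extracted text and are reconstructed here as assumptions.
    initializer = tf.random_normal_initializer(0., 0.02)
    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                               kernel_initializer=initializer, use_bias=False))
    if apply_norm:
        if norm_type.lower() == 'batchnorm':
            result.add(tf.keras.layers.BatchNormalization())
        elif norm_type.lower() == 'instancenorm':
            result.add(InstanceNormalization())  # custom layer, assumed defined elsewhere
    result.add(tf.keras.layers.LeakyReLU())
    return result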
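The effect of the stride-2 blocks on spatial size can be worked out with simple arithmetic. The sketch below assumes `padding='same'`, for which a strided convolution produces ceil(size / stride) outputs and a strided transposed convolution produces size * stride. The 128-pixel input and the four levels are hypothetical values chosen for illustration.

```python
import math

def conv_same_output(size, stride=2):
    """Spatial size after a strided convolution with padding='same'."""
    return math.ceil(size / stride)

def transposed_conv_same_output(size, stride=2):
    """Spatial size after a strided transposed convolution with padding='same'."""
    return size * stride

size = 128  # hypothetical input height/width

# Encoder: each downsampling block halves the resolution.
down_sizes = []
for _ in range(4):
    size = conv_same_output(size)
    down_sizes.append(size)

# Decoder: each upsampling block doubles it again.
up_sizes = []
for _ in range(4):
    size = transposed_conv_same_output(size)
    up_sizes.append(size)

print(down_sizes)
print(up_sizes)
```

Because the decoder sizes mirror the encoder sizes, each upsampled feature map has the same height and width as one encoder output, which is what makes concatenating skip connections possible.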
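->UNET
def unet_model(output_channels):
    # down_stack (encoder) and up_stack (list of upsample blocks) are assumed to be defined.
    # The input size is an assumption; it is not stated in the extracted text.
    inputs = tf.keras.layers.Input(shape=[128, 128, 3])
    x = inputs
    # Downsampling through the encoder; keep intermediate outputs for the skip connections.
    skips = down_stack(x)
    x = skips[-1]
    skips = reversed(skips[:-1])
    # Upsampling, concatenating the matching skip connection at each level.
    for up, skip in zip(up_stack, skips):
        x = up(x)
        concat = tf.keras.layers.Concatenate()
        x = concat([x, skip])
    # Last layer: one output channel per class (arguments after strides are assumed).
    last = tf.keras.layers.Conv2DTranspose(
        output_channels, 3, strides=2,
        padding='same')
    x = last(x)
    return tf.keras.Model(inputs=inputs, outputs=x)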
6. SOFTWARE AND HARDWARE REQUIREMENTS
● TensorFlow, version 2.0.0 (Keras uses TensorFlow as its backend and for some image
preprocessing)
7. RESULTS
Semantic segmentation classifies all the pixels of an image into meaningful classes of objects.
These classes are “semantically interpretable” and correspond to real-world categories. For
instance, you could isolate all the pixels associated with a dog and color them blue.
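The dog example can be sketched as follows, assuming a predicted mask that stores one class id per pixel. The class id, the toy image and the helper name are hypothetical values chosen for illustration.

```python
BLUE = (0, 0, 255)  # RGB
DOG = 1             # hypothetical class id for "dog"

def highlight_class(image, mask, class_id, color=BLUE):
    """Return a copy of image with every pixel of class_id painted in color."""
    return [[color if label == class_id else pixel
             for pixel, label in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]

# Toy 2x2 RGB image and its per-pixel class mask.
image = [[(10, 10, 10), (200, 180, 160)],
         [(90, 80, 70), (20, 20, 20)]]
mask = [[0, 1],
        [1, 0]]

highlighted = highlight_class(image, mask, DOG)
for row in highlighted:
    print(row)
```

Pixels labeled with the dog class are replaced by blue, while all other pixels keep their original color.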
8. CONCLUSION
In this work we have found that the CNN is one of the most powerful tools for image
segmentation. A detailed analysis of CNNs is also given here, explaining the different layers and
the working of each layer. We have explained the advantages of CNNs and the fields where they
can be used in daily life. CNN technology is now being implemented widely, making human life
more convenient and less manual. A lot of work has already been done in various fields such as
commutation, medical tasks, crop monitoring, road transportation, activity detection and
product quality monitoring. All these fields have seen great improvement after adopting these
techniques, so the work done in them represents the state of the art.
FUTURE WORK
There is always room for improvement, innovation or change of existing techniques in any
research field. So, despite the availability of so much quality research in this field, a lot of
work remains to be done in making automatic monitoring systems more accurate and reliable.
There is more scope in handling uncertainties where images are of bad quality or the boundary
pixels of the segmented objects overlap. The accuracy of such systems needs to be increased to
the extent that they can be relied upon for crucial tasks such as monitoring unidentified
activities in restricted areas such as country borders or ministerial offices, where the slightest
inaccuracy may prove disastrous. Various models need to be implemented by combining two or
more models into one system, so that it enables a robot or any other automatic system to do
tasks more effectively and accurately. For example, a system can be designed that identifies
objects and monitors the activities in its surroundings at the same time. This will increase the
applicability of such techniques and will make manual tasks far more automatic than they have
ever been.
9. REFERENCES:
● [1] S. Ji, W. Xu, M. Yang and K. Yu, "3D Convolutional Neural Networks for Human
Action Recognition," in IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 35, no. 1, pp. 221-231, Jan. 2013. doi: 10.1109/TPAMI.2012.59.
● [2] Ahmed Bassiouny, Motaz El-Saban, "Semantic segmentation as image
representation for scene recognition", 2014 IEEE International Conference on Image
Processing (ICIP).