
DAYANANDA SAGAR UNIVERSITY

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


SCHOOL OF ENGINEERING
DAYANANDA SAGAR UNIVERSITY
KUDLU GATE
BANGALORE - 560068

MINI PROJECT REPORT


ON
“IMAGE SEGMENTATION”

SUBMITTED TO THE 7th SEMESTER MACHINE LEARNING LABORATORY
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE & ENGINEERING

Submitted by

HIMANSHU-(ENG17CS0093)
KARAN SINGH-(ENG17CS0105)
KASHVI SHAH-(ENG17CS0106)
MIHIR TULI-(ENG17CS0127)
MOHIT BHAGWANANI-(ENG17CS0130)

Under the supervision of


Prof. Reeja S R, Associate Professor

DAYANANDA SAGAR UNIVERSITY
School of Engineering, Kudlu Gate, Bangalore-560068

CERTIFICATE

This is to certify that Mr. Himanshu Choudhary, Mr. Karan Singh,
Ms. Kashvi Shah, Mr. Mihir Tuli and Mr. Mohit Bhagwanani, bearing
USN ENG17CS0093, ENG17CS0105, ENG17CS0106, ENG17CS0127 and
ENG17CS0130, have satisfactorily completed their Mini Project as
prescribed by the University for the 7th semester B.Tech.
programme in Computer Science & Engineering during the year
2020-21 at the School of Engineering, Dayananda Sagar
University, Bangalore.

Date:
Signature of the faculty in-charge

Max Marks Marks Obtained

Signature of Chairman
Department of Computer Science & Engineering

DECLARATION

We hereby declare that the work presented in this mini
project entitled “IMAGE SEGMENTATION” has been
carried out by us and that it has not been submitted for the
award of any degree, diploma or mini project of any
other college or university.

HIMANSHU - (ENG17CS0093)
KARAN SINGH - (ENG17CS0105)
KASHVI SHAH - (ENG17CS0106)
MIHIR TULI - (ENG17CS0127)
MOHIT BHAGWANANI - (ENG17CS0130)

TABLE OF CONTENTS

Contents
Problem Statement
Introduction
Objective
Literature Survey
Methodology
Requirements
Results
Conclusion
References

ABSTRACT

Image segmentation is the task of identifying regions in an image and assigning each region to a
class. Automatic image segmentation is a major and active research area, and new models that
improve segmentation for computer vision tasks appear frequently. Computer vision makes human
life easier by automating tasks that were previously done manually. In this survey we compare
various image segmentation techniques and explain the merits and demerits of each technique in
detail. Each methodology is analyzed on the basis of several parameters, which are used to
compare the different methods discussed in our work. Our focus is on techniques that can be
optimized and improved beyond the existing ones. This survey emphasizes the importance of the
applications of image segmentation techniques and of making them more useful in daily life.
This will enable us to take full advantage of the technology for monitoring time-consuming,
repetitive activities, since doing such tasks manually is cumbersome and increases the
possibility of errors.

1. Problem Statement

The goal of this project is to create a model that performs image segmentation on an input image
using Convolutional Neural Networks (CNNs). Beyond building a model that segments any image
passed to it, the major goal of the proposed system is to understand Convolutional Neural
Networks and apply them to an image segmentation system.

2. Introduction

2.1 Why is your topic important?

Machine learning is one of the most valuable skills of this digital age. To understand it, we
dissect how a machine learns to classify and how it obtains the inputs, or raw materials, needed
to learn the specifics of the desired task. Features or attributes form the basis of what we feed
into the learning algorithm. Machine learning plays a vital role in image processing and object
identification, and many techniques are available for such tasks.

2.2 Where is it used? Applications

Image segmentation is a key topic in image processing and computer vision with applications
such as

● scene understanding

● medical image analysis

● robotic perception

● video surveillance

● augmented reality

● image compression

2.3 What will you talk about?

We will discuss various methods that can be used to achieve such tasks. In this world of
digitization, images play a very important role in various areas of life, including scientific
computing and visual persuasion tasks. Image segmentation assigns a class label to every pixel of
an image; this pixel labeling task is also called dense prediction. Suppose an image contains
various objects such as cars, trees, signals and animals. Image segmentation will then classify
all the trees as a single class, and all animals and signals into their respective classes. One
important point is that semantic image segmentation treats two objects of the same type as a
single class. Objects of the same type can be differentiated using instance segmentation. CNNs
are used very frequently for segmenting images in pattern recognition and object identification.

2.4 Overview of the rest of the report

Image segmentation with a CNN involves feeding an image as input to a convolutional neural
network, which labels the pixels. A convolutional layer does not look at the whole image at
once. It scans the image with a small “filter” that covers only a few pixels at a time, and moves
this filter across the image until it has mapped the entire image.
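
The scanning behaviour described above can be illustrated with a minimal sketch in plain Python. This is not the implementation used in the project (which relies on TensorFlow layers); the function name `convolve2d`, the toy image and the 2x2 edge filter are assumptions chosen only for illustration. As in CNN libraries, the filter is applied without flipping it.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Weighted sum of the small patch currently under the filter.
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 filter that responds to a dark-to-bright step from left to right.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, edge_kernel)
for row in feature_map:
    print(row)
```

The feature map is largest where the filter sits on the edge and zero over flat regions, which is the kind of local evidence a CNN combines over many filters and layers.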

We have successfully implemented this project and were able to identify the best-suited
algorithm for image segmentation.

The results are as follows:

➢ The accuracy is around 93%.
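
The report does not state how the accuracy figure is computed. Assuming it refers to pixel-wise accuracy (the fraction of pixels whose predicted class matches the ground-truth mask), a minimal sketch of the metric looks as follows. The function name and the toy masks are hypothetical and are not the project's data.

```python
def pixel_accuracy(predicted, truth):
    """Fraction of pixels whose predicted class equals the ground-truth class."""
    correct = 0
    total = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            correct += int(p == t)
            total += 1
    return correct / total


# Toy 4x4 masks with three classes (0, 1, 2); two pixels are predicted wrongly.
truth = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 1],
]
predicted = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 1, 1],
]

print(pixel_accuracy(predicted, truth))
```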

3. Objective

● To achieve image segmentation using Convolutional Neural Networks. The objective is to
simplify or change the image into a representation that is more meaningful and easier to
analyze. Image segmentation is typically used to locate objects and boundaries (lines,
curves, etc.) in images. Within the segmentation process itself, there are two levels of
granularity:

● Semantic segmentation: understands an image at the pixel level by assigning a class to
every pixel.

● Instance segmentation: identifies each instance of each object in an image. It differs from
semantic segmentation in that it doesn’t categorize every pixel.

● For example, if there are three cars in an image, semantic segmentation labels all the
cars as a single class, while instance segmentation identifies each individual car.
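
The three-car example can be made concrete with a small sketch. It starts from a semantic mask in which every car pixel carries the same class label, and then separates the cars by labelling connected regions. Connected-component labelling is only a simple stand-in used here to show the difference in output; it is not how CNN-based instance segmentation works, and it would merge cars that touch. All names and the toy mask are assumptions.

```python
from collections import deque


def label_instances(semantic_mask, target_class):
    """Give each 4-connected region of `target_class` its own instance id (1, 2, ...)."""
    rows, cols = len(semantic_mask), len(semantic_mask[0])
    instance_ids = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if semantic_mask[r][c] != target_class or instance_ids[r][c] != 0:
                continue
            # Found an unlabelled pixel of the class: flood-fill its region.
            next_id += 1
            instance_ids[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and semantic_mask[ny][nx] == target_class
                            and instance_ids[ny][nx] == 0):
                        instance_ids[ny][nx] = next_id
                        queue.append((ny, nx))
    return instance_ids, next_id


CAR = 1  # hypothetical class id; 0 is background

# Semantic mask: all car pixels share the label 1, so the cars are indistinguishable.
semantic_mask = [
    [1, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
]

instance_ids, car_count = label_instances(semantic_mask, CAR)
print(car_count)
for row in instance_ids:
    print(row)
```

In the semantic mask there is one "car" class; in the instance map each car has its own id.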

4. Literature survey

1. S. Ji, W. Xu, M. Yang and K. Yu, "3D Convolutional Neural Networks for Human
Action Recognition," in IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 35, no. 1, pp. 221-231, Jan. 2013. doi: 10.1109/TPAMI.2012.59

“We consider the automated recognition of human actions in surveillance videos. Most current
methods build classifiers based on complex handcrafted features computed from the raw inputs.
Convolutional neural networks (CNNs) are a type of deep model that can act directly on the raw
inputs. However, such models are currently limited to handling 2D inputs. In this paper, we
develop a novel 3D CNN model for action recognition. This model extracts features from both
the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the
motion information encoded in multiple adjacent frames. The developed model generates
multiple channels of information from the input frames, and the final feature representation
combines information from all channels. To further boost the performance, we propose
regularizing the outputs with high-level features and combining the predictions of a variety of
different models. We apply the developed models to recognize human actions in the real-world
environment of airport surveillance videos, and they achieve superior performance in
comparison to baseline methods.”

Building on the CNN approach of the above paper, we use the U-Net architecture to predict the
class of each pixel. U-Net is a Fully Convolutional Network (FCN) modified so that it yields
better segmentation in medical imaging. Compared to FCN-8, the two main differences are (1)
U-Net is symmetric and (2) the skip connections between the downsampling path and the upsampling
path apply a concatenation operator instead of a sum. These skip connections are intended to add
local information to the global information during upsampling. Because of its symmetry, the
network has a large number of feature maps in the upsampling path, which allows it to transfer
information. By comparison, the basic FCN architecture has only as many feature maps as there
are classes in its upsampling path.
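
The difference between the two kinds of skip connection can be seen from the number of channels they produce. The sketch below is a simplified illustration in plain Python, where a feature map is a list of channels and each channel is a small 2D grid; the function names and values are assumptions and do not come from either architecture's code.

```python
def concat_skip(decoder_maps, encoder_maps):
    """U-Net style skip: stack encoder channels next to decoder channels."""
    return decoder_maps + encoder_maps


def sum_skip(decoder_maps, encoder_maps):
    """FCN style skip: add matching channels element by element."""
    return [
        [[d + e for d, e in zip(d_row, e_row)] for d_row, e_row in zip(d_ch, e_ch)]
        for d_ch, e_ch in zip(decoder_maps, encoder_maps)
    ]


# Two channels of size 2x2 coming from the upsampling path.
decoder_maps = [
    [[1, 2], [3, 4]],
    [[0, 0], [0, 0]],
]
# Two channels of the same size coming from the downsampling path.
encoder_maps = [
    [[10, 20], [30, 40]],
    [[5, 5], [5, 5]],
]

merged_concat = concat_skip(decoder_maps, encoder_maps)
merged_sum = sum_skip(decoder_maps, encoder_maps)

print(len(merged_concat))  # channels after concatenation
print(len(merged_sum))     # channels after summation
```

Concatenation keeps the encoder features as separate channels, so the following layer can learn how to combine them, whereas a sum mixes the two sources before the next layer sees them.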

2. Ahmed Bassiouny, Motaz El-Saban, "Semantic segmentation as image representation
for scene recognition", 2014 IEEE International Conference on Image Processing
(ICIP).

“We introduce a novel approach towards scene recognition using semantic segmentation maps
as image representation. Given a set of images and a list of possible categories for each image,
our goal is to assign a category from that list to each image. Our approach is based on
representing an image by its semantic segmentation map, which is a mapping from each pixel to
a predefined set of labels. Among similar high-level approaches, ours has the capability of not
only representing what semantic labels the scene contains, but also their shapes, sizes and
locations. We also investigate the effect of varying experiment parameters, including varying
labels used, semantic segmentation technique, and semantic training source.”

From the above paper, we adopt semantic segmentation, since our topic is image segmentation,
whose goal is to label each pixel of an image with the class of what it represents. Because we
are predicting a class for every pixel in the image, this task is commonly referred to as dense
prediction.
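
Dense prediction can be sketched as follows: the network outputs one score per class for every pixel, and the predicted mask takes the highest-scoring class at each pixel. The scores below are made-up values for a 2x2 image with three classes, and `predict_mask` is a hypothetical helper written in plain Python rather than the project's TensorFlow code.

```python
def predict_mask(scores):
    """scores[r][c] holds one score per class; return the winning class index per pixel."""
    return [
        [max(range(len(pixel_scores)), key=pixel_scores.__getitem__) for pixel_scores in row]
        for row in scores
    ]


# Made-up per-pixel class scores for a 2x2 image and 3 classes.
scores = [
    [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]],
    [[0.1, 0.1, 0.8], [0.3, 0.3, 0.4]],
]

mask = predict_mask(scores)
for row in mask:
    print(row)
```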

5. METHODOLOGY

5.1. Architecture

● MobileNetV2 (U-Net architecture encoder)

The model used here is a modified U-Net. A U-Net consists of an encoder (downsampler) and a
decoder (upsampler). In order to learn robust features and reduce the number of trainable
parameters, a pretrained model can be used as the encoder. Thus, the encoder for this task is a
pretrained MobileNetV2 model, whose intermediate outputs are used as skip connections, and the
decoder is the upsampler described below.

● For the upsampler we use multiple transposed convolutional layers, each followed by
batch normalization, optional dropout and a ReLU activation function.

5.2. Architecture Diagram

5.3. Pseudo Code

1. Import libraries

2. Download datasets

3. Normalize and augment the images in both datasets

a. For each image in the training images:

Apply normalization (image / 255)

if tf.random.uniform(()) > 0.5: flip the image and its mask left to right

b. For each image in the test images:

Apply normalization (image / 255)

4. Set batch size = 64, buffer size = 1000, and the training set size

5. Define the model

-> Downsample (MobileNetV2)

-> Upsample (Conv2DTranspose)

-> U-Net model (Downsample + Upsample)

6. Train the model

7. Plot the training and validation loss

8. Predict with the trained model
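
Step 3 of the pseudo code can be sketched in plain Python, with nested lists standing in for image tensors. This is an illustration under assumptions, not the project's TensorFlow pipeline: the function names are hypothetical, and the random generator is passed in explicitly so that the behaviour is reproducible. The flip is applied to the image and its mask together so that each pixel keeps its label.

```python
import random


def normalize(image):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [[pixel / 255 for pixel in row] for row in image]


def flip_left_right(grid):
    """Mirror a 2D grid horizontally."""
    return [list(reversed(row)) for row in grid]


def load_image_train(image, mask, rng):
    """Training path: random horizontal flip (image and mask together), then normalize."""
    if rng.random() > 0.5:
        image = flip_left_right(image)
        mask = flip_left_right(mask)
    return normalize(image), mask


def load_image_test(image, mask):
    """Test path: normalization only, no augmentation."""
    return normalize(image), mask


# Toy 2x2 grayscale image and its segmentation mask.
image = [[0, 255], [51, 102]]
mask = [[1, 2], [3, 1]]

rng = random.Random(42)  # seeded only to make the example repeatable
train_image, train_mask = load_image_train(image, mask, rng)
test_image, test_mask = load_image_test(image, mask)
print(train_image, train_mask)
print(test_image, test_mask)
```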

5.4. Code snippets

->UP SAMPLING

import tensorflow as tf

def upsample(filters, size, norm_type='batchnorm', apply_dropout=False):
    # Conv2DTranspose (stride 2) => normalization => optional Dropout => ReLU
    initializer = tf.random_normal_initializer(0., 0.02)

    result = tf.keras.Sequential()
    result.add(tf.keras.layers.Conv2DTranspose(filters, size, strides=2, padding='same',
                                               kernel_initializer=initializer, use_bias=False))

    if norm_type.lower() == 'batchnorm':
        result.add(tf.keras.layers.BatchNormalization())
    elif norm_type.lower() == 'instancenorm':
        # InstanceNormalization is not part of tf.keras.layers in TensorFlow 2.0;
        # it must be defined or imported separately.
        result.add(InstanceNormalization())

    if apply_dropout:
        result.add(tf.keras.layers.Dropout(0.5))

    result.add(tf.keras.layers.ReLU())
    return result

->DOWN SAMPLING

# Requires: import tensorflow as tf
def downsample(filters, size, norm_type='batchnorm', apply_norm=True):
    # Conv2D (stride 2) => optional normalization => LeakyReLU
    initializer = tf.random_normal_initializer(0., 0.02)

    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                               kernel_initializer=initializer, use_bias=False))

    if apply_norm:
        if norm_type.lower() == 'batchnorm':
            result.add(tf.keras.layers.BatchNormalization())
        elif norm_type.lower() == 'instancenorm':
            # InstanceNormalization must be defined or imported separately.
            result.add(InstanceNormalization())

    result.add(tf.keras.layers.LeakyReLU())
    return result

->UNET

# Requires: import tensorflow as tf
# down_stack (the MobileNetV2 encoder returning its intermediate outputs) and
# up_stack (a list of upsample blocks) must be defined before calling this function.
def unet_model(output_channels):
    inputs = tf.keras.layers.Input(shape=[128, 128, 3])
    x = inputs

    # Downsampling through the model
    skips = down_stack(x)
    x = skips[-1]                 # deepest feature map (bottleneck)
    skips = reversed(skips[:-1])  # remaining outputs, deepest first

    # Upsampling and establishing the skip connections
    for up, skip in zip(up_stack, skips):
        x = up(x)
        concat = tf.keras.layers.Concatenate()
        x = concat([x, skip])

    # This is the last layer of the model
    last = tf.keras.layers.Conv2DTranspose(
        output_channels, 3, strides=2,
        padding='same')  # 64x64 -> 128x128

    x = last(x)

    return tf.keras.Model(inputs=inputs, outputs=x)
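
To check that the skip connections in such a model line up, the spatial sizes can be traced by hand. The sketch below does this in plain Python under two assumptions that the report does not state explicitly: every encoder stage halves the resolution, and there are five such stages. With 'same' padding and stride 2, a convolution produces ceil(size / 2) outputs and a transposed convolution produces size * 2.

```python
import math


def conv_same_output(size, stride=2):
    """Output size of a strided convolution with 'same' padding."""
    return math.ceil(size / stride)


def transposed_conv_same_output(size, stride=2):
    """Output size of a strided transposed convolution with 'same' padding."""
    return size * stride


def trace_unet_sizes(input_size, num_down):
    """Return (encoder sizes, decoder sizes, final output size) for a symmetric U-Net."""
    encoder_sizes = []
    size = input_size
    for _ in range(num_down):
        size = conv_same_output(size)
        encoder_sizes.append(size)

    # The last encoder output is the bottleneck; the others are skips, deepest first.
    skips = list(reversed(encoder_sizes[:-1]))

    decoder_sizes = []
    size = encoder_sizes[-1]
    for skip in skips:
        size = transposed_conv_same_output(size)
        # Concatenation only works if both feature maps have the same height and width.
        if size != skip:
            raise ValueError(f"decoder size {size} does not match skip size {skip}")
        decoder_sizes.append(size)

    final_size = transposed_conv_same_output(size)
    return encoder_sizes, decoder_sizes, final_size


# 128x128 input as in unet_model; five downsampling stages is an assumption.
encoder_sizes, decoder_sizes, final_size = trace_unet_sizes(128, 5)
print(encoder_sizes)
print(decoder_sizes)
print(final_size)
```

Under these assumptions the last transposed convolution maps 64x64 back to 128x128, matching the comment on the final layer of `unet_model`.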

6. SOFTWARE AND HARDWARE REQUIREMENTS

6.1. Programming Language and Compiler

● Python (version 3.7.4)

● IDE: Jupyter

● NumPy (version 1.16.5)

● OpenCV (cv2) (version 3.4.2)

● Keras (version 2.3.1)

● TensorFlow (version 2.0.0); Keras uses TensorFlow as its backend and for some image
preprocessing

● Matplotlib (version 3.1.1)

● Pandas (version 0.25.1)

7. RESULTS

Semantic segmentation classifies all the pixels of an image into meaningful classes of objects.
These classes are “semantically interpretable” and correspond to real-world categories. For
instance, you could isolate all the pixels associated with a dog and color them blue.

8. CONCLUSION

In this work we have found that the CNN is one of the most powerful tools for image
segmentation. We have also analyzed the CNN in detail, explaining its different layers and how
each layer works, and we have described its advantages and the fields in which CNNs can be used
in daily life. CNN technology is being adopted rapidly to make human life more convenient and
less manual. A lot of work has already been done in various fields such as commutation, medical
tasks, crop monitoring, road transportation, activity detection and product quality monitoring.
All these fields have seen great improvement after adopting these techniques, which are now
considered state of the art.

FUTURE WORK

There is always room for improvement, innovation or change of existing techniques in any
research field. Despite the availability of so much quality research in this field, a lot of
work remains to be done to make automatic monitoring systems more accurate and reliable. There
is more scope in handling uncertainties where images are of bad quality or where the boundary
pixels of the segmented objects overlap. The accuracy of such systems needs to be increased to
the extent that they can be relied upon for crucial tasks such as monitoring unidentified
activities in restricted areas like country borders or ministerial offices, where the slightest
inaccuracy may prove disastrous. Systems that combine two or more models into one need to be
implemented, so that a robot or any other automatic system can do tasks more effectively and
accurately. For example, a system can be designed that identifies objects and monitors the
activities in its surroundings at the same time. This will increase the applicability of such
techniques and will make manual tasks far more automated than they have ever been.

9. REFERENCES:

● [1] S. Ji, W. Xu, M. Yang and K. Yu, "3D Convolutional Neural Networks for Human
Action Recognition," in IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 35, no. 1, pp. 221-231, Jan. 2013. doi: 10.1109/TPAMI.2012.59.
● [2] Ahmed Bassiouny, Motaz El-Saban, "Semantic segmentation as image
representation for scene recognition", 2014 IEEE International Conference on Image
Processing (ICIP).
