
Design and implementation of a deep learning-based image segmentation algorithm

Jacky Zhu

181404

Yubo Xuan

PR2761 Technical Thesis

Computing Systems Engineering Technology Program

College of the North Atlantic


Design and implementation of a deep learning-based image

segmentation algorithm

Prepared by

Jacky Zhu

181404

PR2761 Technical Thesis

Computing Systems Engineering Technology Program

Jilin University---Lambton College

March 24, 2021

Prepared for

Program Committee

Computing Systems Engineering Technology Program

College of the North Atlantic


Letter of Transmittal

Gui Gu Street 452 JULC 130012 Changchun, Jilin, China

March 24, 2021


Program Committee
College of the North Atlantic
156 Bridge Road
Engineering Technology Centre
P. O. Box 1150
St. John’s, NL Canada A1C6L8

Dear Program Committee:

Here is my report, Design and implementation of a deep learning-based image segmentation algorithm, which you requested for June 25, 2021.

In this report, I will complete the design and implementation of an image
segmentation model based on Mask R-CNN (FFM). The report is divided into three parts.
The first part covers the development of computer vision. The second part covers the
Mask R-CNN (FFM) model, including its origins and its properties. In the last part, I
will design my own algorithm for the Mask R-CNN (FFM) model to identify and track
objects.
I will devote the most attention to the last part, which means I will keep training my
model to improve its recognition accuracy.

Signature (Handwriting)
Signature (Typing) Jacky
Design and implementation of a deep learning-based image

segmentation algorithm

1.0 Introduction

1.1 Purpose

The aim of this report is to understand the current state of research in multi-target
recognition, detection and segmentation in scenes using deep learning models, and to
complete the design and implementation of image segmentation based on the Mask R-CNN
(FFM) model.

1.2 Background

In recent years, deep learning techniques have been widely applied to image
segmentation. Image segmentation is an important part of image processing and machine
vision for image understanding, and it is an important branch of AI. Semantic
segmentation classifies each pixel in an image to determine its category, such as
person or car, and thereby the region each category occupies. Instance segmentation
additionally separates individual objects of the same category, for example labelling
5 cars with 5 different colors. A real scene is often a complex landscape of multiple
overlapping objects and different backgrounds, so we need not only to classify these
objects but also to determine the boundaries, differences and relationships between
them. Image segmentation is currently used in scenarios such as autonomous driving and
drone landing point determination.

In traditional image processing methods, there are three key steps. The first is image
segmentation, which extracts the parts of interest from the image so that the output
is fully prepared for image analysis and subsequent recognition. This is why image
segmentation is a crucial preprocessing step for image recognition and computer
vision. There is no correct recognition without correct segmentation.

In humans, observing something begins with the eye capturing an image, which is
transmitted through the central nervous system to the cerebral cortex. The target
information is eventually recognized through analysis in the brain. Image processing
simulates and analyzes the characteristics of human vision to implement visual
functions in a machine, so that a computer can analyze, detect, classify, track,
segment, recognize and measure an image or video.

In computer vision, the convolutional neural network is one of the most widely used
and best performing deep learning models. It is designed to simulate a biological
visual hierarchy. The convolutional layers extract target features through convolution
operations, learning them layer by layer from low level to high level and from simple
to complex, until the final feature representation of the target is obtained. In
addition, convolutional neural networks use local perception (each neuron sees only a
small patch of the input) and weight sharing (the same kernel is applied at every
position) to reduce the number of parameters in a deep network and, with it, the
computational load. These structural properties and this performance have made
convolutional neural networks a leading choice for computer vision tasks today.
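The effect of local perception and weight sharing on the parameter count can be checked with simple arithmetic. The sketch below compares a fully connected layer with a convolutional layer on the same input; the input size (224 x 224 x 3), the 3 x 3 kernel and the 64 output units or channels are hypothetical values chosen only for illustration and do not come from this report.

```python
def fully_connected_params(in_h, in_w, in_c, out_units):
    """Weights plus biases when every output unit sees every input value."""
    return in_h * in_w * in_c * out_units + out_units


def conv_params(kernel_h, kernel_w, in_c, out_c):
    """Weights plus biases of a convolutional layer.

    Each output channel has one small kernel (local perception) that is
    reused at every spatial position (weight sharing), so the count does
    not depend on the image size.
    """
    return kernel_h * kernel_w * in_c * out_c + out_c


# Hypothetical sizes, for illustration only.
fc = fully_connected_params(224, 224, 3, 64)
conv = conv_params(3, 3, 3, 64)
print("fully connected:", fc)
print("convolutional:  ", conv)
print("ratio:          ", fc // conv)
```

Under these assumed sizes the convolutional layer needs a few thousand times fewer parameters than the fully connected one, which is the reduction the paragraph above refers to.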

1.3 Scope
The research work in this paper focuses on the design of image segmentation based on
the Mask R-CNN (FFM) deep learning model, for the recognition, detection and
segmentation of multiple targets in multiple scenes.

1.4 Methodology

1.4.1 Convolutional Neural Networks

Convolutional neural networks (CNNs) are the basic building blocks for image
segmentation. Three main types of layer make up the CNN architecture.

Convolutional layer: This layer abstracts the input image into a feature map by
applying filters (kernels).

Pooling layer: This layer downsamples feature maps by summarizing the presence of
features in patches of the feature map.

Fully connected layer: This layer connects every neuron in one layer to every neuron
in the next layer.
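The three layer types above can be sketched in plain Python on a toy input. This is a minimal illustration, not the model used in this report: the 5 x 5 image, the 2 x 2 edge kernel and the fully connected weights are all made-up values, and a real network would learn its kernels and weights during training.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 sliding-window product of kernel and image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def fully_connected(inputs, weights, biases):
    """Each output neuron is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Toy image: dark on the left, bright on the right (hypothetical data).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds where intensity rises from left to right

features = conv2d(image, kernel)                   # 4 x 4 feature map
pooled = max_pool(features, size=2)                # 2 x 2 after downsampling
flat = [v for row in pooled for v in row]          # flatten for the dense layer
scores = fully_connected(flat,
                         weights=[[0.5, 0.5, 0.5, 0.5], [-1, 0, 0, 1]],
                         biases=[0, 1])
print(features)
print(pooled)
print(scores)
```

The feature map is non-zero only in the column where the edge lies, pooling keeps that response while halving the resolution, and the fully connected layer mixes all pooled values into each output score.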

1.4.2 Mask R-CNN

Mask R-CNN is based on Faster R-CNN. A simple fully convolutional network (FCN) that
outputs an object mask is added as a third branch alongside the original two branches
(classification and bounding-box coordinate regression). The RoIPooling of Faster
R-CNN is also replaced by RoIAlign.
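How the mask branch and the classification branch might be combined for one region of interest can be sketched as follows. This report does not describe the step; the sketch assumes the behaviour described for the original Mask R-CNN, where the mask branch outputs one small mask per class, the mask of the predicted class is kept, a per-pixel sigmoid is applied and the result is binarized at 0.5. The function name, the 2 x 2 mask size and all numbers are hypothetical, and the resizing of the mask to the box size is omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def select_instance_mask(class_scores, mask_logits, threshold=0.5):
    """Combine two branch outputs for a single region of interest.

    class_scores: K scores from the classification branch.
    mask_logits:  K masks (one per class) of raw logits from the mask branch.
    Returns the predicted class index and its binarized mask.
    """
    predicted_class = max(range(len(class_scores)), key=class_scores.__getitem__)
    logits = mask_logits[predicted_class]
    mask = [[1 if sigmoid(v) >= threshold else 0 for v in row] for row in logits]
    return predicted_class, mask


# Hypothetical outputs for one region, three classes, 2 x 2 masks.
class_scores = [0.1, 0.7, 0.2]
mask_logits = [
    [[-2.0, -2.0], [-2.0, -2.0]],
    [[2.0, -1.0], [0.5, -3.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
predicted_class, mask = select_instance_mask(class_scores, mask_logits)
print(predicted_class, mask)
```

Keeping one mask per class means the mask branch only has to decide foreground versus background for each pixel, while the choice of category is left to the classification branch.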

1.4.3 RoIAlign

The problem with Faster R-CNN is that the feature map is not aligned with the original
image, which affects detection accuracy. Instead of RoI pooling, Mask R-CNN proposes
RoIAlign, which preserves spatial locations more precisely.

In Faster R-CNN, there are two quantization steps, in which fractional coordinates are
rounded to integers. In the first, the (x, y, w, h) of the region proposal, which is
usually fractional, is rounded to integers for ease of operation. In the second, the
rounded region is divided equally into k * k cells, and the boundary of each cell is
rounded again.

After these two roundings, the candidate box has already deviated from its initial
regression position, and this deviation affects the accuracy of detection or
segmentation. To solve this problem, RoIAlign eliminates the rounding and keeps the
fractional values, using bilinear interpolation to obtain the values at points with
floating-point coordinates. In practice, however, RoIAlign does not simply interpolate
the points on the boundary of the candidate region and then pool them. The pooling
operation is redesigned so that each cell is sampled at fractional coordinates inside
it, and the sampled values are then pooled.
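The idea of sampling at fractional coordinates can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation used in any particular library: the box is given as (x1, y1, x2, y2) in feature-map coordinates, each cell is sampled on a regular 2 x 2 grid and the samples are averaged. The function names, the 6 x 6 feature map and the box are hypothetical.

```python
import math


def bilinear(feature, y, x):
    """Value of a 2D feature map at fractional (y, x) by bilinear interpolation."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1)  # clamp to the map
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, box, k=2, samples=2):
    """Average-pool a fractional box into k x k cells without rounding.

    Each cell is sampled on a samples x samples regular grid, and every
    sample is read with bilinear interpolation.
    """
    x1, y1, x2, y2 = box
    cell_w = (x2 - x1) / k
    cell_h = (y2 - y1) / k
    out = []
    for i in range(k):
        row = []
        for j in range(k):
            total = 0.0
            for si in range(samples):
                for sj in range(samples):
                    y = y1 + (i + (si + 0.5) / samples) * cell_h
                    x = x1 + (j + (sj + 0.5) / samples) * cell_w
                    total += bilinear(feature, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out


# Hypothetical feature map whose value equals the column index.
feature = [[float(x) for x in range(6)] for _ in range(6)]
box = (0.6, 0.6, 4.2, 4.2)               # fractional region proposal
pooled = roi_align(feature, box)
rounded_box = tuple(round(v) for v in box)  # what the first rounding step would keep
print(pooled)
print(rounded_box)
```

Because the feature value here equals the x coordinate, each pooled value should match the x coordinate of its cell centre in the fractional box. The rounded box no longer has the same corners as the proposal, which is the deviation that the two rounding steps introduce.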

1.5 Resource requirements

A computer with artificial intelligence acceleration for training Mask R-CNN models.

2.0 Timelines

3.0 Conclusion

First, the current status of work in the related field is described, the problems of
existing research are summarized, and the research content of this paper is
determined: the design and implementation of a multi-scene, multi-target detection and
segmentation technique based on the Mask R-CNN model. Subsequently, some basic
knowledge about deep learning and neural networks is briefly introduced to provide the
theoretical basis for the subsequent research. To enable simultaneous multi-target
detection and segmentation of scenes, the Mask R-CNN (FFM) model is constructed.

Neural networks operate in a fundamentally different way from the human mind. Humans
are able to transfer knowledge from one domain to another. When we first see a new
animal, we can quickly identify the body parts it shares with most animals, such as
nose, ears, tail and legs.

Deep neural networks have no such concept; they develop their knowledge of each class
of data individually. At their heart, neural networks are statistical models that
compare batches of pixels, though in very intricate ways. This is why they need to see
many examples before they can develop the necessary foundation to recognize each
object. Accordingly, neural networks can make dangerous mistakes when they are not
properly trained.

References

Khandelwal, R. (2019, November 27). Computer Vision: Instance Segmentation with Mask R-CNN. Retrieved from https://towardsdatascience.com/computer-vision-instance-segmentation-with-mask-r-cnn-7983502fcad1

Sharma, P. (2020, November 28). Computer Vision Tutorial: Implementing Mask R-CNN for Image Segmentation (with Python Code). Retrieved from https://www.analyticsvidhya.com/blog/2019/07/computer-vision-implementing-mask-r-cnn-image-segmentation/

IBM. (n.d.). What is Computer Vision? Retrieved from https://www.ibm.com/topics/computer-vision
