
DEPARTMENT OF MECHATRONICS ENGINEERING

UNIVERSITY OF ENGINEERING & TECHNOLOGY, PESHAWAR


Digital Image Processing Mte-422, 7th Semester

Assignment No 3

Student Name: Imad Ahmad


Reg No: 20PWMCT0732
Subject: Digital Image Processing

Submitted To:
Dr. Shahzad Anwar

Submission Date:
5th January, 2024
What is Image Segmentation?
Image segmentation, a fundamental task in computer vision, is the process of dividing a digital image into several
segments, sometimes referred to as image regions or objects. This procedure converts an image into a
representation that is more meaningful and easier to analyze. Image segmentation assigns a label to every pixel so
that pixels with the same label share certain properties. The technique is useful for locating objects and boundaries
within images. In medical imaging, for example, segmentation can support 3D reconstruction from CT
slices using geometry reconstruction methods.

Types of Image Segmentation


Semantic Segmentation: This method determines the class to which each pixel belongs. In a picture
containing numerous individuals, for example, all pixels associated with humans receive the same class label,
while background pixels are classified differently.

Instance Segmentation: Each pixel is associated with a specific instance of an object. This method is
concerned with distinguishing the individual objects of interest in an image. In a photograph with numerous
people, for example, each person would be segmented as a separate object.

Panoptic Segmentation: A hybrid of semantic and instance segmentation, panoptic segmentation determines
the class to which each pixel belongs while also discriminating between distinct instances of the same class.

What is Panoptic Segmentation?


The term "panoptic" refers to everything seen in a single perspective. Panoptic segmentation in computer
vision provides a holistic method to segmentation, effortlessly combining the capabilities of instance and
semantic segmentation.
Panoptic segmentation is a complex approach that classifies every pixel in an image based on its class label
while identifying the exact instance of that class to which it belongs. In a picture with many automobiles,
for example, panoptic segmentation would recognize and differentiate each car, generating a unique
instance ID for each.
The complete scope of this approach distinguishes it from the other segmentation tasks. Semantic
segmentation assigns pixels to classes without discriminating between specific instances, and instance
segmentation distinguishes instances but only for countable objects; panoptic segmentation does both. Every
pixel in an image processed with panoptic segmentation carries two associated values: a label indicating its
class and an instance number. Pixels that belong to "stuff" regions, which are harder to count (like the sky or
pavement), might share a single instance number reflecting that categorization or have none at all. In contrast,
pixels belonging to "things" (countable objects like cars or people) receive unique instance IDs.
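As a concrete illustration, the snippet below sketches one hypothetical encoding that packs both values into a single integer (similar conventions appear in public driving datasets); the class IDs here are assumed purely for illustration:

% Hypothetical encoding: panopticId = classId * 1000 + instanceId
carClass = 26; skyClass = 23;        % assumed class IDs, for illustration only
firstCar  = carClass * 1000 + 1;     % 26001: car, instance 1
secondCar = carClass * 1000 + 2;     % 26002: car, instance 2
sky       = skyClass * 1000;         % 23000: sky ("stuff", instance number 0)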
This enhanced segmentation approach might be used in a variety of industries, including medical
imaging, driverless cars, and digital image processing. Because of its capacity to give a thorough
comprehension of pictures, it is a crucial tool in the ever-changing field of computer vision.

Working Mechanism:
Panoptic segmentation has emerged as a game changer in computer vision. It's a hybrid strategy that
combines the best of semantic and instance segmentation. Whereas semantic segmentation categorizes each
pixel, instance segmentation recognizes specific object instances. Panoptic segmentation, on the other hand,
performs both: it classifies every pixel and provides a unique instance ID to distinct objects.
The Efficient Panoptic Segmentation (EfficientPS) approach is one of the most advanced in panoptic
segmentation. Deep learning and neural networks are used in this method to generate high-quality
segmentation results. EfficientPS is intended to be both computationally efficient and effective in terms of
segmentation quality. It processes input photos and generates segmentation masks using feature pyramid
networks and convolutional layers. The COCO dataset is also used for training and validation, ensuring
that the models are exposed to a wide range of pictures and settings.
The benefit of panoptic segmentation, particularly approaches like EfficientPS, is that it may give a precise,
pixel-level comprehension of pictures. This is extremely useful in real-world applications such as driverless
cars, where recognizing the category (road, pedestrian, vehicle) and the environment is critical.

Key Components of Panoptic Segmentation


Consider a painter who not only recognizes every object in a scene but also colors inside the lines, ensuring
that every detail is highlighted. In the area of computer vision, this is the magic of panoptic segmentation.
Understanding its major components allows us to understand how it successfully delineates and classifies
every pixel in a picture, ensuring coherence and distinction.

Fully Convolutional Network (FCN) and Mask R-CNN


FCNs (Fully Convolutional Networks) have emerged as a critical component of panoptic segmentation.
An FCN's strength is its ability to process images of arbitrary size and produce correspondingly sized outputs. By
assigning each pixel a semantic label, this network captures patterns in amorphous, uncountable "stuff"
regions, such as the sky or roads. It is designed to operate from pixel to pixel, producing a dense,
spatially complete prediction.

Fully Convolutional Neural Networks
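As a hedged sketch, such a network can be assembled in MATLAB with the Computer Vision Toolbox (fcnLayers additionally requires the VGG-16 support package; the input size and class count below are assumptions):

imageSize = [224 224 3];                     % network input size (assumed)
numClasses = 11;                             % number of semantic classes (assumed)
lgraph = fcnLayers(imageSize, numClasses);   % FCN-8s layer graph
analyzeNetwork(lgraph)                       % inspect the pixel-to-pixel architecture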


Mask R-CNN, an extension of Faster R-CNN, on the other hand, is critical for distinguishing countable
objects. While Faster R-CNN excels at predicting bounding boxes, Mask R-CNN adds a parallel branch
that predicts an object mask. This means Mask R-CNN both recognizes each detected object and builds a
high-quality segmentation mask for it. This dual capability makes it an important tool for tasks that
need both object detection and pixel-level segmentation, such as detecting and separating individual
cars in a traffic scene.

Mask RCNN Architecture
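A minimal sketch of running a pretrained Mask R-CNN in MATLAB follows (it assumes the "Computer Vision Toolbox Model for Mask R-CNN" support package is installed; the test image and confidence threshold are illustrative):

net = maskrcnn("resnet50-coco");        % COCO-pretrained Mask R-CNN
I = imread("visionteam.jpg");           % sample image shipped with the toolbox
[masks, labels, scores, boxes] = segmentObjects(net, I, Threshold=0.95);
overlay = insertObjectMask(I, masks);   % draw the predicted instance masks
imshow(overlay)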


FCN and Mask R-CNN form the backbone of panoptic segmentation, ensuring that every pixel in an
image is accurately classified and, if applicable, associated with a unique instance ID.

Practical Applications of Panoptic Segmentation


Medical Imaging
In medical imaging, panoptic segmentation has achieved considerable advances. By combining the power
of semantic and instance segmentation, panoptic segmentation provides a rich and comprehensive
perspective of medical pictures. This is especially useful in tumor cell identification, as the model detects
the presence of tumor cells while also distinguishing between individual cells. Such accuracy is required
for precise diagnoses, which allows medical practitioners to develop more effective treatment strategies.
Using datasets such as COCO and Cityscapes in conjunction with deep learning algorithms helps ensure that
segmentation models are trained on high-quality data, which improves their accuracy in diagnostic
applications.

Autonomous Vehicles
Another arena in which panoptic segmentation excels is autonomous driving. Understanding the
environment is critical for self-driving cars, and panoptic segmentation helps by offering a
pixel-level comprehension of the surroundings. It is critical for estimating the distance to objects, allowing the
vehicle to make intelligent judgments in real time. By discriminating between countable objects (such as
pedestrians and other vehicles) and uncountable regions (such as roads and sky), panoptic segmentation
supports safer navigation for autonomous cars.

Digital Image Processing


Modern smartphone cameras are technological marvels, and panoptic segmentation expands their
possibilities. Portrait mode, bokeh mode, and auto-focus all use picture segmentation to distinguish between
the subject and the backdrop. This enables the generation of high-quality pictures with depth effects. The
combination of semantic and instance segmentation means that the camera can recognize and focus on the
subject while blurring away the backdrop, producing spectacular images.

Implementation of Panoptic Segmentation:


MATLAB Code:
% Load the pre-trained model (assumes the .mat file stores the network
% in a variable named "net")
data = load('panoptic_model.mat');
net = data.net;
% Read the input image (renamed to img so it does not shadow MATLAB's
% built-in image function)
img = imread('my_image.jpg');
% semanticseg performs pixel-wise segmentation with the loaded network
segMap = semanticseg(img, net);
figure;
subplot(1, 2, 1);                    % 1x2 grid: left panel for the original
imshow(img);
subplot(1, 2, 2);                    % right panel for the segmentation result
imshow(label2rgb(double(segMap)));   % categorical labels -> color image
Code Explanation:
• First, I load the pre-trained segmentation model in MATLAB and extract the network from the loaded struct.
• In the second line, I use the imread function to read the image.
• Next, I call semanticseg, passing the image and the network as arguments, to apply the segmentation.
• Finally, I use subplot and imshow to display the original image and the segmented result side by side.
Input Image:

Figure 1: Input image (my_image)

Output:

Figure 2: Result of Panoptic Segmentation

Watershed Segmentation:


In image processing, a watershed is a transformation defined on a grayscale image. The term alludes to a
geological watershed, or drainage divide, which separates adjacent drainage basins. The watershed
transformation treats the image as a topographic map, with the brightness of each point representing its height,
and finds the lines that run along the tops of ridges.
The term watershed has several technical definitions. In graphs, watershed lines may be defined on the nodes,
on the edges, or as hybrid lines on both nodes and edges. Watersheds can also be defined in the continuous
domain.[1] There are many different algorithms to compute watersheds. Watershed algorithms are commonly
used in image processing for object segmentation.

Figure: Relief of the gradient magnitude; watershed of the gradient magnitude image; watershed of the gradient (relief)

Watershed Algorithm
Thresholding:
In the context of the Watershed Algorithm, thresholding plays an important role in identifying
certain parts of the image. After converting the image to grayscale, the algorithm applies
thresholding to the grayscale image to obtain a binary image that helps in segregating the foreground
(objects to be segmented) and the background.
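A minimal MATLAB sketch of this step, using Otsu's global threshold (pears.png is the sample image used again later in this report):

rgb = imread("pears.png");    % sample RGB image
gray = im2gray(rgb);          % convert to grayscale
bw = imbinarize(gray);        % Otsu's global threshold -> binary image
imshow(bw)                    % foreground appears white, background black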
Opening (Erosion followed by Dilation):
In this step, the opening operation, which is an erosion operation followed by a dilation
operation, is performed. The purpose of this step is primarily to remove noise. The erosion operation
removes small white noise in the image, but it also shrinks our objects. Following this with a dilation
operation allows us to retain the size of our objects while keeping the noise out.
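Continuing from the binary image bw above, the opening step might look like this in MATLAB (the structuring-element size is an assumed parameter that should match the noise scale):

se = strel("disk", 3);        % small disk-shaped structuring element (assumed size)
opened = imopen(bw, se);      % erosion followed by dilation removes small specks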

Erosion:
This operation erodes away the boundaries of the foreground object. It works by sliding a
structuring element (kernel) over the image. If any of the pixels in the region under the kernel
is black, the pixel at the center of the kernel is set to black. This operation is effective at
removing small white noise.

Dilation:
After erosion, dilation is performed, which is essentially the opposite of erosion. It adds pixels
to the boundaries of objects in an image. If any of the pixels in the region under the kernel are white,
then the pixel in the middle of the kernel is set to white.
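The two primitives can also be applied one at a time; chaining them reproduces the opening sketched above:

eroded = imerode(bw, se);        % boundaries shrink; small white noise disappears
restored = imdilate(eroded, se); % objects regain their size; equals imopen(bw, se)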
Dilation for Background Identification:
In this step, the dilation operation is used to identify the background region of the image. The
result of the previous step, where noise has been removed, is subjected to dilation. After dilation, a
significant portion around the objects (the foreground) is expected to be background, since dilation
expands the objects. This "sure background" region aids the subsequent steps of the Watershed
algorithm, where we aim to identify distinct segments/objects.
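A short sketch of this step, building on the opened image from above (the dilation radius is an assumption):

sureBg = imdilate(opened, strel("disk", 10));   % grown foreground; everything outside
                                                % this mask is treated as sure background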

Distance Transformation:
The Watershed Algorithm applies a distance transform to identify the regions that are likely to
be the foreground.
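As a hedged sketch, bwdist gives each foreground pixel its distance to the nearest background pixel, and thresholding the result keeps only pixels deep inside objects as "sure foreground" seeds (the 0.5 factor is an assumed parameter):

D = bwdist(~bw);                % distance to the nearest background pixel
sureFg = D > 0.5 * max(D(:));   % pixels far from any background: sure foreground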
How does the watershed algorithm work?
The concepts of "flooding" and "dam construction" in the Watershed Algorithm are essentially a
metaphorical way of describing how the algorithm works.
Flooding:
The “flooding” process refers to the expansion of each labeled region (the markers) based on
the gradient of the image. In this context, the gradient represents the topographic elevation, with
high-intensity pixel values representing peaks and low-intensity pixel values representing valleys.
The flooding starts from the valleys, or the regions with the lowest intensity values. The flooding
process is carried out in such a way that each pixel in the image is assigned a label. The label it
receives depends on which marker’s “flood” reaches it first. If a pixel is equidistant from multiple
markers, it remains as part of the unknown region for now.
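In MATLAB, this flooding can be realized by forcing the markers to be the only regional minima of the gradient image before calling watershed (gray and sureFg are assumed to come from the earlier sketches):

gmag = imgradient(gray);           % gradient magnitude acts as the topographic elevation
gmag2 = imimposemin(gmag, sureFg); % suppress all minima except the marker regions
L = watershed(gmag2);              % flood from the markers; L labels each catchment basin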
Dam Construction:
As the flooding process continues, the floodwaters from different markers (representing
different regions in the image) will eventually start to meet. When they do, a “dam” is constructed.
In terms of the algorithm, this dam construction corresponds to the creation of boundaries in the
marker image. These boundaries are assigned a special label (-1 in OpenCV's implementation; MATLAB's
watershed marks them with 0). The dams are constructed
at the locations where the floodwaters from different markers meet, which are typically the areas of
the image where there’s a rapid change in intensity — signifying the boundary between different
regions in the image.
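Once the label matrix is computed, the dams can be extracted directly; in MATLAB's watershed output the dam pixels carry the label 0:

dams = (L == 0);   % boundary ("dam") pixels between neighboring catchment basins
imshow(dams)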

Implementation of Watershed Algorithm:


MATLAB Code:
rgb = imread("pears.png");        % read the sample RGB image
I = im2gray(rgb);                 % convert to grayscale
imshow(I)
text(732,501,"Image courtesy of Corel(R)","FontSize",7,"HorizontalAlignment","right")

% Step 1: compute the gradient magnitude, which serves as the topographic surface
gmag = imgradient(I);
imshow(gmag,[])
title("Gradient Magnitude")

% Applying watershed directly to the gradient severely oversegments the image
L = watershed(gmag);
Lrgb = label2rgb(L);
imshow(Lrgb)
title("Watershed Transform of Gradient Magnitude")

% Step 2: mark the foreground objects using morphological reconstruction
se = strel("disk",20);
Io = imopen(I,se);                % ordinary opening, shown for comparison
imshow(Io)
title("Opening")

Ie = imerode(I,se);
Iobr = imreconstruct(Ie,I);       % opening-by-reconstruction preserves object shapes
imshow(Iobr)
title("Opening-by-Reconstruction")

Ioc = imclose(Io,se);             % ordinary opening-closing, shown for comparison
imshow(Ioc)
title("Opening-Closing")

Iobrd = imdilate(Iobr,se);
Iobrcbr = imreconstruct(imcomplement(Iobrd),imcomplement(Iobr));
Iobrcbr = imcomplement(Iobrcbr);  % closing-by-reconstruction of Iobr
imshow(Iobrcbr)
title("Opening-Closing by Reconstruction")

% Regional maxima of the cleaned-up image give the foreground markers
fgm = imregionalmax(Iobrcbr);
imshow(fgm)
title("Regional Maxima of Opening-Closing by Reconstruction")

I2 = labeloverlay(I,fgm);
imshow(I2)
title("Regional Maxima Superimposed on Original Image")

% Clean the marker blobs and discard stray small ones
se2 = strel(ones(5,5));
fgm2 = imclose(fgm,se2);
fgm3 = imerode(fgm2,se2);
fgm4 = bwareaopen(fgm3,20);       % remove blobs with fewer than 20 pixels
I3 = labeloverlay(I,fgm4);
imshow(I3)
title("Modified Regional Maxima Superimposed on Original Image")

% Step 3: compute background markers from the thresholded image
bw = imbinarize(Iobrcbr);
imshow(bw)
title("Thresholded Opening-Closing by Reconstruction")

D = bwdist(bw);                   % distance transform of the background
DL = watershed(D);
bgm = DL == 0;                    % watershed ridge lines become background markers
imshow(bgm)
title("Watershed Ridge Lines")

% Step 4: impose the markers as the only minima and recompute the watershed
gmag2 = imimposemin(gmag, bgm | fgm4);
L = watershed(gmag2);

% Step 5: visualize the markers, boundaries, and final labels
labels = imdilate(L==0,ones(3,3)) + 2*bgm + 3*fgm4;
I4 = labeloverlay(I,labels);
imshow(I4)
title("Markers and Object Boundaries Superimposed on Original Image")

Lrgb = label2rgb(L,"jet","w","shuffle");
imshow(Lrgb)
title("Colored Watershed Label Matrix")

imshow(I)
hold on
himage = imshow(Lrgb);
himage.AlphaData = 0.3;           % show the labels transparently over the original
title("Colored Labels Superimposed Transparently on Original Image")
