
GIS SCIENCE JOURNAL ISSN NO : 1869-9391

Image/Video Segmentation using BodyPix model


Gopikrishna Nair1, Zarin Shaikh2, Drishti Singh3, *Angelin A. Florence

Department of Computer
St. John College of Engineering and Management

Abstract: Nowadays, the entertainment industry’s centre of attraction lies in animated scenes, action sequences, and visual effects. There are countless ways to add these to a film, but recording and editing bring complications: certain elements must be eliminated from the footage. Pictures can be manipulated easily, but when it comes to videos, things become complicated. With the assistance of BodyPix, this project builds upon that model, with certain alterations, to create a website that separates the person or the background from a pre-existing or real-time video.

Keywords: BodyPix, Convolutional Neural Network, JavaScript, Machine Learning, Picture Segmentation

1. INTRODUCTION
For decades in the entertainment field, the green screen process has been applied to almost everything, whether a film or a song. Wherever cinematography is involved, the green screen goes hand in hand with filming; during the shooting or behind the scenes (BTS) of any entertainment clip, the process is applied directly or indirectly. The method requires various arrangements in the background: high-specification cameras, a team of technicians, and considerable manpower to set up the stage [1]. If any element is mishandled, the green screen can lead to a mishap. For example, if a news reporter wears a green-colored outfit on set while presenting the weather report, the foreground footage will appear over her attire. Many such details must therefore be considered. The green screen method adds special effects to a clip, but doing so requires a vast amount of equipment and a variety of people working together. The process sounds simple, but the more impressive the scene, the more complex its green-screen setup; the BTS footage of such clips is proof of this. It is also usually done by specialists, whereas our system tries to make this capability available to a greater number of people with ease. TensorFlow has made a pre-trained model with the help of JavaScript, and this system makes it available in a simpler way through a website.
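The chroma-key replacement behind the green screen method described above can be sketched as a toy function over flat RGBA pixel arrays. This is an illustrative sketch only: the function name and the dominance thresholds are our own assumptions, not BodyPix or production chroma-key code.

```javascript
// Toy chroma-key: replace strongly green pixels of `fg` with pixels of `bg`.
// Both are flat RGBA arrays (4 values per pixel) of equal length.
function chromaKey(fg, bg) {
  const out = fg.slice();
  for (let i = 0; i < fg.length; i += 4) {
    const [r, g, b] = [fg[i], fg[i + 1], fg[i + 2]];
    // A pixel counts as "green screen" when green clearly dominates red and blue.
    if (g > 100 && g > 1.5 * r && g > 1.5 * b) {
      out[i] = bg[i];
      out[i + 1] = bg[i + 1];
      out[i + 2] = bg[i + 2];
      out[i + 3] = bg[i + 3];
    }
  }
  return out;
}
```

Note that this sketch also exhibits the failure mode mentioned above: a green outfit would be keyed out along with the backdrop, which is exactly what a person-segmentation model avoids.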

2. RELATED WORK
In 2004, G. Mori, Xiao Feng Ren, A.A. Efros, and J. Malik developed Recovering Human Body Configurations: Combining Segmentation and Recognition, which helped us understand the structure of an object through a limb-and-torso detection method using a pixel recognition process. The model can identify a person’s body in the given or selected datasets and also helps to remove the unwanted background from the image. It uses both processes together to maintain its accuracy. The input consists of datasets containing players in various positions; the final output provides the position of the human body with or without background removal, although background removal is not prioritized here.
In 2016, Yi-Hsuan Tsai, Ming-Hsuan Yang, and Michael J. Black developed Video Segmentation via Object Flow. Video segmentation and optical flow estimation are quite difficult due to fast-moving objects, deforming shapes, and cluttered backgrounds, which make the flow inaccurate. They therefore introduced a multiscale, spatiotemporal model that uses optical flow estimation, particularly at object boundaries, together with an iterative scheme alternating between video segmentation and optical flow. The paper initially uses frame t as input and adds frame t+1 to capture the motion of an object; when the object is in motion, its optical flow is updated, and through this process the motion of the object in the video segmentation becomes accurate.

VOLUME 8, ISSUE 3, 2021 PAGE NO: 1
In 2017, Sergi Caelles, Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Laura Leal-Taixe, Daniel Cremers, and Luc Van Gool developed One-Shot Video Object Segmentation, which uses a Convolutional Neural Network (CNN) architecture to address the problem of semi-supervised video object segmentation. The first contribution of the paper is to adapt the CNN to a specific object instance given a single annotated image (hence one-shot). The second contribution is that OSVOS processes each frame of a video independently, obtaining temporal consistency as a by-product rather than as the result of an explicitly imposed, expensive constraint. Deep learning approaches often require an enormous amount of training data to solve a specific problem such as segmenting an object in a video. Quite in contrast, human observers can solve similar challenges with only one training example. In this paper, the authors demonstrate that this capacity for one-shot learning can be reproduced in a machine.
In 2018, Eithab Saati Alsoruji and Shikharesh Mujumdar developed a video segmentation strategy for video processing applications. To process video data on Hadoop clusters, a video file must first be stored in HDFS, which divides the file into blocks and stores the blocks across the cluster nodes. Storing video files is harder than storing text files because video frames are stored within a container format that holds the video data as well as other information, such as the video coding format (video codec) used. Storing a video file directly in HDFS destroys the container format, and thus the correlation between frames, if the file size is larger than the HDFS block. The paper proposes a video segmentation strategy that supports both frame-series-oriented and single-frame-oriented video processing. In frame-series-oriented algorithms such as BS, dividing a large video file requires overlapping some frames between every two consecutive segments; those overlapping frames represent overhead since they are processed twice. The proposed strategy samples the overlapping frames to reduce this overhead.
In 2019, TensorFlow introduced BodyPix in a blog post: a pre-trained model built with JavaScript. The blog explained the model in depth and described the various techniques used in it.

3. PROPOSED SYSTEM
A block diagram is a diagram of a system in which the principal parts or functions are represented by blocks connected by lines that show the relationships between the blocks. Block diagrams are intended to clarify overall concepts without concern for the details of implementation. Figure 1 diagrammatically exhibits the working of the system.
Input Realtime Feed/File: Here, the data, that is, a video or a photo, is injected for further processing.
Data Processing and Encoding: The main intention of the pre-processing step is to determine the region of focus within the image. Because the input image may contain a certain amount of noise, it is necessary to reduce or remove it. Encoding the contents of a 2-D image in a raw bitmap (raster) format is generally not economical and may result in very large files. Since raw image representations usually require a large amount of storage space and proportionally long transmission times for file uploads and downloads, most image file formats employ some kind of compression. Compression methods are lossy when a tolerable degree of degradation in the visual quality of the resulting image is acceptable, or lossless when the image is encoded at its full quality.
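As a toy illustration of the lossless case, run-length encoding compresses runs of repeated pixel values and reconstructs them exactly. This is a didactic sketch, not the codec used by any particular image format; the function names are our own.

```javascript
// Toy lossless compression: run-length encode a flat array of pixel values
// into [value, count] pairs.
function rleEncode(values) {
  const runs = [];
  for (const v of values) {
    const last = runs[runs.length - 1];
    if (last && last[0] === v) last[1] += 1; // extend the current run
    else runs.push([v, 1]);                  // start a new run
  }
  return runs;
}

// Decoding restores the original array exactly, i.e. no quality is lost.
function rleDecode(runs) {
  return runs.flatMap(([v, n]) => Array(n).fill(v));
}
```

A lossy method would instead discard some detail (for example by quantizing values) in exchange for smaller output.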
Feature Extraction: Feature extraction is part of the dimensionality reduction process, in which an initial set of the data is divided and reduced to more manageable groups, making it easier to process later. The most important characteristic of these large data sets is that they have a large number of variables, which require a great deal of computing resources to process. Feature extraction helps to obtain the best features from such big data sets by selecting and combining variables into features, resulting in an effective reduction in the amount of data. These features are easy to process, yet still describe the actual data set with accuracy and originality.
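One simple way to combine neighbouring variables into a single feature, as described above, is mean pooling. The sketch below (an illustrative stand-in, not the model’s actual feature extractor) reduces a grayscale image to a quarter of its size while preserving its coarse structure.

```javascript
// Toy feature extraction: reduce a grayscale image (2-D array of numbers)
// by 2x2 mean pooling, combining each block of four pixels into one feature.
function meanPool2x2(img) {
  const out = [];
  for (let y = 0; y + 1 < img.length; y += 2) {
    const row = [];
    for (let x = 0; x + 1 < img[y].length; x += 2) {
      // Average the four pixels of the 2x2 block.
      row.push((img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4);
    }
    out.push(row);
  }
  return out;
}
```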
Structural Mesh: Structural mesh generation is used for rendering to a monitor and for physical simulation such as finite element analysis or computational fluid dynamics. Meshes are composed of simple cells such as triangles and lines. They are generated by computer algorithms, often with human guidance through a graphical user interface (GUI), depending on the complexity of the domain and the type of mesh desired. The goal is to form a mesh that accurately captures the input domain geometry, with high-quality, well-shaped cells, and without so many cells as to make subsequent calculations intractable. The mesh should also be fine, that is, have small elements, in areas that are important for subsequent calculations.
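A minimal example of such a mesh is a regular grid of triangles. The sketch below (names and layout are illustrative, not the system’s actual mesh generator) produces the common vertex-list plus triangle-index-list representation used by renderers.

```javascript
// Toy structural mesh: split a w x h grid of square cells into triangles,
// returning vertex positions and triangle index triples (a "triangle list").
function gridMesh(w, h) {
  const vertices = [];
  for (let y = 0; y <= h; y++)
    for (let x = 0; x <= w; x++) vertices.push([x, y]);

  const idx = (x, y) => y * (w + 1) + x; // flat index of vertex (x, y)
  const triangles = [];
  for (let y = 0; y < h; y++)
    for (let x = 0; x < w; x++) {
      // Each square cell becomes two well-shaped right triangles.
      triangles.push([idx(x, y), idx(x + 1, y), idx(x, y + 1)]);
      triangles.push([idx(x + 1, y), idx(x + 1, y + 1), idx(x, y + 1)]);
    }
  return { vertices, triangles };
}
```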
Body Segmentation: In computer vision, image segmentation refers to the technique of grouping pixels in a picture into semantic areas, typically to locate objects and boundaries. Person segmentation divides an image into pixels that are part of a person and pixels that are not. Under the hood, after a picture is fed through the model, it is converted into a two-dimensional map with float values between 0 and 1 at each pixel, indicating the probability that a person exists at that pixel. A value called the “segmentation threshold” represents the minimum score a pixel must reach to be considered part of a person. Using the segmentation threshold, those 0–1 float values become binary 0s and 1s.
Post-processing: To refine the segmented image, further processing may be required; it is performed in this step.
Background/Person Removed: In this step, the required output is displayed.
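The thresholding and removal steps described above can be sketched in plain JavaScript. This is a simplified stand-in for what the model produces: the score values, function names, and grayscale frame representation are illustrative assumptions, not BodyPix’s actual API.

```javascript
// Turn per-pixel person scores (floats in [0, 1]) into a binary mask using
// the segmentation threshold, as described above.
function toBinaryMask(scores, threshold) {
  return scores.map(s => (s >= threshold ? 1 : 0));
}

// Remove the background: keep pixels where the mask is 1, blank the rest.
// `frame` is a flat array of grayscale pixel values, same length as the mask.
function removeBackground(frame, mask, fill = 0) {
  return frame.map((v, i) => (mask[i] === 1 ? v : fill));
}
```

Inverting the mask (swapping 0s and 1s) yields the opposite effect, removing the person instead of the background.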

[Block diagram: Input Realtime Feed/File → Data Processing and Encoding → Feature Extraction → Structural Mesh → Body Segmentation → Post Processing → Background/Person Removed]
Figure 1. System architecture of the proposed system

4. EXPERIMENTAL RESULTS
The system is based on choice based according to the user. Here, the user has to feed the
file to perform a specific task; that is, whether the user wants to change the background
or the user wants to remove the person from a particular background. To get the result as
figure 2, the object should be in motion; the boy in the first canvas should be moving his
limbs to get identified as a moving object and to get the result as in the second canvas.
[First canvas | Second canvas]

Figure 2. Implementation of video segmentation: extraction of background

Figure 3. Person’s Movement to distinguish the object from a background

Person’s movement is shown in figure 3 so that the system captures the background; these are snapshots from the video whose result is displayed in figure 2. In this way the system removes the object to exhibit the background; the reverse of this process is also possible. The background of a pre-existing image can likewise be altered, as shown in figure 4.

Figure 4. Background Change in pre-existing picture


5. CONCLUSION AND FUTURE WORK
The system aims to make TensorFlow’s model available more simply and easily to a greater number of people. The day-to-day green-screen work goes largely unnoticed by viewers, yet the team involved in each scene is large; the system may be of further use to them as it will ease their work.
Starting from an already developed model, we were inspired to build a generalized tool that can be used by everyone. As this is the initial stage of the system, some additions could take it to an optimal level and benefit everyone.
In the future, the system can be extended to segment multiple people and change the background for multiple people; by adding a three-dimensional feature, the system may also become capable of three-dimensional modelling and scanning. Introducing a three-dimensional feature could help in varied areas.

REFERENCES:
6.1. Journal Article
[1] J. Courtenay, ‘Filming with Green Screen: Everything You Need to Know’, Infocusfilmschool.com, 2018. [Online]. Available: https://infocusfilmschool.com/filming-green-screen-guide/

[2] ‘BodyPix: Real-time Person Segmentation in the Browser with TensorFlow.js’, TensorFlow Blog, 2019. [Online]. Available: https://blog.tensorflow.org/2019/11/updated-bodypix-2.html

6.2. Conference Proceedings


[3] Yi-Hsuan Tsai, Ming-Hsuan Yang, Michael J. Black, ‘Video Segmentation via Object Flow’, 2016, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3899-3908.

[4] G. Mori, Xiao Feng Ren, A.A. Efros, J. Malik, ‘Recovering Human Body Configurations: Combining Segmentation and Recognition’, July 19, 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004).

[5] Sergi Caelles, Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Laura Leal-Taixe, Daniel Cremers, Luc Van Gool, ‘One-Shot Video Object Segmentation’, 2017, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 221-230.
