
© 2022 IJRAR August 2022, Volume 9, Issue 3, www.ijrar.org (E-ISSN 2348-1269, P-ISSN 2349-5138)

Video Compression For Surveillance Application Using Deep Neural Network
Mr. V.M Sarvana Perumal1, Navyashree G2, Mantosh Kumar3, Abhilesh Patil4 and Pratibha Deshak 5
Department of Computer Science Engineering,
Rajarajeshwari College of Engineering, Bangalore, Karnataka, India-560074

Keywords: Video Compression, Algorithm, Deep Learning, Motion Estimation, Bit Rate Estimation.

Abstract: A fresh comparative analysis of video compression technologies is offered in this study. Video streaming programmes are becoming increasingly popular as internet technology and computers advance at a rapid pace; as a result, uncompressed raw video demands a great deal of disc space and network bandwidth to store and deliver. We describe a novel technique for surveillance video compression that improves on the shortcomings of previous approaches by replacing each standard component with a neural network counterpart. It delivers a higher-quality video stream at a consistent bit rate compared to previous standards, so the appropriate video compression technology must be selected to fulfil the requirements of a given video application. Our work is founded on a common set of principles: reducing the bit rate while minimising distortion in the decoded frames by exploiting the spatial and temporal redundancy present in video frames. We use neural networks to build a video compression strategy in the traditional sense and encode redundant data with fewer bits. Experiments have shown that our solution is effective and surpasses traditional MPEG encoding while retaining visual quality at similar bit rates. Although our approach is geared towards surveillance, it can easily be applied to other types of video.

INTRODUCTION

Surveillance cameras are becoming more common in countries around the world. There are approximately 760 million security cameras deployed worldwide, with the number expected to climb to 1 billion by 2020. Many existing standards for video compression, such as MPEG and H.264, which use mathematical techniques to compress video, are widely used. While they have been meticulously created and fine-tuned, they are intended to be utilised across many environments; in surveillance applications they can only remain general-purpose, which limits their potential to be specialised.

Video compression is the process of compressing a video file so that it takes up less space; the compressed file is smaller than the original and easier to send over the internet. It minimises the size of video files by removing unnecessary and non-functional data. Video encoding is the process of compressing and preparing a video file for playback in the appropriate formats and specifications. Deep learning is a subset of machine learning that uses algorithms inspired by the structure and operation of the brain, that is, by neural networks. A convolutional neural network (CNN) is a type of neural network that uses deep learning to analyse vast volumes of data; the pre-processing required by a ConvNet is significantly less than that of other classification algorithms.

OVERVIEW OF THE SYSTEM

Block Diagram

Fig 01 - Block Diagram
Motion Estimation: The motion between the current frame and the previous frame is determined.
Motion Compensation: Pixels from previous frames are shifted on the basis of the motion vector or optical flow obtained during motion estimation, producing the motion-compensated frame.
Residual Compression: The difference between the target frame and the motion-compensated frame is calculated and compressed.
Motion Compression: The motion data gathered during the motion estimation step is compressed and delivered to be encoded.
Encoding: After compression, the residual and motion information are encoded and delivered to the decoder.
Frame Reconstruction: A reconstructed frame is produced by merging the motion-compensated frame from the motion compensation step with the residuals from the residual compression phase.
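To make the data flow concrete, here is a minimal sketch of one inter-coded frame passing through these stages. The helper callables (estimate_motion, warp, and the two codecs) are hypothetical placeholders standing in for the neural-network components described above, not the actual modules of the proposed system.

    import numpy as np

    def compress_frame(current, reference, estimate_motion, warp,
                       compress_residual, compress_motion):
        """One inter-frame pass through the pipeline sketched above."""
        motion = estimate_motion(current, reference)        # Motion Estimation
        predicted = warp(reference, motion)                 # Motion Compensation
        residual = current - predicted                      # difference to encode
        residual_code = compress_residual(residual)         # Residual Compression
        motion_code = compress_motion(motion)               # Motion Compression
        # Encoding: residual_code and motion_code would be entropy-coded and sent.
        # Frame Reconstruction (also done at the decoder): prediction + residual.
        reconstructed = predicted + residual_code["decoded"]
        return motion_code, residual_code, reconstructed

    # Toy usage with trivial placeholder components (zero motion, lossless codec).
    reference = np.random.rand(64, 64).astype(np.float32)
    current = reference + 0.01 * np.random.randn(64, 64).astype(np.float32)
    zero_motion = lambda cur, ref: np.zeros((2,) + cur.shape, dtype=np.float32)
    identity_warp = lambda ref, motion: ref
    identity_codec = lambda x: {"decoded": x}
    _, _, recon = compress_frame(current, reference, zero_motion, identity_warp,
                                 identity_codec, identity_codec)
    print("max reconstruction error:", float(np.abs(recon - current).max()))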
LITERATURE SURVEY

1) B. Sathiyasivam, M. G. Sumithra, P. Sreelatha, "Survey on video compression techniques for efficient transmission", May 2021.
Research publications on image and video compression techniques are surveyed in this work. A video file is made up of a series of images that are meant to be presented as a whole. The most efficient way to increase transmission speed and decrease storage space is to use image compression. Video transmission requires more bandwidth, and high-quality images must be transferred to user devices without loss or lag; this inspires academics to develop suitable compression techniques. Even though numerous compression methods exist, there is still a need for fast compression algorithms that generate images or videos of acceptable quality while taking up the least amount of space. The goal of video compression is to preserve high picture quality while delivering video content at a low bit rate; compression is carried out by eliminating redundancy. High-quality video is typically required for encoding and transmission, but a high-quality video also takes more storage and network bandwidth.

2) Ms. B. Nagarthan, "Video Compression using DCT Algorithm", published in 2019.
Video compression is essential in real-time video transmission. Video compression is closely related to image processing and has many applications. The compression technique decreases the memory size of the media to be transmitted. The proposed system uses the DCT algorithm to increase the compression rate while maintaining video quality.

DESIGN AND EXPERIMENTS

A. END TO END DEEP LEARNING MODELS

Rate-Distortion (RD) is widely used as the loss function in end-to-end video compression systems to optimise the network for the best bitrate/quality trade-off. Few studies have gone down this path, but it is an intriguing approach that requires formulating optimisation criteria which trade off rate against distortion, minimising the distortion of the decoded frames at a given bit rate. Another interesting technique involves replacing whole functional blocks with neural networks so as to mimic the normal codec structure. Such works also use CNNs to extract distinguishing features from the optical flow (OF) map rather than using the motion vector (MV) method; optical flow should be more precise without requiring the massive amounts of data that pixel-by-pixel motion encoding demands.

Different surveillance video codecs are used by different manufacturers depending on their demands. Given the limited processing power of most surveillance systems and their real-time constraints, the ideal approach would have a minimal computational cost while producing superior results. When JPEG and M-JPEG are used for video, blocky artefacts and a large memory footprint result. H.263 and H.261 are also used, but their effectiveness is limited. MPEG-2 and MPEG-4 need a lot of bandwidth and perform poorly when the bitrate is restricted, producing blocky artefacts in the output stream. Even H.264, a cutting-edge video compression technology, has its drawbacks, chiefly a high processing cost. As a result, each codec has its own combination of drawbacks and benefits.
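A rough sketch of such a rate-distortion objective follows (generic, not the loss of any particular cited system); the bit estimate is assumed to come from some entropy model.

    import torch

    def rd_loss(original, reconstructed, estimated_bits, lam=0.01):
        """Rate-distortion objective J = D + lambda * R: distortion (MSE)
        plus a weighted rate term (estimated bits per pixel)."""
        distortion = torch.mean((original - reconstructed) ** 2)   # D
        rate = estimated_bits / original.numel()                   # R, bits per pixel
        return distortion + lam * rate

    # Toy usage with tensors standing in for a real encoder/decoder output.
    x = torch.rand(1, 3, 64, 64)
    x_hat = x + 0.05 * torch.randn_like(x)       # pretend reconstruction
    bits = torch.tensor(2000.0)                  # pretend bit count from an entropy model
    print(rd_loss(x, x_hat, bits).item())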
B. INTER PREDICTION WITH NEURAL NETWORK

Neural networks, namely CNNs, are used to learn and exploit the properties of matching blocks. The most typical way of improving inter-prediction is to reduce the inter-prediction residual error by improving the predicted block. To train such a CNN, a pair of previous and future frames is used; combining them yields a more accurate predicted frame than simple averaging. Compared to HEVC, the proposed methodology is stated to have achieved BD-rate savings of up to 10.5 percent and 3.1 percent on average. Combining temporal and spatial redundancy is another way to increase inter-prediction accuracy: a CNN and a fully connected network were trained on the inter-prediction block pixels taken from a previous frame's motion-adjusted block together with the neighbouring pixels of the block, and this combination outperformed the basic motion-compensated block. The motion-compensated block pixels from the temporal and spatial domains were fed directly into the network input layer, after which the network was trained.
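As a rough illustration (not the architecture of any of the works described above), a small CNN that takes a previous and a future frame and predicts the frame between them could look like this:

    import torch
    import torch.nn as nn

    class InterPredictionNet(nn.Module):
        """Toy CNN: takes a previous and a future (motion-compensated) frame,
        concatenated on the channel axis, and predicts the current frame."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, kernel_size=3, padding=1),
            )

        def forward(self, prev_frame, next_frame):
            x = torch.cat([prev_frame, next_frame], dim=1)
            return self.net(x)

    # Training would minimise the residual between the prediction and the
    # true frame, e.g. nn.MSELoss()(model(prev, nxt), current).
    model = InterPredictionNet()
    prev = torch.rand(1, 1, 64, 64)
    nxt = torch.rand(1, 1, 64, 64)
    print(model(prev, nxt).shape)   # torch.Size([1, 1, 64, 64])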
C. MODES OF INTRA PREDICTION

Intra prediction is the process of predicting a block from blocks that have already been decoded. The macroblocks must be decoded in raster-scan order, that is, left to right and top to bottom. For the past three years, researchers have been examining how neural networks, which are excellent at making predictions, might be used to obtain better intra predictions, and numerous deep learning techniques have been proposed to enhance intra-prediction. One line of work avoids the exhaustive MSE computation at the encoder that is normally used to select the optimal standard mode for each block: the authors train a convolutional neural network with guided learning to estimate the result and use classification to pick the most likely of the HEVC modes for each picture block. HEVC offers 35 intra-prediction modes: DC, Planar, and 33 angular modes. Another approach estimates the block residual error from neighbouring blocks and adjusts the predicted values accordingly to improve (reduce) the residual error; in that study, CNNs were used to predict the residual error between HEVC and H.264.
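For intuition, the simplest classical intra mode (DC) just fills a block with the mean of the already-decoded pixels above and to its left; the learned approaches above refine or replace rules of this kind. A toy sketch:

    import numpy as np

    def dc_intra_predict(top_row, left_col):
        """Classical DC intra prediction: predict an NxN block as the mean of
        the already-decoded row above it and column to its left."""
        n = len(top_row)
        dc = (np.sum(top_row) + np.sum(left_col)) / (2 * n)
        return np.full((n, n), dc)

    top = np.array([100, 102, 101, 99], dtype=float)    # decoded pixels above the block
    left = np.array([98, 100, 103, 101], dtype=float)   # decoded pixels to the left
    print(dc_intra_predict(top, left))                   # 4x4 block filled with the mean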

METHODOLOGY USED

A. TRANSFORMATION FUNCTIONS

The frame data is converted out of the spatial domain using a transformation function. This work focuses on the most recent and widely discussed innovations of the last five years; as a result, H.264 and HEVC have been chosen for further investigation. In video coding techniques in general, and in H.264 and HEVC in particular, the transform basis functions are obtained from the DCT. Redundancy is easier to find in the frequency domain, so the picture elements of each image component are transformed from the spatial domain using the DCT approach. In video compression the frame is split into blocks ranging in size from 4x4 to 64x64; in JPEG, a picture-compression technique, the image is divided into 8x8 blocks and a two-dimensional discrete cosine transform (DCT) is applied to each block.
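A minimal sketch of the 8x8 block transform described above, using SciPy's DCT; this is generic JPEG-style code, not the paper's implementation.

    import numpy as np
    from scipy.fftpack import dct, idct

    def dct2(block):
        """Two-dimensional type-II DCT with orthonormal scaling (JPEG-style)."""
        return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

    def idct2(coeffs):
        """Inverse two-dimensional DCT."""
        return idct(idct(coeffs, axis=0, norm="ortho"), axis=1, norm="ortho")

    block = np.random.randint(0, 256, size=(8, 8)).astype(float)   # one 8x8 pixel block
    coeffs = dct2(block - 128)                    # level-shift, then transform
    print(np.allclose(idct2(coeffs) + 128, block))   # True: the transform is invertible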
B. QUANTIZATION FUNCTION

Quantization is the unavoidable step of representing a value as a number with a fixed number of levels. In video coding, the transformed coefficients are divided element-wise by the quantization matrix, scaled by the quantizer scale code, and each resulting element is rounded. The quantization parameter determines the step size used to map the transformed coefficients onto a finite number of steps. In video coding this value is inversely proportional to the PSNR and directly proportional to the compression ratio (CR). In both cases the DC value is the zero-frequency coefficient. Inverse quantization formulas are applied to the transformed signal; equations (3) and (4) are the quantization equations, while equations (5) and (6) are the quantization formulas for intra- and inter-coding of the AC (non-zero frequency) values. Equation (7) is the inverse quantization formula used in the standard.
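A small sketch of this element-wise divide-and-round step and its inverse on an 8x8 coefficient block; the flat quantization matrix below is purely illustrative, real codecs use perceptually weighted tables.

    import numpy as np

    def quantize(coeffs, q_matrix, scale=1.0):
        """Divide each transform coefficient by its quantization step and round."""
        return np.round(coeffs / (q_matrix * scale))

    def dequantize(levels, q_matrix, scale=1.0):
        """Inverse quantization: multiply the integer levels back by the steps."""
        return levels * (q_matrix * scale)

    # Toy example: a flat 8x8 quantization matrix with step size 16.
    Q = np.full((8, 8), 16.0)
    coeffs = np.random.randn(8, 8) * 100          # pretend DCT coefficients
    levels = quantize(coeffs, Q)
    rec = dequantize(levels, Q)
    print("max quantization error:", np.abs(rec - coeffs).max())  # bounded by step/2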

C. ENTROPY CODING

Entropy encoding is a lossless data compression approach that is unaffected by the medium's particular features. Each unique symbol in the input is given a unique prefix code through entropy coding.
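As a concrete, generic example of prefix coding, here is a tiny Huffman coder built on Python's heapq; it is textbook entropy coding rather than the specific coder used in the proposed system.

    import heapq
    from collections import Counter

    def huffman_codes(symbols):
        """Build a prefix code (symbol -> bit string): frequent symbols receive
        shorter codewords, the basic idea behind entropy coding."""
        heap = [[freq, [sym, ""]] for sym, freq in Counter(symbols).items()]
        heapq.heapify(heap)
        if len(heap) == 1:                                   # only one distinct symbol
            return {heap[0][1][0]: "0"}
        while len(heap) > 1:
            lo = heapq.heappop(heap)
            hi = heapq.heappop(heap)
            for pair in lo[1:]:                              # extend codes in the low subtree
                pair[1] = "0" + pair[1]
            for pair in hi[1:]:                              # extend codes in the high subtree
                pair[1] = "1" + pair[1]
            heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
        return {sym: code for sym, code in heapq.heappop(heap)[1:]}

    codes = huffman_codes("aaaabbbccd")
    print(codes)   # the most frequent symbol ('a') gets the shortest codeword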

D. CONVOLUTIONAL NEURAL NETWORK

Neural networks with convolutional layers are referred to as convolutional neural networks (CNNs): networks designed to handle data with a grid-like structure, of which photographs are a good example. A digital representation of visual data, for instance, is made up of a grid of pixels whose values indicate the brightness and colour of each point. When we see a picture, our brain goes through a series of steps, and the volume of data involved is enormous. Each neuron has its own receptive field, the area from which it receives information, and neighbouring neurons' receptive fields are linked so that together they cover the whole visual field. Just as the neurons of the biological perception system respond only to stimuli within their receptive field, each neuron in a CNN analyses data solely in its receptive area, a small part of the visual field. Simpler patterns (lines, curves, and so on) appear in the earlier layers and more intricate patterns (faces, objects, etc.) in the later ones; in this way a CNN can be used to give a computer sight. The convolution layer is the most important part of the CNN and is responsible for the majority of the system's processing. This layer computes a dot product between two matrices: one is the restricted area of the receptive field, the other a set of learnable parameters called a kernel. Despite its modest spatial size, the kernel extends through the full depth of the input: even if its height and width are small, its depth spans all three (RGB) channels. A network can include up to 20 or 30 such layers, and it is the stacking of several convolutional layers that really gives a CNN its power.
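A minimal NumPy sketch of that dot product, sliding a small kernel over an image to produce a feature map (illustrative only: single channel, no padding or stride).

    import numpy as np

    def conv2d_single_output(patch, kernel, bias=0.0):
        """One output value of a convolution layer: the dot product between the
        receptive-field patch and the learnable kernel (same shape), plus a bias."""
        return float(np.sum(patch * kernel) + bias)

    def conv2d(image, kernel):
        """Valid 2-D convolution of a single-channel image with one kernel,
        sliding the receptive field across every spatial position."""
        kh, kw = kernel.shape
        h, w = image.shape
        out = np.zeros((h - kh + 1, w - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = conv2d_single_output(image[i:i + kh, j:j + kw], kernel)
        return out

    image = np.random.rand(8, 8)
    kernel = np.random.rand(3, 3)          # a 3x3 filter (learnable in a real CNN)
    print(conv2d(image, kernel).shape)     # (6, 6) feature map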
RESULT AND DISCUSSION

On our dataset, the average PSNR and MS-SSIM values were 74 dB and 0.98, respectively. When the frames are similar to the frames in the dataset, our technique outperforms standard MPEG in terms of both MS-SSIM and PSNR and is comparable to standard H.264 in terms of MS-SSIM.

Despite being trained on MS-SSIM, our model also performs better in terms of PSNR than the standard MPEG model. More crucially, we observe that at lower bitrates the visual quality of the reconstructed frames has increased.

Due to its smaller size and fewer parameters, our model runs quicker and uses less memory. Because it corresponds exactly to the conventional video compression pipeline, our proposed framework can serve as a reference model for deep learning-based video compression.

Fig. Comparison between our model and MPEG based on MS-SSIM
Fig. Comparison between our model and MPEG based on PSNR
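For reference, the PSNR reported above is computed as 10 * log10(MAX^2 / MSE); a minimal sketch for 8-bit frames:

    import numpy as np

    def psnr(reference, reconstructed, max_value=255.0):
        """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
        diff = reference.astype(np.float64) - reconstructed.astype(np.float64)
        mse = np.mean(diff ** 2)
        if mse == 0:
            return float("inf")            # identical frames
        return 10.0 * np.log10(max_value ** 2 / mse)

    ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
    noisy = np.clip(ref + np.random.randint(-2, 3, ref.shape), 0, 255).astype(np.uint8)
    print(f"PSNR: {psnr(ref, noisy):.2f} dB")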


CONCLUSION

In the past, video was stored on magnetic cassettes. The discrete cosine transform has been a common tool in video compression since its beginnings. H.261, the first practical video coding standard, was born out of a series of projects. H.264 is a video compression standard created by a number of major participants in the video industry, and it is today's most widely acknowledged and used video coding standard.

Video compression is the process of reducing the file size and modifying the format of a video. It can save money by reducing the amount of storage space needed to store video, and it also minimises the amount of bandwidth needed to send video, making media consumption more enjoyable for users. It eliminates frames that are duplicated or repeated, leaving only the ones that are required. Suppose two frames are quite similar: compression removes the redundant data from one frame and replaces it with a reference to the other. The purpose of video compression is to send video data at a low bitrate while maintaining image quality.

ACKNOWLEDGEMENT

We are thankful to our Project Manager and Program Counselor for their unwavering support, as well as to our college for its encouragement throughout the project.

