Video Compression

Suhailie Binti Daud


Universiti Teknikal Malaysia Melaka (UTeM)

INTRODUCTION TO VIDEO COMPRESSION

Most of the data generated in the world today is video. With the growth of video content, roughly 70% of Internet traffic now serves video-based applications, including live streaming (YouTube and Twitch), low-latency real-time online communication (Zoom and Webex), and video-on-demand platforms. The main task of compression techniques is to minimize the number of bits required to code the given data, thereby minimizing the memory required to store it. Graceful degradation is a quality-of-service term describing how, as bandwidth drops or transmission errors occur, the user experience degrades while trying to remain meaningful. Traditional data compression algorithms use handcrafted encoder-decoder pairs called "codecs". Video compression standards such as H.264/AVC, H.265/HEVC, and the upcoming H.266/VVC can introduce compression artifacts when large quantization parameters and coding unit sizes are assigned.

Meanwhile, video resolution and fidelity have also made huge steps forward (e.g., 4K, 8K, gigapixel, high dynamic range, higher bit depth), which makes the situation worse. Therefore, much effort has been invested in improving video compression algorithms, whose task is to reduce the video size while keeping a visual quality acceptable to the human visual system. Moreover, compression is necessary for many real-time and complex applications, such as space imaging, live time-series data, and medical imaging, which require exact recovery of the original images. Considerable human effort is spent analysing the details of these new data formats and providing efficient compression methods.
LITERATURE REVIEW
• A survey on video compression fast block matching algorithms
Reducing the amount of data required to represent digital video while maintaining video quality is known as video compression. Its applications include multimedia transmission, video telephony, teleconferencing, high-definition television, CD-ROM storage, and more. Compression is a technique used to remove redundant information in a video sequence. According to Richardson (2010), digital video contains four types of redundancy, namely colour, spatial, temporal, and statistical; each type of redundancy is processed separately based on its characteristics.
As is well known, video coding consists of two systems, a video encoder and a video decoder. The video encoder consists of three main functional units: colour subsampling, a temporal model (inter-frame encoder) or spatial model (intra-frame encoder), and an entropy encoder. The goal of the encoder is to condense the large amount of information required to display a video frame and achieve a high compression ratio, defined as:

Compression Ratio = Original Video Size (Bytes) / Compressed Video Size (Bytes)
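As an illustration, the ratio can be computed directly from file sizes. The following is a minimal Python sketch (the function name and the example sizes are illustrative, not taken from the text):

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Ratio of original to compressed size; higher means stronger compression."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return original_bytes / compressed_bytes

# A 150 MB raw clip compressed to 3 MB gives a 50:1 ratio.
print(compression_ratio(150_000_000, 3_000_000))  # → 50.0
```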

Coders can be divided into two classes, lossy and lossless. Lossless compression is a bit-preserving, reversible method used to remove statistical redundancy. Its compression ratio is low, about 3:1 or 4:1 in the best case, but the reconstructed data is identical to the original. The lossy technique, by contrast, achieves high compression ratios of 50:1 to 200:1 or more; however, the reconstructed data is not the same as the original, causing a loss of information (H. Mukhtar, A. Al-Dweik, M. Al-Mualla, 2016).
Block matching algorithms can be compared in terms of accuracy, computational complexity, processing time, and distortion level. The surveyed block-matching-based video compression techniques are summarized in Table 1.

Ref. | Technique | Details
---- | --------- | -------
D. Jha, F. Kannampuzha, J. Joseph (2013) | Wavelet based | Hybrid video compression algorithm; adaptive motion compensation scheme; spatial orientation tree and modified zero-tree algorithms are also used.
Fabrizio, S. Dubuisson, D. Bereziat (2012) | Particle swarm optimization | PSO approach used to achieve high accuracy in block matching.
J. Cai and D. Pan (2012) | Block matching using DCT & DWT | Develop a block matching algorithm with DCT & DWT.
Thomas and Varier (2012) | Wavelet domain | Develop motion estimation and compensation in the wavelet domain.

Table 1: Summary of video compression techniques.
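The block matching idea common to the techniques above can be sketched as an exhaustive (full) search that minimizes the sum of absolute differences (SAD) between a block in the current frame and candidate blocks in a reference frame. This is an illustrative sketch of the generic approach, not any specific surveyed algorithm:

```python
import numpy as np

def full_search(ref, target_block, top, left, radius):
    """Exhaustive block matching: find the motion vector (dy, dx) within
    +/- radius pixels of (top, left) that minimizes the sum of absolute
    differences (SAD) against the reference frame."""
    bh, bw = target_block.shape
    best_sad, best_mv = float("inf"), (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + bh > ref.shape[0] or x + bw > ref.shape[1]:
                continue
            cand = ref[y:y + bh, x:x + bw].astype(int)
            sad = np.abs(cand - target_block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

Fast algorithms such as those surveyed reduce the number of candidate positions visited; the full search above is the accuracy baseline they are compared against.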
• Deep Learning Approaches for Video Compression: A Bibliometric Analysis
What is Compression?
Compression is the reversible coding of data containing redundant bits; it allows data storage and transmission to be done more efficiently. Decompression is the inverse process. Software or hardware that encodes is called an encoder, and that which decodes is called a decoder. The combination of the two forms a codec, which should not be confused with the terms data container or compression algorithm.

Figure 1: Relation between codec, data containers and compression algorithms.

i. LOSSLESS AND LOSSY

Figure 2: Types of compression (lossy, lossless, and near-lossless), with the two coding stages of decorrelation and entropy coding.


Loss of information is especially damaging for text or binary files. Therefore, 100% data recovery is achieved through lossless methods. Compression algorithms often use statistical information to reduce redundancy; Huffman coding (Huffman, 1951) and run-length encoding are two popular examples that allow high compression ratios, depending on the data. Lossy compression does not allow exact recovery of the original data. Nevertheless, it can be used for data that is not very sensitive to loss and contains a lot of redundancy, such as images, videos, or sounds. Lossy compression allows higher compression ratios than lossless compression.
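The lossless guarantee can be demonstrated with run-length encoding, one of the two examples mentioned above. The sketch below (function names are illustrative) shows that decoding recovers the input exactly:

```python
def rle_encode(data: str) -> list:
    """Run-length encoding: collapse runs of repeated symbols into (symbol, count)."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs

def rle_decode(runs: list) -> str:
    """Inverse of rle_encode: reconstruction is exact (lossless)."""
    return "".join(ch * n for ch, n in runs)

# Exact recovery: decode(encode(x)) == x for any input.
assert rle_decode(rle_encode("aaaabbbcca")) == "aaaabbbcca"
```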

Choudhary, Patel, and Parmar (2015) state that the compression process is complex and is completed in two steps, namely decorrelation and entropy coding. Decorrelation removes redundancy between pixels using techniques such as run-length coding, prediction techniques, transformation techniques, or SCAN-language-based methodologies. Entropy coding eliminates coding redundancy. Entropy is the average number of bits required to represent a symbol. In coding, frequently used symbols are assigned fewer bits (values less than the entropy) and infrequently used symbols more bits (values greater than the entropy). This leads to variable-length codes (VLC).
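The entropy bound mentioned above can be computed directly with Shannon's formula; a small illustrative sketch:

```python
import math
from collections import Counter

def entropy_bits(symbols: str) -> float:
    """Shannon entropy: the average number of bits per symbol achievable
    by an ideal variable-length code for this symbol distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Four equally likely symbols need 2 bits each on average.
print(entropy_bits("abcd"))  # → 2.0
```

A VLC such as a Huffman code approaches this bound by giving short codewords to frequent symbols and long codewords to rare ones.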

Therefore, identifying intelligent models that can control data in various real-time applications is very important. For example, in a video call it may be more important to see the person or people than the other objects in the frame. Another example is a tennis match, where it is more important to maintain the quality of the players and the court than the distinguishing features of the audience. These examples suggest that machine learning approaches can meet such expectations. There are many machine learning algorithms for functions such as regression, classification, clustering, decision trees, extrapolation, and more. Machine learning trains algorithms to extract information from data in order to perform data-dependent tasks. In designing such algorithms, various approaches such as supervised learning, unsupervised learning, and reinforcement learning can be used (Van Eck, N.J.; Waltman, 2010).
• Joint video compression and encryption using parallel compressive sensing
and improved chaotic maps.

According to F. Liu and H. Koenig (2010), video encryption is an active area of research, being important in secure communication scenarios. In particular, the use of chaotic maps for video encryption has received much attention (H. Elkamchouchi, W.M. Salama, Y. Abouelseoud, 2015). Chaotic maps are well suited to generating pseudorandom sequences because they are highly unpredictable, sensitive to initial conditions and control parameters, and ergodic in nature. However, some common chaotic maps such as the Logistic, Tent, and Sine maps have narrow chaotic ranges, so the chaotic range must be widened for better security. Another common problem of video encryption methods is that the input video and the encrypted video are the same size.
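The Logistic map mentioned above iterates x ← r·x·(1−x) and behaves chaotically for r near 4. The sketch below (the parameter choices are illustrative, not the paper's improved map) derives a byte keystream from it and shows the sensitivity to the initial condition:

```python
def logistic_keystream(x0: float, r: float = 3.99, n: int = 16, burn_in: int = 100):
    """Generate n pseudorandom bytes from the logistic map x -> r*x*(1-x).
    Sensitive dependence on x0 makes the stream key-dependent."""
    x = x0
    for _ in range(burn_in):          # discard transient iterations
        x = r * x * (1 - x)
    stream = []
    for _ in range(n):
        x = r * x * (1 - x)
        stream.append(int(x * 256) % 256)   # quantize the state to a byte
    return stream

# Two nearly identical keys diverge after the burn-in and yield
# different keystreams.
a = logistic_keystream(0.123456789)
b = logistic_keystream(0.123456790)
```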

R. Ranjithkumar, D. Ganeshkumar, A. Suresh, and K. Manigandan (2019) encrypt video frames by exploiting only spatial redundancy, using techniques such as chaotic maps; this is frame-level video encryption. X. Zhang, S.H. Seo, and C. Wang (2018) also state that cellular automata and quantum walks are categorized as frame-by-frame video encryption techniques (A.A. Abd El-Latif, B. Abd-El-Atty, W. Mazurczyk, C. Fung, 2020). The method is divided into three subsections. The first subsection presents the theory of parallel compressive sensing and shows how it is applied to the method. The second subsection discusses the permutation of DCT coefficients, the generation of initial states and control parameters, the generation of the chaotic sequence, and the measurement matrix. The last subsection describes the steps of the method with the help of three algorithms, giving the complete architecture of the proposed combined video compression and encryption scheme.

To recover the original video frames, decryption is performed first, followed by inverse quantization and decompression using the recovery technique. In decryption, inverse permutation is performed first, followed by inverse substitution. The inverse quantization process then brings back the original DCT coefficients of the reference frame. The sparse data for both reference and non-reference frames is retrieved during decompression using the l1-magic package (E. Candes, J. Romberg, 2005). Then, for the reference frame, the inverse of the permutation is applied so that the DCT coefficients return to their original positions, and a 2D IDCT is applied to the sparse data to obtain the reference frame. Similarly, after recovering the sparse data for non-reference frames, addition is performed between consecutive frames to recover them. The proposed method applies a new permutation before parallel compressive sensing using an improved chaotic map; this permutation enhances the average quality of the recovered video. The proposed substitution method and the data-dependent chaotic sequence generation used for shuffling make the video frames more secure. Experimental results and comparisons with state-of-the-art methods demonstrate the efficacy of the solution. Future work includes using different video codecs (R. Li, H. Liu, R. Xue, Y. Li, 2015) that consider the motion part for a joint compression and encryption strategy, and exploring different joint reconstruction algorithms for better recovery of the video.
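The quantization / inverse-quantization step in the recovery chain above can be sketched with a uniform quantizer. This is a generic illustration (the step size and coefficient values are invented; the paper's actual quantizer may differ):

```python
import numpy as np

def quantize(coeffs, step):
    """Uniform quantization of transform (e.g. DCT) coefficients - the lossy step."""
    return np.round(coeffs / step).astype(int)

def dequantize(indices, step):
    """Inverse quantization: rescale indices back to approximate coefficients."""
    return indices.astype(float) * step

coeffs = np.array([100.4, -3.2, 0.7, 12.9])
recovered = dequantize(quantize(coeffs, step=2.0), step=2.0)
# The reconstruction error per coefficient is bounded by half the step size,
# which is why a larger quantization step means coarser (but smaller) data.
```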
• Video Compression: Challenges and Opportunities
The rapid development of digital video communication goes hand in hand with progress in video coding techniques, leading to a high number of video applications such as High-Definition Television (HDTV), video conferencing, and real-time video transmission over multimedia networks. As demand for video increases significantly, its storage and manipulation in raw form become very expensive, and transmission time increases substantially (Khalifa and Dlay, 1998). Jeremiah (2004), Sullivan and Wiegand (2005), and a 2008 white paper state that digitizing analog sequences can require bit rates of up to 165 Mbps; transferring video data without compression therefore requires very high bandwidth (Khalifa, 2003). To avoid this problem, a family of techniques called video compression has been developed to reduce the number of bits required to represent digital video data while maintaining acceptable fidelity or video quality.

Data compression is possible because images are extremely data-intensive and contain a large amount of redundancy, which can be removed by applying some kind of transform with a reversible linear phase to de-correlate the image data pixels (Khalifa and Dlay, 1998). To understand video formats, the characteristics of video and how they are used in defining the format need to be understood. Video is a sequence of images displayed in order; each of these images is called a frame. Since we cannot notice small changes between frames, such as a slight difference in colour, video compression standards do not encode all the details in the video; some of the details are actually lost (Abomhara et al., 2010). This is called lossy compression, and it can achieve very high compression ratios. Some compression techniques, however, are reversible or non-destructive (Haseeb and Khalifa, 2006): the decompressed image is guaranteed to be identical to the original, an important requirement for applications where high quality is demanded. This is called lossless compression (Khalifa and Dlay, 1998, 1999). Typically, 30 frames are displayed on the screen every second, so a lot of information is repeated in consecutive frames. If a tree is displayed for one second, then 30 frames contain that tree; this redundancy can be exploited by defining frames based on previous frames. Frames can be compressed using only the information in that frame (intraframe) or using information in other frames as well (interframe).
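Interframe coding of only the changed pixels can be sketched with a toy delta coder (all names are illustrative; real codecs use motion compensation rather than per-pixel deltas):

```python
import numpy as np

def encode_delta(prev, curr, threshold=0):
    """Interframe sketch: store only the pixels that change between frames,
    as sparse (row, col, new_value) triples instead of a whole frame."""
    diff = curr.astype(int) - prev.astype(int)
    ys, xs = np.nonzero(np.abs(diff) > threshold)
    return [(int(y), int(x), int(curr[y, x])) for y, x in zip(ys, xs)]

def decode_delta(prev, deltas):
    """Rebuild the current frame from the previous frame plus the changes."""
    out = prev.copy()
    for y, x, v in deltas:
        out[y, x] = v
    return out
```

If the tree in the example above does not move for 30 frames, 29 of those frames encode to (nearly) empty delta lists.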

When used to convey multimedia transmissions, video streams contain a huge amount of data requiring large bandwidth and subsequent storage space. As a result, digital video is compressed to reduce its storage or transmission requirements. Video compression reduces redundancies in the spatial and temporal directions. Spatial reduction physically shrinks the video data by selectively discarding up to a fourth or more of the unneeded parts of the original data in a frame. Temporal reduction, also called inter-frame delta compression or motion compensation, significantly reduces the amount of data needed to store a video frame by encoding only the pixels that change between consecutive frames in a sequence. Standards such as the Moving Picture Experts Group (MPEG) family and the H.261, H.263, and H.264 standards are the most commonly used video compression techniques.

The first video compression standard to gain widespread acceptance was H.261. The H.261 and H.263 standards are suitable for carrying video over ISDN and are used for video delivery over low bandwidths (Marcel et al., 1997). The MPEG standards provide a range of compression formats suitable for applications that require higher bit rates. MPEG-1 provides compression for standard VHS-quality video. MPEG-2 meets the requirements of applications with bit rates of up to 100 Mbps and can easily cater for digital television broadcasting. Video compression is gaining popularity because storage and network bandwidth requirements can be reduced with compression. Many video compression algorithms, designed with different targets in mind, have been proposed.
CONCLUSION

This report has given an overview of recent learning-based video compression research. Learning-based modules have been successfully implemented in, and can improve the compression rate of, most existing conventional codecs, including H.264/AVC, H.265/HEVC, and H.266/VVC. Learning-based end-to-end approaches have achieved comparable distortion efficiency to, and better perceptual quality than, H.265/HEVC under some specific settings. However, no learning-based end-to-end method yet matches the distortion performance of conventional codecs at high bitrates. The advantages of learning-based video compression are mainly four-fold. First, since the learning-based model adapts to the content of a huge amount of training data, it can surpass handcrafted modules on specific tasks. Second, unlike conventional codecs, learning-based models usually explore a large receptive field in both the spatial and temporal domains and therefore provide a more accurate prediction or latent distribution; this also helps the codec avoid blocking artifacts and remain flexible in temporal exploration. Third, because the learned modules can be linked directly, they can be optimized globally, a potential factor for further improvement of the rate-distortion trade-off and of specific human-vision tasks. Finally, the flexibility of learning-based methods allows them to quickly inherit the newest technology, extend the design, and transfer knowledge easily. Despite their compression-ratio performance, however, current learning-based video compression methods face obstacles that require further investigation, chief among them complexity and memory requirements: one of the major limitations of the learning-based approach compared to the conventional one is the enormous burden of computation and memory that current learning-based models require.
REFERENCES
[1] D. Jha, F. Kannampuzha, J. Joseph, S. Possa: Motion estimation algorithms for baseline profile of H.264 video codec, Int. J. Eng. Trends Technol. (IJETT) 4(4) (2013) 727–731.

[2] Fabrizio, S. Dubuisson, D. Bereziat, Motion compensation based on tangent distance prediction for video compression, J. Signal Process. Image Commun. 27(2) (2012) 153–171.

[3] J. Cai, D. Pan, On fast and accurate block-based motion estimation algorithms using particle swarm optimization, Int. J. Inf. Sci. 197 (2012) 53–64.

[4] D.M. Thomas, S. Varier, A novel based approach for finding motion estimation in video compression, Int. J. Adv. Res. Comput. Commun. Eng. 1(8) (2012) 514–520.

[5] H. Mukhtar, A. Al-Dweik, M. Al-Mualla (2016): Content-aware and occupancy-based hybrid ARQ for video transmission, Abu Dhabi, United Arab Emirates.

[6] I.E.G. Richardson, The H.264 Advanced Video Compression Standard, second ed., John Wiley & Sons Inc., 2010.

[7] Huffman, D.A. (1951). A method for the construction of minimum redundancy codes. Proceedings of the Institute of Radio Engineers 40, pp. 1098–1101.

[8] van Eck, N.J. and Waltman, L. (2009). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), pp. 523–538. doi:10.1007/s11192-009-0146-3.

[9] Choudhary, S.M.; Patel, A.S.; Parmar, S.J. (2015). Study of LZ77 and LZ78 Data Compression Techniques. Int. J. Eng. Sci. Innov. Technol.

[10] F. Liu, H. Koenig, A survey of video encryption algorithms, Comput. Secur. 29(1) (2010) 3–15.

[11] H. Elkamchouchi, W.M. Salama, Y. Abouelseoud, New video encryption schemes based on chaotic maps, IET Image Process. 14(2) (2019) 397–406.

[12] R. Ranjithkumar, D. Ganeshkumar, A. Suresh, K. Manigandan, A new one round video encryption scheme based on 1D chaotic maps, in: 5th International Conference on Advanced Computing Communication Systems (ICACCS), 2019, pp. 439–444.

[13] X. Zhang, S.-H. Seo, C. Wang, A lightweight encryption method for privacy protection in surveillance videos, IEEE Access 6 (2018) 18074–18087.

[14] J. Sethi, J. Bhaumik, A.S. Chowdhury, Fast and secure video encryption using divide-and-conquer and logistic tent infinite collapse chaotic map, in: International Conference on Computer Vision and Image Processing, 2022, pp. 151–163.

[15] E. Candes, J. Romberg, l1-magic: recovery of sparse signals via convex programming, 4(14) (2005) 16.

[16] R. Li, H. Liu, R. Xue, Y. Li, Compressive-sensing-based video codec by autoregressive prediction and adaptive residual recovery, Int. J. Distrib. Sens. Netw. 11(8) (2015) 562840.

[17] Khalifa, O.O. and S.S. Dlay, 1998. Wavelets image data compression.
Proceedings of the IEEE International Symposium on Industrial Electronics, July
07, South Africa, pp: 407-410.

[18] Jeremiah, G., 2004. Comparing media codecs for video content. Proceedings of
the Embedded Systems Conference, (ESC`04), San Francisco, pp: 1-18.

[19] Khalifa, O.O., 2003. Image data compression in wavelet transform domain using
modified LBG algorithm. ACM Int. Conf. Proc. Ser., 49: 88-93

[20] Abomhara, M., O. Zakaria, O.O. Khalifa, A.A. Zaidan and B.B. Zaidan,
2010. Enhancing selective encryption for H.264/AVC using advance encryption
standard. Int. J. Comput. Electr. Eng., 2: 1793-8201.

[21] Haseeb, S. and O.O. Khalifa, 2006. Comparative performance analysis of image
compression by JPEG 2000: A case study on medical images. Inform. Technol.
J., 5: 35-39.

[22] Marcel, A., H.S. Cornelis and J.B. Ferderik, 1997. Low-bitrate video coding based upon geometric transformations. Proceedings of the ProRISC Workshop on Circuits, Systems and Signal Processing, 20 May 1998, South Africa, pp: 561–568.
