
DATA COMPRESSION

Data compression algorithms reduce the size of files while retaining all, or most, of the original information. They do this by eliminating redundant data or by re-encoding the data in a more efficient form.

When compressing data, you can choose between lossy and lossless methods. A lossy method permanently discards some of the data, while a lossless method preserves the original data exactly. Which type you choose depends on how much quality your files require.

In this article, you will find a mixture of lossless data compression algorithms and image and video
compression algorithms based on deep learning.

Lossless data compression algorithms are normally used for archiving and other high-fidelity purposes. They reduce the file size while ensuring that the files can be fully restored whenever they are needed.

LZ77

LZ77 was announced in 1977 and is regarded as the basis of many other lossless compression algorithms. It uses a "sliding window" method: it maintains a dictionary in which repeated data is represented by triples consisting of:

● Offset - the distance from the current position back to the start of the matching phrase in the window.
● Run-length - the number of characters that make up the phrase.
● Deviating character - a marker indicating a new phrase, i.e. the first character that does not match.

Together, a triple indicates that the phrase is identical to an earlier phrase and records the character, if any, that differs from it.
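
To make the triple representation concrete, here is a small decoder sketch in Python. The triples in the example are hand-made for illustration and do not follow any particular on-disk format; a `None` third element means the phrase ended without a deviating character.

```python
# Expand LZ77-style (offset, run-length, deviating character) triples
# back into text. Offsets count backwards from the current output position.
def lz77_decode(triples):
    out = []
    for offset, length, char in triples:
        start = len(out) - offset
        for k in range(length):          # copy `length` symbols from the window
            out.append(out[start + k])
        if char is not None:             # append the deviating character
            out.append(char)
    return "".join(out)

# "aacaacabcab" encoded as illustrative triples.
print(lz77_decode([(0, 0, "a"), (1, 1, "c"), (3, 4, "b"), (3, 3, None)]))
```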

LZR

LZR was announced in 1981 by Michael Rodeh, who later modified it. It can be used as an alternative to the LZ77 data compression algorithm, with the difference that references may point to any offset within the file rather than only within a sliding window. Used non-linearly in this way, however, it needs a significant amount of memory, which makes LZ77 a better option in many cases.
This data compression algorithm is straightforward to implement and has the potential for very
high performance when implemented on hardware.

LZW

LZW (Lempel-Ziv-Welch) is the algorithm used by the widely deployed Unix data compression utility compress and by the GIF image format. It became the first data compression algorithm in wide use on computers; a large English text file can typically be compressed with LZW to about half its original size.

LZSS

LZSS stands for Lempel-Ziv-Storer-Szymanski, and it was announced in 1982. It is a data compression algorithm that improves on LZ77 by adding a check of whether a substitution would actually decrease the file size. If it would not, the input is left in its original form. LZSS also drops the deviating character and uses only offset-length pairs.

This algorithm is mainly used for archiving files into formats such as RAR or ZIP, and for the compression of network data.

LZSS is a dictionary coding technique: it tries to replace a string of symbols with a reference to a dictionary location containing the same string. The key difference from LZ77 is that in LZ77 a dictionary reference could be longer than the string it replaced; in LZSS, such references are omitted when the match is shorter than the "break-even" point.
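
A toy Python encoder can make the break-even check concrete: it emits an (offset, length) reference only when the match is at least a minimum length, and otherwise keeps the literal byte. The window size, minimum match length, and tuple-based output are illustrative choices, not a real LZSS bitstream.

```python
# Greedy toy LZSS: search the sliding window for the longest earlier match;
# substitute a reference only if it is long enough to pay for itself.
def lzss_encode(data: bytes, window=4096, min_match=3):
    out, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # For simplicity this toy does not allow overlapping matches.
            while (i + length < len(data)
                   and data[j + length] == data[i + length]
                   and j + length < i):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:            # substitution shrinks the output
            out.append(("ref", best_off, best_len))
            i += best_len
        else:                                # below break-even: keep the literal
            out.append(("lit", data[i]))
            i += 1
    return out

print(lzss_encode(b"abcabcabcabc"))
# [('lit', 97), ('lit', 98), ('lit', 99), ('ref', 3, 3), ('ref', 6, 6)]
```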

Deflate

Deflate is a lossless data compression file format that uses a combination of LZSS and Huffman coding. It was designed by Phil Katz in 1993. Huffman coding, developed in 1952, is an entropy encoding algorithm that assigns codes to characters based on their frequency of occurrence.
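
Since Python's built-in zlib module implements Deflate, the LZSS-plus-Huffman combination described above can be tried in a few lines; the sample data and compression level are arbitrary.

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 100
compressed = zlib.compress(data, level=9)   # 9 = smallest output, slowest
restored = zlib.decompress(compressed)

assert restored == data
print(len(data), "->", len(compressed), "bytes")
```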

LZMA

LZMA stands for the Lempel-Ziv-Markov chain algorithm and was first released in 1998. It can be described as a modification of LZ77 made for the 7-Zip archiver and its .7z format. It uses a chain of compression steps, applying a modified LZ77 at the bit level rather than the byte level, and the output is then further processed with arithmetic coding for additional compression.

The performance of the other compression steps depends on the exact implementation. This algorithm uses a dictionary compression scheme very similar to the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977. It features a high compression ratio and a variable dictionary size, while maintaining a decompression speed similar to other commonly used compression algorithms.
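
Python's built-in lzma module exposes the algorithm directly; in the sketch below, FORMAT_ALONE selects the legacy stand-alone .lzma container, and the preset and sample data are arbitrary choices.

```python
import lzma

data = b"the same text repeated many times " * 200
compressed = lzma.compress(data, format=lzma.FORMAT_ALONE, preset=9)
restored = lzma.decompress(compressed)

assert restored == data
print(len(data), "->", len(compressed), "bytes")
```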

LZMA2

LZMA2 was released in 2009 as a refinement of LZMA. It improves LZMA's performance through better multithreading support and improved handling of incompressible data. It is a simple container format that can hold both uncompressed data and LZMA data, possibly with several different sets of LZMA encoding parameters.

LZMA2 supports arbitrarily scalable multithreaded compression and decompression and efficient compression of data that is only partially compressible. However, in some situations it can be less safe and less efficient than LZMA; in short, it is useful, but not quite as robust as plain LZMA.
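
The .xz container written by Python's lzma module stores LZMA2 streams, so an explicit LZMA2 filter chain can be used to illustrate the format; the preset and payload below are arbitrary.

```python
import lzma

# An explicit filter chain: a single LZMA2 filter at the library's default preset.
filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

data = b"partially compressible payload " * 500
compressed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)

assert lzma.decompress(compressed) == data
```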

Multi-Layer Perceptron (MLP)-Based Compression

An MLP uses multiple layers of neurons to take in, process, and output data. It can be applied to dimensionality-reduction tasks as well as to data compression. The first MLP-based compression algorithm was developed in 1988, combining the already existing processes of:

● Binary coding - standard two-symbol coding.
● Quantization - mapping input from a continuous set to a discrete set.
● Spatial domain transformation - pixel-by-pixel changes to the data.

To determine the optimal binary code, the MLP algorithm feeds the outputs of the above processes into a decomposition neural network.

The algorithm was later refined with techniques that allow accurate approximation of data from neighboring data through backpropagation. It can be very useful for compressing image and video data.
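
The original 1988 scheme is not reproduced here; as a rough illustration of the idea (dimension reduction by a neural network trained with backpropagation), the NumPy sketch below trains a one-hidden-layer autoencoder that squeezes 64-value "patches" into an 8-value code. The patch size, code size, learning rate, and random training data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
patch_dim, code_dim, lr = 64, 8, 0.1

X = rng.random((1024, patch_dim))            # toy "patches" in [0, 1]
W_enc = rng.normal(0, 0.1, (patch_dim, code_dim))
W_dec = rng.normal(0, 0.1, (code_dim, patch_dim))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):
    code = sigmoid(X @ W_enc)                # compressed representation
    recon = code @ W_dec                     # reconstruction
    err = recon - X                          # gradient of the squared error

    # Backpropagation through decoder and encoder weights.
    grad_dec = code.T @ err / len(X)
    grad_code = (err @ W_dec.T) * code * (1 - code)
    grad_enc = X.T @ grad_code / len(X)

    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

# The low-dimensional `code` (after quantization and binary coding)
# is what would actually be stored or transmitted.
```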

DeepCoder - Deep Neural Network-Based Video Compression

DeepCoder is a Convolutional Neural Network (CNN)-based framework that offers an alternative to the video compression techniques we have long been using. The model uses separate CNNs for the predictive signal and the residual signal.

It encodes the feature maps into a binary stream using scalar quantization and the venerable Huffman encoding. The authors claim that this model delivers performance superior to the well-known H.264/AVC video coding standard.

Convolutional Neural Network (CNN)-Based Compression

CNNs are neural networks made up of multiple layers and are used mainly for image recognition and feature detection. When applied to compression, these networks use convolution to model the correlation between neighboring pixels. Compared with MLP-based algorithms, CNNs achieve better compression results.

They also provide improved super-resolution performance and artifact reduction. In addition, CNN-based data compression algorithms improve the quality of JPEG images by increasing the peak signal-to-noise ratio and the structural similarity.

CNN-based compression can also match the performance of the High Efficiency Video Coding (HEVC) standard with the help of entropy estimation. It can likewise be very useful for compressing ordinary files.

Generative Adversarial Network (GAN)-Based Compression

GANs are a class of neural networks in which two networks compete with each other in order to produce more accurate analyses and predictions. GAN-based compression algorithms were first developed in 2017. They can produce files about two and a half times smaller than traditional commonly used methods such as JPEG or WebP.

GAN-based algorithms can be used for real-time compression when combined with parallel processing. The algorithm works by compressing images based on their most closely matching features.

During decoding, the images are reconstructed from the predictions made with those features. Compared with CNN-based compression, GAN-based compression produces noticeably higher-quality images by training with an adversarial loss.

Prediction by partial matching (PPM)

PPM is an adaptive statistical data compression technique based on context modeling and
prediction. These models use a set of previous symbols in the uncompressed symbol stream to
predict the next symbol in the stream. PPM algorithms can also be used to cluster data into
predicted groupings in cluster analysis.

The number of previous symbols, n, determines the order of the PPM model, which is denoted PPM(n). Unbounded variants, where the context has no length limitation, also exist and are denoted PPM*. If no prediction can be made based on all n context symbols, a prediction is attempted with n − 1 symbols. This process is repeated until a match is found or no more symbols remain in context. At that point, a fixed prediction is made.

PPM compression implementations vary greatly in other details. The actual symbol selection is
usually recorded using arithmetic coding, though it is also possible to use Huffman encoding or
even some type of dictionary coding technique.
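
The context-fallback idea can be sketched without a full arithmetic coder: the toy Python model below keeps symbol counts per context and backs off to shorter contexts when the current one has never seen a symbol. The class name, order, and sample string are made up for the example.

```python
from collections import defaultdict

class ToyPPM:
    def __init__(self, order=2):
        self.order = order
        # counts[context][symbol] -> how often `symbol` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, history):
        """Counts from the longest known context, backing off n, n-1, ..., 0;
        an empty dict corresponds to the final fixed prediction."""
        for n in range(self.order, -1, -1):
            if len(history) < n:
                continue
            ctx = tuple(history[-n:]) if n else ()
            if self.counts[ctx]:
                return dict(self.counts[ctx])
        return {}

    def update(self, history, symbol):
        for n in range(self.order, -1, -1):
            if len(history) < n:
                continue
            ctx = tuple(history[-n:]) if n else ()
            self.counts[ctx][symbol] += 1

model, history = ToyPPM(order=2), []
for ch in "abracadabra":
    model.predict(history)       # a real coder would turn this into code lengths
    model.update(history, ch)
    history.append(ch)
```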

Run-length encoding (RLE)

RLE is a form of lossless data compression in which runs of data (sequences in which the same
data value occurs in many consecutive data elements) are stored as a single data value and count,
rather than as the original run. This is most useful on data that contains many such runs.

It works well, for example, on simple graphic images such as icons, line drawings, Conway’s Game of Life patterns, and animations. It is not useful on files that don’t have many runs, as it could greatly increase the file size.
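
A minimal encoder/decoder pair makes this concrete; the list of (symbol, count) pairs below is just one possible representation, not any specific file format.

```python
def rle_encode(s):
    runs, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        runs.append((s[i], j - i))       # one (value, run length) pair per run
        i = j
    return runs

def rle_decode(runs):
    return "".join(sym * count for sym, count in runs)

runs = rle_encode("WWWWWWWWWWWWBWWWWWWWWWWWWBBB")
print(runs)                              # [('W', 12), ('B', 1), ('W', 12), ('B', 3)]
assert rle_decode(runs) == "WWWWWWWWWWWWBWWWWWWWWWWWWBBB"
```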

RLE may also refer to an early graphics file format supported by CompuServe for compressing black-and-white images, which was widely supplanted by their later Graphics Interchange Format (GIF). RLE also refers to a little-used image format in Windows 3.x, with the extension .rle, which is a run-length encoded bitmap used to compress the Windows 3.x start-up screen.

bzip2

bzip2 is a free and open-source data compression program that uses the Burrows-Wheeler algorithm. It compresses only single files and is not a file archiver. It was developed by Julian Seward and is maintained by Federico Mena. Seward made the first public release of bzip2, version 0.15, in July 1996. The compressor’s stability and popularity grew over the next several years, and Seward released version 1.0 in late 2000.

Following a nine-year hiatus of updates since 2010, Federico Mena accepted maintainership of the bzip2 project on June 4, 2019. bzip2 compresses data in blocks of between 100 and 900 kB.

It uses the Burrows-Wheeler transform to convert frequently recurring character sequences into strings of identical letters. It then applies a move-to-front transform and Huffman coding. bzip2’s ancestor bzip used arithmetic coding instead of Huffman; the change was made because of a software patent restriction.
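
Python's built-in bz2 module wraps libbzip2, so the whole Burrows-Wheeler, move-to-front, Huffman pipeline can be exercised in a couple of lines; compresslevel 1 to 9 selects the 100 kB to 900 kB block size mentioned above, and the sample data is arbitrary.

```python
import bz2

data = b"banana " * 10_000                    # highly repetitive toy input
compressed = bz2.compress(data, compresslevel=9)

assert bz2.decompress(compressed) == data
print(len(data), "->", len(compressed), "bytes")
```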

Huffman coding

A Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. Such a code is found and used by means of Huffman coding, an algorithm developed by David A. Huffman while he was an Sc.D. student at MIT and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

The output from Huffman’s algorithm can be viewed as a variable-length code table for encoding a
source symbol (such as a character in a file). The algorithm derives this table from the estimated
probability or frequency of occurrence (weight) for each possible value of the source symbol.

As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols. Huffman’s method can be implemented efficiently, finding a code in time linear in the number of input weights if those weights are sorted.
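
The construction can be sketched with a binary heap of weighted subtrees. This is the standard textbook algorithm rather than any particular file format's variant, and the sample string is arbitrary; note that a single-symbol input degenerates to an empty code.

```python
import heapq, itertools
from collections import Counter

def huffman_code(text):
    """Map each symbol to a prefix-free bit string based on its frequency."""
    freq = Counter(text)
    tie = itertools.count(len(freq))          # breaks ties between equal weights
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)       # two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

text = "this is an example of a huffman tree"
codes = huffman_code(text)
encoded = "".join(codes[ch] for ch in text)
print(codes[" "], codes["e"], len(encoded), "bits")
```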

Zstandard

Zstandard (or zstd) is a lossless data compression algorithm developed by Yann Collet at Facebook.
Zstd is the reference implementation in C. Version 1 of this implementation was released as free
software on 31 August 2016.

Zstandard was designed to give a compression ratio comparable to that of the DEFLATE algorithm
(developed in 1991 and used in the original ZIP and gzip programs), but faster, especially for
decompression. It is tunable with compression levels ranging from negative 5 (fastest) to 22
(slowest in compression speed, but best compression ratio).

At its maximum compression level, zstd gives a compression ratio close to LZMA, LZHAM, and PPM. It performs better than LZA or bzip2. Zstandard reaches the current Pareto frontier, as it decompresses faster than any other currently available algorithm with a similar or better compression ratio.

Dictionaries can have a large impact on the compression ratio of small files, so Zstandard can use a
user-provided compression dictionary. It also offers a training mode, able to generate a dictionary
from a set of samples.
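
Assuming the third-party zstandard Python bindings (installed with pip install zstandard), basic use and dictionary training look roughly like this; the compression level, sample data, and dictionary size are illustrative.

```python
import zstandard as zstd

data = b'{"user": "example", "value": 42}' * 100
compressed = zstd.ZstdCompressor(level=19).compress(data)
assert zstd.ZstdDecompressor().decompress(compressed) == data

# Train a dictionary from many small samples to help compress tiny files.
samples = [b'{"user": "u%d", "value": %d}' % (i, i) for i in range(1000)]
dictionary = zstd.train_dictionary(16384, samples)
small = zstd.ZstdCompressor(dict_data=dictionary).compress(samples[0])
```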
