
WSU-DTC Campus

Dept. of Information Technology

Itec3121 - Multimedia Systems


Year: III, Semester: II

Chapter 7:
Lossless Compression Algorithms
Abraham Abayneh
abraham12abay@gmail.com

Contents of the chapter

• Introduction
• Basics of Information Theory
• Run-Length Coding
• Variable-Length Coding (VLC)
• Dictionary Based Coding
• Huffman Coding
• Arithmetic Coding
• Lossless Image Compression

Introduction: Lossless Compression Algorithms

• Compression is the process of eliminating redundant information to decrease file size.
• Compression: the process of coding that effectively reduces the total number of bits needed to represent certain information.
• Compression converts frames and pixels into a compact encoded representation that the computer can store and transmit.
• Data compression refers to a technique where a large file is reduced to a smaller-sized file and can later be decompressed again into the original large file.
• Decompression converts the encoded representation back into frames and pixels for playback.

Lossless Compression

• In lossless compression, as the name suggests, the data are reconstructed after compression without errors, i.e., no information is lost.
• Lossless compression retains all of the data of the original file as it is converted to a smaller file size.
• Typical application domains where you do not want to lose information are the compression of text, program files, and fax documents.
 In lossless compression, the information is recovered without any alteration after the decompression stage.
 When a lossless file is opened, algorithms restore all compressed information, creating a duplicate of the source file.

Lossless Compression….

• Lossless compression is typically a process with three stages:

• The model: the data to be compressed are analyzed with respect to their structure and the relative frequency of the occurring symbols.

• The encoder: produces a compressed bit stream / file using the information provided by the model.

• The adaptor: uses information extracted from the data (usually during encoding) in order to adapt the model (more or less) continuously to the data (see the sketch below).
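As a rough illustration of how these three stages can interact, here is a minimal Python sketch; the names (AdaptiveFrequencyModel, encode) are purely illustrative assumptions and not part of the slides or of any standard.

```python
from collections import Counter

class AdaptiveFrequencyModel:
    """Toy model stage: tracks the relative frequency of symbols seen so far."""

    def __init__(self):
        self.counts = Counter()

    def probability(self, symbol):
        total = sum(self.counts.values())
        # Before any data has been seen there is no estimate yet.
        return self.counts[symbol] / total if total else 0.0

    def update(self, symbol):
        # Adaptor stage: fold each newly seen symbol back into the model.
        self.counts[symbol] += 1


def encode(data):
    """Encoder stage (schematic): consult the model, then let the adaptor update it."""
    model = AdaptiveFrequencyModel()
    for symbol in data:
        p = model.probability(symbol)  # a real encoder would map p to a code word
        model.update(symbol)
        yield symbol, p


for sym, p in encode("aabab"):
    print(sym, round(p, 2))
```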

What is a lossless compression algorithm?

• Lossless compression is a form of data compression that reduces file sizes without sacrificing any information in the process, meaning it will not diminish the quality of your photos.
• Although there are various compression methods, including Motion JPEG, only MPEG-1 and MPEG-2 are internationally recognized standards for the compression of moving pictures (video).
• The main advantages of compression are reductions in storage hardware, data transmission time, and communication bandwidth.
• This can result in significant cost savings.
• Compressed files require significantly less storage capacity than uncompressed files, meaning a significant decrease in storage expenses.
Information theory
• Information theory is defined as the study of efficient coding and its consequences.
• It is the field of study concerned with the storage and transmission of data.
• It is concerned with source coding and channel coding.
– Source coding: involves compression.
– Channel coding: how to transmit data, how to overcome noise, etc.
• Entropy is the measure of the information content in a message.
• Data compression may be viewed as a branch of information theory in which the primary objective is to minimize the amount of data to be transmitted.

• Information theory examines the utilization, processing, transmission, and extraction of information.
• In the scenario of information communication over noisy channels, this theory was formalized by Claude Shannon in 1948 in his work A Mathematical Theory of Communication.
• In his work, information is considered as a set of possible messages.
• The main goal is to transfer these messages over noisy channels and to have the receiving device reconstruct the message with negligible error probability, regardless of the channel noise.
• The main result of Shannon’s work is the noisy-channel coding theorem.

 According to the famous scientist Claude E. Shannon of Bell Labs, the entropy H(S) of an information source with alphabet S = {s1, s2, …, sn} is defined as

H(S) = Σ_{i=1}^{n} p_i log2(1/p_i) = − Σ_{i=1}^{n} p_i log2 p_i

 where p_i is the probability that symbol s_i in S will occur.
 The term log2(1/p_i) indicates the amount of information (the so-called self-information defined by Shannon) contained in s_i, which corresponds to the number of bits needed to encode s_i.
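As a small, self-contained illustration (not from the slides), the entropy of a message's empirical symbol distribution can be computed directly:

```python
import math
from collections import Counter

def entropy(message):
    """Shannon entropy H(S) = -sum(p_i * log2 p_i) of the symbols in a message."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A source that always emits the same symbol carries no information,
# while a uniform source over 4 symbols needs 2 bits per symbol on average.
print(entropy("aaaa"))   # 0.0
print(entropy("abcd"))   # 2.0
```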

7.3. The algorithms used in lossless compression are:
7.3.1. Run-Length Coding
Memoryless source: an information source that is independently distributed; namely, the value of the current symbol does not depend on the values of previously appeared symbols.
Instead of assuming a memoryless source, Run-Length Coding (RLC) exploits the memory present in the information source.
Rationale for RLC: if the information source has the property that symbols tend to form continuous groups (runs), then such a symbol and the length of its group can be coded together, as in the sketch below.
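A minimal run-length coder over a string might look like the following sketch; it illustrates the idea only and is not the exact scheme used by any particular file format.

```python
def run_length_encode(data):
    """Encode a sequence as (symbol, run length) pairs."""
    runs = []
    for symbol in data:
        if runs and runs[-1][0] == symbol:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([symbol, 1])  # start a new run
    return [(s, n) for s, n in runs]

def run_length_decode(runs):
    return "".join(symbol * length for symbol, length in runs)

pairs = run_length_encode("WWWWBBBW")
print(pairs)                          # [('W', 4), ('B', 3), ('W', 1)]
print(run_length_decode(pairs))       # 'WWWWBBBW'
```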
7.3.2. Variable-Length Coding (VLC)

Since the entropy indicates the information content in an information source S, it leads to a family of coding methods commonly known as entropy coding methods.
As described earlier, variable-length coding (VLC) is one of the best-known such methods.
Here, we will study the Shannon–Fano algorithm, Huffman coding, and adaptive Huffman coding.
Shannon–Fano Algorithm
 A top-down approach:
 Sort the symbols according to the frequency count of their occurrences.
 Recursively divide the symbols into two parts, each with approximately the same total count, until all parts contain only one symbol (a sketch follows below).
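A minimal Python sketch of this top-down splitting is shown below; the exact split point and tie-breaking differ between presentations, so the resulting codes may vary.

```python
def shannon_fano(freqs):
    """Return a {symbol: codeword} dict built by recursive frequency splitting."""
    symbols = sorted(freqs, key=freqs.get, reverse=True)
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return
        total, running, cut = sum(freqs[s] for s in group), 0, 1
        # Find the split point where both halves have roughly equal total counts.
        for i, s in enumerate(group[:-1], start=1):
            running += freqs[s]
            if running >= total / 2:
                cut = i
                break
        for s in group[:cut]:
            codes[s] += "0"
        for s in group[cut:]:
            codes[s] += "1"
        split(group[:cut])
        split(group[cut:])

    split(symbols)
    return codes

# Example with illustrative counts (as for the string "HELO" plus an extra L):
print(shannon_fano({"L": 2, "H": 1, "E": 1, "O": 1}))
```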
 Huffman Coding

A bottom-up approach
 Initialization: put all symbols on a list sorted according to their frequency counts.
 Repeat until the list has only one symbol left:
 From the list, pick the two symbols with the lowest frequency counts. Form a Huffman sub-tree that has these two symbols as child nodes and create a parent node.
 Assign the sum of the children’s frequency counts to the parent and insert it into the list such that the order is maintained.
 Delete the children from the list.
 Assign a code word to each leaf based on the path from the root (a sketch follows below).
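The bottom-up merging can be sketched compactly with a priority queue; this is a minimal illustration, and a production coder would also need to serialize the tree for the decoder.

```python
import heapq
from collections import Counter

def huffman_codes(message):
    """Build Huffman codes bottom-up: repeatedly merge the two lowest-count nodes."""
    counts = Counter(message)
    # Each heap entry: (count, tie_breaker, {symbol: code-so-far})
    heap = [(c, i, {sym: ""}) for i, (sym, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)    # two lowest-frequency subtrees
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))  # parent gets the summed count
        tie += 1
    return heap[0][2]

print(huffman_codes("HELLO WORLD"))
```

Prepending one bit at every merge is equivalent to reading off the path from the root to each leaf in the finished tree.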

Properties of Huffman Coding
Unique prefix property: no Huffman code is a prefix of any other Huffman code.
Optimality: Huffman coding is a minimum-redundancy code; it has been proved optimal for a given data model (i.e., a given, accurate probability distribution).
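The unique prefix property of any code table can be verified with a small check (an illustrative helper, not from the slides or any library): after sorting the code words, a prefix relation can only occur between neighbours.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of another (unique prefix property)."""
    words = sorted(codes.values())
    # After lexicographic sorting, any prefix relation must appear between neighbours.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_free({"a": "0", "b": "10", "c": "11"}))   # True
print(is_prefix_free({"a": "0", "b": "01", "c": "11"}))   # False: "0" prefixes "01"
```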

7.3.3 Dictionary-based Coding
The Lempel-Ziv-Welch (LZW) algorithm uses fixed-length code words to represent variable-length strings of symbols/characters that commonly occur together, e.g., words in English text.
The LZW encoder and decoder build up the same dictionary dynamically while receiving the data.
LZW places longer and longer repeated entries into a dictionary, and then emits the code for an entry, rather than the string itself, once the entry has already been placed in the dictionary (see the sketch below).
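A compact sketch of the LZW encoding loop is shown below, assuming single-byte symbols are pre-loaded as dictionary entries 0-255; real implementations add code-width management and dictionary-reset rules.

```python
def lzw_encode(text):
    """LZW: emit codes for the longest strings already present in the dictionary."""
    # Start with single-character entries (codes 0..255 for byte values).
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current, output = "", []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                # keep growing the match
        else:
            output.append(dictionary[current])
            dictionary[candidate] = next_code  # add the new, longer string
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

print(lzw_encode("ABABABA"))  # [65, 66, 256, 258]
```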

7.3.4. Arithmetic Coding
Arithmetic coding is a more modern coding method that usually outperforms Huffman coding in practice.
Huffman coding assigns each symbol a code word that has an integral bit length; arithmetic coding, in contrast, can treat the whole message as one unit.
A message is represented by a half-open interval [a, b), where a and b are real numbers between 0 and 1.
Initially, the interval is [0, 1). As the message becomes longer, the interval shortens and the number of bits needed to represent it increases.
 Suppose the alphabet is {A, B, C, D, E, F, $}, in which $ is a special symbol used to terminate the message, and the known probability distribution is as shown in the accompanying figure.

 A string of symbols CAEE$ is encoded as follows:

 Initially, low = 0, high = 1.0, and range = 1.0.

 After the first symbol C, Range_low(C) = 0.3 and Range_high(C) = 0.5, so low = 0 + 1.0 × 0.3 = 0.3 and high = 0 + 1.0 × 0.5 = 0.5.
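The interval-narrowing step can be sketched as below. Note that only C's range [0.3, 0.5) is stated here; the other entries in the table are assumed placeholders standing in for the distribution in the slide's figure.

```python
# Assumed cumulative ranges; only C's range [0.3, 0.5) is given in the text above,
# the remaining values are illustrative stand-ins for the slide's figure.
ranges = {
    "A": (0.0, 0.2), "B": (0.2, 0.3), "C": (0.3, 0.5),
    "D": (0.5, 0.55), "E": (0.55, 0.85), "F": (0.85, 0.9), "$": (0.9, 1.0),
}

def encode_interval(message):
    """Narrow [low, high) per symbol; any number in the final interval encodes the message."""
    low, high = 0.0, 1.0
    for symbol in message:
        r = high - low
        sym_low, sym_high = ranges[symbol]
        low, high = low + r * sym_low, low + r * sym_high
        print(f"after {symbol!r}: low={low:.6f}, high={high:.6f}")
    return low, high

encode_interval("CAEE$")   # first step reproduces low = 0.3, high = 0.5
```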
7.4. Lossless Image Compression

One of the most commonly used techniques in multimedia data compression is differential coding.
Given an original image I(x, y), using a simple difference operator we can define a difference image d(x, y) as follows:

d(x, y) = I(x, y) − I(x − 1, y)

Due to the spatial redundancy present in normal images I, the difference image d will have a narrower histogram and hence a smaller entropy, as shown in Fig. 7.9.
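This effect can be checked numerically. The sketch below builds a synthetic, smooth image (an assumption standing in for the image of Fig. 7.9) and compares the entropy of I with that of d:

```python
import numpy as np

def entropy_bits(values):
    """Entropy (bits/sample) of the empirical distribution of an array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# A smooth synthetic "image": neighbouring pixels are highly correlated.
x = np.arange(256)
image = np.tile(x, (64, 1)) + np.random.randint(0, 4, (64, 256))

# d(x, y) = I(x, y) - I(x - 1, y): horizontal difference image.
diff = np.diff(image, axis=1)

print("entropy of I:", entropy_bits(image))
print("entropy of d:", entropy_bits(diff))   # typically much smaller
```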

Lossless JPEG

 Lossless JPEG: a special case of the JPEG image compression standard.

 The predictive method
1. Forming a differential prediction: a predictor combines the values of up to three neighboring pixels as the predicted value for the current pixel, indicated by ‘X’ in Fig. 7.10. The predictor can use any one of the seven schemes listed in Table 7.6.

Fig. 7.10: Neighboring pixels for predictors in lossless JPEG.


2. Encoding: the encoder compares the prediction with the actual pixel value at the position ‘X’ and encodes the difference using one of the lossless compression techniques we have discussed, e.g., the Huffman coding scheme. A sketch of both steps follows.
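Below is a sketch of the predictive method, using the seven predictor schemes as they are commonly listed for lossless JPEG (A = left neighbour, B = neighbour above, C = upper-left neighbour of X); the integer-arithmetic details of the standard are simplified here.

```python
# A = left neighbour, B = neighbour above, C = upper-left neighbour of pixel X.
PREDICTORS = {
    1: lambda A, B, C: A,
    2: lambda A, B, C: B,
    3: lambda A, B, C: C,
    4: lambda A, B, C: A + B - C,
    5: lambda A, B, C: A + (B - C) // 2,
    6: lambda A, B, C: B + (A - C) // 2,
    7: lambda A, B, C: (A + B) // 2,
}

def predict_and_difference(A, B, C, X, scheme=4):
    """Step 1: form the prediction; step 2: the value to entropy-code is X - prediction."""
    prediction = PREDICTORS[scheme](A, B, C)
    return X - prediction   # this (usually small) difference is then Huffman-coded

# Example: a locally smooth neighbourhood gives a small residual.
print(predict_and_difference(A=100, B=102, C=101, X=103))   # 103 - 101 = 2
```

Scheme 4 models the neighbourhood as locally planar, which is why the residual in the example stays small.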

Thank you!!!!
