1-Data Compression-2022
Sid Lamrous
0001 0100 0011 1111 0101 0101 0101 0101 0101 0101
0101 0101 1110 1111 0000 1111 1111 1111 1111 1111
1111 1111 1111 1111 1111 1111 0110 0111 0111 0000
0111 0000 1010 1111 0000 0000 0001 1111 0001 1111
0001 1111
Let's start by writing a signaling byte; to make it easy to locate, we will write it in bold.
0000 0010
The first bit of the signaling byte indicates whether the following data byte repeats: it is set to 1 if it does, 0 if it does not.
If first bit = 0
The following 7 bits give the number of non-repeating bytes that follow
If first bit = 1
The following 7 bits give the number of repetitions of the byte that follows
Result:
0000 0010 0001 0100 0011 1111 1000 0100 0101 0101 0000 0010
1110 1111 0000 1111 1000 0101 1111 1111 0000 0001 0110 0111
1000 0010 0111 0000 0000 0010 1010 1111 0000 0000 1000 0011
0001 1111
This chain is 19 bytes, 2 fewer than the original chain,
giving a compression ratio of 21/19 ≈ 1.1
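The signaling-byte scheme above can be sketched in Python. This is a minimal sketch: the function name, the 0x80 bit mask, and the greedy run detection are my own implementation choices, not taken from the slide.

```python
def rle_encode(data: bytes) -> bytes:
    """Signaling-byte RLE: bit 7 of the signal byte means 'repeat',
    the low 7 bits carry a count (so runs are capped at 127)."""
    out = bytearray()
    i = 0
    while i < len(data):
        # measure the run of identical bytes starting at i
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 127:
            run += 1
        if run >= 2:
            out.append(0x80 | run)      # first bit = 1: repeated byte follows
            out.append(data[i])
            i += run
        else:
            # collect literal bytes until the next run of >= 2 starts
            start = i                   # first bit = 0: literal bytes follow
            while (i < len(data)
                   and (i + 1 == len(data) or data[i + 1] != data[i])
                   and i - start < 127):
                i += 1
            out.append(i - start)
            out.extend(data[start:i])
    return bytes(out)

# the 21-byte chain from the slide
chain = bytes([0x14, 0x3F, 0x55, 0x55, 0x55, 0x55, 0xEF, 0x0F, 0xFF, 0xFF,
               0xFF, 0xFF, 0xFF, 0x67, 0x70, 0x70, 0xAF, 0x00, 0x1F, 0x1F, 0x1F])
encoded = rle_encode(chain)             # 19 bytes, ratio 21/19 ≈ 1.1
```

Running this on the slide's chain reproduces the 19-byte result shown above.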
Is encoding all the values we have to store with the same number of bits
the best solution?
Theory tells us it is not!
Idea: use shorter codes for frequent values and reserve longer codes for less
frequent values. Example: the histogram of the image.
=> We will therefore focus on VLC (Variable-Length Codes), also called
entropy coding.
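Why variable-length codes can beat fixed-length ones can be made concrete with Shannon entropy, the theoretical lower bound on bits per symbol. A minimal sketch; the skewed distribution below is a made-up example, not the slide's image histogram:

```python
import math

def entropy(probs):
    # Shannon entropy in bits/symbol: H = -sum p(i) * log2 p(i)
    return -sum(p * math.log2(p) for p in probs if p > 0)

# hypothetical skewed distribution over 4 values
skewed = [0.5, 0.25, 0.125, 0.125]
fixed_cost = math.log2(len(skewed))   # fixed-length coding: 2 bits/value
lower_bound = entropy(skewed)         # entropy: 1.75 bits/value
```

For a uniform distribution the entropy equals the fixed-length cost, so nothing can be gained; the more skewed the histogram, the larger the potential saving.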
Compression ratio = 144/118 ≈ 1.22
MITGL01 / Course 1 by Sid Lamrous / Data Compression
Huffman Algorithm
Principle:
An element i is represented by a sequence of bits whose length decreases
with its probability of appearance p(i) (roughly -log2 p(i) bits).
The problem is to define a variable-length code in which no codeword
is the prefix of another (a prefix code).
The method builds a tree from the probabilities of appearance
of the elements.
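The tree construction can be sketched with a min-heap of subtrees: repeatedly merge the two least probable subtrees, prefixing their codes with 0 and 1. A minimal sketch; the probability values at the bottom are hypothetical, not the slide's example:

```python
import heapq

def huffman_codes(probs):
    """Build a prefix code from {symbol: probability}: repeatedly merge
    the two least probable subtrees. Returns {symbol: bit string}."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)          # keeps tuple comparison away from the dicts
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c0.items()}
        merged.update({s: "1" + code for s, code in c1.items()})
        heapq.heappush(heap, (p0 + p1, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# hypothetical probabilities (not the slide's example)
codes = huffman_codes({"A": 0.5, "B": 0.25, "C": 0.15, "D": 0.10})
```

The most probable symbol gets the shortest codeword, and no codeword is a prefix of another, so a coded stream decodes unambiguously.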
Example :
Let the following elements be coded, with the following probabilities:
■ Message: ABRACADABRA
■ Each character is assigned a representation
interval. The way the intervals are allocated has no
influence on the compression / decompression
of the message
» But beware! Decompression must use the same table
that was used for compression
New_High_Limit =
Old_Low_Limit + Range * High_value(c)
New_Low_Limit =
Old_Low_Limit + Range * Low_value(c)
where Range = Old_High_Limit - Old_Low_Limit
Decoding: Value =
(Value - Low_value(c)) / (High_value(c) - Low_value(c))
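These interval updates can be sketched with exact rationals to sidestep floating-point rounding. A minimal sketch: `build_table`, the frequency-based intervals, and the use of the classic ABRACADABRA message are illustrative assumptions, not the slide's exact table.

```python
from fractions import Fraction

def build_table(message):
    """Assign each character a sub-interval [low, high) of [0, 1),
    with width equal to its relative frequency in the message."""
    freq = {}
    for c in message:
        freq[c] = freq.get(c, 0) + 1
    table, cum = {}, Fraction(0)
    for c in sorted(freq):
        p = Fraction(freq[c], len(message))
        table[c] = (cum, cum + p)
        cum += p
    return table

def arith_encode(message, table):
    low, high = Fraction(0), Fraction(1)
    for c in message:
        rng = high - low              # Range = Old_High - Old_Low
        c_low, c_high = table[c]
        high = low + rng * c_high     # New_High = Old_Low + Range * High_value(c)
        low = low + rng * c_low       # New_Low  = Old_Low + Range * Low_value(c)
    return low                        # any value in [low, high) identifies the message

def arith_decode(value, table, length):
    out = []
    for _ in range(length):
        for c, (c_low, c_high) in table.items():
            if c_low <= value < c_high:       # find the interval containing value
                out.append(c)
                value = (value - c_low) / (c_high - c_low)
                break
    return "".join(out)

msg = "ABRACADABRA"
table = build_table(msg)
```

The decoder must receive the message length (or a terminator symbol) along with the value, since every prefix of the message also maps to a valid interval.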
LZW
"Lempel-Ziv-Welch"
Compression of the message:
"DU CODAGE AU DECODAGE"
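The textbook LZW scheme can be sketched as follows, run here on the slide's message. A minimal sketch: the dictionary starts with the 256 single-byte characters, and the function names are my own.

```python
def lzw_encode(text):
    """Emit dictionary codes; each miss adds the new phrase w+c."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w, out = "", []
    for c in text:
        wc = w + c
        if wc in dictionary:
            w = wc                      # keep extending the current phrase
        else:
            out.append(dictionary[w])   # emit code for the longest known phrase
            dictionary[wc] = next_code  # learn the new phrase
            next_code += 1
            w = c
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(code_list):
    """Rebuild the same dictionary on the fly while decoding."""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[code_list[0]]
    out = [w]
    for code in code_list[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                           # KwKwK case: code not yet in dictionary
            entry = w + w[0]
        out.append(entry)
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)

text = "DU CODAGE AU DECODAGE"
codes = lzw_encode(text)
```

The message's repeated substrings ("U ", "CO", "DA", "GE") are emitted as single codes the second time they occur, so the 21-character message needs fewer than 21 codes, and the decoder reconstructs it without ever receiving the dictionary.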