IC23 Unit 06 Script
Recently on Image Compression ...
What did we learn in the last learning unit?
Dictionary coding: Assign codes to large words.
Open Questions:
Can we combine the benefits of adaptive and higher-order coding?
Outline
Learning Unit 06:
PPM and PAQ
Contents
1. Prediction by Partial Matching
2. PAQ
3. Comparison of Compression Algorithms
© 2023 Christian Schmaltz, Pascal Peter
Shannon’s Prediction Experiments
Goal: predicting English sentences with human predictors (his wife Mary)
• emitter and receiver both have someone who can make the same guesses
• experimental limits for coding efficiency based on predictions
We will repeat some of their experiments in the live meetings.
Result: Humans are surprisingly good at predictions!
• Coding costs between 0.6 and 1.3 bits per letter on average.
• They can use grammar, semantic context, idioms, ...
• Not easily reproducible with a computer.
Towards a more General Prediction
Idea: We have already defined higher order coding!
• lower entropy due to conditional probabilities for already encoded symbols
Problems: |S|^k contexts of order k (≈ 1.1 · 10^12, e.g. for |S| = 256 and k = 5), huge overhead.
Idea: If the next symbol was not seen before in the current context, reduce the order.
Problem: The next symbol might not have occurred in the current context before.
Idea: Replace the NYT symbol by an escape symbol indicating that we should consider a smaller context.
For now, we fix the counter of the escape symbol to 1. (This will change later!)
Problem: A symbol that was never seen cannot be encoded at all.
Idea: Add a −1-th order context in which each symbol occurs with equal probability.
→ Combining all these ideas results in the prediction by partial matching (PPM) coding scheme.
Prediction by Partial Matching
Algorithm: k-th order PPM
S_C contains all symbols that have occurred in context C, plus the escape symbol ESC.
S_{−1} contains the symbols occurring in the −1-th order context.
For the next symbol X of the source word:
1. Set ℓ ← k.
2. If ℓ = −1, set P(X) = 1/|S_{−1}| and go to 4.
3. Check if X has been seen before in the ℓ-th order context C:
   • YES: Set P(X) = c_{X,C} / Σ_{Y∈S_C} c_{Y,C}.
   • NO: Encode ESC with P(ESC) = c_{ESC,C} / Σ_{Y∈S_C} c_{Y,C}, set ℓ ← ℓ − 1 and go to 2.
4. Encode X with P(X), update the counters for all relevant contexts.
5. Encode the next symbol (or terminate if none are left).
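To make the procedure concrete, here is a minimal Python sketch of the probability assignment (PPMa variant with the escape counter fixed to 1, including the exclusion principle discussed on the following slides). The function name, the Fraction-based interface and the plain string contexts are illustrative assumptions; a real implementation would feed these probabilities into an arithmetic coder.

```python
from collections import defaultdict
from fractions import Fraction

def ppm_steps(message, alphabet, k):
    """Return, for every symbol of `message`, the list of (code, probability)
    pairs that an arithmetic coder would have to encode."""
    counts = defaultdict(lambda: defaultdict(int))   # counts[context][symbol]
    all_steps = []
    for i, x in enumerate(message):
        steps = []
        excluded = set()           # exclusion principle: symbols ruled out by escapes
        order = min(k, i)          # a context cannot be longer than the encoded prefix
        while True:
            if order == -1:
                # -1-th order context: uniform over the remaining symbols
                candidates = [s for s in alphabet if s not in excluded]
                steps.append((x, Fraction(1, len(candidates))))
                break
            ctx = message[i - order:i]
            if ctx not in counts:              # context never seen: nothing to encode
                order -= 1
                continue
            usable = {s: c for s, c in counts[ctx].items() if s not in excluded}
            if not usable:                     # only excluded predictions: escape is certain
                order -= 1
                continue
            total = sum(usable.values()) + 1   # +1 is the fixed ESC counter (PPMa)
            if x in usable:
                steps.append((x, Fraction(usable[x], total)))
                break
            steps.append(("ESC", Fraction(1, total)))
            excluded.update(usable)            # exclude the predictions just ruled out
            order -= 1
        for o in range(min(k, i) + 1):         # update counters of all relevant contexts
            counts[message[i - o:i]][x] += 1
        all_steps.append(steps)
    return all_steps

# example from the following slides: encoding B after BANANA with 5th order PPM
print(ppm_steps("BANANAB", alphabet="ABNOT", k=5)[-1])
# [('ESC', Fraction(1, 2)), ('B', Fraction(1, 5))]
```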
Prediction by Partial Matching
Example – 5th order PPM
Input word: BANANABOAT. Already encoded: BANAN. Next symbol: A
The order 5 context BANAN has not been seen previously, so nothing needs to be encoded.
The order 4 context ANAN has not been seen previously.
The order 3 context NAN has not been seen previously.
The order 2 context AN was seen once, followed by A. Thus, the counter for the symbol A (and for the escape symbol) is 1, and all other counters are zero. The symbol “A” can therefore be encoded with P(A) = 1/2 (using arithmetic coding).
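For comparison, the ppm_steps sketch from the algorithm slide reproduces exactly this step:

```python
print(ppm_steps("BANANA", alphabet="ABNOT", k=5)[-1])   # [('A', Fraction(1, 2))]
```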
Prediction by Partial Matching
Example – 5th order PPM
Input word: BANANABOAT. Already encoded: BANANA. Next symbol: B
The order 5 context ANANA has not been seen previously.
The order 4 context NANA has not been seen previously.
The order 3 context ANA has been seen once before, followed by N. The symbol we want to encode is different, so we encode an “escape” symbol with P(ESC) = 1/2.
The order 2 context NA was seen once, followed by N. The symbol we want to encode is different, so we encode an “escape” symbol with P(ESC) = 1/2.
The order 1 context is A. This was seen twice, in both cases followed by N. The symbol we want to encode is different, so we encode an “escape” symbol with P(ESC) = 1/3.
The order 0 context was seen 6 times with values B (once), A (3 times) and N (twice). Thus, the symbol B can be encoded with P(B) = 1/7.
Input word: BANANABOAT. Already encoded: BANANAB. Next symbol: O
The order 5-2 contexts have not been seen previously.
The order 1 context is B. This was seen once, followed by an “A”. The symbol we want to encode is different, so we encode an “escape” symbol with P(ESC) = 1/2.
The order 0 context was seen 7 times with values B (twice), A (3 times) and N (twice). The symbol we want to encode is different, so we encode an “escape” symbol with P(ESC) = 1/8.
In the order −1 context, as all symbols are equally likely, there is no escape symbol. The symbol O can be encoded with P(O) = 1/5.
• Input word: BANANABOAT. Already encoded: BANANA. Next symbol: B
• The order 3 context ANA has been seen once before, followed by N. The symbol we want to encode is different, so we encode an “escape” symbol with P(ESC) = 1/2.
• The order 2 context NA was seen once, followed by N. The symbol we want to encode is different, so we encode an “escape” symbol with P(ESC) = 1/2.
By sending the first escape symbol, the decoder already knows that the next symbol is not an N. Thus, N no longer needs to be considered in the lower-order contexts.
This exclusion principle is used by PPM to improve the compression ratios.
Input word: BANANABOAT. Already encoded: BANANA. Next symbol: B (this time with the exclusion principle)
The order 5-4 contexts have not been seen previously.
The order 3 context ANA has been seen once before, followed by N. The symbol we want to encode is different, so we encode an “escape” symbol with P(ESC) = 1/2.
The order 2 context NA was seen once, followed by N. Since N no longer needs to be considered, we exclude it. Since there are no other predictions, there is nothing to encode.
The order 1 context is A. This was seen twice, in both cases followed by N, which is excluded. Again, there is nothing to encode.
The order 0 context was seen 6 times with values B (once), A (3 times) and N (twice). Excluding N, the symbol B can be encoded with P(B) = 1/5.
Input word: BANANABOAT. Already encoded: BANANAB. Next symbol: O (with the exclusion principle)
The order 5-2 contexts have not been seen previously.
The order 1 context is B. This was seen once, followed by an A. The symbol we want to encode is different, so we encode an “escape” symbol with P(ESC) = 1/2. A is now excluded.
The order 0 context was seen 7 times with values B (twice), A (3 times) and N (twice). Excluding A, the symbol we want to encode is still different, so we encode an “escape” symbol with P(ESC) = 1/5. B and N are now excluded as well.
In the order −1 context, as symbols are equally likely, there is no escape symbol. Since A, B, and N are excluded, we can encode the symbol O with P(O) = 1/2, assuming S = {A, B, N, O, T} is known.
Total probability of symbol B after encoding BANANA:
• with exclusion principle: 1/2 · 1/5 = 1/10
• without exclusion principle: 1/2 · 1/2 · 1/3 · 1/7 = 1/84
⇒ log2(84/10) ≈ 3.07 bits saved through the exclusion principle.
Total probability of symbol O after encoding BANANAB:
• with exclusion principle: 1/2 · 1/5 · 1/2 = 1/20
• without exclusion principle: 1/2 · 1/8 · 1/5 = 1/80
⇒ log2(80/20) = 2 bits saved through the exclusion principle.
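These savings can also be checked with a few lines of Python, using the step probabilities derived above:

```python
import math
from fractions import Fraction

# coding steps for B after BANANA
with_exclusion = Fraction(1, 2) * Fraction(1, 5)                                        # 1/10
without_exclusion = Fraction(1, 2) * Fraction(1, 2) * Fraction(1, 3) * Fraction(1, 7)   # 1/84
print(math.log2(with_exclusion / without_exclusion))    # ≈ 3.07 bits saved

# coding steps for O after BANANAB
print(math.log2((Fraction(1, 2) * Fraction(1, 5) * Fraction(1, 2)) /
                (Fraction(1, 2) * Fraction(1, 8) * Fraction(1, 5))))   # 2.0 bits saved
```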
Prediction by Partial Matching
Estimating the Probability of the Escape Symbol
So far, we used the PPMa algorithm, which fixes the ESC counter to 1.
However, many different approaches have been proposed.
Remark: In the literature, the variants might be named differently.
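As an illustration of such an alternative, the sketch below assumes the convention often called PPMC, in which the escape count equals the number of distinct symbols seen in the context; the function name and interface are illustrative, not the variant definition from the slides:

```python
from fractions import Fraction

def ppmc_probability(ctx_counts, symbol):
    """PPMC-style estimate (assumed convention: escape count = number of
    distinct symbols seen in the context, instead of the fixed 1 of PPMa)."""
    distinct = len(ctx_counts)                  # escape counter
    total = sum(ctx_counts.values()) + distinct
    if symbol in ctx_counts:
        return Fraction(ctx_counts[symbol], total)
    return Fraction(distinct, total)            # probability of the escape symbol

# order-0 counts after BANANA: with PPMC, P(ESC) = 3/9 = 1/3 instead of 1/7 with PPMa
print(ppmc_probability({"B": 1, "A": 3, "N": 2}, "ESC"))
```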
In general, larger contexts allow better predictions. However, this only holds if the context was seen sufficiently often.
Thus, choosing a context that is too large will result in many escape symbols.
[Figure: Bits required per symbol depending on the maximal context length. Source: J. Cleary and W. Teahan: Unbounded Length Contexts for PPM.]
Prediction by Partial Matching
PPM*
A context is called deterministic if it was always followed by the same symbol and seen at least once, i.e. if there is exactly one prediction.
PPM* algorithm:
• Look for the shortest matching deterministic context and try to encode the next symbol using this context.
• If that did not work, continue with a “normal” PPM algorithm, i.e. with a “standard” maximal context length.
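A minimal sketch of the first step, assuming a successors dictionary (context → set of symbols seen after it) is maintained while encoding; the names and the toy data are illustrative:

```python
def shortest_deterministic_context(history, successors):
    """Return (context, predicted symbol) for the shortest suffix of `history`
    that is a deterministic context, or None if no such context exists."""
    for length in range(1, len(history) + 1):
        ctx = history[len(history) - length:]
        seen = successors.get(ctx)
        if seen is not None and len(seen) == 1:   # seen before, exactly one prediction
            return ctx, next(iter(seen))
    return None

# toy usage with a few of the successor sets after encoding BANANA:
# the shortest deterministic suffix is "A", which was always followed by N
successors = {"B": {"A"}, "A": {"N"}, "N": {"A"}, "NA": {"N"}, "AN": {"A"}}
print(shortest_deterministic_context("BANANA", successors))   # ('A', 'N')
```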
Remarks
PPM algorithms are quite good, but also slow and memory consuming.
Often, PPM algorithms are used to estimate the next bit (instead of the complete next symbol). Then, the contexts are the last n bits.
PPM algorithms combining SEE (secondary escape estimation) and deterministic contexts (such as PPMz) achieve particularly good compression ratios.
Outline
Learning Unit 06:
PPM and PAQ
Contents
1. Prediction by Partial Matching
2. PAQ
3. Comparison of Compression Algorithms
© 2023 Christian Schmaltz, Pascal Peter
PAQ
Matt Mahoney, an IBM researcher, started examining the relations between machine learning and compression in 2002.
The result was the open source project PAQ.
PAQ is a highly evolved version of PPM with two core differences:
• predictions are made for single bits instead of complete symbols
• predictions from many different contexts are combined (context mixing)
To this day, PAQ is one of the top performers in the Hutter challenge.
PAQ
The Hutter Challenge
Prize money for a new record: Z · (L − S) / L
(S: new record, L: old record, Z: prize fund)
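As a worked example with purely hypothetical numbers: for a prize fund Z = 50,000 €, an old record of L = 16,000,000 bytes and a new record of S = 15,200,000 bytes, the payout would be Z · (L − S)/L = 50,000 · 800,000/16,000,000 = 2,500 €.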
Test set: 100 MB from Wikipedia
Rules
• memory limit: 1GB RAM, 10GB HDD during compression
• time limit: ≈ 8 hours on test machine, no GPU, single core
• standalone program with no additional input
You can still take part, but competition is fierce!
Context Mixing
The most important feature of PAQ is context mixing.
Motivation:
For the PPM algorithms, we always use consecutive previous symbols as contexts.
In some situations, other contexts are better:
• When storing an image, a good context is given by the neighbouring pixels that are already stored. These are not always the last encoded symbols.
• When storing the channels of an RGB image as R1G1B1 R2G2B2 . . ., we might want to use only values from the same colour channel as context.
• For text files, one might consider case-insensitive words as contexts.
Context Mixing
Idea: Allow non-consecutive contexts as arbitrary functions of the known data.
Problem: Different contexts can give different, possibly contradicting predictions.
Idea: Get a prediction from each context, and combine them into a single prediction.
→ The resulting encoding schemes are called context mixing algorithms.
Context Mixing
How to Combine Predictions?
Assume P(y = 0|A) and P(y = 0|B) are known. What is P(y = 0|A ∩ B)?
Problem: Mathematics alone does not answer this question.
Moreover, we might trust P(y = 0|A) more than P(y = 0|B), which should be taken into account.
Idea: Make an educated guess.
Context Mixing
Context Mixing by Weighted Averaging
Use variables n_{0,i} and n_{1,i} to count how often a 0 and a 1 have been encountered in the i-th context.
Furthermore, since some contexts are more useful than others, each context is assigned a weight w_i.
Then, the estimated probability p_0 that the next symbol is a zero is given by:
p_0 = S_0 / (S_0 + S_1),   S_j = ε + Σ_i w_i · n_{j,i},   j ∈ {0, 1},
where the sum S_j accumulates the weighted counts over all currently occurring contexts, and ε is a small number to prevent divisions by zero.
• S_0 is the weighted evidence for 0.
• S_1 is the weighted evidence for 1.
• S_0 + S_1 is the total evidence.
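A compact sketch of this estimate (the function name and the list-based interface are illustrative assumptions; PAQ itself works on scaled integer counts):

```python
def mix_predictions(weights, n0, n1, eps=1e-6):
    """Estimate P(next bit = 0) by weighted averaging of per-context counts."""
    s0 = eps + sum(w * c for w, c in zip(weights, n0))   # weighted evidence for 0
    s1 = eps + sum(w * c for w, c in zip(weights, n1))   # weighted evidence for 1
    return s0 / (s0 + s1)
```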
Context Mixing
Example
We consider two contexts: Context A was seen 9 times (3 times followed by a 0, 6 times by a 1), while context B was seen 3 times (always followed by a 0). Furthermore, we assume that w_1 = 1/3 and w_2 = 2/3.
Then, we estimate the probability that a 0 follows as:
p_0 = (ε + 1/3 · 3 + 2/3 · 3) / ((ε + 1/3 · 3 + 2/3 · 3) + (ε + 1/3 · 6 + 2/3 · 0))
    = (ε + 1/3 · 3 + 2/3 · 3) / (2ε + 1/3 · 9 + 2/3 · 3)
    = (ε + 3) / (2ε + 5) ≈ 3/5 = 60%
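Plugging the numbers of this example into the mix_predictions sketch from above reproduces the result:

```python
# context A: 3 zeros and 6 ones with weight 1/3; context B: 3 zeros and 0 ones with weight 2/3
p0 = mix_predictions(weights=[1/3, 2/3], n0=[3, 3], n1=[6, 0])
print(round(p0, 3))   # 0.6, i.e. the 60% computed above
```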
PAQ
Context Mixing in PAQ
Up to version 3, the PAQ compression algorithm uses a context mixing algorithm as described before to estimate the next bit.
Problem: The importance of different contexts might vary in different parts of a file, but the weights w_i are static.
Solution: In PAQ4 to 6, the weights are adapted dynamically during the coding process.
• The weights are updated by moving along the gradient of the coding cost in weight space:
  w_i ← max( 0, w_i + (x − p_1) · ((S_0 + S_1) · n_{1,i} − S_1 · n_i) / (S_0 · S_1) )
  where n_i = n_{0,i} + n_{1,i}, x is the bit that was actually observed, and p_1 = S_1 / (S_0 + S_1).
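A sketch of this adaptive update, using the same illustrative interface as the mixing sketch above (real PAQ uses fixed-point arithmetic):

```python
def update_weights(weights, n0, n1, x, eps=1e-6):
    """Adapt the context weights after the bit x (0 or 1) has been observed."""
    s0 = eps + sum(w * c for w, c in zip(weights, n0))
    s1 = eps + sum(w * c for w, c in zip(weights, n1))
    p1 = s1 / (s0 + s1)                       # predicted probability of a one
    for i in range(len(weights)):
        n_i = n0[i] + n1[i]
        grad = ((s0 + s1) * n1[i] - s1 * n_i) / (s0 * s1)
        weights[i] = max(0.0, weights[i] + (x - p1) * grad)
    return weights
```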
PAQ
PAQ 4-6
Additionally, whenever a bit is observed and the count for the opposite bit is more than 2, the excess is halved (to adapt to changing probabilities).
Example
If n_0 = 0, n_1 = 10 and several zeros are observed next, the following counters are used:
• After 1 zero:  n_0 = 1, n_1 = 10 − (10 − 2)/2 = 6
• After 2 zeros: n_0 = 2, n_1 = 6 − (6 − 2)/2 = 4
• After 3 zeros: n_0 = 3, n_1 = 4 − (4 − 2)/2 = 3
• After 4 zeros: n_0 = 4, n_1 = 3 − (3 − 2)/2 = 2 (rounded down)
• After 5 zeros: n_0 = 5, n_1 = 2
• After 6 zeros: n_0 = 6, n_1 = 2
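A small sketch of this counter update; the exact integer rounding is an assumption chosen such that it reproduces the table above:

```python
def observe_bit(n0, n1, bit):
    """Update the (n0, n1) counters of one context after observing `bit`."""
    if bit == 0:
        n0 += 1
        if n1 > 2:                    # halve the excess of the opposite counter
            n1 = 2 + (n1 - 2) // 2
    else:
        n1 += 1
        if n0 > 2:
            n0 = 2 + (n0 - 2) // 2
    return n0, n1

# reproduces the example: n1 goes 10 -> 6 -> 4 -> 3 -> 2 -> 2
n0, n1 = 0, 10
for _ in range(6):
    n0, n1 = observe_bit(n0, n1, 0)
    print(n0, n1)
```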
PAQ
PAQ - Version 7
Starting from version 7, neural networks are responsible for estimating weights.
Basic idea of simple neural networks with backpropagation: inputs are combined with learned weights to produce an output, and the weights are adjusted according to the error of that output.
The output is the prediction, which we can check against the actual next symbol.
As in previous versions of PAQ, we want to minimise the coding cost, which is done by adjusting the weights with gradient descent.
• We define the operations stretch and squash as:
  stretch(p) = ln( p / (1 − p) ),   squash(x) = stretch^{−1}(x) = 1 / (1 + e^{−x})
• The network inputs are the stretched probabilities t_i := stretch(p_{1,i}).
• The output probability is computed according to
  p_1 = squash( Σ_i w_i · t_i )
The gradient descent step with learning rate η and next bit x becomes
  w_i ← w_i + η · t_i · (x − p_1)
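A sketch of this logistic mixing step; the function names, the learning rate value and the toy inputs are illustrative assumptions:

```python
import math

def stretch(p):
    return math.log(p / (1.0 - p))

def squash(x):
    return 1.0 / (1.0 + math.exp(-x))

def mix_and_learn(probs, weights, x, eta=0.02):
    """probs: per-model estimates of P(bit = 1); x: the bit actually observed.
    Returns the mixed prediction and adapts the weights in place."""
    t = [stretch(p) for p in probs]                        # stretched inputs t_i
    p1 = squash(sum(w * ti for w, ti in zip(weights, t)))
    for i, ti in enumerate(t):                             # w_i += eta * t_i * (x - p1)
        weights[i] += eta * ti * (x - p1)
    return p1

# toy usage with two hypothetical model predictions
weights = [0.0, 0.0]
print(mix_and_learn([0.9, 0.3], weights, x=1), weights)
```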
PAQ
Types of Contexts in PAQ
k-th order contexts as in PPM: Consider the last k bytes.
sparse contexts: Do not consider continuous sequences of previous bytes, but only a selection of them.
word contexts for text: Consider whole words and convert them to lowercase.
2D-contexts for images and tables: Search for repeating byte patterns to convert the 1-D sequence into multiple rows. Also consider colour channels if available.
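A few such context functions as a small sketch; which bytes are selected and the assumed row width are illustrative choices:

```python
def order_k_context(history, k):
    """k-th order context: simply the last k bytes, as in PPM."""
    return tuple(history[-k:])

def sparse_context(history):
    """A sparse context: e.g. only the 2nd and 4th most recent bytes."""
    return (history[-2] if len(history) >= 2 else None,
            history[-4] if len(history) >= 4 else None)

def image_context(history, row_width):
    """2D context: the byte to the left and the byte one row above."""
    left = history[-1] if history else None
    above = history[-row_width] if len(history) >= row_width else None
    return (left, above)
```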
Specialised models exist for specific file formats, e.g.:
• x86 executables
• BMP, TIFF, or JPEG images
• WAV audio files
• ...
and many more.
PAQ
Dirty Tricks in PAQ
PAQ does not just use one neural network. Lower-order contexts determine which neural network out of hundreds should be used.
For the Hutter challenge (text-based):
• Dictionary-based preprocessing
• Replace common words with codes.
JPEG:
• e.g. undo the Huffman coding and model the quantised DCT coefficients directly
x86 executables:
• convert relative to absolute memory addresses
PAQ
Advantages of PAQ
(Currently) best compression ratios
Believed to be patent-free
Disadvantages of PAQ
Very slow
Outline
Learning Unit 06:
PPM and PAQ
Contents
1. Prediction by Partial Matching
2. PAQ
3. Comparison of Compression Algorithms
© 2023 Christian Schmaltz, Pascal Peter
Test files of many different types: images, log files, HTML files, MS Word files, source code, databases, Windows help files, precompressed chess databases, . . . (46 in total)
Allowed memory consumption: 800 MB
Comparison - Lossless Entropy Coding
The Competition
WinRK: commercial, discontinued, PPMd + PPMz + context modeling
NanoZip: free, discontinued, LZ + BWT + context modeling
GZIP: free, Deflate (LZ77 + Huffman)
ARJ32: commercial, now free under GPL, modified LZ77
Test                 Original size (bytes)   PAQ8PX size (bytes)   WinRK 3.1.2 size (bytes)
logfile                         20.617.071               257.193                   271.628
English text                     2.988.578               352.722                   330.571
sorted wordlist                  4.067.439               386.032                   393.704
OCX help file                    4.121.418               400.112                   415.522
MS-Word doc file                 4.168.192               482.864                   688.237
bitmap                           4.149.414               539.003                   569.053
jpg/jpeg                           842.468               637.124                   812.700
executable                       3.870.784               909.161                   896.365
dll (executable)                 3.782.416             1.292.869                 1.236.643
pdf                              4.526.946             3.556.044                 3.549.197
Comparison of the two best compression programs on different file types. Source: http://www.maximumcompression.com/data/summary sf.php (from 11.12.2012)
Summary
PPM combines the best of higher-order and adaptive coding.
PAQ goes even further:
• more complex contexts than PPM
• combines many contexts with context mixing
• drawback: very slow and complex
Outlook
So far, we have only looked at generic input data.
How to address the specific task of image compression?
References
M. Mahoney, Data Compression Explained, 2012. Available at http://mattmahoney.net/dc/dce.html (Explains PPM, context mixing, and PAQ)
K. Sayood, Introduction to Data Compression. Morgan Kaufmann, 2006. (Explains PPM)
J. Cleary and I. Witten, Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications, 1984. (Introduced PPM)
J. Cleary and W. Teahan, Unbounded length contexts for PPM. The Computer Journal, 1997. (Introduced PPM*)