Data Compression Intro


DATA COMPRESSION

Lecture By

Kiran Kumar KV PESSE

Block diagram of Data Compression

Unit 1 Chapter 1

INTRODUCTION TO LOSSLESS COMPRESSION

Preface
Introduction
Data
Need for Compression

Compression Techniques
Lossless and Lossy Compression

Performance Measure

Modeling and Coding


Problems

Introduction
The word data comes from the Latin "datum", meaning "something given". In geometry, mathematics, engineering, and so on, the terms given and data are used interchangeably. Data is also a representation of facts, figures and ideas. In computer science, data are numbers, words, images, etc., accepted as they stand.

Data(Analog)

The first images were sent across the Atlantic over a submarine telegraph cable in the 1920s.

1964 Lunar Probe

Data (Analog)
To: The Principal, College

Respected Sir,

Subject: Need a Heater in Class.

(Students, please fill in the other details.)

Yours Sincerely,
Faculty

Data(Digital World)

Raw Data  =>  Digital Data (0110110101...)

Difference b/w Data, Information and Knowledge?


Data is the lowest level of abstraction, information is the next level, and finally, knowledge is the highest level among all three. Data on its own carries no meaning. In order for data to become information, it must be interpreted and take on a meaning.
For example, the height of Mt. Everest is generally considered "data"; a book on Mt. Everest's geological characteristics may be considered "information"; and a report containing practical information on the best way to reach Mt. Everest's peak may be considered "knowledge".

Compression

What is the need of compression?

What are the different kinds of Compression?


Which one is better? Which technique is used more often? What is the use of combining both techniques?

What is the need for compression?

Need for Compression

Weather Forecasting

Internet data

Broadband

Need for Compression

Planning Cities

Introduction to Lossless Compression

COMPRESSION TECHNIQUES

Different kinds of compression

Loss-less compression

The compressed data can be reconstructed back into the exact original data.

Lossy Compression

Compressed data cannot be reconstructed back to the exact original data.

Loss-less Compression

Involves no loss of information. Typical area: text compression.


The reconstructed text must be identical to the original; even a one-character error can change the meaning completely:

"Do not send money" vs. "Do now send money"

Other Areas: Radiology, Satellite Imagery. Main Advantage? Zero Distortion

Main Disadvantage?

The amount of compression is less than what lossy compression can achieve.

Lossy Compression

Disadvantage:

Data that have been compressed using lossy techniques generally cannot be recovered or reconstructed exactly (some information is lost).

Advantage:

Much higher compression ratios than lossless compression.

Areas

Audio and Video Compression.


(MP3, MPEG, JPEG)

Introduction to Lossless Compression

MEASURE OF PERFORMANCE

How do we measure or quantify compression performance?

Based on:

1. The relative complexity of the algorithm.
2. The memory required to implement the algorithm.
3. How fast the algorithm performs on a given machine. (Secondary)

or

1. The amount of compression.
2. How closely the reconstruction resembles the original. (Primary)

Compression Ratio

The most widely used measure of how much the data has been compressed is the compression ratio: the ratio of the number of bits required to represent the data before compression to the number of bits required after compression. Example: suppose storing an image made up of a square array of 256 x 256 pixels requires 65,536 bytes, and the compressed version requires 16,384 bytes. The compression ratio is 4:1.

Another Measure

Rate: the average number of bits required to represent a single sample. Consider the last example: the 256 x 256 original image occupies 65,536 bytes, so each pixel takes 1 byte, i.e. 8 bits per pixel (sample). The compressed image occupies 16,384 bytes. How many bits does each pixel now take? The rate is 2 bits/pixel. Are these two measures enough for lossy compression?
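A quick Python sketch of these two measures for the 256 x 256 image example (variable names are my own):

    # Compression ratio and rate for the 256 x 256 image example above.
    original_bytes = 256 * 256        # 65,536 bytes, i.e. 8 bits per pixel
    compressed_bytes = 16384
    num_pixels = 256 * 256

    compression_ratio = original_bytes / compressed_bytes    # 4.0, i.e. 4:1
    rate = compressed_bytes * 8 / num_pixels                 # 2.0 bits/pixel

    print(compression_ratio, rate)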

Distortion
In lossy compression, the reconstruction differs from the original data.
In order to determine the efficiency of a compression algorithm, we have to quantify/measure the difference. The difference between the original and the reconstruction is called the distortion.
Lossy techniques are generally used for the compression of data that originate as analog signals, such as speech and video.
For speech and video, the final arbiter/judge of quality is the human observer (behavioral analysis).

Because human responses are difficult to model mathematically, many approximate measures of distortion are used to determine the fidelity/quality of the reconstructed waveforms.
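One common approximate measure is the mean squared error between the original and the reconstruction; a minimal sketch (the sample values are made up for illustration):

    # Mean squared error (MSE) between an original signal and its lossy reconstruction.
    original      = [10, 12, 14, 13, 15]
    reconstructed = [10, 12, 15, 13, 14]        # illustrative values only

    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    print(mse)                                  # 0.4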

Introduction to Lossless Compression

MODELING AND CODING

Modeling and Coding


1) A compression scheme can be either lossless or lossy, based on the requirements of the application.
2) The exact compression scheme depends on several factors, but the main one is the characteristics of the data.

3) E.g., a technique that works well for compressing text may not work well for compressing images.
The best approach for a given application largely depends on the redundancies inherent in the data.

Modeling and Coding


Redundancies

Redundant means not needed, or something that can be omitted without any loss of significance.

Example: in a portrait image, most of the background is the same, and all of it need not be encoded.

This approach may work for one kind of data but may not work for another kind (a landscape or a group photo).

Modeling and Coding

The development of data compression algorithms for a variety of data can be divided into two phases.
Modeling: Extract information about any redundancy present in the data and model it.

Coding: Description of the model and a "description" of how the data differ from the model are encoded (binary alphabet). The difference between the data and the model is often referred to as the residual.

Introduction to Lossless Compression

DATA MODELING EXAMPLES

Example 1
Q. Consider the following sequence of numbers X = {x1, x2, x3, ...}: 9 11 11 11 14 13 15 17 16 17 20 21. How many bits are required to store or transmit each sample?

Ans. 5 bits/sample, or fewer by exploiting the structure of the data:


1) Model the data: the structure is roughly a straight line, x̂n = n + 8 (i.e. y = mx + c with m = 1 and c = 8, n = 1, 2, ...).
2) Residual: the difference between the data and the model, en = xn - x̂n:

0 1 0 -1 1 -1 0 1 -1 -1 1 1
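A minimal Python sketch of this model (variable names are mine): predict x̂n = n + 8 and keep the residual en = xn - x̂n.

    # Straight-line model for Example 1 and its residual sequence.
    data = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

    model = [n + 8 for n in range(1, len(data) + 1)]     # 9, 10, 11, ..., 20
    residual = [x - m for x, m in zip(data, model)]      # only takes the values -1, 0, 1

    print(residual)    # [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]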

Example 1
The residual sequence consists of only three numbers {-1, 0, 1}. Assigning the code 00 to -1, 01 to 0 and 10 to 1, we need only 2 bits to represent each element of the residual sequence. Therefore, we can obtain compression by transmitting the parameters of the model and the residual sequence. The scheme is lossy if only the model is transmitted, and lossless if both the model parameters and the residuals are transmitted. Q. Model the given data for compression: {6 8 10 10 12 11 12 15 16}

{ 5 6 9 10 11 13 17 19 20}

Example 2
Q. Find the structure present in this data sequence: 27 28 29 28 26 27 29 28 30 32 34 36 38

I. Ans. No simple structure (such as a straight line) is apparent. Hence
II. check for the closeness of consecutive values.
III. 27 1 1 -1 -2 1 2 -1 2 2 2 2 2
IV. Send the first value, then send the remaining residuals.
V. Are the bits/sample reduced?
VI. The decoder adds each residual to the previously decoded value to reconstruct the original sequence.

Note
Techniques that use the past values of a sequence to predict the current value and then encode the error in prediction, or residual, are called predictive coding schemes. Note: Assuming both encoder and decoder know the model being used, we would still have to send the value of the first element of the sequence.
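A small Python sketch of such a predictive scheme for the sequence of Example 2 (illustrative only):

    # Predictive coding: predict each sample by the previous one, send the residuals.
    data = [27, 28, 29, 28, 26, 27, 29, 28, 30, 32, 34, 36, 38]

    # Encoder: the first value as-is, then the difference from the previous sample.
    residuals = [data[0]] + [data[i] - data[i - 1] for i in range(1, len(data))]

    # Decoder: add each residual to the previously decoded value.
    reconstructed = [residuals[0]]
    for r in residuals[1:]:
        reconstructed.append(reconstructed[-1] + r)

    assert reconstructed == data          # lossless reconstruction
    print(residuals)                      # [27, 1, 1, -1, -2, 1, 2, -1, 2, 2, 2, 2, 2]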

Example 3
Suppose we have the following sequence:

aba ray a ranba rrayb ranbfa rbfaa rbfaaa rbaway


To represent these 8 distinct symbols with a fixed-length code, 3 bits/symbol are required. Suppose instead we assign shorter codewords to symbols that occur more often, for example 1 bit to the symbol that occurs most often, and correspondingly longer codewords to the rarer symbols.
As there are 41 symbols in the sequence, this works out to approximately 2.58 bits per symbol on average, which means we have obtained a compression ratio of about 1.16:1. (Huffman coding)

Alternatively, a dictionary compression scheme can be used (letters/words repeat).

Note
There will be situations in which it is easier to take advantage of the structure if we decompose the data into a number of components; we can then study each component separately and use a model appropriate to that component. There are a number of different ways to characterize data, and different characterizations lead to different compression schemes. Because international standards organizations publish standards for the various compression applications, we can compress something with one vendor's products and reconstruct it using a different vendor's products.

Unit-1 Chapter 2 Loss-Less Compression

MATHEMATICAL PRELIMINARIES FOR LOSS-LESS COMPRESSION

Overview

This chapter covers the mathematical framework behind lossless schemes, starting with basic information theory and probability concepts, and then uses these concepts to model the data.

Introduction to Information Theory


Quantitative measure of Information.


Father of Information Theory: Claude Elwood Shannon, an electrical engineer at Bell Labs.

He defined a quantity called Self Information.


Example: given a random experiment, if A is an event from the set of outcomes, the self-information associated with A is given by

i(A) = log_b (1 / P(A)) = -log_b P(A)

where i(A) is the self-information and P(A) is the probability of the event A.

Introduction to Information Theory

log(1) = 0, and -log(x) increases as x decreases.

Hence, if the probability of an event is low, the self-information associated with it is high, and vice versa.


Another property: the information obtained from the occurrence of two independent events is the sum of the information obtained from the occurrence of the individual events.

Suppose A and B are two independent events. The self-information associated with the occurrence of both A and B is

i(AB) = -log_b P(A)P(B) = -log_b P(A) - log_b P(B) = i(A) + i(B)
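As a small illustration (a sketch, not from the slides), the additivity of self-information for independent events can be checked numerically:

    import math

    def self_information(p, base=2):
        # i(A) = -log_b P(A): the lower the probability, the higher the information.
        return -math.log(p, base)

    p_a, p_b = 0.5, 0.125                       # two independent events
    print(self_information(p_a))                # 1 bit
    print(self_information(p_b))                # 3 bits
    print(self_information(p_a * p_b))          # 4 bits = i(A) + i(B)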

Intro. to Information Theory (worked examples on slides)

Entropy

Suppose we have independent events Ai, which are the outcomes of some experiment S.
Then the average self-information associated with the random experiment is

H = sum_i P(Ai) i(Ai) = -sum_i P(Ai) log_b P(Ai)

This quantity is called the entropy associated with the experiment (Shannon).

Note: the entropy is also a measure of the average number of binary symbols needed to code the output of the source.

Note

Most of the sources we consider in this subject are independent and identically distributed (iid); the entropy equation above gives the entropy of the source only when the sequence is iid. Theorem: Shannon showed that the best a lossless compression scheme can do is to encode the output of a source with an average number of bits equal to the entropy of the source.

The estimate of the entropy depends on our assumptions about the structure of the source sequence.

Example 4
Q. Consider the sequence 1 2 3 2 3 4 5 4 5 6 7 8 9 8 9 10. The probability of occurrence of each element is
P(1) = P(6) = P(7) = P(10) = 1/16
P(2) = P(3) = P(4) = P(5) = P(8) = P(9) = 2/16
Assuming the sequence is iid, the first-order entropy is

H = -sum_i P(i) log2 P(i) = 3.25 bits/sample

Hence, according to Shannon, the minimum average number of bits needed to code each sample is 3.25 bits/sample.

Example 4

Step 2: Model the given data to remove redundancy.

Solution: there is a sample-to-sample correlation between the samples, which we remove by taking differences of neighboring sample values:

1 1 1 -1 1 1 1 -1 1 1 1 1 1 -1 1 1

This sequence is constructed from only two values, with P(1) = 13/16 and P(-1) = 3/16.

The entropy in this case is 0.70 bits per symbol.

Knowing only this residual sequence is not enough to reconstruct the original sequence; we must also know the process by which it was generated from the original. That process depends on our assumption about the structure of the input data. Assumption = Model.
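To check these numbers, here is a small Python sketch (the helper name is my own) that computes the first-order entropy of the raw sequence (3.25 bits) and of the difference sequence (about 0.70 bits):

    import math
    from collections import Counter

    def first_order_entropy(seq):
        # H = -sum P(x) log2 P(x), treating the samples as iid.
        counts = Counter(seq)
        n = len(seq)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    seq = [1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 8, 9, 10]
    diffs = [seq[0]] + [seq[i] - seq[i - 1] for i in range(1, len(seq))]

    print(first_order_entropy(seq))     # 3.25 bits/sample
    print(diffs)                        # [1, 1, 1, -1, 1, 1, 1, -1, 1, 1, 1, 1, 1, -1, 1, 1]
    print(first_order_entropy(diffs))   # about 0.70 bits/symbol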

Note

If the parameters of the model do not change with n, it is called a static model. A model whose parameters change or adapt with n to the changing characteristics of the data is called an adaptive model. Basically, we see that knowing something about the structure of the data can help to "reduce the entropy" (more precisely, our estimate of it).

Structure
Consider the following sequence:

1 2 1 2 3 3 3 3 1 2 3 3 3 3 1 2 3 3 1 2

Obviously, there is some structure to this data. However, if we look at it one symbol at a time, the structure is difficult to extract. Consider the probabilities: P(1) = P(2) = 1/4 and P(3) = 1/2. The entropy is 1.5 bits/symbol. This particular sequence consists of 20 symbols; therefore, the total number of bits required to represent it is 30.

Now let's take the same sequence and look at it in blocks of two. There are only two such blocks, "1 2" and "3 3". The probabilities are P(1 2) = 1/2 and P(3 3) = 1/2, and the entropy is 1 bit/block. As there are 10 such blocks in the sequence, we need a total of 10 bits to represent the entire sequence, a reduction by a factor of three.
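A quick sketch of the same calculation in Python (the helper name is mine), first symbol by symbol and then in blocks of two:

    import math
    from collections import Counter

    def entropy_bits(symbols):
        counts = Counter(symbols)
        n = len(symbols)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    seq = [1, 2, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3, 1, 2]

    h1 = entropy_bits(seq)                                   # 1.5 bits/symbol
    pairs = [tuple(seq[i:i + 2]) for i in range(0, len(seq), 2)]
    h2 = entropy_bits(pairs)                                 # 1.0 bit per block of two

    print(h1 * len(seq), h2 * len(pairs))                    # 30.0 bits vs 10.0 bits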

Derivation of Average Information Not in Syllabus

Models

Good models for sources lead to more efficient compression algorithms. In general, in order to develop techniques that manipulate data using mathematical operations, we need a mathematical model for the data. There are several approaches to building a mathematical model:

Physical Model
Probability Model
Markov Model
Composite Source Model

Physical Model
If we know something about the physics of the data generation process, we can use that information to construct a model. For example, In speech-related applications, knowledge about the physics of speech production can be used to construct a mathematical model for the sampled speech process. Sampled speech can then be encoded using this model. If residential electrical meter readings at hourly intervals were to be coded, knowledge about the living habits of the populace could be used to determine when electricity usage would be high and when the usage would be low. Then instead of the actual readings, the difference (residual) between the actual readings and those predicted by the model could be coded.

Physical Model
Disadvantages
In general, however, the physics of data generation is simply too complicated to understand, let alone use to develop a model. Since the physics of the problem is too complicated, we currently build models based on empirical observation of the statistics of the data.

Probability Model

The simplest mathematical model for the source is to assume that the letters it generates are independent of each other and all equally likely; hence the name ignorance model.

It is used when we don't know anything about the source.

Next, let us keep the independence assumption but drop the assumption of equal probabilities; using the entropy equation we can then find the entropy. A source that generates letters from an alphabet A = {a1, a2, ..., aM} can be represented by a probability model P = {P(a1), P(a2), ..., P(aM)}.

Probability Model
Next, if we also discard the assumption of independence, we can come up with a better data compression scheme, but we then have to describe how the elements of the data sequence depend on each other.

One of the most popular ways of representing dependence in the data is through the use of Markov models, named after the Russian mathematician Andrei Andreyevich Markov (1856-1922).

Markov Models
For models used in lossless compression, we use a specific type of Markov process called a discrete-time Markov chain. A sequence {xn} is said to follow a kth-order Markov model if

P(xn | xn-1, ..., xn-k) = P(xn | xn-1, ..., xn-k, ...)

that is, knowledge of the past k symbols is equivalent to knowledge of the entire past history of the process. The values taken on by the set {xn-1, ..., xn-k} are called the states of the process.

Markov Models
The most commonly used Markov model is the first-order Markov model, for which

P(xn | xn-1) = P(xn | xn-1, xn-2, xn-3, ...)

Markov chain property: the probability of each subsequent state depends only on the previous state. These equations indicate the existence of dependence between samples, but they do not describe the form of that dependence; we can develop different first-order Markov models depending on our assumption about the form of the dependence between samples.

To define a Markov model, the following probabilities have to be specified: the transition probabilities P(x2 | x1) and the initial probabilities P(x1).

Markov Models
If we assume that the dependence is introduced in a linear manner, we can view the data sequence as the output of a linear filter driven by white noise. The output of such a filter can be given by the difference equation

xn = ρ xn-1 + εn

where εn is the white noise. This model is often used when developing coding algorithms for speech and images.

The Markov model does not require the assumption of linearity.

Markov Model Example


For example, consider a binary image. The image has only two types of pixels, white pixels and black pixels. Q. Based on the current pixel, can we predict the appearance of the next one? Ans. Yes, we can model the pixel process as a discrete-time Markov chain. Define two states Sw and Sb: Sw corresponds to the case where the current pixel is a white pixel, and Sb to the case where the current pixel is a black pixel. We define the transition probabilities P(w|b) and P(b|w), and the probability of being in each state, P(Sw) and P(Sb). The Markov model can then be represented by the state diagram shown in the figure.

Markov Model

The entropy of a finite-state process with states Si is simply the average value of the entropy at each state:

H = sum_i P(Si) H(Si)

Example of Markov Model


[State diagram: two states, Rain and Dry, with the transition probabilities listed below.]

Two states: Rain and Dry. Transition probabilities: P(Rain|Rain) = 0.3, P(Dry|Rain) = 0.7, P(Rain|Dry) = 0.2, P(Dry|Dry) = 0.8. Initial probabilities: say P(Rain) = 0.4, P(Dry) = 0.6.
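As a small numerical sketch (using the state probabilities 0.4 and 0.6 given above; the helper is my own), the finite-state entropy formula H = sum_i P(Si) H(Si) gives roughly 0.79 bits/symbol for this chain:

    import math

    def entropy_bits(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    transitions = {"Rain": [0.3, 0.7],   # P(Rain|Rain), P(Dry|Rain)
                   "Dry":  [0.2, 0.8]}   # P(Rain|Dry),  P(Dry|Dry)
    state_prob = {"Rain": 0.4, "Dry": 0.6}

    H = sum(state_prob[s] * entropy_bits(transitions[s]) for s in transitions)
    print(round(H, 3))                    # about 0.786 bits/symbol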

Markov Model Example

Markov Model in text Compression

In English, the probability of the next letter is heavily influenced by the preceding letters. In the current text compression literature, kth-order Markov models are more widely known as finite context models, with the word context used for what we have been calling the state. Example:

Consider the word "preceding". Suppose we have already processed "precedin" and are going to encode the next letter. If we take no account of the context and treat the letter as a surprise, the probability of the letter g occurring is relatively low.

Example
If we use a first-order Markov model (that is, we look at the probability model given that the previous letter is n), we can see that the probability of g increases substantially. As we increase the context size (go from n to in to din and so on), the probabilities become more skewed and the entropy decreases.

Shannon used a second-order model for English text consisting of the 26 letters and one space to obtain an entropy of 3.1 bits/letter. Using a model whose output symbols were words rather than letters brought the entropy down to 2.4 bits/letter. Note: the longer the context, the better its predictive value.

Markov Model in Text Compression


Disadvantage: to store the probability model with respect to all contexts of a given length, the number of contexts grows exponentially with the length of the context. Moreover, since the source imposes some structure on its output, many of these contexts correspond to strings that never occur in practice, and different sources have different repeating patterns. Solution: the PPM (Prediction with Partial Match) algorithm. The encoder first looks for the longest context in which the symbol has non-zero probability; if the symbol has zero probability in that context, an escape symbol is encoded and a shorter context is tried.

Composite Source Model


In many applications, it is not easy to use a single model to describe the source. In such cases, we can define a composite source, which can be viewed as a combination or composition of several sources, with only one source being active at any given time. Each source Si has its own model Mi and is selected with probability Pi.

Coding
Coding: the assignment of binary sequences (strings of 0s and 1s) to the elements or symbols of an alphabet.
The set of binary sequences is called a code, and the individual members of the set are called codewords.
Code (e.g. 100101100110010101); codewords (e.g. a -> 001, b -> 010). An alphabet is a collection of symbols called letters. For example, the alphabet used in writing most books consists of the 26 lowercase letters, 26 uppercase letters, and a variety of punctuation marks; in the terminology used here, a comma is a letter. The ASCII code for the letter a is 1000011, the letter A is coded as 1000001, and the letter "," is coded as 0011010. Notice that the ASCII code uses the same number of bits to represent each symbol. Such a code is called a fixed-length code.

Coding
To reduce the number of bits required to represent different messages, we need to use a different number of bits for different symbols. If we use fewer bits to represent symbols that occur more often, then on average we use fewer bits per symbol. The average number of bits per symbol is often called the rate of the code. Examples: Morse code, Huffman code. In Morse code, the codewords for letters that occur more frequently are shorter than those for letters that occur less frequently: the codeword for E is a single dot, while the codeword for Z (dash dash dot dot) is much longer.

Uniquely Decodable Codes

The average length of the code is not the only criterion for a good code.
Example: suppose our source alphabet consists of four letters a1, a2, a3 and a4 with probabilities P(a1) = 1/2, P(a2) = 1/4, P(a3) = P(a4) = 1/8. The entropy of this source is 1.75 bits/symbol.

The average length of a code is

l = sum_i P(ai) n(ai)

where n(ai) is the number of bits in the codeword for letter ai, and the average length is given in bits/symbol.

Uniquely Decodable Codes

From the table (a figure on the slide), Code 1 appears to be the best code with respect to average length. However, a code must also be able to transfer information in an unambiguous way.
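The table of codes itself is a figure on the slide; as an illustration only, the sketch below computes the average length for four candidate codes. The specific codeword assignments are my assumption, chosen to be consistent with how Codes 1-4 are described in the following slides.

    # Average code length l = sum P(ai) * n(ai) for four candidate codes.
    # NOTE: the codeword assignments below are assumed for illustration; the actual
    # table is shown as a figure on the slide.
    probs = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}

    codes = {
        "Code 1": {"a1": "0", "a2": "0",  "a3": "1",   "a4": "10"},
        "Code 2": {"a1": "0", "a2": "1",  "a3": "00",  "a4": "11"},
        "Code 3": {"a1": "0", "a2": "10", "a3": "110", "a4": "111"},
        "Code 4": {"a1": "0", "a2": "01", "a3": "011", "a4": "0111"},
    }

    for name, code in codes.items():
        avg = sum(probs[a] * len(cw) for a, cw in code.items())
        print(name, avg, "bits/symbol")   # 1.125, 1.25, 1.75, 1.875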

Uniquely Decodable Codes

Code 1

Both a1 and a2 have been assigned the codeword 0. When a 0 is received, there is no way to know whether an a1 or an a2 was transmitted. Hence we would like each symbol to be assigned a unique codeword.

Uniquely Decodable Code

Code 2 seems to have no problem with ambiguity: each symbol has a distinct codeword. However, suppose we encode the sequence {a2 a1 a1}. The resulting binary string is 100, but 100 can be decoded as either {a2 a1 a1} or {a2 a3}. The original sequence cannot be recovered with certainty: the code is not uniquely decodable. (Not desirable.)

Uniquely Decodable Code

How about Code 3?

The first three codewords end with a 0, and a 0 denotes the termination of a codeword.

The codeword for a4 is three 1s, which is also easily recognized, so the code is easily decodable.

Code 3
Notice that the first three codewords all end in a 0. In fact, a 0 always denotes the termination of a codeword.
The final codeword contains no 0s and is 3 bits long. Because all other codewords have fewer than three 1s and terminate in a 0, the only way we can get three 1s in a row is as the code for a4. The decoding rule is simple: accumulate bits until you get a 0 or until you have three 1s. There is no ambiguity in this rule, and it is reasonably easy to see that this code is uniquely decodable.

Uniquely Decodable Code


Code 4:
Each codeword starts with a 0, and the only time we see a 0 is at the beginning of a codeword. The decoding rule is to accumulate bits until you see a 0; the bit before that 0 is the last bit of the previous codeword.

Code 4
The difference between Code 3 and Code 4 is that with Code 3 the decoder knows the moment a codeword is complete, while with Code 4 we have to wait until the beginning of the next codeword before we know that the current codeword is complete. Because of this property, Code 3 is called an instantaneous code, and Code 4 a near-instantaneous code. Q. Is Code 4 uniquely decodable? Yes, by the rule above. A trickier case is Code 5: try decoding the string 011111111111111111.

Uniquely Decodable Code

Instantaneous vs. near-instantaneous codes: decode the string 011111111111111111 using Code 5.

From the above string:

The first codeword can be either 0 (a1) or 01 (a2).

If we assume the first codeword is a1 (0), we can decode the next 16 bits as eight a3s (11), but we are then left with a single dangling 1. If instead we assume the first codeword is a2 (01), we can decode the remaining 16 bits as eight a3s and the whole string is consumed.
So the string can be decoded, and in only one valid way. Code 5, while certainly not instantaneous, is uniquely decodable; the decoder may simply have to wait (here, until the end of the string) before it can resolve the first codeword.

Uniquely Decodable Code


Decode the string 01010101010101010 using Code 6.

Decoding 1: a1 followed by eight a3s.

Decoding 2: eight a2s followed by one a1.

Two different valid decodings exist, so the code is not uniquely decodable.

Even for such small codes, it is not immediately evident whether a code is uniquely decodable or not. What about larger codes?

Hence a systematic procedure is needed to test for unique decodability.

Note: the formal test for unique decodability is not part of the syllabus; the following examples only illustrate the procedure.

Test for Unique Decodability: Example 1


Consider Code 5.

First list the codewords


{0,01,11} The codeword 0 is a prefix for the codeword 01. Hence the dangling suffix is 1.

There are no other pairs for which one element of the pair is the prefix of the other.

Example 1
Let us augment (add) the codeword list with the dangling suffix. {0,01,11,1} Comparing the elements of this list, we find 0 is a prefix of 01 with a dangling suffix of 1. But we have already included 1 in our list.

Also, 1 is a prefix of 11. This gives us a dangling suffix of 1, which is already in the list.

Example 1
There are no other pairs that would generate a dangling suffix, so we cannot augment the list any further. Therefore, Code 5 is uniquely decodable.

Test for Unique Decodability: Example 2


Consider Code 6. First list the codewords

{0,01,10}

The codeword 0 is a prefix for the codeword 01. The dangling suffix is 1. There are no other pairs for which one element of the pair is the prefix of the other. Augmenting the codeword list with 1, we obtain the list

{0,01,10,1}

Example 2
In this list, 1 is a prefix for 10. The dangling suffix for this pair is 0, which is the codeword for a1. Therefore, Code 6 is not uniquely decodable.
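The dangling-suffix procedure used in these two examples can be automated; here is a Python sketch (my own implementation, not from the slides):

    def is_uniquely_decodable(codewords):
        # Repeatedly generate dangling suffixes; the code fails the test if a
        # dangling suffix is itself a codeword, and passes if no new suffixes appear.
        C = set(codewords)

        def dangling(u, v):
            # If u is a proper prefix of v, return the dangling suffix of v.
            return v[len(u):] if v != u and v.startswith(u) else None

        suffixes = {d for a in C for b in C if (d := dangling(a, b))}
        seen = set()
        while suffixes:
            if suffixes & C:
                return False
            seen |= suffixes
            new = set()
            for s in suffixes:
                for c in C:
                    for d in (dangling(s, c), dangling(c, s)):
                        if d:
                            new.add(d)
            suffixes = new - seen
        return True

    print(is_uniquely_decodable(["0", "01", "11"]))   # Code 5 -> True
    print(is_uniquely_decodable(["0", "01", "10"]))   # Code 6 -> False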

Prefix Codes
The test for unique decodability requires examining the dangling suffixes. If a dangling suffix is itself a codeword, then the code is not uniquely decodable.

One type of code in which we never face the possibility of a dangling suffix being a codeword is a code in which no codeword is a prefix of another.
A code in which no codeword is a prefix of another codeword is called a prefix code. A simple way to check whether a code is a prefix code is to draw the rooted binary tree corresponding to the code.

Prefix Codes
Draw a tree that starts from a single node (the root node) and has up to two branches at each node, one branch corresponding to 0 and the other to 1. The convention followed here is that the root node is at the top, the left branch corresponds to 0 and the right branch to 1. Using this convention, draw the binary trees for Codes 2, 3 and 4.

Prefix Codes

Prefix Codes
Note that apart from the root node, the trees have two kinds of nodes:
1. internal nodes (nodes that give rise to other nodes), and
2. external nodes (nodes that do not give rise to other nodes), also called leaves.
In a prefix code, the codewords are associated only with the external nodes. A code that is not a prefix code, such as Code 4, will have codewords associated with internal nodes.
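Equivalently, the prefix property can be checked directly; a short Python sketch (the example codewords follow the descriptions of Codes 3 and 4 in the earlier slides and are my assumption):

    def is_prefix_code(codewords):
        # True if no codeword is a (proper) prefix of another codeword.
        return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

    print(is_prefix_code(["0", "10", "110", "111"]))    # Code 3 style -> True
    print(is_prefix_code(["0", "01", "011", "0111"]))   # Code 4 style -> False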

The Kraft-McMillan Inequality Not in Syllabus

Algorithmic information Theory

Information theory, as used above, deals with the data itself and the amount of information it carries.

Algorithmic information theory, in contrast, deals with the program needed to generate (or compress) the data.
At the heart of algorithmic information theory is a measure called Kolmogorov complexity.

The Kolmogorov complexity K(x) of a sequence x is the size of the smallest program needed to generate x.
Size here includes all the inputs needed by the program.

Algorithmic information Theory

If x is a sequence of all ones, a highly compressible sequence, the program is simply a print statement inside a loop.
At the other extreme, if x is a random sequence with no structure, the only program that can generate it must contain the sequence itself, so the size of the program is slightly larger than the sequence. Thus, there is a clear correspondence between the size of the smallest program that can generate a sequence and the amount of compression that can be obtained. However, this lower bound cannot be computed in general, so it is not used in practice.

Huffman Coding

Huffman Coding Overview

The Huffman coding algorithm was developed by David Huffman as part of a class assignment in an information theory course taught by Robert Fano at MIT. The resulting codes are prefix codes and are optimum for a given model (set of probabilities).

The Huffman procedure is based on two observations regarding optimum prefix codes.

1. In an optimum code, symbols that occur more frequently (have a higher probability of occurrence) will have shorter codewords than symbols that occur less frequently.

2. In an optimum code, the two symbols that occur least frequently will have codewords of the same length.

Design of Huffman Code

Let us design a Huffman code for a source that puts out letters from an alphabet A = {a1, a2, a3, a4, a5} with P(a1) = P(a3) = 0.2, P(a2) = 0.4, and P(a4) = P(a5) = 0.1. First find the first-order entropy: H = -(0.4 log2 0.4 + 2 x 0.2 log2 0.2 + 2 x 0.1 log2 0.1) = 2.122 bits/symbol (approximately). Step 1: sort the letters in descending order of probability.

Huffman Coding Algorithm Example

Step 3: Find the average length.

L = 0.4 x 1 + 0.2 x 2 + 0.2 x 3 + 0.1 x 4 + 0.1 x 4 = 2.2 bits/symbol.

Step 4: Calculate the redundancy: average length - entropy = 2.2 - 2.122 = 0.078 bits/symbol (approximately).

Step 5: Binary Huffman Tree
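A compact Python sketch of the Huffman construction for this source (my own helper built on heapq; the particular codewords depend on how ties are broken, but the average length comes out to 2.2 bits/symbol either way):

    import heapq

    def huffman_code(probs):
        # Repeatedly merge the two least-probable groups; prepend 0/1 to their codewords.
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)
            p2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c1.items()}
            merged.update({s: "1" + w for s, w in c2.items()})
            heapq.heappush(heap, (p1 + p2, count, merged))
            count += 1
        return heap[0][2]

    probs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}
    code = huffman_code(probs)
    print(code)
    print(round(sum(probs[s] * len(code[s]) for s in probs), 2))   # 2.2 bits/symbol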

Example 2

Transmit 28 data samples using a Huffman code: 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 4 4 4 4 5 5 5 6 6 7

Minimum Variance Huffman Coding

Minimum Variance Huffman Coding


L = 0.4 x 2 + 0.2 x 2 + 0.2 x 2 + 0.1 x 3 + 0.1 x 3 = 2.2 bits/symbol. The two codes are identical in terms of their average length and hence their redundancy. However, the variance of the codeword lengths is significantly different.

Minimum Variance Huffman Codes


Remember that in many applications, although you might be using a variable-length code, the available transmission rate is generally fixed. For example, if we were going to transmit symbols from the alphabet we have been using at 10,000 symbols per second, we might ask for a transmission capacity of 22,000 bits per second. This means that during each second the channel expects to receive 22,000 bits, no more and no less. As the bit generation rate will vary around 22,000 bits per second, the output of the source coder is generally fed into a buffer, whose purpose is to smooth out the variations in the bit generation rate.

However, the buffer has to be of finite size, and the greater the variance of the codeword lengths, the more difficult the buffer design problem becomes.

Minimum variance Huffman coding


Suppose that the source generates a string of a4s and a5s for several seconds. If we are using the first code, this means that we will be generating bits at a rate of 40,000 bits per second, so each second the buffer has to store 18,000 bits.

On the other hand, if we use the second (minimum-variance) code, we would be generating 30,000 bits per second, and the buffer would have to store only 8,000 bits each second. If instead we have a string of a2s, the first code would generate 10,000 bits per second, a deficit of 12,000 bits per second, while the second code would lead to a deficit of only 2,000 bits per second. So which one do we select? The minimum-variance code keeps the buffer requirements smaller.

Application of Huffman coding


Huffman coding is often used in conjunction with other coding techniques in:

Lossless image compression
Text compression
Audio compression

Loss-Less Image Compression

Monochrome Image

Pixel(0-255)

Monochrome Images

Compression of test images using Huffman Coding

The original (uncompressed) test images are represented using 8 bits/pixel. Each image consists of 256 rows of 256 pixels, so the uncompressed representation uses 65,536 bytes.

Image Compression
From a visual inspection of the test images, we can clearly see that the pixels in an image are heavily correlated with their neighbors. We could represent this structure with the crude model x̂n = xn-1; the residual would then be the difference between neighboring pixels.

Huffman Coding Text Compression

We encoded the earlier version of this chapter using Huffman codes that were created using the probabilities of occurrence obtained from the chapter. The file size dropped from about 70,000 bytes to about 43,000 bytes with Huffman coding.

Audio Compression

The End of Unit 1. Any thoughts, doubts or ideas?
