
COMPRESSION

TECHNIQUES
TY-CS-A GROUP-11
Archit Kothawade-15
Maitrey Chitale-58
Sharvari Deshmukh-73
Atharva Deshpande-75
Guide- Prof. Pushkar Joglekar
TABLE OF CONTENTS

01 INTRODUCTION
02 TYPES OF COMPRESSION TECHNIQUES
03 LOSS-LESS TECHNIQUES
04 LOSSY TECHNIQUES
WHAT-
The process of reducing the size of data files without significantly
affecting their quality or integrity.

WHY-
1. Reduced Storage requirements
2. Faster data transmission
3. Cost efficiency
4. Reduced network traffic
5. Faster processing
TYPES-

Data integrity
  Lossless: Original data can be completely reconstructed from the compressed
  data without any loss of information.
  Lossy: Some data is permanently removed from the original data to achieve
  higher compression ratios.

Suitability
  Lossless: For text, program files, and databases, where preserving every bit
  of the original information is crucial.
  Lossy: For multimedia data (images, audio, video), where a certain degree of
  imperceptible loss is acceptable in exchange for significant file size reduction.

Advantages
  Lossless: Relatively quick; maintains the original data (at the cost of lower
  compression ratios).
  Lossy: Reduces file size dramatically; the user can select the compression level.

Algorithms
  Lossless: Run Length Encoding, Huffman Encoding, Lempel–Ziv–Welch
  Lossy: Discrete Cosine Transform, Wavelet Transform, Quantization
LOSS-LESS ALGORITHMS
1. Run Length Encoding
Working-
RLE works by examining the input data and identifying consecutive occurrences
of the same symbol.
It then replaces each such sequence with a single symbol followed by its count.
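
A minimal Python sketch of this idea (illustrative only; the function names are
ours, not part of any standard library):

def rle_encode(data):
    """Replace each run of identical symbols with a (symbol, count) pair."""
    encoded = []
    i = 0
    while i < len(data):
        start = i
        while i < len(data) and data[i] == data[start]:
            i += 1
        encoded.append((data[start], i - start))
    return encoded

def rle_decode(encoded):
    """Expand each (symbol, count) pair back into its original run."""
    return "".join(symbol * count for symbol, count in encoded)

# A highly redundant string compresses well:
# rle_encode("WWWWWWWWWWWWBWWWWWWWWWWWWBBB")
# -> [('W', 12), ('B', 1), ('W', 12), ('B', 3)]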
Suitability-
Best suited for simple images & animations with many redundant pixels.
Useful for black and white images in particular.
May not be as effective for data with minimal repetition, as the encoding could
potentially result in a longer string than the original data.

Advantages-
Lossless compression
Easy to implement, minimal computational resources
Effective for large redundant data

Drawbacks-
Ineffective for complex data
2. Huffman Encoding
Working-

Assigns variable-length codes to input characters, with more frequent characters
receiving shorter codes and less frequent characters receiving longer codes.
These codes are prefix codes (bit sequences in which no code-word is a prefix of
another), which ensures no ambiguity while decoding the generated bitstream.
Step 1- Frequency Analysis

Character    Frequency
a            5
b            9
c            12
d            13
e            16
f            45

Step 2- Building the Huffman Tree
A Huffman tree is constructed using a min-heap: at each step the two nodes with
the lowest frequencies are removed and merged into a new internal node whose
frequency is their sum.

After merging a (5) and b (9):       c 12, d 13, Internal Node 14, e 16, f 45
After merging c (12) and d (13):     Internal Node 14, e 16, Internal Node 25, f 45
After merging Node 14 and e (16):    Internal Node 25, Internal Node 30, f 45
After merging Node 25 and Node 30:   f 45, Internal Node 55
After merging f (45) and Node 55:    Internal Node 100 (the root)

Step 3- Assigning Codes
Starting from the root, a 0 is appended for every left branch and a 1 for every
right branch, giving each character its code-word (a code sketch follows below).

Character    Code-word
f            0
c            100
d            101
a            1100
b            1101
e            111
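
A compact Python sketch of the same construction, using the standard-library
heapq module as the min-heap (illustrative; the identifiers are ours):

import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman tree with a min-heap and return each character's prefix code."""
    # Heap entries are (frequency, tie-breaker, node); a node is either a
    # character (leaf) or a (left, right) pair (internal node).
    heap = [(freq, i, ch) for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)      # the two lowest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tie, (left, right)))   # new internal node
        tie += 1
    codes = {}
    def walk(node, code):
        if isinstance(node, str):              # leaf: record its code-word
            codes[node] = code or "0"
        else:                                  # internal node: 0 = left, 1 = right
            walk(node[0], code + "0")
            walk(node[1], code + "1")
    walk(heap[0][2], "")
    return codes

# With the frequencies above (a:5, b:9, c:12, d:13, e:16, f:45) the code lengths
# come out as 4, 4, 3, 3, 3 and 1 bits respectively; the exact bit patterns depend
# on tie-breaking and left/right orientation.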
Time Complexity-
O(n log n), where n is the number of unique characters.

Space Complexity-
O(n)

Advantages-
Lossless compression
No ambiguity in decoding
Efficient for data with varying frequencies of characters

Disadvantages-
Encoding overhead
3. Lempel–Ziv–Welch

Working-
As the input data is being processed, a dictionary maintains a correspondence
between the longest words encountered so far and a list of code values.

The words are replaced by their corresponding codes and so the input
file is compressed.

Efficiency of the algorithm increases as the number of long, repetitive words
in the input data increases.
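
A short Python sketch of the compression side, assuming the dictionary starts
with the 256 single-byte strings (illustrative; identifiers are ours):

def lzw_compress(data):
    """Emit a code for the longest dictionary match and grow the dictionary."""
    dictionary = {chr(i): i for i in range(256)}   # all single-byte strings
    next_code = 256
    current = ""
    output = []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                    # keep extending the match
        else:
            output.append(dictionary[current])     # emit the longest match found
            dictionary[candidate] = next_code      # add the new word
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

# lzw_compress("TOBEORNOTTOBEORTOBEORNOT") emits fewer and fewer new codes
# as the repeated words are found in the dictionary.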
Time Complexity-
O(n) where n is the length of the input data.
Space Complexity-
O(n)
Advantages-
Lossless compression
LZW requires no prior information about the input data
stream.
LZW can compress the input stream in one single pass.
LZW can achieve high compression ratios

Drawbacks-
Slower compression
Dictionary size matters because of memory constraints
LOSSY ALGORITHMS
1. Discrete Cosine Transform
Breaking the Signal into Blocks-
The input signal (e.g. an image) is divided into small, square blocks of pixels. Each
block is treated as a 2D matrix of pixel values.

Transforming the Blocks:
DCT calculates a weighted sum of cosine functions of varying frequencies that
oscillate across the block.

These functions capture the changes in intensity across the block and represent
the image data in terms of its frequency components.
Separating High and Low Frequencies
The resulting DCT coefficients represent the contributions of different
frequencies to the original signal.

Lower-frequency components tend to capture the overall structure of the image,
while the higher-frequency components capture the details and edges.
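
A small NumPy sketch of a 2D DCT on one block, written out from the standard
DCT-II formula rather than a library routine (illustrative only; the function
name and the example block are ours):

import numpy as np

def dct2(block):
    """2D DCT-II of an N x N block: the weights of the cosine basis functions."""
    n = block.shape[0]
    x = np.arange(n)
    k = x.reshape(-1, 1)
    # 1D orthonormal DCT-II matrix: C[k, x] = alpha(k) * cos(pi * (2x + 1) * k / (2N))
    c = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    alpha = np.full(n, np.sqrt(2.0 / n))
    alpha[0] = np.sqrt(1.0 / n)
    c = alpha.reshape(-1, 1) * c
    # Transform rows and columns: coefficients = C @ block @ C^T
    return c @ block @ c.T

# A smooth 8x8 block (a linear gradient) concentrates nearly all of its energy
# in a handful of low-frequency coefficients near the top-left corner.
block = np.outer(np.linspace(0, 255, 8), np.ones(8))
coeffs = dct2(block)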
2. Quantization
Breaking Values into Intervals:
Quantization involves dividing the range of continuous values into distinct intervals.

Rounding Values:
Each original value is rounded to the nearest value in the reduced set of levels. This
rounding reduces the precision of the data, leading to a loss of detail or accuracy.

Loss of Information:
Since the original data is approximated or simplified during quantization, there is typically a
loss of some information or fine details. This loss can affect the quality of the reconstructed
data, especially in the case of highly detailed or complex signals.
Quantization of a grayscale image :
Pixel intensity range : 0 to 255.

8 levels : [0-31], [32-63], [64-95], [96-127], [128-159], [160-191], [192-223], [224-255].

Rounding pixel intensity: each pixel value is mapped to the single representative
value of the interval it falls into (for example, an intensity of 75 lies in [64-95]).
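
A minimal NumPy sketch of this uniform quantization, assuming each interval is
represented by a value near its midpoint (one common convention; other
representative values are possible; the function name is ours):

import numpy as np

def quantize(pixels, num_levels=8):
    """Map 0-255 intensities onto num_levels equal-width intervals."""
    step = 256 // num_levels            # 8 levels -> intervals of width 32
    indices = pixels // step            # which interval each pixel falls into
    return indices * step + step // 2   # representative value of each interval

pixels = np.array([0, 20, 75, 130, 255])
print(quantize(pixels))                 # [ 16  16  80 144 240]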


THANK
YOU
