4 LZW


Lab Manual – DCE---Expt 4

Sem VII EXTC

Title:
Implementation of LZW Compression Scheme & Finding compression provided by LZW Compression
scheme.

Estimated time to complete this experiment: ( 02 hours)

Objective: To make the students acquainted with

• Coding techniques
• Lossless compression
• Dictionary coding
• Dynamic dictionary updating

Course Outcomes :

Sr. No. | Course Outcomes | Mapped to Expt
CO1 | Students will be able to implement text, audio and video compression techniques. | √
CO2 | Students will be able to understand symmetric and asymmetric key cryptography schemes. |
CO3 | Students will be able to understand network security and Ethical hacking. | √
CO4 | Students will be able to develop advanced algorithms for secure data transmission. |
CO5 | Students will be able to design robust and secure systems using industry standard algorithms. | √

Books/ Journals/ Websites referred:

1. Introduction to Data Compression – Khalid Sayood
2. Data Compression – David Salomon
3. Data Encryption Techniques – William Stallings
4. Data Compression & Encryption – Forouzan
5. The Data Compression Book – Mark Nelson

Pre Lab/ Prior Concepts: Probability theory

Historical Profile:
Data compression can be viewed as a special case of data differencing. Data differencing consists of
producing a difference given a source and a target, with patching producing a target given a source and
a difference, while data compression consists of producing a compressed file given a target, and
decompression consists of producing a target given only a compressed file. Thus, one can consider
data compression as data differencing with empty source data, the compressed file corresponding to a
"difference from nothing." This is the same as considering absolute entropy (corresponding to data
compression) as a special case of relative entropy (corresponding to data differencing) with no initial
data.

When one wishes to emphasize the connection, one may use the term differential compression to refer
to data differencing.

New Concepts to be learned:

Dictionary Coding, Digrams

Requirements:

1. Personal computer
2. MATLAB

Theory:

LZW compression is named after its developers, A. Lempel and J. Ziv, with later modifications by Terry
A. Welch. It is the foremost technique for general purpose data compression due to its simplicity and
versatility. Typically, you can expect LZW to compress text, executable code, and similar data files to
about one-half their original size. LZW also performs well when presented with extremely redundant
data files, such as tabulated numbers, computer source code, and acquired signals. Compression
ratios of 5:1 are common for these cases. LZW is the basis of several personal computer utilities that
claim to "double the capacity of your hard drive."

Encoding

A high level view of the encoding algorithm is shown here:

1. Initialize the dictionary to contain all strings of length one.


2. Find the longest string W in the dictionary that matches the current input.
3. Emit the dictionary index for W to output and remove W from the input.
4. Add W followed by the next symbol in the input to the dictionary.
5. Go to Step 2.
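
The five steps above can be sketched in MATLAB as follows. This is a minimal illustration, not the listing used later in this experiment: it keeps the dictionary in a containers.Map instead of a cell array, and the variable names (msg, alphabet, dict, codes, nextCode) are chosen here for illustration only. The input string is the one from the problem statement.

```matlab
% LZW encoder sketch (script form); variable names are illustrative.
msg      = 'abbcdbabdbbabbaccbd';   % input string from the problem statement
alphabet = 'abcd';                  % symbols that form the initial dictionary

% Step 1: initialise the dictionary with all single-character strings.
dict = containers.Map('KeyType', 'char', 'ValueType', 'double');
for n = 1:numel(alphabet)
    dict(alphabet(n)) = n;
end
nextCode = numel(alphabet) + 1;     % next free dictionary index

codes = [];                         % emitted index sequence
w = msg(1);                         % current match W
for n = 2:numel(msg)
    c = msg(n);                     % next input symbol
    if isKey(dict, [w c])
        w = [w c];                  % Step 2: keep extending the match
    else
        codes(end+1) = dict(w);     % Step 3: emit the index of W
        dict([w c]) = nextCode;     % Step 4: add W followed by the next symbol
        nextCode = nextCode + 1;
        w = c;                      % Step 5: restart from the unmatched symbol
    end
end
codes(end+1) = dict(w);             % emit the index of the final match

disp(codes);                        % 1 2 2 3 4 2 5 9 10 6 1 3 3 2 4
```

Building new strings with [w c] rather than strcat is a deliberate choice in this sketch, because strcat drops trailing spaces from character inputs and would mishandle messages that contain blanks.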

A dictionary is initialized to contain the single-character strings corresponding to all the possible input
characters (and nothing else except the clear and stop codes if they're being used). The algorithm
works by scanning through the input string for successively longer substrings until it finds one that is not
in the dictionary. When such a string is found, the index for the string without the last character (i.e., the
longest substring that is in the dictionary) is retrieved from the dictionary and sent to output, and the
new string (including the last character) is added to the dictionary with the next available code. The last
input character is then used as the next starting point to scan for substrings.

In this way, successively longer strings are registered in the dictionary and made available for
subsequent encoding as single output values. The algorithm works best on data with repeated patterns,
so the initial parts of a message will see little compression. As the message grows, however, the
compression ratio tends asymptotically to the maximum.
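
The effect of message length can be observed with a small experiment. The sketch below counts how many input symbols each emitted code represents, on average, for inputs of increasing length. The repeated pattern 'ab' and the three lengths are arbitrary choices made for illustration, and the encoder loop is a simplified one that only counts codes.

```matlab
% Sketch: average number of input symbols per emitted code for growing inputs.
% The pattern 'ab' and the lengths below are arbitrary illustrative choices.
lengths = [8 64 512];
symbolsPerCode = zeros(size(lengths));

for t = 1:numel(lengths)
    msg = repmat('ab', 1, lengths(t) / 2);      % highly repetitive input

    % Fresh dictionary holding the two single-character strings.
    dict = containers.Map({'a', 'b'}, [1 2]);
    nextCode = 3;

    numCodes = 0;
    w = msg(1);
    for n = 2:numel(msg)
        c = msg(n);
        if isKey(dict, [w c])
            w = [w c];                          % extend the current match
        else
            numCodes = numCodes + 1;            % a code for w is emitted here
            dict([w c]) = nextCode;             % register the new string
            nextCode = nextCode + 1;
            w = c;
        end
    end
    numCodes = numCodes + 1;                    % code for the final match

    symbolsPerCode(t) = numel(msg) / numCodes;
end

disp([lengths; symbolsPerCode]);
```

For the shortest input each code covers fewer than two symbols on average; the figure rises as the input gets longer, because longer strings have had time to enter the dictionary.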

Decoding

The decoding algorithm works by reading a value from the encoded input and outputting the
corresponding string from the initialized dictionary. To rebuild the dictionary in the same way as it was
built during encoding, the decoder also obtains the next value from the input and adds a new entry to
the dictionary: the current string followed by the first character of the string obtained by decoding the
next value. If the next value cannot be decoded because it is not yet in the dictionary, it must be the
value that is being added in this iteration, so its first character is the same as the first character of the
current string, and that character is used instead. The decoder then proceeds to the next input value
(which was already read as the "next value" in the previous pass) and repeats the process until there is
no more input, at which point the final input value is decoded without any further additions to the
dictionary.

In this way the decoder builds up a dictionary which is identical to that used by the encoder, and uses it
to decode subsequent input values. Thus the full dictionary does not need to be sent with the encoded
data; just the initial dictionary containing the single-character strings is sufficient (and is typically defined
beforehand within the encoder and decoder rather than being explicitly sent with the encoded data).
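
A decoder along these lines can be sketched as below. It is an illustration under stated assumptions, not part of the original listing: the variable names (codes, alphabet, dict, prev, entry, decoded) are hypothetical, and the index sequence is the one the encoder produces for the problem string, without the trailing 0 used as an end marker.

```matlab
% LZW decoder sketch (script form); variable names are illustrative.
codes    = [1 2 2 3 4 2 5 9 10 6 1 3 3 2 4];  % indices produced by the encoder
alphabet = 'abcd';                             % same initial dictionary as the encoder

% Initial dictionary: entry n holds the n-th single-character string.
dict = cell(1, numel(alphabet));
for n = 1:numel(alphabet)
    dict{n} = alphabet(n);
end

prev = dict{codes(1)};                 % the first index always refers to an initial entry
decoded = prev;
for n = 2:numel(codes)
    code = codes(n);
    if code <= numel(dict)
        entry = dict{code};            % index already in the dictionary
    elseif code == numel(dict) + 1
        entry = [prev prev(1)];        % index of the entry that is about to be added
    else
        error('Invalid LZW index %d at position %d.', code, n);
    end
    dict{end+1} = [prev entry(1)];     % rebuild the entry the encoder added
    decoded = [decoded entry];
    prev = entry;
end

disp(decoded);                         % abbcdbabdbbabbaccbd
```

The branch for an index that is not yet in the dictionary is not exercised by this particular input. It is needed for inputs such as 'aaa' with the one-symbol dictionary 'a', which encode to the indices 1 2: the index 2 arrives before the decoder has created entry 2.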

PROBLEM STATEMENT

The string is as given below


abbcdbabdbbabbaccbd

MATLAB CODE :

clc;
clear all;
close all;

% Read the message (terminated by '$') and the symbols of the initial dictionary.
datain=input('enter the string in single quote with symbol $ as End of string =');
dictionary=input('enter the dictionary in single quote(symbol used in string are to be included)=');
ldi=length(dictionary); % number of single-symbol entries

% Convert the dictionary string to a cell array, one symbol per cell.
dictnew=cell(1,ldi);
for i=1:ldi
    dictnew{i}=dictionary(i);
end

s=datain(1); % current match
j=1;         % counts the entries added after the initial symbols
k=1;         % next position in the transmitted code array
i=1;         % index of the last input symbol included in s
while datain(i)~='$' % '$' marks the end of the input
    c=datain(i+1);   % next input symbol
    if c~='$'
        % [s c] is used instead of strcat(s,c), which drops trailing spaces.
        if ~any(strcmp(dictnew,[s c]))
            % s followed by c is new: add it and transmit the index of s.
            dictnew{j+ldi}=[s c];
            tx_trans(k)=find(strcmp(dictnew,s),1);
            k=k+1;
            s=c;     % restart matching from the unmatched symbol
            j=j+1;
        else
            s=[s c]; % still in the dictionary: extend the match
        end
        i=i+1;
    else
        % End of input: transmit the index of the last match, then 0 as EOF.
        tx_trans(k)=find(strcmp(dictnew,s),1);
        k=k+1;
        tx_trans(k)=0;
        break;
    end
end
display('new dictionary=')
display(dictnew);
display(tx_trans);

PROGRAM OUTPUT :

enter the string with symbol $ as End of string 'abbcdbabdbbabbaccbd$'
enter the dictionary(symbol used in string are to be included)= 'abcd'
new dictionary=
dictnew =
Columns 1 through 12
'a' 'b' 'c' 'd' 'ab' 'bb' 'bc' 'cd' 'db' 'ba' 'abd' 'dbb'
Columns 13 through 18
'bab' 'bba' 'ac' 'cc' 'cb' 'bd'
tx_trans =
Columns 1 through 15
1 2 2 3 4 2 5 9 10 6 1 3 3 2 4
Column 16
0
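
The compression provided for this run can be estimated from the output above. The sketch below assumes fixed-length binary codes on both sides, which is an illustrative choice and not something the listing itself does: each source symbol is charged ceil(log2(4)) bits and each transmitted index ceil(log2(18)) bits. The trailing 0 end marker is left out of the count, and all variable names other than datain and tx_trans are hypothetical.

```matlab
% Sketch: estimate the compression for the run shown above.
% Assumes fixed-length binary codes on both sides (an illustrative choice).
datain   = 'abbcdbabdbbabbaccbd';               % message without the '$' marker
tx_trans = [1 2 2 3 4 2 5 9 10 6 1 3 3 2 4];    % transmitted indices without the trailing 0
alphabetSize   = 4;                             % symbols a, b, c, d
dictionarySize = 18;                            % final number of dictionary entries

bitsPerSymbol = ceil(log2(alphabetSize));       % 2 bits per source symbol
bitsPerCode   = ceil(log2(dictionarySize));     % 5 bits per dictionary index

originalBits   = numel(datain)   * bitsPerSymbol;   % 19 * 2 = 38
compressedBits = numel(tx_trans) * bitsPerCode;     % 15 * 5 = 75

compressionRatio = originalBits / compressedBits;
fprintf('original: %d bits, LZW: %d bits, ratio: %.2f\n', ...
        originalBits, compressedBits, compressionRatio);

% Stored as 8-bit characters, the same message occupies 19 * 8 = 152 bits.
asciiRatio = (numel(datain) * 8) / compressedBits;
fprintf('ratio against 8-bit characters: %.2f\n', asciiRatio);
```

Under these assumptions the 19-symbol message is too short to benefit when the source is already packed at 2 bits per symbol (ratio about 0.51), while against 8-bit character storage the ratio is about 2.03. This agrees with the remark in the theory that the initial part of a message sees little compression.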

Conclusion:
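Thus, LZW coding builds the dictionary for the given string adaptively while encoding. The 19-symbol input string was transmitted as 15 dictionary indices, and the dictionary grew from 4 to 18 entries.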

Advantages :
1. Algorithm is easy to implement
2. Produces lossless compression of images

Disadvantages :
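1. LZ77 produces its output as triples (offset, match length, next symbol) and LZ78 as pairs (dictionary index, next symbol); LZW transmits only dictionary indices, but its dictionary must first be loaded with every source symbol.
2. If the input contains few repeated patterns, matches are rare and the output can be larger than the input (data expansion).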

LZW keeps the dictionary scheme simple because only indices are transmitted. As the dictionary fills with longer strings, the compression achieved generally improves.

Real Life Application:


• LZW compression became the first widely used universal data compression method on
computers. A large English text file can typically be compressed via LZW to about half its
original size.
• LZW was used in the public-domain program compress, which became a more or less standard
utility in Unix systems circa 1986. It has since disappeared from many distributions, both
because it infringed the LZW patent and because gzip produced better compression ratios
using the LZ77-based DEFLATE algorithm, but as of 2008 at least FreeBSD includes both
compress and uncompress as a part of the distribution. Several other popular compression
utilities also used LZW, or closely related methods.
• LZW became very widely used when it became part of the GIF image format in 1987. It may
also (optionally) be used in TIFF and PDF files. (Although LZW is available in Adobe Acrobat
software, Acrobat by default uses DEFLATE for most text and color-table-based image data in
PDF files.)

Post Lab Questions:

1. Explain the LZW coding technique.
2. Compare static and dynamic dictionary coding techniques.
3. What are the applications of LZW?
4. What is meant by a dictionary?
5. What are the limitations of LZW?
6. Compare the LZ77, LZ78 and LZW coding techniques.

Viva Questions:
1. Given an initial dictionary consisting of the letters a, b, c, r, y, encode the string
acbarcarraycbycarray using the LZW algorithm.
2. Given an initial dictionary consisting of the letters a, b, c, d, e, f, encode the string
addaeabccdaceaeafccdeafccde using the LZW algorithm. Also decode the encoded sequence.
3. Show the encoding with example for LZ77. What are the disadvantages over LZW coding?
4. Explain the encoding and decoding for LZW along with its disadvantages.
5. Encode the string ‘mnop mnop ponm’ using LZW. What are the limitations of this method?
