
SOURCE: http://blogs.msdn.com/bclteam/archive/2007/05/16/system-io-compression-capabilities-kim-hamilton.aspx

System.IO.Compression Capabilities [Kim Hamilton]


We often get asked about the capabilities of the .NET compression classes in
System.IO.Compression. I'd like to clarify what they currently support and mention
some partial workarounds for formats that aren't supported.

At their core, the .NET compression libraries support only one compression
format: Deflate. The Deflate format is defined in RFC 1951, and our DeflateStream
class is a straightforward implementation of it.

Other compression formats, such as zlib, gzip, and zip, offer deflate as one
possible compression method but may also use others. When they use deflate, you
can think of these formats as a wrapper around it: they take the bytes generated
by deflate compression and tack on header info and checksums.

Our GZipStream class does exactly that – it uses DeflateStream and then adds header
info and checksums specific to the gzip format. The gzip format is specified in RFC
1952.
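
To make the wrapper idea concrete, here is a minimal sketch in Python rather than .NET, with Python's zlib module standing in for DeflateStream. The function name gunzip_by_hand is hypothetical, and the sketch assumes a gzip stream with no optional header fields (FLG is 0). It skips the 10-byte header, inflates the raw deflate payload, and checks the CRC-32 and length stored in the 8-byte trailer.

```python
import struct
import zlib


def gunzip_by_hand(blob: bytes) -> bytes:
    """Unwrap a single gzip member (RFC 1952) that has no optional header fields."""
    # Fixed 10-byte header: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS.
    id1, id2, method, flags, _mtime, _xfl, _os = struct.unpack_from("<BBBBIBB", blob, 0)
    if (id1, id2) != (0x1F, 0x8B) or method != 8:
        raise ValueError("not a deflate-based gzip stream")
    if flags != 0:
        raise ValueError("optional header fields are not handled in this sketch")

    # 8-byte trailer: CRC-32 of the uncompressed data, then its length mod 2**32,
    # both little-endian.
    crc, isize = struct.unpack("<II", blob[-8:])

    # Negative window bits tell zlib to expect raw deflate with no wrapper.
    data = zlib.decompressobj(-15).decompress(blob[10:-8])

    if zlib.crc32(data) != crc or (len(data) & 0xFFFFFFFF) != isize:
        raise ValueError("gzip trailer does not match the decompressed data")
    return data
```

Everything between the header and the trailer is plain deflate output, which is the part DeflateStream handles on its own.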

So, out of the box, we support deflate and gzip formats.

Until we provide support for the other formats, which we plan to do soon, there are
partial workarounds that may help you out in some situations, but they're
definitely not a complete solution.

Working with zlib


The zlib format is specified by RFC 1950. Zlib also uses deflate, plus 2 or 6
header bytes and a 4-byte checksum at the end. The first 2 bytes indicate the
compression method and flags. If the dictionary flag is set, 4 additional bytes
follow (which explains why the header is either 2 or 6 bytes). Note that in the
wild, preset dictionaries aren't very common (and our classes don't support
them).

This diagram from RFC 1950 shows the zlib structure:

      0   1
    +---+---+
    |CMF|FLG|   (more-->)
    +---+---+

    (if FLG.FDICT set)

      0   1   2   3
    +---+---+---+---+
    |     DICTID    |   (more-->)
    +---+---+---+---+

    +=====================+---+---+---+---+
    |...compressed data...|    ADLER32    |
    +=====================+---+---+---+---+

This means that to read a zlib file using only the .NET libraries, you can often
just chop off the first 2 bytes and the last 4 bytes and use DeflateStream on the
rest of the stream as normal. (It would be better to check the dictionary bit
first and, if it is set, not attempt to read the stream at all, since our classes
don't support preset dictionaries.)
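
The chop-and-inflate approach can be sketched as follows. This is Python, not .NET: zlib.decompressobj with negative window bits plays the role of DeflateStream, and the function name inflate_zlib_by_hand is hypothetical. The sketch also checks the dictionary bit and verifies the Adler-32 trailer, which simply discarding the last 4 bytes would skip.

```python
import struct
import zlib


def inflate_zlib_by_hand(blob: bytes) -> bytes:
    """Read a zlib stream (RFC 1950) using only a raw-deflate decoder."""
    cmf, flg = blob[0], blob[1]
    if cmf & 0x0F != 8:
        raise ValueError("compression method is not deflate")
    if (cmf * 256 + flg) % 31 != 0:
        raise ValueError("header check (FCHECK) failed")
    if flg & 0x20:
        # FDICT set: a 4-byte DICTID follows and a preset dictionary is required.
        raise ValueError("preset dictionary is not supported")

    # Strip the 2 header bytes and the 4 trailing checksum bytes; the rest is
    # raw deflate data (negative window bits = no wrapper expected).
    data = zlib.decompressobj(-15).decompress(blob[2:-4])

    # The trailer is the Adler-32 of the uncompressed data, stored big-endian.
    (expected,) = struct.unpack(">I", blob[-4:])
    if zlib.adler32(data) != expected:
        raise ValueError("Adler-32 checksum mismatch")
    return data
```

In .NET the equivalent step would be to position the input stream past the 2 header bytes and hand it to DeflateStream in decompress mode.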

Going in the opposite direction isn't as trivial, so I'm not really suggesting
that you generate zlib files this way. However, a couple of people have asked in
the past, so I'll sketch an overview of that.

To start, you need to know which bytes to add at the beginning. With our deflate
implementation, those bytes are 0x58 and 0x85. If you're curious about how this is
derived from RFC 1950, see section 2.2 "Data format" and note that we use a window
size of 8K and the value of FLEVEL should be 2 (default algorithm). The low bits
of the second byte are the FCHECK value, chosen so that the two header bytes, read
as a 16-bit big-endian number, are a multiple of 31.
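
For readers who want to see the arithmetic, here is a small Python sketch of how the two header bytes fall out of RFC 1950 section 2.2. The function name zlib_header and its parameters are hypothetical; the field layout (CM, CINFO, FLEVEL, FDICT, FCHECK) comes from the RFC.

```python
def zlib_header(window_size: int, flevel: int, fdict: bool = False) -> bytes:
    """Build the 2-byte zlib header (CMF, FLG) described in RFC 1950."""
    cm = 8                                       # compression method 8 = deflate
    cinfo = (window_size.bit_length() - 1) - 8   # log2(window size) - 8
    cmf = (cinfo << 4) | cm

    flg = (flevel << 6) | (0x20 if fdict else 0)
    # FCHECK (low 5 bits) makes CMF*256 + FLG a multiple of 31.
    flg += (31 - (cmf * 256 + flg) % 31) % 31
    return bytes([cmf, flg])
```

With an 8K window and FLEVEL 2, zlib_header(8192, 2) gives the bytes 0x58 0x85 quoted above. A 32K window with the same FLEVEL gives 0x78 0x9C, the header most zlib tools emit.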

After that, you need to add the Adler-32 checksum at the end. The checksum is
computed over the uncompressed payload, so you need to calculate it
programmatically as the data is written. Because of this, the easiest way to
generate the checksum is to subclass DeflateStream and override the
Write/BeginWrite methods to update the checksum. Stephen Toub's NamedGZipStream
article (mentioned at the end) shows an example of creating such a subclass for
generating named gzip files.
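
The shape of such a writer can be sketched in Python. This is not the DeflateStream subclass itself: the ZlibWriter class is hypothetical, and Python's zlib.compressobj in raw mode stands in for DeflateStream. It is created with an 8K window here so that the 0x58 0x85 header is accurate. Each write updates a running Adler-32 over the uncompressed bytes, and close appends it big-endian.

```python
import struct
import zlib


class ZlibWriter:
    """Hypothetical wrapper: raw deflate output framed as a zlib stream."""

    def __init__(self, sink):
        self._sink = sink
        # Raw deflate (negative window bits) with an 8K window (2**13 bytes).
        self._deflater = zlib.compressobj(6, zlib.DEFLATED, -13)
        self._adler = 1                      # Adler-32 starts at 1, not 0
        sink.write(b"\x58\x85")              # CMF/FLG for an 8K window, FLEVEL 2

    def write(self, chunk: bytes) -> None:
        # Update the checksum with the *uncompressed* bytes, then compress them.
        self._adler = zlib.adler32(chunk, self._adler)
        self._sink.write(self._deflater.compress(chunk))

    def close(self) -> None:
        self._sink.write(self._deflater.flush())
        # The zlib trailer is the Adler-32 value in big-endian byte order.
        self._sink.write(struct.pack(">I", self._adler))
```

The sink can be any object with a write method, such as an open binary file. The resulting bytes can be checked by feeding them to a regular zlib decoder, which validates both the header and the checksum.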

Working with other compression formats


The big format you're probably thinking about is zip. Currently the .NET libraries
don't support zip but the J# class libraries do. The following article describes
using these libraries with a C# app.

http://msdn.microsoft.com/msdnmag/issues/03/06/ZipCompression/default.aspx

But if you don't want to rely on the J# class libraries, we'll need to provide a
better solution.

Now that you're familiar with some compression specifications, let's focus on zip a
little more. A zip specification is here:

http://www.pkware.com/documents/casestudies/APPNOTE.TXT

Notice that zip also allows deflate. Again the same principle applies – there are
deflate bytes packaged in a header and footer. This may tempt you into writing a
zip reader/writer based on DeflateStream (as described above for zlib), but there
are two key differences that make zip more complicated.

First, the zip header contains a lot more information than the zlib header. To read
a zip file, you'd definitely have to parse the header to figure out how many bytes
to skip over because the header contains variable length items such as a file name.

Second, zip tools actively use different compression methods. For example, use
the Windows compression tool on a very small text file (with just a few words in
it) and then on a bigger file, say around 20 KB. Chances are it will use no
compression (yes, that's an option) for the small file and deflate for the 20 KB
file.

Because different compression methods are used, an extension of the zlib technique
described above may not help you much if you want to use the .NET libraries to read
zip files. You'd definitely have to read the compression method to determine how to
proceed. If it's deflate, then chop off the header and proceed as above. If it's no
compression, chop off the header and read the bytes as a normal stream of bytes. If
it's something else, then the .NET libraries have no built-in support for it.
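
As a rough illustration of those steps, here is a Python sketch that reads only the first entry of a zip file by parsing its local file header, whose layout is taken from APPNOTE.TXT. The function name read_first_zip_entry is hypothetical, zlib in raw mode stands in for DeflateStream, and the sketch assumes the sizes are stored in the local header rather than in a trailing data descriptor. It ignores the central directory, encryption, and Zip64.

```python
import struct
import zlib

# Local file header: signature, version, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra field length.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")


def read_first_zip_entry(blob: bytes):
    """Return (name, method, data) for the first entry in a zip archive."""
    (sig, _ver, flags, method, _time, _date,
     crc, csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(blob, 0)
    if sig != 0x04034B50:
        raise ValueError("missing local file header signature")
    if flags & 0x08:
        raise ValueError("sizes are in a data descriptor; not handled here")

    # The header is variable length: file name and extra field follow it.
    name_start = LOCAL_HEADER.size
    encoding = "utf-8" if flags & 0x0800 else "cp437"
    name = blob[name_start:name_start + name_len].decode(encoding)
    data_start = name_start + name_len + extra_len
    payload = blob[data_start:data_start + csize]

    if method == 0:        # stored: no compression, bytes are used as-is
        data = payload
    elif method == 8:      # deflate: raw deflate data with no wrapper
        data = zlib.decompressobj(-15).decompress(payload)
    else:
        raise ValueError(f"unsupported compression method {method}")

    if zlib.crc32(data) != crc:
        raise ValueError("CRC-32 mismatch")
    return name, method, data
```

A real reader would walk the central directory at the end of the file instead of trusting the first local header, which is part of why zip is more work than zlib.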

Additional Note: Using WinZip with our GZipStream


Stephen Toub observed in an MSDN article that WinZip can't handle our GZipStream
output because it requires filename info. He's created a NamedGZipStream
implementation that generates files readable by WinZip:
http://msdn.microsoft.com/msdnmag/issues/05/10/NETMatters/
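
To show where the filename lives in the format, here is a Python sketch of a gzip stream with the FNAME header field from RFC 1952 set. It is not the NamedGZipStream code from the article: the function name named_gzip is hypothetical, and whether WinZip accepts the output has not been verified here.

```python
import struct
import zlib


def named_gzip(data: bytes, filename: str) -> bytes:
    """Build a gzip member whose header carries the original file name."""
    FNAME = 0x08
    # ID1, ID2, CM=8 (deflate), FLG, MTIME=0 (unknown), XFL=0, OS=255 (unknown).
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FNAME, 0, 0, 255)
    # The name is stored as ISO 8859-1 text terminated by a zero byte.
    name = filename.encode("latin-1") + b"\x00"

    deflater = zlib.compressobj(6, zlib.DEFLATED, -15)   # raw deflate payload
    body = deflater.compress(data) + deflater.flush()

    # Trailer: CRC-32 of the uncompressed data and its length mod 2**32.
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + name + body + trailer
```

The only difference from an unnamed gzip stream is the flag bit and the zero-terminated name inserted between the fixed header and the deflate data.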

Our Future Compression Plans


We'd like to address the shortcomings of our compression library in future
releases. The following items are our highest priority compression requests:

Support for more formats, such as the ones described above
Better compression ratio
Better compression speed

Are there any others you'd like us to address?

Published Wednesday, May 16, 2007 8:00 AM by BCLTeam


Filed under: compress, System.IO, zip
