Version 43 of vfs::zip

Updated 2007-05-18 19:22:07 by lvirden

vfs::zip aka zipvfs is a part of the tclvfs package. Bug reports, patches, enhancement requests, etc. should be filed there.


NJG Jun 22, 2004

Recently I fetched close to a hundred zip archives from the Internet, each containing several tens of small ASCII files. (All describing sokoban levels, if you want to know. What sokoban is you will have to find out for yourself. (LV See also tkSokoban.)) Having eyed tclvfs for some time, I wrote a small Tcl program intended to extract them with the vfs::zip package.

I immediately got a laconic "bad header" message. But I don't give up so easily (especially because all the other unzippers I have could handle the archive). I took a zip format specification, displayed the archive in a hex editor and set about deciphering its content together with the operation of zipvfs.tcl (in Tcl/lib/vfs1.3/). Soon enough I located the cause of the problem: the code assumes that the "end of archive" record starts exactly 22 bytes before the end of the file. That is not necessarily true, even according to the format spec, since the record may be followed by a variable-length comment. My files all contained a trailing 0x0A (end of line), which is how the original zipper represented the zero-length comment string specified in the "end of archive" record. I think the correction is obvious (locate the record by searching for its signature), but I don't think I am the right person to make it. I did, however, write a simple script that removed these trailing linefeeds.
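The cleanup script itself is not shown on this page. A minimal sketch of what it could look like follows, assuming the only damage is extra 0x0A bytes after the end-of-archive record; the proc name stripTrailingLF is hypothetical. It would also remove a legitimate linefeed at the end of a real archive comment, so it is only meant for files known to have this defect.

```tcl
# Hypothetical helper: remove trailing linefeed (0x0A) bytes from a file
# and return how many bytes were removed.
proc stripTrailingLF {path} {
    set fd [open $path r]
    fconfigure $fd -translation binary
    set data [read $fd]
    close $fd
    # Only 0x0A bytes at the very end are trimmed; nothing else changes.
    set trimmed [string trimright $data "\n"]
    set removed [expr {[string length $data] - [string length $trimmed]}]
    if {$removed > 0} {
        set fd [open $path w]
        fconfigure $fd -translation binary
        puts -nonewline $fd $trimmed
        close $fd
    }
    return $removed
}
```

Applied to a directory of downloads it might be run as `foreach f [glob -nocomplain *.zip] { stripTrailingLF $f }`.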

My small Tcl program then happily ran to completion and I was rather pleased with myself ... until I found the first file that contained what seemed to be binary garbage. (In fact, I found 22 of them.) Here is one in hex form:

 2dcc310a80301044d13e903b7c582bad6cedc4ca2244d41b684041118201bdbdc638d563d81d9018ad400a4814c9858c8fe4bc792991423c10be3689f4ffcf8cebb9b90ad336bd6dec60ec40596a558773397c457defceaf1373a073db112ead1e

(Let me not forget to mention that the compression method is deflate (#8).)

Back to debugging. The first fact I found was that the content was not garbage at all: it was the compressed form of the file, copied unchanged from the archive. My second finding was that by default Trf (and ultimately some version of zlib) was called to do the decompression.

Then I ran the program again, modified to collect all these cases. My third observation was that the first octet was always 0x2d and the second octets were very similar, seemingly following some pattern. Not by chance: according to the PKzip spec, these two constitute a flag byte describing the representation of the following content.
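For comparison, a method-8 zip entry stores raw deflate data with no zlib header, and RFC 1951 puts the header of the first block in the low three bits of the first byte: bit 0 is BFINAL and bits 1-2 are BTYPE. The hypothetical helper below decodes them; it is an illustration, not part of zipvfs or Trf.

```tcl
# Decode the 3-bit header of the first deflate block (RFC 1951, 3.2.3).
proc deflateBlockHeader {data} {
    # binary scan "c" yields a signed value; mask it to 0..255.
    binary scan $data c first
    set first [expr {$first & 0xFF}]
    set bfinal [expr {$first & 1}]
    set btype  [expr {($first >> 1) & 3}]
    set names {stored fixed dynamic reserved}
    return [list bfinal $bfinal btype [lindex $names $btype]]
}
```

For the data above, `deflateBlockHeader [binary format H* 2dcc]` returns `bfinal 1 btype dynamic`. Note also that `format %c 0x2d` is "-", the ASCII minus sign.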

That's it. I am finished. Someone more knowledgeable in this area could do something about it. That would be really nice.


This might be the same problem I found with vfs::zip... my fix (more of a workaround) has been checked in, but I don't know exactly when. You can search the page revisions on Chuck Ferril to see the change I made to get deflate working with OpenOffice files.


Have these problems been submitted as bug reports to the tclvfs sf.net project? Yes, as far as I know, they have.

Vince says he sees no open bug reports against tclvfs for any of the above.


NJG June 29, 2004 (CET)

Sorry, I could not come back to the wiki sooner.

No, I have not submitted any bug report yet. For a couple of reasons:

  • there would have had to be two: only the first problem strictly belongs to tclvfs, the other concerns Trf;
  • on second thought, I decided to take care of them myself (I am not that lame after all).

Here is the first act. Find the following portion of zipvfs.tcl:

 proc zip::EndOfArchive {fd arr} {
    upvar 1 $arr cb

    seek $fd -22 end
    set pos [tell $fd]
    set hdr [read $fd 22]

    binary scan $hdr A4ssssiis xhdr \
        cb(ndisk) cb(cdisk) \
        cb(nitems) cb(ntotal) \
        cb(csize) cb(coff) \
        cb(comment) 

    if { ![string equal "PK\05\06" $xhdr]} {
        error "bad header"
    }

and replace it with

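 proc zip::EndOfArchive {fd arr} {
    upvar 1 $arr cb

    # Search the last 512 bytes for the end-of-archive signature instead
    # of assuming the record starts exactly 22 bytes before the end.
    seek $fd -512 end
    set hdr [read $fd 512]
    set pos [string first "PK\05\06" $hdr]
    if {$pos == -1} {
        error "no header found"
    }
    # Remember how many bytes were actually read before trimming $hdr,
    # so the absolute offset below stays right even for short reads.
    set hdrlen [string length $hdr]
    set hdr [string range $hdr [expr {$pos + 4}] [expr {$pos + 21}]]
    set pos [expr {[tell $fd] + $pos - $hdrlen}]

    binary scan $hdr ssssiis \
        cb(ndisk) cb(cdisk) \
        cb(nitems) cb(ntotal) \
        cb(csize) cb(coff) \
        cb(comment)

    # ... the rest of zip::EndOfArchive stays unchanged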

This works if the end-of-archive header can be found in the last 512 bytes of the archive. Although the trailing zip file comment can be up to 65535 bytes long, I do not think an absolutely correct solution is needed. After all, until now the problem had not even been detected!

As for the second act, I am going to write an extension adding a zlib command to Tcl (initially providing deflate/inflate). It seems that zipvfs.tcl would automatically use it, if available, instead of the zip command provided by Trf (see vfslib.tcl).
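Since Tcl 8.6, released well after this discussion, the core itself ships a zlib command. A minimal round trip, assuming Tcl 8.6 or later: zlib deflate produces raw deflate data (no zlib or gzip header, as in a method-8 zip entry) and zlib inflate decodes it. The helper name roundTrip is only for illustration.

```tcl
# Requires Tcl 8.6 or later, where zlib is a core command.
package require Tcl 8.6-

# Illustrative helper: compress text to raw deflate data and back.
proc roundTrip {text} {
    set bytes [encoding convertto utf-8 $text]
    set packed [zlib deflate $bytes 9]
    # zlib inflate takes the data as a plain positional argument with no
    # options, so data starting with "-" cannot be misparsed as an option.
    set unpacked [zlib inflate $packed]
    return [encoding convertfrom utf-8 $unpacked]
}
```

For example, `roundTrip "-looks like an option"` gives back the same string.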

When? Tough question. Hopefully I will find time for it this coming weekend. I have already looked into the zlib-1.2.1 source distribution. It should not be hard to wrap it up with standard Tcl object command generation.

If the problem still persists, then I will indeed submit a bug report ... to the zlib people.

Finally, why not ztcl? Because it depends on another package. Compression/decompression should be a single generic service (later to be extended beyond the initial deflate/inflate pair)!

Oh! I forgot to mention in my first post that mkZiplib, which I turned to first, simply caused an addressing error on Win2k when I selected a file from the archive and then tried to read it!

AK: The fix for EndOfArchive above is wrong: it does not handle any zip archive smaller than 512 bytes. For any such archive the [seek $fd -512 end] fails, causing the mount to fail. The bug report is http://sourceforge.net/tracker/index.php?func=detail&aid=1003574&group_id=32692&atid=406308 and the original report is http://bugs.activestate.com/show_bug.cgi?id=33195

Here is my fix. Replace the seek $fd -512 end with:

    seek $fd 0 end
    set n [tell $fd]
    if {$n < 512} {set n -$n} else {set n -512}
    seek $fd $n end

This limits the seek to the size of the zip file or 512 bytes, whichever is smaller.


NJG July 1, 2004

A funny ending to the story ...

To eliminate problem #2, the line

 set data [vfs::zip -mode decompress -nowrap 1 $data]

in zipvfs.tcl should be replaced with

 set data [vfs::zip -mode decompress -nowrap 1 -- $data]

I came to the solution the hard way. I put together a VC++ project to create the Tcl zlib package and was already writing the checks for the command options when I suddenly remembered that, before conversion to hex, the first character had been a - (minus). I went to the Trf sources and soon found that the option processing was misled by the leading -, and the net result was that the zip command compressed the input string instead of decompressing it! Fortunately the end-of-options flag, --, is handled, hence the correction above.
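The trap is a general one for commands that scan leading "-option value" pairs. The sketch below is a hypothetical, simplified parser (it is not Trf's code) showing how a data argument that starts with - gets swallowed as an option, and how -- keeps it as data.

```tcl
# Hypothetical parser (not Trf's code): leading words that begin with "-"
# are read as "-option value" pairs until "--" or a non-option word.
proc splitOptions {words} {
    set opts {}
    while {[llength $words] > 0} {
        set w [lindex $words 0]
        if {$w eq "--"} {
            # Explicit end of options: everything after is data.
            return [list $opts [lrange $words 1 end]]
        }
        if {[string index $w 0] ne "-"} break
        # A missing value simply becomes the empty string.
        lappend opts $w [lindex $words 1]
        set words [lrange $words 2 end]
    }
    return [list $opts $words]
}
```

With `splitOptions {-mode decompress -nowrap 1 -xyz}` the data word `-xyz` ends up among the options and no data remains; with `--` inserted before `-xyz`, the data is returned intact.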

Now would perhaps be the time to provide the absolutely correct patch for problem #1 as well!
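One possible shape for such a patch, offered only as a sketch: read the largest region in which the record can start (its 22 fixed bytes plus a comment of up to 65535 bytes, clamped to the file size), search backwards for the signature, and accept a candidate only if its comment length fits in the file. The proc name findEndOfArchive is hypothetical; it returns the absolute offset of the record rather than filling zipvfs's cb array, and it needs Tcl 8.5 or later for min() in expr. Extra trailing bytes, such as the stray 0x0A described at the top of this page, are tolerated.

```tcl
# Hypothetical sketch (not the tclvfs code): return the absolute offset of
# the zip end-of-central-directory record ("PK\05\06") in channel $fd.
proc findEndOfArchive {fd} {
    # Note: this switches the caller's channel to binary mode.
    fconfigure $fd -translation binary
    seek $fd 0 end
    set size [tell $fd]
    # 22 fixed bytes plus a comment of at most 0xFFFF bytes, clamped to the file.
    set window [expr {min($size, 22 + 0xFFFF)}]
    seek $fd -$window end
    set tail [read $fd $window]
    set start [expr {$size - $window}]
    # Search backwards so the candidate closest to the end is tried first.
    set pos [string last "PK\05\06" $tail]
    while {$pos >= 0} {
        if {$pos + 22 <= $window} {
            # The comment length is an unsigned 16-bit field at offset 20.
            binary scan $tail @[expr {$pos + 20}]s clen
            set clen [expr {$clen & 0xFFFF}]
            # Accept only if the comment fits; extra trailing bytes are tolerated.
            if {$pos + 22 + $clen <= $window} {
                return [expr {$start + $pos}]
            }
        }
        # Look for an earlier match (the signature cannot overlap itself).
        set pos [string last "PK\05\06" $tail [expr {$pos - 1}]]
    }
    error "no end-of-archive record found"
}
```

Hooking this into zip::EndOfArchive would mean seeking to the returned offset and reading the 22-byte record from there. Searching backwards and validating the comment length means a "PK\05\06" sequence inside the comment itself is skipped.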

SEH It was pointed out by jcw that AMSN has a pure-Tcl, zlib-compatible decompress routine. Perhaps someone should look into incorporating it into vfs::zip.

AK: Not possible, as much as I would like it. AMSN is GPL and vfs is BSD-style, so we cannot take part of AMSN and add it to vfs. The other way around (vfs into AMSN) is possible, but not AMSN into vfs.

Daniel: Have you tried contacting them and asking them to change the license of the library? It should be no problem.

SEH I corresponded with the author of AMSN, and he's agnostic about license issues. He would be fine with reclassifying the decompress code as BSD-style license, making it eligible for incorporation into Tcllib.


LV So, bottom line: have the above bugs been reported to the Trf and vfs maintainers or not? Reporting bugs and solutions here on the wiki is nice, but it is even more important to get them into the appropriate bug/patch databases, so the authors can resolve the underlying issues.


[ Category Package a part of tclvfs.

Category Compression Category VFS ]