[NJG] Jun 22, 2004 Recently I fetched close on a hundred zip archives from the Internet, each containing several tens of small ASCII files. (All describing sokoban levels, if you want to know. What sokoban is you have to find out for yourself.) ([LV] See also [tkSokoban].) Having eyed tclvfs for some time, I wrote a small Tcl program to extract them, intending to do the job with the vfs::zip package.

I immediately got a laconic "bad header" message. Now I don't give up so easily. (Especially because all the other unzippers I have could handle the archives.) I took a zip format specification, displayed the archive in a hex editor, and went at deciphering the content together with the operation of zipvfs.tcl (in Tcl/lib/vfs1.3/). Soon enough I located the cause of the problem: the code assumes that the "end of archive" record starts at offset 22 from the end of the file, which is not necessarily true even according to the format spec. My files all contained a trailing 0x0A (end of line), which is how the original zipper represented the zero-length string specified in the "end of archive" record. I think the correction is obvious (locate the record by searching for its signature) but I don't think I am the proper one to do it.

But I did write a simple script that removed these trailing linefeeds. My small Tcl program then happily ran to completion and I was rather pleased with myself ... until I found the first file that contained seemingly binary garbage. (In fact I found 22 of them.) I copy it here in hex form:

 2dcc310a80301044d13e903b7c582bad6cedc4ca2244d41b684041118201bdbdc638d563d81d9018ad400a4814c9858c8fe4bc792991423c10be3689f4ffcf8cebb9b90ad336bd6dec60ec40596a558773397c457defceaf1373a073db112ead1e

(Let me not forget to mention that the compression method is deflate (#8).)

Back to debugging. The first fact I found was that the content was not garbage at all; it was the compressed form of the file copied unchanged from the archive.
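The point that such "garbage" is simply an undecoded deflate stream can be reproduced in a few lines. What follows is a minimal sketch, assuming Tcl 8.6 or later, where a `zlib` command is built in (it was not at the time of this discussion). `inflateHex` is a made-up helper name and the sample text is invented for illustration.

```tcl
# Assumption: Tcl 8.6 or later, where the zlib command is built in.
# "zlib deflate" and "zlib inflate" handle raw deflate streams, the form
# stored in a zip archive for compression method 8.

# inflateHex is a hypothetical helper: it turns a hex dump back into bytes
# and inflates them. It raises an error if the bytes are not a valid stream.
proc inflateHex {hex} {
    return [zlib inflate [binary format H* $hex]]
}

# Round trip on made-up sample text: deflate it, dump it as hex, inflate it.
set original [string repeat "a small ASCII level file\n" 4]
binary scan [zlib deflate $original] H* hex
puts "deflated: $hex"
puts "restored: [expr {[inflateHex $hex] eq $original}]"
```

Passing the hex dump quoted above to `inflateHex` inside a `catch` would show whether a given zlib build accepts that particular stream.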
My second finding was that by default Trf (and eventually some version of zlib) was called to do the uncompressing. Then I ran the program again, modified to collect all these cases. My third observation was that the first octet was always 0x2d and the second octets were very similar and seemingly followed some pattern. Not by chance: according to the PKzip spec these two constitute a flag byte describing the representation of the following content.

That's it. I am finished. Someone more knowledgeable in this area could do something about it. Would be real nice.

----

This might be the same problem I found with vfs::zip ... my fix (''more of a workaround'') has been checked in, but I don't know when exactly. You can search the page revisions on [Chuck Ferril] to see the change I made to get deflate working with OpenOffice files.

----

Have these problems been submitted as bug reports to the tclvfs sf.net project? ''Yes, as far as I know they have.''

[Vince] says he sees no open bug reports against tclvfs for any of the above.

----

[NJG] June 29, 2004 (CET) Sorry, I could not come back to the wiki sooner. No, I have not submitted any bug report yet. For a couple of reasons:

   * there should have been two: only the first problem belongs strictly to tclvfs, the other concerns Trf;
   * on second thought I decided to take care of them myself (I am not that lame after all).

Here is the first act.
Find the following portion of zipvfs.tcl

 proc zip::EndOfArchive {fd arr} {
     upvar 1 $arr cb

     seek $fd -22 end
     set pos [tell $fd]
     set hdr [read $fd 22]

     binary scan $hdr A4ssssiis xhdr \
         cb(ndisk) cb(cdisk) \
         cb(nitems) cb(ntotal) \
         cb(csize) cb(coff) \
         cb(comment)
     if { ![string equal "PK\05\06" $xhdr]} {
         error "bad header"
     }

and replace it with

 proc zip::EndOfArchive {fd arr} {
     upvar 1 $arr cb

     # Read the last 512 bytes and look for the record signature in them.
     seek $fd -512 end
     set hdr [read $fd 512]
     set pos [string first "PK\05\06" $hdr]
     if {$pos == -1} {
         error "no header found"
     }
     # Keep the 18 bytes that follow the 4-byte signature.
     set hdr [string range $hdr [expr {$pos + 4}] [expr {$pos + 21}]]
     # Convert the position within the buffer to an absolute file offset.
     set pos [expr {[tell $fd] + $pos - 512}]

     binary scan $hdr ssssiis \
         cb(ndisk) cb(cdisk) \
         cb(nitems) cb(ntotal) \
         cb(csize) cb(coff) \
         cb(comment)

This works if the end-of-archive header can be found in the last 512 bytes of the archive (and the archive is at least 512 bytes long). Although the trailing zip file comment can be as long as 65535 characters, I do not think that an absolutely correct solution needs to be provided. After all, until now the problem has not even been detected!

As for the second act, I am going to write an extension adding a ''zlib'' command to Tcl (initially providing deflate/inflate). It seems that zipvfs.tcl would automatically use it (instead of ''zip'' provided by Trf) if available (see vfslib.tcl). When? Tough question. Hopefully this coming weekend I will find time for it. I have already looked into the zlib-1.2.1 source distribution. Should not be hard vamping it up with standard Tcl object command generation. If the problem still persists then I will indeed submit a bug report ... to the zlib people.

Finally, why not [ztcl]? Because it depends on another package. Compression/decompression should be a single generic service! (Later to be extended from the initial deflate/inflate pair.)

Oh! I forgot to mention in my first communication that '''''mkZiplib''''' [http://mkextensions.sourceforge.net/mkZiplib10.htm], which I first turned to, simply caused an addressing error on Win2k when I selected and then tried to read a file from the archive!

----

[Category Package] a part of [tclvfs]. | [Category Compression]
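----

The 512-byte window in the replacement `zip::EndOfArchive` could be widened to cover the longest permitted comment. The following is only a sketch under stated assumptions, not the code that tclvfs uses: `findEndOfArchive` is a hypothetical helper, the channel is assumed to be open in binary mode, and the last occurrence of the signature is assumed to be the real record (a comment that itself contains the signature would defeat it).

```tcl
# Hypothetical helper, not part of tclvfs: return the absolute file offset of
# the end-of-archive record, found by scanning the tail of the file for its
# signature. Assumes $fd is seekable and configured with -translation binary.
proc findEndOfArchive {fd} {
    # The record is 22 bytes long, followed by a comment of at most 65535 bytes.
    set maxTail [expr {22 + 65535}]

    seek $fd 0 end
    set size [tell $fd]
    # Never try to read more than the file holds.
    set tailLen [expr {min($size, $maxTail)}]

    seek $fd [expr {$size - $tailLen}] start
    set tail [read $fd $tailLen]

    # Assumption: the last signature in the tail marks the real record.
    set pos [string last "PK\05\06" $tail]
    if {$pos < 0} {
        error "no end-of-archive record found"
    }
    return [expr {$size - $tailLen + $pos}]
}
```

Reading at most 65557 bytes keeps the cost small even for large archives, and the `min()` guard avoids seeking before the start of files shorter than the window.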