[Philip Quaife] ''30 Oct 05'', I have toyed with a tcl only installer some time ago, here is the code for inflating compressed files from a zip archive. I lifted the code from [AMSN], not sure of the copyright. Check with developer before distributing. I changed the code to take a string and return decompressed string. I also have bracketed the [expr] and it runs in acceptable time. Enough speed that you could use it to uncompress the zlib dll from a SXE file and then use the dll to uncompress the rest of the archive. [NEM] AMSN is distributed under the [GPL]. See also [zlib]. [PT] 30-Oct-2005: I contacted the author (Youness Alaoui) of the AMSN zlib code some time back with a view to importing this into [tcllib]. I havn't got around to doing that but this was his reponse: "anyways, about the code I made for the tclzlib, I really don't mind you use it in tcllib... tcllib is great and I'd be happy to help... in fact, I was going to send it to tcllib devs (since it's pure-tcl) if only I had time to finish it... about the license, I don't really understand these things, so I don't mind, as long as it can be useful to as much people as possible... so, of course, you have the right to use my code, make it into the license you want, etc... the problem is, it works perfectly BUT it's really too slow.. (and it's not modular, everything inside one proc only... I did everything in a single day, so... I just wanted it to work).." [PWQ] ''31 Oct 05'', While the use of bit strings is very ''Tclish'', it is not very efficient. Currently you get 1-2kb/sec decode rate. If you applied specialisation you could probably get up to 25kb/s. However since 8.5 is going to get the whole zlib added, as well as other limitations in distributing binary extensions, make this more of an academic study rather than a useful module. [Lars H], 31 Oct 2005: I think it could be useful, even if only to improve backwards compatibility. You could have a tcllib package that goes: 1. If [[info tclversion]]>=8.5, then use core inflate. 2. Otherwise check if [Trf] (or whatever binary extension) is available, and if so use that. 3. If all else fails, fall back on some variant of the code below. As I understand it, there already is quite a number of packages in [tcllib] which work precisely like that. [PWQ] ''1 Nov 05'', I have removed the remaining inefficient constructs and the current decode speed is now 7-8kb/s. Unfortunately there are errors in the decoding process as tested by unziping gzip created files. '''May be fixed now. Unset all arrays at start of each block'''. In responce to LarsH:, It would be of use as part of a TCL only installer (once it is debugged) but what else is it good for? [Lars H]: My primary use case would be a [PDF] file reader -- I don't mean something like Acrobat, but rather something for programmatically reading data stored in a PDF file. Streams (e.g. page contents) in PDF files are routinely flate-compressed to reduce overall size, even though each individual stream usually is no more than a couple of Kb in size. [SEH] 20051101 -- Nice work. I've been suggesting incorporation of the AMSN zlib code into tcllib for a while. It's nice to see that all it took was asking the author to get permission. I think this should go into tcllib, if only for demo and performance benchmarking purposes. Plus, even if there is a new zlib library in the core, Version 1.0 problems may result in bugs or incompatibilities on some obscure platform out there. Pure Tcl good. ---- # # Lifted from AMSN (http://amsn.sf.net) # # # pre optimisation code cleanup # proc zip/inflate { stream } { variable zip/tell #status_log "reading from file $stream\n" set time [clock clicks] zip/reset set def {} binary scan [zip/read $stream 20] b* zlib set bfinal 0 set idx 0 set len 160 while { $bfinal != "1" } { set bfinal [string range $zlib $idx $idx] incr idx binary scan [binary format b* [string range $zlib $idx [expr {$idx + 1}]]] c btype set idx [expr $idx + 2] #status_log "Reading compressed block, with compression type $btype and final bloc = $bfinal\n" if { $btype == 0 } { if { [expr $idx % 8] != 0 } { set idx [expr {$idx + 8 - ( $idx % 8)}] } binary scan [string range $zlib $idx [expr {$idx + 31}]] SS len nlen set idx [expr {$idx + 32}] if { [string map { "0" "1" } $nlen] != $len } { error "Len and NLen does not match : [string range $zlib [expr {$idx -32}] [expr {$idx - 17}]] --- [string range $zlib [expr {$idx -16}] [expr {$idx - 1}]]\nValues are $len and $nlen\n" return -1 } else { binary scan [string range $zlib $idx [expr {$idx + 1}]] S len } #status_log "Reading uncompressed block with length $len from index $idx to [expr $idx + 3 + $len]\n" #set def "${def}[string range $zlib [expr {$idx + 4}] [expr {$idx + 3 + $len}]]" append def [string range $zlib [expr {$idx + 4}] [expr {$idx + 3 + $len}]] set idx [expr {$idx + 3 + $len}] } elseif { $btype == 3 } { error "Got reserved word 11 for compression type : error\n" return -1 } else { if { $btype == 2 } { puts "@${zip/tell}: Got Huffman's dynamic compression block, processing\n" binary scan [binary format b* [string range $zlib $idx [expr {$idx + 4}]]] c hlit set idx [expr {$idx + 5}] set hlit [expr {$hlit + 257}] binary scan [binary format b* [string range $zlib $idx [expr {$idx + 4}]]] c hdist set idx [expr {$idx + 5}] incr hdist binary scan [binary format b* [string range $zlib $idx [expr {$idx + 3}]]] c hclen set idx [expr {$idx + 4}] set hclen [expr {$hclen + 4}] catch {unset clen} set codelengths [list 16 17 18 0 8 7 9 6 10 5 11 4 12 3 13 2 14 1 15] for { set i 0 } { $i < [expr {$hclen * 3}] } { set i [expr {$i + 3}]} { if { $idx > $len - 3 } { binary scan [zip/read $stream 30] b* tmp set zlib "[string range $zlib $idx end]$tmp" set idx 0 set len [string length $zlib] } binary scan [binary format b* [string range $zlib $idx [expr {$idx + 2}]]] c clen([lindex $codelengths [expr {$i / 3}]]) set idx [expr {$idx + 3}] } catch {unset huffcodes} array set huffcodes [zip/createcodes [array get clen] 7 18] puts " dyn code lens $hlit , # dist $hdist, # lens $hclen" set inc 0 set index 0 while { $index < $hlit } { if { $idx > $len - 7 } { binary scan [zip/read $stream 30] b* tmp set zlib "[string range $zlib $idx end]$tmp" set idx 0 set len [string length $zlib] } set bin [string range $zlib $idx [expr {$idx + $inc}]] if { [info exists huffcodes($bin)] } { # #status_log "Found a length, for litteral value $index = $huffcodes($bin)\n" set idx [expr {$idx + $inc + 1}] if { $huffcodes($bin) < 16 } { set litclen($index) $huffcodes($bin) incr index } elseif { $huffcodes($bin) == 16 } { set tocopy $litclen([expr {$index - 1}]) binary scan [binary format b* [string range $zlib $idx [expr {$idx + 1}]]] c length set length [expr {$length + 3}] incr idx 2 for { set t 0 } { $t < $length } { incr t } { # #status_log "Literal length $index, copied value : $tocopy\n" set litclen($index) $tocopy incr index } } elseif { $huffcodes($bin) == 17 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 2}]]] c length set length [expr {$length + 3}] set idx [expr {$idx + 3}] # #status_log "Copying value 0 into the next $length codes starting from $index\n" for { set t 0 } { $t < $length } { incr t } { # #status_log "Literal length $index, copied value : 0\n" set litclen($index) 0 incr index } } else { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 6}]]] c length set length [expr {$length + 11}] set idx [expr {$idx + 7}] # #status_log "Copying value 0 into the next $length codes starting from $index\n" for { set t 0 } { $t < $length } { incr t } { # #status_log "Literal length $index, copied value : 0\n" set litclen($index) 0 incr index } } set inc 0 } else { incr inc if { $inc > 7 } { error "Erreur.. l'increment a depasse 7.. \ndump :\nindex = $idx - increment = $inc, index = $index, $hlit\nmemoire = [string range $zlib [expr $idx ] [expr $idx + $inc ]]\n" return -1 } } } catch {unset litval} array set litval [zip/createcodes [array get litclen] 18 $hlit] set inc 0 set index 0 while { $index < $hdist } { if { $idx > $len - 7 } { binary scan [zip/read $stream 30] b* tmp set zlib "[string range $zlib $idx end]$tmp" set idx 0 set len [string length $zlib] } set bin [string range $zlib $idx [expr {$idx + $inc}]] if { [info exists huffcodes($bin)] } { # #status_log "Found a length, for distance value $index = $huffcodes($bin)\n" set idx [expr {$idx + $inc + 1}] if { $huffcodes($bin) < 16 } { set distclen($index) $huffcodes($bin) incr index } elseif { $huffcodes($bin) == 16 } { set tocopy $distclen([expr {$index - 1}]) binary scan [binary format b* [string range $zlib $idx [expr {$idx + 1}]]] c length set length [expr {$length + 3}] incr idx 2 # #status_log "Copying value $tocopy into the next $length codes starting from $index\n" for { set t 0 } { $t < $length } { incr t } { # #status_log "distance length $index, copied value : $tocopy\n" set distclen($index) $tocopy incr index } } elseif { $huffcodes($bin) == 17 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 2}]]] c length set length [expr {$length + 3}] set idx [expr {$idx + 3}] # #status_log "Copying value 0 into the next $length codes starting from $index\n" for { set t 0 } { $t < $length } { incr t } { # #status_log "distance length $index, copied value : 0\n" set distclen($index) 0 incr index } } else { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 6}]]] c length set length [expr {$length + 11}] set idx [expr {$idx + 7}] # #status_log "Copying value 0 into the next $length codes starting from $index\n" for { set t 0 } { $t < $length } { incr t } { # #status_log "distance length $index, copied value : 0\n" set distclen($index) 0 incr index } } set inc 0 } else { incr inc if { $inc > 7 } { error "Erreur.. l'increment a depasse 7.. \ndump :\nindex = $idx - increment = $inc, index = $index, $hlit\nmemoire = [string range $zlib [expr $idx ] [expr $idx + $inc ]]\n" return -1 } } } catch {unset distval} array set distval [zip/createcodes [array get distclen] 18 $hdist] } else { puts "Got Huffman's compressed block, processing\n" catch {unset litval} array set litval [zip/createcodes [zip/fill_length lit] 18 287] catch {unset distval} array set distval [zip/createcodes [zip/fill_length dist] 18 32] } # status_log "Time for processing header: [expr [clock clicks] - $time]\n" ############################################################################################################ set inc 0 set index [string length $def] #set time [clock clicks] for { } { 1 } { } { if { $idx > $len - 15 } { binary scan [zip/read $stream 30] b* tmp set zlib "[string range $zlib $idx end]$tmp" set idx 0 set len [string length $zlib] } set bin [string range $zlib $idx [expr {$idx + $inc}]] #status_log "time for string range : [time "string range $zlib $idx [expr {$idx + $inc}]"]\n" #status_log "Time for infoexits : [time "info exists litval($bin)"] --- bin = $bin\n" if { [info exists litval($bin)] } { set out $litval($bin) #status_log "Found a length in index $index, for output = $out\n" set idx [expr {$idx + $inc + 1}] if { $out < 256 } { #set def "${def}[binary format c $out]" append def [binary format c $out] incr index #status_log "Time for literal value : [expr [clock clicks] - $time]\n" } elseif { $out == 256 } { #status_log "FOUND END OF BLOCK\n" red break } else { #status_log "Need to move backward distance $out -- processing\n" #set time [clock clicks] if { $idx > $len - 5 } { binary scan [zip/read $stream 30] b* tmp set zlib "[string range $zlib $idx end]$tmp" set idx 0 set len [string length $zlib] } if { $out < 265 } { set plus 0 set length [expr {$out - 254}] } elseif { $out == 285 } { set plus 0 set length 258 } elseif { $out > 264 && $out < 269 } { binary scan [binary format b* [string range $zlib $idx $idx]] c plus incr idx set length [expr {(($out - 265) * 2) + $plus + 11}] } elseif { $out > 268 && $out < 273} { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 1}]]] c plus incr idx 2 set length [expr {(($out - 269) * 4) + $plus + 19}] } elseif { $out > 272 && $out < 277 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 2}]]] c plus set idx [expr {$idx + 3}] set length [expr {(($out - 273) * 8) + $plus + 35}] } elseif { $out > 276 && $out < 281 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 3}]]] c plus set idx [expr {$idx + 4}] set length [expr {(($out - 277) * 16) + $plus + 67}] } else { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 4}]]] c plus set idx [expr {$idx + 5}] set length [expr {(($out - 281) * 32) + $plus + 131}] } #status_log "time for ifelses : [expr [clock clicks] - $time]" #status_log "Found length $length with added $plus\n" set out2 -1 set inc2 0 while { $out2 == -1 } { if { $idx > $len - 15 } { binary scan [zip/read $stream 30] b* tmp set zlib "[string range $zlib $idx end]$tmp" set idx 0 set len [string length $zlib] } set bin [string range $zlib $idx [expr {$idx + $inc2}]] if { [info exists distval($bin)] } { set out2 $distval($bin) #status_log "Found a distance code $out2\n" set idx [expr {$idx + $inc2 + 1}] } else { incr inc2 if { $inc2 > 15 } { error "Erreur.. l'increment a depasse 15.. \ndump :\nindex = $idx - increment = $inc2, index = $index, $hlit\nmemoire = [string range $zlib [expr $idx ] [expr $idx + $inc2 ]]\n" return -1 } } } if { $idx > $len - 13 } { binary scan [zip/read $stream 30] b* tmp set zlib "[string range $zlib $idx end]$tmp" set idx 0 set len [string length $zlib] } if { $out2 < 4 } { set plus 0 set distance [expr {$out2 + 1}] } elseif { $out2 == 4 || $out2 == 5} { binary scan [binary format b* [string range $zlib $idx $idx]] c plus set plus [expr $plus % 256] incr idx set distance [expr {(($out2 - 4) * 2) + $plus + 5}] } elseif { $out2 == 6 || $out2 == 7} { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 1}]]] c plus set plus [expr {$plus % 256}] incr idx 2 set distance [expr {(($out2 - 6) * 4) + $plus + 9}] } elseif { $out2 == 8 || $out2 == 9 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 2}]]] c plus set plus [expr {$plus % 256}] set idx [expr {$idx + 3}] set distance [expr {(($out2 - 8) * 8) + $plus + 17}] } elseif { $out2 == 10 || $out2 == 11} { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 3}]]] c plus set plus [expr {$plus % 256}] set idx [expr {$idx + 4}] set distance [expr {(($out2 - 10) * 16) + $plus + 33}] } elseif {$out2 == 12 || $out2 == 13 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 4}]]] c plus set plus [expr {$plus % 256}] set idx [expr {$idx + 5}] set distance [expr {(($out2 - 12) * 32) + $plus + 65}] } elseif {$out2 == 14 || $out2 == 15 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 5}]]] c plus set plus [expr {$plus % 256}] set idx [expr {$idx + 6}] set distance [expr {(($out2 - 14) * 64) + $plus + 129}] } elseif {$out2 == 16 || $out2 == 17 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 6}]]] c plus set plus [expr {$plus % 256}] set idx [expr {$idx + 7}] set distance [expr {(($out2 - 16) * 128) + $plus + 257}] } elseif {$out2 == 18 || $out2 == 19 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 7}]]] c plus set plus [expr {$plus % 256}] set idx [expr {$idx + 8}] set distance [expr {(($out2 - 18) * 256) + $plus + 513}] } elseif {$out2 == 20 || $out2 == 21 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 8}]]] s plus set plus [expr {$plus % 65536}] set idx [expr {$idx + 9}] set distance [expr {(($out2 - 20) * 512) + $plus + 1025}] } elseif {$out2 == 22 || $out2 == 23 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 9}]]] s plus set plus [expr {$plus % 65536}] set idx [expr {$idx + 10}] set distance [expr {(($out2 - 22) * 1024) + $plus + 2049}] } elseif {$out2 == 24 || $out2 == 25 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 10}]]] s plus set plus [expr {$plus % 65536}] set idx [expr {$idx + 11}] set distance [expr {(($out2 - 24) * 2048) + $plus + 4097}] } elseif {$out2 == 26 || $out2 == 27 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 11}]]] s plus set plus [expr {$plus % 65536}] set idx [expr {$idx + 12}] set distance [expr {(($out2 - 26) * 4096) + $plus + 8193}] } elseif {$out2 == 28 || $out2 == 29 } { binary scan [binary format b* [string range $zlib $idx [expr {$idx + 12}]]] s plus set plus [expr {$plus % 65536}] set idx [expr {$idx + 13}] set distance [expr {(($out2 - 28) * 8192) + $plus + 16385}] } # #status_log "Found distance $distance with added $plus\n" set tocopy [string range $def [expr {$index - $distance}] $index] while { [string length $tocopy] <= $length } { #set tocopy "${tocopy}${tocopy}" append tocopy $tocopy } set tocopy [string range $tocopy 0 [expr {$length -1}]] #set def "${def}$tocopy" append def $tocopy set index [expr {$index + $length}] #status_log "Time for distance : [expr [clock clicks] - $time]\n" } set inc 0 # set time [clock clicks] } else { incr inc } } } } return $def } proc zip/createcodes { oclen maxbits maxcode } { puts " Called ccodes with max $maxbits maxcode $maxcode #syms [expr {[llength $oclen]/2}]" array set clen $oclen # set clen [list 3 3 3 3 3 2 4 4] foreach c [array names clen] { if {[info exists bl_count([set _c $clen($c)])] } { incr bl_count($_c) } else { set bl_count($_c) 1 } } set code 0 set bl_count(0) 0; #status_log "bl_cout = [array get bl_count]\n" for { set bits 1 } { $bits <= $maxbits } {incr bits} { if { ![info exists bl_count([expr {$bits - 1}])] } { set bl_count([expr {$bits - 1}]) 0 } set code [expr {($code + $bl_count([expr {$bits - 1}])) << 1}]; set next_code($bits) $code; } #status_log "code = $code\nnext_code = [array get next_code]\n" for {set n 0} { $n <= $maxcode} {incr n} { if { [info exists clen($n)]} { set len $clen($n) } else { set len 0 } if { $len != 0} { binary scan [binary format s $next_code($len)] b$len bin # #status_log "$len = $next_code($len) = $bin = [zip/invert $bin]\n" set bin [zip/invert $bin] set codes($bin) $n incr next_code($len) } } return [array get codes] } proc zip/invert { bin } { set out "" for { set i [expr {[string length $bin] - 1}] } { $i >= 0 } { incr i -1} { append out [string index $bin $i] } set out } proc zip/fill_length { type } { puts "Call fill_length $type" variable _lit variable _dist set out "" switch $type { "lit" { if {[info exists _lit]} {return $_lit} for { set i 0 } { $i <= 287 } { incr i } { if { $i <= 143 } { append out "$i 8 " } elseif { $i <= 255 } { append out "$i 9 " } elseif { $i <= 279 } { append out " $i 7 " } else { append out "$i 8 " } } set _lit $out } "dist" { if {[info exists _dist]} {return $_dist} for { set i 0 } { $i <= 31} { incr i } { append out "$i 5 " } set _dist $out } } return $out } variable zip/tell 0 proc zip/reset {} {variable zip/tell ; set zip/tell 0} proc zip/read {stream cnt} { variable zip/tell set ret [string range $stream ${zip/tell} [expr {${zip/tell} + $cnt -1}]] if {${zip/tell} == [string length $stream]} {return {}} if {${zip/tell} > [string length $stream]} { #error "EOF" } else { incr zip/tell $cnt if {${zip/tell} > [string length $stream]} { puts stderr "zip/read past end of file at ${zip/tell} ([string length $stream])" set zip/tell [string length $stream] } } set ret } package provide tclzlib 0.12 ---- [Category Tutorial] [Category Compression]