Version 4 of zlib - inflate in tcl

Updated 2005-10-31 15:52:04

Philip Quaife 30 Oct 05, I have toyed with a tcl only installer some time ago, here is the code for inflating compressed files from a zip archive.

I lifted the code from AMSN, not sure of the copyright. Check with developer before distributing.

I changed the code to take a string and return decompressed string. I also have bracketed the expr and it runs in acceptable time. Enough speed that you could use it to uncompress the zlib dll from a SXE file and then use the dll to uncompress the rest of the archive.

NEM AMSN is distributed under the GPL. See also zlib.

PT 30-Oct-2005: I contacted the author (Youness Alaoui) of the AMSN zlib code some time back with a view to importing this into tcllib. I havn't got around to doing that but this was his reponse: "anyways, about the code I made for the tclzlib, I really don't mind you use it in tcllib... tcllib is great and I'd be happy to help... in fact, I was going to send it to tcllib devs (since it's pure-tcl) if only I had time to finish it... about the license, I don't really understand these things, so I don't mind, as long as it can be usefull to as much people as possible... so, of course, you have the right to use my code, make it into the license you want, etc... the problem is, it works perfectly BUT it's really too slow.. (and it's not modular, everything inside one proc only... I did everything in a single day, so... I just wanted it to work).."

PWQ 31 Oct 05, While the use of bit strings is very Tclish, it is not very efficient. Currently you get 1-2kb/sec decode rate.

If you applied specialisation you could probably get up to 25kb/s. However since 8.5 is going to get the whole zlib added, as well as other limitations in distributing binary extensions, make this more of an academic study rather than a useful module.

Lars H, 31 Oct 2005: I think it could be useful, even if only to improve backwards compatibility. You could have a tcllib package that goes:

  1. If [info tclversion]>=8.5, then use core inflate.
  2. Otherwise check if Trf (or whatever binary extension) is available, and if so use that.
  3. If all else fails, fall back on some variant of the code below.

As I understand it, there already is quite a number of packages in tcllib which work precisely like that.


 #
 # Lifted from AMSN (http://amsn.sf.net}
 #
 #

    proc zip/inflate { stream } {

        #status_log "reading from file $stream\n"
        set time [clock clicks]

        zip/reset 

 #--- Some sort of header starts the block not zlib block header

        set def ""
        binary scan [zip/read $stream 20] b* zlib

 if {0} {

        puts "read header"
        set CMF [string range $zlib 0 7]
        set FLG [string range $zlib 8 15]
        insLog DEBUG "CMF = $CMF\nFLG = $FLG\n" read

        if { [binary format b* $CMF] != "\x78" || [string range $FLG 5 5] != 0 } {
            insLog ERROR "Compression of the zlib data is in an unknown format\n" error
            return -1
        }

        binary scan [binary format b* [string range $zlib 0 15]] S FCHECK
        #status_log "FCHECK is such that CMF_FLG = $FCHECK\n" red
        if { [expr {$FCHECK % 31} ] != 0 } {
            #status_log "FCHECK is not a multiple of 31, corrupted data\n"
            return -1
        }

 #--- End of header
 } 

        set bfinal  "0"

# set idx 16

        set idx 0
        set len 160

        while { $bfinal != "1" } {


            set bfinal [string range $zlib $idx $idx]
            incr idx
            binary scan [binary format b* [string range $zlib $idx [expr {$idx + 1}]]] c btype
            set idx [expr $idx + 2]

            #status_log "Reading compressed block, with compression type $btype and final bloc = $bfinal\n"
            if { $btype == 0 } {

                if { [expr $idx % 8] != 0 } {
                    set idx [expr {$idx + 8 - ( $idx % 8)}]
                }

                binary scan [string range $zlib $idx [expr {$idx + 31}]] SS len nlen 
                set idx [expr {$idx + 32}]
                if { [string map { "0" "1" } $bnlen] != $blen } {
                    #status_log "Len and NLen does not match : [string range $zlib [expr {$idx -32}] [expr {$idx - 17}]] --- [string range $zlib [expr {$idx -16}] [expr {$idx - 1}]]\nValues are $len and $nlen\n" red
                    return -1
                } else {
                    binary scan [string range $zlib $idx [expr {$idx + 1}]] S len 
                }

                #status_log "Reading uncompressed block with length $len from index $idx to [expr $idx + 3 + $len]\n"
                set def "${def}[string range $zlib [expr {$idx + 4}] [expr {$idx + 3 + $len}]]"
                set idx [expr {$idx + 3 + $len}]

            } elseif { $btype == 3 } {
                #status_log "Got reserved word 11 for compression type : error\n" error
                return -1
            } else {
                if { $btype == 2 } {
                    #status_log "Got Huffman's dynamic compression block, processing\n"

                    #set time [clock clicks]

                    binary scan [binary format b* [string range $zlib $idx [expr {$idx + 4}]]] c hlit
                    set idx [expr {$idx + 5}]
                    set hlit [expr {$hlit + 257}]
                    binary scan [binary format b* [string range $zlib $idx [expr {$idx + 4}]]] c hdist
                    set idx [expr {$idx + 5}]
                    incr hdist
                    binary scan [binary format b* [string range $zlib $idx [expr {$idx + 3}]]] c hclen
                    set idx [expr {$idx + 4}]
                    set hclen [expr {$hclen + 4}]

                    #status_log "Got hlit = $hlit \nhdist = $hdist\nhclen = $hclen\n"


                    set codelengths [list 16 17 18 0 8 7 9 6 10 5 11 4 12 3 13 2 14 1 15]
                    for { set i 0 } { $i < [expr {$hclen * 3}] } { set i [expr {$i + 3}]} {
                        if { $idx > $len - 3 } {
                            binary scan [zip/read $stream 30] b* tmp
                            set zlib "[string range $zlib $idx end]$tmp"
                            set idx 0
                            set len [string length $zlib]
                        }
                        binary scan [binary format b* [string range $zlib $idx [expr {$idx + 2}]]] c clen([lindex $codelengths [expr {$i / 3}]])
                        set idx [expr {$idx + 3}]
                    }
                    #status_log "Read the codelengths, idx = $idx -- len = $len --- zlib = $zlib\ncodelengths = \n[array get clen]\n"

                    array set huffcodes [zip/createcodes [array get clen] 7 18]



                    #status_log "huffcodes = [array get huffcodes]\n"
                    #status_log "binary : [string range $zlib $idx  [expr $idx + 100]]\n\n"
                    set inc 0
                    set index 0
                    while { $index < $hlit } {
                        if { $idx >  $len - 7 } {
                            binary scan [zip/read $stream 30] b* tmp
                            set zlib "[string range $zlib $idx end]$tmp"
                            set idx 0
                            set len [string length $zlib]
                        }
                        set bin [string range $zlib $idx [expr {$idx + $inc}]]
                        if { [info exists huffcodes($bin)] } {
                            #                        #status_log "Found a length, for litteral value $index = $huffcodes($bin)\n"
                            set idx [expr {$idx + $inc + 1}]
                            if { $huffcodes($bin) < 16 } {
                                set litclen($index) $huffcodes($bin)
                                incr index
                            } elseif { $huffcodes($bin) == 16 } {
                                set tocopy $litclen([expr {$index - 1}])
                                binary scan [binary format b* [string range $zlib $idx [expr {$idx + 1}]]] c length
                                set length [expr {$length + 3}]
                                incr idx
                                incr idx

                                #                            #status_log "Copying value $tocopy into the next $length codes starting from $index\n"

                                for { set t 0 } { $t < $length } { incr t } {
                                    #                                #status_log "Literal length $index, copied value : $tocopy\n"
                                    set litclen($index) $tocopy
                                    incr index
                                }

                            } elseif { $huffcodes($bin) == 17 } {
                                binary scan [binary format b* [string range $zlib $idx [expr {$idx + 2}]]] c length
                                set length [expr {$length + 3}]
                                set idx [expr {$idx + 3}]
                                #                            #status_log "Copying value 0 into the next $length codes starting from $index\n"
                                for { set t 0 } { $t < $length } { incr t } {
                                    #                                #status_log "Literal length $index, copied value : 0\n"
                                    set litclen($index) 0
                                    incr index
                                }
                            } else {
                                binary scan [binary format b* [string range $zlib $idx [expr {$idx + 6}]]] c length
                                set length [expr {$length + 11}]
                                set idx [expr {$idx + 7}]
                                #                            #status_log "Copying value 0 into the next $length codes starting from $index\n"
                                for { set t 0 } { $t < $length } { incr t } {
                                    #                                #status_log "Literal length $index, copied value : 0\n"
                                    set litclen($index) 0
                                    incr index
                                }

                            }
                            set inc 0
                        } else {
                            incr inc
                            if { $inc > 7 } {
                                #status_log "Erreur.. l'increment a depasse 7.. \ndump :\nindex = $idx - increment = $inc, index = $index, $hlit\nmemoire = [string range $zlib [expr $idx ] [expr $idx + $inc ]]\n"

                                return -1
                            }

                        }




                    }

                    array set litval [zip/createcodes [array get litclen] 18 $hlit]


                    set inc 0
                    set index 0
                    while { $index < $hdist } {
                        if { $idx > $len - 7 } {
                            binary scan [zip/read $stream 30] b* tmp
                            set zlib "[string range $zlib $idx end]$tmp"
                            set idx 0
                            set len [string length $zlib]
                        }
                        set bin [string range $zlib $idx [expr {$idx + $inc}]]
                        if { [info exists huffcodes($bin)] } {
                            #                        #status_log "Found a length, for distance value $index = $huffcodes($bin)\n"
                            set idx [expr {$idx + $inc + 1}]
                            if { $huffcodes($bin) < 16 } {
                                set distclen($index) $huffcodes($bin)
                                incr index
                            } elseif { $huffcodes($bin) == 16 } {
                                set tocopy $distclen([expr {$index - 1}])
                                binary scan [binary format b* [string range $zlib $idx [expr {$idx + 1}]]] c length
                                set length [expr {$length + 3}]
                                incr idx 2

                                #                            #status_log "Copying value $tocopy into the next $length codes starting from $index\n"

                                for { set t 0 } { $t < $length } { incr t } {
                                    #                                #status_log "distance length $index, copied value : $tocopy\n"
                                    set distclen($index) $tocopy
                                    incr index
                                }

                            } elseif { $huffcodes($bin) == 17 } {
                                binary scan [binary format b* [string range $zlib $idx [expr {$idx + 2}]]] c length
                                set length [expr {$length + 3}]
                                set idx [expr {$idx + 3}]
                                #                            #status_log "Copying value 0 into the next $length codes starting from $index\n"
                                for { set t 0 } { $t < $length } { incr t } {
                                    #                                #status_log "distance length $index, copied value : 0\n"
                                    set distclen($index) 0
                                    incr index
                                }
                            } else {
                                binary scan [binary format b* [string range $zlib $idx [expr {$idx + 6}]]] c length
                                set length [expr {$length + 11}]
                                set idx [expr {$idx + 7}]
                                #                            #status_log "Copying value 0 into the next $length codes starting from $index\n"
                                for { set t 0 } { $t < $length } { incr t } {
                                    #                                #status_log "distance length $index, copied value : 0\n"
                                    set distclen($index) 0
                                    incr index
                                }

                            }
                            set inc 0
                        } else {
                            incr inc
                            if { $inc > 7 } {
                                #status_log "Erreur.. l'increment a depasse 7.. \ndump :\nindex = $idx - increment = $inc, index = $index, $hlit\nmemoire = [string range $zlib [expr $idx ] [expr $idx + $inc ]]\n"

                                return -1
                            }

                        }
                    }

                    array set distval [zip/createcodes [array get distclen] 18 $hdist]



                } else {
                    #status_log "Got Huffman's compressed block, processing\n"
                    array set litval [zip/createcodes [zip/fill_length lit] 18 287]
                    array set distval [zip/createcodes [zip/fill_length dist] 18 32]


                }

# status_log "Time for processing header: expr [clock clicks - $time]\n"

                ############################################################################################################

                set inc 0
                set index [string length $def]
                #set time [clock clicks]
                for { } { 1 } { } {
                    if { $idx > $len - 15 } {
                        binary scan [zip/read $stream 30] b* tmp
                        set zlib "[string range $zlib $idx end]$tmp"
                        set idx 0
                        set len [string length $zlib]
                    }
                    set bin [string range $zlib $idx [expr {$idx + $inc}]]
                    #status_log "time for string range : [time "string range $zlib $idx [expr {$idx + $inc}]"]\n"
                    #status_log "Time for infoexits : [time "info exists litval($bin)"] --- bin = $bin\n"
                    if { [info exists litval($bin)] } {
                        set out $litval($bin)
                                            #status_log "Found a length in index $index, for output = $out\n"
                        set idx [expr {$idx + $inc + 1}]
                        if { $out < 256 } {
                            set def "${def}[binary format c $out]"
                            incr index
                           #status_log "Time for literal value : [expr [clock clicks] - $time]\n"
                        } elseif { $out == 256 } {
                            #status_log "FOUND END OF BLOCK\n" red
                            break
                        } else {
                            #status_log "Need to move backward distance $out -- processing\n"
                            #set time [clock clicks]

                            if { $idx > $len - 5 } {
                                binary scan [zip/read $stream 30] b* tmp
                                set zlib "[string range $zlib $idx end]$tmp"
                                set idx 0
                                set len [string length $zlib]
                            }

                            if { $out < 265 } {
                                set plus 0
                                set length [expr {$out - 254}]
                            } elseif { $out == 285 } {
                                set plus 0
                                set length 258
                            } elseif { $out > 264 && $out < 269 } {
                                binary scan [binary format b* [string range $zlib $idx  $idx]] c plus
                                incr idx
                                set length [expr {(($out - 265) * 2) + $plus + 11}]
                            } elseif { $out > 268 && $out < 273} {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 1}]]] c plus
                                incr idx 2
                                set length [expr {(($out - 269) * 4) + $plus + 19}]
                            } elseif { $out > 272 && $out < 277 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 2}]]] c plus
                                set idx [expr {$idx + 3}]
                                set length [expr {(($out - 273) * 8) + $plus + 35}]
                            } elseif { $out > 276 && $out < 281 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 3}]]] c plus
                                set idx [expr {$idx + 4}]
                                set length [expr {(($out - 277) * 16) + $plus + 67}]
                            } else {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 4}]]] c plus
                                set idx [expr {$idx + 5}]
                                set length [expr {(($out - 281) * 32) + $plus + 131}]
                            }

                            #status_log "time for ifelses : [expr [clock clicks] - $time]"
                            #status_log "Found length $length with added $plus\n"

                            set out2 -1 
                            set inc2 0
                            while { $out2 == -1 } {
                                if { $idx > $len - 15 } {
                                    binary scan [zip/read $stream 30] b* tmp
                                    set zlib "[string range $zlib $idx end]$tmp"
                                    set idx 0
                                    set len [string length $zlib]
                                }
                                set bin [string range $zlib $idx [expr {$idx + $inc2}]]
                                if { [info exists distval($bin)] } {
                                    set out2 $distval($bin)
                                    #status_log "Found a distance code  $out2\n"
                                    set idx [expr {$idx + $inc2 + 1}]
                                } else {
                                    incr inc2
                                    if { $inc2 > 15 } {
                                        #status_log "Erreur.. l'increment a depasse 15.. \ndump :\nindex = $idx - increment = $inc2, index = $index, $hlit\nmemoire = [string range $zlib [expr $idx ] [expr $idx + $inc2 ]]\n"
                                        return -1
                                    }
                                }
                            }

                            if { $idx > $len - 13 } {
                                binary scan [zip/read $stream 30] b* tmp
                                set zlib "[string range $zlib $idx end]$tmp"
                                set idx 0
                                set len [string length $zlib]
                            }

                            if { $out2 < 4 } {
                                set plus 0
                                set distance [expr {$out2 + 1}]
                            } elseif { $out2 == 4 || $out2 == 5} {
                                binary scan [binary format b* [string range $zlib $idx  $idx]] c plus
                                set plus [expr $plus % 256]
                                incr idx
                                set distance [expr {(($out2 - 4) * 2) + $plus + 5}]
                            } elseif { $out2 == 6 || $out2 == 7} {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 1}]]] c plus
                                set plus [expr {$plus % 256}]
                                incr idx 2
                                set distance [expr {(($out2 - 6) * 4) + $plus + 9}]
                            } elseif { $out2 == 8 || $out2 == 9 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 2}]]] c plus
                                set plus [expr {$plus % 256}]
                                set idx [expr {$idx + 3}]
                                set distance [expr {(($out2 - 8) * 8) + $plus + 17}]
                            } elseif { $out2 == 10 || $out2 == 11} {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 3}]]] c plus
                                set plus [expr {$plus % 256}]
                                set idx [expr {$idx + 4}]
                                set distance [expr {(($out2 - 10) * 16) + $plus + 33}]
                            } elseif {$out2 == 12 || $out2 == 13 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 4}]]] c plus
                                set plus [expr {$plus % 256}]
                                set idx [expr {$idx + 5}]
                                set distance [expr {(($out2 - 12) * 32) + $plus + 65}]
                            } elseif {$out2 == 14 || $out2 == 15 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 5}]]] c plus
                                set plus [expr {$plus % 256}]
                                set idx [expr {$idx + 6}]
                                set distance [expr {(($out2 - 14) * 64) + $plus + 129}]
                            } elseif {$out2 == 16 || $out2 == 17 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 6}]]] c plus
                                set plus [expr {$plus % 256}]
                                set idx [expr {$idx + 7}]
                                set distance [expr {(($out2 - 16) * 128) + $plus + 257}]
                            } elseif {$out2 == 18 || $out2 == 19 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 7}]]] c plus
                                set plus [expr {$plus % 256}]
                                set idx [expr {$idx + 8}]
                                set distance [expr {(($out2 - 18) * 256) + $plus + 513}]
                            } elseif {$out2 == 20 || $out2 == 21 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 8}]]] s plus
                                set plus [expr {$plus % 65536}]
                                set idx [expr {$idx + 9}]
                                set distance [expr {(($out2 - 20) * 512) + $plus + 1025}]
                            } elseif {$out2 == 22 || $out2 == 23 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 9}]]] s plus
                                set plus [expr {$plus % 65536}]
                                set idx [expr {$idx + 10}]
                                set distance [expr {(($out2 - 22) * 1024) + $plus + 2049}]
                            } elseif {$out2 == 24 || $out2 == 25 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 10}]]] s plus
                                set plus [expr {$plus % 65536}]
                                set idx [expr {$idx + 11}]
                                set distance [expr {(($out2 - 24) * 2048) + $plus + 4097}]
                            } elseif {$out2 == 26 || $out2 == 27 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 11}]]] s plus
                                set plus [expr {$plus % 65536}]
                                set idx [expr {$idx + 12}]
                                set distance [expr {(($out2 - 26) * 4096) + $plus + 8193}]
                            } elseif {$out2 == 28 || $out2 == 29 } {
                                binary scan [binary format b* [string range $zlib $idx  [expr {$idx + 12}]]] s plus
                                set plus [expr {$plus % 65536}]
                                set idx [expr {$idx + 13}]
                                set distance [expr {(($out2 - 28) * 8192) + $plus + 16385}]
                            }

                            #                        #status_log "Found distance $distance with added $plus\n"

                            set tocopy [string range $def [expr {$index - $distance}] $index]
                            while { [string length $tocopy] <= $length } {
                                set tocopy "${tocopy}${tocopy}"
                            }

                            set tocopy [string range $tocopy 0 [expr {$length -1}]]
                            set def "${def}$tocopy"

                            set index [expr {$index + $length}]
                           #status_log "Time for distance : [expr [clock clicks] - $time]\n"
                        }
                        set inc 0

# set time clock clicks

                    } else {
                        incr inc

                    }
                }

            }

        }


        #status_log "Finished reading and uncompressing zlib blocks of data\n" blue
        #status_log "Time for zlib: [expr [clock clicks] - $time]\n"

        return $def

    }


    proc zip/createcodes { oclen maxbits maxcode } {

        array set clen $oclen 

        #    set clen [list 3 3 3 3 3 2 4 4]

        foreach c [array names clen] {
            if {[info exists bl_count($clen($c))] } {
                incr bl_count($clen($c))
            } else {
                set bl_count($clen($c)) 1
            }
        }

        set code 0

        set bl_count(0) 0;
        #status_log "bl_cout = [array get bl_count]\n"
        for { set bits 1 } { $bits <= $maxbits } {incr bits} {
            if { ![info exists bl_count([expr {$bits - 1}])] } {
                set bl_count([expr {$bits - 1}]) 0
            }
            set code [expr {($code + $bl_count([expr $bits - 1])) << 1}];
            set next_code($bits) $code;
        }

        #status_log "code = $code\nnext_code = [array get next_code]\n"

        for {set n  0} { $n <= $maxcode} {incr n} {
            if { [info exists clen($n)]} {
                set len $clen($n) 
            } else {
                set len 0
            }
            if { $len != 0} {
                binary scan [binary format s $next_code($len)] b$len bin
                #            #status_log "$len = $next_code($len) = $bin = [zip/invert $bin]\n"
                set bin [zip/invert $bin]
                set codes($bin) $n
                incr next_code($len)
            }
        }


        return [array get codes]

    }


    proc zip/invert { bin } {

        set out ""

        for { set i [expr {[string length $bin] - 1}] } { $i >= 0 } { incr i -1} {
            set out "$out[string index $bin $i]"
        }

        return $out
    }

    proc zip/fill_length { type } {
        set out ""
        switch $type { 
            "lit" {
                for { set i 0 } { $i <= 287 } { incr i } {
                    if { $i <= 143 } {
                        set out "$out $i 8"
                    } elseif { $i <= 255 } {
                        set out "$out $i 9"
                    } elseif { $i <= 279 } {
                        set out "$out $i 7"
                    } else {
                        set out "$out $i 8"
                    }
                }
            }
            "dist" {
                for { set i 0 } { $i <= 31} { incr i } {
                    set out "$out $i 5"
                }
            }
        }

        return $out
    }


 variable zip/tell 0


 proc zip/reset {} {variable zip/tell ; set zip/tell 0}


 proc zip/read {stream cnt} {
        variable zip/tell
        set ret [string range $stream ${zip/tell} [expr {${zip/tell} + $cnt -1}]]

        if {${zip/tell} > [string length $stream]} {
                insLog ERROR zip/read past end of file at ${zip/tell} ([string length $stream])
                #error "EOF"
        }
        incr zip/tell $cnt
        set ret
 }


 package provide tclzlib 0.1


Category Tutorial