This command, part of zlib, creates a streaming compression or decompression command, allowing greater control over the compression/decompression process. It returns the name of the stream instance command. The mode must be one of compress, decompress, deflate, inflate, gzip, or gunzip. The optional level, which is only valid for compressing streams, gives the compression level, from 0 (no compression) to 9 (maximum compression).
The returned streamInst command supports the following subcommands:

    $streamInst add ?option...? data - shorthand for [put] followed by [get]
    $streamInst checksum - returns the checksum of the uncompressed data so far
    $streamInst close - deletes the stream instance command
    $streamInst eof - returns whether the end of the stream's data has been reached
    $streamInst finalize - shorthand for [put -finalize {}]
    $streamInst flush - shorthand for [put -flush {}]
    $streamInst fullflush - shorthand for [put -fullflush {}]
    $streamInst get ?count? - returns up to count bytes from the output buffer
    $streamInst header - returns the gzip header dictionary (gunzip streams only)
    $streamInst put ?option...? data - appends data to the stream's input buffer
    $streamInst reset - resets the stream to its initial state
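A minimal round trip, as a sketch of the basic usage: compress a string with a deflate stream, then recover it with an inflate stream.

    set defl [zlib stream deflate]
    $defl put -finalize "Some text to squeeze"
    set compressed [$defl get]
    $defl close

    set infl [zlib stream inflate]
    $infl put -finalize $compressed
    puts [$infl get]   ;# => Some text to squeeze
    $infl close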
For simple zlib streaming over sockets, as in HTTP, [zlib push] is sufficient. It breaks down for more interactive protocols, as it gives you no way to control when a block is flushed to the receiver. If you want to flush each line, for example, you will need something like the following.
This code simply forces a flush each time $zchan write is called. If that proves unsuitable, remove the -flush option in method write and call the object's method flush directly.
This code was inspired by an experiment by karll. See the zlib manual and http://www.bolet.org/~pornin/deflate-flush.html for more detail on zlib's flushing modes.
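As a quick illustration (a sketch, not part of the original code): the stream's [put] options map directly onto zlib's flush modes.

    set s [zlib stream deflate]
    $s put -flush "first line\n"        ;# sync flush: receiver can decode everything so far
    $s put -fullflush "second line\n"   ;# full flush: also lets a decoder resynchronize here
    $s put -finalize "last line\n"      ;# finish the stream
    set out [$s get]
    $s close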
    # It appears that [$transchan flush] doesn't get called at any interesting time,
    # so each [$transchan write] needs to flush by itself.
    #
    # Flushing an already flushed stream is a harmless error {TCL ZLIB BUF}, so we catch it.
    oo::class create zchan {
        variable Stream
        variable Chan
        variable Mode

        constructor {mode} {
            set Stream [zlib stream $mode]
            # oo::objdefine [self] forward stream $Stream
        }

        method initialize {chan mode} {
            set Chan $chan
            set Mode $mode
            if {$mode eq "write"} {
                return {initialize finalize write flush}
            } elseif {$mode eq "read"} {
                return {initialize finalize read drain}
            }
        }

        method finalize {chan} {
            my destroy
        }

        method write {chan data} {
            try {
                $Stream add -flush $data
                # equivalent to:
                #   $Stream put $data
                #   $Stream flush
                #   $Stream get
            } trap {TCL ZLIB BUF} {} {
                return ""
            }
        }

        method flush {chan} {
            try {
                $Stream add -flush {}
                # equivalent to:
                #   $Stream flush
                #   $Stream get
            } trap {TCL ZLIB BUF} {} {
                return ""
            }
        }

        method read {chan data} {
            $Stream add $data
        }

        method drain {chan} {
            $Stream add -finalize {}
        }
    }

    if 0 {
        lassign [chan pipe] r w
        chan configure $w -translation binary -buffering none
        chan configure $r -translation binary -blocking 0
        lassign {gzip gunzip} out in

        puts $w "Frumious bandersnatch!"
        puts "read: [gets $r]"

        chan push $w [zchan create gw $out]
        chan push $r [zchan create gr $in]

        puts $w "Vorpal snacks!"
        puts "read: [gets $r]"
        puts $w "And bric-a-brac!"
        puts "read: [gets $r]"

        chan pop $w
        chan pop $r

        puts $w "Galumphing back"
        puts "read: [gets $r]"
    }
AMG: I'm trying to read data from disk, compress it, and store the compressed result into an SQLite database. For small files this is easy, but Tcl panics when files exceed two gigabytes in size. Tcl strings simply can't grow that large. Thus, I need to stream the data rather than buffer it all at once.
At first I thought the way to go was to use [zlib push deflate] on [db incrblob], then [chan copy] from disk to the incrblob channel, but I have to preallocate the blob. If I set the blob size to that of the disk file (plus 10% in case the file is too random), this would work, except I have to follow up by truncating the blob to the actual compressed size. How can I tell what that size is? [chan copy] returns the uncompressed size, which doesn't do me any good. [chan tell] doesn't work on an incrblob channel. [zlib push] adds some configuration options to the channel, but none of them tell me how many bytes have passed in or out of the stream.
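For reference, that rejected approach would have looked roughly like this (a sketch; "files" and "data" are placeholder table and column names, and db is an sqlite3 handle):

    set blobSize [expr {[file size $input] + [file size $input] / 10}]   ;# +10% slack
    db eval {UPDATE files SET data = zeroblob($blobSize) WHERE rowid = $rowid}
    set inChan  [open $input rb]
    set outChan [db incrblob files data $rowid]
    zlib push deflate $outChan
    chan copy $inChan $outChan   ;# returns the uncompressed size, not the compressed one
    chan close $inChan
    chan close $outChan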
If I could use [zlib push deflate] on the read channel, [chan copy] would return the compressed size, but I get the error "compression may only be applied to writable channels". I really don't know why this error exists, but it's definitely getting in my way.
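That is, the natural two-liner is rejected outright:

    set inChan [open $input rb]
    zlib push deflate $inChan
    # error: compression may only be applied to writable channels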
Next up: [::tcl::transform::zlib] from Tcllib. However, I found that for small files it doesn't produce any output at all. When [finalize] gets called, it's too late to finalize the zlib stream and return the last of the compressed data, so my version does this in [drain] instead. There may be cases where [drain] is too early to finalize the zlib stream, but [chan copy] only does one [drain] at the very end. Code below.
    # zlibCompressor --
    # Input stream compression.
    oo::class create zlibCompressor {
        variable stream

        method initialize {handle mode} {
            set stream [zlib stream deflate -level 9]
            return {initialize finalize drain read}
        }

        method finalize {handle} {
            $stream close
            my destroy
        }

        method drain {handle} {
            # Finalizing here, rather than in [finalize], is what makes small
            # files work.  [$stream reset] returns an empty string, so this
            # concatenation returns the final data while also resetting the stream.
            $stream finalize
            return [$stream get][$stream reset]
        }

        method read {handle data} {
            $stream add $data
        }
    }
    oo::objdefine zlibCompressor method push {chan} {
        chan push $chan [my new]
    }
Alas, this still doesn't work. [drain] can return a lot of data all at once, but [chan copy] throws away all but the first four kilobytes or so. If I attempt to manually drain the rest using [chan read], that incurs more [drain]s, each finalizing the freshly reset stream, giving me an unlimited stream of bogus data.
The only thing I can really do is bypass [chan copy] altogether:
    set inChan [open $input rb]
    set outChan [$db incrblob $table $column $rowid]
    set stream [zlib stream gzip -level 9]
    set size 0
    set end 0
    while {!$end} {
        if {[set inData [chan read $inChan 4096]] ne {}} {
            $stream put $inData
        } else {
            $stream finalize
            set end 1
        }
        set outData [$stream get]
        chan puts -nonewline $outChan $outData
        incr size [string length $outData]
    }
    $stream close
    chan close $inChan
    chan close $outChan
    chan puts $size
As I dug in deeper, I discovered SQLite has blob size limits too, much tighter even than Tcl's, so I had to implement a chunking scheme dividing files across multiple table rows. The incrblob system became less and less of a good fit, but [zlib stream] is proving to be indispensable for this task.
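A sketch of that chunking idea (not the actual program; it assumes an sqlite3 handle passed as db, a table created as CREATE TABLE chunks(name TEXT, seq INTEGER, data BLOB), and an arbitrary 1 MB chunk budget safely below SQLite's blob limit):

    proc storeCompressed {db name input {chunkLimit 1000000}} {
        set stream [zlib stream gzip -level 9]
        set in [open $input rb]
        set buf {}
        set seq 0
        while {1} {
            set inData [chan read $in 4096]
            if {$inData eq {}} {
                $stream finalize
            } else {
                $stream put $inData
            }
            # Drain repeatedly; a single [get] may not return everything (see below).
            while {[set outData [$stream get]] ne {}} {
                append buf $outData
            }
            # Emit full chunks as soon as they accumulate.
            while {[string length $buf] >= $chunkLimit} {
                set chunk [string range $buf 0 [expr {$chunkLimit - 1}]]
                set buf [string range $buf $chunkLimit end]
                $db eval {INSERT INTO chunks(name, seq, data) VALUES($name, $seq, @chunk)}
                incr seq
            }
            if {$inData eq {}} break
        }
        # Store the final partial chunk, if any.
        if {$buf ne {}} {
            set chunk $buf
            $db eval {INSERT INTO chunks(name, seq, data) VALUES($name, $seq, @chunk)}
        }
        $stream close
        close $in
    }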
AMG: Even though the documentation says that [get] returns as much data as is available, in practice it seems to return at most 65536 bytes at a time. If more data than that is immediately available, [get] has to be called repeatedly until it returns less than that amount (or the empty string). I lost so much time trying to debug this in my program...
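So, to drain a stream reliably (a defensive helper written for this page, not part of the zlib API):

    proc drainStream {stream} {
        # Keep calling [get] until it comes back empty; don't trust a single
        # call to return everything that is pending.
        set data {}
        while {[set part [$stream get]] ne {}} {
            append data $part
        }
        return $data
    }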