** Documentation **

   * http://www.tcl.tk/man/tcl/TclCmd/zlib.htm#M43

: '''zlib stream''' ''mode'' ?''level''?

This command, part of [zlib], creates a streaming compression or decompression command, allowing greater control over the compression/decompression process. It returns the name of the stream instance command. The ''mode'' must be one of '''compress''', '''decompress''', '''deflate''', '''inflate''', '''gzip''', '''gunzip'''. The optional ''level'', which is only valid for compressing streams, gives the compression level from 0 (none) to 9 (max).

The returned ''streamInst'' command will support the following subcommands:

: ''streamInst'' '''add''' ?''option''? ''data''
** Shortcut for a '''put''' followed by a '''get'''.

: ''streamInst'' '''checksum'''
** Returns the current checksum of the uncompressed data, calculated using the appropriate algorithm for the stream's ''mode''.

: ''streamInst'' '''close'''
** Disposes of the ''streamInst'' command. Deleting it with [rename] works the same.

: ''streamInst'' '''eof'''
** Returns whether the end of the input data has been reached.

: ''streamInst'' '''finalize'''
** Shortcut for "''streamInst'' '''put -finalize''' {}".

: ''streamInst'' '''flush'''
** Shortcut for "''streamInst'' '''put -flush''' {}".

: ''streamInst'' '''fullflush'''
** Shortcut for "''streamInst'' '''put -fullflush''' {}".

: ''streamInst'' '''get''' ?''count''?
** Return up to ''count'' bytes from the stream's internal buffers. If ''count'' is unspecified, return as much as is available (without flushing).

: ''streamInst'' '''put''' ?''option''? ''data''
** Appends the bytes ''data'' to the stream, compressing or decompressing as necessary. The ''option'' controls the type of flush done: '''-flush''' means to ensure that all data appended to the stream has been processed and made ready for '''get''', at some compression performance penalty; '''-fullflush''' also makes sure that the compression engine can restart from the point after the flush (at more penalty); and '''-finalize''' states that no more data will be written to the stream, causing any trailing bytes required by the format to be written.

: ''streamInst'' '''reset'''
** Recreates the stream, ready to start afresh. Discards whatever is in the stream's buffers.
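As a quick illustration of the subcommands above (this sketch is not from the reference page; the strings and variable names are just examples), the following compresses a short string with one stream and decompresses it with a second:

======
# Compress a short string with one stream...
set c [zlib stream compress]
$c put "The quick brown fox jumps over the lazy dog"
$c finalize                   ;# no more input: emit any trailing bytes
set compressed [$c get]       ;# collect everything in the output buffer
$c close

# ...and decompress it with another.
set d [zlib stream decompress]
$d put -finalize $compressed
puts [$d get]                 ;# -> The quick brown fox jumps over the lazy dog
$d close
======

For large inputs you would interleave '''put''' and '''get''' instead of collecting everything at the end, as the examples further down this page do.
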
** Example - streaming over sockets **

For simple zlib streaming over sockets, as in HTTP, [zlib push] is sufficient (a minimal sketch of that approach follows the code below). It breaks down for more interactive protocols, though, as it gives you no way to control when a block is flushed to the receiver. If you want to flush each line, for example, you will need something like the following. This code simply forces a flush each time `$zchan write` is called. If that proves insufficient, simply remove the `-flush` flag in `method write` and call the object's `method flush` directly. This code was inspired by [https://github.com/lehenbauer/compressed_socket_speriments%|%an experiment by karll%|%]. See the [http://www.zlib.net/manual.html%|%zlib manual%|%] and http://www.bolet.org/~pornin/deflate-flush.html for more detail on zlib's flushing modes.

======
# It appears that [$transchan flush] doesn't get called at any interesting time,
# so each [$transchan write] needs to flush by itself.
#
# Flushing an already flushed stream is a harmless error {TCL ZLIB BUF}, so we catch it.
#
oo::class create zchan {
    variable Stream
    variable Chan
    variable Mode

    constructor {mode} {
        set Stream [zlib stream $mode]
        # oo::objdefine [self] forward stream $Stream
    }

    method initialize {chan mode} {
        set Chan $chan
        set Mode $mode
        if {$mode eq "write"} {
            return {initialize finalize write flush}
        } elseif {$mode eq "read"} {
            return {initialize finalize read drain}
        }
    }

    method finalize {chan} {
        my destroy
    }

    method write {chan data} {
        try {
            $Stream add -flush $data
            # equivalent to:
            #   $Stream put $data
            #   $Stream flush
            #   $Stream get
        } trap {TCL ZLIB BUF} {} {
            return ""
        }
    }

    method flush {chan} {
        try {
            $Stream add -flush {}
            # equivalent to:
            #   $Stream flush
            #   $Stream get
        } trap {TCL ZLIB BUF} {} {
            return ""
        }
    }

    method read {chan data} {
        $Stream add $data
    }

    method drain {chan} {
        $Stream add -finalize {}
    }
}

if 0 {
    lassign [chan pipe] r w
    chan configure $w -translation binary -buffering none
    chan configure $r -translation binary -blocking 0

    lassign {gzip gunzip} out in

    puts $w "Frumious bandersnatch!"
    puts "read: [gets $r]"

    chan push $w [zchan create gw $out]
    chan push $r [zchan create gr $in]

    puts $w "Vorpal snacks!"
    puts "read: [gets $r]"

    puts $w "And bric-a-brac!"
    puts "read: [gets $r]"

    chan pop $w
    chan pop $r

    puts $w "Galumphing back"
    puts "read: [gets $r]"
}
======
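For comparison, here is a minimal sketch of what the simpler [zlib push] arrangement mentioned above can look like, again using [chan pipe] to stand in for a socket; the channel setup and sample text are illustrative only, and a real socket may need different buffering. Note that nothing forces the compressed data out line by line here; in this sketch it only reliably reaches the reader once the transform is popped, which is exactly the lack of control that motivates the class above.

======
# Plain channel transforms: adequate for bulk transfers, but with no
# per-line control over when compressed blocks are delivered.
lassign [chan pipe] r w
chan configure $w -translation binary -buffering none
chan configure $r -translation binary -blocking 0

zlib push gzip $w       ;# compress everything written to $w
zlib push gunzip $r     ;# decompress everything read from $r

puts $w "Frumious bandersnatch!"
chan pop $w             ;# finishing the gzip stream flushes the remaining compressed data
puts "read: [gets $r]"
chan pop $r
======
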
----

[AMG]: I'm trying to read data from disk, compress it, and store the compressed result in an [SQLite] database. For small files this is easy, but Tcl panics when files exceed two gigabytes in size: Tcl strings simply can't grow that large. Thus, I need to stream the data rather than buffer it all at once.

At first I thought the way to go was to use [[[zlib push] deflate]] on [[db incrblob]], then [[[chan copy]]] from disk to the incrblob channel, but I have to preallocate the blob. If I set the blob size to that of the disk file (plus 10% in case the file is too random), this would work, except I have to follow up by truncating the blob to the actual compressed size. How can I tell what that size is? [[chan copy]] returns the uncompressed size, which doesn't do me any good. [[[chan tell]]] doesn't work on an incrblob channel. [[zlib push]] adds some configuration options to the channel, but none of them tell me how many bytes have passed in or out of the stream. If I could use [[zlib push deflate]] on the read channel, [[chan copy]] would return the compressed size, but I get the error "compression may only be applied to writable channels". I really don't know why this error exists, but it's definitely getting in my way.

Next up: [[::tcl::transform::zlib]] from [Tcllib]. However, I found that for small files it doesn't produce any output at all. When [[finalize]] gets called, it's too late to finalize the zlib stream and return the last of the compressed data, so my version does this in [[drain]] instead. There may be cases where [[drain]] is too early to finalize the zlib stream, but [[chan copy]] only does one [[drain]] at the very end. Code below.

======
# zlibCompressor --
#     Input stream compression.
oo::class create zlibCompressor {
    variable stream

    method initialize {handle mode} {
        set stream [zlib stream deflate -level 9]
        return {initialize finalize drain read}
    }

    method finalize {handle} {
        $stream close
        my destroy
    }

    method drain {handle} {
        $stream finalize
        return [$stream get][$stream reset]
    }

    method read {handle data} {
        $stream add $data
    }
}

oo::objdefine zlibCompressor method push {chan} {
    chan push $chan [my new]
}
======

Alas, this still doesn't work. [[finalize]] can return a lot of data all at once, but [[chan copy]] throws away all but the first four kilobytes or so. If I attempt to manually drain the rest using [[[chan read]]], that incurs more [[finalize]]s, giving me an unlimited stream of bogus data. The only thing I can really do is bypass [[chan copy]] altogether:

======
set inChan [open $input rb]
set outChan [...]    ;# destination channel; how it is opened is elided here
set stream [zlib stream gzip -level 9]
set size 0
set end 0
while {!$end} {
    if {[set inData [chan read $inChan 4096]] ne {}} {
        $stream put $inData
    } else {
        $stream finalize
        set end 1
    }
    set outData [$stream get]
    chan puts -nonewline $outChan $outData
    incr size [string length $outData]
}
$stream close
chan close $inChan
chan close $outChan
chan puts $size
======

As I dug deeper, I discovered that SQLite has blob size limits too, much tighter even than Tcl's, so I had to implement a chunking scheme dividing files across multiple table rows. The incrblob system became less and less of a good fit, but [[zlib stream]] is proving to be indispensable for this task.

<<categories>> Command | Compression