Reading and writing to a piped command


** See Also **

   [open]:   

   [pipe]:   

   [gzip]:   


** Example: A Conversation with another Program **

======
#! /usr/bin/env tclsh

proc communicate {chan msg pump respond eof } {
    fileevent $chan readable [list apply [list {chan respond eof} {
        set res [gets $chan]
        if {$res ne {}} {
            fileevent $chan readable {}
            {*}$respond $res
        }
        if {[eof $chan]} {
            fileevent $chan readable {}
            {*}$eof
        }
    }] $chan $respond $eof]

    fileevent $chan writable [list apply [list {chan msg pump} {
        catch {puts $chan $msg}
        #something like this may be needed if the child program cannot be
        #configured with reasonable output buffering
        #fileevent $chan writable [list apply [list {chan pump} {
        #       puts $chan $pump
        #}] $chan $pump]
        fileevent $chan writable {}
    }] $chan $msg $pump]
}

proc process {chan count args} {
    puts "sed output: $args"
    if {$count < 2} {
        communicate $chan hello \n [namespace code [list process $chan [incr count]]] \
            [namespace code [list closed $chan]]
    } else {
        catch {close $chan}
        set ::done 0
    }
}

proc closed {chan} {
    #flush first to catch any pipe error, getting it out of the way in order to grab
    #the exit status with [close]
    catch {flush $chan}
    set ::done [catch {close $chan} eres eopts]
}

#try this line, which causes a sed error, to see how that's handled
#set chan [open {|sed -l {s/hello/goodbye} 2>@stderr} r+]

set chan [open {|sed -l s/hello/goodbye/ 2>@stderr} r+]
fconfigure $chan -buffering none

communicate $chan hello \n [namespace code [list process $chan 1]] \
    [namespace code [list closed $chan]]


vwait ::done
return -code $::done
======

It's up to the individual program when to print out its results.  The `-l`
option to [BSD] sed configures sed to print output whenever at least one line
of output is ready.  `-u` accomplishes the same for some other versions of sed.
The `pump` feature of this example can be used to pump some program-specific
value through the channel until the desired output is collected.  It's a hack,
but depending on the program, it might be the only way to accomplish the task.
[Expect] is the fully-featured tool for this type of task.  Another approach
would be to [fork] the current process into a producer and a consumer.
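
When the child does emit each line promptly, a plain blocking exchange is enough.  The following is only a minimal sketch: `cat -u` is a stand-in chosen for illustration (it copies each line to its output without delay), and the variable names are not taken from the example above.

======
# Stand-in child: cat -u echoes every line it reads without delay.
set chan [open |[list cat -u] r+]
fconfigure $chan -buffering line

set replies {}
foreach msg {hello world} {
    puts $chan $msg                 ;# line buffering flushes at the newline
    lappend replies [gets $chan]    ;# blocks until the child answers
}
close $chan
puts $replies                       ;# prints: hello world
======

With a child that holds its output back, the [gets] in this loop would block forever, which is why the event-driven version above is the safer pattern.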


** To Sort: [Arjen Markus] **

[Arjen Markus]: I have experimented a bit with plain Tcl driving another
program. As this needs to work on Windows 95 (NT, ...) as well as UNIX in four
or five flavours, I wanted to use plain Tcl, not Expect (however much I would
appreciate the chance to do something really useful with Expect - apart from
[Android] :-).

I think it is worth a page of its own, but here is a summary:

   * Open the pipeline and make sure buffering is minimal via

======
set inout [open |[list myprog] r+]
fconfigure $inout -buffering line
======

Buffering might be out of your hands, though, for "real 16-bit commandline
applications", which apparently don't have the ability to flush (reliably),
except on close.

   * Set up [[`[fileevent]`] handlers for reading from the process and reading from [stdin].

   * Make sure the process ("myprog" above) does not buffer its output to [stdout]!
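
The steps above might be sketched as follows.  The proc names `drive`, `fromChild`, `fromInput` and `finish` and the variable `::finished` are made up for this illustration, and the sketch simply closes the pipeline as soon as either side reaches end of file, so output the child has not yet written may be lost.

======
# Print each line the child produces; finish when the child closes its output.
proc fromChild {chan} {
    if {[gets $chan line] >= 0} {
        puts "child: $line"
    } elseif {[eof $chan]} {
        finish $chan
    }
}

# Forward each input line to the child; finish when the input runs out.
proc fromInput {in chan} {
    if {[gets $in line] >= 0} {
        puts $chan $line
    } elseif {[eof $in]} {
        finish $chan
    }
}

# Close the pipeline and release the [vwait] in drive.
proc finish {chan} {
    catch {close $chan}
    set ::finished 1
}

# Run the command list cmd as a child, feeding it lines from channel in.
proc drive {in cmd} {
    set inout [open |$cmd r+]
    fconfigure $inout -buffering line
    fileevent $inout readable [list fromChild $inout]
    fileevent $in readable [list fromInput $in $inout]
    vwait ::finished
    fileevent $in readable {}
}
======

A call such as `drive stdin [list myprog]` would then relay lines typed on [stdin] to the program until either side reaches end of file.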


** To Sort: [Neil Madden] **

[Neil Madden] - here is a real-life example of interacting with a program
through a pipe. The program in question is ''ispell'' - a UNIX spell-checking
utility. I use it to spell-check the contents of a text widget containing
[LaTeX] markup. There are a number of issues to deal with:

   * Keeping track of the position of the word in the widget.
   * Filtering out useless (blank) lines from ispell
   * Filtering out the version info that my version of ispell dumps out.
   * When passing in a word which is a TeX command (e.g. \maketitle), ispell returns nothing at all.
   * Careful handling of blocking.

The example does not use [[`[fileevent]`], as this would complicate this
particular example. The options passed to ispell are -a (which makes it
non-interactive) and -t (which makes it recognize TeX input).

======
set contents [split [$text get 1.0 end] \n]
set pipe [open |[list ispell -a -t] r+]
fconfigure $pipe -blocking 0 -buffering line
set ver [gets $pipe] ;# Ignore the initial version line
set linenum 1
foreach line $contents {
    set wordnum 1
    foreach word [split $line] {
        puts $pipe $word   ;# Feed word to ispell
        while 1 {
            set len [gets $pipe res]
            if {$len > 0} {
                # A valid result
                # do stuff
                continue
            } else {
                if {[fblocked $pipe]} {
                    # No output
                    break
                } elseif {[eof $pipe]} {
                    # Pipe closed
                    catch {close $pipe}
                    return
                }
                # A blank line - skip
            }
        }
        incr wordnum
    }
    incr linenum
}
======

Thanks to [Kevin Kenny] for helping me figure this out.


** Description **


Recently on comp.lang.tcl someone was trying to get the following code to work:

======
proc gzip {buf} {
    set fd [open "|gzip -c" r+]
    fconfigure $fd -translation binary -encoding binary
    puts $fd $buf
    flush $fd
    set buf [read $fd]
    close $fd
    return $buf
}
======

Here's an altered version in an attempt to get it to work.

======
#! /usr/tcl84/bin/tclsh

proc gzip {buf} {
    set fd [open "|gzip -c" "r+"]
    fconfigure $fd -translation binary -encoding binary
    puts -nonewline $fd $buf
    puts "output finished to gzip"
    flush $fd
    puts "flush finished to gzip"
    set buf [read $fd]
    puts "read finished from gzip"
    close $fd
    return $buf
}

proc gunzip {buf} {
    set fd [open "|gzip -d" "r+"]
    fconfigure $fd -translation binary -encoding binary
    puts -nonewline $fd $buf
    flush $fd
    set buf [read $fd]
    close $fd
    return $buf
}

set a [gzip "This is a test"]
puts "finish compression"
set b [gunzip $a]
puts "finish uncompression"

puts $b
======

Alas, it still doesn't work.  The debug messages for the output and the flush
appear, but the one after the [read] never does.

Now, an alternative version of the command was proposed:

======
proc gzip {buf} {
    return [exec gzip -c << $buf]
}
======

However, that version doesn't demonstrate the method to read and write from a
piped command.  So I'm hoping that someone comes along with a fix for the
initial code.

[Lars H]: Is the problem that gzip won't finish until its input has been read
to end?  The only way to be sure there won't be more data is to close the input
end of the gzip pipe, and that can't be done without closing the output end as
well. Tricky.  Mind you, I've always felt the idea that one uses the same
channel both for reading and writing (but of two distinct data streams) rather
odd.

[LV]: The end of file on input might be an issue.  Frankly, I'm uncertain that
the notation ''should'' work.  That is to say, I don't know that both stdin and
stdout are being associated with that one returned file handle.  I have seen,
and probably written, code that used a normal [open] of a file and did both
[read] and [write] type operations.  However, I don't recall whether I've seen
a pipe example of that.

[RM]: The underlying system buffers the output. You need to use the
"[unbuffer]" command like:

======
set fd [open "|unbuffer gzip -c" "r+"]
======

[lexfiend]: Note that '''unbuffer''' is part of [Expect], and may thus require
additional work on "Some/All Batteries Not Included" Tcl setups.

[AMG]: I don't see how '''unbuffer''' would help, since I suspect the problem
is with buffering ''inside'' '''gzip''', and to get it to emit the last block
of output you need to close its input.  Can anyone confirm whether it does or
not?  I just '''apt-get install'''ed '''expect''' yet for some reason it didn't
install '''unbuffer'''.

----

[AMG]: I have several comments.  Let's discuss.

   * I prefer to construct the first argument to [[`[open]`] thusly:  '''|[[`[list] progname arg1 arg2 arg3 ...`]'''.  This keeps whitespace embedded in the arguments from futzing up the works.  Even though there's no problem with your command line, one day you might change it, perhaps to use a parameter to your proc as an argument.  So I just do it "right", right from the start, to prevent forgetting to make the change in the future.  It's like the problem with optional { braces } in C: they're not needed for one-line '''if'''/'''for'''/'''while'''/etc. bodies, but when you add another line you might forget to add the braces.
   * Quoting '''r+''' isn't necessary.  Neither character is special in Tcl; very very few characters mean anything to the Tcl interpreter itself, and even then they don't always keep their meaning.  For instance you only need to quote '''#''' if for some weird reason you're trying to call a [proc] named '''#''', but on the other hand you can't begin a comment without ending the previous command with a ''';''' or a newline.
   * '''-translation binary''' automatically does '''-encoding binary''' (no need to be redundant) ''and'' '''-eofchar {}''' (something you forgot, but will only cause trouble on MS-Windows as far as I know).
   * '''[fconfigure] $fd -buffering none''' precludes the need for '''[flush] $fd'''.  Since '''$fd''' is blocking by default, this doesn't interfere with [[[read]]], which will still read until end of file (caused by '''gzip''' closing its '''stdout''').
   * The [[`[exec]`] code is blocking, which is alright in this case, but it should be possible to use '''gzip''' and '''gunzip''' in a non-blocking fashion for long streams of data that arrive over time, possibly over the network.  Pipes ''should'' allow this, except they don't, not in Tcl.
   * '''gzip''' doesn't output anything until it has seen EOF on its '''stdin'''.  (Well, this may be true for short strings of data, but it'll also output when an internal buffer overflows.)  I strongly agree with [Lars H] that the read-write channel is a problem, and I have recommended to [Andreas Kupries] that we add the ability to "unbundle" read-write channels into separate read and write channels.  This would allow separate directions to be closed individually, like BSD sockets' '''shutdown()''' call or '''close()''' on a single fd in a C-style fd pair.  Also this would allow [[[fcopy]]] to work "bidirectionally" on sockets, as is commonly needed for network proxies/bridges.  For symmetry and to allow code expecting read-write channels to work on '''stdin'''/'''stdout''', I also suggest the ability to "bundle" a read and a write channel.
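
Since Tcl 8.6, [close] accepts an optional direction argument, so the write side of a command pipeline can be closed on its own.  Under that assumption, the original procs might be repaired roughly as follows.  This is a sketch for short strings only: all the input is written before any output is read, so a large buffer could still deadlock once the pipe fills up in both directions.

======
# Requires Tcl 8.6 or later for the half-close form of [close].
proc gzip {buf} {
    set fd [open |[list gzip -c] r+]
    fconfigure $fd -translation binary
    puts -nonewline $fd $buf
    close $fd write     ;# flush, then let gzip see EOF on its stdin
    set out [read $fd]  ;# returns once gzip closes its stdout
    close $fd           ;# raises an error if gzip exited abnormally
    return $out
}

proc gunzip {buf} {
    set fd [open |[list gzip -dc] r+]
    fconfigure $fd -translation binary
    puts -nonewline $fd $buf
    close $fd write
    set out [read $fd]
    close $fd
    return $out
}

puts [gunzip [gzip "This is a test"]]   ;# prints: This is a test
======

For streams that arrive over time, the same half-close can be combined with [fileevent] handlers so that reading and writing overlap.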


<<categories>> Example | Channel | Interprocess Communication