Version 12 of Example of reading and writing to a piped command

Updated 2013-11-17 06:48:42 by pooryorick

Reading and writing to a piped command

See Also

open
pipe
gzip

Example: A Conversation with another Program

#! /usr/bin/env tclsh

proc communicate {chan msg pump respond eof } {
    fileevent $chan readable [list apply [list {chan respond eof} {
        set res [gets $chan]
        if {$res ne {}} {
            fileevent $chan readable {}
            {*}$respond $res
        }
        if {[eof $chan]} {
            fileevent $chan readable {}
            {*}$eof
        }
    }] $chan $respond $eof]

    fileevent $chan writable [list apply [list {chan msg pump} {
        catch {puts $chan $msg}
        #something like this may be needed if the child program cannot be configured
        #with reasonable output buffering
        #fileevent $chan writable [list apply [list {chan pump} {
        #       puts $chan $pump
        #}] $chan $pump]
        fileevent $chan writable {}
    }] $chan $msg $pump]
}

proc process {chan count args} {
    puts "sed output: $args"
    if {$count < 2} {
        communicate $chan hello \n [namespace code [list process $chan [incr count]]] \
            [namespace code [list closed $chan]]
    } else {
        catch {close $chan}
        set ::done 0
    }
}

proc closed {chan} {
    #flush first to catch any pipe error, getting it out of the way in order to grab
    #the exit status with [close]
    catch {flush $chan}
    set ::done [catch {close $chan} eres eopts]
}

#try this line, which causes a sed error, to see how that's handled
#set chan [open {|sed -l {s/hello/goodbye} 2>@stderr} r+]

set chan [open {|sed -l s/hello/goodbye/ 2>@stderr} r+]
fconfigure $chan -buffering none

communicate $chan hello \n [namespace code [list process $chan 1]] \
    [namespace code [list closed $chan]]


vwait ::done
return -code $::done

It's up to the individual program when to print out its results. The -l option to BSD sed configures sed to print output whenever at least one line of output is ready. -u accomplishes the same for some other versions of sed. The pump feature of this example can be used to pump some program-specific value through the channel until the desired output is collected. It's a hack, but depending on the program, it might be the only way to accomplish the task. Expect is the fully-featured tool for this type of task. Another approach would be to fork the current process into a producer and a consumer.

To Sort: Arjen Markus

Arjen Markus: I have experimented a bit with plain Tcl driving another program. As this needs to work on Windows 95 (NT, ...) as well as UNIX in four or five flavours, I wanted to use plain Tcl, not Expect (however much I would appreciate the chance to do something really useful with Expect - apart from Android :-).

I think it is worth a page of its own, but here is a summary:

  • Open the pipeline and make sure buffering is minimal via
set inout [open |[list myprog] r+]
fconfigure $inout -buffering line

Buffering might be out of your hands, though, for "real 16-bit commandline applications", which apparently don't have the ability to flush (reliably), except on close.

  • Set up [fileevent] handlers for reading from the process and reading from stdin.
  • Make sure the process ("myprog" above) does not buffer its output to stdout!
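
The steps above might be sketched as follows. This is only a minimal sketch, not Arjen's actual code: the proc names relay, report and drive and the variables ::received and ::finished are hypothetical, and the half-close (close $inout write) assumes Tcl 8.6 or later.

```tcl
# Hypothetical helpers illustrating the summary above (assumes Tcl 8.6+).

# Forward one line from $in to the child; half-close the pipe at end of input.
proc relay {in inout} {
    if {[gets $in line] >= 0} {
        puts $inout $line
    } elseif {[eof $in]} {
        fileevent $in readable {}
        close $inout write      ;# closes only the child's stdin
    }
}

# Collect one line of the child's output; finish once the child closes stdout.
proc report {inout} {
    if {[gets $inout line] >= 0} {
        lappend ::received $line
    } elseif {[eof $inout]} {
        catch {close $inout}
        set ::finished 1
    }
}

# Open the pipeline, install both handlers and run the event loop until done.
proc drive {in cmd} {
    set ::received {}
    set inout [open |$cmd r+]
    fconfigure $inout -buffering line
    fileevent $in    readable [list relay $in $inout]
    fileevent $inout readable [list report $inout]
    vwait ::finished
    return $::received
}
```

A call such as drive stdin [list myprog] would then feed lines typed on stdin to the process and return the lines it printed. This only works line by line if the process itself flushes its output after each line.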

To Sort: Neil Madden

Neil Madden - here is a real-life example of interacting with a program through a pipe. The program in question is ispell - a UNIX spell-checking utility. I use it to spell-check the contents of a text widget containing LaTeX markup. There are a number of issues to deal with:

  • Keeping track of the position of the word in the widget.
  • Filtering out useless (blank) lines from ispell
  • Filtering out the version info that my version of ispell dumps out.
  • When passing in a word which is a TeX command (e.g. \maketitle), ispell returns nothing at all.
  • Careful handling of blocking.

The example does not use [fileevent], as this would complicate this particular example. The options passed to ispell are -a (which makes it non-interactive) and -t (which makes it recognize TeX input).

set contents [split [$text get 1.0 end] \n]
set pipe [open [list | ispell -a -t] r+]
fconfigure $pipe -buffering line
set ver [gets $pipe] ;# Ignore the initial version line (read while still blocking)
fconfigure $pipe -blocking 0
set linenum 1
foreach line $contents {
    set wordnum 1
    foreach word [split $line] {
        puts $pipe $word   ;# Feed word to ispell
        while 1 {
            set len [gets $pipe res]
            if {$len > 0} {
                # A valid result
                # do stuff
                continue
            } else {
                if {[fblocked $pipe]} {
                    # No output
                    break
                } elseif {[eof $pipe]} {
                    # Pipe closed
                    catch {close $pipe}
                    return
                }
                # A blank line - skip
            }
        }
        incr wordnum
    }
    incr linenum
}

Thanks to Kevin Kenny for helping me figure this out.

Description

Recently on comp.lang.tcl someone was trying to get the following code to work:

proc gzip {buf} {
    set fd [open "|gzip -c" r+]
    fconfigure $fd -translation binary -encoding binary
    puts $fd $buf
    flush $fd
    set buf [read $fd]
    close $fd
    return $buf
}

Here's an altered version in an attempt to get it to work.

#! /usr/tcl84/bin/tclsh

proc gzip {buf} {
    set fd [open "|gzip -c" "r+"]
    fconfigure $fd -translation binary -encoding binary
    puts -nonewline $fd $buf
    puts "output finished to gzip"
    flush $fd
    puts "flush finished to gzip"
    set buf [read $fd]
    puts "read finished from gzip"
    close $fd
    return $buf
}

proc gunzip {buf} {
    set fd [open "|gzip -d" "r+"]
    fconfigure $fd -translation binary -encoding binary
    puts -nonewline $fd $buf
    flush $fd
    set buf [read $fd]
    close $fd
    return $buf
}

set a [gzip "This is a test"]
puts "finish compression"
set b [gunzip $a]
puts "finish uncompression"

puts $b

Alas, it still doesn't work. The output and flush debug messages appear, but the message after the read never does.

Now, an alternative version of the command was proposed:

proc gzip {buf} {
    return [exec gzip -c << $buf]
}

However, that version doesn't demonstrate the method to read and write from a piped command. So I'm hoping that someone comes along with a fix for the initial code.

Lars H: Is the problem that gzip won't finish until its input has been read to end? The only way to be sure there won't be more data is to close the input end of the gzip pipe, and that can't be done without closing the output end as well. Tricky. Mind you, I've always felt the idea that one uses the same channel both for reading and writing (but of two distinct data streams) rather odd.

LV: The end of file on input might be an issue. Frankly, I'm uncertain that the notation _should_ work. That is to say, I don't know that both stdin and stdout are being associated with that one returned file handle. I have seen, and probably written, code that used a normal open of a file and did both read and write type operations. However, I don't recall whether I've seen a pipe example of that.

RM: The underlying system buffers the output. You need to use the "unbuffer" command like:

set fd [open "|unbuffer gzip -c" "r+"]

lexfiend: Note that unbuffer is part of Expect, and may thus require additional work on "Some/All Batteries Not Included" Tcl setups.

AMG: I don't see how unbuffer would help, since I suspect the problem is with buffering inside gzip, and to get it to emit the last block of output you need to close its input. Can anyone confirm whether it does or not? I just apt-get installed expect yet for some reason it didn't install unbuffer.


AMG: I have several comments. Let's discuss.

  • I prefer to construct the first argument to [open] like this: |[list progname arg1 arg2 arg3 ...]. This keeps whitespace embedded in the arguments from breaking the command line. Even though there's no problem with your command line, one day you might change it, perhaps to use a parameter to your proc as an argument. So I just do it "right", right from the start, so that I can't forget to make the change in the future. It's like the problem with optional { braces } in C: they're not needed for one-line if/for/while/etc. bodies, but when you add another line you might forget to add the braces.
  • Quoting r+ isn't necessary. Neither character is special in Tcl; very very few characters mean anything to the Tcl interpreter itself, and even then they don't always keep their meaning. For instance you only need to quote # if for some weird reason you're trying to call a proc named #, but on the other hand you can't begin a comment without ending the previous command with a ; or a newline.
  • -translation binary automatically does -encoding binary (no need to be redundant) and -eofchar {} (something you forgot, but will only cause trouble on MS-Windows as far as I know).
  • fconfigure $fd -buffering none precludes the need for flush $fd. Since $fd is blocking by default, this doesn't interfere with [read], which will still read until end of file (caused by gzip closing its stdout).
  • The [exec] code is blocking, which is alright in this case, but it should be possible to use gzip and gunzip in a non-blocking fashion for long streams of data that arrive over time, possibly over the network. Pipes should allow this, except they don't, not in Tcl.
  • gzip doesn't output anything until it has seen EOF on its stdin. (Well, this may be true for short strings of data, but it'll also output when an internal buffer overflows.) I strongly agree with Lars H that the read-write channel is a problem, and I have recommended to Andreas Kupries that we add the ability to "unbundle" read-write channels into separate read and write channels. This would allow separate directions to be closed individually, like BSD sockets' shutdown() call or close() on a single fd in a C-style fd pair. Also this would allow [fcopy] to work "bidirectionally" on sockets, as is commonly needed for network proxies/bridges. For symmetry and to allow code expecting read-write channels to work on stdin/stdout, I also suggest the ability to "bundle" a read and a write channel.
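
Since Tcl 8.6, [close] accepts an optional direction, which provides the kind of half-close discussed above. Here is a minimal sketch of the round trip under that assumption. The proc name filter is hypothetical, and gzip is assumed to be on the PATH.

```tcl
# Run $data through an external filter command and return its output.
# Assumes Tcl 8.6+ for the half-close; "filter" is a hypothetical helper name.
proc filter {cmd data} {
    set fd [open |$cmd r+]
    fconfigure $fd -translation binary
    puts -nonewline $fd $data
    close $fd write          ;# flush, then signal EOF on the child's stdin only
    set result [read $fd]    ;# read until the child closes its stdout
    close $fd                ;# reap the child; raises an error if it failed
    return $result
}

proc gzip   {buf} { filter [list gzip -c]  $buf }
proc gunzip {buf} { filter [list gzip -dc] $buf }

set a [gzip "This is a test"]
set b [gunzip $a]
puts $b
```

This writes all of the input before reading any output, so it suits short buffers like the one in the example. For data larger than the operating system's pipe buffers, both sides can end up blocked on a full pipe, and a [fileevent]-driven version would be needed instead.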