[samoc]: How to do a binary-safe "exec":

This is a work-around for `[exec]` being broken for binary input/output. See http://tip.tcl.tk/259.html

======
proc bexec {command input} {
    # Execute shell "command", send "input" to stdin, return stdout.
    # Ignores stderr (but "2>@1" can be part of "command").
    # Supports binary input and output. e.g.:
    #    set flac_data [bexec {flac -} $wav_data]

    # Run "command" in background...
    set f [open |$command {RDWR BINARY}]
    fconfigure $f -blocking 0

    # Connect read function to collect "command" output...
    set ::bexec_done.$f 0
    set ::bexec_output.$f {}
    fileevent $f readable [list bexec_read $f]

    # Send "input" to command...
    puts -nonewline $f $input
    close $f write

    # Wait for read function to signal "done"...
    vwait ::bexec_done.$f

    # Retrieve output...
    set result [set ::bexec_output.$f]
    unset ::bexec_output.$f
    unset ::bexec_done.$f
    fconfigure $f -blocking 1
    close $f
    return $result
}

proc bexec_read {f} {
    # Accumulate output in ::bexec_output.$f.
    append ::bexec_output.$f [read $f]
    if {[eof $f]} {
        fileevent $f readable {}
        set ::bexec_done.$f 1
    }
}
======

[PYK] 2014-05-11: The code above does not need to go to the effort of setting the channel to non-blocking, as Tcl will manage the buffer behind the scenes. Forget about non-blocking mode and dispense with the `[vwait]`: send the desired data into the channel, close the `write` side of the channel, and then read the output.

For code that does need to use non-blocking channels, it's generally best not to interfere with the event loop by using `[vwait]` as the code above does. Instead, consider structuring the code so that it works in an [event-oriented programming%|%event-oriented] manner. See also [example of reading and writing to a piped command], which provides a template for conducting an interactive conversation with another process.
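A minimal sketch of the blocking approach described above, assuming Tcl 8.6 or later (needed for the half-close `close $f write`). `bexec_blocking` is a hypothetical name, not an existing command. As the discussion further down this page points out, this can deadlock when the command starts writing output before it has consumed all of its input and the data is larger than the OS pipe buffer. It is only safe for small inputs, or for commands that read all of their input before producing any output.

======
proc bexec_blocking {command input} {
    # Hypothetical helper: run "command", feed "input" to its stdin, return its stdout.
    # BINARY disables encoding conversion and newline/EOF-char translation both ways.
    set f [open |$command {RDWR BINARY}]
    # Blocking write: may deadlock on large data if the command writes output
    # before reading all of its input, because nobody is draining its stdout yet.
    puts -nonewline $f $input
    # Half-close (Tcl 8.6+): flushes pending output and sends EOF to the command's stdin.
    close $f write
    # Read everything the command writes until it closes its stdout.
    set result [read $f]
    close $f
    return $result
}
======

Usage, with `cat` standing in for a real filter: `bexec_blocking cat $data` returns `$data` unchanged, including NUL and 0xFF bytes.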
[RLE] (2014-05-18): If you look down to [DKF]'s comment on the [Tcl event loop] page, you'll find this statement:

DKF: Tcl doesn't run the event loop by default, so idiomatically people do this in their pure Tcl scripts to start the event loop:

======
vwait forever
======

If you are '''not''' running Tk, doing a [vwait] is the only way to get the event loop started. Of course, you should not nest vwait calls, and if you are running Tk you don't need [vwait], because the event loop runs by default.

[samoc] 2014-05-16: It would be great to have a simpler way to execute a command in the background. However, I can't see how to make it work for large binary data without `fconfigure $f -blocking 0`.

Example (`platform::identify` returns `macosx10.9-x86_64`):

======
set wav [download http://foo.net/bar.wav]
set flac [bexec {flac - --totally-silent} $wav]
======

In this case `fconfigure $f -blocking 0` is required. Without non-blocking mode, `puts -nonewline $f $input` never returns. My assumption is that flac reads some (but not all) of the input from its stdin side of the pipe, then tries to write some output to its stdout side of the pipe. No one is reading flac's output pipe yet, so it blocks on output. We're blocked sending input to flac, while flac is blocked sending output back to us. Deadlock.

I expect that this behaviour is common for filter-type programs that deal with large amounts of data. It is impractical for these programs to buffer the entire input in RAM, so they process a chunk at a time and send the output to stdout as they go.

Regarding being "event-oriented": the aim here is to expose a simple, blocking, non-event-oriented interface for executing an external command. The whole point is to hide the messy event-processing details.

[samoc] 2014-05-20: I just tried the following, but it fails with "channel "rc0" does not support OS handles" ([AMG] mentions this on the [exec] page).
======
set in [tcl::chan::variable input]
chan configure $in -translation binary -blocking 0
exec $command <@$in
======

----

[samoc]: The patch below enables:

======
encoding system iso8859-1
set flac [exec -binarystdout flac --totally-silent - << $wav]
======

Setting the system encoding to iso8859-1 (AKA "binary") causes the `$wav` string to be treated as binary ([encoding system] says "The system encoding is used whenever Tcl passes strings to system calls"). The `-binarystdout` option asks exec to configure its internal output channel as `-translation binary`.

A downside here is that `<<` causes exec to call mkstemps() and write the `$wav` string into a temporary file. exec really should be fixed so that a large string in RAM can be sent to the stdin pipe without intermediate buffering or temporary files. (Note: I think the `puts` call in `proc bexec` does something like this: copy `$input` to a buffer, set up a "writable" event handler, and write from the buffer to the pipe inside vwait.)

======
diff -u -r tcl8.6.1/generic/tclIOCmd.c tcl8.6.1.patched/generic/tclIOCmd.c
--- tcl8.6.1/generic/tclIOCmd.c    2013-09-20 05:04:14.000000000 +1000
+++ tcl8.6.1.patched/generic/tclIOCmd.c    2014-05-21 00:00:32.000000000 +1000
@@ -877,12 +877,12 @@
     const char *string;
     Tcl_Channel chan;
     int argc, background, i, index, keepNewline, result, skip, length;
-    int ignoreStderr;
+    int ignoreStderr, binaryStdout;
     static const char *const options[] = {
-        "-ignorestderr", "-keepnewline", "--", NULL
+        "-ignorestderr", "-keepnewline", "-binarystdout", "--", NULL
     };
     enum options {
-        EXEC_IGNORESTDERR, EXEC_KEEPNEWLINE, EXEC_LAST
+        EXEC_IGNORESTDERR, EXEC_KEEPNEWLINE, EXEC_BINARYSTDOUT, EXEC_LAST
     };
 
     /*
@@ -891,6 +891,7 @@
 
     keepNewline = 0;
     ignoreStderr = 0;
+    binaryStdout = 0;
     for (skip = 1; skip < objc; skip++) {
         string = TclGetString(objv[skip]);
         if (string[0] != '-') {
@@ -904,6 +905,8 @@
             keepNewline = 1;
         } else if (index == EXEC_IGNORESTDERR) {
             ignoreStderr = 1;
+        } else if (index == EXEC_BINARYSTDOUT) {
+            binaryStdout = 1;
         } else {
             skip++;
             break;
@@ -955,6 +958,10 @@
         return TCL_ERROR;
     }
 
+    if (binaryStdout) {
+        Tcl_SetChannelOption(interp, chan, "-translation", "binary");
+    }
+
     if (background) {
         /*
          * Store the list of PIDs from the pipeline in interp's result and
@@ -1002,7 +1009,7 @@
      * newline character.
      */
 
-    if (keepNewline == 0) {
+    if (keepNewline == 0 && binaryStdout == 0) {
         string = TclGetStringFromObj(resultPtr, &length);
         if ((length > 0) && (string[length - 1] == '\n')) {
             Tcl_SetObjLength(resultPtr, length - 1);
======

Below is an alternate patch that dispenses with the `-binarystdout` option. Instead, it looks at what the system encoding is set to. If it is set to binary (iso8859-1), then exec output is treated as binary.

======
diff -u -r tcl8.6.1/generic/tclIOCmd.c tcl8.6.1.patched/generic/tclIOCmd.c
--- tcl8.6.1/generic/tclIOCmd.c    2013-09-20 05:04:14.000000000 +1000
+++ tcl8.6.1.patched/generic/tclIOCmd.c    2014-05-21 00:16:29.000000000 +1000
@@ -877,7 +877,7 @@
     const char *string;
     Tcl_Channel chan;
     int argc, background, i, index, keepNewline, result, skip, length;
-    int ignoreStderr;
+    int ignoreStderr, binaryStdout;
     static const char *const options[] = {
         "-ignorestderr", "-keepnewline", "--", NULL
     };
@@ -955,6 +955,11 @@
         return TCL_ERROR;
     }
 
+    binaryStdout = (strcmp(Tcl_GetEncodingName(NULL), "iso8859-1") == 0);
+    if (binaryStdout) {
+        Tcl_SetChannelOption(interp, chan, "-translation", "binary");
+    }
+
     if (background) {
         /*
          * Store the list of PIDs from the pipeline in interp's result and
@@ -1002,7 +1007,7 @@
      * newline character.
      */
 
-    if (keepNewline == 0) {
+    if (keepNewline == 0 && binaryStdout == 0) {
         string = TclGetStringFromObj(resultPtr, &length);
         if ((length > 0) && (string[length - 1] == '\n')) {
             Tcl_SetObjLength(resultPtr, length - 1);
======

<>

[samoc] 2014-05-19: Comments from [PYK] below correctly point out that if channels are passed to exec, they can be set to BINARY mode before exec is called and will handle binary data just fine.
I believe my example above was misleading in its original form (`set wav [[read [[open foo.wav {RDONLY BINARY}]]]]`). I hope the revised version makes my intentions clearer: I have large binary data in RAM, ''not'' in a file (wav and flac are just an example). I need to pass the data to the stdin of an "exec"ed process, and I definitely don't want to have to make a temporary file on disk.

[PYK] 2014-05-16: `exec flac - <@$wavchan` works fine for me. Do you have any evidence to support the claim that ''"exec is broken for binary input/output"''? It is best to [Update considered harmful%|%avoid vwait]. Below are three different methods for piping binary data to [https://xiph.org/flac/%|%flac]. The third method polls and uses `[after]`, but even that is preferable to `[vwait]` if the script author doesn't want to design the code around `[chan event%|%file events]`.

======
#!/usr/bin/env tclsh

#method 1: redirection of stdin and stdout
set chan [open ztest1.wav {RDONLY BINARY}]
exec flac - <@$chan 2>@stderr >ztestout1.flac

#method 2: redirect data into flac, read flac output into Tcl, then write to channel
seek $chan 0
set chan2 [open |[list flac - <@$chan] {RDONLY BINARY}]
set chanout [open ztestout2.flac {BINARY CREAT WRONLY}]
while {![eof $chan2]} {
    puts -nonewline $chanout [read $chan2 32768]
}
close $chan2
close $chanout

#method 3: read input, write to flac, read flac output, write to channel
seek $chan 0
set chan2 [open |[list flac -] r+]
chan configure $chan2 -translation binary -blocking 0
set chan3 [open ztestout3.flac {BINARY CREAT WRONLY}]
set done 0
while 1 {
    if {!$done} {
        if {[eof $chan]} {
            # all input sent: half-close so flac sees EOF on its stdin
            close $chan2 write
            set done 1
        } else {
            set data [read $chan 32768]
            puts -nonewline $chan2 $data
        }
    }
    set data [read $chan2 32768]
    if {$data eq {}} {
        if {[eof $chan2]} {
            break
        }
    } else {
        puts -nonewline $chan3 $data
    }
    after 1
}
# switch back to blocking so close waits for flac and reports its exit status
chan configure $chan2 -blocking 1
close $chan2
close $chan
close $chan3
======

[samoc] 2014-05-19: Thanks for the feedback, [PYK]. Regarding exec being broken for binary data, see the reference to the relevant TIP above.
Regarding making channels BINARY, see the new comment above. Regarding vwait: I am intrigued. Is vwait considered bad? As far as I can see, vwait is the Tcl way of doing select(2), i.e. making sure the OS does not schedule the process until a condition is met. This is essential for creating scalable systems. If I have a loop that polls every 1 ms, and I have a few hundred instances blocked on IO, pretty soon the polling and context switching get expensive.

Regarding "design the code around fileevent": I believe that is how my original `bexec` works (see `fileevent $f readable ...`). But file events only work if the event loop gets run. My understanding is that `vwait` is the way to run the event loop. Is there a better way?

[PYK] 2014-05-18: I feel a disturbance in the force... On the one hand, there is "The aim here is to expose a simple, blocking, non-event-oriented interface for executing an external command", but then there is "...and I have a few hundred instances blocked on IO...", which indicates asynchronous operation. ''-- [samoc] In my system the asynchronous operation is managed at cloud scale by a separate queuing and notification system. Each Tcl worker script is a separate process that does a simple job in a blocking, procedural manner.''

Even the name `bexec`, as in "'''background''' exec", implies event-driven processing of incoming data, which must be what you're actually after. ''-- [samoc] See the first line of the page: '''b'''inary-safe "exec".''

The mention of `exec flac - << $wav` indicates that `download` has already read the file in its entirety into memory, and also conflicts with "It is impractical for these programs to buffer the entire input in RAM." `download` could instead be modified to return a channel. ''-- [samoc] Please understand that "flac" and "download" are just examples. The binary data exists in memory. The point is to not make an entire additional copy of it inside an external filter program.
If there were a memory-efficient "string_to_channel" function, that would work here.''

The "hundred instances blocked on I/O" is exactly why `[vwait]` is a bad idea in the code above: it means entering the event loop a few hundred times via `[vwait]`, which is just asking for trouble, and is exactly why `[vwait]` is proscribed in [Update considered harmful]. Calling `[vwait]` multiple times "in parallel", or even just once in a program where the event loop is already running, opens the door to the [Update considered harmful%|%bad things]. ''-- [samoc] I'm afraid you are making a bunch of false assumptions. 100 Tcl processes that have each called vwait (and thus select(2)) do not result in event loop reentrancy. There is no "event loop" in this example. The OS is left to do its job and wake each process up when there is work for it to do. This is exactly the same as what would happen with the built-in "exec" (which I would happily use, except that it is not binary-safe).''

To repeat, just about the only good use of `[vwait]` is to enter the [Tcl event loop%|%event loop] with exactly one `[vwait]` execution at the bottom of your main script:

======
vwait forever
======

`[vwait]` is not related to `select(2)` at all, except in the sense that the program must enter the event loop in order to process channel [chan event%|%events]. `[chan event]` may internally use `select(2)`, and even if it didn't, `[chan event]` is the command to turn to at the script level for reacting to nonblocking I/O. ''-- [samoc] The call path is: vwait -> Tcl_DoOneEvent() -> Tcl_WaitForEvent() -> select(). In fact, in a single-threaded Tcl script with no timers, "vwait" does nothing except call select(2) and check the wait variable on wakeup.''

''[PYK]: Even so, that's no reason to construe `[vwait]` as the script-level equivalent of `select(2)`, which just happens to be the result of entering the event loop in that particular scenario.
We wouldn't want to give people the wrong idea about the nature of `[vwait]`! That's really my point in critiquing the code. Now that you've explained the circumstances, it's clear that `bexec` does what you want it to do in your scenario, but if big yellow warning stickers aren't slapped on it, the next person to come across it might paste it wholesale into a script that already has events going on, and fall into the trap. Hence all the hubbub about not using `[vwait]`.''

Dispense with the "in-memory" requirement and set up event handlers on the source and result channels. Then you can dispense with the `[vwait]` and with the tactic of synchronous channel operation in general. By "design the code around `[chan event]`", I mean something like this:

======
#!/usr/bin/env tclsh

proc main {} {
    #do stuff before converting
    #...
    files [namespace current]::incoming [namespace current]::afterconvert
}

proc afterconvert id {
    variable done
    variable state
    # forget this conversion; carry on only once no conversions remain
    array unset state $id,*
    if {[array size state]} {
        return
    }
    #do stuff after converting
    #...
    puts {all done!}
    set done 1
}

proc incoming id {
    variable state
    foreach varname {converter ondone result source} {
        upvar 0 state($id,$varname) $varname
    }
    if {[eof $converter]} {
        # conversion finished: release all three channels, then report
        close $source
        close $converter
        close $result
        $ondone $id
    } else {
        puts -nonewline $result [read $converter]
    }
}

proc convert id {
    variable state
    foreach varname {convert converter resname result source} {
        upvar 0 state($id,$varname) $varname
    }
    set result [open $resname w]
    chan configure $result -translation binary
    set converter [open |$convert r]
    chan configure $converter -translation binary -blocking 0
    chan event $converter readable [list $state($id,onoutput) $id]
}

proc files {onoutput ondone} {
    variable state
    for {set id 0} {$id < 10} {incr id} {
        foreach varname {convert index resname source} {
            upvar 0 state($id,$varname) $varname
        }
        set source [open ztest1.wav r]
        chan configure $source -translation binary
        set convert [list flac - <@$source 2>@stderr]
        set resname ztest1_$id.flac
        set state($id,onoutput) $onoutput
        set state($id,ondone) $ondone
        convert $id
    }
}

main
vwait [namespace current]::done
======

An object system like [TclOO] might make for a cleaner variant of this example.

----

[AMG]: I suggest investigating coroutines, particularly how they are used in [Wibble] to multiplex I/O and related processing.

<>
<> Binary Data