Commands pipe

iu2 30/Dec/2006

Typing in the algorithm the way you are thinking can be much more fun than changing the way you are thinking in order to fit the programming language.

Consider the following file: lines.txt

 line1 1 2 3 4 5 end-code1-line
 line2 2 2 3 4 5 end-code2-line
 line3 3 2 3 4 5 end-code3-line
 line4 4 2 3 4 5 end-code4-line
 line5 5 2 3 4 5 end-code5-line

In order to read the text I must open the file first, so actually I want to open the file, then read the text. In Python this would be:

 file("lines.txt", "r").read()

which follows my stream of thought exactly.

In Tcl this would be:

 [read [open lines.txt r]]

which is the opposite of my stream of thought: I opened the file, then had to go back to the beginning of the line to read it.

This is only a small example, and it is so basic that I actually use it as a pattern when reading files. But what if I wanted to extract the middle codeX part of the last word in the last line?

In Python I write:

  code = file('lines.txt').read().strip().splitlines()[-1].split()[-1].split('-')[1]
  print code

Result:

 code5

Which is read as: "Open the file, read it, strip spaces from beginning and end, generate a list of lines, take the last line from the list, generate a list of words, take the last word, generate a list by splitting this word at '-' chars, then take the second word". That's a one-liner! But more important than being a one-liner, the commands in this line flow the same way I think about getting the result I want.

Converting this line to functional programming style is not just hard; worse, it's not fun. One of the things that makes Python programming fun is just that: you think and program along the same path.
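
For comparison, a plain Tcl version written without compi either nests all the brackets inside out (the generated command is shown further below) or falls back on a temporary variable at every step. A sketch of the latter:

 # step-by-step equivalent using intermediate variables
 set chan [open lines.txt r]
 set text [string trim [read $chan]]
 close $chan
 set line [lindex [split $text \n] end]   ;# last line
 set word [lindex [split $line] end]      ;# last word of that line
 set code [lindex [split $word -] 1]      ;# middle part -> code5
 puts $code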

The following procedure makes this possible in Tcl. I call it a Commands pipe, or compi for short:

  # Commands Pipe
  proc compi {sep {fns ""} {var ""}} {
    # parse arguments
    if {$var ne ""} {        ;# all three arguments supplied
      foreach {s p} [split $sep ""] {break}
    } elseif {$fns eq ""} {        ;# one argument supplied
      set s |; set p ~; set fns $sep
    } else {        ;# two arguments supplied
      if {[string length $sep] == 2} {        ;# first argument is separators
        foreach {s p} [split $sep ""] {break}
      } else {        ;# sep is fns and fns is var
        set var $fns; set fns $sep; set s |; set p ~
      }
    }

    # create a valid tcl command
    set commands [split $fns $s]
    set cmd [set curcmd [lindex $commands end]]
    if {[string first $p $curcmd] < 0} {append cmd $p}
    for {set i [expr {[llength $commands] - 2}]} {$i >= 0} {incr i -1} {
      set curcmd [lindex $commands $i]
      if {[string first $p $curcmd] < 0 && $i > 0} {append curcmd $p}
      set cmd [string map "$p { \[$curcmd\] }" $cmd]
    }

    # perform command
    if {$var eq ""} {return [uplevel 1 $cmd]} else {uplevel 1 [list set $var $cmd]}
  }

Using compi:

 compi ?sp? commands-pipe ?var?

Now, with this I can write:

 compi {open lines.txt r|read|string trim|split~\n|lindex~end|lindex~end|split~-|lindex~1|set code}
 puts $code

Result:

 code5

where each command's output is used as the input to the next command. '|' is the command separator, and '~' is a placeholder for the previous command's output; it is written explicitly whenever that output is needed in the middle of the current command (if it is omitted, the output is appended as the last argument).
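
For instance, when the previous result must go somewhere other than the end of a command, ~ marks the spot. A small extra example (not from the original pipelines on this page) that prints the current year:

 compi {clock seconds|clock format~-format %Y|puts}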

The first block of code in compi enables two options (which may be combined). The first: it is possible to choose different characters for the separator and the placeholder. This might be needed if the default characters appear in the commands themselves. The previous example could be written:

 compi ,` {open lines.txt r,read,string trim,split`\n,lindex`end,lindex`end,split`-,lindex`1,set code}
 puts $code

The second option: if the commands pipe is to be executed many times, re-parsing it with compi each time wastes time. Hence it is possible to supply a variable name; compi then stores the generated command in that variable instead of running it, and you can eval it later:

 compi {open lines.txt r|read|string trim|split~\n|lindex~end|lindex~end|split~-|lindex~1|set code|puts} pipe1

(note the final puts)

Where:

 puts $pipe1

yields:

 puts [set code [lindex [split [lindex [lindex [split [string trim [read [open lines.txt r] ] ] \n] end] end] -] 1] ]

(try writing that by hand... ;)

and

 eval $pipe1

yields again:

 code5

While in Python I can "pipe" commands only when each command is a method of the previous object, with compi there is no such limitation. So I can write:

 compi {open lines.txt r|read|string length|puts}

Result:

 155

whereas in Python the string-length and printing operations break the stream of thinking:

 print len(file('lines.txt').read())

instead of

 file('lines.txt').read().len().print()

Lars H: Technically, such pipes provide a restricted form of RPN, in that the stack can only hold one item. Starting from this example to expound on the philosophy and history of different notations could be quite interesting...

schlenk: Interesting things. I have to use Python at work currently and have to go the other way round in my thinking. In Tcl I start with the data I want and add braces or subroutines until I get what I want, more or less top down, while in Python I start from the bottom up and all the time have to think about how to coerce the result object into something useful. In Python I always miss an easy way to express new control structures that come from thinking top down, while Python is really nice for short and efficient low-level code, where Tcl's verbosity gets in its way.


iu2: I think the stack depth issue shifts here from runtime to the earlier parsing step.


2006-01-07 gg - Since proc names in Tcl can contain a pipe character, and arguments to Tcl commands are usually separated using spaces, would it not make sense to use spaces to separate the commands? I am sure a different syntax can be thought of. Of course, the current syntax is more shell-like, but surely a different syntax could be better. Just mentioning.


iu2 You're right. I tried to take this into account by allowing different separator chars, but I guess the syntax can be improved.


Sly Let me guess ;)

  {open lines.txt r} {read} {string trim} {split~\n} {lindex~end} {lindex~end} {split~-} {lindex~1} {set code} {puts}

Er?
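
For what it's worth, one way to read this suggestion is to pass each stage as its own (braced) argument and let the Tcl parser do the splitting. A rough sketch along those lines (the name compi2 and the stand-alone ~ placeholder are assumptions here, not part of compi):

  # Sketch: each pipeline stage is a separate argument; a stand-alone ~ receives the previous result
  proc compi2 {args} {
    set result {}
    set first 1
    foreach stage $args {
      if {$first} {
        set first 0
      } elseif {[string first ~ $stage] < 0} {
        append stage " ~"        ;# no placeholder: feed the result in as the last argument
      }
      set stage [string map [list ~ [list $result]] $stage]
      set result [uplevel 1 $stage]
    }
    return $result
  }

With it, the example pipeline becomes:

  compi2 {open lines.txt r} read {string trim} {split ~ \n} {lindex ~ end} {lindex ~ end} {split ~ -} {lindex ~ 1} {set code} puts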


dbohdan 2014-06-13: Looking to improve the experience of interactive programming in eltclsh, I also implemented a command pipeline. You'll need the textutil package from Tcllib to run this code.

package require textutil

# Run command pipelines.
proc |> {args} {
    if {[llength $args] == 0} {
        # Print help
        puts {
Command pipelines for interactive programming.

usage:
    |> cmd1 |> cmd2 _ |> cmd3 #0 ?-debug?
or
    |> { cmd1 |> cmd2 $_ |> cmd3 $pipe(0) } ?-debug?

See http://wiki.tcl.tk/17419 for more.}
        # End help
        return
    }

    set splitOn {\|>}
    set output {}
    set i 0
    set debug 0
    # Syntax 1: |> a |> b _ |> c _ |> d #0
    # Syntax 2: |> { a |> b $_ |> c $_ |> d $pipe(0) }
    set syntax 1

    # Options.
    switch -exact -- [lindex $args end] {
        -debug {
            set debug 1
            set args [lrange $args 0 end-1]
        }
    }

    if {[llength $args] == 1} {
        # The |> { ... } syntax was used.
        set args [lindex $args 0]
        set syntax 2
        upvar 1 _ _
    }

    set commands [::textutil::split::splitx $args $splitOn]

    foreach cmd $commands {
        switch -exact -- $syntax {
            1 {
                set cmd [regsub {#([0-9]+)} $cmd {$pipe(\1)}]
                set cmd [string map [list _ [list $output]] $cmd]
            }
            2 {
                set _ $output
            }
            default {
                puts "error: syntax $syntax"
                return
            }
        }

        set output [uplevel 1 $cmd];
        upvar 1 pipe($i) currentResult
        set currentResult $output
        if {$debug} {
            puts [
                format "%s -- %s -- %s" \
                       #$i \
                       [string replace $cmd 120 end {...}] \
                       [string replace $output 120 end {...}]
            ]
        }
        incr i
    }

    return $output
}

Invocation

As mentioned in its help message, the |> command allows you to use one of two possible syntaxes for building the pipeline, either

|> cmd1 |> cmd2 _ |> cmd3 #0 ?-debug?

or

|> { cmd1 |> cmd2 $_ |> cmd3 $pipe(0) } ?-debug?

Here each _ or $_ refers to the result of the command directly before it. The variables $pipe(0), $pipe(1), ... refer to the results of the respective commands in the pipeline. #0, #1, ... are replaced with $pipe(0), $pipe(1), ... when the commands are run; they exist to let the user avoid quoting the whole command pipeline when they need to refer to a different command's output. (Those variables are assigned during the pipeline's execution, so when the pipeline isn't quoted $pipe(0) won't work -- and neither will \$pipe(0).) The -debug option prints the intermediate commands' results at each step.

If you use $var or [ cmd ] in the pipeline with the first syntax, variables and commands are substituted once, before the pipeline starts, and in the calling scope. With the second syntax, the substitutions happen in order, as each stage of the pipeline runs. As an example, compare

% cd /home
% set a 1
% |> cd .. |> set a 2 |> puts "$a [pwd]"
1 /home

and

% cd /home
% set a 1
% |> { cd .. |> set a 2 |> puts "$a [pwd]" }
2 /

The second syntax is preferable for use in actual programs but is less convenient to type in a shell.

Examples and discussion

From what I can see, this implementation differs from iu2's above in two ways that I think give it an advantage in usability. First, it makes quoting the command pipeline optional. E.g.,

# Print the size of all files in megabytes.
% |> fileutil::find |> struct::list mapfor x _ { file size $x } |> struct::list fold _ 0 tcl::mathop::+ |> hformat _
0.54 MB

works the same as

% |> { fileutil::find |> struct::list mapfor x $_ { file size $x } |> struct::list fold $_ 0 tcl::mathop::+ |> hformat $_ }
0.54 MB

(hformat proc source.) This can save editing effort when gradually building the pipeline in an interactive shell with readline support.

Second, you can refer to the output of any command in the pipeline, not just the one directly preceding the current one, as well as disregard the preceding command's output. Example:

# Find out file size.
% proc file-size {fn} { |> open $fn r |> read _ |> string length _ |> close #0 |> return #2 }
% file-size eltclshrc
13659
# The same using quoted commands.
% proc file-size {fn} { |> { open $fn r |> read $_ |> string length $_ |> close $pipe(0) |> return $pipe(2) }}
% file-size eltclshrc
13659

Here we close the file opened in the beginning of the pipeline and still return the desired value.

PYK 2014-06-13: The string map strategy used in |> is too brittle to be used in real code, as it will replace all instances of _ anywhere in the command, whether in the command name or in some argument to the command. This problem could be somewhat mitigated by using only the second syntax, which also eliminates the double-substitution issue that the first syntax suffers from.
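
For example (a hypothetical session; note how the replacement also hits the underscore inside the file name):

% |> string length abcde |> puts my_file.txt
my5file.txt

The second command was rewritten to puts my5file.txt because every _ in it was replaced with the previous result, 5.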

APN I don't quite follow how all the above pipelining commands distinguish a | separating commands from a | character that is an argument to one of the commands, possibly passed in via a variable.

dbohdan 2014-06-14: PYK, I agree: it is too brittle for real code and you make a good point that with syntax one it will affect arguments as well. However, it is a reasonable trade-off to make for interactive use. At first I upvared $_ but found it still too cumbersome for a shell. Maybe _ substitution should be disabled for syntax two and $_ should be used instead?

APN, they don't (edit: although syntax two prevents splitting on the separator in variables). My original plan was to split the argument string on |> to imitate the a |> b |> c look of command pipelines from other languages, but then I decided to go with just the character | (to use plain split). In light of your remarks I changed it to |>, which should occur less commonly.
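
For instance, a literal |> inside a braced argument is still split on (hypothetical):

% |> { puts {a |> b} }

splitx cuts the braced argument in two, so the first fragment, puts {a, is not a complete command and the pipeline errors out.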

Offtopic: I wish you could embed code from a repository hosted on Chisel or similar for easier management of long snippets. Heck, just being able to transclude other wiki pages would be neat for the same purpose. You could put the code on a separate page.

dbohdan: After some thought I updated the code to limit #0, #1, ... and _ to syntax one. Syntax two now uses $_ and $pipe(0), $pipe(1) exclusively.


Starwer - 2015-01-26 19:43:13

I also think that Tcl dramatically lacks this way of writing one-liners. In my opinion, a good command-line language must allow concise writing when you type it live in a console. If you accept a slightly different syntax for the pipe, you can solve that in... one line ;)

proc | {args} { upvar 1 _ _ ; return [set _ [uplevel 1 $args]] }

Then you can use it as:

| open lines.txt r ;| read $_ ;| string trim $_ ;| split $_ \n ;| lindex $_ end ;| lindex $_ end ;| split $_ - ;| lindex $_ 1 ;| set code $_ 

This way of writing lets the Tcl parser do the job of splitting (on the ; separator), and the commands execute nicely one after the other.
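
For instance, the string length example from earlier works the same way with this | (assuming the same lines.txt as above, it should again print 155):

| open lines.txt r ;| read $_ ;| string length $_ ;| puts $_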