Commands pipe

** See also **

   [Functional programming]:   

   [aspect]'s TIP#288 parser module [http://chiselapp.com/user/aspect/repository/tcl-hacks/finfo?name=modules/tip288-0.tm%|%in tcl-hacks]:   Uses this construct to generate code for parsing $args.  One useful property of cmdpipe style for this task is that no extraneous variables are created, and the whole expression can be inlined in the function.

   [ycl%|%ycl sugar cmdpipe]:   `pipe2` uses a [list]-based syntax, where each component of the pipeline is composed of two lists, and the piped output is concatenated between them.  Example:   `| 

   [pipethread]:   Multithreaded pipes with each consumer/producer running in its own thread, similar to how pipes work in [Unix shells].

** Implementations **


*** iu2's implementation ***

[iu2] 2006-12-30

Typing in the algorithm the way you are '''thinking''' can be much more fun than changing the way you think in order to fit the programming language.

Consider the following file, named '''lines.txt''':

======none
line1 1 2 3 4 5 end-code1-line
line2 2 2 3 4 5 end-code2-line
line3 3 2 3 4 5 end-code3-line
line4 4 2 3 4 5 end-code4-line
line5 5 2 3 4 5 end-code5-line
======

In order to read the text I must open the file first, so actually I want to open the file, then read the text.
In [Python] this would be:

======none
file("lines.txt", "r").read()
======

which is exactly along my stream of thoughts.

In Tcl this would be:

======
read [open lines.txt r]
======

which is the opposite of my stream of thoughts. I opened the file, then went back on the line and read it.

This is but a small example, and it is so basic that I actually use it as a pattern when reading files.
But what if I wanted to extract the middle '''codeX''' word of the last word in the last line?

In Python I write:

======none
code = file('lines.txt').read().strip().splitlines()[-1].split()[-1].split('-')[1]
print code
======

'''output''':

======none
code5
======


Which is read as: "Open file, read it, strip spaces from beginning and end, generate a list of lines, take last line from the list, generate a list of words, take last word, generate a list by splitting this word on `-`, then take second word". '''That's a one-liner!'''
But more important than being a one-liner, the commands in this line flow in the same order in which I think about getting the result I want.

Converting this line to [functional programming] style is not just hard; worse, it's not fun. One of the things that makes Python programming fun is just that: thinking and programming follow the same path.

The following procedure makes this possible in Tcl, and I call it a ''Commands pipe'', or '''compi''' for short:

======
# Commands Pipe
proc compi {sep {fns ""} {var ""}} {
    # parse arguments
    if {$var ne ""} {        ;# all three arguments supplied
        foreach {s p} [split $sep ""] {break}
    } elseif {$fns eq ""} {        ;# one argument supplied
        set s |; set p ~; set fns $sep
    } else {        ;# two arguments supplied
        if {[string length $sep] == 2} {        ;# first argument is separators
            foreach {s p} [split $sep ""] {break}
        } else {        ;# sep is fns and fns is var
            set var $fns; set fns $sep; set s |; set p ~
        }
    }

    # create a valid tcl command
    set commands [split $fns $s]
    set cmd [set curcmd [lindex $commands end]]
    if {[string first $p $curcmd] < 0} {append cmd $p}
    for {set i [expr {[llength $commands] - 2}]}  {$i >= 0} {incr i -1} {
        set curcmd [lindex $commands $i]
        if {[string first $p $curcmd] < 0 && $i > 0} {append curcmd $p}
        set cmd [string map "$p { \[$curcmd\] }" $cmd]
    }

    # perform command
    if {$var eq ""} {return [uplevel 1 $cmd]} else {uplevel 1 [list set $var $cmd]}
}
======

Using compi:

======
compi ?sp? commands-pipe ?var?
======

Now, with this I can write:

======
compi {open lines.txt r|read|string trim|split~\n|lindex~end|lindex~end|split~-|lindex~1|set code}
puts $code
======

'''output''':

======none
code5
======

where the output of each command is used as the input to the next command. `|` is the '''command separator''', and `~` is a '''place holder''' for the output of the previous command; it is needed only when that output goes somewhere other than the end of the current command. A command without a `~` gets the previous output appended as its last argument.
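
The folding step described above can be sketched on its own. This is a simplified illustration rather than `compi` itself: `nest` is a hypothetical helper that takes the stages as a ready-made list and always uses `~` as the place holder.

======
# Fold pipeline stages into one nested command, from the last stage backwards.
# `nest` is a hypothetical name used only for this sketch.
proc nest {stages} {
    set p ~
    set cmd [lindex $stages end]
    # A stage without a place holder receives the previous output as its last argument.
    if {[string first $p $cmd] < 0} {append cmd $p}
    for {set i [expr {[llength $stages] - 2}]} {$i >= 0} {incr i -1} {
        set cur [lindex $stages $i]
        # The first stage has no input, so it never gets a place holder.
        if {[string first $p $cur] < 0 && $i > 0} {append cur $p}
        # Replace the place holder with a command substitution of the earlier stage.
        set cmd [string map [list $p " \[$cur\] "] $cmd]
    }
    return $cmd
}

set nested [nest {{list a b c} {lindex~1} {string toupper}}]
puts $nested
puts [eval $nested]
======

The first `puts` shows the nested command that was built; the second shows its result, `B`.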

The first block of code in `compi` enables two options, which may be combined.
The first is to choose different characters for the separator and the place holder, in case the default characters appear in the commands themselves. The previous example could be written:

======
compi ,` {open lines.txt r,read,string trim,split`\n,lindex`end,lindex`end,split`-,lindex`1,set code}
puts $code
======

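The second option avoids repeated work. If the commands pipe is performed many times, compi wastes time rebuilding the same command on every call. Hence it is possible to give the name of a variable as the last argument; compi then stores the generated command in that variable instead of running it, and the command can be run later with `[eval]`: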

======
compi {open lines.txt r|read|string trim|split~\n|lindex~end|lindex~end|split~-|lindex~1|set code|puts} pipe1
======

(please mind the last ''puts'')

Where:

======
puts $pipe1
======

yields:

======
puts [set code [lindex [split [lindex [lindex [split [string trim [read [open lines.txt r] ] ] \n] end] end] -] 1] ]
======

(try writing it straight forward.. ;)

and

======
eval $pipe1
======

yields again:

======none
code5
======

While in Python I can "pipe" commands only when each command is a method of the previous object, with compi there is no such limitation. So I can write:

======
compi {open lines.txt r|read|string length|puts}
======

'''output''':

======none
155
======

where in Python the string length and printing commands break the streaming of thinking:

======none
print len(file('lines.txt').read())
======

instead of

======none
file('lines.txt').read().len().print()
======

----

[Lars H]: Technically, such pipes provide a restricted form of [RPN], in that the stack can only hold one item. Starting from this example to expound on the philosophy and history of different notations could be quite interesting...

[schlenk]: Interesting things. I have to use Python at work currently and have to go the other way round in my thinking. In Tcl, I start with the data I want and add braces or subroutines until I get what I want, more or less top down, while in Python I start from the bottom up and all the time have to think about how I coerce the result object into something useful.
In Python I always miss an easy way to express [new control structures] that come from thinking top down..., while Python is really nice for short and efficient low-level code, where Tcl's verbosity gets in its way.

----

[iu2]: I think the stack depth issue shifts here from runtime to the prior parsing process

----

[gg] 2006-01-07: Since procedure names in Tcl can contain a pipe character, and arguments to commands in Tcl are usually separated using spaces, would it not make sense to use spaces to separate the commands? I am sure a different syntax can be thought of. Of course, the current syntax is more [shell]-like, but surely a different syntax could be better. Just mentioning.

----

[iu2]:  You're right. I tried to take this into account by allowing different separator characters, but I guess the syntax can be improved.

----

[Sly]:  Let me guess ;)

======
{open lines.txt r} {read} {string trim} {split~\n} {lindex~end} {lindex~end} {split~-} {lindex~1} {set code} {puts}
======

Er?



*** dbohdan's implementation ***

[dbohdan] 2014-06-13: Looking to improve the experience of [https://en.wikipedia.org/wiki/Interactive_programming%|%interactive programming%|%] in [eltclsh] I also implemented a command pipeline. You'll need the package [textutil] from [Tcllib] to run this code.

======
package require textutil

# Run command pipelines.
proc |> {args} {
    if {[llength $args] == 0} {
        # Print help
        puts {
Command pipelines for interactive programming.

usage:
    |> cmd1 |> cmd2 _ |> cmd3 #0 ?-debug?
or
    |> { cmd1 |> cmd2 $_ |> cmd3 $pipe(0) } ?-debug?

See http://wiki.tcl.tk/17419 for more.}
        # End help
        return
    }

    set splitOn {\|>}
    set output {}
    set i 0
    set debug 0
    # Syntax 1: |> a |> b _ |> c _ |> d #0
    # Syntax 2: |> { a |> b $_ |> c $_ |> d $pipe(0) }
    set syntax 1

    # Options.
    switch -exact -- [lindex $args end] {
        -debug {
            set debug 1
            set args [lrange $args 0 end-1]
        }
    }

    if {[llength $args] == 1} {
        # The |> { ... } syntax was used.
        set args [lindex $args 0]
        set syntax 2
        upvar 1 _ _
    }

    set commands [::textutil::split::splitx $args $splitOn]

    foreach cmd $commands {
        switch -exact -- $syntax {
            1 {
                set cmd [regsub -all {#([0-9]+)} $cmd {$pipe(\1)}]
                set cmd [string map [list _ [list $output]] $cmd]
            }
            2 {
                set _ $output
            }
            default {
                puts "error: syntax $syntax"
                return
            }
        }

        set output [uplevel 1 $cmd];
        upvar 1 pipe($i) currentResult
        set currentResult $output
        if {$debug} {
            puts [
                format "%s -- %s -- %s" \
                       #$i \
                       [string replace $cmd 120 end {...}] \
                       [string replace $output 120 end {...}]
            ]
        }
        incr i
    }

    return $output
}
======


**** Invocation ****

As mentioned in its help message the command `|>` allows you to use one of two possible syntaxes for building the pipeline, either

======
|> cmd1 |> cmd2 _ |> cmd3 #0 ?-debug?
======
or
======
|> { cmd1 |> cmd2 $_ |> cmd3 $pipe(0) } ?-debug?
======

Here each `_` or `$_` refers to the result of the command directly before it. The variables `$pipe(0)`, `$pipe(1)`, ... refer to the results of the respective commands in the pipeline. `#0`, `#1`, ... are replaced with `$pipe(0)`, `$pipe(1)`, ... when the commands are run; they let the user refer to different commands' output without quoting the whole command pipeline. (Those variables are assigned during the pipeline's execution, so when the pipeline isn't quoted `$pipe(0)` won't work -- and neither will `\$pipe(0)`.) The `-debug` option prints each command and its result at every step.
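
The `#N` shorthand amounts to a textual rewrite. Here is a minimal sketch of that step, outside of `|>`. The `-all` switch is an assumption of this sketch, used so that every reference in the command is rewritten.

======
# Rewrite #0, #1, ... into references to the pipe array.
set cmd {list #0 #1}
set rewritten [regsub -all {#([0-9]+)} $cmd {$pipe(\1)}]
puts $rewritten

# The rewritten text is ordinary Tcl, so the array is only read when it is evaluated.
array set pipe {0 first 1 second}
puts [eval $rewritten]
======

The first `puts` shows `list $pipe(0) $pipe(1)`; the second shows `first second`.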

If you use `$var` or `[[ cmd ]` in the pipeline with the first syntax, the commands and variables are substituted before the pipeline starts running, in the parent scope. The second syntax makes substitutions happen in order, as each command runs, and in the local scope. As an example compare

======
% cd /home
% set a 1
% |> cd .. |> set a 2 |> puts "$a [pwd]"
1 /home
======

and

======
% cd /home
% set a 1
% |> { cd .. |> set a 2 |> puts "$a [pwd]" }
2 /
======

The second syntax is preferable for use in actual programs but is less convenient to type in a shell.


**** Examples and discussion ****

From what I see this implementation is different from [iu2]'s one above in two ways that I think give it an advantage in usability. First, it makes it optional to quote the command pipeline. E.g.,

======
# Print the size of all files in megabytes.
% |> fileutil::find |> struct::list mapfor x _ { file size $x } |> struct::list fold _ 0 tcl::mathop::+ |> hformat _
0.54 MB
======

works the same as

======
% |> { fileutil::find |> struct::list mapfor x $_ { file size $x } |> struct::list fold $_ 0 tcl::mathop::+ |> hformat $_ }
0.54 MB
======

(`hformat` proc [Human readable file size formatting%|% source].) This can save editing effort when gradually building the pipeline in an interactive shell with [readline] support.

Second, you can refer to the output of any command in the pipeline, not just the one directly preceding the current one, as well as disregard the preceding command's output. Example:

======
# Find out file size.
% proc file-size {fn} { |> open $fn r |> read _ |> string length _ |> close #0 |> return #2 }
% file-size eltclshrc
13659
# The same using quoted commands.
% proc file-size {fn} { |> { open $fn r |> read $_ |> string length $_ |> close $pipe(0) |> return $pipe(2) }}
% file-size eltclshrc
13659
======

Here we close the file opened in the beginning of the pipeline and still return the desired value.

[PYK] 2014-06-13: The `[string map]` strategy used in `|>` is too brittle to be
used in real code, as it will replace all instances of `_` anywhere in the
command, whether it be in the command name or in some argument to the command.
This problem could be somewhat mitigated by using only the second syntax,
which also eliminates the [double substitution] issue that the first syntax
suffers from.
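
A minimal sketch of the problem, applying the same kind of `string map` call to a made-up command (the word `my_var` is hypothetical):

======
set output {a b c}
set cmd {lsearch -exact _ my_var}
# Every underscore is replaced, including the one inside my_var.
set mapped [string map [list _ [list $output]] $cmd]
puts $mapped
======

This prints `lsearch -exact {a b c} my{a b c}var`, which is no longer the command that was typed.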

[APN]: I don't quite follow how all the above pipelining commands distinguish between a `|` separating commands and a `|` that is an argument to one of the commands, possibly passed as a variable.

[dbohdan] 2014-06-14: PYK, I agree: it ''is'' too brittle for real code and you make a good point that with syntax one it will affect arguments as well. However, it is a reasonable trade-off to make for interactive use. At first I [upvar%|%upvared] `$_` but found it still too cumbersome for a shell. Maybe `_` substitution should be disabled for syntax two and `$_` should be used instead?

APN, they don't (edit: although syntax two prevents splitting on the separator in variables). My original plan was to split the argument string on `|>` to imitate the `a |> b |> c` look of command pipelines from other languages but then I decided to go with just the character `|` (to use plain `[split]`). In light of your remarks I changed it to `|>`, which should occur less commonly.

Offtopic: I wish you could embed code from a repository hosted on Chisel or similar for easier management of long snippets. Heck, just being able to transclude other wiki pages would be neat for the same purpose. You could put the code on a separate page.

[dbohdan]: After some thought I updated the code to limit `#0`, `#1`, ... and `_` to syntax one. Syntax two now uses `$_` and `$pipe(0)`, `$pipe(1)` exclusively.

**** See also ****

[dbohdan] 2015-08-18: An improved version of `|>` for Tcl 8.5+ and [Jim Tcl] can be found in the module [fptools]. Look for the procedures `::fptools::pipe` and `::fptools::lpipe`.

[dbohdan] 2020-08-18: Out of the pipe commands I have written I've only ended up really using one.  It is available in [Jimlib] as `lib::pipe`.  It works in Tcl 8.6+ and Jim Tcl, and since it is independent of the rest of the library, you can just copy it to your source file or [tclshrc].  Here is an example of its use.

======
lib::pipe x $path {
    glob -nocomplain -dir $x *.json
} -map {
    if {![regexp {20\d\d-\d\d-\d\d} $x date]} continue
    lindex $date
} {
    lsort $x
} {
    set result $x
}
======

*** Starwer's implementation ***

'''[Starwer] - 2015-01-26 19:43:13'''

I also think that Tcl dramatically lacks this way of writing one-liners. In my opinion, a good command-line language must allow concise writing when typed live in a console.

If you accept a slightly different syntax for the pipe, you can solve the aforementioned problems in... one line ;)
======
proc | {args} { upvar 1 _ _ ; return [set _ [uplevel 1 $args]] }
======

Then you can use it as:

======
| open lines.txt r ;| read $_ ;| string trim $_ ;| split $_ \n ;| lindex $_ end ;| lindex $_ end ;| split $_ - ;| lindex $_ 1 ;| set code $_
======

This style lets the Tcl parser do the job of splitting (at each `;`), and the commands execute one after another. This solves the variable substitution problem (`$_`) that required the expressions to be enclosed in {}, and it also removes the ambiguity of the | sign. Last but not least, you can break the pipeline across several lines and it still works, which is useful when debugging it in a console!
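
For instance, the pipeline can be entered one step per line. This small sketch repeats the one-line `|` definition so that it runs on its own, and starts from a literal list instead of a file:

======
proc | {args} { upvar 1 _ _ ; return [set _ [uplevel 1 $args]] }

| list line4 end-code5-line
| lindex $_ end
| split $_ -
| lindex $_ 1
puts $_
======

Each line can be run and inspected separately in a console; after the last step `$_` holds `code5`.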

----

'''[Starwer] - 2015-01-26 20:04:06'''

Well... I couldn't resist developing a full command set to mimic my favorite Unix pipe and redirection commands, namely | > >> grep, sed...

Here is the usage:

======
# normal operations
| cmd1 ;| cmd2 $_ ;| cmd3 $_ args

# file/list operations
< source.txt ;| grep "hello" ;| map {regsub {goodbye} $_ {hello}} ;> destination.txt

# string operations (regular expressions and substitutions, Perl style...)
|< "Bye World" ;|s bye Hello i ;|~ {^(\w*)\s+(\w+)} ;|< "this says '$1' to '$2'" ;|> destvar

# Numeric operations
|$ 1+2 ;| incr _ ;|> var1 ;|& |< "$var1 different from 0";|| |< "$var1 equal 0"
======


Here is what I called pipeline.tcl:
======
proc | {args} {
    if {[llength $args] == 0} {
        # Print help
        puts {Unix-style file redirection and pipelining commands for interactive programming.

usage:
    # normal operations
    | cmd1 ;| cmd2 $_ ;| cmd3 $_ args

    # file/list operations
    < source.txt ;| grep "hello" ;| map {regsub {goodbye} $_ {hello}} ;> destination.txt

    # string operations
    |< "Bye World" ;|s bye Hello i ;|~ {^(\w*)\s+(\w+)} ;|< "this says '$1' to '$2'" ;|> destvar

    # Numeric operations
    |$ 1+2 ;| incr _ ;|> var1 ;|& |< "different from 0";|| |< "equal 0"

See http://wiki.tcl.tk/17419 for more.}
        # End help
        return
    }

    upvar 1 _ _
    return [set _ [uplevel 1 $args]]
}


proc < {{file ""}} {
    if {[string length $file] == 0} {
        puts {store the content of a file in $_ as a list of its lines.}
        |
        return
    }

    upvar 1 _ _
    set fp [open $file r]
    set _ [split [read $fp] "\n"]
    close $fp

    return $_
}

proc > {{file ""}} {
    if {[string length $file] == 0} {
        puts {Save the list $_ to a file.}
        |
        return
    }
    upvar 1 _ _

    set fileId [open $file w]
    puts -nonewline $fileId [join $_ "\n"]
    close $fileId

    return $_
}

proc >> {{file ""}} {
    if {[string length $file] == 0} {
        puts {Append the list $_ in a file.}
        |
        return
    }

    upvar 1 _ _
    set fileId [open $file a]
    puts -nonewline $fileId [join $_ "\n"]
    close $fileId

    return $_
}


proc grep {args} {
    if {[llength $args] == 0} {
        puts {grep ?list? expr
    From the list given in argument list (default: $_), keep indices matching the regular expression expr.
        }
        |
        return
    }
    if {[llength $args] > 2} { error "bad number of arguments: grep ?list? regexp"}
    if {[llength $args] == 1} {
        set re [lindex $args 0]
        upvar 1 _ _
    } else {
        set re [lindex $args 1]
        set _ [lindex $args 0]
    }

    set ret {}        ;# start empty so that no match yields an empty list
    foreach line $_ {
        if {[regexp $re $line]} {
            lappend ret $line
        }
    }

    return $ret
}


proc map {args} {
    if {[llength $args] == 0} {
        puts {map ?list? command
    From the list given in argument list (default: $_), apply the command given to every index.
    The value of the index is in $_
        }
        |
        return
    }
    if {[llength $args] > 2} { error "bad number of arguments: map ?list? function"}
    if {[llength $args] == 1} {
        set func [lindex $args 0]
        upvar 1 _ _
        set lines $_
    } else {
        set func [lindex $args 1]
        set lines [lindex $args 0]
    }

    set ret {}        ;# start empty so that an empty input yields an empty list
    foreach _ $lines  {
        lappend ret [uplevel 1 $func]
    }

    return $ret
}


proc |< {args} {
    if {[llength $args] == 0} {
        puts {Store the arguments to variable $_}
        |
        return
    }
    upvar 1 _ _
    if {[llength $args] == 1} { return [set _ [lindex $args 0]] }
    return [set _ $args]
}

proc |> {{var ""}} {
    if {[string length $var] == 0} {
        puts {Copy the content of variable $_ to the given variable.}
        |
        return
    }
    upvar 1 _ _
    upvar 1 $var dest        ;# write to the caller's variable, not a local one
    return [set dest $_]
}

proc |$ {args} {
    if {[llength $args] == 0} {
        puts {Evaluate the expression given as arguments (expr) and store the result in $_}
        |
        return
    }
    upvar 1 _ _
    return [set _ [uplevel 1 "expr $args"]]
}

proc |& {args} {
    if {[llength $args] == 0} {
        puts {Execute the command in arguments if the content of $_ is a non-empty string and,
if it is a number, different from 0. If the command is executed, its result goes to $_}
        |
        return
    }
    upvar 1 _ _
    if {[string length $_]>0 && (![string is double $_] || $_!=0) } {
        set _ [uplevel 1 $args]
    }
    return $_
}

proc || {args} {
    if {[llength $args] == 0} {
        puts {Execute the command in arguments if the content of $_ is an empty string or
if it is a number equal to 0. If the command is executed, its result goes to $_}
        |
        return
    }
    upvar 1 _ _
    if {[string length $_]==0 || ([string is double $_] && $_==0) } {
        set _ [uplevel 1 $args]
    }
    return $_
}


proc |~ {{re ""} {opts ""}} {
    if {[string length $re] == 0} {
        puts {|~ expr ?options?
    Apply the regular expression expr to $_
    options is a string which can contain i (case-insensitive) and/or m (multi-line match).}
        |
        return
    }

    lappend lopts
    if {![regexp -expanded {^[img]*$} $opts]} {
        error "bad option for |~ : $opts"
    }
    if {[string match *i* $opts]} { lappend lopts -nocase }
    if {[string match *m* $opts]} { lappend lopts -line }

    upvar 1 _ _
    upvar 1 0 0
    if { ! [uplevel 1 "regexp -expanded $lopts {$re} \$_ 0 1 2 3 4 5 6 7 8 9"] } {
        for {set i 0} {$i<10} {incr i} { uplevel 1 "set $i \"\"" }
    }
    return [set _ $0]
}

proc |s {{re ""} {subs ""} {opts ""}} {
    if {[string length $re] == 0} {
        puts {|s expr subst ?options?
    Apply the regular expression expr to $_ and replace with subst.
    options is a string which can contain i (case-insensitive), m (multi-line match)
    and/or g (global, replace every occurrence).}
        |
        return
    }

    lappend lopts
    if {![regexp -expanded {^[img]*$} $opts]} {
        error "bad option for |s : $opts"
    }
    if {[string match *i* $opts]} { lappend lopts -nocase }
    if {[string match *m* $opts]} { lappend lopts -line }
    if {[string match *g* $opts]} { lappend lopts -all }

    upvar 1 _ _
    if { ! [uplevel 1 "regsub -expanded $lopts {$re} \$_ {$subs} _"] } {
        set _ ""
    }
    return $_
}

======

*** Roy Keene's implementation ***

[Roy Keene] 2014-08-18: Here's my approach to this

======
proc pipe args {
    set input [list]

    while 1 {
        set end [lsearch -exact $args "|"]

        if {$end == -1} {
            set end "end"
            set part [lrange $args 0 end]
        } else {
            set part [lrange $args 0 [expr {$end - 1}]]
        }

        set part [string trim $part]

        set mode list
        if {[string index $part 0] == "{" && [string index $part end] == "}"} {
            set part [lindex $part 0]
            set mode "string"
        }

        switch -- $mode {
            list {
                foreach idx [lsearch -all -exact $part @@INPUT@@] {
                    set part [lreplace $part $idx $idx $input]
                }
            }
            string {
                set part [string map [list @@INPUT@@ [list $input]] $part]
            }
        }

        set input [uplevel $part]

        if {$end eq "end"} {
            break
        }

        set args [lrange $args [expr {$end+1}] end]
    }

    return $input
}
======

Example usage and output:
Usage:
======
proc procA {} {
    return [list a b c]
}

proc procB {input} {
    return [lreverse $input]
}

set something [list joe bob]

puts [pipe procA | procB @@INPUT@@ | { set x [list]; foreach blah @@INPUT@@ { lappend x $blah.1 }; return $x } ]
======

Output:

======
c.1 b.1 a.1
======

----

[PYK] 2016-04-15:  Here is another implementation very similar to Roy's, but
which makes it possible to specify a parameter string for each segment of the
pipe.  It avoids string operations on `$args`, which fixes one bug in Roy's
implementation in the case of a command named `{`. 

======
proc pipe args {
    set cmd {}
    set newparam {}
    set run 0
    while {[llength $args]} {
        set args [lassign $args[set args {}] arg]
        if {[string match |* $arg]} {
            set run 1
            if {[string length $arg] > 1} {
                set newparam [string range $arg 1 end]
            }
        } else {
            lappend cmd $arg
        }
        if {$run || [llength $args] == 0} {
            if {$cmd eq {}} {
                return -code error [list {empty command} $cmd]
            }
            if {[info exists result]} {
                set cmd [join $cmd[set cmd {}]]
                set cmd [string map [list $param [list $result]] $cmd]
            }
            set result [uplevel $cmd]
            if {$newparam eq {}} {
                set param @@INPUT@@
            } else {
                set param $newparam
            }
            set cmd {}
            set run 0
        }
    }
    return $result
}
======

'''example''':

======
puts [pipe procA | procB @@INPUT@@ |@param@ { set x [list]; foreach blah @param@ { lappend x $blah.1 }; return $x } ]
======



** A list-based command pipe ** 

[PYK] 2016-10-02:  This very small procedure uses a list-based syntax to accomplish the task:

======
proc | args {
        set args [lassign $args[set args {}] first]
        set result [uplevel 1 $first]
        foreach arg $args {
                lassign $arg one three
                set result [uplevel 1 $one [list $result] $three]
        }
        return $result
}
======

Each component of the pipe can be composed of two lists, between which the
piped value is inserted to form a script by subsequent concatenation.  Although
the syntax is a little more verbose, I appreciate the symmetry and
orthogonality of it: 

======
puts [| {
        lindex {Hello, plato}
} {
        split , 
} {
        lindex end
} {
        {string trim}
} {
        {string totitle}
}]
======


<<categories>> Control Structure