The filter paradigm is simple: a program reads its input, massages it, and writes the result as its output. Under Unix, this concept, paired with pipelines, allows one to build larger tools by combining filters into new ones.

For instance, one might build a sequence like this:

 grep MyName logfile | sort -u -f | wc -l

which reads logfile and produces on stdout only those lines containing the string MyName. That output is passed, in parallel (on Unix at least), to the stdin of a sort command, which writes to its stdout only lines that differ from one another (all lines that are identical, ignoring upper and lower case, are collapsed into a single copy). The stdout of the sort is then passed, again in parallel, to the stdin of a filter that counts the lines and outputs the total.


Tcl's simple syntax for reading from stdin and writing to stdout makes it fairly easy to write these kinds of programs; see the filter idiom.
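
A minimal sketch of that idiom: read stdin line by line, transform each line, and write the result to stdout. The [string toupper] step here is only a stand-in for whatever massaging the filter actually performs.

 while {[gets stdin line] >= 0} {
    puts [string toupper $line]   ;# the "massage" step; uppercasing is just an example
 }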


Sometimes you will read references to Tcl being used as glue; this is, in my opinion, another way of referring to the filter paradigm.


JBR 2008-11-20 : Here is a wrapper to handle the details of pipeline filtering: How can I run data through an external filter?


RS 2005-03-08 - filter is also a classic operator in functional programming, and it makes nice simple code:

 proc filter {list script} {
    # Return the elements of $list for which $script, called with the
    # element appended as one extra argument, evaluates to true.
    set res {}
    foreach e $list {
       if {[uplevel 1 $script [list $e]]} {lappend res $e}
    }
    set res
 }

Testing, with a simple way to do intersection:

 proc in {list e} {expr {[lsearch -exact $list $e]>=0}}

 % filter {a b c} {in {b c d}}
 b c

AMG: Implementation using lcomp:

 proc filter {list test_expression} {
    lcomp {$x} for x in $list if $test_expression
 }

test_expression is a boolean expression, as used by [expr] or [if], that should be a function of $x, the list element being tested.
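
For example, assuming the lcomp proc from that page is available:

 % filter {1 2 3 4 5} {$x > 2}
 3 4 5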


smh2 2006-08-26 Here is a proc based on this to find the intersection of many lists:

 proc intersection { items } {
    # Start with the first sub-list, then filter the running result
    # against each remaining sub-list in turn.
    set res [lindex $items 0]
    foreach item [lrange $items 1 end] {
       set res [filter $res {in $item}]
    }
    set res
 }
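
For example, with the script-based filter and the in proc from above:

 % intersection {{a b c} {b c d} {c d e}}
 c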

AMG: Tcl is a great language for producing filters, but a limitation in its I/O library severely hampers its ability to use external filters.

 set chan [open |[list cat -nE] a+]
 puts $chan hello
 flush $chan
 puts [gets $chan]
 # Output: "1 hello$"

This works, but I shouldn't have to do that [flush]. Instead I should be able to [close] the output portion of $chan, forcing a flush and causing cat to see EOF on its input, and then continue to read from the input portion of $chan until cat closes its output.

This problem makes it impossible to use external filters such as tac [L1 ], which don't output anything until receiving EOF. Also, one usually must know in advance how many lines or characters the filter will produce, because relatively few filters write EOF before reading EOF.

NEM: See TIP 332 [L2 ], which added exactly this ability:

 close $chan write
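
With the half-close, filters like tac that buffer everything until EOF become usable. A sketch, requiring Tcl 8.6 or later and an external tac command:

 set chan [open |[list tac] r+]
 puts $chan one
 puts $chan two
 puts $chan three
 close $chan write    ;# half-close: tac now sees EOF on its stdin and emits its output
 puts [read $chan]    ;# prints: three, two, one, each on its own line
 close $chan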

AMG: Some filters operate on more than just stdin and stdout, for example multitee [L3 ]. Is there any ability to write to or read from child process file descriptors other than 0, 1, and 2 (stdin, stdout, and stderr, respectively)? Demultiplexing stdout and stderr can be done with [chan pipe], but what about nonstandard file descriptors?
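
A sketch of the [chan pipe] approach, assuming a hypothetical external command mycmd that writes to both stdout and stderr; this still only reaches descriptors 0, 1, and 2:

 lassign [chan pipe] errRead errWrite
 set outChan [open |[list mycmd 2>@ $errWrite] r]
 close $errWrite               ;# the child keeps its own copy of the write end
 set output [read $outChan]    ;# the child's stdout
 set errors [read $errRead]    ;# the child's stderr
 close $outChan
 close $errRead

For large outputs one would want to service both channels from the event loop, since the child can block once either pipe's buffer fills.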