another list comprehension

iu2 - 2007-01-16 (1S has changed the date to the ISO-style)

Forming a list out of another list can be achieved with one line of code.

Taking the examples from the tcl documentation about struct::list map we get mapping with at least two lines of code:

    tclsh> # squaring all elements in a list

    tclsh> proc sqr {x} {expr {$x*$x}}
    tclsh> ::struct::list map {1 2 3 4 5} sqr
    1 4 9 16 25

    tclsh> # Retrieving the second column from a matrix
    tclsh> # given as list of lists.

    tclsh> proc projection {n list} {::lindex $list $n}
    tclsh> ::struct::list map {{a b c} {1 2 3} {d f g}} {projection 1}
    b 2 f

In tcl 8.5 they can be rewritten as one liners like this:

  # squaring
  % ::struct::list map {1 2 3 4 5} {apply {x {expr {$x*$x}}}}
  1 4 9 16 25    

  # projection
  % ::struct::list map {{a b c} {1 2 3} {d f g}} {apply {x {lindex $x 1}}}
  b 2 f

Comments and improvements to this form can be reviewed in foreach little friends, where I talked about struct::list and apply as an introduction to something else.

This is only a step towards list comprehensions. I used list comprehensions a lot in Python and I miss them in tcl. List comprehensions have several advantages over other methods of forming lists:

  • A one liner, yet very clear
  • Can handle more than one list
  • Engages both mapping and filtering

I would describe Python's list comprehensions syntax as addictive. I used it only a couple of times before I through away the classic map/filter and lambda as a means of forming lists out of other lists. I also started using list comprehensions where I would usually use looping.

Many coding patterns seem unnecessary until one tries them out. For example, giving another name to a proc can be achieved via both interp alias {}... and a proc. I still don't get what iterp alias {}... does that a simple proc doesn't, but I do feel that if I use 'iterp' for a couple of times I will. After all it requires less coding. Perhaps when I find it convenient I will use it more often, hence improving my programming techniques - same function with different names. The same applies to list comprehensions.

A few examples form Python:

  # squaring
  >>> lis = [1,2,3,4,5]
  >>> lis = [x*x for x in lis]
  >>> lis
  [1, 4, 9, 16, 25]

Given lis = [1,2,3,4,5] one can calculate the sum of squares in a one short clear line

  >>> sum([x*x for x in lis])
  55  

And here is the projection example

  # projection
  >>> [x[1] for x in [['a','b','c'],[1,2,3],['d','f','g']]]
  ['b', 2, 'f']

Working with more than one list using zip:

  # calculating the differences between elements from two lists and taking only the ones that are not 0
  >>> list1 = [10,20,30,40,50]
  >>> list2 = [9, 20, 28, 40, 50]
  >>> print [x-y for x,y in zip(list1, list2) if x-y > 0]
  [1, 2]  

A few times I needed to count all the non empty lines in a file. A brute force, yet elegant list comprehension solution is

        # txt holds the entire text  
        lis = [line for line in txt.splitlines() if line.strip() != ""]
        coutn = len(lis)
        coutn = len(lis)

Indeed a regular expression will do here too, but crafting it will not take 20 (or less) seconds - the time needed to write the list comprehension.

The same thing with tcl would probably be

        set lines {}
        foreach line [split $txt \n] {
          if {[string trim $line] ne ""} {lappend lines $line}
        }
        set count [split $lines \n]

or

        regsub -all -lineanchor {^\s*?\n} $txt "" txt2
        set count [split $txt2 \n]
        set count [split $txt2 \n]

none of which, I feel, fits the task properly. The first one is too long for such a simple thing, and the second one, well, didn't come out right on the first trial.

Python (like Haskell) also supports nested fors like in

  >>> lis1 = [1,2,3]
  >>> lis2 = ['a','b','c']
  >>> [[x,y] for x in lis1 for y in lis2]
  [[1, 'a'], [1, 'b'], [1, 'c'], [2, 'a'], [2, 'b'], [2, 'c'], [3, 'a'], [3, 'b'], [3, 'c']]

but I'll do without it for now.

Is this nice syntax can be imported to tcl? I would like it in further versions. I played with several forms of list comprehensions. The amount of braces stemming from tcl's syntax realy makes it difficult to come up with a nice syntax. Eventually, inspired by bind and little language I've come up with this:

  # A helper proc: perform foreach the paramters given as one list
  proc foreachlist {list body} {
    uplevel 1 foreach $list [list $body]
  }

  # list comprehension
  proc lisco {group {var ""}} {
    # extract params
    regexp {(.*?)(?:\sfor\s)(.*?)(?:\sif\s(.*?)$|$)} $group dummy cmd lists if
    if {$if eq ""} {set if 1}

    # generate foreach line and string-map expression
    set nums 0
    set mapExp {%% % }
    foreach list [uplevel 1 [list subst [uplevel 1 list $lists]]] {  ;# 8-(
      incr nums
      lappend foreachLine $nums $list
      lappend mapExp %$nums $$nums
    }

    # build result list
    set res {}
    foreachlist $foreachLine {
      set mapExp2 [subst $mapExp]
      set cmd2 [string map $mapExp2 $cmd]
      set rtmp [uplevel 1 $cmd2]
      set cond [string map [concat $mapExp2 [list %r $rtmp]] $if]
      if {[uplevel 1 [list expr $cond]]} {lappend res $rtmp}
    }

    if {$var ne ""} {uplevel 1 [list set $var $res]}
    return $res
  }

Here are a few examples using it:

        # squaring
        % lisco {expr {%1*%1} for {1 2 3 4 5}}
        1 4 9 16 25

Why waste space on named arguments? One list requires one argument - %1.

Now with two lists:

         two lists:
        set list1 {10 20 30 40 50}
        set list2 {9 20 28 40 50}
        lisco {expr {%1-%2} for $list1 $list2 if [expr %1-%2] > 0} result
        puts $result

        1 2

Another convenience 'keyword' is %r (for 'reference'), with which the previous lisco may be written like this:

        lisco {expr {%1-%2} for $list1 $list2 if %r > 0} result
        puts $result

        1 2

Better the Python!

The auto-variables, %1, %2, %r are substituted using string map so when they might contain lists, they need to be grouped. Like in the projection example:

        # projection
        % lisco {lindex "%1" 1 for {{a b c} {1 2 3} {d f g}}}
        b 2 f

Now counting the non-empty lines may be achieved this way

        lisco {list %1 for [split $txt \n] if [llength "%r"] > 0} lines
        or
        lisco {format "%1" for [split $txt \n] if [string trim "%r"] ne ""}  lines
        then
        set counlines [llength $lines]

lisco might need more polishing. It also supports only %1-%9 variables (who needs more anyway). The for and if words are used as keywords so you can't have them anywhere else in the expression (not so good if you've got a list containing one of them...), but the idea is clear: Having a nice syntax for forming lists, which will be convenient to use. Like in Python.

RS For the line counting, the regexps need not be very complicated. My train of thought was: count newlines (\n), subtract double-newlines (\n\n):

 expr {[regexp -all \n $txt] - [regexp -all \n\n $txt]}

But of course this is no list comprehension, just a "string comprehension" alternative :^)


iu2 I've tried it on

 set txt {

        hello,

        This is a 

        line of

        text and another

        line
 }

There are 5 non-empty lines and using

 expr {[regexp -all \n $txt] - [regexp -all \n\n $txt]}

gives 10 Or did you mean counting all the lines?

slebetman: It depends on what you mean by "empty" lines. RS's code, while straightforward, considers lines containing whitespase to be non-empty. This is obviously not what you expected. Would this work?:

  expr {[regexp -all \n $txt] - [regexp -all \n\s*?\n $txt] - [regexp ^\s*?\n $txt]}

Note that the last regexp is needed to handle when the first line is empty like your example $txt. Again, this "bug" clearly demonstrates that using regexp for this is much less intuitive.

iu2: Yes it worked. And this works too

 regexp -all -lineanchor {^\s*\S+[^\n]+\n} $txt

but again, it take more time to craft. regexp requires crafting while list comprehensions don't.


NEM: List comprehensions can be generalised to support not just lists, but any structure supporting two operations: one which wraps a value into an instance of the structure, and another which takes a function and applies it to a member of the structure, returning a new instance of the structure. Structures which provide these two operations are known as monads. See that page for a generalised list comprehension syntax. For instance, here are some of your Python examples using the list monad:

 # specialise the monad do-notation for lists
 interp alias {} lcomp {} do List
 set lis {1 2 3 4 5}
 # square each member of the list
 lcomp x <- $lis { yield [expr {$x*$x}] }
 # projection
 lcomp x <- {{a b c} {1 2 3} {d f g}} { yield [lindex $x 1] }
 # multiple lists using zip
 proc zip {xs ys} {
    set ret [list]
    foreach x $xs y $ys { lappend ret [list $x $y] }
    return $ret
 }
 set list1 {10 20 30 40 50}
 set list2 { 9 20 28 40 50}
 lcomp {x y} <- [zip $list1 $list2] { if {$x-$y>0} { yield [expr {$x-$y}] } else fail }
 # Multiple inputs - produces all combinations of ($x,$y)
 lcomp x <- {a b c} y <- {1 2 3} { yield ($x,$y) }
 # Filter text for non-blank lines
 lcomp line <- [split $txt \n] { if {[string trim $line] ne ""} { yield $line } else fail }

It's performance is decent too, given the flexibility.


iu2: Zipping usually works with two or more arguments:

  proc zip {args} {
    set res {}
    for {set c 0} {$c < [llength [lindex $args 0]]} {incr c} {
      set tmp {}
      foreach list $args {
        lappend tmp [lindex $list $c]
      }
      lappend res $tmp
    }
    return $res
  }

  set list1 {1 2 3 4 5 6}
  set list2 {a b c d e}
  set list3 {10 20 30 40 50}

  puts [zip $list1 $list2 $list3 {One Two Three Four Five Six}]

  Result
  {1 a 10 One} {2 b 20 Two} {3 c 30 Three} {4 d 40 Four} {5 e 50 Five} {6 {} {} Six}

See also List Comprehension, lcomp