[iu2] - 01-16-2007

Forming a [list] out of another list can be achieved with one line of code.

Taking the examples from the [tcl] documentation about '''[struct::list] map''' we get mapping with at least two lines of code:

    tclsh> # squaring all elements in a list

    tclsh> proc sqr {x} {expr {$x*$x}}
    tclsh> ::struct::list map {1 2 3 4 5} sqr
    1 4 9 16 25

    tclsh> # Retrieving the second column from a matrix
    tclsh> # given as list of lists.

    tclsh> proc projection {n list} {::lindex $list $n}
    tclsh> ::struct::list map {{a b c} {1 2 3} {d f g}} {projection 1}
    b 2 f

In tcl 8.5 they can be rewritten as one liners like this:

  # squaring
  % ::struct::list map {1 2 3 4 5} {apply {x {expr {$x*$x}}}}
  1 4 9 16 25    
  
  # projection
  % ::struct::list map {{a b c} {1 2 3} {d f g}} {apply {x {lindex $x 1}}}
  b 2 f
  
Comments and improvements to this form can be reviewed in [foreach little friends], where I talked about ''struct::list'' and ''apply'' as an introduction to something else.

This is only a step towards list comprehensions. I used list comprehensions a lot in Python and I miss them in tcl. List comprehensions have several advantages over other methods of forming lists:
   * A one liner, yet very clear
   * Can handle more than one list
   * Engages both mapping and filtering

I would describe Python's list comprehensions syntax as addictive. I used it only a couple of times before I through away the classic map/filter and lambda as a means of forming lists out of other lists. I also started using list comprehensions where I would usually use looping.

Many coding patterns seem unnecessary until one tries them out. For example, giving another name to a proc can be achieved via both '''interp alias {}...''' and a '''proc'''. I still don't get what '''iterp alias {}...''' does that a simple '''proc''' doesn't, but I do feel that if I use 'iterp' for a couple of times I will. After all it requires less coding. Perhaps when I find it convenient I will use it more often, hence improving my programming techniques - same function with different names. The same applies to list comprehensions.

A few examples form Python:

  # squaring
  >>> lis = [1,2,3,4,5]
  >>> lis = [x*x for x in lis]
  >>> lis
  [1, 4, 9, 16, 25]


Given ''lis = [[1,2,3,4,5]]'' one can calculate the sum of squares in a one short clear line
  >>> sum([x*x for x in lis])
  55  

And here is the projection example
  # projection
  >>> [x[1] for x in [['a','b','c'],[1,2,3],['d','f','g']]]
  ['b', 2, 'f']

Working with more than one list using zip:

  # calculating the differences between elements from two lists and taking only the ones that are not 0
  >>> list1 = [10,20,30,40,50]
  >>> list2 = [9, 20, 28, 40, 50]
  >>> print [x-y for x,y in zip(list1, list2) if x-y > 0]
  [1, 2]  

A few times I needed to count all the non empty lines in a file. A brute force, yet elegant list comprehension solution is
        # txt holds the entire text  
        lis = [line for line in txt.splitlines() if line.strip() != ""]
        coutn = len(lis)
        coutn = len(lis)
Indeed a regular expression will do here too, but crafting it will not take 20 (or less) seconds - the time needed to write the list comprehension.     .''' and a '''proc'''. I still don't get what '''iterp alias {}...''' does that a simple '''proc''' doesn't, but I do feel that if I use 'iterp' for a couple of times I will. After all it requires less coding. Perhaps when I find it convenient I will use it more often, hence improving my programming techniques - same function with different names. The same applies to list comprehensions.

The same thing with tcl would probably be
        set lines {}
        foreach line [split $txt \n] {
          if {[string trim $line] ne ""} {lappend lines $line}
        }
        set count [split $lines \n]

or 
        regsub -all -lineanchor {^\s*?\n} $txt "" txt2
        set count [split $txt2 \n]
        set count [split $txt2 \n]
none of which, I feel, fits the task properly. The first one is too long for such a simple thing, and the second one, well, didn't come out right on the first trial. 

Python (like Haskell) also supports nested '''for'''s like in

  >>> lis1 = [1,2,3]
  >>> lis2 = ['a','b','c']
  >>> [[x,y] for x in lis1 for y in lis2]
  [[1, 'a'], [1, 'b'], [1, 'c'], [2, 'a'], [2, 'b'], [2, 'c'], [3, 'a'], [3, 'b'], [3, 'c']]

but I'll do without it for now.

Is this nice syntax can be imported to tcl? I would like it in further versions. 
I played with several forms of list comprehensions. The amount of braces stemming from tcl's syntax realy makes it difficult to come up with a nice syntax. Eventually, inspired by [bind] and [little language] I've come up with this:

  # A helper proc: perform foreach the paramters given as one list
  proc foreachlist {list body} {
    uplevel 1 foreach $list [list $body]
  }
  
  # list comprehension
  proc lisco {group {var ""}} {
    # extract params
    regexp {(.*?)(?:\sfor\s)(.*?)(?:\sif\s(.*?)$|$)} $group dummy cmd lists if
    if {$if eq ""} {set if 1}
  
    # generate foreach line and string-map expression
    set nums 0
    set mapExp {%% % }
    foreach list [uplevel 1 [list subst [uplevel 1 list $lists]]] {  ;# 8-(
      incr nums
      lappend foreachLine $nums $list
      lappend mapExp %$nums $$nums
    }
    
    # build result list
    set res {}
    foreachlist $foreachLine {
      set mapExp2 [subst $mapExp]
      set cmd2 [string map $mapExp2 $cmd]
      set rtmp [uplevel 1 $cmd2]
      set cond [string map [concat $mapExp2 [list %r $rtmp]] $if]
      if {[uplevel 1 [list expr $cond]]} {lappend res $rtmp}
    }
  
    if {$var ne ""} {uplevel 1 [list set $var $res]}
    return $res
  }

Here are a few examples using it:

        # squaring
        % lisco {expr {%1*%1} for {1 2 3 4 5}}
        1 4 9 16 25

Why waste space on named arguments? One list requires one argument - %1.

Now with two lists:
         two lists:
        set list1 {10 20 30 40 50}
        set list2 {9 20 28 40 50}
        lisco {expr {%1-%2} for $list1 $list2 if [expr %1-%2] > 0} result
        puts $result

        1 2

Another convenience 'keyword' is %r (for 'reference'), with which the previous lisco may be written like this:

        lisco {expr {%1-%2} for $list1 $list2 if %r > 0} result
        puts $result

        1 2

Better the Python!

The auto-variables, %1, %2, %r are substituted using '''string map''' so when they might contain lists, they need to be grouped. Like in the projection example:

        # projection
        % lisco {lindex "%1" 1 for {{a b c} {1 2 3} {d f g}}}
        b 2 f

Now counting the non-empty lines may be achieved this way
        lisco {list %1 for [split $txt \n] if [llength "%r"] > 0}
        or
        lisco {format "%1" for [split $txt \n] if [string trim "%r"] ne ""}  

lisco might need more polishing. It also supports only %1-%9 variables (who needs more anyway). The '''for''' and '''if'' words are used as keywords so you can't have them anywhere else in the expression (not so good if you've got a list containing one of them...), but the idea is clear: Having a nice syntax for forming lists, which will be convenient to use. Like in Python.

----
[RS] For the line counting, the [regexp]s need not be very complicated. My train of thought was: count newlines (\n), subtract double-newlines (\n\n):
 expr {[regexp -all \n $txt] - [regexp -all \n\n $txt]}
But of course this is no list comprehension, just a "string comprehension" alternative :^)


----
[Category Discussion]