Chart of proposed list functionality

Purpose: Proposal for an official extension of additional list operations. The proposal is based on the Chart of existing list functionality and the wishlist in Additional list functions.

Author
Andreas Kupries.

Please note: This proposal discusses only the functionality to include in the extension, but neither command names nor implementation issues (like C vs. Tcl, etc.).


Handling lists as sets: Moved to the Chart of proposed set functionality.

Peter De Rijk (2003) Several of the commands moved to sets here, present another functionality on lists because of the implementation of sets as hashes: The order of elements in a set will not necessarily be retained. Things like removing a set of elements from a list, or checking for presence of an element while retaining the list order are useful.


Lists as queues and stacks: Moved to the Chart of proposed queue functionality and Chart of proposed stack functionality.

  • A function to remove one or more items from the end of a list.
  • A function to remove one or more items from the front of a list.
  • A function to append one or more items at the end of a list. --- This function already exists in the kernel, lappend.
  • A function to prepend one or more items before the front of a list.

Other generic accessor and manipulator functionality

  • Check for an empty list.
  • A function to remove an item by position, in place (lreplace in the kernel does that, but not in place!)
  • A function to replace an item by another item, in place. (lreplace, in the kernel, can do that too, but not in place)
   Remark:
      TclX combines both of the above in one function, as does the
      kernel. To discuss.
  • A function to remove items from a list, by name!
  • A function to append new items to a list, as long as they are not already in the list.
  • A function to append new items to a list, but only the non-empty ones.
  • A function to append complete lists to a list, but not as single items. Their items become items of the modified list too (See 'lvarcat').
   The last three functions could be combined to form a sort of
   generalized lappend, with options controlling the behaviour.
  • A function to add an item at an arbitrary place in the list, in place. Sort of generalized stack/queue insertion functionality.
  • Lispish access to head item and tail of a list (car, cdr), and the complements, last item and front of a list. Four functions.
  • Fast assignment of values in a list to different variables.
  • Removal of duplicate items from a list. Sort of list2set conversion.
  • Removal of empty items from a list.
  • A function to reverse the order of the items in a list.
  • A generalized lindex allowing the selection of more than one element, more than once, exclusion of elements, etc.
  • Computation of the indices of items in a list, by name. Or pattern ?
   Uses: Scatter/gather operations, permutations.
  • A generic function for application of a command to all elements in a list.
  • A generic function for application of a command to all elements in a list, with reduction to a single result.
   Both of the above can be used as the base of several other 
   nice functionalities (see below).  And they might be usable 
   for the implementation of some of the functionality listed above too.
  • A function applying a regsub to all elements of a list and returning the results, again as a list.

Wouldn't a 'map' type function equate this? -FW

  • A function treating a list as list of lists (a matrix) and returning a specified column of that matrix (items of the outer list are rows). This is a projection operation.
  • A function computing the length of the longest item in a list, the items are treated as strings.
  • List universal/existential quantifiers: An expression has to hold for all items or item sequences in a list (universal), or for at least one item/item sequence (existential).

Searching in lists.

  • A function extending the known kernel command lsearch to return all matching elements from a list. Allow negation, i.e. returning all non-matching elements ?
  • Extend the above to allow multiple patterns.

Special functions, not fitting anywhere else

  • A function to merge two or more lists into one, by interleaving the contents. Now to have lazy lists too :-).
  • A function to split a single list into two or more. The reverse of the last function.
  • A function to set several places in a list to an item, controlled by a list of indices. (Basically the second function in 3., extended to work for multiple items).
  • A function to turn a flat list into a nested list by creating a sublist of every n elements where n is known as the stride.
  • An option which can be added (in a consistent manner) to list functions that return a flat list to post process the flat list by converting in into a nested list using a stride value. Two candidates for this new option are 'array get' and 'split'.
  • See Counting Elements in a List.

List Manipulation Performance

It goes without saying that we would like the list manipulation routines to be fast and efficient. Manipulation routines (sometimes called mutators) like insert, replace, remove (described above) change the state of objects. Unfortunately, the core list routines almost always require the list to be passed by value. For example, to simply replace a single element in a long list, you have to copy the list like this:

     set myList [lreplace $myList 20 20 "new Element"]

Tcl's object model is smart enough that it doesn't copy all elements in the list, but it does have to make a new list object of pointers to the list elements. If the list is short (100 elements) then this isn't a big deal. If your list is long (1000+) then list manipulation will be slow.

One way to avoid extraneous list copies is to offer additional list functions which can manipulate the list contents "in place," without the extra overhead of list copies. It is necessary to refer to the lists by name instead of by value, just as the core command lappend does. As an example (although we're not specifying syntax here!), we could replace a single element in a long list with a command like

     replace-in-place myList 20 20 "new Element"

Donal Fellows suggests a K() operator on the Tcl Performance page which can help with pure-Tcl implementations of these "in place" operations.

RWT

DKF: As an aside, there are four classic combinators from theoretical CS: o, I, K and S.

(f o g) x
The function composition combinator. Equivalent to doing f(g x)
I x
The identity combinator. Equivalent to x
K x y
The constant function combinator. Equivalent to x (i.e. you use K to build constant functions.)
S x y z
Generalised function application combinator: Equivalent to doing x z(y z)

Note that SKK is equivalent to I. There is a theorem that states that the calculus of the combinators S and K is equivalent to Lambda Calculus, and is therefore Turing powerful.

End of non-Tcl diversion!


-- AK


escargo 16 Marc 2003 - In my data structures classes from ages gone by, we sometimes discussed self-organizing data structures, especially linear lists that had the property of dynamic reordering based on access. There were two often-used heuristics applied:

  1. When an element in a list was accessed, it was moved to the front of the list.
  2. When an element in a list was accessed, it was moved forward one position in the list.

These are specific instances of moving an element either k positions from the front or k positions forward of its current position.

When using linear searches with bursty access patterns, these heuristics can speed up access.


Michael Schlenker 17 March 2003 Basically you're talking about Result Caching.

AK: Not quite. A list of this type can actually be used when managing caches (LRU, i.e. Least Recently Used) to determine which part of the cache to overwrite if the cache is full and something is added. Namely the elements at the end of the list as they are used least.

RS 2004-11-17: See A size-bounded cache for an implementation of that - and here's the LRU functionality for a list alone, in case someone needs it (e.g. for a list of the n files most recently opened):

 proc lru {listVar maxLength item} {
    upvar 1 $listVar list
    if {![info exists list]} {set list {}}
    set pos  [lsearch $list $item]
    set list [lreplace $list $pos $pos]
    lappend list $item
    if {[llength $list] > $maxLength} {set list [lrange $list 1 end]}
    set list
 }

LV How does the above proposal match against the current tcllib extended list command (struct::list) ? Is anyone considering adding more of the above functionality into tcllib?


Wouldn't the following procedure fit into tcllib?

 proc ltuple {multilist {tuplelen 2}} {
    # init a list res (result)
    for {set i 0;set res ""} {$i<$tuplelen} {incr i} {lappend res ""}
    if {[llength $multilist]%$tuplelen!=0} {
        error "incorrect tuple length, or input mismatch"
    }
    for {set i 0;set j 0;set l [llength $multilist]} {$i<$l} {incr i;incr j;set j [expr {$j%$tuplelen}]} {
        lset res $j [concat [lindex $res $j] [lindex $multilist $i]]
    }
    return $res
 }
 # Example :
 array set table {id 2 name MAIN desc "Main process"}
 foreach {keys values} [ltuple [array get table]]
 # $keys={id name desc} and $values={2 MAIN {Main process}}