Version 29 of linked lists

Updated 2003-08-03 19:39:03

[RMS faults Tcl for its linked-lists-lessness. Explain what linked lists are, who RMS is, and Tcl perspective--mostly that Tcl doesn't need 'em.]

Arjen Markus I do not know who RMS is, but "woe , ye of little imagination", Tcl lists can be used as linked lists - if you regard a linked list as some abstract data structure (apart from the implementation!). I regard his/her statement as an example that most people can not distinguish implementation and concept!

RS RMS is Richard M Stallman, a famous programmer in the GNU scene, but not enthusiastic about Tcl. (See Who says Tcl sucks...) I also have problems understanding the need for linked lists - Tcl's are contiguous in memory, so the benefit of links, i.e. iterating over it, is easily reached with foreach. OTOH, recursing over lists involves copying in Tcl, so real linked lists like LISP's CAR/CDR also have their advantages.

LV I'd say the issue is likely this - people used to algorithmic solutions requiring specific data structures, such as structures, linked lists, etc. find it difficult to envision how to efficiently implement such solutions in Tcl with only flat lists and hash tables.

davidw the real problem is not linked lists in and of themselves, but the lack of references. Linked lists aren't really all that necessary in Tcl for the simple case, but references allow you to easily construct more complex data structures.

Arjen Markus Can this not be achieved in a very similar way as in Scheme (the LISP dialect I am gradually learning because of the SICP book)? For instance:

   set complex_data_struct {A {B C {D E {F G}}}

Such a structure (lists within lists) allows one to define recursive lists as well:

   set recursive_list {A {B {C {D {E {F {G}}}}}}

davidw - maybe it's visually similar, but what's going on underneath is not the same. Have a look at this scheme code, and try and accomplish the same thing in Tcl:

    > (define a (list 1 2))
    > (define b (list 3 4))
    > (define l1 (list a b))
    > l1
    ((1 2) (3 4))
    > (set-car! b 30)
    > l1
    ((1 2) (30 4))

As far as I know, it's not possible. Everything gets turned into a string at some point, so references don't get preserved.

Stephen Trier - back in the dark ages, I wrote LISP-style linked list routines in BASIC that used arrays. The internal pointers were array indexes. One could do the same in Tcl with lset or an array.


Arjen Markus It was a bit of a challenge, but here it is - with Tcl lists!

   #
   # Simple data references
   #
   set ::referenced_data {}

   proc dataref { data } {
      set idx [lsearch $::referenced_data $data ]
      if { $idx < 0 } {
         set idx [llength $::referenced_data]
         lappend ::referenced_data $data
      }
      return $idx
   }

   proc getdata { ref } {
      lindex $::referenced_data $ref
   }

   proc replacedata { ref new } {
      set ::referenced_data [lreplace $::referenced_data $ref $ref $new]
   }

   proc reflist { a b } {
      dataref [list "REF" $a $b]
   }

   proc set-car! { aref b } {
      set adata   [getdata $aref]
      set refcdr  [lindex $adata 2]
      set newdata [list "REF" $b $refcdr]
      replacedata $aref $newdata
   }

   proc print-reflist { aref } {
      set print_data {}
      set adata [getdata $aref]
      if { [lindex $adata 0] == "REF" } {
         foreach {dummy car cdr} $adata {
            append print_data "( " [print-reflist $car] " " \
               [print-reflist $cdr] " )"
         }
      } else {
         append print_data [getdata $aref]
      }
      return $print_data
   }

   set ref1 [dataref "A"]
   set ref2 [dataref "B"]
   set ref3 [dataref "C"]
   set ref4 [dataref "D"]

   set alist [reflist $ref1  $ref2 ]
   set blist [reflist $ref3  $ref4 ]
   set clist [reflist $alist $blist]

   puts $alist
   puts $blist
   puts $clist
   puts [print-reflist $alist]
   puts [print-reflist $blist]
   puts [print-reflist $clist]

   set-car! $alist $ref3
   puts $clist
   puts [print-reflist $clist]

Note that the whole trick is that the data in the list are either tagged and hence identifiable as a reference (or better a pair of references) or not, in which case they are ordinary data. References are regarded as ordinary data and stored in the global list that holds all information.

It can undoubtedly be done more elegantly, but that is not the point: I have proven (I hope) the concept of referenced data in pure Tcl.

davidw - now add garbage collection to it.

Arjen Markus You are not satisfied quickly, are you? :-) Well, the whole model must be elaborated a bit, as now it will work only for examples structured in the same way as the original - pairs of pairs of elements and the like. But all references are stored as data, so when a reference variable gets set or unset, you may register this event and do a bit of cleaning up.

davidw - no, because I know I'm right;-) Basically what you are doing, in a clever way, is building your own memory system on top of everything else, which is neat, but the point of a higher-level language is that memory management gets done for you. You shouldn't have to build a whole new system. I think that feather will resolve some of these problems.

Arjen Markus Yes, I realise that it would be a lot of work, I had thought of a quick implementation like:

    proc GC { } {
       if { [day_of_week] != "wednesday" } {
          puts "No garbage collected today!"
       } else {
          puts "We are out on an extended lunchbreak"
       }
    }

but that is just a feeble attempt to a joke. Still, if feather might solve that, why not try and revive that? I think I heard from Paul Duffin that it never came past a bunch of ideas.

Salvatore Sanfilippo - I like TCL for many reasons, but I must admit that the fact I can't use references to create complex data structures like a graph is for me the biggest TCL problem... Of course it is already possible to create non-trivial data structures with TCL, but the lacks of references makes this structures not good for almost all the tasks where complex data structures are really useful. Most real-world problems that fits well in TCL does not require references, also the fact that TCL implements dictionaries, that are a so useful data structure that many problems can be solved using a dictionary without a too big time complexity, makes the lacks of references not so evident. But it exists... and can do the difference between to solve a problem in O(n), or O(n^2).

Some option to implement this: (warning, bad english, maybe NOT well exposed concepts).

1) opaque implementation using structures build of slots and the dual-rappresentation of objects.

I feel strange there isn't a major effort to stop this problem. From what I can see, one of the possible ways to fix this is to use the (not so) new feature of the TCL core to have a dual rappresentation of objects, this way references can be valid until the interpreter is sure they are valid references, so any attempt to convert it to strings will make they just strings (that will no longer work with the command able to get an object from a reference). The references can be implemented so that they will work with a new kind of data structure that is able to contain slots with name. Every slot can contain a reference, or any other kind of object. This structures build of slot with names containing references to other structures are enough to build all the kind of complex data structures, but at the same time they don't mess with all the rest of TCL. For sure a solution where references are 'free' like any other object, able to be converted to strings and then back to references, seems much more 'interesting' and better designed, but the way TCL works makes me quite sure that it's better to make this subsystem something like a new layer of abstraction that is able to live with the rest, rather then a new fully-featured data type.

DKF: This is indeed an obvious way to do it. This method splits into two subkinds: named and unnamed handles.

With a named handle, you have a global/per-interp hashtable that maps the string representation of the object to the pointer representation, and deletion must always be explicit (there's typically also an introspective mechanism for listing the contents of the hash). Tcl's channels work exactly this way, as does itcl. If anything is done with the handle objects themselves, it is just a cache of what could be simply discovered. As a technique, it tends to leak resources without careful (paranoid?) programming.

With an unnamed handle, deletion of the underlying resource is closely coupled to deletion of the internal representation of the handle object. This cleans things up automatically, but is a bit inclined to be fragile. It's quite possible to deploy real applications that use this though: I've done that.

DKF-end

2) the 'direct' way.

Another trivial idea, that is probably the first that comes in mind thinking at the subject "tcl and references" is to directly expose the C object pointer in a TCL string, something like REF:0xff112233. The classical problem with this implementation is that a buggy program can mess with the string, so that the TCL interpreter will reference a random piece of memory. Can't be this fixed using an hash table with all the valid pointers? Of course this hash table should be in sync with the garbage collector, that will remove the addresses of no-longer referenced objects for the table. Apart from the design point of view (it is not so fun to have pointers directly exposed), what's the problem with this very simple implementation?.

DKF: If you are maintaining such a table, you're just doing some variation on the above two methods: in general, you're free to choose any handle format you want, but letter-prefix and unique-id-suffix is common. If we consider Tcl's channels again, the numeric part of such handles is actually derived in this sort of way (on Unix, it is the file descriptor, and I don't understand the other platforms well enough.)

Or am I getting the wrong end of the stick?

DKF-end

This are just random ideas of implementation, for sure there are better, but my point is that TCL is so good that the TCL's comunity should try to find a solution about this problem, to make it even better. It is something not related to slowness, but directly related to the inability to use all the algorithms yon can find in a computer science book with TCL.

Also I'll be glad if someone can point me to the wiki page about the implementation of references in TCL (if any). If you really read all the trash I wrote this night, thanks, and of course if you'll create some kind of group to discuss this specific problem I'll be glad to take part, for what I can help.


See also Tcl and LISP - Deferred evaluation which uses traces instead of specific memory management. RS wonders in what aspects willset is weaker than the above constructions...