[[[RMS] faults Tcl for its linked-lists-lessness. Explain what linked lists are, who RMS is, and Tcl perspective--mostly that Tcl doesn't need 'em.]] [Arjen Markus] I do not know who RMS is, but "woe , ye of little imagination", [Tcl] lists can be used as linked lists - if you regard a linked list as some abstract data structure (apart from the implementation!). I regard his/her statement as an example that most people can not distinguish implementation and concept! [AMG] Don't mean to interrupt, but this reminds me of [http://www.gnu.org/fun/jokes/luser-vs-rms.html]. [RS] RMS is Richard M Stallman, a famous programmer in the [GNU] scene, but not enthusiastic about Tcl. (See [Who says Tcl sucks...]) I also have problems understanding the need for linked lists - Tcl's are contiguous in memory, so the benefit of links, i.e. iterating over it, is easily reached with [foreach]. OTOH, recursing over lists involves copying in Tcl, so real linked lists like [LISP]'s CAR/CDR also have their advantages. [Mike Tuxford] notes that no one mentioned that RMS is also the author of [emacs], besides being in the news a lot defending the OSS model. [CL] mentions that, true as this is, the 'O' in OSS is anathema to RMS, last time I checked. [LV] I'd say the issue is likely this - people used to algorithmic solutions requiring specific data structures, such as structures, linked lists, etc. find it difficult to envision how to efficiently implement such solutions in Tcl with only flat lists and hash tables. [davidw] the real problem is not linked lists in and of themselves, but the lack of references. Linked lists aren't really all that necessary in Tcl for the simple case, but references allow you to easily construct more complex data structures. [Arjen Markus] Can this not be achieved in a very similar way as in [Scheme] (the LISP dialect I am gradually learning because of the [SICP] book)? For instance: set complex_data_struct {A {B C {D E {F G}}} Such a structure (lists within lists) allows one to define recursive lists as well: set recursive_list {A {B {C {D {E {F {G}}}}}} [davidw] - maybe it's visually similar, but what's going on underneath is not the same. Have a look at this scheme code, and try and accomplish the same thing in Tcl: > (define a (list 1 2)) > (define b (list 3 4)) > (define l1 (list a b)) > l1 ((1 2) (3 4)) > (set-car! b 30) > l1 ((1 2) (30 4)) As far as I know, it's not possible. Everything gets turned into a string at some point, so references don't get preserved. [Stephen Trier] - back in the dark ages, I wrote LISP-style linked list routines in [BASIC] that used arrays. The internal pointers were array indexes. One could do the same in Tcl with [lset] or an array. ---- [Arjen Markus] It was a bit of a challenge, but here it is - with Tcl lists! # # Simple data references # set ::referenced_data {} proc dataref { data } { set idx [lsearch $::referenced_data $data ] if { $idx < 0 } { set idx [llength $::referenced_data] lappend ::referenced_data $data } return $idx } proc getdata { ref } { lindex $::referenced_data $ref } proc replacedata { ref new } { set ::referenced_data [lreplace $::referenced_data $ref $ref $new] } proc reflist { a b } { dataref [list "REF" $a $b] } proc set-car! { aref b } { set adata [getdata $aref] set refcdr [lindex $adata 2] set newdata [list "REF" $b $refcdr] replacedata $aref $newdata } proc print-reflist { aref } { set print_data {} set adata [getdata $aref] if { [lindex $adata 0] == "REF" } { foreach {dummy car cdr} $adata { append print_data "( " [print-reflist $car] " " \ [print-reflist $cdr] " )" } } else { append print_data [getdata $aref] } return $print_data } set ref1 [dataref "A"] set ref2 [dataref "B"] set ref3 [dataref "C"] set ref4 [dataref "D"] set alist [reflist $ref1 $ref2 ] set blist [reflist $ref3 $ref4 ] set clist [reflist $alist $blist] puts $alist puts $blist puts $clist puts [print-reflist $alist] puts [print-reflist $blist] puts [print-reflist $clist] set-car! $alist $ref3 puts $clist puts [print-reflist $clist] Note that the whole trick is that the data in the list are either tagged and hence identifiable as a reference (or better a pair of references) or not, in which case they are ordinary data. References are regarded as ordinary data and stored in the global list that holds all information. It can undoubtedly be done more elegantly, but that is not the point: I have proven (I hope) the concept of referenced data in pure Tcl. [davidw] - now add garbage collection to it. ([SS]: see [TCL references in TCL] for a pure TCL implementation of references and garbage collection) [Arjen Markus] You are not satisfied quickly, are you? :-) Well, the whole model must be elaborated a bit, as now it will work only for examples structured in the same way as the original - pairs of pairs of elements and the like. But all references are stored as data, so when a reference variable gets set or unset, you may register this event and do a bit of cleaning up. [davidw] - no, because I know I'm right;-) Basically what you are doing, in a clever way, is building your own memory system on top of everything else, which is neat, but the point of a higher-level language is that memory management gets done for you. You shouldn't have to build a whole new system. I think that [feather] will resolve some of these problems. [Arjen Markus] Yes, I realise that it would be a lot of work, I had thought of a quick implementation like: proc GC { } { if { [day_of_week] != "wednesday" } { puts "No garbage collected today!" } else { puts "We are out on an extended lunchbreak" } } but that is just a feeble attempt to a joke. Still, if [feather] might solve that, why not try and revive that? I think I heard from Paul Duffin that it never came past a bunch of ideas. [Salvatore Sanfilippo] - I like TCL for many reasons, but I must admit that the fact I can't use references to create complex data structures like a graph is for me the biggest TCL problem... Of course it is already possible to create non-trivial data structures with TCL, but the lacks of references makes this structures not good for almost all the tasks where complex data structures are really useful. Most real-world problems that fits well in TCL does not require references, also the fact that TCL implements dictionaries, that are a so useful data structure that many problems can be solved using a dictionary without a too big time complexity, makes the lacks of references not so evident. But it exists... and can do the difference between to solve a problem in O(n), or O(n^2). Some option to implement this: (warning, bad english, maybe NOT well exposed concepts). 1) opaque implementation using structures build of slots and the dual-rappresentation of objects. I feel strange there isn't a major effort to stop this problem. From what I can see, one of the possible ways to fix this is to use the (not so) new feature of the TCL core to have a dual rappresentation of objects, this way references can be valid until the interpreter is sure they are valid references, so any attempt to convert it to strings will make they just strings (that will no longer work with the command able to get an object from a reference). The references can be implemented so that they will work with a new kind of data structure that is able to contain slots with name. Every slot can contain a reference, or any other kind of object. This structures build of slot with names containing references to other structures are enough to build all the kind of complex data structures, but at the same time they don't mess with all the rest of TCL. For sure a solution where references are 'free' like any other object, able to be converted to strings and then back to references, seems much more 'interesting' and better designed, but the way TCL works makes me quite sure that it's better to make this subsystem something like a new layer of abstraction that is able to live with the rest, rather then a new fully-featured data type. [DKF]: This is indeed an obvious way to do it. This method splits into two subkinds: '''named''' and '''unnamed''' handles. With a named handle, you have a global/per-interp hashtable that maps the string representation of the object to the pointer representation, and deletion must always be explicit (there's typically also an introspective mechanism for listing the contents of the hash). Tcl's channels work exactly this way, as does [itcl]. If anything is done with the handle objects themselves, it is just a cache of what could be simply discovered. As a technique, it tends to leak resources without careful (paranoid?) programming. With an unnamed handle, deletion of the underlying resource is closely coupled to deletion of the internal representation of the handle object. This cleans things up automatically, but is a bit inclined to be fragile. It's quite possible to deploy real applications that use this though: I've done that. ''[DKF]-end'' [SS]: My idea was a bit different, but I exposed it not very well... I'll retry, both the proposed implementations are assuming garbage collection, but the first makes references a ''quality'' of a new data structure, that's a [SELF]-like object composed of many ''slots with names'', where every slot can only contain a TCL object, or a reference to another of this composed objects. That's in some way to isolate the reference system from the rest of TCL, with commands able to handle references and this new structures. For example the 'struct' command may create a new object of this kind, starting from a TCL list containing the names of the slots: set foo [structure {x y z}] Other commands like set-slot, get-slot and so will assign normal TCL objects (in theory just strings) to slots, while set-ref-slot, and get-ref-slot will be able to set/get objects by reference. BTW, after some day of speculations I tend to think this way to implement references in TCL is simply not good (in favor of the second 'direct' way to do it). ''[SS]-end'' 2) the 'direct' way. Another trivial idea, that is probably the first that comes in mind thinking at the subject "tcl and references" is to directly expose the C object pointer in a TCL string, something like REF:0xff112233. The classical problem with this implementation is that a buggy program can mess with the string, so that the TCL interpreter will reference a random piece of memory. Can't be this fixed using an hash table with all the valid pointers? Of course this hash table should be in sync with the garbage collector, that will remove the addresses of no-longer referenced objects for the table. Apart from the design point of view (it is not so fun to have pointers directly exposed), what's the problem with this very simple implementation?. [DKF]: If you are maintaining such a table, you're just doing some variation on the above two methods: in general, you're free to choose any handle format you want, but letter-prefix and unique-id-suffix is common. If we consider Tcl's channels again, the numeric part of such handles is actually derived in this sort of way (on Unix, it is the file descriptor, and I don't understand the other platforms well enough.) Or am I getting the wrong end of the stick? ''[DKF]-end'' [SS]: Yep... but there is an hard problem: attackers. Using this design, the garbage collector have to scan every single TCL string searching for encoded pointers. An attacker may try to guess valid addresses, in an encoding like PTR:0xaabbccdd that's not hard, so something a bit more robust is needed, like PTR:0xaabbccdd:<64bit-random-number-generated-at-startup>. This ensure the attacker will not pass fake pointers using files, sockets, stdin, and alike. The parallel with TCL channels is very interesting, while of course channels are not garbage collected, that opens a lot of new issues. BTW, I changed my mind, and now I think the direct exposure of pointers in TCL can be a good idea while it is almost always a bad idea in other languages, for the fact that in TCL all is string, and the language is _very_ orthogonal, so simply it is hard to think that the inability to put pointers inside a TCL list is good design. That's why the ''opaque'' design no longer get me... it is even more complex to understand because of the non-orthogonality. To come back to channels's parallel, like the kernel doesn't expose its file handler in user space, but just maps an integer to its internal stuff, TCL may do the same, so instead to have PTR:0xaabbccdd we may have PTR:0, PTR:1 and so on. This way the table isn't just there to tell us if an address is valid or not, but can be used to translate some sort of ''object id'' into its address. ''[SS]-end'' This are just random ideas of implementation, for sure there are better, but my point is that TCL is so good that the TCL's comunity should try to find a solution about this problem, to make it even better. It is something not related to slowness, but directly related to the inability to use all the algorithms yon can find in a computer science book with TCL. Also I'll be glad if someone can point me to the wiki page about the implementation of references in TCL (if any). If you really read all the trash I wrote this night, thanks, and of course if you'll create some kind of group to discuss this specific problem I'll be glad to take part, for what I can help. ---- See also [Tcl and LISP] - [Deferred evaluation] which uses [trace]s instead of specific memory management. [RS] wonders in what aspects ''willset'' is weaker than the above constructions... ---- [RHS] wonders, would it be possible to define a type of weak reference at the C level that can be used at the Tcl level. For example, one could say set myList {a b {c d} e f {g {h i j} k} l} reference mySubList myList [lindex [lindex $myList 5] 0] This would create a "weak reference" to the {h i j} element of the ''myList'' variable named ''mySubList''. When someone sets ''mySubList'', it changes the value in both variables at once. set mySubList {z y x} set mySubList ;# -> {z y x} set myList ;# -> {a b {c d} e f {g {z y x} k} l} When the reference count for the head object, or any sub object above the sublist, is set to 0, then ''mySubList'' becomes a strong reference (a normal tcl variable). info variableType mySubList ;# -> weak (a fake instrospection command, so I can show what I mean) set mySubList {a new list} info variableType mySubList ;# -> strong When garbage collecting, it would mean a list object would need to do a little more work: 1. If the object is not a list, clean it up as normal 2. If the object is a list, check to see if there are any weak references to the object at the top level (the level we're looking at). If there is, make one of them a strong reference, chance all the other weak references to point to the new strong reference, and clean up the originial as normal. 3. For each child in the list, repeat the process and perform the same checks/changes I'm not sure this makes all that much sense, and I'm positive its not a complete description of how one would go about implementing the idea. However, it occurred to me while reading the page, and I thought it would be a good idea to at least put down in writing what my thoughts were. [Lars H]: This violates [the immutability of values]. Right now you can do set myList {a b {c d} e f {g {h i j} k} l} set safeCopyOfMyList $myList and the value in safeCopyOfMyList will stay the same no matter what you do with myList. Not so with your suggestion. Since the [Tcl_Obj] is shared between the two variables, a reference mySubList myList [lindex [lindex $myList 5] 0] set mySubList {z y x} would change also $safeCopyOfMyList. The consequences if a package (e.g. http) would start to make use of this technique could be disastrous, since any value handed over to that package might start mutating behind your back. The postulate that values are immutable is heavily relied upon in Tcl. See [32-bit integer overflow] for some of the chaos that can occur when this is violated (in that case because of a bug). [RHS]: I don't know that that's necessarily the case. set myList {a b {c d} e f {g {h i j} k} l} set safeCopyOfMyList $myList The refcount of the particular object is now 2 reference mySubList myList [lindex [lindex $myList 5] 0] Ok, now mySubList points to the object that is a subelement of myList, with safeCopyOfMyList also pointing at the bigger list set mySubList {z y x} As per normal copy-on-write rules, safeCopyOfMyList now has its own copy of the list. Both copies of the list have a ''refcount'' of 1, and the subelement pointed to by mySubList has a ''weakrefcount'' of 1 (all other objects involved have a ''weakrefcount'' of 0). [Lars H]: The big list has a refcount of 2, but the {g {h i j} k} list only has a refcount of 1, and the same holds for the mutable {h i j} element. When modifying mySubList, [set] has no way of knowing that the weakrefcounted object is part of a shared object. The only place you can unshare the two is at the reference command, but I don't think you can get it to make sense. The entire suggestion is based on "weak" references violating copy-on-write rules. It totally breaks down when you do set mySubList $myList string length $myList Since $myList then contains itself, conversion to string will cause infinite recursion. [RHS]: ''[[snip a bunch of stuff I had written, because it came out wrong]]'' What I envision is the following: % set myList {a b {c d} {e {f g} h} i} % reference mySubList myList [lindex $myList 3 1] % set mySubList f g % set myList {a b {c d} {e {f g} h} i} % set mySubList $myList % set mySubList {a b {c d} {e {f g} h} i} % set myList {a b {c d} {e {a b {c d} {e {f g} h} i} h} i} I'm not quite sure what the rules would be on getting it to work like this, but thats what I had in mind. The basic idea for it all is to allow someone to create ''data structures'' using lists, and be able to operate on various nodes in the data structure. For example: % set myTree { {a 1} {b 2} {d {{e 3} {f 4}}} } % reference myLeaf myTree [lindex $myTree 2 0] % proc changeMe {&node value} { upvar ${&node} node set node [list [lindex $node 0] $value] } % changeMe myLeaf 8 % set myTree { {a 1} {b 2} {d {{e 8} {f 4}}} } All much in the same way one used to be able to upvar to an array element, and then use that as an array, to create more complex data structures. ---- [Category Data Structure]