Tcl references in Tcl

SS: This is an implementation of references with Garbage Collection for TCL, witten in TCL itself. A C implementation is under development, but it's API compatible with this one.

Changelog: 6 Feb 2004: GC bugfix, type-checked structures, callable structures support, auto-called GC.

This stuff is splitted in more files. The following is the core of references and GC:

references.tcl

 #!/bin/sh
 # the next line restarts using tclsh \
 exec tclsh "$0" "$@"
 
 # TCL's references implemented in TCL
 #
 # COPYRIGHT AND PERMISSION NOTICE
 # 
 # Copyright (c) 2003-2004 Salvatore Sanfilippo
 # 
 # All rights reserved.
 # 
 # Permission is hereby granted, free of charge, to any person obtaining a
 # copy of this software and associated documentation files (the
 # "Software"), to deal in the Software without restriction, including
 # without limitation the rights to use, copy, modify, merge, publish,
 # distribute, and/or sell copies of the Software, and to permit persons
 # to whom the Software is furnished to do so, provided that the above
 # copyright notice(s) and this permission notice appear in all copies of
 # the Software and that both the above copyright notice(s) and this
 # permission notice appear in supporting documentation.
 # 
 # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 # OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT
 # OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
 # HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL
 # INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING
 # FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
 # NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION
 # WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 # 
 # Except as contained in this notice, the name of a copyright holder
 # shall not be used in advertising or otherwise to promote the sale, use
 # or other dealings in this Software without prior written authorization
 # of the copyright holder.
 #
 # TODO:
 # * Bad pointers detection
 # * GcCycle should be called automatically.
 # * References introspection
 # * Make gcscan lazy, to avoid stack overflows with very long lined structures
 #
 # FUTURE WORK:
 # * Interface that allows to register commands called before an
 #   object gets collected.
 # * To provide a 'free' command to free objects by pointer without to
 #   wait for the GC. Traces can help in doing this.
 
 #
 # Initialization
 #
 namespace eval ref {}
 set ::ref::id 0
 set ::ref::debug 0
 set ::ref::calls 0
 set ::ref::collection-period 5000
 
 #
 # Low-level commands to deal with references
 #
 proc ::ref::ref obj {
     ::ref::collect-if-needed
     set id "<ref:$::ref::id>"
     incr ::ref::id
 
     set ::ref::tab($id) $obj
     return $id
 }
 
 proc ::ref::getbyref ref {
     ::ref::collect-if-needed
     return $::ref::tab($ref)
 }
 
 proc ::ref::setref {ref obj} {
     ::ref::collect-if-needed
     set ::ref::tab($ref) $obj
     return $ref
 }
 
 proc ::ref::isref ref {
     string match {<ref:*>} $ref
 }
 
 proc ::ref::refvar ref {
     if {[info exists ::ref::tab($ref)]} {
         return ::ref::tab($ref)
     }
     error "Can't get variable of invalid reference $ref"
 }
 
 proc ::ref::collect-if-needed {} {
     if {[incr ::ref::calls] == ${::ref::collection-period}} {
         set ::ref::calls 0
         ::ref::collect
     }
 }
 
 #
 # Garbage collection
 #
 proc ::ref::collect {} {
     global ::ref::visited
 
     array set ::ref::visited {}
     array unset ::ref::visited
 
     # Find the roots of the graph (globals and locals of all the levels)
     set roots {}
     set levels [info level]
     set varnames [::ref::getallglobals]
 
     # Remove global variables that must not be collected
     foreach x {::ref::tab ::ref::visited} {
         set index [lsearch -exact $varnames $x]
         if {$index != -1} {
             set varnames [lreplace $varnames $index $index]
         }
     }
 
     # Append the global's content to the root list
     # We need to take care of arrays, but we just don't
     # care about nested lists: our CG discovers references
     # by pattern matching in the whole 'roots' string.
     foreach v $varnames {
         if {[array exists $v]} {
             lappend roots [array get $v]
         } else {
             lappend roots [set $v]
         }
     }
 
     # Do the same for locals in all the levels
     for {set j 1} {$j < $levels} {incr j} {
         set varnames [uplevel $j info locals]
         foreach v $varnames {
             if {[uplevel $j array exists $v]} {
                 lappend roots [uplevel $j array get $v]
             } else {
                 lappend roots [uplevel $j set $v]
             }
         }
     }
 
     # Scan the whole graph collecting all the alive references
     # (GcScan's side effect is the population of the ::ref::visited array)
     ::ref::gcscan $roots
     #puts [array names ::ref::visited]
     ::ref::gcsweep
 }
 
 # Helper functions for the GC follows.
 
 # Return all the namespaces currently defined, except for the ::ref
 # namespace
 proc ::ref::getnamespaces {{nslist ::}} {
     set res {}
     foreach ns $nslist {
         if {[string equal $ns ::ref]} continue
         lappend res $ns
         set res [concat $res [getnamespaces [namespace children $ns]]]
     }
     return $res
 }
 
 # Return all the existing global variables (both in the root and in other namespaces)
 proc ::ref::getallglobals {} {
     set res {}
     set namespaces [::ref::getnamespaces]
     foreach ns $namespaces {
         set res [concat $res [info vars $ns\::*]]
     }
     return $res
 }
 
 # This is a recursive function that walks all reachable strings
 # searching for pointers. It uses a global table to be sure
 # the same reference is not entered multiple times.
 proc ::ref::gcscan roots {
     global ::ref::visited ::ref::debug
 
     set refs [regexp -inline -all {<ref:[0-9]+>} $roots]
     foreach r $refs {
         if {[llength [array names ::ref::visited $r]] == 0} {
             if {$::ref::debug} {
                 puts "GC visiting $r"
             }
             set ::ref::visited($r) 1
             ::ref::gcscan [::ref::getbyref $r]
         }
     }
 }
 
 # The sweep function runs the ::ref::tab array unsetting all the
 # entries not referenced inside the ::ref::visited array.
 proc ::ref::gcsweep {} {
     global ::ref::visited ::ref::tab
 
     foreach r [array names ::ref::tab] {
         if {[llength [array names ::ref::visited $r]] == 0} {
             array unset ::ref::tab $r
             if {$::ref::debug} {
                 puts "GC collecting $r"
             }
         }
     }
 }
 
 proc ::ref::collect-cycles {} {::ref::collect}
 
 namespace eval ::ref {namespace export *}

tclref.tcl

- this implements the higher level layer, C-like structures.

 #load references.so
 source references.tcl
 namespace import ref::*
 
 # The higher level layer, reference-aware structures
 #
 #   struct name slot1 slot2 slot3... slotN
 #
 # create a set of commands able to deal with the described
 # strcuture by reference.
 #
 # === USAGE ===
 #
 # For instance, the command:
 #
 #   struct tree height girth age leaf-shape leaf-color
 #
 # creates a command 'make-tree' that returns a reference
 # to a new tree structure. For every field in the structure
 # a couple of commands to read or set the field are created.
 #
 # For example for the age field, the command 'tree.age' can be
 # used to read the age field, while 'set-tree.age' can be
 # used to set it.
 #
 # An interactive session example:
 #
 # % source struct.tcl 
 # % 
 # % struct tree height girth age leaf-shape leaf-color
 # % set t [make-tree]
 # <ref:0>
 # % set-tree.age $t 10
 # <ref:0>
 # % tree.age $t
 # 10
 #
 # All the commands, except for the 'struct' command, return
 # the reference to the accessed/created structure.
 #
 # Linked structures, even if in cycles, are automatically
 # collected.
 #
 # Null references can be represented using an empty list {}.
 # It's possible to test for a null reference using the [isref] command.
 #
 # === DEFAULT INITIALIZATION ===
 #
 # You can define a structure in a special way so that the 'make' command
 # will return fields initialized to the given values.
 #
 # For example, to create a '2dpoint' structure with points initialized
 # to 0,0 for default, you may want to write:
 #
 #   struct 2dpoint {x 0} {y 0}
 #
 # Just, instead to pass the field name, pass a list with the
 # name of the field and the default value for that field.
 #
 # Another (complementary) way to initialize structure's fields is
 # to pass arguments to the 'make' function of the structure.
 #
 # The following will create a point with x=0 and y=3
 #
 #   set point [make-2dpoint y 3]
 #
 # We specified with the make-2dpoint command to create a point with
 # y set to 3, but x is set to 0 for default, so the point will be at (0,3).
 #
 # You can pass all the field/value pairs you like to the 'make' command
 # of a structure.
 #
 # Note that if you don't specify any initialization for a structure,
 # for default all the fields are initialized to the empty string {}.
 #
 # === ADVANCED STUFF AND COPYING ===
 #
 # Structures are internally represented as Tcl lists (go figure... ;).
 # But all the functions returns, and accepts as argument references
 # to this lists. That's why you should be aware that like in C structures
 # here you are working by references.
 #
 # If you want to duplicate a linked data structure, that's to "deep copy"
 # a structure, you need to use the 'deepcopy' command. deepcopy is a
 # recursive command that replaces every reference in a structure with a
 # reference to a copy of the referenced object.
 #
 # If you want to duplicate just a structure itself, without to recursively
 # duplicate all the referenced objects, use the 'copy' command instead.
 #
 # If you have some experience with Python this should be quite clear
 # as there are similar to 'copy' and 'deepcopy' after an "import copy".
 #
 # Note that the 'deepcopy' command is cycles resistant, so it's safe
 # to call it against circular data structures.
 
 namespace eval ::struct {}
 rename unknown ::struct::unknown
 
 proc ::struct::call {self args} {
     error "Redefine ::struct::call to make structures callable objects"
 }
 
 proc unknown args {
     if {[::ref::isref [lindex $args 0]]} {
         return [uplevel ::struct::call $args]
     }
     uplevel ::struct::unknown $args
 }
 
 proc struct {name args} {
         set idx 2
         set template $name
         set typeassert [format {
             if {![string equal [lindex $t 0] %s]} {
                 error "struct '%s' command used against struct '[lindex $t 0]'"
             }
         } $name $name]
         foreach slot $args {
                 set initializer {}
                 if {[llength $slot] > 1} {
                     set initializer [lindex $slot 1]
                     set slot [lindex $slot 0]
                 }
                 proc set-$name.$slot {ref val} [format {
                         set t [getbyref $ref]
                         %s
                         set t [lreplace $t %s %s $val]
                         setref $ref $t
                 } $typeassert $idx $idx]
 
                 proc $name.$slot {ref} [format {
                         set t [getbyref $ref]
                         %s
                         return [lindex $t %s]
                 } $typeassert $idx]
 
                 lappend template $slot $initializer
                 incr idx 2
         }
         proc make-$name args [format {
             set s [ref {%s}]
             foreach {slot val} $args {
                 set-%s.$slot $s $val
             }
             return $s
         } $template $name]
 }
 
 proc struct-fields ref {
     set fields {}
     foreach {f _} [lrange [getbyref $ref] 1 end] {
         lappend fields $f
     }
     return $fields
 }
 
 proc struct-type ref {
     lindex [getbyref $ref] 0
 }
 
 proc copy ref {
     ref [getbyref $ref]
 }
 
 proc deepcopy ref {
     return [__deepcopy $ref 0]
 }
 
 # The extra complexity here is to avoid to convert references
 # to some other kind of object, and of course to avoid infinite-loops
 # with cyclical structures.
 proc __deepcopy {ref level} {
     if {$level} {
         upvar refarray refarray
     }
     incr level
     if {[info exists refarray($ref)]} {
         return $ref
     }
     set refarray($ref) {}
     if {[isref $ref]} {
         set x [ref [__deepcopy [getbyref $ref] $level]]
         set refarray($x) {}
         return $x
     } else {
         set res {}
         foreach t $ref {
             if {[isref $t]} {
                 set x [__deepcopy $t $level]
                 lappend res $x
                 set refarray($x) {}
             } else {
                 lappend res $t
             }
         }
         return $res
     }
 }

example.tcl

- example usage

 ################################################################################
 #
 # This are examples of simple data structures like
 # linked lists and binary search trees build with tclref.
 #
 # Note that programming is exactly the same as with languages
 # that support references in a native way.
 
 source tclref.tcl
 
 ################################################################################
 # Binary Search Trees
 
 struct tnode parent left right key
 struct tree root
 
 proc bst-insert {tree z} {
     set y {}
     set x [tree.root $tree]
     while {[isref $x]} {
         set y $x
         if {[tnode.key $z] < [tnode.key $x]} {
             set x [tnode.left $x]
         } else {
             set x [tnode.right $x]
         }
     }
     set-tnode.parent $z $y
     if {![isref $y]} {
         set-tree.root $tree $z
     } else {
         if {[tnode.key $z] < [tnode.key $y]} {
             set-tnode.left $y $z
         } else {
             set-tnode.right $y $z
         }
     }
 }
 
 proc bts-inorder-walk {tree} {
     bts-inorder-walk-tnode [tree.root $tree]
     puts {}
 }
 
 proc bts-inorder-walk-tnode {root} {
     if {[isref $root]} {
         bts-inorder-walk-tnode [tnode.left $root]
         puts -nonewline "[tnode.key $root] "
         bts-inorder-walk-tnode [tnode.right $root]
     }
 }
 
 # Example usage. Add 100 random numbers inside the search tree
 set tree [make-tree]
 for {set i 0} {$i < 10} {incr i} {
     lappend list [expr {int(rand()*100)}]
 }
 foreach t $list {
     set newnode [set-tnode.key [make-tnode] $t]
     bst-insert $tree $newnode
 }
 
 # Now print all the keys in-order
 puts -nonewline "Tree inorder-walk output: "
 bts-inorder-walk $tree
 
 ################################################################################
 # Linked list example
 
 struct llnode val next
 struct llist head tail
 
 proc llist-node-add {head val} {
         set n [make-llnode]
         set-llnode.val $n $val
         set-llnode.next $n $head
         return $n
 }
 
 proc llist-node-print head {
         if {[isref $head]} {
                  puts -nonewline "[llnode.val $head] "
                 # TCL isn't tail recursive, but it's just for fun
                 llist-node-print [llnode.next $head]
         } else {
                 puts {}
         }
 }
 
 # Now two simple wrappers to abstract a linked list from
 # it's nodes. Just a structure that holds the head of the list.
 
 # Add elements into the list. Returns a pointer to the inserted node.
 proc llist-add {listref args} {
     foreach v $args {
         set-llist.head $listref [llist-node-add [llist.head $listref] $v]
     }
     if {![isref [llist.tail $listref]]} {
         set-llist.tail $listref [llist.head $listref]
     }
     return $listref
 }
 
 # Print the list content
 proc llist-print listref {
     llist-node-print [llist.head $listref]
 }
 
 # Link the list's tail with the head, making it circular.
 proc llist-make-circular listref {
     set head [llist.head $listref]
     set tail [llist.tail $listref]
     set-llnode.next $tail $head
 }
 
 # Example of linked list usage:
 # create a linked list:
 
 set mylist [make-llist]
 for {set i 0} {$i < 10} {incr i} {
     llist-add $mylist $i
 }
 
 # Print the list
 puts -nonewline "Linked list: "
 llist-print $mylist
 
 # Make it circular and walk the ring of objects 20 times
 puts -nonewline "Circular linked list: "
 llist-make-circular $mylist
 
 # Show it's actually a cycle
 set node [llist.head $mylist]
 for {set i 0} {$i < 20} {incr i} {
     if {![isref $node]} exit
     puts -nonewline "[llnode.val $node] "
     set node [llnode.next $node]
 }
 puts {}
 
 # Collect cyclical data structures now. (The collection is done
 # automatically when needed.)
 
 if 0 {
     set mylist {}
     collect-cycles
 }
 
 set node {}
 set mylist {}
 
 # Now collect cycles. This call is not really useful
 # as the collection is automatically performed when
 # needed.
 collect-cycles

I wonder if this could be used to provide garbage collection for some of the OO packages. To my knowledge, they do not GC objects that have been created, which is an area where Tcl definitely lags behind Python. This package demonstrates one way of adding the missing piece. - davidw

See also: linked lists, Tcl and LISP, complex data structures