Version 4 of Tcl references in Tcl

Updated 2003-08-06 14:41:52

SS: This is an implementation of references with Garbage Collection for TCL, witten in TCL itself. A C implementation will be for sure much faster, but I feel interesting the ability to do it directly using TCL: it offers a way to test different semanthics and syntax without to mess with the TCL source code.

The current implementation just don't care about namespaces because I don't know how they work on TCL and this morning I was too intersted in references to study they. Please if you have some time and you like the basic idea of this code, add the support to the code below.

Another severe problem is that currently the garbage collector needs to be called with an appropriate command. This should be moved in the '&' and '*' commmands (see the code above) when we run out of memory. My problem is that to have an idea of the amount of bytes currently allocated, I need to call string bytelength for every object that gets modified by reference. I suspect this can be slow, involving a lot of string conversions.

So the question is, there is a command to get an idea of the amount of bytes an object is using to live? I don't mean its length after a string conversion, but the actual memory the object is using. If this isn't viable, another simple solution is to to call the GcCycle command every N calls of * and &.

Comments are welcome.

SS-end

 #!/bin/sh
 # the next line restarts using tclsh \
 exec tclsh "$0" "$@"

 # TCL's references implemented in TCL
 #
 # this implementation is not intented to be production-quality
 # but aims to be a working example of references with GC
 # implemented in pure TCL.
 #
 # COPYRIGHT AND PERMISSION NOTICE
 # 
 # Copyright (c) 2003 Salvatore Sanfilippo
 # 
 # All rights reserved.
 # 
 # Permission is hereby granted, free of charge, to any person obtaining a
 # copy of this software and associated documentation files (the
 # "Software"), to deal in the Software without restriction, including
 # without limitation the rights to use, copy, modify, merge, publish,
 # distribute, and/or sell copies of the Software, and to permit persons
 # to whom the Software is furnished to do so, provided that the above
 # copyright notice(s) and this permission notice appear in all copies of
 # the Software and that both the above copyright notice(s) and this
 # permission notice appear in supporting documentation.
 # 
 # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 # OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT
 # OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
 # HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL
 # INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING
 # FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
 # NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION
 # WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 # 
 # Except as contained in this notice, the name of a copyright holder
 # shall not be used in advertising or otherwise to promote the sale, use
 # or other dealings in this Software without prior written authorization
 # of the copyright holder.
 #
 # TODO:
 # * __ref_id may overflow with long running programs (it's just possible
 #   to use two counters instead of one).
 # * Bad pointers detection
 # * Namespace instead of globals for __ref_* vars.
 # * Namespace interaction with 'info globals' and 'locals'. How it works?
 # * GcCycle should be called automatically when we are out of memory,
 #   btw this requires to call 'string bytelength' or something like
 #   for every * and & command: I think this may slow down a lot the code
 #   because it requires a string convertion. For now I just left this
 #   up to the programmer.
 # * References introspection
 #
 # FUTURE WORK:
 # * Interface that allows to register commands called before an
 #   object gets collected.
 # * What about pointers to arrays? How to get/set they by reference?
 # * To provide a 'free' command to free objects by pointer without to
 #   wait for the GC. This makes possible to go without the GC
 #   in speed/memory critical code.
 # * When the API will be stable, implement this in C as an extension.
 # * To implement data structures like red/black trees and double linked
 #   lists using references.

 #
 # Initialization
 #
 set __ref_id 0
 set __ref_debug 1

 #
 # Low-level commands to deal with references
 #
 proc & obj {
         global __ref_id __ref_tab

         set id "TclRef:$__ref_id:"
         incr __ref_id

         set __ref_tab($id) $obj
         return $id
 }

 proc * args {
         global __ref_tab

         set argc [llength $args]
         if {$argc == 1} {
                 return $__ref_tab([lindex $args 0])
         } elseif {$argc == 2} {
                 set __ref_tab([lindex $args 0]) [lindex $args 1]
                 return [lindex $args 0]
         }
 }

 #
 # Garbage collection
 #
 proc GcCycle {} {
         global __ref_visited

         array set __ref_visited {}
         array unset __ref_visited

         # Find the roots of the graph (globals and locals of all the levels)
         set roots {}
         set levels [info level]
         set varnames [info globals]

         # Remove global variables that must not be collected
         foreach x {__ref_tab __ref_visited} {
                 set index [lsearch -exact $varnames $x]
                 if {$index != -1} {
                         set varnames [lreplace $varnames $index $index]
                 }
         }

         # Append the global's content to the root list
         # We need to take care of arrays, but we just don't
         # care about nested lists: our CG discovers references
         # by pattern matching in the whole 'roots' string.
         foreach v $varnames {
                 global $v
                 if {[array exists $v]} {
                         lappend roots [array get $v]
                 } else {
                         lappend roots [set $v]
                 }
         }

         # Do the same for locals in all the levels
         for {set j 1} {$j < $levels} {incr j} {
                 set varnames [uplevel $j info locals]
                 foreach v $varnames {
                         if {[uplevel $j array exists $v]} {
                                 lappend roots [uplevel $j array get $v]
                         } else {
                                 lappend roots [uplevel $j set $v]
                         }
                 }
         }


         # Scan the whole graph collecting all the alive references
         # (GcScan's side effect is the population of the __ref_visited array)
         GcScan $roots
         #puts [array names __ref_visited]
         GcSweep
 }

 # This is a recursive function that walks all reachable strings
 # searching for pointers. It uses a global table to be sure
 # the same reference is not entered multiple times.
 proc GcScan roots {
         global __ref_visited __ref_debug

         set refs [regexp -inline -all {TclRef:[0-9]+:} $roots]
         foreach r $refs {
                 if {[llength [array names __ref_visited $r]] == 0} {
                         if {$__ref_debug} {
                                 puts "GC visiting $r"
                         }
                         set __ref_visited($r) 1
                         GcScan [* $r]
                 }
         }
 }

 # The sweep function runs the __ref_tab array unsetting all the
 # entries not referenced inside the __ref_visited array.
 proc GcSweep {} {
         global __ref_visited __ref_tab

         foreach r [array names __ref_tab] {
                 if {[llength [array names __ref_visited $r]] == 0} {
                         array unset __ref_tab $r
                         puts "GC collecting $r"
                 }
         }
 }

 #
 # The higher level layer, references-aware alist-like structures.
 #
 # struct name slot1 slot2 slot3... slotN
 #
 # create a set of commands able to deal with the described
 # strcuture by reference. Created Commands:
 #
 # NewName
 #   creates a new structure of type 'name' and return its reference
 #
 # For every slot, the struct command creates two additional commands:
 #
 # NameSetSlot structref slotname value
 #   sets the slot of name 'slotname' to 'value', in the structure 'structref'
 #   
 # NameGetSlot structref slotname
 #   return the value fo the slot with name 'slotname', in struct 'structref'
 #
 proc struct {name args} {
         set idx 1
         foreach slot $args {
                 proc [join "$name $slot" Set] {ref val} "
                         set t \[* \$ref\]
                         set t \[lreplace \$t $idx $idx \$val\]
  • \$ref \$t
                 "
                 proc [join "$name $slot" Get] {ref} "
                         set t \[* \$ref\]
                         return \[lindex \$t $idx]
                 "
                 lappend s $slot {}
                 incr idx 2
         }
         proc New$name {} "return \[& \"$s\"\]"
 }

 #
 # Test code
 #

 struct Node Val Next

 proc LListAdd {head val} {
         set n [NewNode]
         NodeSetVal $n $val
         NodeSetNext $n $head
         return $n
 }

 proc LListPrint head {
         if {[string length $head] != 0} {
                  puts -nonewline "[NodeGetVal $head] "
                 # TCL isn't tail recursive, but it's just for fun
                 LListPrint [NodeGetNext $head]
         } else {
                 puts {}
         }
 }

 set head {}
 set head [LListAdd $head "foo"]
 set head [LListAdd $head "bar"]
 set head [LListAdd $head "ciao"]
 for {set j 0} {$j < 10} {incr j} {
         set head [LListAdd $head $j]
 }
 LListPrint $head

 set newlist {}
 set newlist [LListAdd $newlist xxx]
 NodeSetNext $newlist $head

 LListPrint $newlist
 set head {}
 set newlist {}
 GcCycle

 #
 # Test it against cyclical structures
 #

 set a [NewNode]
 set b [NewNode]
 NodeSetNext $a $b
 NodeSetNext $b $a
 NodeSetVal $a A
 NodeSetVal $b B

 # Prove it's actually a cycle
 # this will output ABABABA...

 for {set j 0} {$j < 20} {incr j} {
         puts -nonewline [NodeGetVal $a]
         set a [NodeGetNext $a]
 }
 puts {}

 # Try to collect it
 set a {}
 set b {}
 GcCycle

See also: linked lists, Tcl and LISP, complex data structures