Version 12 of Jim References

Updated 2005-03-18 21:52:57 by jcw

SS 18Mar2005:

This page is an introduction to the usage of the Jim references system.

Rationale

Why did Jim add references to the language? These are the main reasons, in decreasing order of importance:

  • Tcl already uses references. OOP systems often return names of objects, many libraries return handles. They are an idiomatic part of the language, to have support for references directly inside the language is useful.
  • Putting a reference system inside the language makes possible to implement Garbage Collection in a general way.
  • lambda is not very useful if it is not garbage collected.
  • Closures (and Jim closures) are almost useless without garbage collection.
  • The ability to build linked data structures sometimes can be very important in order to model real problems.

Design

Once I decided to add references to Jim, the question was in what form. My goal was to get a very simple semantic, don't violate in any way the Tcl semantic (i.e. be able to collect references mixed in strings after interpolation, shimmering, ...), allow for fast enough garbage collection, don't require extensions that need to take objects in private data structures to export objects in some way (and at the same time, making fragile references not an issue).

I'll not enter into implementation details, but in order to do so Jim is able to reach every object that is currently alive inside the script, even if this object is not reachable starting by the root nodes of the graph of the objects in a program. I.e. Jim is able to reach objects that are not reachable starting from globals and local variables. This feature is exposed at script level via the [debug objects] command.

How they work

References are just names with an associated string. Given the reference it's possible to obtain the associated string, or to set a new one in place of the old.

In order to create a new reference, the ref command is used. It has the following signature:

 ref string tag ?finalizer?

The string is the initial value associated to the reference to create. Can be any kind of string (or object, if you like more the implementation point of view). The tag will be explained in a moment. The finalizer is the name of a procedure that will be called just after the reference was destroyed with two arguments: the name of the reference destroyed, and its content.

An example of reference creation:

 . set r [ref "My little string" test]
 <reference.<test___>.00000000000000000001>

What is returned is indeed the name of the reference. References are all the same length, and can't contain spaces (this way it's possible to skip many kind of objects in the GC scan stage). As you can see the tag argument "test" appears in the reference name. And actually this is the only use for the tag command, it is a little description of the reference content. When debugging programs or in interactive testing it's handy to have a clue about what a reference may contain, or by its creator, just looking at his name. There is no semantical meaning in the tag, it's just a programmer help. The tag can't be longer than 7 characters in order to make references memory usage and creation/destroy time very little.

Now, the reference containing "My little string" is contained in the variable 'r'. Using getref it's possible to obtain the string a reference contains:

 . set r [ref "My little string" test]
 <reference.<test___>.00000000000000000002>
 . getref $r
 My little string

setref will instead set a new string for the specified reference:

 . setref $r "New String"
 New String
 . getref $r
 New String

That's it! Now... why is this special? You can already do it with a global array in Tcl. The unique feature of Jim references is indeed that they are garbage collected. Once a given reference is no longer found in any object, it is automatically destroyed and the finalizer, if there is one, called.

Garbage Collection and finalizers

Jim takes two counters:

  • The number of seconds elapsed since the last collection.
  • The number of references created since the last collection.

Every time a new reference is created, if at least one of these two counters is greater than a given value, a garbage collection cycle is performed. The user may also force garbage collection if working with huge strings, or just for testing, using the collect command.

Using the collect command we can try the effect of the finalizer.

First of all, a reference is created, setting as finalzer procedure myfinalizer:

 . set r [ref foobar test myfinalizer]
 <reference.<test___>.00000000000000000004>

We need to create this procedure of course. It just a one liner:

 . proc myfinalizer {x y} {puts "Finalizer called with $x,$y"}

Now, let's call collect to check what happens:

 . collect
 0

It returns zero and the finalizer is not called. Because there is nothing to collect, there is only one reference and it is referenced by the variable r. The collect command returns the number of references collected. So let's lose the reference we have by setting r to the null string, then try again to call collect:

 . set r {}
 . collect
 Finalizer called with <reference.<test___>.00000000000,foobar
 1

This time the reference was destroyed, and the finalizer called with its name, and the content, as arguments.

The implementation of lambda

Given a system like this, you can create a garbage collected lambda in Jim itself with a few lines of code:

 proc lambda {arglist args} {
     set name [ref {} function lambdaFinalizer]
     uplevel 1 [list proc $name $arglist {expand}$args]
     return $name
 }
 proc lambdaFinalizer {name val} {
     rename $name {}
 }

Indeed this is not an example! This is the real implementation of lambda in Jim. This is how it works: it creates a reference with an empty string as content, using as tag "function", and LambdaFinalizer as finalizer procedure. Then a normal Tcl procedure having with name the reference is created, and its name returned. Done!

Once every reference to the procedure is lost, the finalizer will be called, and it will just use rename to delete the procedure. Note that this is possible because Jim does not scan variables or command names for references.

Jim references will also be the base of the Jim Object System (JOS).

Open issues

Currently Jim lacks two commands that will be implemented in short time. This are [info references] that will return all the references currently defined, and [finalizer] that will be used to query/modify the finalizer of an already created reference.

Space reserved for discussion

That's all for now, but please feel free to ask questions if something is not clear.


This is a w e s o m e - I've pulled my hairs out trying to embed Vlerq, a project I'm working on, in Tcl. Found workable solutions in the end, but the above will make it sooo much simpler. Its requirements are simple: referring to objects not created in Tcl, for the sole purpose of holding on to them, passing them back as opaque handle, and cleaning up when the last reference to these objects goes away.

In Python, Ruby, and C++, this is trivial. In Tcl it's extremely awkward. In jim it might just work perfectly!

Thank you for creating a fascinating next-generation version of Tcl -jcw


Category Jim