Transparent OO for Tcl

Summary

NEM 2004-05-26: Here's a little experiment I've been doing with unifying Tcl's traditional first-class transparent values (like string, lists and dicts), with the handle-based OO types. I call this package TOOT.

See Also

TOOT with {*}
WHD: A slightly different take
TOOT revisited
NEM: And another one
Yet another object system

Description

Using class-is-command-name prefix, and auto-expansion of leading word, it is possible to achieve quite an amazing degree of unification between the two ideas. This allows for use of two different syntaxes:

  • Normal Tcl syntax, using ensembles, e.g., [List length $item]
  • OO syntax with polymorphic method dispatch, e.g., [$item length]

Due to supporting both syntaxes, and having a transparent string representation, you can have the advantages of OO (encapsulation, polymorphic method dispatch, inheritance etc) and the advantages of Tcl's built-in transparent types (natural serialisation, everything is a string, "type" defined by usage, etc). Note, that you can still define handle types using this scheme (for instance, classes themselves are a handle type).

Note that another advantage of the toot package is that (in most cases) "instances" of a class are real, first-class transparent values, and so benefit from Tcl's automatic garbage collection.

You can do also explicit converting of type (i.e., casting) using the syntax [$type | $value], e.g.:

set l [::List create a b c d]
set d [::Dict | $l]

Type-checking is done in a lazy manner at present - the cast will always succeed, but subsequent operations may fail if the string rep doesn't match what that type requires. Note that in most uses, you shouldn't have to do explicit type conversions. Also note, that most operations have no side-effects: values are immutable, making this an object-functional extension by default. You can create mutable values by creating a handle type (just make the string representation a name, by overriding the "create" method).

This has some interesting connections to Feather's interfaces, which I haven't fully explored yet, but will do, to see what the implications are. I think they could work quite well together.

Anyway, here's some simple first-stab at the code. It's pretty poor as a framework currently, and inefficient (most uses involve multiple command lookups, and relies on an ::unknown handler for auto-expansion of leading word). I intend to remedy these defects, and build this up into a powerful OO system over time. This code is just here as a proof-of-concept to show the basic idea.

This idea came to me at about 4 a. m. last night - I went to bed thinking about Tcl's value semantics, and the prefix/auto-expand approach taken by TIP 194 [L1 ] (anonymous procs as apply command prefix) and just suddenly woke up in the middle of the night with a "Eureka!" moment. I'd be very interested in feedback, as I think this could be quite a powerful new approach to building code in Tcl.

TODO:

  • Rewrite to make handling packing-unpacking of values simpler.
  • Define everything a bit better, so most functionality is in an object class at the base of the class hierachy.
  • Implement inheritance.
  • Steal as many interesting features from XOTcl as possible :-)
  • Try and make it rely on less tricks, to improve performance (may need Tcl 9 with new features).

Please remember this is only a proof-of-concept. I know there are serious limitations with the implementation given below, but the idea basically works, I think.

# toot.tcl --
#
#   TOOT - Transparent OO for Tcl. This package implements a simple
#   object oriented programming environment, where all values are fully
#   transparent. This is an experiment in how well the worlds of Tcl's
#   transparent values, and OO design, traditionally implemented using handle
#   types, can be combined. The design presented here is a functional view of
#   objects, in that procedures/methods which update a value return the new
#   value. All values are represented as a tagged list of length three:
#
#   {$type | $value}
#
#   The "|" is some sugar, which also helps to disambiguate method calls from
#   construction cases. I was going to use ":", but that interacts badly with
#   Tcl's namespace separator.
#
#   Classes in the system are namespaces which form ensembles. You can use
#   either an OO syntax or a functional syntax when accessing values. When
#   using the OO syntax [$value method args], type is preserved. When using
#   the functional syntax [type method $value $args], the value will
#   automatically be converted to the correct type, in the usual Tcl way (the
#   original value will not change type however). The type command also serves
#   as a type-conversion copy-constructor, which means that you can explicitly
#   create a copy of a value and change it's type. E.g. to get a list object
#   from a dict object, you could do:
#
#   set l [List | $dict]
#
#   Most of the time, this will be unnecessary. This also has the added
#   benefit of transparent serialisation - values can be written to a file and
#   read in at a later date, and will just work (assuming the correct type is
#   still available). Requires auto-expansion of leading word to work
#   correctly, and currently achieves this through a global unknown handler.
#   If TIP 181 passes, then this could be done on a per-namespace basis.
#   Auto-expansion as a core feature for Tcl-9 would rock!
#
# Copyright (c) 2004, Neil Madden.
# License: Tcl/BSD style.

package provide toot 0.1

namespace eval toot {
    namespace export class extract method
}

# Helper for extracting real value
proc toot::extract {value} {
    return [lindex $value 2]
}

# Helper for creating procs which do the right thing
proc toot::method {name arglist body} {
    set arglist [linsert $arglist 0 self]
    uplevel 1 [list proc $name $arglist $body]
}

# Class is itself a class.
namespace eval toot::class { }

proc toot::class {cmd args} {
    uplevel 1 [linsert $args 0 ::toot::class::$cmd]
}

proc toot::class::| {self args} {
    # Copy constructor/instance command
    if {[llength $args] == 0} {
        # Copy constructor
        return [list ::toot::class | [::toot::extract $self]]
    } else {
        # Redispatch - assume uplevel 1 is within correct namespace
        uplevel 1 [linsert $args 1 $self]
    }
}

# Real constructor
proc toot::class::create {name body} {
    # Fully qualify the name
    if {![string match ::* $name]} {
        set ns [uplevel 1 namespace current]
        if {$ns eq "::"} { set ns "" }
        set name ${ns}::$name
    }
    # Create the namespace
    uplevel 1 [list namespace eval $name $body]
    # Create dispatch method
    set body [string map [list %NAME% $name] {
        if {[llength $args] == 0} {
            return [list %NAME% | [::toot::extract $self]]
        } else {
            uplevel 1 [linsert $args 1 [list %NAME% | $self]]
        }
    }]
    uplevel 1 [list proc ${name}::| {self args} $body]

    # Default constructor
    uplevel 1 [list proc ${name}::create {args} [string map [list %N $name] {
        return [list %N | $args]
    }]]
    
    # Create the type-method
    uplevel 1 [list proc $name {args} [string map [list %NAME% $name] {
        uplevel 1 [list namespace eval %NAME% $args]
    }]]

    return [list ::toot::class | [list $name]]
}

# Install unknown handler for auto-expand
if {[llength [info commands ::_toot_unknown]] == 0} {
    rename ::unknown ::_toot_unknown
    proc unknown {cmd args} {
        if {[llength $cmd] > 1} {
            uplevel 1 $cmd $args
        } else {
            uplevel 1 [linsert $args 0 ::_toot_unknown $cmd]
        }
    }
}

Some test code:

# Test stuff
proc test {} {
    namespace import toot::*
    class create List {
        method index {args} {
            return [eval [linsert $args 0 lindex \
                [extract $self] $args]]
        }

        method length {} {
            return [llength [extract $self]]
        }
    
        method append {args} {
            set s [extract $self]
            return [list ::List | [eval lappend s $args]]
        }

        method loop {varname body} {
            uplevel 1 [list ::foreach $varname [extract $self] $body]
        }
    }
    class create Dict {
        method get {key} {
            return [dict get [extract $self] $key]
        }
        method set {args} {
            return [eval ::Dict create [eval [linsert $args 0 \
                dict replace [extract $self]]]]
        }
    }
    # Now some tests
    set l [List create a b c d e f]
    puts "l = $l"
    # Loop
    $l loop item {
        puts "Item = $item"
    }
    # Use dict method
    puts "a = [Dict get $l a]"
    # Convert to dict
    set d [Dict | $l]
    # Set a key
    set d [$d set a "Hello!"]
    # Loop as list
    List loop $d item {
        puts "item = $item"
    }
    puts "d = $d"
    puts "l = $l"
}

Discussion

jcw: 2004-05-26: Whoa, Neil, this is fascinating. I'll try to wrap my mind around it, especially in the context of a generic relational algebra system I've been working on, early stage is at [L2 ]. (eurein, Gr - to find!)

AM 2004-05-27: Just a few remarks:

  • How would I define and set "fields" for an object?
  • It seems to me that programming a specific method is a bit involved

Otherwise, well, I am simply amazed :) Together with GPS's object system this is really minimalistic!

NEM 2004-05-27: There are a couple of ways of setting "fields". The easiest way, if you want mutable objects, is to simply revert to a handle type:

# a reference handle type
class create Reference {

    variable ref

    method <- {val} {
        variable ref
        set ref([extract $self]) $val
    }

    method -> {} {
         variable ref
         return $ref([extract $self])
    }
}
set a [Reference create foo]
$a <- [List create a b c d e]
$a <- [[$a ->] append f g h]
puts [$a ->]

This example for instance, uses a per-class array and the data part of each object is used as an index into that array. You could easily make each instance a new namespace, as well. If you want fields in a fully transparent object, then you have to start programming in a more functional way:

class create Person {
    method setName {name} {
        set d [dict replace [extract $self] name $name]
        return [list ::Person | $d]
    }
    method getName {} {
        return [dict get [extract $self] name]
    }
    method setAge {age} {
        set d [dict replace [extract $self] age $age]
        return [list ::Person | $d]
    }
    method getAge {} {
        return [dict get [extract $self] age]
    }
}
set neil [Person create name "Neil Madden" age 23]
set neil [$neil setAge 82]
puts "[$neil getName] is [$neil getAge]!"

As you can see, in order for this style to work, any setter methods have to return a new instance - side-effect free objects.

As for the other point, yes programming methods is a little bit involved now, due to the need to pack and unpack the "actual" value from the wrapper. I'm trying to come up with a simpler way of doing this, where things happen automatically.


RS: This is very amazing. I put my tiny reinvention of this wheel at Toot as toot can :)

jcw 2004-05-28: Not to muddle the waters, but if I may summarize your (NEM) and RS's approaches, and re-phrase a bit:

  • every object is by value and carries its class as well as its state: {class | {state...}}
  • a call relies on expansion, so "$o meth args..." will be evaluated as "class | {state...} meth args..."
  • a reshuffle is needed, to put the method call in the right spot: "class meth {state...} args..."
  • ensembles as namespaces would now be useful, turning it into "::class::meth {state...} args ..."
  • to make the story complete, methods are defined as: "proc ::class::meth {self args} {...}"

I love it :)

Would it help to define a "|" that swaps the places of its args 1 and 2? Combined with a slightly modified convention of representing objects as "| class {state...}"?

Comments on mutable state:

  • agree with NEM that a handle is the way to go, name of an array for example
  • it would be great to see refcounts work in this context, i.e., array be unset when last handle ref to it goes away
  • perhaps it can be done: a special list variant, which knows the 2nd item is a handle
  • copies of the list bump the refcount
  • stringify when dual rep is lost causes a copy of the handle contents to be inserted, i.e. the name to be dropped
  • re-use later creates a new handle, puts contents in it, and proceeds with a handle again
  • this has the effect that when dual rep is lost, state is copied, mutability is lost
  • but as long as it isn't it's a real handle, with shared mutable content across all copies of the same Tcl_Obj
  • not sure this makes sense or works, just wanted to dump it here...

NEM: Good summary. Regarding a "|" command, consider the two cases involved:

::List length $obj ;# -> ::List length {::List | {state}}
$obj length        ;# -> ::List | {state} length  -> ::List length {::List | {state}} (i.e. same as above)

With, the "|" command, this would become:

::List length $obj ;# -> ::List length {| ::List {state}}
$obj length        ;# -> | ::List {state} length  -> ::List length {| ::List {state}}

so, it could work, with "|" defined as:

proc ::| {type state method args} {
    uplevel 1 [linsert $args 0 ${type}::$method [list ::| $type $state]]
}

Yes, definitely interesting. The reason for repacking the object is so that method always receive "self" in the same format, and also I was thinking of allowing passing messages to self, which this allows:

method foo {a b} {
    $self bar $a $b
}

Needs more thought though, and some clever wrapping could be done to do conversions transparently, I think. One thing which is a pain with the current method, is that each nested dispatch causes an increment on the reference count of the state, so it's pretty much guaranteed to be shared when it reaches the method, so copy-on-write is inevitable. I can get round some of this using K, but it will obfuscate the code somewhat, and can't be completely hidden. Regarding what you wrote about handles and mutable state... I need to think about it a lot more before forming an opinion. Seems possibly fragile in the face of [evil]... I mean [eval]! ;)

jcw: Uh, oh - more to think about! Wild idea - would a new "interp alias" be useful which can inject/reshuffle/drop args?

CMcC: The infix operator | worries me too, it helps conceptually to reinforce the idea that {class | content} is a special form, but in the end it's got to be inefficient to carry around '|' as essentially dead weight with no real tcl-interpretation.

Next Q: is '|', properly speaking, in the re-interpreted form jcw suggests, not itself some kind of class? If it can be usefully construed as such, it would compel me to think that you're onto something unarguably valid (because the beautiful answer is usually the correct answer qv aesthetic fallacy :)

RS: I understand the "|" as a guard that expresses "this is a toot value". You're right that it need not be in infix position, although this is kind of beautiful in my eyes. One could alternatively use

{| class {the values}}

which would simplify things:

$x get member

gets substituted as:

{| class {the values}} get member

gets auto-expanded first word as:

| class {the values} get member

At this point, some reordering is needed to dispatch to

toot::class::get {the values} member

So the function of prefix "|" could also be called ::toot::dispatch ...

Just that "|" looks nicer. On the other hand, it consumes a possible command name others might want to use, e.g., as prefix for bitwise OR. By having it infix, only the class name itself (under control of the user) is used as command name, and "|" is a subcommand of that (as of any toot class):

{class | {the values}} get member

is auto-expanded to:

class | {the values} get member

where class is an alias resolving as

::toot::dispatch class | {the values} get member

which ends up calling

::toot::class::get {the values} member

Does that make sense?

CMcC: Yes, it makes more sense, thank you. | isn't merely syntactic sugar, but is a method on everything that is a TOOT class, and this dispatch is managed by means of a clever use of alias.

NEM: Indeed, the | method (it could be anything, and I'm open to better suggestions) is absolutely essential to the technique, to ensure that both means of calling a method can be used (::List length $obj vs $obj length). You can rearrange things to make that a prefix command, instead of a message - in which case you can make things a bit more efficient, but then you can't overload it on a per-class basis (not sure if you'd ever want to, but still).


NEM Some more thoughts on TOOT's view of values and types at Interpreting TOOT.


CMcC Another naive/lazy question ... what if TOOT's 'state' component was always interpreted as a namespace name? Would that not provide for mutable state? This consideration arose from DKF's comment on the chat that 'everything is a string, or has a name (which is a string)'

NEM Yes, that would work as well. Indeed, you could have the state be an XOTcl object if you wanted. I'm not sure whether you gain anything in this case, though. At present, I'm playing around with using concepts from monadic programming (see [L3 ] for a "gentle" introduction to monads in Haskell) which (with a bit of syntactic sugar) can make dealing with functional/immutable objects feel like working with stateful objects. It works by threading an extra state argument through function invocations in a way which is transparent to the user of the function, and to the function itself. Not sure how useful this will be in practice, but it's a fun experiment, and monads are a crazy concept to try and get your head round. I'll create a wiki page when I'm done (this week sometime, hopefully).