Version 10 of C-like structs and file scope

Updated 2005-03-20 20:13:14

Bryan Oakley 11-Mar-2005

In our environment we have more C programmers than Tcl programmers, and often the C programmers do code reviews on the Tcl code. We also believe that good data structures make code that is easier to understand.

A common problem in application development is the proliferation of global variables and procedures that may or may not be related. For example, an application may have a working set of global variables, but they may not always be initialized together, and the interaction between various globals may not be obvious.

To limit this problem and to make the code more self-documenting as well as more familiar to a wider pool of programmers we have implemented C-like structs and file scoping.

A copy of this work in progress can be downloaded here:

http://www.bitmover.com/bksyntax.tgz

At present this is all done in Tcl, but we are looking into pushing some of this work to the C level to integrate it more tightly with the core.

Even though the syntax presented here is decidedly "un-tclish" in some respects, in practice the usage is fairly intuitive (if you briefly leave some of your Tcl intuition behind).

We would like feedback from the community as to whether some of the ideas presented here would be worth putting into the core.


Structs


The structs we've implemented look similar to C-style structs with the added ability to specify initial values. For example, imagine a typical text editor. The state of the editor includes the name of the file being edited, perhaps a flag saying whether or not the file has been modified, etc:

    struct Editor {
        string    filename=""   // name of file being edited
        int       needsSave=0   // set to 1 when changes need saving
    }

Each element of a struct has a type, and optionally an initial value and a comment. A limitation of the existing implementation is that the initial value may not span more than one physical line. Comments are C-style comments that extend from the last occurrence of "//" to the end of the line.

If no initial value is provided the empty string is used. The type is purely for documentation purposes since everything's a string in Tcl. No attempt at type checking is done except for the optional initial value. It is possible, for example, to assign a string to a struct element defined as int.

The following types are supported: array, float, index, int, list, string.
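As a rough illustration of the field format described above, one line of a struct body could be split into its parts in plain Tcl as follows. This is only a sketch and not the bksyntax implementation: the proc name parseField is hypothetical, and the initial value is returned as raw source text rather than being evaluated or type checked.

```tcl
# Hypothetical helper, not part of bksyntax: split one struct field line
# into a list of {type name initialValue comment}.
proc parseField {line} {
    set comment ""
    # The comment runs from the last "//" to the end of the line.
    set idx [string last "//" $line]
    if {$idx >= 0} {
        set comment [string trim [string range $line [expr {$idx + 2}] end]]
        set line [string range $line 0 [expr {$idx - 1}]]
    }
    set line [string trim $line]
    # The first word is the type; the rest is "name" or "name=value".
    if {![regexp {^(\S+)\s+([^=\s]+)\s*(?:=\s*(.*))?$} $line whole type name init]} {
        error "malformed field: $line"
    }
    # A missing initial value comes back as the empty string.
    return [list $type $name $init $comment]
}

set field [parseField {int       needsSave=0   // set to 1 when changes need saving}]
```

Here the variable field holds four elements: the type, the element name, the raw initial value, and the comment text.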

To create an instance of the struct we have the new command. It takes an optional name which is useful to create a struct with the same name as a widget:

    # use an automatic name
    set editor [new Editor]

    # use a custom name:
    frame .editor ...
    new Editor .editor
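One way to picture what creating an instance involves is sketched below in plain Tcl. It assumes, purely for illustration, that each instance is stored as a global array named after the instance and that the field defaults live in a global table. The names newInstance, structDefaults and structCounter are hypothetical, and the real new command may work differently.

```tcl
# Hypothetical sketch, not the bksyntax implementation.
# Field defaults per struct type (assumed storage layout).
array set structDefaults {
    Editor {filename "" needsSave 0}
}
set structCounter 0

proc newInstance {type {name ""}} {
    global structDefaults structCounter
    if {$name eq ""} {
        # No name given: generate one from the type and a counter.
        set name "$type[incr structCounter]"
    }
    # Copy the defaults into a global array named after the instance.
    array set ::$name $structDefaults($type)
    return $name
}

# Use an automatic name
set editor [newInstance Editor]

# Use a custom name, for example one that matches a widget path
newInstance Editor .editor
```

Because the instance is addressed by name, the value stored in editor can be passed around like any other string.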

To make using structs more convenient we have implemented a pointer-to operator. This allows one to access an element of a struct in a convenient shorthand:

    set editor->filename "example.txt"
    puts "the filename is $editor->filename"

Because "->" is not normally part of a variable name and because we don't want to resort to strange quoting tricks, we've created a code preprocessor that transforms code with pointer-to operators into standard Tcl syntax. We are investigating how to implement the pointer-to operator at the C level within the core.

The transformation of the code is supplied by the proc struct_pointerto. It may be used, for example, to create a proc that knows about structs:

    proc init {filename} [struct_pointerto {
        set editor [new Editor]
        set editor->filename $filename
        return $editor
    }]

It is rather cumbersome to declare procs in this way. Ultimately we're hoping to implement pointer-to at the C level so we don't have to do this transformation. In the interim, we've hidden this functionality inside another C-ism, namely, file scoping. This will be covered in more detail later.

Additional features of structs

An important feature of structs is that they always exist in the global namespace. This was chosen to mimic the nature of widgets (e.g. from within any proc one may do something like '.foo configure ...' without having to declare '.foo' as global).

Another important feature is that a reference to a struct doesn't require an extra leading dollar sign. Internally, the left side of the pointer-to operation is automatically dereferenced, unless it begins with ".".

This allows a fairly natural coding style where structure references look like normal variable references, for example:

    proc setFile {editor} {
        set editor->filename "/tmp/whatever"
    }
    set editor [new Editor]
    setFile $editor

In the above example, even though 'editor' is a proper tcl variable in its own right within the setFile proc, no dollar sign is required in front of editor->filename. The code preprocessor takes care of the details.
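The rewriting rules described so far can be approximated with a few regular expression substitutions. The sketch below is not struct_pointerto itself: rewritePointerTo is a hypothetical name, and it assumes that each struct instance is stored as a global array named after the instance. It is also naive in that it would rewrite "->" inside strings and comments.

```tcl
# Hypothetical sketch, not the bksyntax preprocessor.
# Assumes each struct instance is a global array named after the instance.
proc rewritePointerTo {body} {
    # $.path->field : hard-coded instance name, read access
    regsub -all {\$(\.[\w.]*)->(\w+)} $body {[set ::\1(\2)]} body
    # .path->field : hard-coded instance name, used as a variable name
    regsub -all {(\.[\w.]*)->(\w+)} $body {::\1(\2)} body
    # $var->field : var holds the instance name, read access
    regsub -all {\$(\w+)->(\w+)} $body {[set ::${\1}(\2)]} body
    # var->field : var holds the instance name, used as a variable name
    regsub -all {\m(\w+)->(\w+)} $body {::${\1}(\2)} body
    return $body
}

# The proc body is rewritten once, when the proc is defined.
proc setFile {editor} [rewritePointerTo {
    set editor->filename "/tmp/whatever"
    return $editor->filename
}]

set result [setFile Editor1]
```

With this sketch the left side of "->" is dereferenced through the local variable unless it starts with ".", in which case it is taken literally as the instance name.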

This special dereferencing won't happen if the struct reference begins with ".". This allows for hard-coded references to structs so that one can easily associate structs with widgets, yet be able to pass structs around by name with the same convenient syntax:

    struct SuperWidget {
        list children  // children of the superwidget
    }
    proc demo {} [struct_pointerto {
        # a handy use of structs is to associate metadata with
        # a widget...
        frame .main
        set w [new SuperWidget .main]

        # use hard-coded struct reference:
        set .main->children {.f.one .f.two .f.three}
        puts $.main->children

        # use reference to the struct:
        set w->children {.f.a .f.b .f.c}
        puts $w->children
    }]

Finally, by default structs cannot be redefined since it's not good programming to change the format of a data structure on the fly. For interactive development we allow structs to be redefined by appending the word "recreate" in a struct definition:

    struct SuperWidget {
        list children // children of the superwidget
        int  index    // index to "current" child
    } recreate

Scope


Namespaces can often be confusing to people who aren't well versed in Tcl. File scope, on the other hand, is likely familiar to anyone who has coded in C. We have decided to use an emulation of file scoping rather than namespaces. Because we control how scopes are processed, any code that appears inside a scope is automatically processed by struct_pointerto, so calling struct_pointerto directly isn't necessary. Proc definitions within a scope appear as normal proc definitions, albeit with enhanced abilities.

A scope is created with the scope command. Under the hood it uses namespaces and thus has all the power of namespaces. Scopes differ from namespaces in two significant ways, however.

One way scopes differ from namespaces is that scopes do not nest. All scopes live directly beneath the global scope. If you create a scope named Foo, for example, it will be created as a namespace named ::Foo. If the code inside Foo creates a scope named Bar, that namespace will actually be ::Bar rather than ::Foo::Bar.

The second significant difference is that in scopes, unlike namespaces, procs are automatically exported to the global namespace. We've added a 'private' command to create private commands and variables, which is discussed in the following section of this document.
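Both differences can be mimicked on top of ordinary namespaces. The following is a minimal sketch under stated assumptions rather than the actual scope command: scopeSketch is a hypothetical name, procs are made globally visible with interp alias (the real implementation may export them differently), and the pointer-to preprocessing and the private command are left out.

```tcl
# Hypothetical sketch, not the bksyntax "scope" command.
proc scopeSketch {name body} {
    # Always create the namespace directly under the global namespace,
    # even when called from inside another scope.
    set ns ::$name
    namespace eval $ns $body
    # Make every proc defined in the scope callable from the global namespace.
    foreach cmd [info procs ${ns}::*] {
        set tail [namespace tail $cmd]
        interp alias {} ::$tail {} ${ns}::$tail
    }
    return $ns
}

scopeSketch Foo {
    proc hello {} { return "hello from [namespace current]" }
    scopeSketch Bar {
        proc bye {} { return "bye from [namespace current]" }
    }
}

set helloResult [hello]
set byeResult [bye]
```

Although Bar is created from inside Foo, it ends up as ::Bar, and both hello and bye can be called without qualification.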

Here is a simple example, using the same file editor analogy from before:

    scope Editor.tcl {

        struct Editor {
            string  filename=""     // currently open file
            int     needsSave=0     // if 1, file needs saving
        }

        proc init {filename} {
            set editor [new Editor]
            set editor->filename $filename
            return $editor
        }

    } ;# end scope Editor.tcl

    init [lindex $argv 0]

Private Vars, Private Procs

It is sometimes convenient to make certain procs available only within the current scope. This can be done with the private command. We've extended private procs to have types, and we do some minimal type checking on return values by redefining the implementation of "return" within a scope.

    private void main {} {
        createWidgets ...
    }
    private void createWidgets {} {
        ...
    }
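To give a feel for what minimal type checking of a return value might involve, here is a small sketch in plain Tcl. It is not how bksyntax redefines "return": checkReturn is a hypothetical helper, and treating void as "must return the empty string" is an assumption made only for this example.

```tcl
# Hypothetical sketch, not the bksyntax implementation.
proc checkReturn {type value} {
    switch -- $type {
        void    { set ok [expr {$value eq ""}] }
        int     { set ok [string is integer -strict $value] }
        float   { set ok [string is double -strict $value] }
        list    { set ok [expr {![catch {llength $value}]}] }
        default { set ok 1 }
    }
    if {!$ok} {
        error "expected $type return value but got \"$value\""
    }
    return $value
}

# A proc declared as returning int could pass its result through the check.
proc countChildren {children} {
    return [checkReturn int [llength $children]]
}

set count [countChildren {.f.one .f.two .f.three}]
```

A value that does not fit the declared type raises an error instead of being returned.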

Because scopes and private procedures are implemented with namespaces, private procedures aren't truly private. This is a useful feature in that scopes may create helper procs for bindings yet still be able to call those helper procs from the global scope.

To reference a private proc from another scope you can use the private command with a single argument to get a fully qualified reference to the private proc. You may also use the command scope_code similar to how namespace code works.
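For readers who have not used it, the standard namespace code command that scope_code is compared to behaves as shown below. This example uses only core Tcl and does not involve bksyntax; the namespace and variable names are chosen for illustration.

```tcl
namespace eval ::foo {
    proc helper {} { return "this is the foo helper proc" }
    # Capture a script that will run "helper" inside ::foo later.
    variable callback [namespace code helper]
}
namespace eval ::bar {
    proc helper {} { return "this is the bar helper proc" }
    variable callback [namespace code helper]
}

# Both callbacks can be evaluated at the global level, as a widget
# -command or a binding would do, and each finds its own helper.
set fooResult [uplevel #0 $::foo::callback]
set barResult [uplevel #0 $::bar::callback]
```

Each captured script remembers the namespace it was created in, so two procs with the same name do not conflict.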

The following example shows how it's possible to use the same name in two different scopes, yet reference those procedures at the global level:

    source bksyntax.tcl
    scope foo {
        private void helper {} {
            puts "this is the foo helper proc"
        }
        button .foo -text "Foo" -command [private helper]
        pack .foo
    }
    scope bar {
        private void helper {} {
            puts "this is the bar helper proc"
        }
        button .bar -text "Bar" -command [private helper]
        pack .bar
    }

We also support private variables. Private variables have the feature of automatically being available to all procs within the same scope, much like file-local variables in C. It is convenient to create a private variable that is a pointer to a struct as a way of giving all procedures in a file access to a working set of data.

In the following example, the variable 'editor' is a private variable that contains a pointer to a struct. The struct itself, however, is a global variable. That is why editor->filename can be used with -textvariable; the actual value of editor is resolved at the time the widget is created via the preprocessor, and resolves to a fully qualified global variable.

    source bksyntax.tcl
    scope Example.tcl {

        struct Editor {
            string    filename=""   // name of file being edited
            int       needsSave=0   // set to 1 when changes have been made
        }

        private var editor = [new Editor]

        private void init {filename} {
            set editor->filename $filename
        }

        private void main {} {
            global argv
            if {[llength $argv] > 0} {
                init [lindex $argv 0]
            } else {
                puts stderr "usage: example filename"
                exit 1
            }
            makeWidgets
        }

        private void makeWidgets {} {
            text .t
            label .header -textvariable editor->filename
            pack .header -side top -fill x
            pack .t -side top -fill both -expand y
            if {$editor->filename ne ""} {
                set file [open $editor->filename r]
                set data [read $file]
                close $file
                .t insert end $data
            }
        }

        main
    }

Example: a simple megawidget


One use of structs is for managing instance data for megawidgets. In the following example, the main program creates two instances of a megawidget used for picking files. Each megawidget uses a struct to keep track of the last directory and file chosen by the file select dialog.

    source bksyntax.tcl
    scope main {
        proc main {} {
            fileselect .fs1 -title "File 1:"
            fileselect .fs2 -title "File 2:"
            pack .fs1 .fs2 -side top -fill x
        }
    }

    scope FileSelect {

        struct FileSelect {
            string  title="Choose file: " 
            string  lastdir         // dir of last chosen file
            string  lastfile        // name of last chosen file
            string  textvariable    // associated with entry widget
        }

        option add *FileSelect.BorderWidth 2 widgetDefault
        option add *FileSelect.Relief groove widgetDefault

        proc fileselect {w args} {
            frame $w -class FileSelect
            new FileSelect $w

            if {[lindex $args 0] eq "-title"} {
                set w->title [lindex $args 1]
            }

            label $w.label -text $w->title
            entry $w.entry \
                -textvariable w->textvariable \
                -borderwidth 1
            button $w.pick \
                -text "choose..." \
                -command [scope_code selectFile $w] \
                -borderwidth 1
            pack $w.label -side left -fill y -expand n
            pack $w.pick -side right -fill y -expand n -padx 1 -pady 1
            pack $w.entry -side left -fill both -expand y
            return $w
        }

        private void selectFile {w} {
            if {$w->lastdir eq ""} {
                set w->lastdir [pwd]
                set w->lastfile ""
            }

            set file [tk_getOpenFile \
                          -initialdir $w->lastdir \
                          -initialfile $w->lastfile \
                          -title $w->title]

            if {$file ne ""} {
                set w->lastdir [file dirname $file]
                set w->lastfile [file tail $file]
                set w->textvariable $file
            }
        }
    }
    main

Brian Griffin 11-Mar-2005

This is interesting work.

I don't understand the need for "types". When I write code, I'll document how a variable is intended to be used by the way I initialize it, (e.g. set l [list]; set s ""; set i 0) So, why not:

        struct Editor {
                filename ""        ;# name of file being edited
                needsSave 0        ;# set to 1 when changes need saving
        }

I find this just as self-documenting without resorting to pseudo types.

Bryan Oakley responds: it's simply a matter of preference. We find including the type helps make the code more obvious. Brian Griffin replies: But that's my point. "", [list], and 0 are just as obvious to me and are more in keeping with the Tao of Tcl than int needsSave.

IMHO, using C syntax, even a little bit, is a bad idea. The more Tcl looks like C, the more you raise the expectation of C-style parsing as well. This is the one area that screws with programmers' minds the most. Because Tcl looks similar to C, the false conclusion is reached that it's parsed like C, which couldn't be further from the truth. Tcl is just not C, so stop trying to turn it into C. Any programmer worth their salt can be multilingual; you don't need to baby them with fake syntax.

So, if we kept the concepts but changed the syntax, would you be interested in seeing this in the core? Brian Griffin asks: What did you have in mind?

I like the Scope and struct member reference. I wonder if the -> notation could be used for accessing keys in a dict, so that:

    set dict->key "new value"

would be equivalent to:

    dict set dict key "new value"

Bryan Oakley: I don't think this would have the same impact as the -> operator mentioned in this package. Part of the power of the -> operator as we've implemented it is that the LHS doesn't have to be declared global, much like widget paths don't have to be declared global. Brian Griffin says: You're right, but that's a somewhat independent feature from the notion of a "struct". It's really cool that a struct can have this feature, but why not a scalar, list, or array also? It would be nice if I could declare any kind of variable in a namespace and have it accessible from any and all procs in the namespace without having to redeclare it over and over. That's really what you're trying to get at, no?

Larry McVoy: I agree with the variable feature that Brian G wants. There isn't any reason you shouldn't be able to declare variables at the "global" scope within a scope and have them visible everywhere. That would get rid of a lot of redundant typing of "global xyz" and it prevents the global namespace from being polluted with pseudo globals.

In general, though, the idea is that stuff would all be in a struct. I get that you might want the feature and it's fine with me, but I also think that when you start using the struct idea and realize you can have one of these for each instance of your widget or whatever, you'll find dramatically less need for any other variables being passed around within the scope. I can easily believe that you can construct a case where you want that though so why not? The goal is easier to write and read code and I think Brian G's request is in harmony with that.


Category Suggestions | Category Data Structure