Annotating words for better specifying procs in Tcl9

jima 2014-02-19

Novem is already in the works. Many proposals boil around it.


Larry Smith Just for the sheer halibut, I'd like to put an oar into this fish pond. I've advocated before for using () for expressions, arguing that use of the null array is a pathological one anyway and that breaking it and forcing a code cleanup might be a good idea in and of itself. That never seemed popular (to say the least, judging from the feedback). The annotation idea is intriguing, but it is really a sort of Tcl preprocessing concept wherein {} becomes something that itself needs to be executed before we know how to treat the following word. This idea seems backwards to me. Rather than putting a flag like {*} everywhere we want to have a list parsed for us, it makes much more sense to me to have the proc itself just know darn well it needs a list and Do The Right Thing (tm).

Jim went to using argument syntax for this task: proc foo { bar &grill } {} where the & prefix indicates a reference passed by name, something previously handled with upvar Doing The Right Thing. In many respects, this discussion boils down to that.

Just as an idea off the top of my head with no great amount of consideration, consider this: what if we a) don't make a distinction between a proc and a variable. Either one can be executed simply by calling it, so this is a distinction that makes no difference. Suppose further, that b) rather than using named arguments, we access args as a list, i.e. every proc is assumed to have an implicit arglist of {args}. We access them by way of lindex, or perhaps something shorter like $(0)..$(n).

Now, further suppose c) that we don't automatically evaluate recursively. In other words, a call of "foo [set x 1]" results in a call to foo with an arglist of {"[" "set" "x" "1" "]"}. Now, when we parse the arglist, foo itself has the option of spotting the "[" arg and concluding it has been handed an embedded expression, so it calls something like "eval 0" (which would mean something like "evaluate my args array starting from index 0"). Said eval occurs, returning a new args array {"1"} (with the side effect of setting x to 1), and foo then finishes its arg processing, proceeds to do its thing, and returns a result. The result could be returned evaluated or unevaluated: return -eval $x versus just return $x. This gives us the TRAC equivalent of returning to the neutral string (result does not require re-evaluation) or to the active string (a Trampoline).

In most cases, this behavior is indistinguishable from normal Tcl, but because it is getting the raw input data, the proc itself can decide how it wants its arguments and what they might mean. If it needs to support references (like incr, say) then it will see a word with a prepended "&" and do the appropriate upvar itself. Likely most of this would actually be handled by some sort of init statement which would take care of the "standards" but would return the remainder of the args that don't mean anything to it, so the proc still has the opportunity to interpret them in its own way. This may also let us permanently evade quoting hell, since "{" "a" "b" "c" "}" could simply be parsed as returning argument(whatever)={a b c}.

In fact, it might even be possible that, rather than "parsing" line-by-line, we simply dispatch the next input word and let it access the yet-unevaluated stream of input tokens, doing the work until it sees an ending token like ";" or "fi" or "end if" -- meaning the proc can even interpret past an end-of-line and keep going. In practice, this would mean the bulk of the program might very well be dispatched recursively by init from the very first call.

I hope I've explained this well enough to get the idea across. But this seems more "tclsh" to me in the sense that the proc we are calling gets to decide how to continue processing incoming tokens and can return whatever it knows it is supposed to return in the surrounding context.

One other thought: this provides one way to handle infix notation. If "x = 5" is seen, "x" will be dispatched with {"=" "5"} args and can reset itself, or some variant of itself, such as x.actualvalue or somesuch. Rather than replicating x for every var, we might extract some sort of type information and, instead of executing x, execute type(x), thereby, in effect, re-inventing Smalltalk.


I am thinking about which changes would have the most impact on how Tcl is perceived. Things like {*} did produce a visible change in Tcl, as they modified how things look to the coder at the syntax level.

Not long ago (2012 in Tcl time is not that much), TIP 401 was proposed to introduce yet another such word prefix {#}, this one for comments, and with it the idea of a possible proliferation of {prefix} forms.

In my mind the introduction of these prefixes only makes sense if the goal is to enhance the way Tcl understands the data it is being fed.

In the case of {*} the message was "an expanded list comes here". For {#} the proposed message is "this word you shall ignore".

Watching JO at the XXth conference it was clear to me that one of the main premises underlying Tcl was that each command is responsible for interpreting and processing the words which form its arguments.

This is an awesome concept, or contract, as the Tcl core is made conceptually oblivious to the workings of an extension command.

But there are places where this contract proves a limitation. For instance, reading about the work done on the sugar macro processor, you can see how some commands (if and while, for instance) need to be "better explained" to Tcl so that the macro processor can understand how it should handle their arguments.

Perhaps sugar, or the Tcl core for that matter, would deal better with Tcl code (a succession of Tcl commands, each one of them king of its own arguments) if given a little aid to get the information needed to process it.

What information would be needed?

Knowing that Tcl is really a nickname for at least three languages working in unison (that of Tcl scripts, that of Tcl expressions, and that of Tcl assembly language) suggests an answer.

Perhaps it would be a good idea to annotate the arguments expected by a command in the following way:

{=} would mean "this word is a Tcl expression".

{%} would mean "this word is a Tcl script".

Well, the semantics are not quite there, as the idea is that these prefixes will only be used to annotate arguments.

So a single:

proc if { {=}expression {%}branch1 {%}branch2  } {
    ...
}

would provide enough information for other commands to better understand a command they might need to tweak.

The corresponding info subcommands would be giving information about what type each argument of a command is.


Going farther with the idea of {prefix} forms... it can be argued that type annotations, as in {double}, {int} and such, might be handy... this seems excessive to me... what I propose is limited to what I think is a more fundamental level of the types a word can be in Tcl.

AMG: I think you're better off creating a new command which lets you specify annotations. This command would normally be implemented as a total no-op: proc annotate {args} {}. But in an environment where the annotations are useful (e.g. with Nagelfar or some debugger or a documentation generation tool like Ruff!), this command would squirrel away its arguments in a place known to the interested framework. You could do this today without changing Tcl.
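A minimal sketch of that idea, using only today's Tcl. The ::annotool namespace and the annotation vocabulary (myif, expr, script) are made up for illustration and do not come from Nagelfar, Ruff!, or any other tool:

```tcl
# Default definition: a no-op, so annotated scripts run unchanged.
proc annotate {args} {}
annotate myif expr script script   ;# silently ignored

# Hypothetical tool-side redefinition: remember each annotation
# in a dict keyed by the annotated command's name.
namespace eval ::annotool {
    variable notes [dict create]
}
proc annotate {cmd args} {
    dict set ::annotool::notes $cmd $args
    return
}

# The same line now records "myif takes an expr and two scripts".
annotate myif expr script script
puts [dict get $::annotool::notes myif]
```

With the recording definition in place, the last line prints the stored annotation for myif; with the no-op definition the script behaves exactly as if the annotate lines were absent.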

An approach currently in use in many languages is having specially-formatted comments. Tcl lets you extract these comments from procs at runtime, and you can write code to parse them. Ruff! probably does this. I'm not sure how you get the text of the currently running script at runtime, but if the script is in a file, you just read the file. And if it's not in a file, why are you debugging or documenting it?
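As a rough illustration of the comment-based approach: info body returns a proc's body verbatim, comments included, so they can be scanned at runtime. The "#@" marker and the annotations_of helper are assumptions invented for this sketch, not an existing convention:

```tcl
# A proc whose body carries specially-formatted comments.
proc area {w h} {
    #@ w: number
    #@ h: number
    expr {$w * $h}
}

# Hypothetical helper: collect the text of every "#@ " comment line
# found in the body of the named proc.
proc annotations_of {procName} {
    set found {}
    foreach line [split [info body $procName] \n] {
        set line [string trim $line]
        if {[string match "#@ *" $line]} {
            # Drop the three-character "#@ " marker.
            lappend found [string range $line 3 end]
        }
    }
    return $found
}

puts [annotations_of area]
```

The annotated proc still runs normally, since the marker lines are ordinary comments as far as the interpreter is concerned.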

Lastly, you can keep your annotations out-of-band, e.g. in a Nagelfar configuration file.


PYK 2014-09-04: I'm sure this idea has been proposed before by others, but as I mentioned in the Tcl Chatroom today, if a namespace, e.g. ::tcl::rabbithole, were reserved for the purpose, then commands like = and % could be defined inside it. Each command would take as its arguments both the word in braces and the word it prefixes, and return a list of words to be expanded into the command in place of the {...}{...} value. A namespace named tcl::rabbithole could be used to override the global commands. This would accommodate the proposal that is the subject of this page, but also leave the door open to the evolution of directives within this framework.

{*} could be reimplemented in terms of this framework. Thus, there would be no net growth of the dodekalogue, as rule 5 could be rewritten to describe the generalized form:

Argument expansion
When a word is prefixed by a word in braces, a command by the name of the word in the braces is resolved, first in tcl::rabbithole and then in ::tcl::rabbithole. The command is executed with the word in braces and the prefixed word as its two arguments, returning a list, the words of which are added into the original command in place of the original word and its braced prefix, with each word of the result becoming an individual argument to the command.

This would subsume TIPs 293 and 401, and change the nature of other like-minded future TIPs such that they would simply be proposing new commands rather than new syntax.

One feature that might be added via this mechanism is the oft-requested ability to traverse object chains syntactically from left-to-right rather than in the embedded fashion implied by Tcl:

set phone {.}{{$person1 employer} CEO phone}

. could be implemented to make the previous command equivalent to:

set phone [[[$person1 employer] CEO] phone]

JJS 2014-09-09: It seems like any proposed {X}$foo could be implemented as

{*}[X $foo]

without hiding anything in a rabbithole. The {X} form would be four characters shorter, though. The only special aspect of {X} is the delistifying back into the original command, and that is admirably captured by {*} without needing to make it an attribute that can be added to any arbitrary command.
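A small sketch of this equivalence in current Tcl. The command twice stands in for an arbitrary X and is purely hypothetical; the point is only that a command returning a list can have its words spliced into the caller with {*}:

```tcl
# Hypothetical stand-in for X: returns a list of words
# that should replace the original word.
proc twice {word} {
    return [list $word $word]
}

set foo b

# {*}[twice $foo] plays the role that {twice}$foo would play
# under the proposed prefix rule.
set result [list a {*}[twice $foo] c]
puts $result
```

Here the two words returned by twice become two separate arguments to list, which is the "delistifying" step the {X} prefix would otherwise have to provide.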


jima 2014-09-05: The thing is that then you need to make sure that ::tcl::rabbithole::. does not use {.} otherwise you might end up in a recursive pit.

PYK 2014-09-05: Yes, crafting {...} commands would require particular care to avoid that. {*} at least would be easily implemented, and available to other expanding commands:

proc ::tcl::rabbithole::* args {return $args}

ekd123 - 2014-09-05 13:05:06

() might be a better choice, since it is only used for arrays, and in 99.9999% of uses it does not appear at the beginning of a word.

{} will cause confusion.

PYK 2014-09-05: That ship sailed when {*} was chosen as the expand syntax.

RLE (2014-09-05): Yep, and couple that with the fact that "" is a valid array (or variable) name to Tcl and you have an incompatibility with using ():

  $ rlwrap tclsh 
  % set (something) "or another"
  or another
  % puts $(something)
  or another
  % unset ""
  % puts $(something)
  can't read "(something)": no such variable
  % set "" "this or that"
  this or that
  % puts ${}
  this or that
  % 

There is a page or two on this Wiki (I can't pull them out of search right now) that discuss using the () array for proc options handling:

  proc task { args } {
    array set "" $args

    if { $(-opt1) eq "something" } {
      do-something
    }

    set fd [ open $(-file) RDONLY ]

    ...
  }

AMG: If you're going to reclaim $(...) and break compatibility, I'm far, far more interested in using that notation as an expr shorthand. Let $(...) be the same as [expr {...}].

RLE (2014-09-07): FWIW, I was arguing not to reclaim it, because it would break compatibility.

However, I find your expr shorthand intriguing. It is (almost) the same as Bash's arithmetic expansion, except that Bash uses double parens: $(( )).

PYK 2014-09-07: My tendencies swing away from having $ at all, since it could be eliminated without too much cost. As minimal as Tcl is in the syntax department, beating even Lisp, it could shrink a little. The {...}... rule proposed above affords another possibility for eliminating the current $ rules:

proc ::tcl::rabbithole::$ args {
    set res {}
    foreach arg $args {
        lappend res [uplevel [list set $arg]]
    }
    return $res
}

after which, variable access would look like:

puts {$}name

This implies that the proposed {...}... form should be stackable:

puts {*}{$}names

RLE (2014-09-08): Ok, maybe I'm missing something, but how is this

puts {$}name

a "removal" of $ as compared to this?:

puts $name

Two extra {}'s, and the $ is still there?

When I think of what 'removal of $' would mean, it would mean this:

puts name

PYK 2014-08-08: It's a removal in the sense that it doesn't require any specific mention of $ in the dodekalogue in order to implement it, as the argument expansion rule proposed above covers the case of puts {$}name, which could just as well be puts {deref}name. It doesn't provide for mid-string variable expansion, though.

AMG: Or use single-argument [set]. This will allow for mid-string variable substitution.
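For illustration, single-argument set already works this way in current Tcl: it returns the variable's value, and command substitution is performed inside double-quoted strings. The variable names are arbitrary:

```tcl
set name World

# [set name] returns the value of name, just as $name would,
# and can appear in the middle of a quoted string.
set greeting "Hello, [set name]!"
puts $greeting
```

This prints "Hello, World!" without using $ for the substitution itself.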