Version 36 of What kinds of variable names can be used in Tcl

Purpose: discuss what all is known about Tcl variable names - what are the limits, tricks, and tips?

LV: I've seen people show examples of Tcl variable names from 0 characters long to quite long (more than 30 I believe). The names can contain spaces and many other special characters. I would, myself, avoid the use of metacharacters such as [ ] , ( ) $ : { } and probably a few others, just to save myself the quoting nightmare that one might encounter.

I also believe that some extensions impose additional 'restrictions' - for instance, I believe that Tk generally expects widget names to begin with a period and not to use an upper case alpha as the first character.

RS: As far as I'm aware, a variable name may be any string, from the empty string up. (Well, sequences of colons (two or more :) have special meaning, so they cannot be arbitrarily thrown in.) The $ parser restricts variable names to [A-Za-z0-9_], so it's wise to stay in that range; otherwise you might have to brace the name (which prevents Unicode expansion) or use [set \u20AC] if you have a variable name with the Euro sign.

Rolf Ade: I can do 'set ä foo; puts $ä' (this ä is an a with a diaeresis, if you only see a strange char). What do you mean by "The $ parser restricts variable names to [A-Za-z0-9_]"?

RS: I've heard there was a bug report so "national" letters would also be allowed by the $ parser... so it seems this has been fixed. Which version?

RS: In 8.2.3, "ä" is not taken by the $ parser.

Rolf Ade: 8.3.2

KBK: The bug is still open as of 17 January 2002: [L1 ]. Once it's fixed, there oughtn't to be anything wrong with a variable substitution like $hétérogénéité. (Heterogeneity of the languages accepted would be a good thing...) (Just checked: it depends on the native isalnum() call, but certainly you can't name a variable \u03b3 and expect the $-parser to take it, even when eval'ing a Unicode string.)

TV: The delimiters of variable names are important. In bwise I use variables with dot notation, which works, but for instance when printing a variable, evaluating a string with the name in it requires quoting or list-ing. In fact, the dot is taken as a name separator anyway, so braces must be used to access the variable:

 set group.element 3.14
 puts ${group.element}

because this is wrong:

 puts $group.element
 puts $group\.element
 puts "$group.element"

i.e. that doesn't work. I didn't check whether the Tcl language spec explicitly allows dots in variable names, or whether such a case is mentioned at all, but I've got quite old bwise code which I don't want to go over again to change all the dots. Strictly speaking I guess they aren't needed, but I call blocks by a name, and then make block variables with name.vars notation, so one could (later) use

 info vars groupname.*

which is cool enough. I don't like typing two colons (::) in a row, but then again, I guess anything used as an explicit separator can work fine.

It seems that the constructs I use in bwise work in general; have a look at them to at least have working examples of using these types of variable names in complex constructs, also for indirect referencing, and containing Tcl code which can be eval-ed. No beauty contest winner for code smoothness, but compact, efficient (as far as that goes at runtime) and, for me, easy to oversee and free of tricks.
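A minimal sketch of the name.vars idea described above. The block names (amp, mixer) and their values are made up for illustration and are not taken from bwise:

```tcl
# Hypothetical block variables using dotted names.
set amp.gain 2.5
set amp.bias 0.1
set mixer.level 7

# Glob-match every variable that belongs to the "amp" block.
set ampVars [lsort [info vars amp.*]]
puts $ampVars

# Indirect access: the name is only known at run time,
# so read it with [set] rather than with $.
foreach name $ampVars {
    puts "$name = [set $name]"
}
```

Because the names are built at run time, the one-argument form of [set] does the reading, and no braces are needed.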

DKF: Note that periods are never going to be parsed by $ as being part of a variable name (so you'll always have to force the issue with ${} or [set]) because that makes it much easier to build Tk widget names.
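A small sketch of the convenience DKF describes. No Tk is needed to see the parsing rule; the widget path .dialog is only an example name:

```tcl
set parent .dialog            ;# hypothetical widget path
set okButton $parent.ok       ;# $ stops at the period, giving ".dialog.ok"
puts $okButton

# A variable whose name really contains a period is a different variable,
# and needs ${} or [set] to be read.
set parent.ok "a separate variable"
puts ${parent.ok}
puts [set parent.ok]
```

If $ swallowed the period, every widget path built this way would have to be written as ${parent}.ok instead.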

Joe English wrote up this (now slightly edited) discussion on Tcl variables, after a discussion of variable vs upvar came up on comp.lang.tcl:

Variables in Tcl are actually rather complicated. I find it easiest if you don't think of "variables" as first-class entities themselves, but instead think of "Names" and "Contexts".

There are two different kinds of Contexts: namespace contexts (including the global namespace) and stack frames (i.e., inside a procedure body). The former are static, the latter dynamic.

[ Will Duquette added the following comment: "And this, I think, is the most important point. It's rare in real code to call upvar anywhere but in a procedure body. And as soon as the procedure returns, that Context is gone, and the upvar-created Name is gone with it." ]

A Context can map a Name to one of several things:

an Array (which is a collection of Elements indexed by Strings)
an Element (which is a Location (v.i.) in an array)
a Scalar (which is a Location (v.i.) not in an array)
a Link to an Array, Element, or Scalar.

(Note that you cannot create a Link to a Link; only one level of indirection is possible.)

[ Lars H adds the following comment: If true, that parenthetical remark needs clarification. It is quite possible (and often necessary) to "upvar 1" to a Name which was itself created by an "upvar 1", repeated an arbitrary number of levels. Consider the following

  proc nestset {levels varname value} {
     upvar 1 $varname var
     if {$levels>0} then {
        nestset [expr {$levels-1}] var $value
     } else {
        set var $value:[info level]
     }
  }
  nestset 10 myVar 5
  set myVar ; # Returns 5:11

This looks a lot like a Link to a Link to me. ]

A Location is simply a place to store a Value (which is a String). A Location can also have a collection of attached Traces, which we will not discuss further here.

The Value is optional. For example, a Location with no Value can still have Traces. Arrays can also have Traces, which again, we will not discuss.

Finally: a Variable is a String which is used to find a Location (or sometimes an Array) starting from a given Context. Variables consist of one or more Names, separated and optionally preceded by namespace separators (two or more colons), and optionally followed by an array Index in parentheses. See Rule #7 in Tcl(n) and Tcl_SetVar(3) for the precise (sort of) syntax.
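As an illustration of that syntax, here is a sketch that combines namespace separators with an array Index. The namespace ::demo and the array name table are invented for the example:

```tcl
namespace eval ::demo {
    variable table             ;# a Name in the ::demo namespace Context
}

# Fully qualified: leading ::, a namespace Name, a variable Name, an Index.
set ::demo::table(first) 1

# Relative qualifier, resolved here from the global namespace.
set demo::table(second) 2

set keys [lsort [array names ::demo::table]]
puts $keys
puts $::demo::table(first)
```

Both set commands end up in the same Array, since the two Variables resolve to the same Name in the same Context.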

Most operations on Variables (set, unset, trace, etc.) really operate on a Location. If the variable Name resolves to a Link, Tcl automatically follows the Link and operates on the resolved Location instead. That's why it's so hard to remove a binding from a Name to a Link in any Context once one has been created with upvar or variable.
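A short sketch of what following the Link means in practice. The proc names discard and reset are made up for the example:

```tcl
proc discard {varName} {
    upvar 1 $varName target    ;# "target" is a Link in this stack frame
    unset target               ;# follows the Link: the caller's variable goes
}

proc reset {varName} {
    upvar 1 $varName target
    unset target
    # The Link itself survives the unset, so setting "target"
    # recreates the caller's variable.
    set target "fresh"
}

set scratch 42
discard scratch
set gone [expr {![info exists scratch]}]
puts $gone

set scratch 42
reset scratch
puts $scratch
```

In neither proc is there a way to make target an ordinary local variable again; the binding only disappears when the procedure returns.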

I agree that the documentation is not very clear, and is quite possibly ambiguous or even wrong in several places. In particular, it's not at all clear what it means for a "variable" to "exist". As near as I can tell, for simple Variables (i.e., those which do not include an array index reference) it means that, after resolving namespace references and following Links, the variable denotes a Location (Scalar or Element) that currently holds a Value, or denotes an Array.
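The following sketch probes that reading of "exist" with info exists. The names (::existsDemo, pending, ready, emptyArr, neverMentioned) are invented for the example, and the behaviour shown is what a reasonably recent Tcl 8.x is expected to do:

```tcl
namespace eval ::existsDemo {
    variable pending           ;# Name declared, but its Location has no Value
    variable ready 1           ;# Name with a Value
}
array set emptyArr {}          ;# an Array with no Elements

set results [list \
    [info exists ::existsDemo::pending] \
    [info exists ::existsDemo::ready] \
    [info exists emptyArr] \
    [info exists emptyArr(x)] \
    [info exists neverMentioned]]
puts $results
```

A declared Name without a Value does not "exist", while an Array does, even when it holds no Elements.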

LES: The Endekalogue already provides some rules for the names of variables, but I thought this topic could be useful to clarify a few more details, especially to newcomers because Tcl allows a lot of liberty with the names of variables that other languages never have allowed and probably never will.

Since I am Brazilian and my language has more than the usual and meager 26 characters, I will begin with a few observations I have made in relation to special characters:

Tcl allows spaces in variable names. You just need to wrap the name in braces:

 % set {some variable name} 2
 % set {some variable name}
 2
 % puts ${some variable name}
 2

Tcl also allows special characters, so long as these are also wrapped in braces:

 % set !@weird¬* "yikes!"
 % set !@weird¬*
 yikes!
 % puts ${!@weird¬*}
 yikes!

 % set acentuação "sim!"
 % set acentuação
 sim!
 % puts ${acentuação}
 sim!

Unfortunately, the cedilla and/or accented vowels are not considered "normal" characters. I could really swear it was not like that until not too long ago. But I am probably mistaken. Anyway, it's still better than most languages.

Even though the braces are required whenever you refer to the content of the variable (when you use the $ sign), they are not required when you refer to the variable's name, which gives us a lot of liberty:

 % set !#@{blah}%¨& {1 2 3 4}
 % foreach  i  [ set !#@{blah}%¨& ] {puts $i}
 1
 2
 3
 4

It also allows many interesting visual tricks:

 % set foo "one little, two little, three little indians"
 % regexp -all {(one).*(two).*(three).*}  $foo  =>  !g1 !g2 !g3

The last example creates four variables:

=> : one little, two little, three little indians
!g1 : one
!g2 : two
!g3 : three

RS 2005-04-19: For readable code, it is best to use "conventional" variable names that start with a letter and continue with letters, digits, or underscores. But there are use cases for purely numeric variable names ($1, $2..), which are impossible in most programming languages:

when porting shell or awk scripts "minimally invasive" which used them already
in half-lambdas (pure function body, no explicit argument list)
when transposing a matrix as intermediate containers for rows
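A minimal sketch of the half-lambda use case. The proc name call-body is invented here; the idea is only that the arguments become variables named 1, 2, ... before the body runs:

```tcl
proc call-body {body args} {
    set n 0
    foreach arg $args {
        set [incr n] $arg      ;# creates local variables named 1, 2, 3, ...
    }
    eval $body                 ;# the body can refer to $1, $2, ...
}

puts [call-body {expr {$1 * $2 + $3}} 2 5 1]
```

The $ parser accepts digits, so $1 needs no braces. The helper's own locals (body, args, n, arg) are visible to the body as well, which a more careful version might want to avoid.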

"" as a variable name makes most sense for an array name, where you can write

 set (a) $(b)

But beware of conflicts, because stooop and some versions of Tkcon use that special name already :)

MS notes that there is one "interesting" exception to "any string can be a variable name": variable names starting with : or containing two or more : in a row are special.

They cannot begin with two or more :, as the leading colons will be taken to mean that the variable is in the global namespace, and the variable name is essentially stripped of them. They cannot contain two or more : in a row, as that will be understood to mean a namespace qualification.

    % set :::x 5
    5
    % set x
    5

If they begin with a single :, they are variables that do not have a qualified name. This means that they are only accessible from their own namespace, as any namespace qualifier will swallow the leading :

   % set y 4
   4
   % set :y 3
   3
   % set :::y
   4

(see also [Bug 458892] [L2 ])

LES: There is no point in making a new page just for the names of procs, but it might be worth noting that the $ symbol usually is the only thing that makes the braces required with unusual variable names, and that therefore there is no such problem when naming a proc. If the name of the proc contains any space, the braces are required. In almost all other cases, they are not:

 % proc ! {} {puts wow!}
 % !
 wow!

 % proc } {} {puts hi}
 % }
 hi

 % proc *áé.î+.ôü~çã' {} {puts "teste de acentuação"}
 % *áé.î+.ôü~çã'
 teste de acentuação

 % proc { } {} {puts space}
 % " "
 space

Does any other language in the world allow that kind of freedom? - (Lars H: What Postscript provides is probably comparable.)

RS likes these fancy names for little procs that provide character lists:

 proc 0-9 {} {list 0 1 2 3 4 5 6 7 8 9}
 proc a-z {} {list a b c d e f g h i j k l m n o p q r s t u v w x y z}

When called, they look like the regular expressions they stand for (but can be iterated over):

 foreach digit [0-9] {
    puts hello,$digit
 }

AMG: Clever! I think this is a good candidate for [unknown]. Possibilities: [0-9], [0-99], [00-09], [00-99], [99-0], [a-z], [A-Z], [A-N], [N-A], etc. More possibilities: [0-9'2] for {0 2 4 6 8}, [0-6'2,2-0,c-i'3] for {0 2 4 6 3 2 1 0 c f i}, and whatever other fun stuff we don't really need. :^) And because [unknown] is rumored to be slow, we can optimize the most frequently used lists with "proc $x {} [concat {list} [$x]]". How easy is that?

LV Why did you have 2-0 map to 3 2 1 0 ? That one puzzled me.
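The optimization AMG mentions can be sketched as follows. Here a plain proc that computes the list in a loop stands in for whatever an [unknown] handler would do; that stand-in is an assumption of this example, not part of the suggestion above:

```tcl
# Stand-in for a slowly computed range (an [unknown] handler is not shown).
proc 0-9 {} {
    set digits {}
    for {set i 0} {$i <= 9} {incr i} {
        lappend digits $i
    }
    return $digits
}

# Replace the proc with one whose body is the literal list.
set x 0-9
proc $x {} [concat {list} [$x]]

puts [info body 0-9]
puts [0-9]
```

After the redefinition the proc no longer loops; its body is just a list command with the ten digits as arguments.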
