What kinds of variable names can be used in Tcl

Purpose: discuss what is known about Tcl variable names. What are the limits, tricks, and tips?


LV: I've seen people show examples of Tcl variable names from 0 characters long to quite long (more than 30 I believe). The names can contain spaces and many other special characters. I would, myself, avoid the use of metacharacters such as [ ] , ( ) $ : { } and probably a few others, just to save myself the quoting nightmare that one might encounter.

I also believe that some extensions impose additional 'restrictions' - for instance, I believe that Tk generally expects widget names to begin with a period and not to use an upper case alpha as the first character.

RS: As far as I'm aware, a variable name may be any string, from the empty string up. (Well, sequences of two or more colons have special meaning, so they cannot be thrown in arbitrarily.) The $ parser restricts variable names to [A-Za-z0-9_], so it's wise to stay in that range; otherwise you might have to brace the name (which prevents Unicode expansion), or use [set \u20AC] if you have a variable name with the Euro sign.

Rolf Ade: I can do 'set ä foo; puts $ä' (this ä is a-diaeresis, if you only see a strange char). What do you mean by "The $ parser restricts variable names to [A-Za-z0-9_]"?

RS: I've heard there was a bug report so "national" letters would also be allowed by the $ parser... so it seems this has been fixed. Which version?

RS: In 8.2.3, "ä" is not taken by the $ parser.

Rolf Ade: 8.3.2

KBK: The bug is still open as of 17 January 2002: [1]. (Still open 1 February 2007, no activity since 2001!) Once it's fixed, there oughtn't to be anything wrong with a variable substitution like $hétérogénéité. (Heterogeneity of the languages accepted would be a good thing...) (Just checked: it depends on the native isalnum() call, but certainly you can't name a variable \u03b3 and expect the $-parser to take it, even when eval'ing a Unicode string.)

TV The delimiters of variable names are important. In bwise I use variables with dot notation, which works, but, for instance, printing such a variable or evaluating a string with the name in it requires quoting or list-ing. In fact, the $ parser treats the dot as a name separator anyway, so braces must be used to access the variable:

 set group.element 3.14
 puts ${group.element}

because this is wrong:

 puts $group.element
 puts $group\.element
 puts "$group.element"

i.e. that doesn't work. I didn't check whether the Tcl language spec explicitly allows dots in variable names, or whether such a case is mentioned at all, but I've got quite old bwise code that I don't want to go through again to change all the dots. Strictly speaking I guess they aren't needed, but I call blocks by a name, and then make block variables with name.vars notation, so one could (later) use

 info vars groupname.*

which is cool enough. I don't like typing two colons (::) in a row, but then again, I guess anything used as an explicit separator can work fine.

It seems that the constructs I use in bwise work in general; have a look at them for working examples of using these types of variable names in complex constructs, including indirect referencing and names containing Tcl code that can be eval-ed. It is no beauty contest for smooth code, but it is compact, efficient (as far as that goes at runtime) and, for me, manageable and free of tricks.

DKF: Note that periods are never going to be parsed by $ as being part of a variable name (so you'll always have to force the issue with ${} or [set]) because that makes it much easier to build Tk widget names.
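As a small illustration of DKF's point, here is a sketch that needs no Tk at all; the names parent, child and group.element are only examples. It shows why it is convenient that $ stops at a period, and the two ways to read a variable whose name does contain one.

```tcl
# $parent ends at the period, so a widget-style path can be
# built by simply appending ".save" to the parent's name.
set parent .toolbar
set child $parent.save
puts $child

# A variable whose name contains a period is set like any other,
# but reading it needs ${...} or the one-argument form of [set].
set group.element 3.14
puts ${group.element}
puts [set group.element]
```

If $ did accept periods, the first assignment would look for a variable called parent.save instead of building the path.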


Joe English wrote up this (now slightly edited) discussion on Tcl variables, after a discussion of variable vs upvar came up on comp.lang.tcl:

Variables in Tcl are actually rather complicated. I find it easiest if you don't think of "variables" as first-class entities themselves, but instead think of "Names" and "Contexts".

There are two different kinds of Contexts: namespace contexts (including the global namespace) and stack frames (i.e., inside a procedure body). The former are static, the latter dynamic.

[ Will Duquette added the following comment: "And this, I think, is the most important point. It's rare in real code to call upvar anywhere but in a procedure body. And as soon as the procedure returns, that Context is gone, and the upvar-created Name is gone with it." ]

A Context can map a Name to one of several things:

  • an Array (which is a collection of Elements indexed by Strings)
  • an Element (which is a Location (v.i.) in an array)
  • a Scalar (which is a Location (v.i.) not in an array)
  • a Link to an Array, Element, or Scalar.

(Note that you cannot create a Link to a Link; only one level of indirection is possible.)

[ Lars H adds the following comment: If true, that parenthetical remark needs clarification. It is quite possible (and often necessary) to "upvar 1" to a Name which was itself created by an "upvar 1", repeated an arbitrary number of levels. Consider the following

  proc nestset {levels varname value} {
     upvar 1 $varname var
     if {$levels>0} then {
        nestset [expr {$levels-1}] var $value
     } else {
        set var $value:[info level]
     }
  }
  nestset 10 myVar 5
  set myVar ; # Returns 5:11

This looks a lot like a Link to a Link to me. ]

A Location is simply a place to store a Value (which is a String). A Location can also have a collection of attached Traces, which we will not discuss further here.

The Value is optional. For example, a Location with no Value can still have Traces. Arrays can also have Traces, which again, we will not discuss.

Finally: a Variable is a String which is used to find a Location (or sometimes an Array) starting from a given Context. Variables consist of one or more Names, separated and optionally preceded by namespace separators (two or more colons), and optionally followed by an array Index in parentheses. See Rule #7 in Tcl(n) and Tcl_SetVar(3) for the precise (sort of) syntax.
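To make the pieces of such a Variable string concrete, here is a minimal sketch. The namespace ::app::config and the array settings are hypothetical names chosen for the example; namespace qualifiers and namespace tail are the standard commands for splitting a qualified name.

```tcl
# A namespace Context holding an Array with one Element.
namespace eval ::app::config {
    variable settings
    set settings(color) blue
}

# Fully qualified Variable: namespace separators, Names, then an Index.
puts $::app::config::settings(color)

# The same Variable assembled from its parts at run time.
set qualified ::app::config::settings
set index color
puts [set ${qualified}($index)]

# Splitting the qualified Name back into Context and Name.
puts [namespace qualifiers $qualified]
puts [namespace tail $qualified]
```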

Most operations on Variables (set, unset, trace, etc.) really operate on a Location. If the variable Name resolves to a Link, Tcl automatically follows the Link and operates on the resolved Location instead. That's why it's so hard to remove a binding from a Name to a Link in any Context once one has been created with upvar or variable.
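The link-following behaviour can be observed with a short sketch. The proc name clearAndReset and the variable counter are made up for the illustration; the behaviour of unset through an upvar alias is as described in the upvar documentation.

```tcl
proc clearAndReset {varName} {
    upvar 1 $varName alias
    # unset follows the Link: the caller's Location loses its Value,
    # but the Name "alias" in this stack frame is still a Link.
    unset alias
    set existsAfterUnset [uplevel 1 [list info exists $varName]]
    # Writing through the surviving Link recreates the caller's variable.
    set alias recreated
    return $existsAfterUnset
}

set counter 10
set existed [clearAndReset counter]
puts "existed after unset: $existed, counter is now: $counter"
```

The Link itself only disappears when the procedure returns and its stack frame is discarded.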

I agree that the documentation is not very clear, and is quite possibly ambiguous or even wrong in several places. In particular, it's not at all clear what it means for a "variable" to "exist". As near as I can tell, for simple Variables (i.e., those which do not include an array index reference) it means that, after resolving namespace references and following Links, the variable denotes a Location (Scalar or Element) that currently holds a Value, or an Array.
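One way to probe what "exist" means in practice is info exists. The sketch below uses a hypothetical namespace ::existsDemo; it relies on the documented behaviour that the variable command without a value creates the name but leaves it undefined.

```tcl
namespace eval ::existsDemo {
    variable pending          ;# Name declared, no Value stored yet
    variable ready 1          ;# Scalar holding a Value
    variable table
    array set table {k v}     ;# Array with a single Element
}

# Collect name/result pairs for each kind of Variable.
set report {}
foreach name {pending ready table table(k) table(missing)} {
    lappend report $name [info exists ::existsDemo::$name]
}
puts $report
```

A declared but unset Name reports 0, while a Scalar with a Value, an Array, and an existing Element all report 1.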


LES: The Endekalogue already provides some rules for the names of variables, but I thought this topic could be useful to clarify a few more details, especially to newcomers because Tcl allows a lot of liberty with the names of variables that other languages never have allowed and probably never will.

Since I am Brazilian and my native language has more than the "usual" 26 characters, I will begin with a few observations I have made in relation to special characters:

  • Tcl allows spaces in variable names. You just need to wrap the name in braces:
 % set {some variable name} 2
 % set {some variable name}
 2
 % puts ${some variable name}
 2
  • Tcl also allows special characters, so long as these are also wrapped in braces:
 % set [email protected]¬* "yikes!"
 % set [email protected]¬*
 yikes!
 % puts ${[email protected]¬*}
 yikes!

 % set acentuação "sim!"
 % set acentuação
 sim!
 % puts ${acentuação}
 sim!

Unfortunately, the cedilla and/or accented vowels (à, é, õ) are not considered "normal" characters. (EDIT: I just tested puts $acentuação with Tcl 8.5a4 in Tkcon and it works! - 01 Feb 2007) (As remarked above, Tcl internally relies on the C function isalnum for this test, so whether a character is recognized as a letter or not may depend on the platform, the C compiler used, the options passed to it, and perhaps even the current locale!) (In other words, it's not cross-platform and not using curly brackets is asking for trouble(?)-LES)

  • Even though the braces are required whenever you refer to the content of the variable (when you use the $ sign), they are not required when you refer to the variable's name, which gives us a lot of liberty:
 % set  !#@{blah}%¨&  {1 2 3 4}
 % foreach  i  [ set !#@{blah}%¨& ] {puts $i}
 1
 2
 3
 4

It also allows many interesting visual tricks:

 % set foo "one little, two little, three little indians"
 % regexp -all {(one).*(two).*(three).*}  $foo  =>  !g1 !g2 !g3

The last example creates four variables:

  • => : one little, two little, three little indians
  • !g1 : one
  • !g2 : two
  • !g3 : three

RS 2005-04-19: For readable code, it is best to use "conventional" variable names that start with a letter and continue with letters, digits, or underscores. But there are use cases for purely numeric variable names ($1, $2..), which are impossible in most programming languages:

  • when porting shell or awk scripts "minimally invasive" which used them already
  • in half-lambdas (pure function body, no explicit argument list)
  • when transposing a matrix as intermediate containers for rows
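As an illustration of the half-lambda case, here is a minimal sketch; the proc name halfLambda is hypothetical and the approach is just one possible reading of the idea. The body is a bare expression that refers to its arguments as $1, $2, and so on.

```tcl
proc halfLambda {body args} {
    # Store the arguments in local variables named 1, 2, 3, ...
    set n 0
    foreach arg $args {
        set [incr n] $arg
    }
    # The body is deliberately passed unbraced: expr itself
    # substitutes $1, $2, ... from this procedure's local variables.
    return [expr $body]
}

puts [halfLambda {$1 + $2 * $3} 2 3 4]
```

Because the positional names are purely numeric, they cannot collide with the helper's own locals (body, args, n, arg).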

"" as a variable name makes most sense for an array name, where you can write

 set (a) $(b)

But beware of conflicts, because stooop and some versions of Tkcon use that special name already :)

AMG: Empty string is also a valid array element name:

 % set var() abc
 abc
 % puts $var()
 abc
 % array get var
 {} abc

If you're crazy, you can even mix'n'match it with empty variable name:

 % set () abc
 abc
 % puts $()
 abc
 % array get ""
 {} abc

MS notes that there is one "interesting" exception to "any string can be a variable name": the variable names starting with : or containing two or more : in a row are special.

  • they cannot begin with two or more : - as then the leading characters will be taken to mean that the variable is in the global namespace, and the variable name is essentially stripped of the leading :. They cannot contain two or more : in a row, as that will be understood to mean a namespace qualification.
    % set :::x 5
    5
    % set x
    5
  • if they begin with a single :, they are variables that do not have a qualified name. This means that they are only accessible from their own namespace, as any namespace qualifier will swallow the leading :
   % set y 4
   4
   % set :y 3
   3
   % set :::y
   4

(see also [Bug 458892] [2])


LES: There is no point in making a new page just for the names of procs, but it might be worth noting that the $ symbol usually is the only thing that requires braces for unusual variable names, and that therefore there is no such problem when naming a proc. If the name of the proc contains any space, the braces are required. In almost all other cases, they are not:

 % proc ! {} {puts wow!}
 % !
 wow!

 % proc } {} {puts hi}
 % }
 hi

 % proc *áé.î+.ôü~çã' {} {puts "teste de acentuação"}
 % *áé.î+.ôü~çã'
 teste de acentuação

 % proc { } {} {puts space}
 % " "
 space

Wow. Does any other language in the world allow that kind of freedom? - (Lars H: What Postscript provides is probably comparable.)

RS likes these fancy names for little procs that provide character lists:

 proc 0-9 {} {list 0 1 2 3 4 5 6 7 8 9}
 proc a-z {} {list a b c d e f g h i j k l m n o p q r s t u v w x y z}

When called, they look like the regular expressions they stand for (but can be iterated over):

 foreach digit [0-9] {
    puts hello,$digit
 }

AMG: Clever! I think this is a good candidate for [unknown]. Possibilities: [0-9], [0-99], [00-09], [00-99], [99-0], [a-z], [A-Z], [A-N], [N-A], etc. More possibilities: [0-9'2] for {0 2 4 6 8}, [0-6'2,3-0,c-i'3] for {0 2 4 6 3 2 1 0 c f i}, and whatever other fun stuff we don't really need. :^) And because [unknown] is rumored to be slow, we can optimize the most frequently used lists with "proc $x {} [concat {list} [$x]]". How easy is that?
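A rough sketch of how the [unknown] idea might look, restricted to plain integer ranges without leading zeros (no letters, steps, or comma lists). The name originalUnknown is made up for the example, and {*} assumes Tcl 8.5 or later. Each generated list is cached as a real proc, in the spirit of the optimization mentioned above.

```tcl
# Keep the standard handler so everything else still works.
rename unknown originalUnknown

proc unknown {args} {
    set name [lindex $args 0]
    # Handle only a bare "N-M" command with no arguments.
    if {[llength $args] == 1
            && [regexp {^(0|[1-9][0-9]*)-(0|[1-9][0-9]*)$} $name -> from to]} {
        set step [expr {$from <= $to ? 1 : -1}]
        set result {}
        for {set i $from} {$i != $to + $step} {incr i $step} {
            lappend result $i
        }
        # Cache the list as a proc so [unknown] runs only once per range.
        proc ::$name {} [list return $result]
        return $result
    }
    # Anything else goes to the original handler in the caller's context.
    uplevel 1 [list originalUnknown {*}$args]
}

puts [0-9]
puts [3-0]
```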

LV Why did you have 2-0 map to 3 2 1 0 ? That one puzzled me.

AMG: Sorry, it's just a boo-boo. Thanks for pointing it out. I corrected my original comment so that it says 3-0.


iu2 I almost forgot that this could be done too... ;-)

 set 0-9 {0 1 2 3 4 5 6 7 8 9}
 foreach digit ${0-9} {puts $digit}

or, if you don't like pressing SHIFT:

 foreach digit [set 0-9] {puts $digit}

LV I wish we had real iterators though. I find myself occasionally needing to do things like:

 foreach i file[0-9][0-9] {
   # open fileNN for writing, where NN is replaced with 00-99
   # write out something
   # close the file
 }

Some versions of ksh can be configured to enable this functionality. The ability to create new filenames (which is not globbing in this case, since these file names may or may not exist) falls under filename generation. I'm not suggesting that Tcl get a new function for this small case, but true iterators would make things like this possible, as well as the other things that AMG and others have written above.

slebetman thinks that people here are trying to ask foreach to do too much. Remember that Tcl's for is very powerful indeed, implementing the full semantics of C's for loop:

  for {
    set n 0
    set i file00
  } {
    $n < 100
  } {
    incr n
    set i "file[format %02d $n]"
  } {
    doSomethingWith $i
  }

iu2 Here is a proc for iterating over all ranges in the argument list. A range consists of a pair of numbers.

  proc iter {args} {
    return [iter' {} $args]
  }

  proc iter' {r list} {
    if {[llength $list] <= 0} {return $r}
    for {set i [lindex $list 0]} {$i <= [lindex $list 1]} {incr i} {
      lappend res [iter' [concat $r $i] [lrange $list 2 end]]
    }
    return [join $res]
  }

examples:

  # the file names example
  foreach {i j} [iter 0 9 0 9] {
    puts [format file%d%d $i $j]
  }

  # XOR truth table for 4 digits
  foreach {1 2 3 4} [iter 0 1 0 1 0 1 0 1] {
    puts $1,$2,$3,$4-->[expr {$1^$2^$3^$4}]
  }

LV Thanks for examples of how to work around the missing functionality. Note that it is my understanding that one nice feature of the iterator is that values are only generated as needed - thus, if someone indicates they need numbers from 1 to 1 million, one doesn't have to wait until all million numbers are generated - they would be generated as needed, keeping delays (and memory usage) down significantly.

I'm not trying to complain about your code - just offering a perspective on the different approaches.
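For comparison, here is a minimal sketch of on-demand generation using a coroutine. This assumes Tcl 8.6 or later (coroutine and yield do not exist in earlier versions), and the names countFrom and nextNumber are invented for the example.

```tcl
proc countFrom {from to} {
    # Pause right after creation; values are produced one per call.
    yield
    for {set i $from} {$i <= $to} {incr i} {
        yield $i
    }
}

# Nothing beyond the first pause is computed here, despite the large bound.
coroutine nextNumber countFrom 1 1000000

# Pull only the values actually needed.
set firstThree {}
while {[llength $firstThree] < 3} {
    lappend firstThree [nextNumber]
}
puts $firstThree

# Discard the generator once we are done with it.
rename nextNumber {}
```

Only three values are ever generated, so neither time nor memory grows with the upper bound.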


iu2 A Generator I suppose... See also Looking at LISP's SERIES extension. I believe it all comes back to Tcl's lack of closures.

This is a generator I've written for generating values as needed, like a Python generator

NEM Closures would be nice, but for simple cases, non-essential. Use of upvar/uplevel can result in a satisfying experience:

 proc foreach-range {varName start end body} {
     upvar 1 $varName var
     for {set var $start} {$var <= $end} {incr var} { uplevel 1 $body }
 }
 foreach-range n 0 99 { puts [format file%02d $n] }

iu2 TclX has that: the loop command...


Lectus

One thing I love about Tcl that I can't find in any other language is this flexibility of variable names, complex string-based data structures and how easy it is to create new language constructs. (EDIT: Thanks for all the suggestions! I optimized the code and reduced its size.)

proc range {r {step '} {n 1}} {
        array set m {}
        if { ![ regexp -all {([0-9]+)-([0-9]+)} $r -> m(1) m(2) ] } {
                return [list]
        }
        
        if {$m(1) == $m(2)} { return [list $m(1)] }
        
        set templist [list]
        
        if {$m(1) < $m(2)} {                
                for {set i $m(1)} {$i <= $m(2)} {incr i $n} {
                        lappend templist $i
                }
                return $templist
        }
        
        if {$m(1) > $m(2)} {                
                for {set i $m(1)} {$i >= $m(2)} {incr i [expr {-$n}]} {
                        lappend templist $i
                }
                return $templist
        }
        
        return [list]
}

puts [range 0-9]
puts ---
puts [range 100-50]
puts ---
puts [range 0-10 step 2]
# The most useful way to use this is inside a foreach: foreach x [range 1-30] { #do something with x }
# which is shorter than writing a full for command.

The output of the script above:

0 1 2 3 4 5 6 7 8 9
---
100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50
---
0 2 4 6 8 10