Tcl style discussion

Purpose: Provide an area for discussions on the fine points of programming style in Tcl.

Contents:

  • Loops with multiple induction variables (10 Oct 2000)
  • When are semicolons stylistically appropriate? (11 Oct 2000)

Loops with multiple induction variables

KBK (10 Oct 2000): Some edits that DKF made to a modular exponentiation program in Sample Math Programs raise an interesting question of Tcl style that the Tcl style manual [1] doesn't really address, although section 6.4 comes close by stating that there should be only one command per line.

The question arises when a loop has more than one induction variable; this is a fairly common situation.

When a loop is simultaneously controlling more than one variable, I tend to prefer the style where all the induction appears in the initialization and reinitialization blocks of the for command. (Among other things, I see it as an indication that the variables are induction variables.) Hence, for the modular exponentiation table, I wrote:

  proc modular_exponent_table { b p } {
      for { set n 1; set value [expr { $b % $p }] } \
          { $n < $p } \
          { incr n; set value [expr { ( $value * $b ) % $p }] } \
          {
              puts "( $b ** $n ) mod $p == $value"
          }
      return
  }

Realizing that this violates the letter of section 6.4 about avoiding the semicolon as a command delimiter, I might have written (were it not for the fact that this style horribly confuses the indentation functions of my editor):

  proc modular_exponent_table { b p } {
      for { 
              set  n     1
              set  value [expr { $b % $p }]
          } {
              $n < $p
          } {
              incr n
              set  value [expr { ( $value * $b ) % $p }]
          } {
              puts "( $b ** $n ) mod $p == $value"
          }
      return
  }

again, stressing that value is equally a loop control variable. (In fact, value > 1 might be an equally good termination condition, resulting in a shorter table where b is not a primitive root.)

DKF edited the code, instead, to look like:

 proc modular_exponent_table { b p } {
    set value 1
    for {set n 1} {$n < $p} {incr n} {
       set value [expr {($value * $b) % $p}]
       puts "( $b ** $n ) mod $p == $value"
    }
 }

which is unquestionably more compact, but loses the idea that the for is controlling value as well as n. Were I to resort to that style, I might actually go all the way and extract both variables:

  proc modular_exponent_table { b p } {
      set n 1
      set value 1
      while { $n < $p } {
          set value [expr { ( $value * $b ) % $p }]
          puts "( $b ** $n ) mod $p == $value"
          incr n
      }
  }

But somehow, neither one of these seems quite to capture the same idea as the original for, which I saw as being analogous to the extremely Tclish:

  foreach { index value } [array get myArray] {
      # ... do stuff with $index and $value
  }

or

  foreach { keyword value } $args {
      # ... do stuff with $keyword and $value
  }

But that may be my personal prejudice; I realize that extending the idea of multiple loop variables to for is weird the first time you see it.

Have the other participants here any thoughts on this?


DKF's modification of the code is appropriate. In your example, and in almost all code, there is only one controlling variable in a for or while loop. The variable value is calculated iteratively, but it does not, in fact, control the loop. 10/10/00 - RWT


DKF - For me, the question hinges on what exactly is being iterated over. It is a well-known property of finite-field arithmetic that b^(p-1) mod p is 1 (whenever p is prime and b is a cardinal greater than one that is not a multiple of p, so we exclude annoying end-cases!) Thus, to generate the finite field powers, we simply need to iterate over a reasonably small number of values - IOW, we can precalculate the number of entries in the table, and do so trivially.

Now a good principle in computing (that applies even more so when working with floating-point numbers, but I digress) is that wherever you are iterating over a finite set, you should use something that generates the finite set directly, selecting definite values (i.e. list items or integers) by the simplest possible algorithm. That makes it easier to demonstrate that your code terminates: you should always make the demonstration of finiteness of your loops as simple as possible mathematically, and the single simple iterator technique (foreach has one internally) is the best way to do it in practice. It is also the easiest to optimize.

This then leads to the use of a simple iterator scheme as in my code, which describes the fact that we are iterating from 1 to p-1 simply and factors the complex computations out into the body of the loop. This also has the advantage of keeping the loop control parts on a single line, which makes the loop easier to understand at a glance. It also shortens the code overall, at a cost of one extra variable read, multiply and assignment.

The only case where I might have been tempted to omit the variable is where the loop variable is itself otherwise completely un-needed. Then, the principle of useless variable elimination applies, giving code like this:

 proc metab {b p} {
    for {set v [expr {$b%$p}]} {$v!=1} {set v [expr {($v*$b)%$p}]} {
       puts $v
    }
 }

We can even generate the same output with a little more work (but we do need to reintroduce the iterator in some form.)

 proc metab {b p} {
    set n 0
    for {set v [expr {$b%$p}]} {$v!=1} {set v [expr {($v*$b)%$p}]} {
       puts "( $b ** [incr n] ) mod $p == $v"
    }
 }

However, I'm not too sure that this really is clearer. The setup and update sections are much harder to understand fully at a glance, and it is certainly not entirely obvious that this is guaranteed to terminate (in fact, it will fail when b==p) unlike the other version where the termination proof is trivial. It also cannot be cleanly extended to display the further values from the sequence, something which may well be useful for pedagogic purposes...


KBK (11 October 2000) -- OK, I'll believe that the key concept is whether the variable actually controls the loop. The 'metab' procedure, though, is somewhat more pleasing in that it catches the cycles that are found when b is not a primitive root. DKF has a valid point, though, that the termination condition is far from obvious.

Perhaps a better compromise is obtained by using break:

 proc yet-another-table { b p } {
     set v [expr { $b % $p }]
     for { set n 1 } { $n < $p - 1 } { incr n } {
         puts "( $b ** $n ) mod $p == $v"
         # $v == 0 can happen if b divides p; it's included
         #         in case the alleged 'prime' p is composite.
         if { $v == 0 || $v == 1 } {
             break
         }
         set v [expr { ( $v * $b ) % $p }]
     }
     return; # required per the Tcl Style Guide
 }

This version makes obvious the fact that the loop terminates, and still manages to exit it when a cycle is found. It also makes v start to feel like an induction variable again, and this feeling would tend to lead me back toward something like what I posted originally.

Nevertheless, I'll accept Donal's correction.


Lars H, 13 July 2005 (i.e., years later): It occurred to me today that this "does the variable control the loop" criterion is a red herring. What one should ask is whether moving the update of the second variable into the body of the loop would make continue misbehave, or otherwise make the code needlessly complicated (since the "update second variable" code would have to be duplicated in front of every continue).
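
The continue problem can be seen in a minimal sketch. The procedure names powers_for and powers_body, and the rule of skipping even exponents, are made up for illustration and do not come from the discussion above. Both procedures collect b**n mod p for odd n only:

```tcl
# Hypothetical example: both variables are updated in the "next" clause,
# so continue cannot skip the update of v.
proc powers_for { b p } {
    set result {}
    for { set n 1; set v [expr { $b % $p }] } \
        { $n < $p } \
        { incr n; set v [expr { ($v * $b) % $p }] } \
        {
            if { $n % 2 == 0 } {
                continue
            }
            lappend result $v
        }
    return $result
}

# Hypothetical example: the update of v lives in the body, so it has to be
# repeated in front of the continue to keep v in step with n.
proc powers_body { b p } {
    set result {}
    set v [expr { $b % $p }]
    for { set n 1 } { $n < $p } { incr n } {
        if { $n % 2 == 0 } {
            # Duplicated update; omitting it would leave v stale
            set v [expr { ($v * $b) % $p }]
            continue
        }
        lappend result $v
        set v [expr { ($v * $b) % $p }]
    }
    return $result
}
```

Both procedures should return the same list for the same arguments; the second one only stays correct as long as every continue is preceded by a copy of the update.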


When are semicolons stylistically appropriate?

KBK (11 October 2000): The Tcl Style Guide forbids the use of semicolons as command delimiters; there is nonetheless a place for them. One fairly noncontroversial use is laying out namespaces:

 namespace eval ::animals {

     variable gnats 0;        # Count of gnats in the system

     variable gnus 1;         # Count of gnus in the system

     namespace export bite;   # Procedure to make a gnat bite a gnu

     namespace export strain; # Procedure to make a gnu strain at a gnat

     # (Of course, there will be appropriate headers above the
     # procedures themselves!)
 }

Using semicolons for commenting lines that are not intended to declare procedures, variables, and so on is not appropriate. In other words, only commands such as 'global', 'variable', 'namespace export', and 'upvar' are suitable for this usage.

Bad:

 set x 1;                     # Initialize the index

Preferred:

 # Initialize the index

 set x 1

And separating multiple executable commands with them is even worse:

Horrible:

 set x 1; set y 2; set result [strainMultipleGnats $x $y]

Another appropriate use of semicolons is generating intentionally obfuscated code!

I'm wondering, more or less idly, where else the evil semicolon is appropriate.

DKF - I use the semicolon for same-line-comments quite a bit (especially where the comment is a light-weight one, where this is a wooly definition based on my gut feeling) and virtually nowhere else. The other case where I think it is fair - loops with double-iterators - is extremely rare.

Curiously, it is more common with lists, but the [foreach] command provides a different mechanism for that...
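
For readers who have not met it, the mechanism is presumably foreach's ability to walk several lists in step. A minimal sketch, with made-up list contents and variable names:

```tcl
# Hypothetical data: two lists of equal length, traversed in parallel.
set names  {gnat gnu}
set counts {0 1}

set report {}
foreach name $names count $counts {
    # Each iteration takes the next element from both lists
    lappend report "$name=$count"
}
```

No semicolon is needed here, because each loop variable is paired with its own list in the foreach arguments.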

KBK (12 October 2000) - I used to put end-of-line comments in quite a bit, but Ray Johnson eventually convinced me that it was a bad practice. If you use long lines, they can be obvious - floating far to the right. But in that case, many editing and printing tools give awkward results when applied to the code; much of the world seems to assume that lines are 80 characters, max. If the lines are shorter, the comments don't stand out as much visually, and the code takes on a cluttered appearance.

The exception is in things like the [variable] commands in the example above, where things lay out conveniently in a more-or-less tabular form: command, parameter, explanation.

Another case for end-of-line comments is the labeling of [switch] cases, but this doesn't need a semicolon:

 switch -- $value {
     1 {                    # 1 means that we are stressing gnus
         # ...
     }
     2 {                    # 2 means that we are straining at gnats
         # ... 
     }
     default {              # anything else is a bug
         error "can't happen"
     }
 }

(1) But every command, including variable and namespace, is executable in Tcl! - RS

KBK (12 October 2000) - I rephrased the offending paragraph. As you probably know, I meant to distinguish commands having declarative intent from those having imperative intent.


It is often necessary to use a semi-colon (and then a comment sign, #) in variable traces so that the name1, name2, op arguments that will be [concat]ed onto the script prior to evaluation are ignored (HaO or list }, example: traceback).

EG Starting from 8.5, you can use apply to avoid this (IMO ugly) use of semicolon. So instead of

 trace add variable myVar read {puts "myVar being read"; #}

you write

 trace add variable myVar read {apply {args {puts "myVar being read"}}}
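
To see what the lambda actually receives, here is a small sketch that records the appended arguments instead of printing a message. The global name ::traceLog and the value 42 are made up for illustration:

```tcl
# Hypothetical names, for illustration only.
set myVar 42
set ::traceLog {}

# Tcl appends name1, name2 and op to the trace command; the lambda's
# "args" parameter collects them, so no trailing "; #" is needed.
trace add variable myVar read {apply {args {
    lappend ::traceLog $args
}}}

# Reading the variable fires the trace once
set copy $myVar
```

After the read, ::traceLog should hold a single entry consisting of the variable name, an empty element name (myVar is a scalar) and the operation read.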