decomment

AMG: The [decomment] command removes comments from a string. This makes it possible to put comments into source code in locations where comments are normally not permitted.

In addition to standard Tcl comments, [decomment] also supports word (a.k.a. inline) comments. A word comment is a word (most likely a braced word, but that's not required) immediately preceded by {#} without any intervening spaces. Word comments may span multiple lines, and they may appear in the middle of a single line.

Because ; is significant only in a script context and [decomment] operates in a list context, [decomment] does not recognize # as a comment introducer when it is immediately preceded by ;, even though a Tcl script would. Thus, [decomment] is not fully suitable for removing comments from a script, only from a list.
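The distinction between list context and script context can be seen with core commands alone. The following is a minimal sketch; the sample string is made up for illustration, and [decomment] itself is not needed to run it.

```tcl
# A made-up sample line that ends in a ";#" script comment.
set text {set x 1 ;# note}

# As a list, ";#" is just another element, so nothing here is a comment.
puts [llength $text]     ;# 5
puts [lindex $text 3]    ;# ;#

# As a script, ";" ends the command and "#" then starts a comment.
eval $text
puts $x                  ;# 1
```

Since [decomment] takes the list view, it would leave a string like this unchanged.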

Examples

The final argument to the switch command is a list alternating between patterns and script bodies. A comment in a switch therefore normally has to go inside one of the script bodies. This is not at all obvious to new Tcl programmers and is a frequent source of confusion. The [decomment] command makes it possible to put comments between the pattern/body pairs as well:

switch -regexp $input [decomment {
# Handle words starting with a capital letter
^[A-Z] {
    # Do the thing
    theThing
}
# Handle everything else
default {
    # Don't do the thing
}}]

The above is equivalent to typing:

switch -regexp $input {

^[A-Z] {
    # Do the thing
    theThing
}

default {
    # Don't do the thing
}}

Notice that the comments inside the switch script bodies are not removed. This is because they are brace-quoted. The [decomment] command effectively treats its argument as a list where the elements may be interspersed with comments, and it removes the comments. That some elements look like they contain comments does not make those elements comments themselves, so they are retained.

Also notice the extra blank lines before each pattern/script pair. This occurs because [decomment] does not actually do list processing, only text processing, and its output is a string, not a pure list.
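To see why a comment placed between switch arms fails without [decomment], it can help to look at how the list parser sees such a body. This is a small sketch using only core commands; the variable name arms and the arm contents are made up for illustration.

```tcl
# A switch body with a comment placed between the arms (illustrative only).
set arms {
    # Handle a
    a {set result A}
    default {set result other}
}

# The list parser treats "#", "Handle", and "a" as ordinary elements.
puts [llength $arms]     ;# 7
puts [lindex $arms 0]    ;# #

# With an odd element count, switch cannot pair patterns with bodies,
# so it raises an error instead of running an arm.
puts [catch {switch a $arms} msg]    ;# 1
```

Had the comment contained an even number of words, switch would not have complained; it would have silently taken the comment words as extra pattern/body pairs.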

For another example, see [L1 ].

Here is an example of [decomment]'s support for word (a.k.a. inline) comments:

% decomment {
a {#}{this is a comment} b {#}"this is also a comment" c {#}this\ is\ a\ comment d
e {#}{this is a comment
that spans
multiple lines} f}

a  b  c  d
e  f

The extra spaces between output elements are there because [decomment] doesn't do list processing and simply removes the comment text, which happens to be surrounded on either side by spaces.
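The leftover whitespace is harmless when the result is used as a list. The sketch below starts from the output shown above, typed in as a literal string, and uses only core commands; the variable name out is an assumption made for illustration.

```tcl
# The output shown above, written as a literal string.
set out "\na  b  c  d\ne  f"

# List commands ignore the extra spaces and newlines between elements.
puts [llength $out]      ;# 6

# Rebuilding the list produces a canonical form when one is wanted.
puts [list {*}$out]      ;# a b c d e f
```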

Code

# decomment --
# Removes all comments from a list-formatted string.  A comment starts with "#"
# appearing anywhere a word can begin, and it continues until the next newline.
# Comments may be extended across multiple lines by preceding the newline with
# an odd number of backslashes.  A comment may also be a word preceded by "{#}"
# with no intervening whitespace, for example "{#}{this is a comment}".
proc decomment {str} {
    # Find the start of each word in the string.
    set start 0
    while {[regexp -indices -start $start {[^ \f\n\r\t\v]} $str match]} {
        # Check if this is a word, as opposed to a line comment.
        set start [lindex $match 0]
        if {[set comment [regexp -start $start\
                {\A\{#\}[^ \f\n\r\t\v#]} $str]]} {
            incr start 3
        }

        # Find the end of the word.
        switch [string index $str $start] {
        \" {
            # Quoted word: find the close quote.
            if {![regexp -indices -start $start\
                    {\"(?:[^\"\\]+|\\.)*\"} $str end]} {
                return -code error "unmatched open quote in list"
            }
            if {[regexp -start [expr {[lindex $end 1] + 1}]\
                    {\A[^ \f\n\r\t\v]+} $str bad]} {
                return -code error "list element in quotes followed by\
                        \"$bad\" instead of space"
            }
        } \{ {
            # Braced word: find the close brace.
            set level 1
            incr start
            while {1} {
                if {![regexp -indices -start $start {[{}]} $str end]} {
                    return -code error "unmatched open brace in list"
                }
                set start [expr {[lindex $end 1] + 1}]
                if {[string index $str [lindex $end 1]] eq "\{"} {
                    incr level
                } elseif {$level > 1} {
                    incr level -1
                } elseif {[regexp -start $start {\A[^ \f\n\r\t\v]+} $str bad]} {
                    return -code error "list element in braces followed by\
                            \"$bad\" instead of space"
                } else {
                    break
                }
            }
        } # {
            # Comment word: find the end of line.
            regexp -indices -start $start {(?:[^\n\\]+|\\.|\\$)*} $str end
            set comment 1
        } default {
            # Bare word: find the next whitespace.
            regexp -indices -start $start\
                    {(?:[^ \f\n\r\t\v\\]+|\\.|\\$)*} $str end
        }}

        if {$comment} {
            # Remove comments.
            set start [lindex $match 0]
            set str [string replace $str $start [lindex $end 1]]
        } else {
            # Advance past non-comments.
            set start [expr {[lindex $end 1] + 1}]
        }
    }

    # Return the string with all comments removed.
    return $str
}

aplsimple - 2019-02-14 09:32:32

You might wrap the "canonical switch" in a command that applies decomment to its body.

For example, this way:

proc sw {args} {
  set args "[lrange $args 0 end-1] \{[decomment [lindex $args end]]\}"
  uplevel 1 [list switch {*}$args]
}

proc theThing {} {
  puts "theThing runs: [incr ::theThingCount]"
}

set input "Removing comments huh"

# --------------- Canonical switch
switch -regexp $input {
^[A-Z] {
    # Do the thing
    theThing
}
default {
    # Don't do the thing
    puts Default
}}

# --------------- Decommented switch
switch -regexp $input [decomment {
# Handle words starting with a capital letter
^[A-Z] {
    # Do the thing
    theThing
}
# Handle everything else
default {
    # Don't do the thing
}}]

# --------------- New switch
sw -regexp $input {
# Handle words starting with a capital letter
^[A-Z] {
    # Do the thing
    theThing
}
# Handle everything else
default {
    # Don't do the thing
}}

By the way, this example of 'sw proc' demonstrates the harmfulness of:

 return [uplevel 1 [CODE]]

as this construct (falsely) assumes that the CODE is run in the caller's context. However, the CODE is executed in sw's own context before being upleveled, and only its result reaches uplevel! With this construct, the particular CODE above would return "" (after running theThing), so uplevel would evaluate nothing and 'return' would get "" to return. Uff.

So, the only working constructs are:

  uplevel 1 "switch {*}$args"       ;# working but is not so good, due to the string interpolation
  uplevel 1 [list switch {*}$args]  ;# good

The demo of the difference:

proc sw {args} {
  set args "[lrange $args 0 end-1] \{[decomment [lindex $args end]]\}"
#  return [uplevel 1 [switch {*}$args]]      ;# not working
#  uplevel 1 "switch {*}$args"       ;# working but is not so good, due to the string interpolation
  uplevel 1 [list switch {*}$args]  ;# good
}
set err 1-2-3
sw -- $err {
# 1st
# 1st continued
    1 {
      puts {err found}
    }
#second comment
# 1
# 1 2
    1-2 {
      puts {err 1-2 found}
    }
# 1 2 3 default
    default {
      puts "Program encountered code $err"
      set err --???--
    }
}
puts $err
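The same pitfall can be isolated without [decomment]. The following is a minimal sketch; swBad, swGood, and demo are hypothetical names introduced only for this illustration.

```tcl
# Wrong: switch runs here, inside swBad's own frame, and only its
# result is handed to uplevel as a script.
proc swBad {args} {
    uplevel 1 [switch {*}$args]
}

# Right: build the command as a list and let uplevel run it in the caller.
proc swGood {args} {
    uplevel 1 [list switch {*}$args]
}

# Calls the given wrapper and reports whether the caller's variable changed.
proc demo {cmd} {
    set v before
    # catch hides the error swBad raises when uplevel tries to run
    # the switch result ("changed") as a command.
    catch {
        $cmd -- x {
            x {set v changed}
        }
    }
    return $v
}

puts [demo swGood]    ;# changed
puts [demo swBad]     ;# before
```

With swBad, the body sets a variable local to swBad, so the caller's v is never touched.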

We might rename 'switch' to SWITCH, then modify 'decomment' and 'sw' roughly like this:

proc decomment {str {switch switch}} {
...
  $switch [string index $str $start] {
...
rename switch SWITCH
proc switch {args} {
  set args "[lrange $args 0 end-1] \{[decomment [lindex $args end] SWITCH]\}"
  uplevel 1 [list SWITCH {*}$args]
}

...and then we could use the plain 'switch' with comments between its arms.

To be fair, though, all this beauty is not as efficient as it could be.

For example:

puts time1=[time {
  switch 1 {
    1 {
      # some innocent
      set a 1
    }
    default {
      # reachless
    }
  }
} 10000]
puts time2=[time {
  sw 1 {
    1 {
      # some innocent
      set a 1
    }
    default {
      # reachless
    }
  }
} 10000]

prints something like this:

time1=0.2361 microseconds per iteration
time2=43.954 microseconds per iteration

(It is no better after renaming switch.)

That is, the new 'switch' is roughly 150-200 times slower than the 'canonical switch'. Another lament about 'why Tcl isn't good for calculations'.

AMG: There are several reasons why [decomment] will be slow wherever it is used.

For one, Tcl scripts are not very good at parsing Tcl. That is a task done much better by C, but the innards of the Tcl parser are not sufficiently exposed for a script to be able to ask for the character indexes corresponding to word boundaries, particularly not when the string being parsed is not itself a well-formed Tcl list. Thus, I had to implement a new parser myself, in Tcl, which is far slower than C for this type of task.

For another, the ability to define Tcl_Obj types is only available at the C level, not at the script level. Therefore, [decomment] cannot cache its output the same way the list commands can. The list commands store the output of the parser directly in the Tcl_Obj value, providing a tremendous performance boost as well as automatic and effective lifecycle management. The best [decomment] can do is memoizing, except that has the potential for unbounded caching of single-use values that can never be cleaned up.
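As a rough illustration of the memoizing idea and its caching problem, here is a minimal sketch of a script-level cache with a crude size cap. Everything in it is an assumption made for illustration: the memo namespace, the limit of 100, and slowTransform, which merely stands in for an expensive pure command such as [decomment].

```tcl
# Hypothetical stand-in for an expensive, pure string transformation.
proc slowTransform {str} {
    return [string toupper $str]
}

namespace eval memo {
    variable cache [dict create]
    variable limit 100   ;# assumed cap, chosen arbitrarily
}

# Calls "$cmd $arg" in the caller's context, reusing an earlier result
# when the same command and argument have been seen before.
proc memo::call {cmd arg} {
    variable cache
    variable limit
    set key [list $cmd $arg]
    if {[dict exists $cache $key]} {
        return [dict get $cache $key]
    }
    set result [uplevel 1 [list $cmd $arg]]
    # Crude bound: discard the whole cache once it reaches the cap.
    if {[dict size $cache] >= $limit} {
        set cache [dict create]
    }
    dict set cache $key $result
    return $result
}

puts [memo::call slowTransform {a b c}]   ;# A B C
puts [memo::call slowTransform {a b c}]   ;# A B C, this time from the cache
```

Unlike a Tcl_Obj internal representation, this cache has no link to the lifetime of the values it holds, so it can only be bounded by blunt measures like the cap above.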

One of the reasons I wrote [decomment] is to let people experiment with {#}{...} syntax and see if it is useful in practice and is worth integrating into the Tcl core. It may be convenient enough to make it into code for which performance is not an issue, such as one-time initialization. But that's about it. This exists more for research than practical purposes.

See also

scripted list
Can be used to the same effect.