Version 10 of decomment

Updated 2019-02-14 16:59:13 by aplsimple

AMG: The [decomment] command removes comments from a string. This makes it possible to put comments into source code in locations where comments are normally not permitted.

See Also

scripted list
Can be used to the same effect.

Description

In addition to standard Tcl comments, [decomment] also supports word (a.k.a. inline) comments. A word comment is a word (most likely a braced word, but that's not required) immediately preceded by {#}, with no intervening whitespace; the word itself may not begin with #. Word comments may span multiple lines, and they may appear in the middle of a single line.
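Note that Tcl's own list parser does not accept the {#} syntax at all, because a braced element must be followed by whitespace. Text containing word comments therefore has to go through [decomment] before it can be used as a list. Here is a small sketch in plain Tcl (no [decomment] involved; the variable names are only for illustration):

```tcl
# Illustration only: Tcl's list parser requires whitespace (or the end of
# the string) after a braced element, so "{#}{...}" is not a valid list.
set text {a {#}{this is a comment} b}
set rc [catch {llength $text} msg]
puts "llength failed: $rc ($msg)"

# Once the {#} prefix and the commented word are removed, the list is valid.
set cleaned {a  b}
puts [llength $cleaned]   ;# 2
```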

Because ; is significant only in a script context and [decomment] operates in a list context, [decomment] does not recognize # as a comment introducer when it immediately follows ;, whereas a Tcl script would. Thus, [decomment] is not entirely suitable for removing comments from a script; it is meant for lists.
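The difference is easy to see in plain Tcl. In this sketch (variable names are only for illustration), the same ;# that starts a comment in a script is just an ordinary element in a list:

```tcl
# In a script, ";" ends the command, so the "#" after it starts a comment.
set x 1 ;# this is a comment

# In a list, ";" has no special meaning: ";#" is an ordinary element,
# and the words after it are ordinary elements too.
set items {a ;# b}
puts [llength $items]   ;# 3
puts [lindex $items 1]  ;# prints ";#"
```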

Examples

The final argument to the switch command is a list alternating between patterns and script bodies. Because of this, a comment inside a switch normally has to go inside a script body; a # placed between pattern/body pairs is read as a pattern, not as a comment. This is not at all obvious to new Tcl programmers and is a frequent source of confusion. The [decomment] command makes it possible to put comments between the pairs anyway:

switch -regexp $input [decomment {
# Handle words starting with a capital letter
^[A-Z] {
    # Do the thing
    theThing
}
# Handle everything else
default {
    # Don't do the thing
}}]

The above is equivalent to typing:

switch -regexp $input {

^[A-Z] {
    # Do the thing
    theThing
}

default {
    # Don't do the thing
}}

Notice that the comments inside the switch script bodies are not removed, because they are inside brace-quoted words. The [decomment] command effectively treats its argument as a list whose elements may be interspersed with comments, and it removes only those comments. An element that merely contains something that looks like a comment is not itself a comment, so it is retained.
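The same point, shown in plain Tcl without [decomment] (a sketch; the variable names are only for illustration): as a list, a pattern and its braced body are just two elements, and the comment is part of the body's text until switch evaluates that body as a script.

```tcl
# Illustration only: one pattern/body pair whose body contains a comment.
set pairs {
    x {
        # This line is part of the body's text, not a separate list element.
        set result ok
    }
}
# Two elements: the pattern "x" and its whole braced body.
puts [llength $pairs]   ;# 2
# The comment text survives inside the body element...
puts [string match "*# This line*" [lindex $pairs 1]]   ;# 1
# ...and is only treated as a comment when switch evaluates the body.
puts [switch -- x $pairs]   ;# ok
```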

Also notice the extra blank lines before each pattern/script pair. They appear because [decomment] does only text processing, not list processing: it deletes the comment text but leaves the newline that ended each comment line, and its output is a string, not a pure list.
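For contrast, here is a sketch of what happens when comment lines are left between the pattern/body pairs and [decomment] is not used. Since switch reads its last argument as a list, # and each word after it become patterns and bodies (variable names are only for illustration):

```tcl
# Illustration only. Even number of comment words: switch silently pairs
# the pattern "#" with the "body" fruit.
set body {
    # fruit
    apple {set result matched}
    default {set result other}
}
puts [llength $body]             ;# 6
puts [switch -- apple $body]     ;# matched
# An input of "#" would try to run fruit as a command.
set rc1 [catch {switch -- # $body} msg1]

# Odd number of comment words: the pattern/body pairs no longer line up,
# so switch raises an error.
set body2 {
    # two words
    apple {set result matched}
    default {set result other}
}
set rc2 [catch {switch -- apple $body2} msg2]
puts "$rc1 $rc2"                 ;# 1 1
```

Whether the mistake fails loudly or silently depends only on how many words the misplaced comment happens to contain.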

For another example, see [L1 ]. (When it updates, that is. As of 12 Feb 2019, the Tcler's Wiki is plagued with stale cached pages. Try this link if the last one doesn't work: [L2 ].)

Here is an example of [decomment]'s support for word (a.k.a. inline) comments:

% decomment {
a {#}{this is a comment} b {#}"this is also a comment" c {#}this\ is\ a\ comment d
e {#}{this is a comment
that spans
multiple lines} f}

a  b  c  d
e  f

The extra spaces between output elements are there because [decomment] doesn't do list processing; it simply removes the comment text, which happens to be surrounded on either side by spaces.
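If the spacing matters, one option, a sketch that is not part of [decomment] itself, is to rebuild the result as a canonical list with list {*}. This also joins the lines, since a list has no notion of line breaks, and elements that need quoting (such as multi-line script bodies) come back brace-quoted. Here a literal string stands in for the output above:

```tcl
# Stand-in for the output of the word-comment example (illustration only).
set decommented "a  b  c  d\ne  f"
# Expanding the string as a list and rebuilding it yields single spaces.
set canonical [list {*}$decommented]
puts $canonical   ;# a b c d e f
```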

Code

# decomment --
# Removes all comments from a list-formatted string.  A comment starts with "#"
# appearing anywhere a word can begin, and it continues until the next newline.
# Comments may be extended across multiple lines by preceding the newline with
# an odd number of backslashes.  A comment may also be a word preceded by "{#}"
# with no intervening whitespace, for example "{#}{this is a comment}".
proc decomment {str} {
    # Find the start of each word in the string.
    set start 0
    while {[regexp -indices -start $start {[^ \f\n\r\t\v]} $str match]} {
        # Check if this is a word, as opposed to a line comment.
        set start [lindex $match 0]
        if {[set comment [regexp -start $start\
                {\A\{#\}[^ \f\n\r\t\v#]} $str]]} {
            incr start 3
        }

        # Find the end of the word.
        switch [string index $str $start] {
        \" {
            # Quoted word: find the close quote.
            if {![regexp -indices -start $start\
                    {\"(?:[^\"\\]+|\\.)*\"} $str end]} {
                return -code error "unmatched open quote in list"
            }
            if {[regexp -start [expr {[lindex $end 1] + 1}]\
                    {\A[^ \f\n\r\t\v]+} $str bad]} {
                return -code error "list element in quotes followed by\
                        \"$bad\" instead of space"
            }
        } \{ {
            # Braced word: find the close brace.  Backslash sequences are
            # skipped, as in Tcl's own list parser, so an escaped brace
            # does not change the nesting level.
            set level 1
            incr start
            while {1} {
                if {![regexp -indices -start $start {\\.|[{}]} $str end]} {
                    return -code error "unmatched open brace in list"
                }
                set start [expr {[lindex $end 1] + 1}]
                set char [string index $str [lindex $end 0]]
                if {$char eq "\\"} {
                    # Escaped character: skip it.
                    continue
                } elseif {$char eq "\{"} {
                    incr level
                } elseif {$level > 1} {
                    incr level -1
                } elseif {[regexp -start $start {\A[^ \f\n\r\t\v]+} $str bad]} {
                    return -code error "list element in braces followed by\
                            \"$bad\" instead of space"
                } else {
                    break
                }
            }
        } # {
            # Comment word: find the end of line.
            regexp -indices -start $start {(?:[^\n\\]+|\\.|\\$)*} $str end
            set comment 1
        } default {
            # Bare word: find the next whitespace.
            regexp -indices -start $start\
                    {(?:[^ \f\n\r\t\v\\]+|\\.|\\$)*} $str end
        }}

        if {$comment} {
            # Remove comments.
            set start [lindex $match 0]
            set str [string replace $str $start [lindex $end 1]]
        } else {
            # Advance past non-comments.
            set start [expr {[lindex $end 1] + 1}]
        }
    }

    # Return the string with all comments removed.
    return $str
}

aplsimple - 2019-02-14 09:32:32

You might replace the "canonical switch" with a wrapper proc that applies decomment to the switch body.

For example, this way:

proc sw {args} {
  # Replace the final (body) argument with its decommented form.
  lset args end [decomment [lindex $args end]]
  uplevel 1 [list switch {*}$args]
}

proc theThing {} {
  puts "theThing runs: [incr ::theThingCount]"
}

set input "Removing comments huh"

# --------------- Canonical switch
switch -regexp $input {
^[A-Z] {
    # Do the thing
    theThing
}
default {
    # Don't do the thing
    puts Default
}}

# --------------- Decommented switch
switch -regexp $input [decomment {
# Handle words starting with a capital letter
^[A-Z] {
    # Do the thing
    theThing
}
# Handle everything else
default {
    # Don't do the thing
}}]

# --------------- New switch
sw -regexp $input {
# Handle words starting with a capital letter
^[A-Z] {
    # Do the thing
    theThing
}
# Handle everything else
default {
    # Don't do the thing
}}

By the way, this sw proc also shows why the following construct is harmful:

 return [uplevel 1 [CODE]]

The construct looks as if CODE were run, and its value returned, in the caller's context. However, the bracketed CODE is executed first, in sw's own context, and uplevel then evaluates the result of CODE as a script in the caller's context!

So the constructs that do work here are:

  uplevel 1 "switch {*}$args"       ;# works, but fragile: the command is built by string interpolation
  uplevel 1 [list switch {*}$args]  ;# good
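The same pitfall in isolation, without switch or [decomment]. In this sketch the procs setInCaller, setInCallerWrong and demo are hypothetical names made up for the illustration:

```tcl
# Hypothetical helper procs, for illustration only.

# Good: the command is built as a list and evaluated in the caller's frame.
proc setInCaller {name value} {
    uplevel 1 [list set $name $value]
}

# Bad: [set $name $value] is substituted here, in this proc's own frame;
# uplevel then evaluates its *result* (the value) as a script in the caller.
proc setInCallerWrong {name value} {
    uplevel 1 [set $name $value]
}

proc demo {} {
    setInCaller a 42
    # The wrong version sets b only in its own frame, then tries to run
    # the value "noSuchCommand" as a command in this frame and fails.
    set failed [catch {setInCallerWrong b noSuchCommand} msg]
    return [list $a $failed [info exists b]]
}

puts [demo]   ;# 42 1 0
```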

A demonstration of the difference:

proc sw {args} {
  lset args end [decomment [lindex $args end]]
#  return [uplevel 1 [switch {*}$args]]      ;# wrong: switch runs in sw's own context
#  uplevel 1 "switch {*}$args"       ;# works, but fragile: built by string interpolation
  uplevel 1 [list switch {*}$args]  ;# good
}
set err 1-2-3
sw -- $err {
# 1st
# 1st continued
    1 {
      puts {err found}
    }
#second comment
# 1
# 1 2
    1-2 {
      puts {err 1-2 found}
    }
# 1 2 3 default
    default {
     puts "Program encountered code $err"
     set err --???--
    }
}
puts $err