Version 3 of decomment

Updated 2019-02-12 18:26:12 by AMG

AMG: The [decomment] command removes comments from a string. This makes it possible to put comments into source code in locations where comments are normally not permitted.

Examples

The final argument to the switch command is a list alternating between patterns and script bodies, so a comment in a switch normally has to go inside one of the script bodies. This is not at all obvious to new Tcl programmers and is a frequent source of confusion. The [decomment] command makes it possible to put comments between the pattern/body pairs as well:

switch -regexp $input [decomment {
# Handle words starting with a capital letter
^[A-Z] {
    # Do the thing
    theThing
}
# Handle everything else
default {
    # Don't do the thing
}}]

The above is equivalent to typing:

switch -regexp $input {

^[A-Z] {
    # Do the thing
    theThing
}

default {
    # Don't do the thing
}}

Notice that the comments inside the switch script bodies are not removed, because they are brace-quoted. The [decomment] command effectively treats its argument as a list whose elements may be interspersed with comments, and it removes only those comments. An element that merely contains text resembling a comment is not itself a comment, so it is retained.
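The difference between a top-level "#" and a "#" inside a braced element can be checked with Tcl's own list commands, without [decomment]. The following is only a small sketch; the variable names and sample patterns are made up for illustration.

```tcl
# At the top level of a switch body, "#" is just another list word.
set withComment {
    # Handle a
    a {puts "got a"}
}
puts [llength $withComment]    ;# 5: "#", "Handle", "a", "a", and the script

# The odd word count leaves a pattern without a body, so switch raises an error.
puts [catch {switch -- a $withComment} message]    ;# 1
puts $message

# Inside a braced element, "#" is part of that element's text.
set braced {
    default {
        # Don't do the thing
    }
}
puts [llength $braced]                   ;# 2: "default" and the script body
puts [string trim [lindex $braced 1]]    ;# # Don't do the thing
```

With a different number of words in the comment, the first switch might not raise an error at all and would instead silently pair up the wrong words as patterns and bodies.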

Also notice the extra blank lines before each pattern/script pair. They are the newlines that ended the removed comments. They remain because [decomment] does not actually do list processing, only text processing, and its output is a string, not a pure list.
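The leftover whitespace is harmless wherever the result is consumed as a list. As a rough illustration (not part of [decomment] itself, and with made-up sample data), a string of this shape still parses as a list, and it can be rebuilt as a pure list with list and argument expansion if that is ever wanted:

```tcl
# A string shaped like decomment output: blank lines where comments used to be.
set text {

a {
    puts A
}

default {
}}

# The blank lines are only separators when the string is read as a list.
puts [llength $text]    ;# 4

# Rebuilding the list drops the extra whitespace between elements.
set pure [list {*}$text]
puts [string index $pure 0]    ;# a

# The elements themselves are unchanged.
puts [expr {[lindex $text 1] eq [lindex $pure 1]}]    ;# 1
```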

For another example, see [L1 ]. (When it updates, that is. As of 12 Feb 2019, the Tcler's Wiki is plagued with stale cached pages. Try this link if the last one doesn't work: [L2 ].)

Code

# decomment --
# Removes all comments from a list-formatted string.  A comment starts with "#"
# appearing anywhere a word can begin, and it continues until the next newline.
# Comments may be extended across multiple lines by preceding the newline with
# an odd number of backslashes.
proc decomment {str} {
    # Find the start of each word in the string.
    set start 0
    while {[regexp -indices -start $start {[^ \f\n\r\t\v]} $str start]} {
        set start [lindex $start 0]
        switch [string index $str $start] {
        \" {
            # Quoted word: find the close quote.
            if {![regexp -indices -start $start\
                    {\"(?:[^\"\\]+|\\.)*\"} $str end]} {
                return -code error "unmatched open quote in list"
            }
            set start [expr {[lindex $end 1] + 1}]
            if {[regexp -start $start {\A[^ \f\n\r\t\v]+} $str bad]} {
                return -code error "list element in quotes followed by\
                        \"$bad\" instead of space"
            }
        } \{ {
            # Braced word: find the close brace.
            set level 1
            incr start
            while {1} {
                if {![regexp -indices -start $start {[{}]} $str end]} {
                    return -code error "unmatched open brace in list"
                }
                set end [lindex $end 0]
                set start [expr {$end + 1}]
                if {[string index $str $end] eq "\{"} {
                    incr level
                } elseif {$level > 1} {
                    incr level -1
                } elseif {[regexp -start $start {\A[^ \f\n\r\t\v]+} $str bad]} {
                    return -code error "list element in braces followed by\
                            \"$bad\" instead of space"
                } else {
                    break
                }
            }
        } # {
            # Comment word: remove until (but not including) the end of line.
            regexp -indices -start $start {(?:[^\n\\]+|\\.|\\$)*} $str end
            set str [string replace $str $start [lindex $end 1]]
            incr start
        } default {
            # Bare word: find the next whitespace.
            regexp -indices -start $start\
                    {(?:[^ \f\n\r\t\v\\]+|\\.|\\$)*} $str end
            set start [expr {[lindex $end 1] + 1}]
        }}
    }

    # Return the string with all comments removed.
    return $str
}
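
The same approach works for any list-formatted data, not only switch bodies. Below is a minimal usage sketch that assumes the decomment proc above is already defined in the interpreter; the settings keys and values are hypothetical and exist only for illustration.

```tcl
# Assumption: the decomment proc above has already been defined in this interpreter.
if {[llength [info commands decomment]] == 0} {
    puts stderr "decomment is not defined; load the proc from the Code section first"
} else {
    # Hypothetical settings, written as a commented key/value list.
    set settings [decomment {
        # Where to connect
        host example.com
        port 8080    # comments may also follow a value
    }]
    puts [dict get $settings host]    ;# example.com
    puts [dict get $settings port]    ;# 8080
}
```

The trailing comment after 8080 is removed as well, because a "#" counts as a comment wherever a word can begin, not only at the start of a line.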