Percent Substitution

I've recently been working on some code to do "percent substitution". What I mean by this is that I want a procedure that, given a string and an array of what different letters mean, will substitute the appropriate values into the string and return the new string. For example:

> set sarray(n) "Robert Seeger"
> set sarray(t) [clock format [clock seconds] -format "%H:%M:%S"]
> percent_subst "Hello %n, the time is %t"
Hello Robert Seeger, the time is 05:59:46

The code I have written allows two flags:

-subst-char char
Specifies the substitution character (defaults to %). Must be a single character.
--
Specifies that all other items are arguments, not flags

The flags can be anywhere on the line, but putting them first makes the command more readable, and it is assumed this is where they will be placed in the rest of this page.

In addition, the command takes two arguments. The first is the string to operate on. The second is the name of an array that contains the information for replacement. The sample code at the beginning of this page shows how this works. If a replacement value for the actual substitution character is not specified, it is assumed that any occurrence of two of them in a row is to be replaced by one of them, as if the array had been specified with a line such as:

> set sarray(%) "%"
''This assumes the substitution character is a %''

The code I have written is shown below:

proc percent_subst {args} {
    set substChar %

    set flagsDone 0
    set realArgs {}

    while { [llength $args] } {
        set current [lindex $args 0]
        switch -glob -- $current {
            -- {
                if { $flagsDone } {
                    return -code error "Cannot specify -- flag more than once"
                }
                set flagsDone 1
            }
            -subst-char {
                if { $flagsDone } {
                    lappend realArgs $current
                } else {
                    if { [llength $args] < 2 } {
                        return -code error "No value provided for flag -subst-char"
                    }
                    set args [lrange $args 1 end]
                    set substChar [lindex $args 0]
                    if { [string length $substChar] != 1 } {
                        return -code error "Value for flag -subst-char must be a \
                                single character"
                    }
                }
            }
            -* {
                if { $flagsDone } {
                    lappend realArgs $current
                } else {
                    return -code error "Invalid flag $current"
                }
            }
            default {
                lappend realArgs $current
            }
        }
        set args [lrange $args 1 end]
    }

    if { [llength $realArgs] != 2 } {
        return -code error "Invalid number of args to percent_subst"
    }

    set text [lindex $realArgs 0]
    set arrayName [lindex $realArgs 1]

    upvar 1 $arrayName oldSubstArray

    # Copy the array into a new array, so it can be changed if need be
    foreach elem [array names oldSubstArray] {
        set substArray($elem) $oldSubstArray($elem)
    }

    # If the subst char is not in the array, make sure we replace a
    # double occurrence of it with itself
    if { ![info exists substArray($substChar)] } {
        set substArray($substChar) $substChar
    }

    set retval ""

    while { [string length $text] > 0 } {
        set char [string index $text 0]
        if { ![string compare $substChar $char] } {
            set text [string range $text 1 end]
            if { [string length $text] } {
                set whichChar [string index $text 0]
                if { ![info exists substArray($whichChar)] } {
                    return -code error "Substitution for ${substChar}${whichChar} \
                            not defined"
                }
                append retval $substArray($whichChar)
            } else {
                append retval $substChar
            }
        } else {
            append retval $char
        }

        set text [string range $text 1 end]
    }

    return $retval
}

# If this file is run as the actual executable, run these tests
if { [info exists argv0] &&
    ![string compare [file tail [info script]] [file tail $argv0]] } {

    set sarray(n) myname
    set sarray(t) [clock format [clock seconds] -format %H:%M:%S]
    set sarray(w) window
    set sarray(p) -%-

    lappend commands { 
        # Make sure the generic sample works
        puts [percent_subst "hello there %n" sarray] 
    }
    lappend commands { 
        # Make sure that a double % is replaced by a single %
        puts [percent_subst "hello %phere %n, %%when are you?" sarray] 
    }
    lappend commands { 
        # Make sure that the -subst-char flag works
        puts [percent_subst -subst-char : "hello :phere :n, ::when are you?" sarray] 
    }
    lappend commands { 
        # Make sure that a percent at the end of the string is just left there
        puts [percent_subst "%t hello there %n%" sarray] 
    }
    lappend commands { 
        # Make sure that an invalid flag throws an error
        puts [percent_subst -aflag invalid "a string" sarray] 
    }
    lappend commands { 
        # Make sure that putting in two -- flags, together, throws an error
        puts [percent_subst -subst-char : -- -- "a string" sarray] 
    }
    lappend commands { 
        # Make sure that putting in two -- flags, not together, throws an error
        puts [percent_subst -subst-char : -- "a string" -- sarray] 
    }
    lappend commands { 
        # Make sure putting the -subst-char in the middle works
        puts [percent_subst "a string" -subst-char : sarray] 
    }
    lappend commands { 
        # Make sure putting the -subst-char last works
        puts [percent_subst "a string" sarray -subst-char :] 
    }
    lappend commands { 
        # Check to make sure that a -subst-char that is not a single character
        # throws an error
        puts [percent_subst "a string" sarray -subst-char ::] 
    }
    lappend commands { 
        # This one specifies a null array, meaning just replace occurances of
        # %% with %
        puts [percent_subst "a %%str%%ing" null] 
    }

    foreach elem $commands {
        if { [catch { eval $elem } retval] } {
            puts "Could not run command: $retval"
        }
    }

}

Feel free to make comments, suggestions, etc to mailto:[email protected]


RS: Interesting problem, and solution. I like the self-test condition at the end (always envied the Pythoners for that, but was too lazy to find out how to do it in Tcl ;-) A problem I see is that the semantics of % is again overloaded - differing from Tk bindings where %W would give you the widget pathname, and especially from format, where e.g. %s may appear several times in one format string, standing for different strings. But doesn't the following "dollar substitution" reach the same effect without reinventing:

 set n "Robert Seeger"
 set t [clock format [clock seconds] -format "%H:%M:%S"]
 puts "Hello $n, the time is $t"

DKF: One possibility would be if you don't want to have to (necessarily) fill in a value for each of the substitutions you might want to perform unless it is really necessary, or you might prefer to only ever permit some substitutions and not for any arbitrary variable. But the above implementation doesn't help with the first case (since it copies the array, whose name isn't configurable either) and the second can be handled more neatly with the use of subst -nocommand in a suitably-set-up safe interpreter. I suppose it really depends on whether you like %-substing, or Tcl-style substing...

(And no RS, the semantics of the above code *does* match that used in handling Tk bindings; you just need to set the variables up suitably to start with. It is format and scan that are the odd ones out.)

RS: Right. I learned % entities first from printf, and thus considered that to be the standard ;-)

RHS: I've thought long and hard about why I like the idea of a percent substitution function, as RS has some valid points about using $ substitution instead. In the end, I think a good example of why I like it is shown in RS's first comment, with the source code he listed. If I was to write a command with a format option similar to that provided by the clock command, I would want to use percent substitution in the format string, since that is what I, as a user, expect to use for such things. I guess the main point is that the percent_subst is intended to be used, not directly by the programmer in application code, but in code libraries, where constructs like a format string are necessary.

CMcC: Can't you just use string map for this? Something like [string map [array get sarray] $text] and then require $sarray to have %t etc for names? Snit is implemented using that kind of %-subst.