Version 11 of Splitting strings with embedded strings

Updated 2012-11-22 16:43:42 by pooryorick

Question

Richard Suchenwirth 2001-05-31 - Robin Lauren wrote in the comp.lang.tcl newsgroup:

I want to split an argument which contains spaces within quotes into proper name=value pairs. But I can't :)

Consider this example:

set tag {body type="text/plain" title="This is my body"} 
set element [lindex $tag 0]
set attributes [lrange $tag 1 end] ;# *BZZT!* Wrong answer!

My attributes becomes the list {type="text/plain"} {title="This} {is} {my} {body"} (perhaps even with the optional backslash before the quotes), which isn't really what I had in mind.

Answer

To a human, $tag intuitively looks like a list of three items, but according to Tcl list syntax it has six items, and the second item, for example, contains two literal double quotes. One solution is to treat $tag as a string and split it on the double-quote character. In the resulting list, every piece that ends in "=" is followed by the piece that was enclosed in double quotes in the original string. The solution below assumes that quoted items only begin after an "=" character, and that there is no way within double quotes to escape another double quote:

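set result {}
# splitting on the quote character alternates unquoted and quoted pieces
foreach {out in} [split $tag {"}] {

    # catch the case where a quote is the last character
    if {$out eq {}} break

    # $out is read as a list of words; a word ending in = takes $in as its value
    foreach i $out {
        if {[regexp {=$} $i]} {
            set i [list [string range $i 0 end-1] $in]
        }
        lappend result $i
    }
}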
% set result
body {type text/plain} {title {This is my body}}
% set attributes [lrange $result 1 end]
{type text/plain} {title {This is my body}}
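
The loop above can also be packaged as a reusable command. The following is only a sketch under the same assumptions (quoted values appear only after "=", and a double quote cannot be escaped inside a value); the name splitTag is hypothetical and not part of the original answer. Unlike the loop above, it breaks the unquoted pieces into words with [split] rather than reading them as Tcl lists, so stray braces or backslashes outside the quotes are treated as ordinary characters.

```tcl
# Hypothetical helper: splitTag is an illustrative name, not an existing command.
# Assumes quoted values only follow "=" and that a double quote cannot be
# escaped inside a quoted value.
proc splitTag {tag} {
    set result {}
    # Splitting on the quote character alternates unquoted and quoted pieces.
    foreach {out in} [split $tag {"}] {
        # Break the unquoted piece into words on single spaces.
        foreach word [split [string trim $out]] {
            # Runs of spaces produce empty words; skip them.
            if {$word eq {}} continue
            if {[string index $word end] eq "="} {
                # A word ending in "=" is a name; its value is the quoted piece.
                lappend result [list [string range $word 0 end-1] $in]
            } else {
                lappend result $word
            }
        }
    }
    return $result
}

set parsed [splitTag {body type="text/plain" title="This is my body"}]
set element [lindex $parsed 0]
set attributes [lrange $parsed 1 end]
puts $element
puts $attributes
```

With the example tag, element holds body and attributes holds the two name/value pairs.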

Another solution uses [regexp], with the same caveats as the previous one:

set matches [regexp -all -inline {([^ =]+)=(\S+|"[^"]+")} $tag]
set result [list]
foreach {dummy key value} $matches {
    set value [string trim $value {"}]
    # store each attribute as a {key value} pair, matching the output below
    lappend result [list $key $value]
}
% puts $result
{type text/plain} {title {This is my body}}

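That also matches single-word values that aren't in quotes (short=foo long="foo bar"). An empty unquoted value (short=) is not matched at all and is silently dropped, and an empty quoted value (long="") is matched only by the \S+ branch; replace the +'s in the value part with *'s to accept empty values explicitly.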
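
As a sketch of that change, the variant below replaces only the two +'s in the value part with *'s and leaves the key as [^ =]+, on the assumption that an empty attribute name is never wanted. The extra short= and long="" attributes are made up for illustration, and each attribute is collected as a {key value} pair.

```tcl
# Illustrative input: short= and long="" are added here to exercise empty values.
set tag {body type="text/plain" title="This is my body" short= long=""}

# Same pattern as above, but the value quantifiers are * instead of +,
# so an empty value (quoted or unquoted) is accepted.
set matches [regexp -all -inline {([^ =]+)=(\S*|"[^"]*")} $tag]

set result [list]
foreach {dummy key value} $matches {
    # Strip the surrounding double quotes, if any.
    lappend result [list $key [string trim $value {"}]]
}
puts $result
```

Both short and long should now appear in the result with an empty string as their value.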

See Also