Arjen Markus (1 december 2003) I ran into a peculiar problem the other day, when I needed to read data from a string like:
var1 mor.inp 0.0 0.0 10.0 "Some input variable"
that is a string containing simple numbers and strings, but also strings delimited by quotes (or apostrophes).
Such strings cannot easily be read by means of [scan] or [split]: the %s format code stops at the first whitespace, so [scan] will not read past the blanks inside the quoted string, and there is no way to make [split] recognise that the substring delimited by quotes should become one single list element.
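A minimal sketch of the problem, using the sample line from above (the variable names are only illustrative):

```tcl
# The sample line, braced so the quotes are kept literally
set line {var1 mor.inp 0.0 0.0 10.0 "Some input variable"}

# split breaks at every space, so the quoted description ends up in pieces
set pieces [split $line " "]
puts [llength $pieces]       ;# 8 elements rather than the 6 we want
puts [lindex $pieces 5]      ;# "Some (the opening quote is still attached)

# %s stops at the first whitespace, so descr receives only the first word
set count [scan $line "%s %s %f %f %f %s" \
               varname filename dvalue minvalue maxvalue descr]
puts "$count: $descr"        ;# 6: "Some
```

Note that [scan] still reports six successful conversions, so the return value alone does not reveal that the description was cut short.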
I tried such solutions as:
    scan $line "%s %s %f %f %f %40s" varname filename dvalue \
        minvalue maxvalue descr
but the description (variable descr) would invariably be a single word only. My first way out was:
    scan $line "%s %s %s %s %s" varname filename dvalue minvalue maxvalue
    regexp {"(.*)\"$} $line => text
but this is very unsatisfactory: it assumes that the description is always surrounded by quotes and that it is the last part of the line (otherwise the preceding variables would get the wrong value).
You can get a workaround with formats like:
    scan $line "%s %s %f %f %f \"%\[^\"\]\"" varname filename dvalue \
        minvalue maxvalue descr
but that is still very tricky: if the description is but one word, people might forget about the quotes.
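One way to soften that risk is to check how many conversions [scan] reports. The sketch below assumes a format that uses a %[^"] scan set between literal quotes; the helper name parseLine is hypothetical and not part of the original solution:

```tcl
# Hypothetical helper: parse one line, insisting on a quoted description
proc parseLine {line} {
    # The brackets and quotes are escaped because the format is in double quotes
    set n [scan $line "%s %s %f %f %f \"%\[^\"\]\"" \
               varname filename dvalue minvalue maxvalue descr]
    if {$n != 6} {
        error "expected 6 fields with a quoted description, got $n: $line"
    }
    return [list $varname $filename $dvalue $minvalue $maxvalue $descr]
}

set fields [parseLine {var1 mor.inp 0.0 0.0 10.0 "Some input variable"}]
puts [lindex $fields 5]

# A one-word description without quotes is rejected instead of misread
set failed [catch {parseLine {var1 mor.inp 0.0 0.0 10.0 Description}} msg]
puts "$failed: $msg"
```

This only detects the forgotten quotes; it does not make the unquoted form acceptable.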
Instead, I have tried to write a procedure that is more general than the ad hoc solution: it works in the same way as [scan] but recognises the delimited strings. Mind you, it is much less general than [scan], but it does do the job for me! The procedure, qscan, is listed further down this page.
AK: Another possible solution: Treat this as a CSV format, using SPACE as the separator character. The tcllib csv routines will then return a list of the elements, properly handling the quotes. The elements of the returned list can then be validated in a simple manner.
AM Interesting - I did not know it had such an option.
AK: Well, the csv package allows any character to be used as the separator (the sepChar argument to all commands). And that means that quite a lot of formats can be seen as csv, just with different separator characters.
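A sketch of that suggestion, assuming tcllib is installed and that ::csv::split accepts the separator character as its second argument (the guard and the validation loop are illustrative additions):

```tcl
# Requires tcllib; guarded so the script still runs where it is missing
if {[catch {package require csv}]} {
    puts "tcllib csv package not available"
} else {
    set line {var1 mor.inp 0.0 0.0 10.0 "Some input variable"}

    # Second argument is the separator character: a single space here
    set fields [::csv::split $line " "]
    foreach {varname filename dvalue minvalue maxvalue descr} $fields break

    # Validate the numeric fields in a simple manner
    foreach name {dvalue minvalue maxvalue} {
        if {![string is double -strict [set $name]]} {
            error "$name is not a number: [set $name]"
        }
    }
    puts "$varname: $descr"
}
```

One caveat with this approach: since every space counts as a separator, a run of several spaces between two values yields empty elements in the list.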
SMH: The following worked for me if I only use double quotes (Tcl 8.4.2, Windows):
foreach {varname filename dvalue minvalue maxvalue descr} $line break
AM I considered using that too, but I thought an approach via scan would be more robust ...
TV Not having given the problem more than a quick glance, I'd say that if you have no substitution problem (with $ [ ] { } \), or you quote those characters, the example line is interpreted as a list by:
    % list var1 mor.inp 0.0 0.0 10.0 "Some input variable"
    var1 mor.inp 0.0 0.0 10.0 {Some input variable}
which is probably why the foreach works.
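If the line is to be treated as a list, it may be worth checking first that it really is one. A minimal sketch, with a hypothetical helper isList that simply catches the error from [llength]:

```tcl
# Hypothetical helper: return 1 if the line can be read as a Tcl list
proc isList {line} {
    expr {![catch {llength $line}]}
}

set good {var1 mor.inp 0.0 0.0 10.0 "Some input variable"}
set bad  {var1 mor.inp 0.0 0.0 10.0 "Some input variable}

puts [isList $good]   ;# 1
puts [isList $bad]    ;# 0: the closing quote is missing

if {[isList $good]} {
    foreach {varname filename dvalue minvalue maxvalue descr} $good break
    puts $descr
}
```

This only covers double quotes and braces: list parsing does not treat apostrophes as delimiters, so a description written as 'Some input variable' would still be split into three elements.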
For the general quoting problem, you'd at least have to know how the source handles quoting, or the quotes themselves. There is no point in parsing lines with inconsistent, ambiguous, or undefined syntax.
    # Procedure read a line with quoted and unquoted strings
    #
    proc qscan { data format args } {
       set data_list [regexp -inline -all {[^ "]+|"[^"]*"|'[^']*'} $data]
       set format_list [split $format]

       puts $format
       puts $format_list

       foreach form $format_list arg $args datum $data_list {
          if { $form == "" || $arg == "" } {
             break
          } else {
             puts "$arg - $form - $datum"
             upvar $arg var
             if { [string index $datum 0] != "\"" &&
                  [string index $datum 0] != "'" } {
                scan $datum $form var
             } else {
                set var [string range $datum 1 end-1]
             }
          }
       }
    }

    #
    # Example of its use:
    #
    set line "var1 mor.inp 0.0 0.0 10.0 \"Some input variable\""
    qscan $line "%s %s %f %f %f %s" varname filename def min max descr
    puts "Variable: $varname"
    puts "Filename: $filename"
    puts "Default: $def"
    puts "Minimum: $min"
    puts "Maximum: $max"
    puts "Description: $descr"
In a similar vein, I wrote a small package that deals with differences in the representation of decimal numbers: in some countries a comma is used instead of a point to separate the integer part from the fraction, and a period is used to group the digits in threes:
one million and 1 tenth: 1,000,000.1 in English becomes 1.000.000,1 in Dutch or German.
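For a number that is already written out as a string, the swap itself can be sketched with [string map], which replaces in a single pass so the two substitutions do not interfere (the variable names are illustrative only):

```tcl
set english "1,000,000.1"

# Swap the two separators in one pass
set dutch [string map {, . . ,} $english]
puts $dutch     ;# 1.000.000,1

# Going the other way for computation: drop the grouping periods and
# turn the decimal comma into a point
set plain [string map {. {} , .} $dutch]
puts $plain     ;# 1000000.1
```

This does nothing about formatting a numerical value in the first place, which is what the package is for.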
Here is the code. Its use should be clear from the test at the end.
    # dformat.tcl --
    #    Script to handle alternative decimal characters (comma say
    #    instead of dot) on output
    #

    # Format --
    #    Namespace for the two commands
    #
    namespace eval ::Format:: {
       variable decimal   "."
       variable thousands ""

       namespace export setDecimalChars dformat
    }

    # setDecimalChars --
    #    Set the decimal characters (decimal and thousands separators)
    # Arguments:
    #    decimal_char    Character to use for decimals (default: .)
    #    thousands_char  Character to use for thousands (default: nothing)
    # Result:
    #    Nothing
    # Side effect:
    #    Private variables set
    # Note:
    #    To do: simple sanity checks
    #
    proc ::Format::setDecimalChars { {decimal_char .} {thousands_char "" } } {
       variable decimal
       variable thousands

       set decimal   $decimal_char
       set thousands $thousands_char
    }

    # DecFormat --
    #    Private routine to handle the decimal characters
    # Arguments:
    #    format   Single numerical format string
    #    value    Single numerical value
    # Result:
    #    Correctly formatting string
    #
    proc ::Format::DecFormat { format value } {
       variable decimal
       variable thousands

       set string [format $format $value]

       set posdot [string first "." $string]
       if { $posdot >= 0 } {
          set indices [regexp -inline -indices {([0-9]+)\.} $string]
       } else {
          set indices [regexp -inline -indices {([0-9]+)} $string]
       }

       foreach {first last} [lindex $indices 1] {break}
       # puts "$first $last -- $string ($indices)"
       set prefix [string range $string $first $last]
       set idx    [expr {[string length $prefix]-3}]
       while { $idx > 0 } {
          set posidx [string index $prefix $idx]
          set prefix [string replace $prefix $idx $idx "$thousands$posidx"]
          incr idx -3
       }

       #
       # Be careful with these replacements: otherwise interference
       # may occur
       #
       set string [string replace $string $posdot $posdot $decimal]
       set string [string replace $string $first $last $prefix]

       return $string
    }

    # dformat --
    #    Format the given variables according to the format string
    #    keeping in mind the current decimal settings
    # Arguments:
    #    format_string   String containing formats
    #    args            Values to be formatted
    # Result:
    #    Formatted string
    # Note:
    #    No support for %*s and the like
    #    To do: error checks
    #
    proc ::Format::dformat { format_string args } {
       set codes_re {%[^%cdfegsx]*[%cdefgsx]}
       set codes    [regexp -all -inline -indices $codes_re $format_string]
       regsub -all $codes_re $format_string "%s" new_format

       set idx  0
       set vars {}
       foreach code $codes {
          foreach {start stop} $code {break}
          set substr [string range $format_string $start $stop]
          # puts "$code -- $substr -- [lindex $args $idx]"
          if { $substr == "%%" } {
             lappend vars "%"
             continue
          }
          set value [lindex $args $idx]
          if { [string first [string index $substr end] "defg"] >= 0 } {
             set result [DecFormat $substr $value]
          } else {
             set result [format $substr $value]
          }
          lappend vars $result
          # puts $vars
          incr idx
       }

       return [eval format [list $new_format] $vars]
    }

    # main --
    #    Testing the routines
    #
    #set test 1
    if { [file tail $::argv0] == [info script] || $test == 1 } {
       namespace import -force ::Format::*

       for { set i 0 } { $i < 2 } { incr i } {
          puts [dformat %.3f 1000]
          puts [dformat "A %f %d B" 1000.0 10000]
          puts [dformat "%d %d %d %d %d" 1 10 100 1000 10000]
          puts [dformat "%g %g %g %g %g" 1.0 10.0 100.0 1000.0 10000.0]
          puts [dformat "%g %g %g %g %g" 1.1 10.1 100.1 1000.1 10000.1]
          setDecimalChars "," "."
       }
    }