Version 18 of scan

Updated 2007-11-20 09:22:30 by jdc

See http://www.purl.org/tcl/home/man/tcl8.4/TclCmd/scan.htm for the formal man page for the Tcl command scan.

 scan $possiblyZeroedDecimal %d cleanInteger

Context: Leading zeros induce octal parsing in the expr command and in other commands that parse integer arguments. Use of scan to remove leading zeros can avoid this often-reported problem.
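A minimal sketch of that idea, wrapped in a helper proc. The name stripLeadingZeros is hypothetical, not a standard Tcl command. It relies only on the fact that %d always reads base 10, so "08" or "0012" never reaches the octal rules:

```tcl
# Hypothetical helper: normalize a decimal string such as "08" or "0012".
# %d always reads base 10, so leading zeros never trigger octal parsing.
proc stripLeadingZeros {value} {
    # scan returns the number of fields assigned; anything but 1 means no integer was found
    if {[scan $value %d clean] != 1} {
        error "not a decimal integer: $value"
    }
    return $clean
}

puts [stripLeadingZeros 08]                   ;# 8
puts [stripLeadingZeros 0012]                 ;# 12
puts [expr {[stripLeadingZeros 09] + 1}]      ;# 10
```

Note that scan stops at the first character that cannot belong to the integer, so "08abc" still yields 8. If trailing garbage must be rejected, check the input with string is digit as well.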


To convert between the display and numeric formats of ASCII characters, scan provides the usual answer:

    foreach character {a A b B c} {
        scan $character %c numeric
        puts "ASCII character '$numeric' displays as '$character'."
    }

    ASCII character '97' displays as 'a'.
    ASCII character '65' displays as 'A'.
    ASCII character '98' displays as 'b'.
    ASCII character '66' displays as 'B'.
    ASCII character '99' displays as 'c'.        
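Going the other way, from a code back to a character, is the job of format %c. A small round-trip sketch, using inline scan and lindex to pull the single value out of the returned list:

```tcl
# scan %c turns a character into its code; format %c turns a code back into a character
foreach character {a A b B c} {
    set code [lindex [scan $character %c] 0]
    set back [format %c $code]
    puts "$character -> $code -> $back"
}
```

This prints lines such as "a -> 97 -> a" and "A -> 65 -> A".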

In more recent Tcl versions (8.3 or so and up) you can also call scan inline:

 set numeric [scan $character %c]

And the %c format is not limited to ASCII, but can handle any Unicode character. (RS)
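A quick sketch of %c on a non-ASCII character. The \u20ac escape (EURO SIGN, U+20AC) is used only to keep the source file pure ASCII; any literal character would do:

```tcl
# Inline scan (no variable names) returns a list of the converted values
set euro [lindex [scan \u20ac %c] 0]   ;# U+20AC EURO SIGN
puts $euro                              ;# 8364
# format %c goes the other way, from code point to character
puts [string equal [format %c 8364] \u20ac]   ;# 1
```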

LV Can someone explain what this means - calling scan inline? Where is scan getting its input in this case? From stdin?

Lars H: It refers to the result of scan, not its input. Rather than setting variables, scan returns a list of the parsed values. (RS's example is rather ugly, since it really sets $numeric to a list with one element, but as a quick character-code look-up at a prompt, scan char %c is hard to beat.)

Unmatched Conversion Specifiers

Alastair Davies/PYK: When scan returns a list of matching values, it includes an empty string for each format specifier that wasn't matched. If there is only one format specifier, a list with one blank element is returned, and this is not the same thing as an empty string. For example, scan foo %d returns {}, which is not an empty string but a literal open-brace character followed by a literal close-brace character. Now, noting the statement on the man page:

 In the inline case, an empty string is returned when the end of the input string 
 is reached before any conversions have been performed. 

you might think that scan foo %d would return an empty string. But Don Porter carefully explains this special case (quoting from c.l.t.):

Back to the example:

 % scan foo %d
 {}

We're going to scan the string "foo", trying to parse a decimal integer from it, and when we're done, we're going to return the results as an inline list, since we provided no variables for field value storage.

While scanning, we see "f", and "f" can't be part of a decimal integer. In the non-inline case, we would assign no value to the variable associated with this field. In the inline format, an empty string is stored in the corresponding list location to represent that situation.

Now we've exhausted the format spec string. But note that we haven't reached "the end of the input string." We're still sitting at the "f". So the "underflow" case where we run out of input before we run out of format spec does not apply, and we should not expect scan to return the empty string.
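The distinction can be checked directly. This sketch inspects the inline result of scan foo %d as both a list and a string, then repeats the scan with a variable name (value is just an illustrative name) to show that the variable is simply never assigned:

```tcl
# An unmatched %d in inline mode yields a one-element list holding an empty string
set result [scan foo %d]
puts [llength $result]                    ;# 1
puts [string length $result]              ;# 2, the characters "{" and "}"
puts [expr {[lindex $result 0] eq ""}]    ;# 1
# Variable form: no conversion succeeds, so scan returns 0 and leaves the variable unset
puts [scan foo %d value]                  ;# 0
puts [info exists value]                  ;# 0
```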

To see the underflow case in action, consider these examples:

 % scan a ab
 % scan {} %d
 % scan {   } %d
 %

In all these cases we run out of string to parse before we run out of format spec string to guide our parsing. There are more cases where this happens, but they're difficult to construct. Consult the sources and the test suite for details and more examples.
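For contrast, a sketch of the underflow case itself. In the inline form the result really is the empty string (zero list elements, zero characters); with a variable name the man page documents a return value of -1 instead. The name unused is illustrative only:

```tcl
# Underflow: the input runs out before any conversion is performed
foreach input [list "" "   "] {
    set r [scan $input %d]
    puts "inline result for {$input}: llength [llength $r], string length [string length $r]"
}
# With variable names, the same underflow is reported as -1 and nothing is set
puts [scan "" %d unused]     ;# -1
puts [info exists unused]    ;# 0
```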

Part of the reason this does tend to be confusing is that detecting the "underflow" case, and returning the special magic empty value when it is detected, is just about 100% worthless in Tcl. It's a feature of sscanf() in C that was apparently slavishly copied over without full consideration of whether it had any use in Tcl. (There are other such features from Tcl's earliest days. Octal, anyone?)

scan's rules are arcane and weird, but they are what they are.


 % scan 12a34 {%d%[abc]%d} x - y
 3
 % list $x $y
 12 34
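In that example "-" is nothing special to scan: it is an ordinary variable name used as a dumping ground for the middle field. A sketch of the alternative using the %* flag, which scans a field without assigning it, so no throwaway variable is needed:

```tcl
# "-" is just a variable name; it now holds the skipped field
scan 12a34 {%d%[abc]%d} x - y
puts [set -]                          ;# a
# %* scans the [abc] field but discards it, so only x and y are assigned
scan 12a34 {%d%*[abc]%d} x y
puts "$x $y"                          ;# 12 34
```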

Also see "Dump a file in hex and ASCII", as well as "u2x" in the "Bag of algorithms".

The binary command has a scan subcommand, binary scan, which complements this scan.
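A small sketch of how the two relate. binary scan works on the byte values of a string, so the two approaches only agree for plain ASCII; the c format is a signed 8-bit integer, and characters outside that range would come out differently:

```tcl
# binary scan with c* reads every byte of the string as an 8-bit integer
binary scan abc c* codes
puts $codes                       ;# 97 98 99
# the same codes, one character at a time, with the scan command
set viaScan {}
foreach ch [split abc ""] {
    lappend viaScan [lindex [scan $ch %c] 0]
}
puts $viaScan                     ;# 97 98 99
```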

Effect on the Internal Representation of a Value

