Error processing request

Parameters

CONTENT_LENGTH: 0
REQUEST_METHOD: GET
REQUEST_URI: /revision/scan?V=30
QUERY_STRING: V=30
CONTENT_TYPE:
DOCUMENT_URI: /revision/scan
DOCUMENT_ROOT: /var/www/nikit/nikit/nginx/../docroot
SCGI: 1
SERVER_PROTOCOL: HTTP/1.1
HTTPS: on
REMOTE_ADDR: 172.70.127.135
REMOTE_PORT: 22782
SERVER_PORT: 4443
SERVER_NAME: wiki.tcl-lang.org
HTTP_HOST: wiki.tcl-lang.org
HTTP_CONNECTION: Keep-Alive
HTTP_ACCEPT_ENCODING: gzip, br
HTTP_X_FORWARDED_FOR: 3.144.42.196
HTTP_CF_RAY: 87731ffc19442a12-ORD
HTTP_X_FORWARDED_PROTO: https
HTTP_CF_VISITOR: {"scheme":"https"}
HTTP_ACCEPT: */*
HTTP_USER_AGENT: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; [email protected])
HTTP_CF_CONNECTING_IP: 3.144.42.196
HTTP_CDN_LOOP: cloudflare
HTTP_CF_IPCOUNTRY: US

Body


Error

Unknow state transition: LINE -> END

-code

1

-level

0

-errorstack

INNER {returnImm {Unknow state transition: LINE -> END} {}} CALL {my render_wikit scan {See
http://www.tcl.tk/man/tcl/TclCmd/scan.htm 
for the formal man page for the [Tcl] command scan.



If ''varName''s are supplied, '''scan''' stores the values parsed (from ''string'' according to ''format'') in the named variables and returns the number of variables written to. If no ''varName''s are present, it returns the list of values parsed.

<<discussion>>
** Description **
 scan $possiblyZeroedDecimal %d cleanInteger
Context: Leading zeros induce octal parsing in the [expr] command and in 
if and similar commands that invoke expr. 
Using scan to remove leading zeros avoids this often-reported problem.
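
A minimal sketch of the idiom above, assuming a hypothetical zero-padded input of `0008`. As far as I know, `expr` in Tcl 8.x tries to read such a value as octal and rejects it (8 is not an octal digit), whereas `%d` always parses decimal:

======
# Hypothetical zero-padded input, e.g. a field taken from a timestamp.
set possiblyZeroedDecimal 0008

# %d parses decimal only, so the leading zeros are simply dropped.
scan $possiblyZeroedDecimal %d cleanInteger
puts $cleanInteger                 ;# prints: 8

# The cleaned value is now safe to hand to expr.
puts [expr {$cleanInteger + 1}]    ;# prints: 9
======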


----
Converting between the display and numeric formats of ASCII characters is a common need.
Scan provides the usual answer:

    foreach character {a A b B c} {
        scan $character %c numeric
        puts "ASCII character '$numeric' displays as '$character'."
    }

    ASCII character '97' displays as 'a'.
    ASCII character '65' displays as 'A'.
    ASCII character '98' displays as 'b'.
    ASCII character '66' displays as 'B'.
    ASCII character '99' displays as 'c'.
----
In more recent versions of Tcl (8.3 or so and later), you can also call [scan] inline:
 set numeric [scan $character %c]
The %c format is not limited to ASCII; it can handle any Unicode character. ([RS])

[LV] Can someone explain what this means - calling scan inline?  
Where is scan getting its input in this case?  From [stdin]?
[Lars H]: It refers to the result of scan, not its input. 
Rather than setting some variables, it will return a list of the parsed values. 
(RS's example is rather ugly, since it really sets $numeric to a list with one element, 
but as a quick character code look-up at a prompt, a '''scan''' ''char'' '''%c''' is hard to beat.)
** Unmatched Conversion Specifiers **
[Alastair Davies]/[PYK]: When `scan` returns a list of matching values, it includes an empty string for each format specifier that wasn't matched. If there is only one format specifier, a list with one blank element is returned, and this is not the same thing as an empty string. For example, '''scan foo %d''' returns `{}`, which is not an empty string, but a literal open-brace character followed by a literal close-brace character.   The documentation states:
 In the inline case, an empty string is returned when the end of the input string 
 is reached before any conversions have been performed. 

you might think that '''scan foo %d''' would return an empty string.  But [Don Porter] carefully explains this special case (quoting from [c.l.t.]):
Back to the example:

 % scan foo %d
 {}

We're going to scan the string "foo", trying to parse a decimal integer
from it, and when we're done, we're going to return the results as an
inline list, since we provided no variables for field value storage.

While scanning, we see "f", and "f" can't be part of a decimal integer.
In the non inline case, we would assign no value to the variable associated 
with this field.  In the inline format, an empty string
is stored in the corresponding list location to represent that situation.

Now we've exhausted the format spec string.  But note that we haven't
reached "the end of the input string."  We're still sitting at
the "f".  So the "underflow" case where we run out of input before
we run out of format spec does not apply, and we should not expect
[scan] to return the empty string.

To see the underflow case in action, consider these examples:

 % scan a ab
 % scan {} %d
 % scan {   } %d
 %

In all these cases we run out of string to parse before we run out of
format spec string to guide our parsing.  There are more cases where
this happens, but they're difficult to construct.  Consult the sources
and the test suite for details and more examples.

Part of the reason this does tend to be confusing is that the
detection of the "underflow" case and returning the special magic
empty value when it is detected is just about 100% worthless in Tcl.
It's a feature of sscanf() in C that was apparently slavishly
copied over without full consideration of whether it had any
value in Tcl.  (A common problem in built-in commands and
features from Tcl's earliest days.  Octal, anyone?)

[scan]'s rules are arcane and weird, but they are what they are.
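
The two cases described in the quote can be told apart by treating the inline result as a list. A small sketch (the variable names `unmatched` and `underflow` are only illustrative):

======
# Input remained but no conversion matched:
# the result is a one-element list holding an empty string.
set unmatched [scan foo %d]
puts [list [llength $unmatched] [string length $unmatched]]   ;# prints: 1 2

# Underflow: the input ran out before any conversion was performed,
# so the result is a genuinely empty string.
set underflow [scan {} %d]
puts [list [llength $underflow] [string length $underflow]]   ;# prints: 0 0
======

Checking `llength` rather than comparing against the empty string treats both outcomes as "nothing parsed" only if that is really what the caller wants; the sketch just makes the difference visible.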


----

======
% scan 12a34 {%d%[abc]%d} x - y
3
% list $x $y
12 34
======

The format string above says "a decimal number, followed by an '''a''', a '''b''', or a '''c''', followed by another decimal number". The last three arguments name the variables that receive the results: x gets the first decimal number, - gets the letter, and y gets the second decimal number. The name `-` was chosen to convey "oh, this value isn't going to be used".
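
If the separator is truly unwanted, the `*` flag in a conversion specifier suppresses assignment, so no throwaway variable is needed. This is a sketch of an alternative, not what the example above does:

======
# %*[abc] matches the separator but discards it, so only two variables are set.
set count [scan 12a34 {%d%*[abc]%d} x y]
puts [list $count $x $y]   ;# prints: 2 12 34
======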


----
Also see "[Dump a file in hex and ASCII]", as well as
"u2x" in the "[Bag of algorithms]".

[Binary] has a scan which complements use of this scan.
** Effect on the Internal Representation of a Value **
----

======
% set var [expr 123]; list
% tcl::unsupported::representation $var
value is a int with a refcount of 2, object pointer at 0317BDF8,
internal representation 0000007B:0317BD68, no string representation
% scan $var %d; list
% tcl::unsupported::representation $var
value is a int with a refcount of 2, object pointer at 0317BDF8,
internal representation 0000007B:0317BD68, string representation "123"
======

I'd think the single numeric conversions could be optimized to recognize when the input already has numeric representation, then simply return the input.


----

Perhaps [[scan]] could be given a special option to get its data from a channel rather than a string, so that the C-level guts of the [[scan]] and file I/O implementations can work everything out much more efficiently and correctly than can be done at the Tcl level.

[Netstrings] are one scenario where this would be helpful.  You need to read a length prefix (not knowing in advance how many digits form the length), then confirm the next character is colon (for framing), then read the data (length in bytes indicated by the prefix), then confirm the final character is comma (for framing).

In C, you just do `scanf("%d", &len)` to get the length prefix.  But in Tcl, you have to [[[read]]] one character at a time until you find a non-digit.  That's a lot of looping and script which would be much better handled at the C level.

(Pedantic aside: the scanf() approach isn't technically correct because it allows negative lengths and leading zeroes, but you can check for negatives and can forgive leading zeroes.)

(And a question: with `scanf("%d:", &len)`, how can you tell whether or not the colon was found in the input?  I think the answer is to use `%n`.  Trouble with that is inconsistency in whether successful `scanf("%d:%n", &len, &pos)` returns 1 or 2.  So set pos to 0 first then check if it changed.  What a mess.)

Let's say you have all the data read in advance, e.g. it came from a [UDP] datagram, thereby making Tcl [[scan]] suitable for reading the length prefix.  `[[scan $datagram %d len]]` works, but how do you know how many digits were consumed?  Guess you'd need `%n` for that too: `[[scan $datagram %d:%n len pos]]`.  Now the data is `[[[string range] $datagram $pos [[expr {$pos + $len - 1}]]]]`, and `[[[string index] $datagram [[expr {$pos + $len}]]]]` should be a comma.  So far, so good.
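
The steps above can be put together roughly as follows. This is only a sketch: `parseNetstring` is a hypothetical helper name, lengths are counted in characters rather than bytes, and `%d` is more permissive than a strict netstring reader should be (it skips leading whitespace and accepts a sign). The inline form of scan is used so that `pos` stays empty unless both the length and the colon were seen.

======
# Parse one netstring ("<len>:<payload>,") from the front of $data.
# Returns a two-element list: the payload and the unparsed remainder.
proc parseNetstring {data} {
    # Unmatched specifiers (or underflow) leave pos as an empty string.
    lassign [scan $data %d:%n] len pos
    if {$pos eq "" || $len < 0} {
        error "malformed netstring length prefix"
    }
    # Index of the framing comma that must follow the payload.
    set end [expr {$pos + $len}]
    if {[string index $data $end] ne ","} {
        error "missing trailing comma"
    }
    set payload [string range $data $pos [expr {$end - 1}]]
    set rest [string range $data [expr {$end + 1}] end]
    return [list $payload $rest]
}

lassign [parseNetstring "5:hello,3:abc,"] payload rest
puts $payload   ;# prints: hello
puts $rest      ;# prints: 3:abc,
======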

Buuuuuuuuut how do you now process the next netstring in the sequence?  You have to use `[[string range]]` to chop off the consumed portion, which involves a lot of copying.  I recently had a great deal of trouble when doing similar processing with strings many megabytes in size, wishing to scan a few fields from the front at a time.  So if we're going around adding options to [[scan]], I also suggest adding a -start option to skip leading characters.  [[[regexp]]] has such an option.  Or go the [[[binary scan]]] route and offer `%n@` to skip to an arbitrary location, though I prefer the option approach.
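
Until something like that exists, one workaround is the `-start` option of `regexp` mentioned above, which lets a parser walk the string by offset instead of chopping the front off each time. A rough sketch, using the hypothetical proc name `netstrings` and again assuming character (not byte) lengths; it has not been benchmarked against the `string range` approach:

======
# Return the list of payloads in a sequence of netstrings.
proc netstrings {data} {
    set result {}
    set offset 0
    set total [string length $data]
    while {$offset < $total} {
        # \A anchors the match at the -start offset rather than at index 0.
        if {![regexp -start $offset {\A(\d+):} $data prefix digits]} {
            error "malformed length prefix at offset $offset"
        }
        # scan %d sidesteps octal interpretation of a zero-padded length.
        scan $digits %d len
        set first [expr {$offset + [string length $prefix]}]
        set comma [expr {$first + $len}]
        if {[string index $data $comma] ne ","} {
            error "missing trailing comma at offset $comma"
        }
        lappend result [string range $data $first [expr {$comma - 1}]]
        set offset [expr {$comma + 1}]
    }
    return $result
}

puts [netstrings "5:hello,3:abc,"]   ;# prints: hello abc
======

Only the payloads are copied here; the unconsumed tail of the input is never duplicated.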

(By the way, [[scan]] also has a serious slowdown with large input strings, possibly due to conversion from [UTF-8] to UCS-2 or whatever for random access, even when random access is not required.  Fix that too.)


<<discussion>>
<<categories>> Tcl syntax | Arts and crafts of Tcl-Tk programming | Command | Binary Data | String Processing | Parsing
} regexp2} CALL {my render scan {See
}} CALL {my revision scan} CALL {::oo::Obj4268095 process revision/scan} CALL {::oo::Obj4268093 process}

-errorcode

NONE

-errorinfo

Unknow state transition: LINE -> END
    while executing
"error $msg"
    (class "::Wiki" method "render_wikit" line 6)
    invoked from within
"my render_$default_markup $N $C $mkup_rendering_engine"
    (class "::Wiki" method "render" line 8)
    invoked from within
"my render $name $C"
    (class "::Wiki" method "revision" line 31)
    invoked from within
"my revision $page"
    (class "::Wiki" method "process" line 56)
    invoked from within
"$server process [string trim $uri /]"

-errorline

4