Error processing request

Parameters

CONTENT_LENGTH0
REQUEST_METHODGET
REQUEST_URI/revision/A+little+XML+parser?V=18
QUERY_STRINGV=18
CONTENT_TYPE
DOCUMENT_URI/revision/A+little+XML+parser
DOCUMENT_ROOT/var/www/nikit/nikit/nginx/../docroot
SCGI1
SERVER_PROTOCOLHTTP/1.1
HTTPSon
REMOTE_ADDR172.69.6.16
REMOTE_PORT51954
SERVER_PORT4443
SERVER_NAMEwiki.tcl-lang.org
HTTP_HOSTwiki.tcl-lang.org
HTTP_CONNECTIONKeep-Alive
HTTP_ACCEPT_ENCODINGgzip, br
HTTP_X_FORWARDED_FOR3.145.119.199
HTTP_CF_RAY87f3ceee2fd561b1-ORD
HTTP_X_FORWARDED_PROTOhttps
HTTP_CF_VISITOR{"scheme":"https"}
HTTP_ACCEPT*/*
HTTP_USER_AGENTMozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; [email protected])
HTTP_CF_CONNECTING_IP3.145.119.199
HTTP_CDN_LOOPcloudflare
HTTP_CF_IPCOUNTRYUS

Body


Error

Unknow state transition: LINE -> END

-code

1

-level

0

-errorstack

INNER {returnImm {Unknow state transition: LINE -> END} {}} CALL {my render_wikit {A little XML parser} {** Description **

[Richard Suchenwirth] 2002-08-20:  [XML] (the eXtensible Markup Language),
shortly demonstrated as

======
<tag attribute="value">textcontent<singletontag/></tag>
======

is not the most beautiful or efficient way to package data, but it's
fashionable and "standard".  Often it's better to design a file format in XML
(and rely on the available tools, e.g. [tDOM] and, based on it, [starDOM]) than
to design a proprietary format, and have to write matching parsers, and
document the format, etc., oneself. After all, computer memory and CPU power
are getting ever cheaper these years than brain cells...

Another way to represent complex data is as a nested Tcl [list]. If you just
get your braces balanced, Tcl will parse a string into a list and let you
process it - which gets easier with the "multi-dimensional" [lindex] and [lset]
commands since Tcl 8.4. [tDOM] has the ''$node asList'' command which, after
parsing the input XML string into a DOM structure in memory, returns a
well-formed Tcl list with these element properties:

   * if the first element is ''#text'', the second is the text content;
   * else the first element is the tag, the second the attributes alternating ''name value...'', and the third is a list of the child elements.

Comparing XML input and ''toList'' output, I thought that many differences can
be handled by string manipulations in local context, for which Tcl offers
powerful tools (for instance, ''[string] map, [regexp]'' and ''[regsub]''). So
one morning I decided to try a direct conversion of an XML string to a string
that makes a well-formed list equivalent to the ''toList'' output. The
following code does some XML well-formedness checking with a stack for matching
start/end tags, but besides it's of course weaker than the power of [tDOM].
Still, it was a nice little evening project, so here goes:

======
proc xml2list xml {
    regsub -all {>\s*<} [string trim $xml " \n\t<>"] "\} \{" xml
    set xml [string map {> "\} \{#text \{" < "\}\} \{"}  $xml]

    set res ""   ;# string to collect the result   
    set stack {} ;# track open tags
    set rest {}
    foreach item "{$xml}" {
        switch -regexp -- $item {
            ^# {append res "{[lrange $item 0 end]} " ; #text item}
            ^/ {
                regexp {/(.+)} $item -> tagname ;# end tag
                set expected [lindex $stack end]
                if {$tagname!=$expected} {error "$item != $expected"}
                set stack [lrange $stack 0 end-1]
                append res "\}\} "
        }
            /$ { # singleton - start and end in one <> group
                regexp {([^ ]+)( (.+))?/$} $item -> tagname - rest
                set rest [lrange [string map {= " "} $rest] 0 end]
                append res "{$tagname [list $rest] {}} "
            }
            default {
                set tagname [lindex $item 0] ;# start tag
                set rest [lrange [string map {= " "} $item] 1 end]
                lappend stack $tagname
                append res "\{$tagname [list $rest] \{"
            }
        }
        if {[llength $rest]%2} {error "att's not paired: $rest"}
    }
    if [llength $stack] {error "unresolved: $stack"}
    string map {"\} \}" "\}\}"} [lindex $res 0]
}

---- Now that this went so well, I'll throw in the converse:

======
proc list2xml list {
    switch -- [llength $list] {
        2 {lindex $list 1}
        3 {
            foreach {tag attributes children} $list break
            set res <$tag
            foreach {name value} $attributes {
                append res " $name=\"$value\""
            }
            if [llength $children] {
                append res >
                foreach child $children {
                    append res [list2xml $child]
                }
                append res </$tag>
            } else {append res />}
        }
        default {error "could not parse $list"}
    }
}
#-------------------------------------------- now testing:
set test {<foo a="b">bar and<grill x:c="d" e="f g"><baz x="y"/></grill><room/></foo>}
proc tdomlist x {[[dom parse $x] documentElement root] asList} ;# reference
proc lequal {a b} {
    if {[llength $a] != [llength $b]} {return 0}
    if {[lindex $a 0] == $a} {return [string equal $a $b]}
    foreach i $a j $b {if {![lequal $i $j]} {return 0}}
    return 1
}
proc try x {
    puts [set a [tdomlist $x]]
    puts [set b [xml2list $x]]
    puts list:[lequal $a $b],string:[string equal $a $b]
}
puts [set res [xml2list $test]]
======

======
foo {a b} {{#text {bar and}} {grill {x:c d e {f g}} {{baz {x y} {}}}} {room {} {}}}
======

which is equal to the ''toList'' result. This may not be the most readable code
I ever wrote (it strongly illustrates that unbalanced braces have to be escaped
in strings ;-), but it demonstrates the mind-boggling power of hopping between
the list and the string representation, which is one of Tcl's unique features.

Having the two converters above, one may start to think about a Tcl list DOM
variation, where accesses go via [lindex]/[lset] in the nested list, and
finally well-formed XML comes out again... But this smells like more work than
fun for an evening ;-)

----

[RS]:  The idea that index vectors, as can now be used with [lindex]/ [lset],
are paths in a tree (which the XML or DOM is) is fascinating - navigating
nested lists can go very fast with this." - Here is a little general-purpose
depth-first traverser that you can run over a listDOM:

======
proc forall {varName list body} {
    set $varName $list
    eval $body
    foreach child [lindex $list 2] {
        forall $varName $child $body
    }
} ;# RS
% forall i $a {puts [lrange $i 0 1]}
foo {a b}
#text {bar and}
grill {x:c d e {f g}}
baz {x y}
room {}
======

----

[Brian Theado] 2003-04-05:  See [http://www.cs.sfu.ca/~cameron/REX.html] for a
paper describing shallow parsing of XML using only a regular expression.  The
regular expression is about 30 lines long, but the paper documents it well.
The Appendix includes sample implementation in Perl, Javascript and Flex/Lex.
The Appendix also includes an interactive demo (using the Javascript
implementation apparently).  The demo helped me understand what they meant by
"shallow parsing".

I'm guessing translation from the Perl implementation to a Tcl implementation
should be pretty straightforward.  ...the next day: Translation to Tcl was
straightforward.  See [XML Shallow Parsing with Regular Expressions].

----

[JM] 2010-01-05:  I suggest [LogParser] for XML files, you can always drive the parser from Tcl of course.

<<categories>> Category Package | Category XML | Category Parsing | RS} regexp2} CALL {my render {A little XML parser} {** Description **

[Richard Suchenwirth] 2002-08-20:  [XML] (the eXtensible Markup Language),
shortly demonstrated as

======
<tag attribute="value">textcontent<singletontag/></tag>
======

is not the most beautiful or efficient way to package data, but it's
fashionable and "standard".  Often it's better to design a file format in XML
(and rely on the available tools, e.g. [tDOM] and, based on it, [starDOM]) than
to design a proprietary format, and have to write matching parsers, and
document the format, etc., oneself. After all, computer memory and CPU power
are getting ever cheaper these years than brain cells...

Another way to represent complex data is as a nested Tcl [list]. If you just
get your braces balanced, Tcl will parse a string into a list and let you
process it - which gets easier with the "multi-dimensional" [lindex] and [lset]
commands since Tcl 8.4. [tDOM] has the ''$node asList'' command which, after
parsing the input XML string into a DOM structure in memory, returns a
well-formed Tcl list with these element properties:

   * if the first element is ''#text'', the second is the text content;
   * else the first element is the tag, the second the attributes alternating ''name value...'', and the third is a list of the child elements.

Comparing XML input and ''toList'' output, I thought that many differences can
be handled by string manipulations in local context, for which Tcl offers
powerful tools (for instance, ''[string] map, [regexp]'' and ''[regsub]''). So
one morning I decided to try a direct conversion of an XML string to a string
that makes a well-formed list equivalent to the ''toList'' output. The
following code does some XML well-formedness checking with a stack for matching
start/end tags, but besides it's of course weaker than the power of [tDOM].
Still, it was a nice little evening project, so here goes:

======
proc xml2list xml {
    regsub -all {>\s*<} [string trim $xml " \n\t<>"] "\} \{" xml
    set xml [string map {> "\} \{#text \{" < "\}\} \{"}  $xml]

    set res ""   ;# string to collect the result   
    set stack {} ;# track open tags
    set rest {}
    foreach item "{$xml}" {
        switch -regexp -- $item {
            ^# {append res "{[lrange $item 0 end]} " ; #text item}
            ^/ {
                regexp {/(.+)} $item -> tagname ;# end tag
                set expected [lindex $stack end]
                if {$tagname!=$expected} {error "$item != $expected"}
                set stack [lrange $stack 0 end-1]
                append res "\}\} "
        }
            /$ { # singleton - start and end in one <> group
                regexp {([^ ]+)( (.+))?/$} $item -> tagname - rest
                set rest [lrange [string map {= " "} $rest] 0 end]
                append res "{$tagname [list $rest] {}} "
            }
            default {
                set tagname [lindex $item 0] ;# start tag
                set rest [lrange [string map {= " "} $item] 1 end]
                lappend stack $tagname
                append res "\{$tagname [list $rest] \{"
            }
        }
        if {[llength $rest]%2} {error "att's not paired: $rest"}
    }
    if [llength $stack] {error "unresolved: $stack"}
    string map {"\} \}" "\}\}"} [lindex $res 0]
}

---- Now that this went so well, I'll throw in the converse:

======
proc list2xml list {
    switch -- [llength $list] {
        2 {lindex $list 1}
        3 {
            foreach {tag attributes children} $list break
            set res <$tag
            foreach {name value} $attributes {
                append res " $name=\"$value\""
            }
            if [llength $children] {
                append res >
                foreach child $children {
                    append res [list2xml $child]
                }
                append res </$tag>
            } else {append res />}
        }
        default {error "could not parse $list"}
    }
}
#-------------------------------------------- now testing:
set test {<foo a="b">bar and<grill x:c="d" e="f g"><baz x="y"/></grill><room/></foo>}
proc tdomlist x {[[dom parse $x] documentElement root] asList} ;# reference
proc lequal {a b} {
    if {[llength $a] != [llength $b]} {return 0}
    if {[lindex $a 0] == $a} {return [string equal $a $b]}
    foreach i $a j $b {if {![lequal $i $j]} {return 0}}
    return 1
}
proc try x {
    puts [set a [tdomlist $x]]
    puts [set b [xml2list $x]]
    puts list:[lequal $a $b],string:[string equal $a $b]
}
puts [set res [xml2list $test]]
======

======
foo {a b} {{#text {bar and}} {grill {x:c d e {f g}} {{baz {x y} {}}}} {room {} {}}}
======

which is equal to the ''toList'' result. This may not be the most readable code
I ever wrote (it strongly illustrates that unbalanced braces have to be escaped
in strings ;-), but it demonstrates the mind-boggling power of hopping between
the list and the string representation, which is one of Tcl's unique features.

Having the two converters above, one may start to think about a Tcl list DOM
variation, where accesses go via [lindex]/[lset] in the nested list, and
finally well-formed XML comes out again... But this smells like more work than
fun for an evening ;-)

----

[RS]:  The idea that index vectors, as can now be used with [lindex]/ [lset],
are paths in a tree (which the XML or DOM is) is fascinating - navigating
nested lists can go very fast with this." - Here is a little general-purpose
depth-first traverser that you can run over a listDOM:

======
proc forall {varName list body} {
    set $varName $list
    eval $body
    foreach child [lindex $list 2] {
        forall $varName $child $body
    }
} ;# RS
% forall i $a {puts [lrange $i 0 1]}
foo {a b}
#text {bar and}
grill {x:c d e {f g}}
baz {x y}
room {}
======

----

[Brian Theado] 2003-04-05:  See [http://www.cs.sfu.ca/~cameron/REX.html] for a
paper describing shallow parsing of XML using only a regular expression.  The
regular expression is about 30 lines long, but the paper documents it well.
The Appendix includes sample implementation in Perl, Javascript and Flex/Lex.
The Appendix also includes an interactive demo (using the Javascript
implementation apparently).  The demo helped me understand what they meant by
"shallow parsing".

I'm guessing translation from the Perl implementation to a Tcl implementation
should be pretty straightforward.  ...the next day: Translation to Tcl was
straightforward.  See [XML Shallow Parsing with Regular Expressions].

----

[JM] 2010-01-05:  I suggest [LogParser] for XML files, you can always drive the parser from Tcl of course.

<<categories>> Category Package | Category XML | Category Parsing | RS}} CALL {my revision {A little XML parser}} CALL {::oo::Obj6451543 process revision/A+little+XML+parser} CALL {::oo::Obj6451541 process}

-errorcode

NONE

-errorinfo

Unknow state transition: LINE -> END
    while executing
"error $msg"
    (class "::Wiki" method "render_wikit" line 6)
    invoked from within
"my render_$default_markup $N $C $mkup_rendering_engine"
    (class "::Wiki" method "render" line 8)
    invoked from within
"my render $name $C"
    (class "::Wiki" method "revision" line 31)
    invoked from within
"my revision $page"
    (class "::Wiki" method "process" line 56)
    invoked from within
"$server process [string trim $uri /]"

-errorline

4