TAX Revisited

EF I have been toying around with the tiny XML parser from TAX: A Tiny API for XML. What I was missing was the ability to actually know the XML tag tree that led to the tag that was being opened or closed in the callback. I have modified the code a bit to support this in the callback. Also, I disliked the two different arguments for telling whether it was a self-closing tag, a closing tag or an opening tag and merged these into a single argument. Since I did not wish to break the beauty of the parser, I have added these on top of the existing procedures. The implementation is also now part of the TIL.

 ############################################################
 #
 # Based heavily on Stephen Uhler's HTML parser in 10 lines
 # Modified by Eric Kemp-Benedict for XML
 #
 # Turn XML into TCL commands
 #   xml     A string containing an html document
 #   cmd     A command to run for each html tag found
 #   start   The name of the dummy html start/stop tags
 #
 # Namespace "tax" stands for "Tiny API for XML"
 #
 
 namespace eval tax {
     # Initialise the global state
     variable TAX
     if {![::info exists TAX]} {
        array set TAX {
            idgene        0
        }
     }
     namespace export new parse
 }
 
 # Core of the TAX parser, XML parser in 10 lines, magic!
 proc tax::parse {cmd xml {start docstart}} {
     regsub -all \{ $xml {\&ob;} xml
     regsub -all \} $xml {\&cb;} xml
     set exp {<(/?)([^\s/>]+)\s*([^/>]*)(/?)>}
     set sub "\}\n$cmd {\\2} \[expr \{{\\1} ne \"\"\}\] \[expr \{{\\4} ne \"\"\}\] \
             \[regsub -all -- \{\\s+|(\\s*=\\s*)\} {\\3} \" \"\] \{"
     regsub -all $exp $xml $sub xml
     eval "$cmd {$start} 0 0 {} \{$xml\}"
     eval "$cmd {$start} 1 0 {} {}"
 }
 
 
 # Internal function that keeps track of the tag calling tree and
 # merges the open/close arguments into one.
 proc tax::__callbacker {id cmd tag cl selfcl props bdy} {
     set varname "::tax::cx_${id}"
     upvar \#0 $varname CONTEXT
 
     set tagpath $CONTEXT(lvl)
     if { $selfcl } {
        set type "OC"
     } elseif { $cl } {
        set CONTEXT(lvl) [lrange $CONTEXT(lvl) 0 end-1]
        set tagpath $CONTEXT(lvl)
        set type "C"
     } else {
        if { [string index $tag 0] ne "?" } {
            lappend CONTEXT(lvl) $tag
        }
        set type "O"
     }
 
     eval "$cmd $tag $type \{$props\} \{$bdy\} \{$tagpath\}"
 
     if { [string first "C" $type] >= 0 && [llength $CONTEXT(lvl)] == 0 } {
        unset CONTEXT
     }
 }
 
 
 # Calling this will return a command that complies to the original TAX
 # callback format and allows the command passed as an argument to
 # comply to the new argument list.
 proc tax::new {cmd} {
     variable TAX
 
     set id [incr TAX(idgene)]
     set varname "::tax::cx_${id}"
     upvar \#0 $varname CONTEXT
     set CONTEXT(id) $id
     set CONTEXT(lvl) ""
 
     return "::tax::__callbacker $id $cmd"
 }
 
 
 # Example procedure that complies to the new callback argument list.
 # This procedure simply dumps back the original XML file almost as is.
 proc tax::output { tag type props bdy tree } {
     # Get rid of XML header information, just duplicate it
     if { [string index $tag 0] eq "?" } {
        puts "<$tag $props>"
        return
     }
     # The first tag is a false "docstart" tag, ignore it.
     if { [llength $tree] == 0 } {
        return
     }
 
     if { $type == "C" } {
        puts "</$tag>"
     } else {
        puts -nonewline "<$tag"
        if { [llength $props] >= 0 } {
            array set properties $props
            foreach p [array names properties] {
                puts -nonewline " $p=\"$properties($p)\""
            }
        }
        if { $type == "O" } {
            if { [string trim $bdy] eq "" } {
                puts ">"
            } else {
                puts -nonewline ">$bdy"
            }
        } else {
            puts " />"
        }
     }
 }

I did not wish to break the original calling style, so I have introduced a procedure that returns back an internal procedure that will translate between the original calling style and the new calling style. The procedure takes one command as an argument, this command will have to match a procedure that follows the new style of arguments for the callback. These arguments will be, in order: the tag, the type (see below), the properties of the tag as a list, the body of the tag and a list of all previous tags in the parsing tree (excluding the current tag). The type is a string composed of several "O" and/or "C". "O" stands for open and "C" stands for close. So a self closing tag will have type "OC", while an opening tag will be of type "O" only.

To print back a file, you would do the following, using the internal tax::output procedure:

 set fd [open "yourxmlfile.xml"]
 tax::parse [tax::new tax::output] [read $fd]
 close $fd

EF Note that I have now amended the code at TAX: A Tiny API for XML. You should really read the bug report and bring the modifications from there into the code above for a working TAX parser.


jbr - 2011-06-06 Playing xslt