Version 2 of XML Tree Walking

Updated 2005-03-22 03:58:24 by kpv

Keith Vetter 2005-03-21 : Recently I had to optimize some code that was using pure-tcl to extract out each element of an XML file in the order it appears in the file. My first step in optimization was simply to use tdom to build a dom tree. Then, the following code does a restartable pre-order traversal of that tree

The first call to :::WalkXML::NextNode with XML data initializes the dom tree. Each subsequent call to ::WalkXML::NextNode returns the next dom node.

Just this optimization alone, without redoing the code to utilize the dom tree, greatly speeded up the processing.

NEM 21Mar05: If you want to walk the elements in the order they appear in the file, might a SAX interface be just what you are after?

KPV Close, but the two models differ in who drives whom. The SAX model makes repeated calls into user code via callbacks while in this model works user code makes repeated calls into the this model. For parsing XML, I find the second way better fitting my mental model of how it should work: e.g. I'm looking at a <time> tag so I just need to call NextNode to get its value. With the SAX model I guess you'd accomplish the same by setting some global state variable.


 ##+##########################################################################
 #
 # WalkXML -- walks an tdom XML tree with each call to NextNode returning
 # the next node in a pre-order traversal of the dom tree.
 # by Keith Vetter, March 2005
 #

 package require tdom
 namespace eval ::WalkXML { variable stack}

 proc ::WalkXML::NextNode {{XML ""}} {
    variable stack                              ;# List of VISITED ancestors

    if {$XML ne ""} {                           ;# Parse given xml
        set doc [dom parse $XML]
        set root [$doc documentElement]
        set stack [list $root]
        return $root
    }

    while {1} {
        set last [lindex $stack end]            ;# Last already visited node

        set next [$last firstChild]             ;# Try going down the tree
        if {$next ne ""} break

        set stack [lrange $stack 0 end-1]
        set next [$last nextSibling]            ;# Try going across the tree
        if {$next ne ""} break

        while {1} {                             ;# Must go up the tree
            set next [lindex $stack end]        ;# Parent
            set stack [lrange $stack 0 end-1]
            if {$next eq ""} break

            set next [$next nextSibling]        ;# Try parent's sibling
            if {$next ne ""} break
        }
        break
    }
    if {$next ne ""} {
        lappend stack $next
    }
    return $next
 }



 ################################################################
 #
 # Demo code
 #
 set XML {<?xml version="1.0" encoding="ISO-8859-1"?>
    <loc version="1.0" src="Groundspeak">
    <waypoint>
    <name id="GCGPXK"><![CDATA[Playing Poker with the Squirrels by Rino 'n Rinette]]></name>
    <coord lat="40.1548166" lon="-82.5202833"/>
    <type>Geocache</type>
    <link text="Cache Details">http://www.geocaching.com/seek/cache_details.aspx?wp=GCGPXK</link>
    </waypoint><waypoint>
    <name id="GC19DF"><![CDATA[Great Playground Caper by Treasure Hunters Inc.]]></name>
    <coord lat="40.0667166666667" lon="-82.5358"/>
    <type>Geocache</type>
    <link text="Cache Details">http://www.geocaching.com/seek/cache_details.aspx?wp=GC19DF</link>
    </waypoint>
    </loc>
 }


 set node [::WalkXML::NextNode $XML]
 while {$node ne ""} {
    if {[$node nodeType] eq "TEXT_NODE"} {
        puts "\"[$node nodeValue]\""
    } else {
        puts "[$node nodeName]"
    }
    set node [::WalkXML::NextNode]
 }