Natively accessing XML

if 0 {

Brian Theado - I read the article at [L1 ] about an extension to the ECMAScript language (a.k.a. JavaScript) that allows XML to be manipulated "natively" and thought that it would be easy to do something similar in Tcl using the tDOM extension. The basic idea is that values from an xml document can be read and modified using XPath queries. One use for this code is for Complex data structures in Tcl.

This really just amounts to a simpler interface to the domNode object than what tDOM provides.

Creation command:

  • xmlstruct::create xml - Returns an extended domNode object command

The following methods are supported in addition to those the tDOM domNode object already provides:

  • $node set xpathQuery ?value? - retrieves or modifies portions of the xml document that match the given xpathQuery
  • $node unset xpathQuery - deletes the portions of the xml document that match the given xpathQuery
  • $node lappend xpathQuery value ?value? ?value? ... - appends the given values to the node(s) that match the given xpathQuery (See examples below--this description is not very good)

Here are some examples from the above article translated to this Tcl version:

 set y [xmlstruct::create {
        <employees>
          <employee>
            <name>Joe</name>
            <age>28</age>
            <hobbies>
               <favorite>
                 <name>Sailing</name>
               </favorite>
              <others>
                 <name>Reading</name>
                 <name>Running</name>
               </others>
            </hobbies>
            <department>
              <name>Engineering</name>
            </department>
           </employee>
        </employees>
 }]
 # Reading values
 % $y set /employees/employee/name
 Joe
 % $y set /employees/employee/age
 28
 % $y set //name
 Joe Sailing Reading Running Engineering
 % $y set //hobbies//name
 Sailing Reading Running
 % $y set {/employees/employee[1]//favorite}
 <name>Sailing</name>

 % $y set /   ;# Returns whole document - output not included here

 # Modifying values
 % $y set {/employees/employee[name='Joe']/age} 27  ;# Braces needed to escape XPath square bracket
 % $y set {/employees/employee[name='Joe']/age}
 27
 % $y set {//employee[1]/name} Bill
 % $y set {//employee/name}
 Bill

 # Deleting
 % $y unset //department
 % $y unset //favorite
 % $y unset //hobbies
 % $y set /
 <employees>
    <employee>
        <name>Bill</name>
        <age>27</age>
    </employee>
 </employees>

 # Building a document incrementally
 % set y [xmlstruct::create <employees/>]
 % $y lappend employee Joe Bill Bob
 % $y set /
 <employees>
    <employee>Joe</employee>
    <employee>Bill</employee>
    <employee>Bob</employee>
 </employees>

 % set y [xmlstruct::create <employees/>]
 % $y lappend employee <name>Joe</name> <name>Bill</name> <name>Bob</name> 
 <employees>
    <employee>
        <name>Joe</name>
    </employee>
    <employee>
        <name>Bill</name>
    </employee>
    <employee>
        <name>Bob</name>
    </employee>
 </employees>

}

 #
 # TODO: return the value when setting a value and return a list of values when multiple values are set
 #
 package require tdom
 # By placing these procs in the ::dom::domNode namespace, they automatically 
 # become add-on domNode methods
 proc ::dom::domNode::unset {node query} {
        ::set resultNodes [$node selectNodes $query type]
        switch $type {
                attrnodes {xmlstruct::unsetattrs $node $query}
        nodes {xmlstruct::unsetnodes $resultNodes}
        empty {error "No results found for '$query'"}
                default {error "$type is an unsupported query result type"}
    }
 }
 proc ::dom::domNode::set {node query args} {
        switch [llength $args] {
                0 {return [xmlstruct::getvalue $node $query]}
                1 {return [xmlstruct::setvalue $node $query [lindex $args 0]]}
                default {error "wrong # args: should be \"set xpathQuery ?newValue?\""}
    }
 }
 proc ::dom::domNode::lappend {node query args} {
    foreach arg $args {
        xmlstruct::setnew $node $query $arg
    }
 }
 namespace eval xmlstruct {}

 # Convenience function for creating an xml doc and returning the root
 proc xmlstruct::create {xml} {
        ::set doc [dom parse $xml]
        return [$doc documentElement] 
 }

 # For '$node set query' calls
 proc xmlstruct::getvalue {node query} {
        ::set resultNodes [$node selectNodes $query type]
        switch $type {
                attrnodes {
                        ::set retVal {}
                        foreach attrVal $resultNodes {
                                lappend retVal [lindex $attrVal 1]
            }
                        return $retVal
        }
                nodes {
                        ::set retVal {}
                        foreach node $resultNodes {
                ::set xml ""
                foreach child [$node childNodes] {
                    append xml [$child asXML]
                }
                                lappend retVal $xml 
            }
            # This is so the curly braces are not there due to the above lappend
            if {[llength $resultNodes] == 1} {::set retVal [lindex $retVal 0]}
                        return $retVal
        }
        empty {return ""}
                default {error "$type is an unsupported query result type"}
    }
 }

 # For '$node set query value' calls
 proc xmlstruct::setvalue {node query value} {
        ::set targetNodes [$node selectNodes $query type]
        switch $type {
                nodes {xmlstruct::setnodes $targetNodes $query $value}
                attrnodes {xmlstruct::setattrs $node $query $value}
                empty {xmlstruct::setnew $node $query $value}
                default {error "$type is an unsupported query result type"}
    }
 }

 # Creates a new attribute/element for an xpath query in which all
 # the elements of the query up to the last exist
 proc xmlstruct::setnew {node query value} {
    set possibleMatch [split $query /]
    set unmatched [lindex $possibleMatch end]
    set possibleMatch [lreplace $possibleMatch end end]
    if {[llength $possibleMatch] == 0} {
        set possibleMatch .
    }
    
    set nodes [$node selectNodes [join $possibleMatch /] type]
    switch $type {
        nodes {
            if {[string index $unmatched 0] == "@"} {
                foreach node $nodes {
                    $node setAttribute [string range $unmatched 1 end] $value
                }
            } else {
                foreach node $nodes {
                    $node appendXML "<$unmatched/>"
                    set newNode [$node lastChild]
                    $newNode set . $value
                }
            }
        }
        attrnodes {error "Can't add children to attributes ($possibleMatch)"}
        empty {error "Create elements matching $possibleMatch first"}
    }
 }

 # For i.e. '$node unset {/employees/employee[1]/@age}' calls
 proc xmlstruct::unsetattrs {node query} {
    ::set nodeQuery [join [lrange [split $query /] 0 end-1] /]
    ::set attribute [string range [lindex [split $query /] end] 1 end]
    foreach matchingNode [$node selectNodes $nodeQuery] {
        $matchingNode removeAttribute $attribute
    }
 }

 # For i.e. '$node set {/employees/employee[1]/@age} 25' calls
 proc xmlstruct::setattrs {node query value} {
    ::set nodeQuery [join [lrange [split $query /] 0 end-1] /]
    ::set attribute [string range [lindex [split $query /] end] 1 end]
    foreach matchingNode [$node selectNodes $nodeQuery] {
        $matchingNode setAttribute $attribute $value
    }
    return $value
 }
 # For i.e. '$node unset {/employees/employee[1]}' calls
 proc xmlstruct::unsetnodes {nodes} {
    # This probably breaks if some nodes are descendents of each other and
    # they don't get deleted in the right order
    foreach node $nodes {
        $node delete
    }
 }

 # Determines if the given string is intended to be valid xml
 proc xmlstruct::isXml {string} {
        ::set string [string trim $string]
        if {([string index $string 0] == "<") && [string index $string end] == ">"} {
                return 1
    } else {
                return 0
    }
 }

 # For i.e. '$node set {/employees/employee[1]} value' calls
 proc xmlstruct::setnodes {targetNodes query value} {
        if {[xmlstruct::isXml $value]} {
                foreach target $targetNodes {xmlstruct::setxml $target $value}
        } else {
                foreach target $targetNodes {xmlstruct::settext $target $value} 
    }
 }
 # TODO: don't allow this to be called for the documentElement node
 # (i.e. $obj set / "some text"  should not be allowed)
 # For i.e. '$node set {/employees/employee/name} Bill' calls
 proc xmlstruct::settext {node text} {
    ::set doc [$node ownerDocument]
    foreach child [$node childNodes] {$child delete}
    if {[string length $text] > 0} {
        ::set textNode [$doc createTextNode $text]
        $node appendChild $textNode
    }
    return $text
 }
 # For i.e. '$node set {/employees/employee} <name>Bill</name>' calls
 proc xmlstruct::setxml {node xml} {
        foreach child [$node childNodes] {$child delete}
        $node appendXML $xml
    return $xml
 }

27sep2002 Jochen Loewer Excellent Brian! I think this XPath based updating of XML/DOM structures makes it even easier to generate/manipulate nested structures. I also read about a similar approach somewhere else. Do you mind if I add this to the next standard tDOM distribution? May be also coded in C?

27Sep2002 Brian Theado - I think it would be great to have this functionality as part of the standard distribution. Use the above code however you like. Note the following limitations that I didn't previously document:

  • Creating brand new elements/attributes isn't supported yet (when it is supported it should somehow be restricted to simple xpath i.e. without filters and without compound queries joined by '|'--really anything that can return multiple nodes/attributes) -- 29sep2002 - This is now supported (see xmlstruct::setnew)
  • The functions xmlstruct::setattrs and xmlstruct::unsetattrs above parse the xpath query using split and is not bulletproof
  • When passing an xml string to the set operation, the xml gets added as a child of the node specified. In the article above, the specified node gets replaced with the xml which is a better behavior IMO. -- 29sep2002 - Now that new elements and attributes and lappend is supported, I like the way it is better

I think the first point above is a very important feature and I plan to add the functionality at some point.

27sep2002 Rolf Ade At least the second point above is a strong reason, to implement this in C - I think there's no other way, to make this bulletproof (the problem is, that there isn't an cmd obj interface to attribute nodes in tDOM). Since Jochen has volunteered, to do this, I know it in the best hands. ;-)

...later - After playing around with the set/unset methods for a short while, it is clear to me that list operations would really be useful. Here's how I imagine a document being built up using list commands (29sep2002 - lappend functionality added):

 % set x [xmlstruct::create <employees/>]
 % $x lappend employee Bill
 % $x set /
 <employees>
   <employee>Bill</employee>
 </employees>
 % $x lappend employee Joe
 % $x set /
 <employees>
   <employee>Bill</employee>
   <employee>Joe</employee>
 </employees>

The lindex command would be unnecessary because XPath has built-in index operations (i.e. $x set employee[1]). The lreplace command is similarly unnecessary. Linsert would be useful though.

Another idea is to see how functionality similar to trace could be adapted to this model.

27sep02 jcw - Would it be an option to use ()'s for indexing - e.g. employee(1)? How about other variations, such as "." as path separator instead of "/"? Another thought - could this approach be used as foundation for different forms of structured data? Things like tdom and tcldom and metakit? Not to over-generalize or even make all notations and features work everywhere, but simply to introduce a way which helps people pick up conventions more quickly. There are so many conventions for nesting ('/', '.', '::', ',', ' ')...

27sep02 Brian Theado - XPath (which is pretty much the core of XSL) has a specific syntax that already makes use of (off the top of my head) the symbols '/', '//', '.', '..', '::', '(', ')', '[', ']'. The parsing for XPath is already handled internally by tDOM. One of the nice things about the above code is that you get the full power of XPath (which I find very powerful). I suppose a simplified syntax would be possible that could unify the various forms of structured data, but XPath in its simplest form (using just the first 4 symbols listed above) is very much like unix filesystem access which a large number of people are already familiar with. Now, the article that got me started on this code [L2 ] does use a '.' separator, so something like you suggest must be possible.

27sep02 jcw - Thanks, also for the pointers. Hrm, if only the world were simpler: another example is 1-based vs. 0-based indexing. Oh well, still I find it most illuminating to see such simplifying approaches as the one you created on this page.

27sep02 Jacob Levy: Would an XPath binding for e4Graph be something people would consider useful?


See also tclxml, tcldom, tdom