Version 9 of Natively accessing XML

Updated 2002-09-27 18:38:33

if 0 {

Brian Theado - I read the article at [L1 ] about an extension to the ECMAScript language (a.k.a. JavaScript) that allows XML to be manipulated "natively" and thought that it would be easy to do something similar in Tcl using the tDOM extension. The basic idea is that values from an xml document can be read and modified using XPath queries. One use for this code is for Complex data structures in Tcl.

This really just amounts to a simpler interface to the domNode object than what tDOM provides.

Creation command:

  • xmlstruct::create xml - Returns an extended domNode object command

The following methods are supported in addition to those the tDOM domNode object already provides:

  • $node set xpathQuery ?value? - retrieves or modifies portions of the xml document that match the given xpathQuery
  • $node unset xpathQuery - deletes the portions of the xml document that match the given xpathQuery

Here are some examples from the above article translated to this Tcl version:

 set y [xmlstruct::create {
        <employees>
          <employee>
            <name>Joe</name>
            <age>28</age>
            <hobbies>
                 <favorite>
                 <name>Sailing</name>
               </favorite>
              <others>
                 <name>Reading</name>
                 <name>Running</name>
                 </others>
            </hobbies>
            <department>
              <name>Engineering</name>
            </department>
           </employee>
        </employees>
 }]
 # Reading values
 % $y set /employees/employee/name
 Joe
 % $y set /employees/employee/age
 28
 % $y set //name
 Joe Sailing Reading Running Engineering
 % $y set //hobbies//name
 Sailing Reading Running
 % $y set {/employees/employee[1]//favorite}
 <name>Sailing</name>

 % $y set /   ;# Returns whole document - output not included here

 # Modifying values
 % $y set {/employees/employee[name='Joe']/age} 27  ;# Braces needed to escape XPath square bracket
 % $y set {/employees/employee[name='Joe']/age}
 27
 % $y set {//employee[1]/name} Bill
 % $y set {//employee/name}
 Bill

 # Deleting
 % $y unset //department
 % $y unset //favorite
 % $y unset //hobbies
 % $y set /
 <employees>
    <employee>
        <name>Bill</name>
        <age>27</age>
    </employee>
 </employees>

}

 #
 # TODO: return the value when setting a value and return a list of values when multiple values are set
 #

package require tdom

 # By placing these procs in the ::dom::domNode namespace, they automatically 
 # become add-on domNode methods
 proc ::dom::domNode::unset {node query} {
        ::set resultNodes [$node selectNodes $query type]
        switch $type {
                attrnodes {xmlstruct::unsetattrs $node $query}
        nodes {xmlstruct::unsetnodes $resultNodes}
        empty {error "No results found for '$query'"}
                default {error "$type is an unsupported query result type"}
    }
 }
 proc ::dom::domNode::set {node query args} {
        switch [llength $args] {
                0 {return [xmlstruct::getvalue $node $query]}
                1 {return [xmlstruct::setvalue $node $query [lindex $args 0]]}
                default {error "wrong # args: should be \"set xpathQuery ?newValue?\""}
    }
 }
 namespace eval xmlstruct {}

 # Convenience function for creating an xml doc and returning the root
 proc xmlstruct::create {xml} {
        ::set doc [dom parse $xml]
        return [$doc documentElement] 
 }

 # For '$node set query' calls
 proc xmlstruct::getvalue {node query} {
        ::set resultNodes [$node selectNodes $query type]
        switch $type {
                attrnodes {
                        ::set retVal {}
                        foreach attrVal $resultNodes {
                                lappend retVal [lindex $attrVal 1]
            }
                        return $retVal
        }
                nodes {
                        ::set retVal {}
                        foreach node $resultNodes {
                ::set xml ""
                foreach child [$node childNodes] {
                    append xml [$child asXML]
                }
                                lappend retVal $xml 
            }
            # This is so the curly braces are not there due to the above lappend
            if {[llength $resultNodes] == 1} {::set retVal [lindex $retVal 0]}
                        return $retVal
        }
        empty {return ""}
                default {error "$type is an unsupported query result type"}
    }
 }

 # For '$node set query value' calls
 proc xmlstruct::setvalue {node query value} {
        ::set targetNodes [$node selectNodes $query type]
        switch $type {
                nodes {xmlstruct::setnodes $targetNodes $query $value}
                attrnodes {xmlstruct::setattrs $node $query $value}
                empty {error "Creating new elements/attributes not supported yet"}
                default {error "$type is an unsupported query result type"}
    }
 }

 # For i.e. '$node unset {/employees/employee[1]/@age}' calls
 proc xmlstruct::unsetattrs {node query} {
    ::set nodeQuery [join [lrange [split $query /] 0 end-1] /]
    ::set attribute [string range [lindex [split $query /] end] 1 end]
    foreach matchingNode [$node selectNodes $nodeQuery] {
        $matchingNode removeAttribute $attribute
    }
 }

 # For i.e. '$node set {/employees/employee[1]/@age} 25' calls
 proc xmlstruct::setattrs {node query value} {
    ::set nodeQuery [join [lrange [split $query /] 0 end-1] /]
    ::set attribute [string range [lindex [split $query /] end] 1 end]
    foreach matchingNode [$node selectNodes $nodeQuery] {
        $matchingNode setAttribute $attribute $value
    }
    return $value
 }
 # For i.e. '$node unset {/employees/employee[1]}' calls
 proc xmlstruct::unsetnodes {nodes} {
    # This probably breaks if some nodes are descendents of each other and
    # they don't get deleted in the right order
    foreach node $nodes {
        $node delete
    }
 }

 # Determines if the given string is intended to be valid xml
 proc xmlstruct::isXml {string} {
        ::set string [string trim $string]
        if {([string index $string 0] == "<") && [string index $string end] == ">"} {
                return 1
    } else {
                return 0
    }
 }

 # For i.e. '$node set {/employees/employee[1]} value' calls
 proc xmlstruct::setnodes {targetNodes query value} {
        if {[xmlstruct::isXml $value]} {
                foreach target $targetNodes {xmlstruct::setxml $target $value}
        } else {
                foreach target $targetNodes {xmlstruct::settext $target $value} 
    }
 }
 # TODO: don't allow this to be called for the documentElement node
 # (i.e. $obj set / "some text"  should not be allowed)
 # For i.e. '$node set {/employees/employee/name} Bill' calls
 proc xmlstruct::settext {node text} {
    ::set doc [$node ownerDocument]
        ::set textNode [$doc createTextNode $text]
        foreach child [$node childNodes] {$child delete}
        $node appendChild $textNode
    return $text
 }
 # For i.e. '$node set {/employees/employee} <name>Bill</name>' calls
 proc xmlstruct::setxml {node xml} {
        foreach child [$node childNodes] {$child delete}
        $node appendXML $xml
    return $xml
 }

27sep2002 Jochen Loewer Excellent Brian! I think this XPath based updating of XML/DOM structures makes it even easier to generate/manipulate nested structures. I also read about a similar approach somewhere else. Do you mind if I add this to the next standard tDOM distribution? May be also coded in C?

27Sep2002 Brian Theado - I think it would be great to have this functionality as part of the standard distribution. Use the above code however you like. Note the following limitations that I didn't previously document:

  • Creating brand new elements/attributes isn't supported yet (when it is supported it should somehow be restricted to simple xpath i.e. without filters and without compound queries joined by '|'--really anything that can return multiple nodes/attributes)
  • The functions xmlstruct::setattrs and xmlstruct::unsetattrs above parse the xpath query using split and is not bulletproof
  • When passing an xml string to the set operation, the xml gets added as a child of the node specified. In the article above, the specified node gets replaced with the xml which is a better behavior IMO.

I think the first point above is a very important feature and I plan to add the functionality at some point.

...later - After playing around with the set/unset methods for a short while, it is clear to me that list operations would really be useful. Here's how I imagine a document being built up using list commands:

 % set x [xmlstruct::create <employees/>]
 % $x lappend employee Bill
 % $x set /
 <employees>
   <employee>Bill</employee>
 </employees>
 % $x lappend employee Joe
 % $x set /
 <employees>
   <employee>Bill</employee>
   <employee>Joe</employee>
 </employees>

The lindex command would be unnecessary because XPath has built-in index operations (i.e. $x set employee[1]). The lreplace command is similarly unnecessary. Linsert would be useful though.

Another idea is to see how functionality similar to trace could be adapted to this model.