XPath

"XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer." That is how version 1.0 of the W3C specification of the XML Path Language (XPath) summarizes itself. You can read this standard (or "recommendation", in W3C vernacular) for yourself at http://www.w3.org/TR/xpath . One way to think about XPath is that it does for XML instances a bit of what SQL does for relational databases: it is a kind of query language. It complements XSLT in particular; very roughly, XSLT describes what changes to make, and XPath tells where in a document to make them. [Also recommend http://www.w3schools.com/ ? ]


Examples of XPath

        <html>
            <a href="https://wiki.tcl-lang.org/">Tcler's Wiki</a>
            <a href="https://www.tcl-lang.org/">Tcl Developer Xchange</a>
            <a id="getme" href="http://openacs.org/">OpenACS</a>
        </html>

Given the example XML above, we could extract all <a> elements using the following XPath:

        //a

We could also grab the "OpenACS" link with the following XPath:

        //a[@id="getme"]

More examples can be found at: http://www.zvon.org/xxl/XPathTutorial/General/examples.html

JAC [Nice comments, JAC.]


[Explain XPath implementations in existing Tcl-XSLT bindings.]


tDOM has made XPath queries at the script level very easy since 1999. With the example from above:

    package require tdom
    
    set data {
    <html>
        <a href="https://wiki.tcl-lang.org/">Tcler's Wiki</a>
        <a href="https://www.tcl-lang.org/">Tcl Developer Xchange</a>
        <a id="getme" href="http://openacs.org/">OpenACS</a>
    </html>}
    
    set doc [dom parse $data]
    set root [$doc documentElement]
    
    set aNodes [$root selectNodes {//a[@id="getme"]}]
    
    foreach node $aNodes {
        puts "Visit [$node text] at [$node @href]"
    }
    
    $doc delete

Two notes:

The brackets '[ ]' have syntactic meaning in both Tcl and XPath expressions. Don't forget to protect the brackets in your XPath expressions from Tcl, most simply by enclosing the whole expression in braces.

The XPath expression //a is not the best example one could choose. The '//' (an abbreviation for /descendant-or-self::node()/) is one of the most expensive XPath location steps for almost all known XPath engines, because the engine has to scan the whole tree beneath the context node. Avoiding // wherever possible can speed up your XPath queries and XSLT stylesheets considerably. Rolf Ade.
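Both notes can be tried out in a few lines. This is only a sketch: it assumes tDOM is installed and reuses a shortened copy of the example document; the variable names are made up for illustration.

```tcl
package require tdom

set doc [dom parse {<html>
    <a href="https://wiki.tcl-lang.org/">Tcler's Wiki</a>
    <a id="getme" href="http://openacs.org/">OpenACS</a>
</html>}]
set root [$doc documentElement]

# Note 1: braces stop Tcl from treating [...] as command substitution.
set braced {//a[@id="getme"]}
# Backslash-escaping inside double quotes builds the same string.
set escaped "//a\[@id=\"getme\"\]"
puts [string equal $braced $escaped]

# Note 2: spelling out the path avoids the whole-tree scan implied by //.
set viaScan [[lindex [$root selectNodes $braced] 0] text]
set viaPath [[lindex [$root selectNodes {/html/a[@id="getme"]}] 0] text]
puts "$viaScan / $viaPath"

$doc delete
```

On a document this small the difference in speed is not measurable; the explicit path only pays off on large trees or in queries that run many times.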


TclXML implemented XPath around 2001. The TclXML side may not be as actively maintained. "DOM Level 3 has an official method for using XPath with a DOM tree, but I haven't had time to implement that yet." In any case, here's an example of TclDOM XPath programming that arose from a comp.lang.tcl request to make a list of all attributes across all tags:

    package require dom
    set xml_image {<?xml version = "1.0" encoding = "UTF-8" ?>
          <via id="1" trivia_id="255" question="How much wood would a
        woodchuck chuck?" answer_id="1" answer="A lot"
        expDate="116494926"></via>}
    set doc [dom::parse $xml_image]
    set result {}
        # "//@*" says, "all tags, all attributes."  "//via/@*"
        # gives all attributes of all "via" tags.  And so on.
    foreach node [dom::selectNode $doc //@*] {
        lappend result [dom::node configure $node -localName] \
                     [dom::node stringValue $node]
    }
    puts $result
    # The output is "id 1 trivia_id 255 question {How much wood would a      woodchuck chuck?} answer_id 1 answer {A lot} expDate 116494926"

Steve once did "some work on a higher-level package that allows XPath over a streaming interface ..." See the xmlswitch package for more on the subject.


XPath users and students will want to have XGrep at their sides. "XGrep is a grep like utility for XML documents" which uses XPath syntax for its searches.


RS 2005-06-13: Here's experimental code to fill in a default namespace into XPath expressions:

 proc XPathNS {ns path} {
    set res {}
    foreach e [split $path /] {
        if {$e eq "."}      continue
        if {![in {"" ..} $e] && ![has : $e]} {set e $ns:$e}
        lappend res $e
    }
    join $res /
 }
 proc in {list el} {expr {[lsearch $list $el]>=0}}
 proc has {substr str} {expr {[string first $substr $str]>=0}}

#-- Testing:

 % XPathNS NN /a/b/c
 /NN:a/NN:b/NN:c
 % XPathNS NN /a/b/../c
 /NN:a/NN:b/../NN:c
 % XPathNS NN /a/./b/../c/foo:bar ;# preserve explicit namespaces, "foo" here
 /NN:a/NN:b/../NN:c/foo:bar
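The reason such a rewrite is useful: XPath 1.0 has no notion of a default namespace, so an unprefixed name in an expression matches only elements that are in no namespace. The following sketch shows the effect with tDOM, assuming a tDOM version whose selectNodes accepts the -namespaces option. The document and the URI urn:example:demo are made up for illustration, and the prefixed path is written out by hand in the form that XPathNS NN /a/b/c produces.

```tcl
package require tdom

# Hypothetical document whose elements all live in a default namespace.
set doc [dom parse {<a xmlns="urn:example:demo"><b><c>hello</c></b></a>}]
set root [$doc documentElement]

# Unprefixed names in XPath 1.0 refer to elements in no namespace.
set plain [$root selectNodes /a/b/c]

# Bind the prefix NN to the document's namespace URI for this one query.
set prefixed [$root selectNodes -namespaces {NN urn:example:demo} /NN:a/NN:b/NN:c]

puts "unprefixed matches: [llength $plain]"
puts "prefixed matches:   [llength $prefixed]"

# Collect the text before the document (and its nodes) is deleted.
set texts {}
foreach node $prefixed {
    lappend texts [$node text]
}
puts $texts

$doc delete
```

The prefix used in the query does not have to match any prefix in the document; only the namespace URI it is bound to matters.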

RS 2006-03-24: A little command line tool to do an XPath query on an XML file:

 #!/usr/bin/env tclsh
 set usage {
    usage: xpath xmlfile query
 }
 if {[llength $argv] != 2} {puts stderr $usage; exit 1}
 package require tdom

 proc main argv {
    foreach {xmlfile query} $argv break
    set f [open $xmlfile]
    set docel [[dom parse -channel $f doc] documentElement]
    close $f
    foreach node [$docel selectNodes $query] {
        puts [$node asXML]
    }
 }
 main $argv

2006-05-22 From linux-magazine article issue 20

Table 3: XPATH examples

 Query                             Description
 /option                           The option element, which must be the document (root) element
 //option                          All elements in the document called option
 //option[3]                       Every option element that is the third option child of its parent
 /table/*                          All child elements of table, where table must be the document (root) element
 //table[1]                        Every table element that is the first table child of its parent; (//table)[1] is the first table in the document
 //table[last()]                   Every table element that is the last table child of its parent
 //@colspan                        All colspan attributes in the document
 //td[@colspan]                    All td elements with the attribute colspan
 //table[@width]                   All table elements that have a width attribute
 //table[@width=690]               All table elements with a width attribute that has a value of 690
 //*[count(tr)=2]                  All elements with exactly two tr child elements
 //tr/td|//tr/th                   All td and th elements that are children of a tr element (the shorter //tr/td|th parses as (//tr/td)|th)
 //table//img                      All img elements contained within a table element
 //table[1]//img[2]                Every img that is the second img child of its parent, inside a table that is the first table child of its parent
 //status[.='hit']/..              All elements that have a <status>hit</status> child element
 //p/br/following-sibling::text()  The bar in <p>foo<br/>bar</p>
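Positional predicates and the union operator in queries like these bind differently than a quick reading suggests, so it is worth counting matches on a small document. The sketch below assumes tDOM is installed; the sample document and the helper countMatches are invented for illustration.

```tcl
package require tdom

# Two tables, each the first (and only) table child of its own div.
set doc [dom parse {<html><body>
  <div><table width="690">
    <tr><td colspan="2">a</td></tr>
    <tr><td>b</td><th>c</th></tr>
  </table></div>
  <div><table><tr><td>d</td></tr></table></div>
</body></html>}]
set root [$doc documentElement]

# Hypothetical helper: number of nodes a query selects.
proc countMatches {root query} {
    llength [$root selectNodes $query]
}

foreach query {
    //table
    //table[1]
    (//table)[1]
    //table[@width=690]
    //*[count(tr)=2]
    //tr/td|th
    //tr/td|//tr/th
} {
    set counts($query) [countMatches $root $query]
    puts [format "%-22s %d" $query $counts($query)]
}

$doc delete
```

By XPath 1.0 rules, //table[1] selects both tables here, because each is the first table child of its parent, while (//table)[1] selects only the first table in document order. Likewise //tr/td|th finds only the three td elements, since the bare th is evaluated relative to the context node, and //tr/td|//tr/th finds all four cells.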

DKF: A query that I find useful at demonstrating the power of XPath is this one that allows you to use sub-searches:

  //table[tr/td/@width=690]
                      Tables that contain a tr with a td inside that is of width 690

There are other ways of writing the above:

  //table/tr[td/@width=690]/..
  //table/tr/td[@width=690]/../..
  //table[tr[td[@width=690]]]
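A predicate filters the step it is attached to, so where the brackets sit decides which element comes back: //table[tr[td[@width=690]]] selects the table, while //table/tr[td[@width=690]] selects the tr. A small sketch to check this, assuming tDOM is installed and using a made-up two-table document:

```tcl
package require tdom

set doc [dom parse {<body>
  <table id="t1"><tr><td width="690">x</td></tr></table>
  <table id="t2"><tr><td width="100">y</td></tr></table>
</body>}]
set root [$doc documentElement]

foreach query {
    //table[tr/td/@width=690]
    //table/tr[td/@width=690]/..
    //table/tr/td[@width=690]/../..
    //table[tr[td[@width=690]]]
    //table/tr[td[@width=690]]
} {
    # Record the element name of every selected node.
    set names {}
    foreach node [$root selectNodes $query] {
        lappend names [$node nodeName]
    }
    set selected($query) $names
    puts "$query -> $names"
}

$doc delete
```

The first four queries each select the single table that contains the wide cell; the last one stops at the tr.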

As you can see, XPath is powerful, tricky, baroque, ... :-)


anjalijaingmailcom - 2010-03-26 07:15:12

 <div class="tabUnselectedText" align="center">
 <a href="javascript:renderPage('mainForm:consoleBeanId.3','Analysis' , 'analysisdetails.faces');">Analysis</a>
 </div>

What would be the XPath for the element 'Analysis'?

Lars H: That depends on what you consider to characterise it. From the fragment you quote, one possibility would be //div/a, but in a typical XHTML document that'll probably match lots of things.
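Two narrower candidates, as a sketch only: one keys on the div's class and the link text, the other on part of the href. Which attribute really characterises the link is an assumption, as are the enclosing body element and the second div, which are added here just to have something that should not match. tDOM is assumed to be installed.

```tcl
package require tdom

set doc [dom parse {<body>
 <div class="tabUnselectedText" align="center">
 <a href="javascript:renderPage('mainForm:consoleBeanId.3','Analysis' , 'analysisdetails.faces');">Analysis</a>
 </div>
 <div class="other"><a href="#">Elsewhere</a></div>
</body>}]
set root [$doc documentElement]

# Candidate 1: the div's class plus the exact link text.
set byText {//div[@class="tabUnselectedText"]/a[.="Analysis"]}
# Candidate 2: a distinctive substring of the href attribute.
set byHref {//a[contains(@href, "analysisdetails.faces")]}

foreach query [list $byText $byHref] {
    set hits($query) [llength [$root selectNodes $query]]
    puts "$query -> matches: $hits($query)"
}
set broad [llength [$root selectNodes //div/a]]
puts "//div/a -> matches: $broad"

$doc delete
```

In this fragment each narrow query should find one link, while //div/a finds both.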


xpath.tcl : A wrapper script that models an XML file as a filesystem, to make using XPath via tDOM feel like using file commands.