Version 14 of XPath

Updated 2003-03-19 19:32:14

"XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer." That's the summary of version 1.0 of the W3C specification of the XML Path Language--XPath. You can read this standard (or "recommendation", in W3C vernacular) for yourself at http://www.w3.org/TR/xpath . One way to think about XPath is that it does for XML instances a bit of what SQL does for RDBMSs--it's a kind of query language. It complements XSLT in particular; XSLT describes what changes to make, and XPath tells where in a document to make them (very roughly). [Also recommend http://www.w3schools.com ? ]


Examples of XPath

        <html>
            <a href="http://wiki.tcl.tk/">Tcler's Wiki</a>
            <a href="http://www.tcl.tk/">Tcl Developer Xchange</a>
            <a id="getme" href="http://openacs.org/">OpenACS</a>
        </html>

Given the example XML above, we could select all <a> elements using the following XPath expression:

        //a

We could also grab the "OpenACS" link with the following XPath:

        //a[@id="getme"]

More examples can be found at: http://www.zvon.org/xxl/XPathTutorial/General/examples.html

JAC [Nice comments, JAC.]


[Explain XPath implementations in existing Tcl-XSLT bindings.]


tDOM makes XPath queries at the script level very easy. With the example from above:

    package require tdom

    set data {
    <html>
        <a href="http://wiki.tcl.tk/">Tcler's Wiki</a>
        <a href="http://www.tcl.tk/">Tcl Developer Xchange</a>
        <a id="getme" href="http://openacs.org/">OpenACS</a>
    </html>}


    set doc [dom parse $data]
    set root [$doc documentElement]

    set aNodes [$root selectNodes {//a[@id="getme"]}]

    foreach node $aNodes {
        puts "Visit [$node text] at [$node @href]"
    }

    $doc delete

Two notes:

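The brackets '[ ]' have syntactic meaning in both Tcl and XPath: Tcl treats them as command substitution, while XPath uses them to delimit predicates. Don't forget to protect the brackets in your XPath expressions, most simply by enclosing the whole expression in braces, as in the script above.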
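
As a minimal sketch of the quoting issue, assuming tDOM is installed: the script below runs the same query three ways. The variable names (viaBraces, viaEscapes, viaVariable, wanted) are made up for this illustration. Braces are the simplest choice; backslash escapes are only needed when the expression must be built inside double quotes, for example to interpolate a Tcl variable.

```tcl
package require tdom

set doc [dom parse {<html><a id="getme" href="http://openacs.org/">OpenACS</a></html>}]
set root [$doc documentElement]

# Braces: Tcl performs no substitution, so the predicate reaches tDOM untouched.
set viaBraces [[lindex [$root selectNodes {//a[@id="getme"]}] 0] text]

# Double quotes: the brackets and the inner quotes must be backslash-escaped,
# otherwise Tcl would attempt command substitution on the predicate.
set viaEscapes [[lindex [$root selectNodes "//a\[@id=\"getme\"\]"] 0] text]

# Double quotes are useful when a Tcl variable has to be interpolated.
# (Illustration only: the value is pasted into the expression verbatim.)
set wanted getme
set viaVariable [[lindex [$root selectNodes "//a\[@id='$wanted'\]"] 0] text]

puts "$viaBraces / $viaEscapes / $viaVariable"

$doc delete
```

All three calls select the same <a> element; only the Tcl-level quoting differs.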

The XPath expression //a is not the best example one could choose. The '//' (which is the abbreviation for /descendant-or-self::node()/) is one of the most expensive XPath location steps in almost all known XPath engines, because the engine has to scan the whole tree beneath the context node. Avoiding //, where possible, can speed up your XPath queries and your XSLT stylesheets considerably. Rolf Ade.
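
To illustrate the point about '//', here is a small sketch, again assuming tDOM, that selects the same node with and without a descendant search. The variable names are invented for the example, and no timings are claimed: on a four-element document the difference is not measurable, but the explicit paths give the engine far less of the tree to walk on large documents.

```tcl
package require tdom

set doc [dom parse {
<html>
    <a href="http://wiki.tcl.tk/">Tcler's Wiki</a>
    <a href="http://www.tcl.tk/">Tcl Developer Xchange</a>
    <a id="getme" href="http://openacs.org/">OpenACS</a>
</html>}]
set root [$doc documentElement]

# Descendant search: every node beneath the root is a candidate.
set viaDescendant [[lindex [$root selectNodes {//a[@id="getme"]}] 0] getAttribute href]

# Explicit absolute path: only the <a> children of <html> are examined.
set viaAbsolute [[lindex [$root selectNodes {/html/a[@id="getme"]}] 0] getAttribute href]

# Relative path: $root is the <html> element, so a child step is enough.
set viaRelative [[lindex [$root selectNodes {a[@id="getme"]}] 0] getAttribute href]

puts "$viaDescendant $viaAbsolute $viaRelative"

$doc delete
```

The explicit forms only work when the position of the wanted elements in the tree is known in advance, which is the "if possible" caveat in the note above.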


XPath users and students will want to have XGrep at their sides. "XGrep is a grep like utility for XML documents" which uses XPath syntax for its searches.