"XPath is a language for addressing parts of an XML document, designed to be used by both [XSLT] and [XPointer]."
That's the summary of version 1.0 of the [W3C] specification of the
[XML] Path Language--XPath. You can read this standard (or
"recommendation", in W3C vernacular) for yourself at http://www.w3.org/TR/xpath .
One way to think about XPath is that it does for XML documents
a bit of what [SQL] does for [RDBMS]s: it's a kind of query language.
It complements XSLT in particular; XSLT describes what changes to
make, and XPath says where in a document to make them (very roughly).
[[Also recommend http://www.w3schools.com ? ]]
----
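Examples of XPath
A sample document (the links root element and the href values are illustrative):
    <links>
        <a href="http://wiki.tcl.tk/">Tcler's Wiki</a>
        <a href="http://www.tcl.tk/">Tcl Developer Xchange</a>
        <a id="getme" href="http://openacs.org/">OpenACS</a>
    </links>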
Given the example XML above, we could extract all a elements using the following
XPath:
    //a
We could also grab just the "OpenACS" link with the following XPath:
    //a[@id="getme"]
More examples can be found at: http://www.zvon.org/xxl/XPathTutorial/General/examples.html
[JAC] [[Nice comments, JAC.]]
----
[[Explain XPath implementations in existing Tcl-XSLT bindings.]]
----
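[tDOM] makes XPath queries at the script level very easy. With the example from above:
    package require tdom
    # Sample markup; the links root element and the href values are illustrative.
    set data {
    <links>
        <a href="http://wiki.tcl.tk/">Tcler's Wiki</a>
        <a href="http://www.tcl.tk/">Tcl Developer Xchange</a>
        <a id="getme" href="http://openacs.org/">OpenACS</a>
    </links>
    }
    set doc [dom parse $data]
    set root [$doc documentElement]
    # The braces keep Tcl from interpreting the XPath brackets.
    set aNodes [$root selectNodes {//a[@id="getme"]}]
    foreach node $aNodes {
        puts "Visit [$node text] at [$node @href]"
    }
    $doc delete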
Two notes:
The brackets '[[ ]]' have syntactic meaning both in Tcl and in XPath expressions. Don't forget to protect the brackets in your XPath expressions from Tcl, for example by enclosing the whole expression in braces as in the selectNodes call above.
The XPath expression //a is not the best example one could choose. The '//' (an abbreviation for /descendant-or-self::node()/) is one of the most expensive XPath location steps in almost every known XPath engine, because the engine has to scan the whole tree beneath the context node. Avoiding // where possible can speed up your XPath queries and your XSLT stylesheets considerably. [Rolf Ade].
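
A minimal sketch of both notes, assuming tDOM is installed. The links root element and the href values are made up for illustration and are not taken from this page. It compares three ways of writing the same expression in Tcl, then fetches the same node once with // and once with an explicit path.

```tcl
package require tdom

# Assumed sample document: the <links> root and the href values are
# illustrative only.
set data {<links>
    <a href="http://wiki.tcl.tk/">Tcler's Wiki</a>
    <a href="http://www.tcl.tk/">Tcl Developer Xchange</a>
    <a id="getme" href="http://openacs.org/">OpenACS</a>
</links>}

set doc  [dom parse $data]
set root [$doc documentElement]

# Note 1: braces switch off Tcl's command substitution for the brackets.
set braced  {//a[@id="getme"]}
# Inside double quotes every bracket (and inner quote) must be escaped.
set escaped "//a\[@id=\"getme\"\]"
# Left unescaped, Tcl tries to run the bracket content as a command.
set failed [catch {set broken "//a[@id="getme"]"} message]

# Note 2: an explicit path reaches the same node without the
# descendant scan that // implies.
set viaDescendants [[$root selectNodes $braced] text]
set viaPath        [[$root selectNodes {/links/a[@id="getme"]}] text]

puts "braced and escaped forms are equal: [expr {$braced eq $escaped}]"
puts "unescaped form raised an error: $failed"
puts "via //: $viaDescendants, via explicit path: $viaPath"

$doc delete
```

On a three-element document the difference in speed is not measurable; the explicit path only pays off on large trees, and it ties the query to the document layout, which is the price for avoiding //.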
----
XPath users and students will want to have XGrep [http://software.decisionsoft.com/pathanXgrepDocumentation.html] at their sides. "XGrep is a grep like utility for XML documents" which uses XPath syntax for its searches.
----
[RS] 2005-06-13: Here's experimental code to fill in a default namespace into XPath expressions:
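    # Prefix every unqualified step of a simple element path with $ns.
    # Steps such as @attr or text() get prefixed too, so this only
    # suits plain element paths.
    proc XPathNS {ns path} {
        set res {}
        foreach e [split $path /] {
            if {$e eq "."} continue
            if {![in {"" ..} $e] && ![has : $e]} {set e $ns:$e}
            lappend res $e
        }
        join $res /
    }
    proc in {list el} {expr {[lsearch $list $el]>=0}}
    proc has {substr str} {expr {[string first $substr $str]>=0}}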
#-- Testing:
% XPathNS NN /a/b/c
/NN:a/NN:b/NN:c
% XPathNS NN /a/b/../c
/NN:a/NN:b/../NN:c
% XPathNS NN /a/./b/../c/foo:bar ;# preserve explicit namespaces, "foo" here
/NN:a/NN:b/../NN:c/foo:bar
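
Why one would want this: in XPath 1.0 an unprefixed name only matches elements in no namespace, so a plain path finds nothing in a document that declares a default namespace. The following is a sketch, assuming tDOM with the -namespaces option of selectNodes; the document and the namespace URI http://example.org/ns are hypothetical. XPathNS and its helpers are repeated so the sketch runs on its own.

```tcl
package require tdom

# XPathNS and its helpers, repeated from above.
proc XPathNS {ns path} {
    set res {}
    foreach e [split $path /] {
        if {$e eq "."} continue
        if {![in {"" ..} $e] && ![has : $e]} {set e $ns:$e}
        lappend res $e
    }
    join $res /
}
proc in {list el} {expr {[lsearch $list $el]>=0}}
proc has {substr str} {expr {[string first $substr $str]>=0}}

# Hypothetical document whose elements live in a default namespace.
set doc  [dom parse {<a xmlns="http://example.org/ns"><b><c>hello</c></b></a>}]
set root [$doc documentElement]

# Unprefixed names do not match namespaced elements: nothing is found.
set plain [$root selectNodes /a/b/c]

# Bind the prefix NN to the namespace URI and query with the rewritten path.
set query [XPathNS NN /a/b/c]
set found [$root selectNodes -namespaces {NN http://example.org/ns} $query]

puts "/a/b/c selects [llength $plain] node(s)"
puts "${query} selects [llength $found] node(s)"

$doc delete
```

The prefix NN is arbitrary; only the URI it is bound to has to agree with the document.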
----
[RS] 2006-03-24: A little command line tool to do an XPath query on an XML file:
    #!/usr/bin/env tclsh
    # Print every node of an XML file that matches an XPath query.
    set usage {
        usage: xpath xmlfile query
    }
    if {[llength $argv] != 2} {puts stderr $usage; exit 1}
    package require tdom
    proc main argv {
        foreach {xmlfile query} $argv break   ;# unpack the two arguments
        set f [open $xmlfile]
        # doc names a local variable that receives the document command
        set docel [[dom parse -channel $f doc] documentElement]
        close $f
        foreach node [$docel selectNodes $query] {
            puts [$node asXML]
        }
    }
    main $argv
----
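2006-05-22 From linux-magazine article issue 20
Table 3: XPath examples
    Query                  Description
    /option                The option element directly below the root node
    //option               All elements in the document called option
    //option[3]            The third option element
    /table/*               All elements below table, where table must be located directly below the root node
    //table[1]             The first table element in a document
    //table[last()]        The last table element
    //@colspan             All colspan attributes in a document
    //td[@colspan]         All td elements with the attribute colspan
    //table[@width]        All table elements that have a width attribute
    //table[@width=690]    All table elements with a width attribute that has a value of 690
    //*[count(tr)=2]       All elements with two tr child nodes
    //tr/td|th             All td and th elements contained within a tr element
    //table//img           All img elements contained within a table element
    //table[1]//img[2]     Second img element in the first table
Two caveats on the table. A positional predicate following // is applied per parent, so //option[[3]] selects every option element that is the third option child of its parent, while (//option)[[3]] selects the third option element in the document; the same holds for //table[[1]], //table[[last()]] and //table[[1]]//img[[2]]. Also, | forms the union of two complete paths, so //tr/td|th selects all td elements inside tr plus the th children of the context node; //tr/td|//tr/th selects the td and th elements inside tr.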
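
A few of these queries can be tried out with tDOM. This is only a sketch: the HTML-like document is made up for the purpose, and hits is a hypothetical helper that counts the nodes a query selects when evaluated from the document element.

```tcl
package require tdom

# Hypothetical document: two tables, the first with two rows.
set doc [dom parse {<html><body>
<table width="690">
  <tr><th>h</th><td colspan="2">a</td></tr>
  <tr><td>b</td><td>c</td></tr>
</table>
<table>
  <tr><td>d</td></tr>
</table>
</body></html>}]
set root [$doc documentElement]

# Number of nodes selected by query, with node as the context node.
proc hits {node query} {
    llength [$node selectNodes $query]
}

# Collect the counts so they stay available after the tree is freed.
set counts {}
foreach query {
    {//table[@width=690]}
    {//td[@colspan]}
    {//*[count(tr)=2]}
    {//table[last()]}
    {//tr/td|th}
    {//tr/td|//tr/th}
} {
    dict set counts $query [hits $root $query]
    puts [format "%-22s %d" $query [dict get $counts $query]]
}

$doc delete
```

With this document the last two queries differ: //tr/td|th counts only the four td elements, because the context node (html) has no th children, while //tr/td|//tr/th counts all five cells.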
----
[Category Glossary]