Version 67 of XML

Updated 2009-08-10 17:46:15 by LV

XML = eXtensible Markup Language [L1 ].

Very generally speaking, it is a simplified form of SGML, but stricter (more regular) in some respects:

  • Singleton elements must end with />
  • Attribute values must be quoted

Example:

 <father name="Jack" att1="1">
   <child name="Tom" born="1997" />
 </father>

"Programming XML in Tcl" [L2 ] surveys the state of the art as of spring 2001, mainly from a Zveno-biased perspective.

One deficiency of that article is its neglect of Jochen Loewer's tDOM work.


Parsing XML

Related Technologies

There is a whole host of technologies related to XML, such as XPath for selecting nodes from a document, XSL/XSLT for transforming XML documents, and various tools for checking XML documents for well-formedness and validating them against some schema definition. tDOM and TclXML both provide good support for at least XPath and XSLT.
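As a rough illustration of the parsing and XPath support mentioned above, the following sketch assumes the tDOM package is installed and reuses the father/child example from the top of this page. The variable names are only for illustration, and the script simply skips the work if tDOM cannot be loaded.

```tcl
# The example document from the top of this page.
set xml {<father name="Jack" att1="1">
  <child name="Tom" born="1997" />
</father>}

set names {}
if {[catch {package require tdom}]} {
    puts "tDOM is not installed; skipping"
} else {
    # Parse the text into a DOM tree and fetch the root element.
    set doc  [dom parse $xml]
    set root [$doc documentElement]

    # Select nodes with an XPath expression and read an attribute of each.
    foreach node [$root selectNodes {/father/child}] {
        lappend names [$node getAttribute name]
    }
    puts $names

    # Input that is not well-formed makes the parser raise a Tcl error.
    if {[catch {dom parse {<a><b></a>}} msg]} {
        puts "not well-formed: $msg"
    }

    # tDOM documents are not freed automatically.
    $doc delete
}
```

Wrapping the parse in catch is a cheap way to test for well-formedness without a separate validation tool.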

Applications

XML by itself is just a partially-standardised syntax for data. It's used as the basis for a variety of different applications, such as:

  • (X)HTML for web pages
  • RDF and OWL for general relational/logical data models
  • DocBook for technical documentation, along with other office document formats (e.g. Microsoft's Office XML format, Excel XML, OpenDoc, etc.)
  • Various configuration file formats (especially in the Java world)
  • SOAP and XML-RPC for remote procedure calls/web-services.

Alternatives

Alternatives to using XML for data files include:


One way of specifying the valid tag structure of a class of documents is to use a Document Type Definition, or DTD for short. This approach was inherited from SGML. There are alternative ways, such as XML Schema and RELAX NG.


Perhaps the single most important introductory point to make to Tcl developers about XML is that it's built-in! Almost--while the core Tcl distribution doesn't know about XML, it does have excellent Unicode abilities. The Kitten starkit includes an XML package while the ActiveTcl installations of Tcl can easily add an XML package via teacup.


tDOM has a built-in pretty-printing serialization option. Those interested in a comparable function for TclDOM are welcome to try, use, or improve dom_pretty_print [L3 ]. "XML pretty-printing" will eventually have more on this topic.
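A minimal sketch of that tDOM option, assuming the tDOM package is installed: the asXML method of a document takes an -indent option. The input is the example from the top of this page squeezed onto one line, and the variable names are only for illustration.

```tcl
if {[catch {package require tdom}]} {
    puts "tDOM is not installed; skipping"
} else {
    # Parse a document that has no line breaks or indentation of its own.
    set doc [dom parse {<father name="Jack" att1="1"><child name="Tom" born="1997" /></father>}]

    # Serialize with two spaces of indentation per nesting level.
    set pretty [$doc asXML -indent 2]
    puts $pretty

    $doc delete
}
```

Passing -indent none instead should give the serialization without added whitespace.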


How can you start to generate your own XML documents with Tcl? In answering just that question in a mailing list [reference?], Steve Ball succinctly advised, "When creating XML, I generally use TclDOM. Create a DOM tree in memory, and then use 'dom::DOMImplementation serialize $doc' to generate the XML. The TclDOM package will make sure that the generated XML is well-formed.

Alternatively, XML is just text so there's no reason why you can't just create the string directly. Eg:

        puts "<document>$content</document>"

The problem with this is that (a) you have to worry about the XML syntax nitty-gritty and (b) the content variable may contain special characters which you have to deal with.

There are also some generation packages available, like the 'html' package in tcllib (this will be added to TclXML RSN, when my workload permits)."

DKF - If you're going for the cheap-hack method of XML generation mentioned above, you'll want this:

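  # Wrap $content in <tag>...</tag>, escaping the five characters
  # that XML predefines entities for.
  proc asXML {content {tag document}} {
     set XML_MAP {
        < &lt;
        > &gt;
        & &amp;
        \" &quot;
        ' &apos;
     }
     return <$tag>[string map $XML_MAP $content]</$tag>
  }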
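The same escaping idea can be stretched to cover attributes, which must be quoted and so need the quote characters escaped too. The sketch below is only an illustration in the same cheap-hack spirit: xmlEscape and xmlElement are hypothetical names, not part of any package mentioned on this page, and no checking of tag or attribute names is attempted.

```tcl
# Hypothetical helper: escape the five characters with predefined XML entities.
# string map makes a single pass, so already-replaced text is not escaped again.
proc xmlEscape {text} {
    return [string map {& &amp; < &lt; > &gt; \" &quot; ' &apos;} $text]
}

# Hypothetical helper: build one element.
# attrs is a flat name/value list; values are escaped and quoted.
# content is inserted as-is, so plain text must go through xmlEscape first.
# Empty content yields a singleton element ending in "/>".
proc xmlElement {tag attrs {content ""}} {
    set out "<$tag"
    foreach {name value} $attrs {
        append out " $name=\"[xmlEscape $value]\""
    }
    if {$content eq ""} {
        return "$out />"
    }
    return "$out>$content</$tag>"
}

# Rebuild the example from the top of this page.
set child [xmlElement child {name Tom born 1997}]
puts [xmlElement father {name Jack att1 1} $child]
```

This prints the father/child example on a single line. Nested elements are passed in as ready-made XML, which is why content is not escaped automatically.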

Naturally, the XML_MAP variable is factorisable...
MHo: Why not use html::quoteFormValue for this purpose?

For generation of XML (HTML) the pure Tcl way, have a look at the xmlgen module of TclXML on sourceforge: http://sourceforge.net/projects/tclxml/ .

DKF: That's when you're moving away from cheap hacks. And HTML has a lot more entities than XML, though most are optional.


If you want to get particular about entity encoding arbitrary text, this is working for me:

 variable entityMap [list & &amp\; < &lt\; > &gt\; \" &quot\;\
        \u0000

A Cameron Laird article on XSD and XML schema can be found at http://ldn.linuxfoundation.org/column/untaught-xml-schema .


An August 2009 article describes how Microsoft has been awarded a software patent for XML files and processing [L4 ].