Version 45 of XML

Updated 2003-05-21 15:14:59

XML = eXtensible Markup Language [L1 ]. Broadly speaking, it is a simplified form of SGML, but stricter (more regular) in some respects:

  • Empty (singleton) elements must end with />
  • Attribute values must be quoted

Example:

 <father name="Jack" att1="1">
   <child name="Tom" born="1997" />
 </father>

"Programming XML in Tcl" [L2 ] surveys the state of the art as of spring 2001, mainly from a Zveno-biased perspective.

One deficiency of that article is its neglect of Jochen Loewer's tDOM work.


One way of specifying the valid tag structure of a class of documents is to use a Document Type Definition, DTD for short. This approach was inherited from SGML. Alternatives include XML Schema and RELAX NG, among others.


Perhaps the single most important introductory point to make to Tcl developers about XML is that it's built in! Well, almost: while the core Tcl distribution doesn't know about XML, it does have excellent Unicode abilities, and both the ActiveTcl and Kitten installations of Tcl include XML packages.


tDOM has a built-in pretty-printing serialization option. Those with an interest in a comparable function for TclDOM are welcome to try, use, or improve dom_pretty_print [L3 ]. "XML pretty-printing" will eventually have more on this topic.


How can you start to generate your own XML documents with Tcl? In answering just that question in a mailing list [reference?], Steve Ball succinctly advised, "When creating XML, I generally use TclDOM. Create a DOM tree in memory, and then use 'dom::DOMImplementation serialize $doc' to generate the XML. The TclDOM package will make sure that the generated XML is well-formed.

Alternatively, XML is just text so there's no reason why you can't just create the string directly. Eg:

        puts "<document>$content</document>"

The problem with this is that (a) you have to worry about the XML syntax nitty-gritty and (b) the content variable may contain special characters which you have to deal with.

There are also some generation packages available, like the 'html' package in tcllib (this will be added to TclXML RSN, when my workload permits)."
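As a rough illustration of the "build a DOM tree in memory, then serialize" approach, here is a minimal sketch. Note that it swaps in tDOM rather than the TclDOM package Steve Ball describes, so the command names differ from his 'dom::DOMImplementation serialize'. It assumes the tdom package is installed; the element names come from the example at the top of this page, and the note element is made up for illustration.

```tcl
package require tdom

# Build the <father> document from the example above in memory.
set doc  [dom createDocument father]
set root [$doc documentElement]
$root setAttribute name Jack att1 1

set child [$doc createElement child]
$child setAttribute name Tom born 1997
$root appendChild $child

# Hypothetical extra element: its text contains special characters,
# which the serializer escapes for us.
set note [$doc createElement note]
$note appendChild [$doc createTextNode "Tom & Jerry <3"]
$root appendChild $note

# Serialize the tree; the result is well-formed by construction.
set xml [$root asXML]
puts $xml

# tDOM documents are not garbage collected; free them explicitly.
$doc delete
```

The point of the design is the same as in the quote: the script never writes angle brackets or entities itself, so well-formedness is the library's problem.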

DKF - If you're going for the cheap-hack method of XML generation mentioned above, you'll want this:

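  # Wrap $content in <$tag>...</$tag>, escaping XML's special characters.
  proc asXML {content {tag document}} {
     # string map makes a single pass, so the & in an inserted
     # entity such as &lt; is not escaped a second time.
     set XML_MAP {
        < &lt;
        > &gt;
        & &amp;
        \" &quot;
        ' &apos;
     }
     return "<$tag>[string map $XML_MAP $content]</$tag>"
  }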

Naturally, the XML_MAP variable is factorisable...
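One possible reading of "factorisable", as a sketch only: keep the map in a namespace variable so it is built once, and reuse the same escaping for attribute values. The namespace xmlsketch and the procs escape and element are hypothetical names invented here, not part of any package mentioned on this page.

```tcl
namespace eval xmlsketch {
    # Character-to-entity map, built once instead of on every call.
    # The entities are quoted because a bare ; would end the command.
    variable map [list < "&lt;" > "&gt;" & "&amp;" \" "&quot;" ' "&apos;"]

    # Escape text for use as element content or as an attribute value.
    proc escape {text} {
        variable map
        return [string map $map $text]
    }

    # Wrap content in <tag>...</tag>; attrs is an optional flat
    # name/value list that becomes the element's attributes.
    proc element {tag content {attrs {}}} {
        set open $tag
        foreach {name value} $attrs {
            append open " $name=\"[escape $value]\""
        }
        return "<$open>[escape $content]</$tag>"
    }
}

puts [xmlsketch::element child "Tom & Jerry" {born 1997}]
```

This prints <child born="1997">Tom &amp; Jerry</child>. Tag and attribute names are still not checked, so this remains a cheap hack in the sense DKF means.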

For generation of XML (HTML) the pure Tcl way, have a look at the xmlgen module of TclXML on sourceforge: http://sourceforge.net/projects/tclxml/ .


 What: xml2rfc
 Where: http://xml.resource.org/ 
        http://www.ietf.org/rfc/rfc2629.txt 
 Description: A tool that converts XML source into ASCII, HTML, or nroff
        format.  Intended for support of RFC 2629.  On the above web
        page is both a CGI for converting an XML file into the various
        formats, as well as links to the conversion tool itself.  The
        tool itself includes a Tcl/TclXML tool.
 Updated: 11/2001 
 Contact: See web site

It's remarkable that there are two reasonably well-supported XML editors written (mostly) in Tcl: waX Me Lyrical (WAX), which replaces the earlier Swish in the TclXML project, and xe, maintained as part of tDOM.

de: With all respect, xe isn't an XML editor. It's an XML query tool (query language is XPath). - RS: See starDOM for a simple tDOM-based browser that allows editing, reparsing and validating XML source.


XML-RPC -- TclSOAP


RS notes that Internet Explorer makes for a convenient utility to confirm that an XML document is well-formed (although not necessarily valid). Now (since Fall 2002) he only uses starDOM, because of speed and scriptability ;-)

de: IE is useful, to some degree (up to a few MByte XML data size), as an XML Viewer, because it displays the XML document in a tree-like structure.

If you need XML validation, I recommend rxp http://www.ltg.ed.ac.uk/~richard/rxp.html . It avoids any Java installation hassle (and the start-up time of the Java virtual machine), is open source, runs on every relevant OS (an MS platform binary is available if you need one), is very conformant and mature, and is the fastest of the more common validating XML parsers. Since rxp is a command-line application, it is easy to use from a Tcl program via exec.
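A minimal sketch of driving a command-line validator such as rxp through exec follows. The proc name validateWith and the file name doc.xml are made up for illustration, and the rxp flags shown (-V to validate, -s for silent operation) are an assumption to check against your rxp's documentation. It also assumes the validator signals problems through a non-zero exit status or output on stderr, and it needs Tcl 8.5 or later for {*} and lassign.

```tcl
# Run an external validator on a file.
# Returns a two-element list: a success flag and the tool's output.
proc validateWith {command file} {
    # exec raises an error on a non-zero exit status, on output to
    # stderr, or if the program cannot be started; catch turns all
    # three into a failure result.
    if {[catch {exec {*}$command $file} output]} {
        return [list 0 $output]
    }
    return [list 1 $output]
}

# Assumed rxp flags: -V (validate) and -s (silent).
lassign [validateWith {rxp -V -s} doc.xml] ok message
if {$ok} {
    puts "doc.xml is valid"
} else {
    puts "validation failed: $message"
}
```

Passing the command as a list keeps the sketch independent of any one tool, so another command-line validator can be substituted without touching the proc.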

If you insist on doing XML validation with a Tcl extension, there are only two (and maybe a half) options:

Newer tDOM distributions include a validation extension, tnc, which is usable for both SAX and DOM processing. It's pretty fast (even faster than rxp).

Xerces-C++ is, among other things, a validating XML parser. Some time ago Steve Ball started to wrap it as a Tcl extension http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/tclxml/xercessax/ More recently, Steve Ball wrote on the TclXML list: "I never got the Xerces-C++ wrapper working, but instead I've got a working libxml2 wrapper for TclDOM. At the moment you need to checkout the CVS development tree to get access to it." libxml2 also includes a validating XML parser.

And the half option? Well, it should be doable to use one of the various Java XML parsers with tclblend. I strongly recommend sticking with one of the options above. But if you are a tclblend hero and figure it out, I would be interested in the exact steps.


Joe English in c.l.t: What I usually do to get indented XML is to generate whitespace *inside* the tags, like so:

    <foo x='a' y='b' z='c'
      ><bar
          ><baz>stuff</baz
          ><qux>stuff</qux   ></bar
    ></foo>

This style looks a little weird at first, but it's the most reliable way to "pretty-print" XML without changing the content.
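A rough sketch of how such output could be generated in pure Tcl (8.5 or later, for lassign). The nested-list tree format and the proc names xmlEscape, emitNode and prettyXML are invented for this illustration. The layout differs slightly from the hand-written sample above, but it follows the same rule: every line break sits inside a tag, just before a ">".

```tcl
# Escape the characters that are special in content and attribute values.
proc xmlEscape {s} {
    return [string map [list & "&amp;" < "&lt;" > "&gt;" ' "&apos;" \" "&quot;"] $s]
}

# Serialize a node WITHOUT the final ">" of its end tag, so the caller
# can place a line break and indentation in front of that ">".
# A node is a list: tag, attribute name/value list, then either
# "text STRING" or "children NODELIST".
proc emitNode {node depth} {
    lassign $node tag attrs kind body
    set out "<$tag"
    foreach {name value} $attrs {
        append out " $name='[xmlEscape $value]'"
    }
    if {$kind eq "children"} {
        set inner [string repeat "  " [expr {$depth + 1}]]
        foreach child $body {
            # This ">" closes whichever tag is still open: our start
            # tag for the first child, the previous child's end tag after.
            append out "\n$inner>" [emitNode $child [expr {$depth + 1}]]
        }
        # Close the last child's end tag at our own indentation level.
        append out "\n[string repeat {  } $depth]>"
    } else {
        append out ">" [xmlEscape $body]
    }
    return [append out "</$tag"]
}

# Serialize a whole tree, adding the ">" that emitNode leaves off.
proc prettyXML {tree} {
    return "[emitNode $tree 0]>"
}

set tree {foo {x a y b} children {
    {bar {} children {
        {baz {} text stuff}
        {qux {} text stuff}
    }}
}}
puts [prettyXML $tree]
```

Because no whitespace is ever emitted between a ">" and the next "<", a parser sees exactly the same content as it would in the unindented document.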


See also A little XML browser using tDOM and BWidgets' Tree and its refinement starDOM, as well as A little XML parser in pure Tcl.


The Perl/Tk folks have written an XML viewer [L4 ].


Overheard in the Tcl chatroom: "Cameron Laird: XML is the moral equivalent of ASCII. 'Wouldn't want to leave home without it; 'scares me that managers think it's a big deal." CL adds, some weeks later: It continues to surprise me how many developers I encounter who tell me they've been instructed to backstitch XML into working applications for no functional reason.


XML tutorials


[ Category Acronym | Category XML | ]