XML

XML, or eXtensible Markup Language, is a data format.

See Also

Natively accessing XML
Interfacing with XML
A discussion about merging the various XML packages.
Pull down menus in XML
Simple XML report writer
Parses a report in XML, displays it in a Tk canvas, and produces postscript from it.
XML pretty-printing
tDOM includes a pretty-printing serialization option. Those with an interest in a comparable function for TclDOM are welcome to try/use/improve dom_pretty_print.
XML Tree Walking
XML tutorials
XML-list
A survey of list-based representations of XML documents.
XML/tDOM encoding issues with the http package
XML_Wrapper
Create Tk forms on the fly from an XML file.
XSD schema validate an XML document

Resources

xml.org

Reading

Programming XML and Web services in TCL, Part 1: An initial primer, Cameron Laird, 2001-04-01
Surveys the state of the art as of spring 2001, mainly from a Zveno-biased perspective. One deficiency of that article is its neglect of Jochen Loewer's tDOM work.
Untaught XML Schema, Cameron Laird, 2009-06-05
Microsoft patents XML word processing documents, Rupert Goodwins, 2009-08-07
Amusingly enough, in August 2009 Microsoft was also ordered to stop selling its Word product because of its XML functionality: Injunction on Microsoft Word unlikely to halt sales, Nancy Gohring, 2009-08-11.

Examples

Simple XML report writer
XML Graph to canvas

Description

XML is a simplified form of SGML, but stricter (more regular) in some aspects:

  • Singleton (empty) elements must end with />
  • Attribute values must be quoted

Example:

<father name="Jack" att1="1">
    <child name="Tom" born="1997" />
</father>

Tcl's Unicode features make it a good language for processing XML.

There is broad confusion about whether data describing the subject of an XML document belongs in the attributes of a tag or in the content of elements, and misuse abounds. The intent of the specification is clear even if it is not explicit: an attribute in an XML tag describes the element, which is part of the structure of the document, not the thing in the subject of the document that the element refers to. In other words, data pertaining to the subject of the document comprise the content of tags, and data that describe the document itself comprise the attributes of tags. A more concise way to put it is that tags are structure, and content is content.

The confusion may arise partially because the subject of some part of the document may be the document itself, in which case data about the structure and interpretation of the document may occur as content. In this case, data that could occur as attributes in a tag in one part of the document occurs instead as content in another part of the document. This is legitimate. The inverse case, where content occurs as attributes of a tag, is not.

Parsing

tDOM and TclXML/TclDOM are the two main Tcl extensions for parsing XML. Both provide SAX for stream-oriented parsing and DOM for document-oriented parsing.
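
As a rough illustration of the two styles, the sketch below parses the father/child example from the Description section with tDOM, first as a DOM tree and then through the SAX-style expat interface. It assumes the tdom package is installed. The names xml, names and onStart are made up for this example.

```tcl
package require tdom

set xml {<father name="Jack" att1="1">
    <child name="Tom" born="1997" />
</father>}

# DOM style: build the whole tree in memory, then navigate it.
set doc  [dom parse $xml]
set root [$doc documentElement]
puts "root: [$root nodeName], name=[$root getAttribute name]"
foreach child [$root childNodes] {
    # Skip anything that is not an element (e.g. text nodes).
    if {[$child nodeType] ne "ELEMENT_NODE"} continue
    puts "child: [$child getAttribute name], born [$child getAttribute born]"
}
$doc delete

# SAX style: a callback fires for each start tag as the input is read.
# onStart and ::names are hypothetical names used only in this sketch.
set names {}
proc onStart {tag attrs} {
    lappend ::names $tag
}
set parser [expat -elementstartcommand onStart]
$parser parse $xml
$parser free
puts "elements seen: $names"
```

The DOM half is convenient when the document fits in memory and needs to be queried in several ways. The SAX half never holds the whole document, which matters for large or streamed input.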

See Also:

A little XML parser
Parsing HTML
Parsing XML
Playing Sax
Regular Expressions Are Not A Good Idea for Parsing XML, HTML, or e-mail Addresses
snitDom
tclhttpd XML server
A simple wrapper around Sleepycat's dbxml library version 2 to implement a remote XML database server.
TAX: A Tiny API for XML
Inspired by Stephen Uhler's HTML parser in 10 lines.
tkxmllint
A GUI frontend to xmllint.
XML shallow parsing with regular expressions
xmlp
YAXMLP an XML parser
Re-entrant and designed to not use regexp or string map.
ycl parse xml
A stackless parser based on coroutines. Features a forgiving mode, and also hooks that make it possible to parse streaming data. The resulting parse tree is available as a hierarchy of namespaces along with a set of commands that provide an interface to the hierarchy.

Validation

One way of specifying the valid tag structure of a class of documents is to use a DTD, or Document Type Definition. This approach was inherited from SGML. Alternatives include XMLSchema and Relax(NG), among others.

See Also:

A little XML Schema validator

Generating XML

Formatting ls information in XML
An example of a manual approach to generating XML.
Minimalist XML Generation
Howto export Microsoft Outlook contacts to XML using tcom and tDom
Migrating MS Access to other databases using XML
xmlgen / htmlgen
Generates HTML/XML from a Tcl script representing the desired document.

In a mailing list conversation [reference?], Steve Ball succinctly advised, "When creating XML, I generally use TclDOM. Create a DOM tree in memory, and then use dom::DOMImplementation serialize $doc to generate the XML. The TclDOM package will make sure that the generated XML is well-formed.

Alternatively, XML is just text so there's no reason why you can't just create the string directly. Eg:

puts <document>$content</document>

The problem with this is that (a) you have to worry about the XML syntax nitty-gritty, and (b) the content variable may contain special characters which must be escaped.

There are also some generation packages available, like the 'html' package in tcllib (this will be added to TclXML RSN, when my workload permits)."
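
The DOM-based approach described in that advice can be sketched roughly as follows. Note that this sketch swaps in tDOM for the TclDOM calls named in the quote, so it shows the same idea rather than Steve Ball's exact method. It assumes the tdom package is installed, and the sample content is invented.

```tcl
package require tdom

# Invented sample text containing characters that need escaping.
set content {Tom & "Jerry" <1997>}

# Build the tree in memory; the serializer takes care of escaping.
set doc  [dom createDocument document]
set root [$doc documentElement]
$root appendChild [$doc createTextNode $content]

set result [$doc asXML]
puts $result
$doc delete
```

Because the text goes in as a text node rather than as markup, the ampersand and angle brackets come out escaped and the result stays well-formed.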

DKF: If you're going for the cheap-hack method of XML generation mentioned above, you'll want this:

proc asXML {content {tag document}} {
    set XML_MAP {
        < &lt;
        > &gt;
        & &amp;
        \" &quot;
        ' &apos;
    }
    return <$tag>[string map $XML_MAP $content]</$tag>
}

Naturally, the XML_MAP variable is factorisable...
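
One possible way to factor the map out, sketched in pure Tcl so that the list is built once rather than on every call. The namespace xmlutil and the proc names escape and wrap are hypothetical, chosen only for this example.

```tcl
# Hypothetical factored variant of asXML: the map lives in a namespace variable.
namespace eval xmlutil {
    variable XML_MAP [list < "&lt;" > "&gt;" & "&amp;" \" "&quot;" ' "&apos;"]

    # Escape the five characters that are special in XML.
    proc escape {text} {
        variable XML_MAP
        return [string map $XML_MAP $text]
    }

    # Wrap escaped content in a single element, as asXML does.
    proc wrap {content {tag document}} {
        return "<$tag>[escape $content]</$tag>"
    }
}

puts [xmlutil::wrap {Tom & "Jerry" <1997>}]
# prints: <document>Tom &amp; &quot;Jerry&quot; &lt;1997&gt;</document>
```

string map replaces each position once and does not rescan its own output, so the & in an inserted &lt; is not escaped a second time.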

MHo: Why not use html::quoteFormValue for this purpose?

For generation of XML (HTML) the pure Tcl way, have a look at the xmlgen module of TclXML.

DKF: That's when you're moving away from cheap hacks. And HTML has a lot more entities than XML, though most are optional.

If you want to get particular about entity encoding arbitrary text, this is working for me:

variable entityMap [list & &amp\; < &lt\; > &gt\; \" &quot\;\
       \u0000

Browsing / Editing

a little XML browser
XML DOM Tk Text Browser Editor
FanXE

Publishing

Anastasia
Designed for the publication of large, highly complex XML documents and document collections.
xml2html
Tcl-based tool for authoring web sites in XML then rendering to HTML.

Related Technologies

There are a whole host of technologies related to XML:

XPath
For selecting nodes within a document.
XQuery
XSL/XSLT
For transforming documents.
DTD and XMLSchema
For specifying document schemas.

tDOM and TclXML both provide good support for at least XPath and XSLT.
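
For a taste of XPath from Tcl, here is a small sketch using tDOM's selectNodes method on the father/child example from the Description section. It assumes the tdom package is installed; the variable names are made up for this example.

```tcl
package require tdom

set doc [dom parse {<father name="Jack" att1="1">
    <child name="Tom" born="1997" />
</father>}]
set root [$doc documentElement]

# A location path returns a list of matching node commands.
set matches [$root selectNodes {/father/child[@born='1997']}]
foreach node $matches {
    puts "matched child: [$node getAttribute name]"
}

# XPath functions such as string() return a plain value instead of nodes.
set fatherName [$root selectNodes {string(/father/@name)}]
puts "father: $fatherName"

$doc delete
```

The predicate in square brackets filters on an attribute value, which is usually shorter than walking childNodes by hand.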

XML Formats

XML by itself is just a partially-standardized syntax for data. It's used as the basis for a variety of different applications, such as:

CML
Chemical Markup Language.
excel xml
For Excel.
(X)HTML
For web pages.
RDF and OWL
For general relational/logical data models.
DocBook
For technical documentation. Other similar office document formats include Microsoft's office XML, excel xml, and OpenDoc.
MathML
OpenMath
SOAP and XML-RPC
For remote procedure calls/web-services.
VML
Vector Markup Language for vector graphics.
XLink
A common notation for links in XML to other resources.
Various configuration file formats (especially in the Java world)

Alternatives

Alternatives to using XML for data files include:

Tcl itself
TDL
JSON