Version 13 of TclDOM vs tDOM

Updated 2003-06-16 09:36:06

There are two well-known packages for processing XML documents in Tcl: TclXML [L1 ] (and related packages) and tDOM. One very common question is "Which package should I use for XML processing, and why?". This page summarizes the features and differences between the two packages in order to provide a (hopefully) unbiased view.

I've started this page on the Wiki so that members of both the TclXML and tDOM projects can contribute on an equal basis.

(A table would be best here, but I don't know how to get that in Wiki)

Packages:

  • TclXML is separated into three packages: TclXML for parsing, TclDOM for tree processing, TclXSLT for transformations.
  • tDOM is a single, integrated package.

Parser implementations:

  • TclXML: Pure Tcl (no extension required), C wrappers for expat and (in v3.0) libxml2.
  • tDOM: C wrapper for expat, plus a "simple" but even faster parser based on work by Richard Hipp, which is typically well suited to 'data-oriented' XML.

DOM implementations:

  • TclDOM: There are three distinct implementations: pure Tcl (no extension required) and two C extensions, TclDOM/C and TclDOM/libxml2 (a wrapper for libxml2).
  • tDOM: C extension.

XSLT implementations:

  • TclXSLT: C wrapper for libxslt; it only works with TclDOM/libxml2.
  • tDOM: C implementation.

TEA Compliance:

  • TclXML family: Yes.
  • tDOM: Yes.

Performance:

  • tDOM is reported to have superior runtime performance.
  • Both tDOM and libxml2/libxslt outperform most other XSLT processors, though MSXML is also reputed to perform well.

Memory demand:

  • TclDOM/libxml2 trees need roughly 1.5 to 2 times as much memory as tDOM trees. TclDOM/Tcl needs much more memory still.
  • Most Java DOM implementations need notably more memory to represent a DOM tree.

Parsing XML:

  • TclXML: SAX-style callback API, including interposing on external entity resolution. DTD validation. A posteriori validation (DOM tree validation). Work is underway on TclXML/TclDOM v3.0, which will allow XML Schema and RelaxNG validation and will support combined DOM building and SAX events during the same parsing step.
  • tDOM: SAX-style callback API, including interposing on external entity resolution. DTD validation. A posteriori validation (DOM tree validation). Supports more than one script per SAX event. Allows DOM building and SAX events in one parsing step.
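As a rough illustration of the SAX-style callback API, the sketch below uses the `expat` parser command provided by tDOM. It assumes tDOM is installed and loadable as the `tdom` package. The procedures `onStart`, `onEnd` and `onText` and the sample document are hypothetical and were made up for this example. TclXML's own parser command is different and is not shown.

```tcl
package require tdom

# Hypothetical callbacks: record each SAX event in a global list.
set ::events {}
proc onStart {name attlist} {
    lappend ::events [list start $name $attlist]
}
proc onEnd {name} {
    lappend ::events [list end $name]
}
proc onText {data} {
    # Skip whitespace-only character data between elements.
    if {[string trim $data] ne ""} {
        lappend ::events [list text $data]
    }
}

set parser [expat -elementstartcommand onStart \
                  -elementendcommand onEnd \
                  -characterdatacommand onText]
$parser parse {<order id="7"><item>apple</item><item>pear</item></order>}
$parser free

foreach event $::events {
    puts $event
}
```

No tree is built here. The parser only fires callbacks, which is why SAX-style processing suits large documents.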

DOM Scripting:

  • TclDOM fairly strictly adheres to the W3C DOM API: IDL interfaces are mapped to Tcl commands, live node lists, etc. Tree nodes are represented as "tokens" that are passed as arguments to the DOM commands (i.e. tree nodes are mutable objects).
  • tDOM is somewhat more "Tclish": tree nodes are defined as Tcl commands. tDOM also supports representing nodes as "tokens". It can serialize subtrees to, and parse them from, nested Tcl lists, and it offers a very 'Tclish' way to create new subtrees (appendFromScript).
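Here is a minimal sketch of the 'Tclish' style in tDOM, assuming the `tdom` package is available. Nodes are Tcl commands, and `childNodes` returns an ordinary Tcl list. `appendFromScript` builds a subtree from a script of generated node commands, and `asList` serializes to nested Tcl lists. The element names and the generated command names `item` and `t` are arbitrary choices for this example.

```tcl
package require tdom

set doc  [dom parse {<order id="7"><item>apple</item></order>}]
set root [$doc documentElement]

# Nodes are Tcl commands; childNodes returns a plain Tcl list of them.
set before {}
foreach child [$root childNodes] {
    lappend before [list [$child nodeName] [$child text]]
}

# Generate node-building commands, then build a subtree from a script.
dom createNodeCmd elementNode item
dom createNodeCmd textNode t
$root appendFromScript {
    item {t pear}
}

# Serialize the element as nested Tcl lists and as XML text.
set tree [$root asList]
set xml  [$root asXML -indent none]
puts $tree
puts $xml

$doc delete
```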

XSLT Scripting:

  • TclXSLT: Allows stylesheets to be compiled, transformations performed and reuse of the compiled stylesheet. Also allows XSLT/XPath extensions to be implemented as Tcl callbacks.
  • tDOM: Yes (including stylesheet "compilation" and reuse).
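The following sketch shows an XSLT transformation in tDOM, assuming the `tdom` package. The stylesheet is parsed once as a DOM document and then reused for two input documents through the document's `xslt` method. The stylesheet and input data are hypothetical. tDOM's separate facility for explicitly compiled stylesheet commands is not shown here.

```tcl
package require tdom

# Parse the stylesheet once; the same document is reused below.
set xslDoc [dom parse {<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <list>
      <xsl:for-each select="order/item">
        <entry><xsl:value-of select="."/></entry>
      </xsl:for-each>
    </list>
  </xsl:template>
</xsl:stylesheet>}]

set results {}
foreach input {
    {<order><item>apple</item><item>pear</item></order>}
    {<order><item>plum</item></order>}
} {
    set xmlDoc [dom parse $input]
    # xslt returns a new result document; the input is left unchanged.
    set resultDoc [$xmlDoc xslt $xslDoc]
    lappend results [$resultDoc asXML -indent none]
    $resultDoc delete
    $xmlDoc delete
}
$xslDoc delete

foreach r $results { puts $r }
```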

XPath:

  • tDOM supports XPath queries, and its XPath engine is very fast.
  • TclDOM/libxml2 supports XPath queries. There is also a partial implementation of XPath in the pure-Tcl TclXML/TclDOM packages.
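For tDOM, XPath queries go through a node's `selectNodes` method. The sketch below assumes the `tdom` package and a made-up sample document. It shows that node-set results come back as Tcl lists of node commands, while number and string results come back as plain Tcl values.

```tcl
package require tdom

set doc  [dom parse {<order><item price="3">apple</item><item price="5">pear</item></order>}]
set root [$doc documentElement]

# A node-set result is a Tcl list of node commands.
set expensive {}
foreach node [$root selectNodes {item[@price > 4]}] {
    lappend expensive [$node text]
}

# Number and string results are returned as plain Tcl values.
set total [$root selectNodes {sum(item/@price)}]
set count [$root selectNodes {count(//item)}]
set first [$root selectNodes {string(item[1])}]

puts "expensive: $expensive, total: $total, count: $count, first: $first"
$doc delete
```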

Deployment:

  • tDOM can be used as a single C library. tDOM modules are included in TclKit, but tDOM is not provided by the ActiveState Tcl distribution.
  • TclXML, TclDOM have a pure Tcl implementation, which can be advantageous for extension-free deployment.
  • The libxml2/libxslt wrappers for TclDOM and TclXSLT are available in the ActiveState distribution.
  • Both tDOM and TclDOM/libxml2, TclXSLT are available in Mac OS X binary distributions.

de: It is a bit confusing that TclDOM includes not one DOM implementation but at least three different ones, with different characteristics. Not everything mentioned above about 'TclDOM' is true for every one of the included implementations. For example, if you want to use XSLT with TclDOM, you must use the libxml2 wrapper; with the others you cannot use XSLT.

SRB That's true - there are a number of separate implementations of TclXML and TclDOM. However, they all present the same scripting API to the application, so developers can prototype using the Tcl implementation and then use the C implementation for speed.

MC 15 June 2003: Steve, above you've written that "TclDOM/libxml2 supports XPath queries", while the pure-Tcl version of "TclXML/TclDOM has a partial implementation of XPath". While both versions "present the same scripting API", obviously the actual implemented features available via the "same scripting API" differ depending on which flavor of TclDOM is in use, no?

de The note about strict adherence to the W3C DOM API applies mainly to the scripted TclDOM.

SRB Not true. All implementations adhere to the same API and data/processing model.

de The notes about speed and memory usage refer to the C wrapper; the scripted TclDOM is much slower and hungrier for memory. Etc.

SRB True. Tcl implementations are always slower and use more memory than C implementations.


SRB Let's move this commentary to another page, or delete it. Rolf has pointed out that tDOM has partial support for live node lists and I will accept that (at least for the purpose of comparing the two offerings - I don't want to argue about the finer points here).

SRB commented above: last time I looked some aspects of the W3C DOM spec were not defined, such as live node lists

de: For more than three years (see [L2 ]) Steve has not missed an opportunity to repeat this claim, including at Tcl conferences. One could get the impression that this is a major point that far outweighs any advantages tDOM may have. So let's take a closer look at what this fuss is about.

The claim is partly false. tDOM nodes have a method 'childNodesLive' (see [L3 ]), even in the oldest tDOM version I found on my disk (0.5a2, around three years old). Interestingly enough, I replied at the time to Steve's mail mentioned above and showed how one could get such a 'live nodelist' returning method in tDOM with a few lines of Tcl code (see [L4 ]), and Jochen added code along these lines to the distribution. Yes, there isn't a similar getElementsByTagName method that returns a 'live' nodelist (the method is there, but it returns a Tcl list of nodes), and in fact childNodesLive is more a skeleton than a robust implementation. But although I'm one of the maintainers of tDOM, I have never heard a single problem report about this. Why? That brings us to the next point.

What about the practical relevance of this? If the programmer had only a childNodes method that returns a live nodelist, he would have to write something like

  set liveNodelist [$node childNodesLive]
  for {set x 0} {$x < [$liveNodelist length]} {incr x} {
      set childNode [$liveNodelist item $x]
      # do something with childNode
  }

With the current childNodes method, which returns a Tcl list of the node's children, you simply write

  foreach childNode [$node childNodes] {
      # do something with childNode
  }

I only use the second way.
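To make the difference concrete without depending on either package, the sketch below models it in plain Tcl. The `children` array and the `childNodes` and `liveNodeList` procedures are hypothetical stand-ins, not tDOM or TclDOM API. A snapshot list is copied once. The 'live' object re-reads the current children on every `length` or `item` call, so it sees changes made after it was created.

```tcl
# Hypothetical in-memory tree: parent name -> list of child names.
array set children {root {a b}}

# Snapshot semantics, like a plain Tcl list of children: copied now.
proc childNodes {node} {
    return $::children($node)
}

# Live semantics: a command that consults the tree on every access.
proc liveNodeList {node} {
    set cmd ::live_$node
    proc $cmd {method args} [string map [list @NODE@ $node] {
        switch -- $method {
            length  { return [llength $::children(@NODE@)] }
            item    { return [lindex $::children(@NODE@) [lindex $args 0]] }
            default { error "unknown method $method" }
        }
    }]
    return $cmd
}

set snapshot [childNodes root]
set live     [liveNodeList root]

# Mutate the tree after both lists were obtained.
lappend children(root) c

puts "snapshot: [llength $snapshot] children"   ;# still 2
puts "live:     [$live length] children"        ;# now 3
```

The live variant only pays off when a loop must see children added or removed during the iteration. For the usual read-only walk, the snapshot list and `foreach` are simpler.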

Steve has not only repeated this claim for years; he is also the maintainer of TclXML/TclDOM. So one would expect his own packages to support this feature, of course. In fact, the scripted TclDOM implementation does. But the flagship of the TclXML/TclDOM package is currently Steve's Tcl wrapper around the libxml2 library. And guess what? It doesn't support live nodelists. See the release notes of the current 2.6 release [L5 ]. I believe none of the C wrappers Steve has promoted over the years has ever supported live nodelists.

I have never understood why Steve keeps bringing this up. There are (and always were) many much more important points.


MC 15 June 2003: Personally, I'm glad Tcl has both tDOM and TclXML. Having choices--just like we have with OO-extensions--is a Good Thing for Tcl in my opinion. I found Steve's XML tutorials at the Ninth Annual Tcl/Tk Conference to be quite educational, and when I got home and had trouble with the build dependencies of TclXML, I discovered tDOM to be very simple to compile and begin using. YMMV. I appreciate the work of both the TclXML and tDOM developers, and am glad that both projects continue to be actively developed.


TV I guess someone removed my remark (in error), assuming this page included COM objects as a subject. Probably whoever did so thought it shouldn't stay in.

While I'm at it: XML (eXtensible Markup Language), like HTML (HyperText Markup Language), raises efficiency issues to begin with, because it is composed of readable text primitives and structure. Such text is always suboptimal in terms of information per byte (only 26 or 37 or so different characters are used, instead of the 256 or 64k possible ones). Standard 'compress'ing of any character-based XML file should easily shrink it to a third of its size or less. Moreover, since its subfield strings and delimiters (just as in HTML) are not reused through any efficient mechanism (which is good for readability), the storage language as a whole is inherently inefficient.

But readable.
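The compression claim is easy to check for a given file. The sketch below uses Tcl 8.6's built-in `zlib` command (DEFLATE, not the LZW-based Unix `compress` mentioned above). It compresses a synthetic, highly repetitive XML document, so the resulting ratio is a best case rather than a measurement of real-world data.

```tcl
# Requires Tcl 8.6 or later for the built-in zlib command.
set body ""
for {set i 0} {$i < 200} {incr i} {
    append body "<item id=\"$i\"><name>item$i</name><price>[expr {$i % 17}]</price></item>"
}
set xml "<order>$body</order>"

# zlib works on bytes, so convert the string to UTF-8 first.
set raw        [encoding convertto utf-8 $xml]
set compressed [zlib compress $raw]
set ratio      [expr {double([string length $raw]) / [string length $compressed]}]

puts [format "raw %d bytes, compressed %d bytes, ratio %.1f:1" \
        [string length $raw] [string length $compressed] $ratio]
```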

I remember writing a hierarchically oriented graphics language for a project where I wanted to reuse object definitions and define transformations on them, on top of being able to build a nested list structure with named elements, which is essentially what XML does.

Tcl suits either approach well enough, given its computationally efficient use of lists, although, apart from byte coding, these are probably not very storage efficient by themselves, which is fine with me. Most projects are over long before I have typed enough list or program characters to fill the main memory of even an old PC.