XML

'''XML''', or e'''X'''tensible '''M'''arkup '''L'''anguage, is a [data format].



** See Also **

   [Natively accessing XML]:   

   [Interfacing with XML]:   A discussion about merging the various XML packages.

   [Pull down menus in XML]:   

   [Simple XML report writer]:   Parses a report in XML, displays it in a [Tk] [canvas], and produces [postscript] from it.

   [XML pretty-printing]:   tDOM includes a pretty-printing serialization option.  Those with an interest in a comparable function for TclDOM are welcome to try/use/improve [http://phaseit.net/claird/comp.lang.tcl/dom_pretty_print.html%|%dom_pretty_print].

   [XML Tree Walking]:   

   [XML tutorials]:   

   [XML-list]:   A survey of [list]-based representations of XML documents.

   [XML/tDOM encoding issues with the http package]:   

   [XML_Wrapper]:   Create [Tk] forms on the fly from an XML file.

   [XSD schema validate an XML document]:   



** Resources **

   [http://www.xml.org/%|%xml.org]:   



** Reading **

   [http://web.archive.org/web/20110606145728/http://www.ibm.com/developerworks/webservices/library/ws-xtcl/index.html%|%Programming XML and Web services in TCL, Part 1 : An initial primer], [Cameron Laird], 2001-04-01:   Surveys the state-of-the-art as of spring 2001, mainly from a [Zveno]-biased perspective.  One deficiency of that article is its neglect of [Jochen Loewer]'s [tDOM] work.

   [http://web.archive.org/web/20110727035158/http://ldn.linuxfoundation.org/column/untaught-xml-schema%|%Untaught XML Schema], [Cameron Laird], 2009-06-05:   

   [http://web.archive.org/web/20110616090536/http://www.zdnet.com/news/microsoft-patents-xml-word-processing-documents/329645%|%Microsoft patents XML word processing documents], Rupert Goodwins, 2009-08-07:   Amusingly enough, Microsoft was also ordered in August 2009 to stop selling their Word product '''because''' of its XML functionality: [http://www.computerworld.com/s/article/9136539/Injunction_on_Microsoft_Word_unlikely_to_halt_sales?source=CTWNLE_nlt_dailyam_2009-08-12%|%Injunction on Microsoft Word unlikely to halt sales], Nancy Gohring, 2009-08-11.

   [https://wusspuss.neocities.org/xmpp.html%|%Is xmpp any good? Also, let's write a client in tcl, maybe]:   Includes a scathing critique of XML.


** Examples **

   [Simple XML report writer]:   

   [XML Graph to canvas]:   




** Description **

XML is a simplified form of [SGML], but stricter (more regular) in some
aspects:

   * Empty elements written as a single tag must end with `/>`
   * Attribute values must be quoted


Example:

======none
<father name="Jack" att1="1">
    <child name="Tom" born="1997" />
</father>
======

Tcl's [Unicode] features make it a good language for processing XML.

There is broad confusion about whether to represent attributes of the things
that are the subject of an XML document as attributes of a tag or as the
content of child elements, and misuse abounds.  The intent of the
specification is clear even if it is not explicit:  An attribute in an XML tag
describes the element, which is part of the structure of the document, not the
thing in the subject of the document that the element refers to.  In other
words, data pertaining to the subject of the document comprise the content of
tags, and data that describe the document itself comprise the attributes of
tags.  A more concise way to put it is that elements are structure, and content
is content.

The confusion may arise partially because the subject of some part of the
document may be the document itself, in which case data about the structure and
interpretation of the document may occur as content.  In this case, data that
could occur as attributes in a tag in one part of the document occurs instead
as content in another part of the document.  This is legitimate.  The inverse
case, where content occurs as attributes of a tag, is not.



** [Parsing] **

[tDOM] and [TclXML]/[TclDOM] are the two main Tcl extensions for parsing XML.
They provide both [SAX] parsing, for stream-oriented processing, and [DOM], for
document-oriented processing.
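
A minimal sketch of the two styles, assuming the [tDOM] package is installed.  It reuses the example document from the Description section; the variable names and the `onStart` callback are made up for illustration.

======
package require tdom

set xml {<father name="Jack" att1="1">
    <child name="Tom" born="1997" />
</father>}

# Document-oriented (DOM): parse the whole text into an in-memory tree.
set doc  [dom parse $xml]
set root [$doc documentElement]
set fatherName [$root getAttribute name]

# Collect the name attribute of every <child> directly below the root.
set childNames {}
foreach node [$root selectNodes child] {
    lappend childNames [$node getAttribute name]
}
puts "$fatherName has children: $childNames"

# tDOM trees are not freed automatically; delete them when done.
$doc delete

# Stream-oriented (SAX style): no tree is built; the parser calls back
# once per start tag with the element name and its attribute list.
set seen {}
proc onStart {name attrs} {
    lappend ::seen $name
}
set parser [expat -elementstartcommand onStart]
$parser parse $xml
$parser free
puts "start tags seen: $seen"
======

The DOM half is convenient when the document fits in memory and is queried repeatedly; the callback half suits large inputs that only need a single pass.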

See Also:

   [A little XML parser]:   

   [Parsing HTML]:   

   [Parsing XML]:   

   [Playing Sax]:   

   [Regular Expressions Are Not A Good Idea for Parsing XML, HTML, or e-mail Addresses]:   

   [snitDom]:   

   [tclhttpd XML server]:   A simple wrapper around Sleepycat's dbxml library version 2 to implement a remote XML database server.

   [TAX: A Tiny API for XML]:   Inspired by [Stephen Uhler's HTML parser in 10 lines].

   [tkxmllint]:   A [GUI] frontend to xmllint.

   [XML shallow parsing with regular expressions]:   

   [xmlp]:   

   [YAXMLP an XML parser]:   Re-entrant and designed to not use [regexp] or [string map].

   `ycl parse xml`:   A stackless parser based on [coroutine%|%coroutines].  Features a forgiving mode, and also hooks that make it possible to parse streaming data.  The resulting parse tree is available as a hierarchy of namespaces along with a set of commands that provide an interface to the hierarchy.



** Validation **

One way of specifying the valid tag structure of a class of documents is to use
a [Document Type Definition%|%DTD], a mechanism inherited from [SGML].
Alternatives include XML Schema and RELAX NG, among others.

See Also:

   [A little XML Schema validator]:   



** Generating XML **

   [Formatting ls information in XML]:   An example of a manual approach to generating XML.

   [Minimalist XML Generation]:   

   [Howto export Microsoft Outlook contacts to XML using tcom and tDom]:   

   [Migrating MS Access to other databases using XML]:   

   [xmlgen / htmlgen]:   Generates [HTML]/XML from a Tcl script representing the desired document.




----

In a mailing list conversation [[reference?], [Steve Ball] succinctly advised,
"When creating XML, I generally use [TclDOM].  Create a [DOM] tree in memory,
and then use `dom::DOMImplementation serialize $doc` to generate the XML.  The
TclDOM package will make sure that the generated XML is well-formed.

Alternatively, XML is just text so there's no reason why you can't just create
the string directly.  Eg:

======
puts <document>$content</document>
======

The problem with this is that (a) you have to worry about the
XML syntax nitty-gritty, and (b) the content variable may contain
special characters which must be escaped.

There are also some generation packages available, like the '[html]'
package in [tcllib] (this will be added to TclXML RSN, when my
workload permits)."
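
The build-a-tree-then-serialize approach described in the quote can be sketched as follows.  Note that this sketch swaps in [tDOM] rather than TclDOM, so the commands differ from the `dom::DOMImplementation serialize` call quoted above; the sample content is made up.

======
package require tdom

set content {a < b & "c"}

# Build the tree: a root element <document> holding a single text node.
set doc  [dom createDocument document]
set root [$doc documentElement]
$root appendChild [$doc createTextNode $content]

# Serialize.  The library escapes markup characters in the text node,
# so the result is well-formed whatever $content holds.
set xml [$doc asXML -indent none]
puts $xml

$doc delete
======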

[DKF]: If you're going for the cheap-hack method of XML generation mentioned
above, you'll want this:

======
proc asXML {content {tag document}} {
    set XML_MAP {
        < &lt;
        > &gt;
        & &amp;
        \" &quot;
        ' &apos;
    }
    return <$tag>[string map $XML_MAP $content]</$tag>
}
======

Naturally, the ''XML_MAP'' variable is factorisable...
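
One possible factoring, as a sketch only: the map moves into a namespace variable that is built once, and a second helper reuses it for attribute values.  The `xmlutil` namespace and both proc names are hypothetical, and tag and attribute names are not checked for validity.

======
namespace eval xmlutil {
    # Escape table shared by all helpers.  The replacements are quoted
    # because a bare semicolon would end the command inside the brackets.
    variable escapeMap [list & "&amp;" < "&lt;" > "&gt;" \" "&quot;" ' "&apos;"]
}

# Escape the five characters that have predefined XML entities.
proc xmlutil::escape {text} {
    variable escapeMap
    return [string map $escapeMap $text]
}

# Wrap content in <tag>...</tag>; attrs is a flat name/value list.
proc xmlutil::element {tag content {attrs {}}} {
    set open $tag
    foreach {name value} $attrs {
        append open " $name=\"[escape $value]\""
    }
    return "<$open>[escape $content]</$tag>"
}

puts [xmlutil::element child "Tom & Jerry" {born 1997 note {a "quoted" <value>}}]
======

Because `string map` does not rescan text it has already replaced, the order of the pairs in the map does not cause double escaping.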

[MHo]: Why not use '''html::quoteFormValue''' for this purpose?

For generation of XML (HTML) the pure Tcl way, have a look at the
[xmlgen / htmlgen%|%xmlgen] module of [TclXML].

[DKF]: That's when you're moving away from cheap hacks. And HTML has a lot more
entities than XML, though most are optional.

If you want to get particular about entity encoding '''arbitrary text''', this
is working for me:

======
variable entityMap [list & &amp\; < &lt\; > &gt\; \" &quot\;\
       \u0000
======
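
Independently of the map above, and not as a reconstruction of it, here is a minimal sketch of one way to encode arbitrary text: escape the markup characters, then write every character outside printable ASCII as a numeric character reference.  The proc name `xmlEncodeText` is hypothetical, and characters that XML 1.0 forbids outright (most control characters) are passed through unfiltered.

======
proc xmlEncodeText {text} {
    # Escape markup characters first; & must be handled so that the
    # references appended below are not escaped a second time.
    set text [string map [list & "&amp;" < "&lt;" > "&gt;" \" "&quot;"] $text]

    set out ""
    foreach ch [split $text ""] {
        # scan %c yields the character's Unicode code point.
        scan $ch %c code
        if {$code > 126} {
            append out [format "&#x%X;" $code]
        } else {
            append out $ch
        }
    }
    return $out
}

puts [xmlEncodeText "caf\u00e9 <b>"]
======

The result is plain ASCII, which sidesteps encoding mismatches between the producer and the consumer at the cost of a longer document.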





** Browsing / Editing **

   [a little XML browser]:   

   [XML DOM Tk Text Browser Editor]:   

   [FanXE]:   



** Publishing **

   [Anastasia]:   Designed for the publication of large, highly-complex XML documents and document collections.

   [http://www.fh-wedel.de/~si/xml2html.2/xml2html/xml/index.xml%|%xml2html]:   Tcl-based tool for authoring web sites in XML then rendering to HTML.



** Related Technologies **

There are a whole host of technologies related to XML:

   [XPath]:   For selecting nodes within a document.

   [XML Query%|%XQuery]:   
   
   [XSL]/[XSLT]:   For transforming documents.

   [DTD] and XMLSchema:   For specifying document schemas.

[tDOM] and [TclXML] both provide good support for at least XPath and XSLT.
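
As a small illustration of the XPath side, here is a sketch that assumes the [tDOM] package; the sample document and variable names are made up.

======
package require tdom

set doc [dom parse {<family>
    <father name="Jack">
        <child name="Tom" born="1997"/>
        <child name="Ann" born="2003"/>
    </father>
</family>}]
set root [$doc documentElement]

# A node-set result: every <child> whose born attribute exceeds 2000.
set names {}
foreach node [$root selectNodes {//child[@born > 2000]}] {
    lappend names [$node getAttribute name]
}

# A scalar result: XPath functions such as count() return plain values.
set total [$root selectNodes {count(//child)}]

puts "born after 2000: $names (of $total children)"
$doc delete
======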



** XML Formats **

XML by itself is just a partially-standardized syntax for data. It's used as
the basis for a variety of different applications, such as:

   [CML]:   Chemical Markup Language.

   [excel xml] for [Excel]:   

   (X)[HTML]:   For web pages.

   [RDF] and [OWL]:   For general relational/logical data models

   [DocBook]:   For technical documentation.  Other similar office document formats include Microsoft's Office XML, [excel xml], and OpenDocument.

   [MathML]:   

   [OpenMath]:   

   [SOAP] and [XML-RPC]:   For remote procedure calls/web-services.

   [VML]:   Vector Markup Language for vector graphics.

   [XLink]:   A common notation for links in XML to other resources.

   Various configuration file formats (especially in the [Java] world):   



** Alternatives **

Alternatives to using XML for data files include:

   [Tcl] itself:   

   [TDL]:   

   [JSON]:   



<<categories>> Data Serialization Format | XML