XML

'''XML''', or e'''X'''tensible '''M'''arkup '''L'''anguage, is a [data format].



** See Also **

   [Natively accessing XML]:   

   [Interfacing with XML]:   a developers' discussion about merging the various XML packages

   [Pull down menus in XML]:   

   [Simple XML report writer]:   parse a report in XML, display it in a [Tk] [canvas], and produce [postscript] from it

   [Regular Expressions Are Not A Good Idea for Parsing XML, HTML, or e-mail Addresses]:   

   [XML pretty-printing]:   tDOM has a built-in pretty-printing serialization option.  Those with an interest in a comparable function for TclDOM are welcome to try/use/improve [http://phaseit.net/claird/comp.lang.tcl/dom_pretty_print.html%|%dom_pretty_print].

   [XML Tree Walking]:   

   [XML tutorials]:   

   [XML-list]:   a survey of [list]-based representations of XML documents

   [XML/tDOM encoding issues with the http package]:   

   [XML_Wrapper]:   create [Tk] forms on the fly from an XML file

   [XSD schema validate an XML document]:   



** Resources **

   [http://www.xml.org/%|%xml.org]:   



** Reading **

   [http://web.archive.org/web/20110606145728/http://www.ibm.com/developerworks/webservices/library/ws-xtcl/index.html%|%Programming XML and Web services in TCL, Part 1 : An initial primer], [Cameron Laird], 2001-04-01:   surveys the state-of-the-art as of spring 2001, mainly from a [Zveno]-biased perspective.  One deficiency of that article is its neglect of [Jochen Loewer]'s [tDOM] work.

   [http://web.archive.org/web/20110727035158/http://ldn.linuxfoundation.org/column/untaught-xml-schema%|%Untaught XML Schema], [Cameron Laird], 2009-06-05:   

   [http://web.archive.org/web/20110616090536/http://www.zdnet.com/news/microsoft-patents-xml-word-processing-documents/329645%|%Microsoft patents XML word processing documents], Rupert Goodwins, 2009-08-07:   Amusingly enough, Microsoft has also in August 2009 been ordered to stop selling their Word product '''because''' of its XML functionality: [http://www.computerworld.com/s/article/9136539/Injunction_on_Microsoft_Word_unlikely_to_halt_sales?source=CTWNLE_nlt_dailyam_2009-08-12%|%Injunction on Microsoft Word unlikely to halt sales], Nancy Gohring, 2009-08-11.



** Examples **

   [Simple XML report writer]:   

   [XML Graph to canvas]:   




** Description **

XML is a simplified form of [SGML], but stricter (more regular) in some
aspects:

   * Singleton elements must end with `/>`
   * Attribute values must be quoted


Example:

======none
<father name="Jack" att1="1">
    <child name="Tom" born="1997" />
</father>
======

Tcl's excellent [Unicode] abilities make it a good language for processing XML.



** [Parsing] **

[tDOM] and [TclXML]/[TclDOM] are the two main Tcl extensions for parsing XML.
They offer [SAX] for stream-oriented parsing and [DOM] for
document-oriented parsing.
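
A minimal sketch of both styles, assuming the [tDOM] package is installed.  The callback name `startElement` is made up for illustration, and the sample document is the one from the Description above:

======
package require tdom

set xml {<father name="Jack" att1="1">
    <child name="Tom" born="1997" />
</father>}

# DOM style: build the whole tree in memory, then navigate it.
set doc  [dom parse $xml]
set root [$doc documentElement]
puts [$root nodeName]              ;# father
puts [$root getAttribute name]     ;# Jack
foreach child [$root selectNodes child] {
    puts "[$child getAttribute name] born [$child getAttribute born]"
}
$doc delete

# SAX style: no tree is built; a callback fires for each start tag.
# startElement is a hypothetical callback name, not part of tDOM.
proc startElement {name attlist} {
    puts "start of <$name> with attributes: $attlist"
}
set parser [expat -elementstartcommand startElement]
$parser parse $xml
$parser free
======

The DOM form is convenient when the document fits in memory and needs random access; the SAX form suits large or streamed input.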

See Also:

   [A little XML parser]:   

   [Parsing XML]:   

   [snitDom]:   

   [tclhttpd XML server]:   a simple wrapper around Sleepycat's dbxml library version 2 to implement a remote XML database server

   [TAX: A Tiny API for XML]:   inspired by [Stephen Uhler's HTML parser in 10 lines]

   [tkxmllint]:   a [GUI] frontend to xmllint

   [XML shallow parsing with regular expressions]:   

   [xmlp]:   

   [YAXMLP an XML parser]:   re-entrant, and designed to not use [regexp] or [string map]

   `ycl parse xml`:   A stackless parser based on [coroutine%|%coroutines].  Features a forgiving mode, and also hooks that make it possible to parse streaming data.  The resulting parse tree is available as a hierarchy of namespaces along with a set of commands that provide an interface to the hierarchy.



** Validation **

One way of specifying the valid tag structure of a class of documents is a
Document Type Definition, [DTD] for short, a mechanism inherited from
[SGML].  Alternatives include XML Schema and RELAX NG, among others.

See Also:

   [A little XML Schema validator]:   


** Generating XML **

See Also:

   [Formatting ls information in XML]:   an example of a manual approach to generating XML

   [Minimalist XML Generation]:   

   [Howto export Microsoft Outlook contacts to XML using tcom and tDom]:   

   [Migrating MS Access to other databases using XML]:   

   [xmlgen / htmlgen]:   Generates [HTML]/XML from a Tcl script representing the desired document.



----

In a mailing list conversation [[reference?], [Steve Ball] succinctly advised,
"When creating XML, I generally use [TclDOM].  Create a [DOM] tree in memory,
and then use 'dom::DOMImplementation serialize $doc' to generate the XML.  The
TclDOM package will make sure that the generated XML is well-formed.

Alternatively, XML is just text so there's no reason why you can't just create
the string directly.  Eg:

======
puts <document>$content</document>
======

The problem with this is that (a) you have to worry about the
XML syntax nitty-gritty and (b) the content variable may contain
special characters which you have to deal with.

There are also some generation packages available, like the '[html]'
package in [tcllib] (this will be added to TclXML RSN, when my
workload permits)."
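
As a sketch of the DOM-based approach described above, here is the same idea using [tDOM] rather than [TclDOM].  This is a substitution, since the advice quoted names TclDOM, and the element name, attribute name, and sample content are illustrative only:

======
package require tdom

# Content containing characters that are special in XML.
set content {Tom & Jerry <cat & mouse>}

# Build the tree in memory ...
set doc  [dom createDocument document]
set root [$doc documentElement]
$root setAttribute note {say "hi"}
$root appendChild [$doc createTextNode $content]

# ... and let the serializer take care of escaping.
set xml [$doc asXML -indent none]
puts $xml
$doc delete

# Parsing the result back yields the original text unchanged.
set check [dom parse $xml]
puts [[$check documentElement] text]
$check delete
======

Because the serializer does the escaping, the script never has to handle `&` or `<` in the content itself.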

[DKF]: If you're going for the cheap-hack method of XML generation mentioned
above, you'll want this:

======
proc asXML {content {tag document}} {
    set XML_MAP {
        < &lt;
        > &gt;
        & &amp;
        \" &quot;
        ' &apos;
    }
    return <$tag>[string map $XML_MAP $content]</$tag>
}
======

Naturally, the ''XML_MAP'' variable is factorisable...

[MHo]: Why not use '''html::quoteFormValue''' for this purpose?

For generation of XML (HTML) the pure Tcl way, have a look at the
[xmlgen / htmlgen%|%xmlgen] module of [TclXML].

[DKF]: That's when you're moving away from cheap hacks. And HTML has a lot more
entities than XML, though most are optional.

If you want to get particular about entity encoding '''arbitrary text''', this
is working for me:

======
variable entityMap [list & &amp\; < &lt\; > &gt\; \" &quot\;\
       \u0000
======
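
A self-contained sketch along the same lines, not the original author's code.  The proc name `xmlEscape` is hypothetical, and dropping control characters is an assumption made here; a real application might prefer to reject such input instead:

======
# Hypothetical helper (not from the original page): escape arbitrary text
# for use as XML character data or inside a quoted attribute value.
proc xmlEscape {text} {
    # The five characters that have predefined XML entities.
    set map [list & "&amp;" < "&lt;" > "&gt;" \" "&quot;" ' "&apos;"]
    # Assumption: control characters that XML 1.0 does not allow
    # (everything below U+0020 except tab, LF and CR) are simply dropped.
    for {set code 0} {$code < 32} {incr code} {
        if {$code ni {9 10 13}} {
            lappend map [format %c $code] ""
        }
    }
    return [string map $map $text]
}

puts [xmlEscape "a<b & \"c\"\u0001"]   ;# a&lt;b &amp; &quot;c&quot;
======

The map is rebuilt on every call to keep the sketch short; it could be computed once and kept in a namespace variable.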





** Browsing / Editing **

   [a little XML browser]:   

   [XML DOM Tk Text Browser Editor]:   

   [FanXE]:   



** Publishing **

   [Anastasia]:   Designed for the publication of large, highly-complex XML documents and document collections.

   [http://www.fh-wedel.de/~si/xml2html.2/xml2html/xml/index.xml%|%xml2html]:   Tcl-based tool for authoring web sites in XML then rendering to HTML.



** Related Technologies **

There are a whole host of technologies related to XML:

   [XPath]:   for selecting nodes from a document

   [XML Query%|%XQuery]:   
   
   [XSL]/[XSLT]:   for transforming documents

   [DTD] and XMLSchema:   for specifying document schemas

[tDOM] and [TclXML] both provide good support for at least XPath and XSLT.
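
A short sketch of XPath selection, assuming [tDOM] is installed and reusing the sample document from the Description:

======
package require tdom

set doc [dom parse {<father name="Jack" att1="1">
    <child name="Tom" born="1997" />
</father>}]
set root [$doc documentElement]

# An XPath expression that evaluates to a string is returned as a
# plain Tcl string.
puts [$root selectNodes {string(/father/@name)}]     ;# Jack

# An expression that evaluates to a node set is returned as a Tcl
# list of node commands.
foreach node [$root selectNodes {child[@born='1997']}] {
    puts [$node getAttribute name]                   ;# Tom
}
$doc delete
======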



** XML Formats **

XML by itself is just a partially-standardized syntax for data. It's used as
the basis for a variety of different applications, such as:

   [CML]:   Chemical Markup Language

   [excel xml]:   for [Excel]

   (X)[HTML]:   for web pages

   [RDF] and [OWL]:   for general relational/logical data models

   [DocBook]:   for technical documentation, along with office document formats (e.g. Microsoft's office XML format, [excel xml], OpenDocument, etc.)

   [MathML]:   

   [OpenMath]:   

   [SOAP] and [XML-RPC]:   for remote procedure calls/web-services.

   [VML]:   Vector Markup Language, for vector graphics

   [XLink]:   a common notation for links in XML to other resources

   Various configuration file formats (especially in the [Java] world):   



** Alternatives **

Alternatives to using XML for data files include:

   [Tcl] itself:   

   [TDL]:   

   [JSON]:   
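
As an illustration of the first alternative, the sample document from the Description could be held as nested Tcl [dict] and [list] values.  The layout below, including the `children` key, is just one hypothetical convention and not a standard:

======
# The father/child sample as plain Tcl data: a dict of attributes plus
# a "children" key holding a list of child dicts (a made-up layout).
set father {
    name Jack
    att1 1
    children {
        {name Tom born 1997}
    }
}

puts [dict get $father name]                ;# Jack
foreach child [dict get $father children] {
    puts "[dict get $child name] born [dict get $child born]"   ;# Tom born 1997
}
======

No parser is needed beyond Tcl itself, at the cost of having no schema language or cross-language tooling.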



<<categories>> Data Serialization Format | XML