** Description **

XML, which stands for e'''X'''tensible '''M'''arkup '''L'''anguage, is a [data format]. It is a simplified form of [SGML], but stricter (more regular) in some aspects:

   * Singleton elements must end with />
   * Attribute values must be quoted

Example:

======none
======

"Programming XML in Tcl" [http://www-106.ibm.com/developerworks/webservices/library/ws-xtcl.html] surveys the state-of-the-art as of spring 2001, mainly from a [Zveno]-biased perspective. One deficiency of that article is its neglect of [Jochen Loewer]'s [tDOM] work.

----

** Parsing XML **

   * The two main standard APIs for XML [parser]s are [SAX] and [DOM].
   * [tDOM] and [TclXML]/[TclDOM] are the two main Tcl extensions for parsing XML, providing both SAX and DOM implementations.
   * See also [Parsing XML], [A little XML parser], [XML shallow parsing with regular expressions], [xmlp], and [Regular Expressions Are Not A Good Idea for Parsing XML, HTML, or e-mail Addresses].

** Related Technologies **

There are a whole host of technologies related to XML, such as [XPath] for selecting nodes from a document, [XSL]/[XSLT] for transforming XML documents, and various tools for validating XML documents for well-formedness and conformance to some schema definition. [tDOM] and [TclXML] both provide good support for at least XPath and XSLT.

** Applications **

XML by itself is just a partially-standardized syntax for data. It's used as the basis for a variety of different applications, such as:

   * (X)[HTML] for web pages
   * [RDF] and [OWL] for general relational/logical data models
   * [DocBook] for technical documentation, along with other office document formats (e.g. Microsoft's office XML format, [excel xml], OpenDoc, etc)
   * [MathML]
   * [OpenMath]
   * Various configuration file formats (especially in the [Java] world)
   * [SOAP] and [XML-RPC] for remote procedure calls/web-services.

** Alternatives **

Alternatives to using XML for data files include:

   * [Tcl] itself
   * [TDL]
   * [JSON]
   * ...
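
As a small illustration of the first alternative listed above, structured data can be written in plain Tcl syntax and read back with nothing more than the interpreter. This is only a sketch: the keys (`server`, `host`, `port`, `paths`) and their values are invented for the example, and it assumes Tcl 8.5 or later for the [dict] command.

======
# Hypothetical configuration data written as a nested Tcl dict.
# In practice this text might be read from a file with [read].
set config {
    server {
        host localhost
        port 8080
    }
    paths {/tmp /var/tmp}
}

# Nested keys are reached by listing them in order.
puts "host: [dict get $config server host]"
puts "port: [dict get $config server port]"

# A plain list value can be iterated directly.
foreach p [dict get $config paths] {
    puts "path: $p"
}
======

The trade-off is that such a file is only convenient for Tcl programs, whereas XML can be read by tools in any language.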
** External Links **

   * [http://www.xml.org/]

** To Sort **

One way of specifying the valid tag structure of a class of documents is to use a Document Type Definition, [DTD] for short. This approach was inherited from SGML. There are alternative ways: XML Schema, RELAX NG, ...

----

Perhaps the single most important introductory point to make to Tcl developers about XML is that it's built in! Well, almost: while the core Tcl distribution doesn't know about XML, it does have excellent [Unicode] abilities. The [Kitten] [starkit] includes an XML package, and [ActiveTcl] installations of Tcl can easily add an XML package via [teacup].

----

tDOM has a built-in pretty-printing serialization option. Those with an interest in a comparable function for TclDOM are welcome to try/use/improve/... dom_pretty_print [http://phaseit.net/claird/comp.lang.tcl/dom_pretty_print.html]. "[XML pretty-printing]" will eventually have more on this topic.

----

How can you start to generate your own XML documents with Tcl? In answering just that question in a mailing list [[reference?], [Steve Ball] succinctly advised, "When creating XML, I generally use [TclDOM]. Create a [DOM] tree in memory, and then use 'dom::DOMImplementation serialize $doc' to generate the XML. The TclDOM package will make sure that the generated XML is well-formed. Alternatively, XML is just text so there's no reason why you can't just create the string directly. Eg:

======
puts $content
======

The problem with this is that (a) you have to worry about the XML syntax nitty-gritty and (b) the content variable may contain special characters which you have to deal with. There are also some generation packages available, like the '[html]' package in [tcllib] (this will be added to TclXML RSN, when my workload permits)."
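
The DOM-based approach described in the quote can be sketched as follows. Note the substitution: the quote refers to TclDOM, while this sketch uses [tDOM], so the commands differ from 'dom::DOMImplementation serialize'. The element name `document`, the attribute `lang` and the sample text are made up for illustration.

======
package require tdom

# Sample content containing characters that are special in XML (hypothetical).
set content {Tom & Jerry <cartoon> "classic"}

# Build a DOM tree in memory with <document> as the root element.
set doc  [dom createDocument document]
set root [$doc documentElement]
$root setAttribute lang en
$root appendChild [$doc createTextNode $content]

# Serialize the tree; the serializer takes care of escaping the text.
set xml [$doc asXML -indent none]
puts $xml

# tDOM documents are not garbage collected, so free the tree explicitly.
$doc delete
======

Because the text goes in through createTextNode rather than string concatenation, the special characters in `content` never need to be escaped by hand.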
[DKF] - If you're going for the cheap-hack method of XML generation mentioned above, you'll want this:

======
proc asXML {content {tag document}} {
    # Map the five characters that are special in XML to their predefined entities.
    set XML_MAP {
        <  &lt;
        >  &gt;
        &  &amp;
        \" &quot;
        '  &apos;
    }
    # Wrap the escaped content in an opening and a matching closing tag.
    return <$tag>[string map $XML_MAP $content]</$tag>
}
======

Naturally, the ''XML_MAP'' variable is factorisable...<<br>>
[MHo]: Why not use '''html::quoteFormValue''' for this purpose? For generating XML (HTML) the pure Tcl way, have a look at the xmlgen module of TclXML on SourceForge: http://sourceforge.net/projects/tclxml/.

[DKF]: That's when you're moving away from cheap hacks. And HTML has a lot more entities than XML, though most are optional.

----

If you want to get particular about entity encoding '''arbitrary text''', this is working for me:

 variable entityMap [list & &amp\; < &lt\; > &gt\; \" &quot\;\
     \u0000

----

A [Cameron Laird] article on XSD and XML schema can be found at http://ldn.linuxfoundation.org/column/untaught-xml-schema .

----

An August 2009 article reports that Microsoft has been awarded a software patent for XML files and processing [http://news.zdnet.com/2100-9595_22-329645.html?tag=nl.e539]. Amusingly enough, also in August 2009, Microsoft was ordered to stop selling its Word product '''because''' of its XML functionality [http://www.computerworld.com/s/article/9136539/Injunction_on_Microsoft_Word_unlikely_to_halt_sales?source=CTWNLE_nlt_dailyam_2009-08-12].

<<categories>> Data Serialization Format | XML