XML = e'''X'''tensible '''M'''arkup '''L'''anguage [http://www.xml.org/].
Broadly speaking, it is a simplified form of [SGML], but stricter (more regular) in some respects:
* Empty (singleton) elements must end with />
* Attribute values must be quoted
Example:
    <br />
    <img src="logo.gif" />
"Programming XML in Tcl" [http://www-106.ibm.com/developerworks/webservices/library/ws-xtcl.html]
surveys the state-of-the-art as of spring 2001, mainly from a [Zveno]-biased perspective.
One deficiency of that article is its neglect of [Jochen Loewer]'s [tDOM] work.
----
One way of specifying the valid tag structure of a class of documents is to use a Document Type Definition, [DTD] for short. This approach was inherited from SGML. Alternatives include XML Schema and RELAX NG, among others.
----
Perhaps the single most important introductory point to make to Tcl
developers about XML is that it's built-in! Almost--while the core
Tcl distribution doesn't know about XML, it does have excellent
[Unicode] abilities, and both the [ActiveTcl] and [Kitten]
installations of Tcl include XML packages.
----
tDOM has a built-in pretty-printing serialization option. Those with an interest in a comparable function
for TclDOM are welcome to try, use, or improve dom_pretty_print [http://phaseit.net/claird/comp.lang.tcl/dom_pretty_print.html].
"[XML pretty-printing]" will eventually have more on this topic.
----
How can you start to generate your own XML documents with Tcl? In
answering just that question on a mailing list [[reference?], [Steve Ball] succinctly
advised, "When creating XML, I generally use [TclDOM]. Create a [DOM] tree in memory,
and then use 'dom::DOMImplementation serialize $doc' to generate the
XML. The TclDOM package will make sure that the generated XML is
well-formed.
Alternatively, XML is just text so there's no reason why you can't
just create the string directly. Eg:
    puts "<document>$content</document>"
The problem with this is that (a) you have to worry about the
XML syntax nitty-gritty and (b) the content variable may contain
special characters which you have to deal with.
There are also some generation packages available, like the '[html]'
package in [tcllib] (this will be added to TclXML RSN, when my
workload permits)."
[DKF] - If you're going for the cheap-hack method of XML generation mentioned above, you'll want this:
    proc asXML {content {tag document}} {
        set XML_MAP {
            <  &lt;
            >  &gt;
            &  &amp;
            \" &quot;
            '  &apos;
        }
        return <$tag>[string map $XML_MAP $content]</$tag>
    }
Naturally, the ''XML_MAP'' variable is factorisable...
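One way that factoring might look, as a minimal sketch: the map is built once in a namespace variable instead of on every call. The namespace name xmlsketch is invented for illustration, and the tag argument is still inserted unchecked.

```tcl
namespace eval xmlsketch {
    # Built once at load time. string map scans the text in a single pass,
    # so the "&" introduced by one replacement is never escaped again.
    variable XML_MAP {
        <  &lt;
        >  &gt;
        &  &amp;
        \" &quot;
        '  &apos;
    }
}

# Same behaviour as asXML above, but reusing the shared map.
proc xmlsketch::asXML {content {tag document}} {
    variable XML_MAP
    return <$tag>[string map $XML_MAP $content]</$tag>
}

puts [xmlsketch::asXML {a < b & "c"}]
```

The puts line prints <document>a &lt; b &amp; &quot;c&quot;</document>.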
For generation of XML (HTML) the pure Tcl way, have a look at the xmlgen module
of TclXML on sourceforge: http://sourceforge.net/projects/tclxml/.
----
If you want to get particular about entity encoding '''arbitrary text''', this is working for me:
    variable entityMap [list & &amp\; < &lt\; > &gt\; \" &quot\;\
        \u0000 &#x0\; \u0001 &#x1\; \u0002 &#x2\; \u0003 &#x3\;\
        \u0004 &#x4\; \u0005 &#x5\; \u0006 &#x6\; \u0007 &#x7\;\
        \u0008 &#x8\; \u000b &#xB\; \u000c &#xC\; \u000d &#xD\;\
        \u000e &#xE\; \u000f &#xF\; \u0010 &#x10\; \u0011 &#x11\;\
        \u0012 &#x12\; \u0013 &#x13\; \u0014 &#x14\; \u0015 &#x15\;\
        \u0016 &#x16\; \u0017 &#x17\; \u0018 &#x18\; \u0019 &#x19\;\
        \u001A &#x1A\; \u001B &#x1B\; \u001C &#x1C\; \u001D &#x1D\;\
        \u001E &#x1E\; \u001F &#x1F\;]
    proc entityEncode {text} {
        variable entityMap
        return [string map $entityMap $text]
    }
Notice I drop \t, \n and \r as those are acceptable chars [DG]
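A table like this can also be generated rather than typed out. The following is only a sketch: the helper name buildEntityMap is invented, the hexadecimal form of the character references is an assumption, and the loop follows the note above by leaving out tab, newline and carriage return.

```tcl
# Build the map for & < > " plus the control characters U+0000..U+001F,
# skipping tab (9), newline (10) and carriage return (13).
proc buildEntityMap {} {
    set map [list & "&amp;" < "&lt;" > "&gt;" \" "&quot;"]
    for {set i 0} {$i < 32} {incr i} {
        if {$i in {9 10 13}} continue
        lappend map [format %c $i] [format "&#x%X;" $i]
    }
    return $map
}

variable entityMap [buildEntityMap]

proc entityEncode {text} {
    variable entityMap
    return [string map $entityMap $text]
}

puts [entityEncode "1 < 2 & \"ok\""]
```

Note that XML 1.0 does not permit character references to most of these control characters, so whether a given parser accepts the encoded output depends on the consumer.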
----
What: xml2rfc
Where: http://xml.resource.org/
http://www.ietf.org/rfc/rfc2629.txt
Description: A tool that converts XML source into ASCII, HTML, or nroff
format. Intended for support of RFC 2629. On the above web
page is both a CGI for converting an XML file into the various
formats, as well as links to the conversion tool itself. The
tool itself includes a Tcl/TclXML tool.
Updated: 11/2001
Contact: See web site
----
It's remarkable that there are ''two'' reasonably well-supported
XML editors written (mostly) in Tcl: waX Me Lyrical ([WAX]),
which replaces the earlier Swish in the [TclXML] project,
and xe, maintained as part of [tDOM].
de: With all respect, xe isn't an XML editor. It's an XML query tool (query language is XPath). - [RS]: See [starDOM] for a simple [tDOM]-based browser that allows editing, reparsing and validating XML source.
----
[XML-RPC] -- [TclSOAP]
----
[RS] notes that Internet Explorer makes for a convenient utility to confirm that an XML document is well-formed (although not necessarily valid). Now (since Fall 2002) he only uses [starDOM], because of speed and scriptability ;-)
de: IE is useful, to some degree (up to a few MByte of XML
data), as an XML viewer, because it displays the XML
document in a tree-like structure.
If you need XML validation, I recommend rxp
http://www.ltg.ed.ac.uk/~richard/rxp.html. It avoids any
Java installation hassle (and the start-up time of the Java
virtual machine), is open source, runs on every relevant
OS (an MS platform binary is available if you need one),
is very conformant and mature, and is the fastest among
the more common validating XML parsers. Since rxp is a
command line application, it's easily usable from a Tcl
program with [exec].
If you insist on doing XML validation with a Tcl extension,
there are only two (and maybe a half) options:
Newer [tDOM] distributions include a validation extension [tnc],
which is usable for both SAX and DOM processing. It's pretty
fast (even faster than rxp).
Xerces-C++ is, among other things, a validating XML parser.
Some time ago Steve Ball started to wrap it as a Tcl extension
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/tclxml/xercessax/
Lately, Steve Ball wrote on the TclXML list: "I never got the Xerces-C++ wrapper working, but instead I've got
a working libxml2 wrapper for TclDOM. At the moment you need to
checkout the CVS development tree to get access to it." libxml2 also includes a validating XML parser.
And the half option? Well, it should be doable to use
one of the various Java XML parsers with tclblend. I strongly
recommend sticking with one of the options above. But if you
are a tclblend hero and figure it out, I would be interested
in the exact steps.
----
[Joe English] in c.l.t: What I usually do to get indented XML is to generate
whitespace *inside* the tags, like so:
    <foo
      ><bar
        >stuff</bar
      ><bar
        >stuff</bar
    ></foo>
This style looks a little weird at first, but
it's the most reliable way to "pretty-print" XML
without changing the content.
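A sketch of how output in this style might be generated in plain Tcl. The node representation (a list that is either {#text string} or {name child child ...}) and the proc name insideIndent are invented for this illustration, and attributes are not handled. Line breaks and indentation are placed only between a tag name and its closing ">", so no whitespace text nodes are added to the document.

```tcl
# A node is either {#text string} or {name child child ...}.
# These shapes are assumptions made for this sketch.
proc insideIndent {node {depth 0}} {
    set rest [lassign $node name]
    if {$name eq "#text"} {
        # Character data: escape the markup-significant characters.
        return [string map {& &amp; < &lt; > &gt;} [lindex $rest 0]]
    }
    # Start tag: newline and indentation sit before the closing ">".
    set out "<$name\n[string repeat { } [expr {$depth + 1}]]>"
    foreach child $rest {
        append out [insideIndent $child [expr {$depth + 1}]]
    }
    # End tag: same trick, whitespace before the ">".
    append out "</$name\n[string repeat { } $depth]>"
    return $out
}

puts [insideIndent {root {item {#text stuff}} {item {#text stuff}}}]
```

Removing every run of whitespace that precedes a ">" from the result gives back the compact form, here <root><item>stuff</item><item>stuff</item></root>.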
----
See also [A little XML browser] using [tDOM] and [BWidgets]' Tree, and its refinement [starDOM] - [A little XML parser] in pure Tcl
----
The [Perl/Tk] folks have written an XML viewer [http://search.cpan.org/search?mode=module&query=Tk%3a%3aXMLViewer].
----
Overheard in the [Tcl chatroom]: "[Cameron Laird]: XML is the moral equivalent of ASCII. 'Wouldn't want to leave home without it; 'scares me that managers think it's a big deal." CL adds, some weeks later: It continues to surprise me how many developers I encounter who tell me they've been instructed to backstitch XML into ''working'' applications for no functional reason.
----
A cute entity encoder when producing XML from arbitrary text:
    interp alias {} xesc {} string map {< &lt; > &gt; & &amp;} ;# RS
----
A cute XML generator (sorry, no attributes, no entities):
    proc < {name args} {return <$name>[join $args ""]</$name>\n} ;#RS
You can control the tree structure by the nesting of the calls to "<" (here using the auto-indentation of [emacs]):
< root \
[< branch 1 \
[< leaf 1] \
[< leaf 2]] \
[< branch 2 \
[< leaf 3] \
[< leaf 4]]
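produces this semi-prettyprint:
    <root><branch>1<leaf>1</leaf>
    <leaf>2</leaf>
    </branch>
    <branch>2<leaf>3</leaf>
    <leaf>4</leaf>
    </branch>
    </root>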
----
Another variation, again by [RS]:
    proc < {element {value ""} {attributes {}}} {
        set res <$element
        foreach {att attval} $attributes {append res " $att='$attval'"}
        if {$value eq ""} {
            append res " />"
        } else {
            append res >[string map {& &amp; < &lt; > &gt;} $value]</$element>
        }
    }
    % < try "this is <test> & value" {lang EN}
    <try lang='EN'>this is &lt;test&gt; &amp; value</try>
    % < try "" {lang EN}
    <try lang='EN' />
----
[LV] In the news: [http://news.zdnet.com/2100-3513_22-5905949.html?tag=nl.e540] is an article about a company with a couple of patents that they claim are infringed upon by use of XML. They are working out an agreement with a firm that will handle contacting anyone using xml to collect licensing fees... so far, they've contacted 47 companies.
----
[XML tutorials]
----
[[
[Category Acronym] |
[Category XML] |
]]