'''Tool Protocol Language''', or '''TPL''', is not a description of a new language, but an essay on thinking about Tcl in a different way.

[CMcC] 2008-10-28 17:11 AEST: The world has many protocols for communicating between processes ('''[IPC]'''). Some are binary, some are [ASCII]. I will call ASCII protocols for IPC ''protocol languages.''

''[Lars H]: The distinction between "ASCII" and "binary" in this sense isn't necessarily what characters may occur, but rather whether the format requires encoding explicit lengths or positions. For example [PDF] documents consist mostly of ASCII text (unless you compress data), but it is a "binary" format because it is hopeless to edit as text; doing that would throw off the exact file positions listed in the cross-reference table.''

Examples of ''protocol languages'' and ''binary protocols'':

   * [XML] is used in [XML-RPC] type IPC.
   * FCGI, [SCGI], [CGI] have aspects of protocol languages, and some of binary protocols.
   * HTTP is used in [REpresentational State Transfer, REST]
   * [JSON] is used in [AJAX]
   * several data representation languages [YAML], ...
   * [XDR] is a binary protocol used in SUN [RPC]
   * command-line protocol - used to invoke commands from a process, per the unix '''system()''' command, e.g.
   * Google's http://code.google.com/p/protobuf/%|%Protocol Buffers%|% are a binary protocol.
   * The [OpenMath] specification defines both an XML encoding and a binary encoding.
Tcl (arguably) has many characteristics desirable in a ''protocol language'':

   * minimal [syntax]
   * readily parseable
   * omnipresence
   * well known and understood
   * flexible
   * expressive
   * command-args form
   * long history of [data is code] and [code is data] and [code is data is code] but not (yet) [data is code is data]
   * long history of use of [little language]s

The purpose of this page is to argue this assertion, to explore counterarguments, to deepen understanding of Tcl as a protocol language, and to explore the prerequisites for its use in this role.

It seems to me ([CMcC]) that a cut-down Tcl syntax, without special meaning attributed to `{*}`, `$` and `[[]]`, would serve well in this role. Such a restricted syntax should be called TPL or perhaps TDL ([Tcl Data Language]).

Many applications and systems provide a [plugin] [API]. Such an API will necessarily have an expression of the form ''command+args->result'', and will therefore be well suited to a TPL representation.

Examples of plugin APIs:

   * [GIMP]
   * [postfix]
   * [Apache]
   * [varnish]
   * [asterisk]

In many cases, wrappers for library APIs have a similar form, ''command+args->result'', and in fact many [critcl] and other such wrappers translate [C] APIs into this form.

The advantage and virtue of TPL is that it formalises the useful Tcl syntax subset in a way which might be of interest to people outside the Tcl community. A precedent for TPL may be found in [JSON].

All ''command+args->result'' forms can be represented as valid Tcl lists, and these are trivial to interpret in Tcl. Providing a C library which translates a meaningful set of C data types into and out of TPL would enable conversion of any plugin API into a wire-ready TPL protocol language.
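As a rough illustration of the ''command+args->result'' form as plain Tcl lists, here is a minimal sketch of a dispatcher. It is not part of any existing TPL library: the names `tpl_call`, `tpl_add`, `tpl_concat` and the `ok`/`error` reply convention are assumptions made up for this example, and it assumes Tcl 8.5 or later (for `dict`, `lassign` and `{*}`).

======
# Hypothetical handlers: each protocol command maps to one Tcl proc.
proc tpl_add {a b} { expr {$a + $b} }
proc tpl_concat {args} { join $args "" }
set handlers [dict create add tpl_add concat tpl_concat]

# Take one request (a Tcl list: command name followed by arguments)
# and return a reply list: {ok result} or {error message}.
# The request is only ever taken apart with list commands; it is never
# passed to eval, so $ and [] inside arguments stay inert.
proc tpl_call {handlers request} {
    set arguments [lassign $request cmd]
    if {![dict exists $handlers $cmd]} {
        return [list error "unknown command: $cmd"]
    }
    if {[catch {[dict get $handlers $cmd] {*}$arguments} result]} {
        return [list error $result]
    }
    return [list ok $result]
}

puts [tpl_call $handlers [list add 2 3]]
puts [tpl_call $handlers {concat {a b} c}]
======

The reply is itself a list, so the same code that builds requests (`list`) and splits them (`lassign`, `lindex`) serves for results too. A request string that is not a well-formed list would make `lassign` raise an error, so a real receiver would probably want to check for that first.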
One thing which would be useful in this endeavour is production of a subset of the Tcl syntax [*dekalog] sufficient to completely define TPL as a subset, and perhaps the full Tcl syntax could be expressed in terms of TPL.

Dynamics of protocol interaction:

There are several styles of protocol interaction:

   * simple request/confirmation - [RPC]
   * pipelined request/confirmation - [HTTP]
   ** in-order confirmation pipelining
   ** out-of-order confirmation
   * multiplexed ''streams'' of the above - [FCGI]

Consideration must be given to each of these styles.

----
[RS] 2008-10-28: One Tcl-based "data language" is the one that [tDom] produces with the '''asList''' method:

======none
% dom parse "<foo><bar><grill>42</grill><grill>345</grill><qux a='b'/></bar></foo>"
domDoc01301840
% [domDoc01301840 documentElement] asList
foo {} {{bar {} {{grill {} {{#text 42}}} {grill {} {{#text 345}}} {qux {a b} {}}}}}
======

It may not be the prettiest sight, but it can be parsed with Tcl very easily, and it can represent the same complexity that [XML] has, with nested structures and attributes.

   * Each element is a triplet of {name attributes children}, where
   * attributes is {key val key val ...} and
   * children is {element ...}

[Lars H]: Indeed a very useful format, that natively supports [data is code], and for which a pure-Tcl translator from XML is available: [A little XML parser] (whose only failing is that it doesn't translate XML entities to ordinary characters in e.g. attribute values and #text data). I'm presently using it for a project, and have encountered no problems with it (even though it sometimes feels a bit prolix when one has to go `[[list [[list foo {} {}]]]]` in order to make a list with the only child of a node).

I suspect this is a slightly higher level format than the TPL suggested here, though; while pretty much anything can be encoded in XML, it might be that some applications of TPL would benefit from not adhering to the strict type–attributes–children structure of the XML-asList format.
In XML-specification-speak, I think what I'm getting at is that XML-asList would be an important application of TPL, but not necessarily the only one.

In terms of Tcl syntax subsets, there are at least three useful levels:

   * Full Tcl syntax ([*dekalog]).
   * Tcl list syntax (as briefly explained on the [lindex] manpage): $, brackets, #, semicolon, and newline have no special powers, but backslash, braces, quote, and whitespace have their normal meaning.
   * Tcl list syntax, plus command separators and comments. I think this is roughly what CMcC is proposing for TPL, but no direct support exists for it (that wouldn't also support full Tcl syntax).

The extended list syntax is superior to list syntax in that it supports comments and is better at catching syntax errors (when an argument of some command is forgotten, it doesn't grab the name of the next command as that argument) while still keeping the nesting of braces at a tolerable level. A downside for internal processing is that it (presently) cannot preserve the internal representation of data, since everything shimmers to a string when you join separate commands into a script.

[CMcC]: I hadn't thought about command separators and comments. I think they might unnecessarily complicate the TPL usage of Tcl syntax, although the use of command separators to represent pipelining is an intriguing possibility (it seems to me that comments are completely useless in the TPL context).

I think TPL needs backslash, braces, quote and whitespace. So it looks like Tcl list syntax is equivalent to TPL. This is roughly analogous to [JSON], which is a useful parallel to keep in mind, I think.

In general, APIs and RPC are functional applications, so they are directly representable by Tcl's command syntax + lists (and since command syntax is list syntax, this reduces to being lists).
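To make the "list syntax" level concrete, the following small sketch shows that the list commands treat `$`, brackets, `#` and semicolon as ordinary characters, and that a receiver can reject a malformed message before using it. The helper name `tpl_valid` is invented for this example; the rest is standard Tcl.

======
# A message that would be dangerous as a script is harmless as a list.
set msg {set x $y [exit] ; # not a comment}

puts [llength $msg]    ;# 9 elements; nothing is substituted or run
puts [lindex $msg 2]   ;# the two literal characters $y
puts [lindex $msg 3]   ;# the literal string [exit]

# Hypothetical helper: true if the string parses as a Tcl list.
# llength raises an error on unbalanced braces or quotes, and catch
# turns that error into a boolean.
proc tpl_valid {s} {
    expr {![catch {llength $s}]}
}

puts [tpl_valid {resize {my image.png} 50%}]   ;# 1
puts [tpl_valid "resize \{my image.png 50%"]   ;# 0, unmatched open brace
======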
Clearly anything can be represented as a string (that's a Tcl mantra) and it's useful to interpret a subset of strings as lists, and those lists can be used to represent anything a protocol language can be expected to represent.

The virtues of formalising this approach, and giving the Tcl syntax subset a distinct name, are: (a) marketing - we have a new way to think about Tcl, we have a new way to provide utility to the wider world, and to give people a reason to think about Tcl for applications or retrofits, (b) support - we can produce a series of C/[Javascript%|%JS]/etc language functions which will interconvert between the host language's data types and Tcl's. Given such a library for a given language, the process of interfacing applications and systems written in that language to Tcl is significantly simplified, and of course the processing of the API/RPC protocol language in Tcl is ''vastly'' simplified. This allows Tcl to better fulfill its function as a Tool Control Language, by supporting Tool Protocol Language as a ''protocol language''.

[Lars H]: I got the impression that you wanted TPL to be a natural language for config files; in that setting command separators and comments are highly recommended, but I can imagine use-cases also for command substitution (e.g. [binary decode] of some blob might be more convenient than the \x counterpart) and variable substitution there, so it is probably better served by full Tcl syntax. For communication exclusively between two pieces of software, command separators and comments are pretty useless ''and'' a significant complication, so if that is the niche for TPL then the Tcl list syntax is indeed the natural fit.

** Wire protocol **

[Lars H]: TPL, like XML, is ultimately about how one encodes data as a string of Unicode characters, but one might also think about giving recommendations on how to transport a stream of TPL messages over a binary channel (octet-stream).
The following is such a "wire protocol" I'm using in a project. It is based on the wire protocol of [comm], but has some tweaks to improve robustness. Basic principles are:

   * Use utf-8 for character encoding: this is efficient with respect to ASCII characters (of which there are going to be a lot just for the TPL syntax) and it is capable of encoding everything.
   * Provide a message separator, so that the receiver can start processing a message as soon as it has been completely received.
   * Make it so that the complete message stream can afterwards be viewed as the list of all the messages, and viewed using standard tools (e.g. more); this simplifies debugging.

Naturally, we also want it to be easy to implement in Tcl.

Transmitting is very simple. To encode a message ''msg'', do

======
encoding convertto utf-8 [list $msg]
======

To prepare a ''channel'' for a stream of messages do

======
fconfigure $channel -translation binary
puts $channel \xC0\x8D
======

To write an ''encoded_msg'' to such a ''channel'' do

======
puts $channel "$encoded_msg\xC0\x8D"
======

To read the list of all messages on a ''channel'' (assuming they are well-formed) and loop over them, do

======
fconfigure $channel -translation lf -encoding utf-8
foreach msg [read $channel] {
    # Do stuff
}
======

Reading messages as they come in (still assuming well-formedness):

======
fconfigure $channel -translation binary -blocking no
fileevent $channel readable [list Receive $channel]
proc Receive {channel} {
    variable buffer
    append buffer [read $channel]
    # For now, I also skip checking for EOF.
    while {[
        set pos [string first \xC0\x8D $buffer]
    ] >= 0} {
        set chunk [string range $buffer 0 [expr {$pos-1}]]
        set buffer [string range $buffer [expr {$pos+2}] end]
        set msg [lindex [encoding convertfrom utf-8 $chunk] 0]
        # Process $msg
    }
}
======

So how does it work?

   * The separator `\xC0\x8D` is malformed utf-8 for carriage return (like Tcl internally uses for NUL in stringReps), so it cannot occur in any message.
   * When Tcl's "[encoding convertfrom] utf-8" encounters `\xC0\x8D`, it still sees it as a carriage return, which is just ordinary whitespace as far as TPL syntax is concerned. If reading with `-encoding utf-8 -translation auto`, the whole `\xC0\x8D\n` sequence will be seen as just a crlf and thus translated to a single \n.
   * Taking (the string representations of) two lists and putting some whitespace between them produces a string representation of the [concat]enated list, so the whole thing will be a valid list.

How it looks:

   * From within Tcl (and assuming suitable channel configuration), just like a list of messages.
   * At a unix prompt, each message starts on a new line, begins with a left brace, and ends at the end of a line with a right brace followed by two bytes of binary gunk. This makes message boundaries easy to spot.

Things that could be better:

   * I'd like to specify that there should be a left brace before each message and a right brace after it, but as it turns out one cannot 100% rely on this if all valid string representations of lists are allowed. Consider:

======none
% set a "b \"\{\""
b "{"
% llength $a
2
% lindex $a 0
b
% lindex $a 1
{
% list $a
b\ \"\{\"
======

This is a valid list of a (multi-element) list, but it isn't brace-delimited. Switching to the canonical list representation would however make it so:

======
% lrange $a 0 1
b \{
% list [lrange $a 0 1]
{b \{}
======

A conclusion of this ''could'' be that TPL should tighten the Tcl list syntax somewhat.

[PYK] 2014-04-09: [Lars H], you've overthunk it! This "wire protocol" is exactly equivalent to transmitting a normal valid Tcl [script] encoded in UTF-8, with each message being the syntactic equivalent of a [command] in the script. The `\xC0\x8D` strategy is entirely unnecessary.
The transmission procedure can simply be:

======
fconfigure $channel -encoding utf-8
puts $channel $msg
======

and the receiving procedure can then use `[info complete]` to separate the items which, syntactically, are commands:

======
fconfigure $channel -encoding utf-8 -blocking no
fileevent $channel readable [list Receive $channel]

proc Receive channel {
    variable buffer
    # Make sure the buffer exists before it is first compared.
    if {![info exists buffer]} {
        set buffer {}
    }
    if {[gets $channel line] < 0} {
        if {[eof $channel]} {
            # Wrap things up.
            # If the buffer isn't empty, the message is incomplete.
            close $channel
        }
        # Otherwise no complete line has arrived yet.
        return
    }
    # Skip blank lines between messages, but keep those inside one.
    if {$line eq {} && $buffer eq {}} {
        return
    }
    if {$buffer eq {}} {
        append buffer $line
    } else {
        append buffer \n$line
    }
    if {[info complete $buffer\n]} {
        # Hand over the message and clear the buffer in one step.
        process $buffer[set buffer {}]
    }
}
======

The moral of the story is that Tcl has another [data format] that doesn't get enough airtime: the [script].

<> Discussion