Tool Protocol Language, or TPL, is not a description of a new language, but an essay on thinking about Tcl in a different way.
CMcC 2008-10-28 17:11 AEST:
The world has many protocols for communication between processes (IPC). Some are binary, some are ASCII. I will call ASCII-based IPC protocols "protocol languages".
Lars H: The distinction between "ASCII" and "binary" in this sense isn't necessarily about which characters may occur, but rather whether the format requires encoding explicit lengths or positions. For example, PDF documents consist mostly of ASCII text (unless data is compressed), but PDF is a "binary" format because it is hopeless to edit as text: doing so would throw off the exact file positions listed in the cross-reference table.
Examples of protocol languages and binary protocols
Tcl (arguably) has many characteristics desirable in a protocol language:
The purpose of this page is to argue this assertion and test its truth, to explore counterarguments, to deepen understanding of Tcl as a protocol language, and to explore the prerequisites for its use in this role.
It seems to me (CMcC) that a cut-down Tcl syntax, without the special meaning attributed to {*}, $ and [], would serve well in this role. Such a restricted syntax should be called TPL or perhaps TDL (Tcl Data Language).
PYK notes that when CMcC wrote this, TDL must not have existed.
Many applications and systems provide a plugin API. Such an API will necessarily have an expression of the form command+args->result, and will therefore be well suited to a TPL representation.
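To make the command+args->result shape concrete, here is a minimal sketch of a receiver that executes TPL messages against a plugin API. The plugin namespace, its add and greet commands and the dispatch proc are hypothetical names chosen for illustration; the only point is that Tcl's own list handling is the whole parser.

```tcl
namespace eval plugin {
    # Hypothetical plugin commands, each of the form command+args->result.
    proc add {a b} {
        return [expr {$a + $b}]
    }
    proc greet {name} {
        return "hello, $name"
    }
}

# Interpret one TPL message: element 0 names the command, the remaining
# elements are its arguments, and the command's return value is the result.
proc dispatch {message} {
    set cmd [lindex $message 0]
    set arguments [lrange $message 1 end]
    # Only commands defined in the plugin namespace may be invoked.
    set target [namespace which -command ::plugin::$cmd]
    if {$target eq ""} {
        return -code error "unknown command: $cmd"
    }
    return [$target {*}$arguments]
}

puts [dispatch {add 2 3}]
puts [dispatch [list greet "Tcl user"]]
```

Restricting lookup to one namespace is a design choice of this sketch: a message can only name a command the plugin deliberately exports, never an arbitrary Tcl command.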
Examples of plugin APIs:
In many cases, wrappers for library APIs have a similar form, command+args->result, and in fact many critcl and other such wrappers translate C APIs into this form.
The advantage and virtue of TPL is that it formalises the useful Tcl syntax subset in a way which might be of interest to people outside the Tcl community.
A precedent for TPL may be found in JSON. All command+args->result forms can be represented as valid Tcl lists, and these are trivial to interpret in Tcl. Providing a C library that translates a meaningful set of C data types into and out of TPL would enable conversion of any plugin API into wire-ready TPL protocol language.
One thing that would be useful in this endeavour is a subset of the Tcl syntax rules (the *dekalog) sufficient to define TPL completely; perhaps the full Tcl syntax could then be expressed in terms of TPL.
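A small sketch of why such forms are trivial to interpret: a call whose arguments contain spaces, braces, quotes, a newline and an empty string survives a round trip through its string form with no custom escaping. The command name setTitle is hypothetical.

```tcl
# A hypothetical call with awkward arguments.
set title "My {odd} \"title\""
set body "line1\nline2"
set call [list setTitle $title $body ""]

# $call is an ordinary string: it can be written to a socket or file as is.
puts $call

# The receiving side recovers the command and each argument exactly.
set command [lindex $call 0]
set arguments [lrange $call 1 end]
puts "command=$command, [llength $arguments] arguments"
```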
Dynamics of protocol interaction:
There are several styles of protocol interaction:
Consideration must be given to each of these styles.
RS 2008-10-28: One Tcl-based "data language" is the one that tDom produces with the asList method:
% dom parse "<foo><bar><grill>42</grill><grill>345</grill><qux a='b' /></bar></foo>"
domDoc01301840
% [domDoc01301840 documentElement] asList
foo {} {{bar {} {{grill {} {{#text 42}}} {grill {} {{#text 345}}} {qux {a b} {}}}}}
It may not be the prettiest sight, but it can be parsed with Tcl very easily, and it can represent the same complexity that XML has, with nested structures and attributes.
Lars H: Indeed a very useful format, one that natively supports "data is code", and for which a pure-Tcl translator from XML is available: A little XML parser (whose only failing is that it doesn't translate XML entities to ordinary characters in, e.g., attribute values and #text data). I'm presently using it for a project and have encountered no problems with it (even though it sometimes feels a bit prolix when one has to write [list [list foo {} {}]] in order to make a list with the only child of a node). I suspect this is a slightly higher-level format than the TPL suggested here, though; while pretty much anything can be encoded in XML, some applications of TPL might benefit from not adhering to the strict type–attributes–children structure of the XML-asList format. In XML-specification-speak, what I'm getting at is that XML-asList would be an important application of TPL, but not necessarily the only one.
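As a sketch of how little code the parsing takes, the proc below walks the asList output shown above (pasted in as a literal, so tDom is not needed). It assumes element nodes are {name attributes children} and text nodes are {#text value}, which is what the example output shows; the proc name walk and its "path = value" output format are illustrative choices.

```tcl
# The asList output from the session above, as a plain Tcl literal.
set doc {foo {} {{bar {} {{grill {} {{#text 42}}} {grill {} {{#text 345}}} {qux {a b} {}}}}}}

# Return a list of "path = value" strings for text nodes and attributes.
proc walk {node {path ""}} {
    set out {}
    if {[lindex $node 0] eq "#text"} {
        # Text nodes are two-element lists: {#text value}
        lappend out "$path = [lindex $node 1]"
        return $out
    }
    lassign $node name attrs children
    set path "$path/$name"
    # Attributes are a flat name/value list.
    foreach {key value} $attrs {
        lappend out "$path@$key = $value"
    }
    foreach child $children {
        lappend out {*}[walk $child $path]
    }
    return $out
}

# Prints /foo/bar/grill = 42, /foo/bar/grill = 345 and /foo/bar/qux@a = b
foreach line [walk $doc] {
    puts $line
}
```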
In terms of Tcl syntax subsets, there are at least three useful levels:
The extended list syntax is superior to list syntax in that it supports comments and is better at catching syntax errors (when you forget an argument of some command, it doesn't grab the name of the next command as that argument), while still keeping the nesting of braces at a tolerable level. A downside for internal processing is that it (presently) cannot preserve the internal representation of data, since everything shimmers to a string when you join separate commands into a script.
CMcC: I hadn't thought about command separators and comments. I think they might unnecessarily complicate the TPL usage of Tcl syntax, although using command separators to represent pipelining is an intriguing possibility (it seems to me that comments are completely useless in the TPL context). I think TPL needs backslash, braces, quotes and whitespace, so it looks like Tcl list syntax is equivalent to TPL. This is roughly analogous to JSON, which I think is a useful parallel to keep in mind.
In general, APIs and RPC are functional applications, so they are directly representable by Tcl's command syntax plus lists (and since command syntax is list syntax, this reduces to lists). Clearly anything can be represented as a string (that's a Tcl mantra), it's useful to interpret a subset of strings as lists, and those lists can represent anything a protocol language can be expected to represent.
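A sketch of the error-catching point: splitting script-style data into commands with info complete keeps a forgotten argument local to its own command, whereas the same data as one flat list silently shifts every following pair. The proc splitCommands is a hypothetical helper, and it assumes commands are separated by newlines only (';' separators are not handled).

```tcl
# Split newline-separated commands, skipping blank lines and comments.
proc splitCommands {script} {
    set commands {}
    set pending ""
    foreach line [split $script \n] {
        append pending $line \n
        # Keep accumulating until the text forms a complete command.
        if {![info complete $pending]} {
            continue
        }
        set cmd [string trim $pending]
        set pending ""
        if {$cmd eq "" || [string index $cmd 0] eq "#"} {
            continue
        }
        lappend commands $cmd
    }
    if {[string trim $pending] ne ""} {
        error "incomplete command at end of script"
    }
    return $commands
}

# Extended list syntax: the value for "height" was forgotten.
set config {
    # display settings
    width 640
    height
    title "Main window"
}
set cmds [splitCommands $config]

# The same data as one flat list: the missing value shifts the pairs.
set flat {width 640 height title "Main window"}
set pairs {}
foreach {key value} $flat {
    lappend pairs $key=$value
}
# pairs is now: width=640 height=title {Main window=}
```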
The virtues of formalising this approach, and giving the Tcl syntax subset a distinct name, are:
(a) marketing - we have a new way to think about Tcl, we have a new way to provide utility to the wider world, and to give people a reason to think about Tcl for applications or retrofits,
(b) support - we can produce a series of C/JS/etc language functions which will interconvert between the host language's data types and Tcl's. Given such a library for a given language, the process of interfacing applications and systems written in that language to Tcl is significantly simplified, and of course the processing of the API/RPC protocol language in Tcl is *vastly* simplified. This allows Tcl to better fulfill its function as a Tool Control Language, by supporting Tool Protocol Language as a protocol language.
Lars H: I got the impression that you wanted TPL to be a natural language for config files; in that setting command separators and comments are highly recommended, but I can imagine use-cases also for command substitution (e.g. binary decode of some blob might be more convenient than the \x counterpart) and variable substitution there, so it is probably better served by full Tcl syntax. For communication exclusively between two pieces of software, command separators and comments are pretty useless and a significant complication, so if that is the niche for TPL then the Tcl list syntax is indeed the natural fit.
Lars H: TPL, like XML, is ultimately about how one encodes data as a string of Unicode characters, but one might also think about giving recommendations on how to transport a stream of TPL messages over a binary channel (octet-stream). The following is such a "wire protocol" I'm using in a project. It is based on the wire protocol of comm, but has some tweaks to improve robustness. Basic principles are:
Naturally, we also want it to be easy to implement in Tcl.
Transmitting is very simple. To encode a message msg, do
encoding convertto utf-8 [list $msg]
To prepare a channel for a stream of messages do
fconfigure $channel -translation binary
puts $channel \xC0\x8D
To write an encoded_msg to such a channel do
puts $channel "$encoded_msg\xC0\x8D"
To read the list of all messages on a channel (assuming they are well-formed) and loop over them, do
fconfigure $channel -translation lf -encoding utf-8
foreach msg [read $channel] {
    # Do stuff
}
Reading messages as they come in (still assuming well-formedness):
fconfigure $channel -translation binary -blocking no
fileevent $channel readable [list Receive $channel]
proc Receive {channel} {
    variable buffer
    append buffer [read $channel]
    # For now, I also skip checking for EOF.
    while {[set pos [string first \xC0\x8D $buffer]] >= 0} {
        set chunk [string range $buffer 0 [expr {$pos-1}]]
        set buffer [string range $buffer [expr {$pos+2}] end]
        set msg [lindex [encoding convertfrom utf-8 $chunk] 0]
        # Process $msg
    }
}
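The framing can be exercised without a socket. This sketch builds the bytes the sender writes for each message (including the newline that puts appends) and splits them again with the same loop as Receive. The helper names frameMessage and unframe are hypothetical; they are not part of comm.

```tcl
# Bytes for one message: list-quoted, UTF-8 encoded, then C0 8D and the
# newline that puts would add.
proc frameMessage {msg} {
    return "[encoding convertto utf-8 [list $msg]]\xC0\x8D\n"
}

# Split complete messages off a byte buffer, as Receive does. Returns a
# two-element list: the messages found and the leftover bytes.
proc unframe {buffer} {
    set messages {}
    while {[set pos [string first \xC0\x8D $buffer]] >= 0} {
        set chunk [string range $buffer 0 [expr {$pos - 1}]]
        set buffer [string range $buffer [expr {$pos + 2}] end]
        # lindex ignores the leading newline left over from puts.
        lappend messages [lindex [encoding convertfrom utf-8 $chunk] 0]
    }
    return [list $messages $buffer]
}

set stream [frameMessage "h\u00e9llo {world}"][frameMessage {second message}]
lassign [unframe $stream] messages rest
foreach m $messages {
    puts $m
}
```

Since C0 never occurs in well-formed UTF-8, the terminator cannot appear inside an encoded message, which is what makes the plain string first search safe.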
So how does it work?
How it looks:
Things that could be better:
% set a "b \"\{\""
b "{"
% llength $a
2
% lindex $a 0
b
% lindex $a 1
{
% list $a
b\ \"\{\"
The result of [list $a] is a valid list whose only element is the two-element list $a, but it isn't brace-delimited. Switching to the canonical list representation would, however, make it so:
% lrange $a 0 1
b \{
% list [lrange $a 0 1]
{b \{}
A conclusion of this could be that TPL should tighten the Tcl list syntax somewhat.
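One possible tightening, offered only as an assumption and not as part of any specification: a receiver could accept a message only if it is already in canonical list form, i.e. re-quoting its elements reproduces exactly the same string. The proc name isCanonicalList is hypothetical.

```tcl
# Return 1 if $s is a list whose string form is already canonical.
proc isCanonicalList {s} {
    if {![string is list $s]} {
        return 0
    }
    return [expr {[list {*}$s] eq $s}]
}

# The value of $a from the session above is a valid list, but not canonical.
puts [isCanonicalList "b \"\{\""]
# Its canonical form, as produced by lrange, passes the check.
puts [isCanonicalList {b \{}]
```

A sender can always satisfy such a check by passing each message through list (or lrange $x 0 end) before framing it.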
PYK 2014-04-09: Lars H, you've overthunk it! This "wire protocol" is exactly equivalent to transmitting a normal valid Tcl script encoded in UTF-8, with each message being the syntactic equivalent of a command in the script. The \xC0\x8D strategy is entirely unnecessary. The transmission procedure can simply be:
fconfigure $channel -encoding utf-8
puts $channel [list $msg]
and the receiving procedure can then use info complete to separate the items that are, syntactically, commands. -translation binary is also unnecessary, as the utf-8 encoding handles that naturally:
proc receive chan {
    variable buffer
    # Initialise on first use so info complete below never reads an unset variable
    if {![info exists buffer]} {
        set buffer {}
    }
    set count [gets $chan line]
    if {$count >= 0} {
        append buffer $line\n
    } else {
        if {[eof $chan]} {
            #wrap things up
            #If the buffer isn't empty, the message is incomplete
        }
    }
    # Buffer is guaranteed to end in a newline
    if {[info complete $buffer]} {
        set buffer [string trim $buffer]
        if {$buffer ne {}} {
            process $buffer[set buffer {}]
        }
    }
}
proc process args {
    puts stderr [list processing $args]
}
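The same reassembly can be tried without a channel. In this sketch a hypothetical feedLine proc plays the role of receive, taking one line at a time, and the sender's message contains an embedded newline, so it arrives as two lines and only becomes a complete command after the second one.

```tcl
# Append one line to the named buffer; return a list holding the complete
# message once the buffer forms a complete command, else an empty list.
proc feedLine {bufferVar line} {
    upvar 1 $bufferVar buffer
    append buffer $line \n
    # Not yet a complete command: wait for more lines.
    if {![info complete $buffer]} {
        return {}
    }
    set message [string trim $buffer]
    set buffer {}
    # Blank lines are complete but carry no message.
    if {$message eq ""} {
        return {}
    }
    return [list $message]
}

# What the sender writes with puts $channel [list $msg]:
set msg "multi\nline {text}"
set sent [list $msg]

set buf {}
set received {}
foreach line [split $sent \n] {
    lappend received {*}[feedLine buf $line]
}
# Each received item is the list form of one message; lindex unwraps it.
puts [lindex [lindex $received 0] 0]
```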
The moral of the story is that Tcl has another data format that doesn't get enough airtime: the script.