Purpose of this page: To collate our knowledge about the facilities provided by Tcl to work with binary data, for example to talk to other applications using a binary protocol for exchanging information and commands.
Definition of binary data in Tcl
"Everything is a string." But what is a string and what's 'in' a string?
The Tcl language views strings as sequences of Unicode characters. That is, each character can have any Unicode (actually UTF-16) value from 0 to 0xFFFF, excluding character values that are unassigned or reserved by Unicode.
Binary data in Tcl is just strings consisting of pseudo-characters (bytes) with code points anywhere in the more limited range 0 to 255. Also there are no excluded characters with binary data.
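To make the "bytes are just characters in the range 0 to 255" view concrete, here is a minimal sketch. The byte values are arbitrary and chosen only for illustration.

```tcl
# Build a three-byte value from hex digits; each byte becomes one character.
set data [binary format H* 0141ff]

# Three bytes, so [string length] reports three characters.
set n [string length $data]

# Recover the code point of every character; all of them fall in 0..255.
set codes {}
foreach ch [split $data ""] {
    lappend codes [scan $ch %c]
}
puts "$n characters with code points: $codes"
```

With these inputs the code points come out as 1, 65 and 255.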
On the C level the Tcl library uses UTF-8 encoded strings for compatibility with the pre-8.0 interfaces. In addition we have the byte array objects (Tcl_NewByteArrayObj) to deal with binary data directly.
The main facility for working with binary data is the binary command with its subcommands to dissect (scan) and join (format) binary data into/from standard Tcl values (strings, integers, lists, et cetera).
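As a rough illustration of the two subcommands, the sketch below packs and then unpacks a small message. The message layout (a 16-bit little-endian id, an 8-bit flags field and a 4-character tag) is hypothetical and not taken from any real protocol.

```tcl
# Join: pack three Tcl values into a 7-byte binary string.
#   s  = 16-bit integer, little-endian
#   c  = 8-bit integer
#   a4 = 4 characters, padded with nulls if the value is shorter
set msg [binary format sca4 513 7 PING]

# Look at the raw bytes as hex digits.
binary scan $msg H* hex
puts "bytes: $hex"

# Dissect: the same format string turns the bytes back into Tcl values.
# binary scan returns the number of variables it managed to set.
set count [binary scan $msg sca4 id flags tag]
puts "scanned $count fields: id=$id flags=$flags tag=$tag"
```

Since 513 is 0x0201, the little-endian `s` field puts the byte 01 first, so the hex dump reads 01020750494e47.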
For dealing with binary data on C level see also http://www.tcl.tk/man/tcl8.4/TclLib/ByteArrObj.htm
To exchange the binary information with other applications all of the facilities of the I/O system are at our fingertips and ready to be used. But note:
On news:comp.lang.tcl, Mac Cody and Jeff David write:
Mac Cody wrote:
> Here is a simple example that
> first writes binary data to a file and then reads back the
> binary data:
>
> set outBinData [binary format s2Sa6B8 {100 -2} 100 foobar 01000001]
> puts "Format done: $outBinData"
> set fp [open binfile w]
Important safety tip. When dealing with binary files you should always do:
fconfigure $fp -translation binary
I got bit hard on this one once when my \x0a and \x0d bytes got translated.
> puts -nonewline $fp $outBinData
> close $fp
> set fp [open binfile r]
fconfigure $fp -translation binary
> set inBinData [read $fp]
> close $fp
> binary scan $inBinData s2Sa6B8 val1 val2 val3 val4
> puts "Scan done: $val1 $val2 $val3 $val4"
>
Jeff David
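Putting the quoted example and the safety tip together gives roughly the following script. It is a sketch only: the file name binfile.demo is made up for illustration, and the file is written to the current directory and deleted at the end.

```tcl
# Hypothetical scratch file in the current directory.
set path binfile.demo

# Same format string as in the quoted post: two little-endian shorts,
# one big-endian short, six characters and eight bits.
set outBinData [binary format s2Sa6B8 {100 -2} 100 foobar 01000001]

# Write: switch the channel to binary before any data goes out,
# so that \x0a and \x0d bytes are not translated.
set fp [open $path w]
fconfigure $fp -translation binary
puts -nonewline $fp $outBinData
close $fp

# Read: the same setting is needed on the way back in.
set fp [open $path r]
fconfigure $fp -translation binary
set inBinData [read $fp]
close $fp

binary scan $inBinData s2Sa6B8 val1 val2 val3 val4
puts "Scan done: $val1 $val2 $val3 $val4"

file delete $path
```

In Tcl 8.5 and later, opening the file with the access modes wb and rb should have the same effect as the explicit fconfigure calls.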
A post to comp.lang.tcl asks how best to embed binary data into a Tcl script. kennykb has this summary of the answer:
In particular, you should avoid typing binary data directly into strings. While Tcl is able to handle binary data, there are places where you can run into problems. In particular, if you happen to have a Tcl script containing the literal character for a control-Z, you will find, as of Tcl 8.4, that you get a syntax error from Tcl. This is because beginning with 8.4, \u001a is an end-of-file character in scripts. See source (in particular the reference page) for more details.
Please note that the issue with control-Z is just a special case of a more general bit of advice for writing portable Tcl scripts. Whenever Tcl_EvalFile() (or the source command) reads in and evaluates the contents of a file, the reading in is done according to the system encoding. System encodings may be different on different systems. If your file of Tcl code is going to move from system to system, you should be sure that all characters in it are valid in all system encodings. This essentially means you should limit yourself to 7-bit ASCII. You can represent characters outside 7-bit ASCII using the \u quoting supported by the Tcl parser.
Hmmmm.... after a bit more reflection, it dawns on me that control-Z is part of 7-bit ASCII, so it's not a special case after all. Never mind.
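One way to follow this advice, sketched here with arbitrary byte values, is to keep the script itself in plain 7-bit ASCII and let Tcl build the binary data or non-ASCII characters at run time.

```tcl
# Three bytes, including control-Z (0x1a), spelled out as hex digits
# instead of being typed into the script as literal characters.
set viaHex [binary format H* 1a00ff]

# The same three bytes written with backslash escapes.
set viaEscapes "\x1a\x00\xff"

# A character outside 7-bit ASCII (e with acute accent), using \u quoting.
set word "caf\u00e9"

puts "same data: [expr {$viaHex eq $viaEscapes}]"
puts "data length: [string length $viaHex], word length: [string length $word]"
```

Either spelling keeps the source file free of bytes that a different system encoding, or the end-of-file handling of control-Z, could trip over.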
Another tip (it's also mentioned on the string page, but I think it's worth repeating):
[string bytelength] should not be used with binary data. That command measures how long the UTF-8 representation of a string is in bytes. For binary data you don't want conversion to UTF-8, so you don't want [string bytelength] either. Use [string length] instead. It's confusing but probably logical.
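The difference can be made visible with a small sketch. It does not call [string bytelength] itself. Instead it converts the data to UTF-8 explicitly with encoding convertto, which is an assumption made here purely to show the size of the UTF-8 form.

```tcl
# Two bytes of binary data, both above 0x7f.
set data [binary format H* e9ff]

# The number of bytes in the binary data: this is what you want.
set nBytes [string length $data]

# The UTF-8 form of the same two characters needs two bytes each,
# which is the kind of number a UTF-8 based length would report.
set utf8 [encoding convertto utf-8 $data]
set nUtf8 [string length $utf8]

puts "binary length: $nBytes, UTF-8 length: $nUtf8"
```

Here the binary data is 2 bytes long while its UTF-8 representation takes 4 bytes.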
See Binary representation of numbers and Dump a file in hex and ASCII for examples of usage.
What would be a way that one could read and write C structures in Tcl in a manner that would make the intents appear obvious to the reader?
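One possible approach, offered as a sketch rather than an established idiom, is to wrap each structure in a pair of small procs whose names and comments mirror the C declaration. The struct below, the proc names packPoint and unpackPoint, and the choice of little-endian layout without padding are all hypothetical. The dict command used here needs Tcl 8.5 or later.

```tcl
# Hypothetical C structure, assumed little-endian with no padding:
#   struct point { int16_t x; int16_t y; char name[8]; };

# Build the 12-byte binary image of a struct point.
proc packPoint {x y name} {
    # s = int16_t (little-endian), a8 = char[8] padded with nulls
    return [binary format ssa8 $x $y $name]
}

# Turn a binary struct point back into a dict of named fields.
proc unpackPoint {bytes} {
    # A8 is like a8 but strips trailing blanks and nulls from the name.
    if {[binary scan $bytes ssA8 x y name] != 3} {
        error "too few bytes for struct point"
    }
    return [dict create x $x y $y name $name]
}

set bytes [packPoint 10 -2 origin]
set point [unpackPoint $bytes]
puts "[string length $bytes] bytes: $point"
```

Keeping the format string next to the C declaration it describes makes it easier to see at a glance which format character belongs to which field.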
Can anyone put the above page's recommendations together to form a best practices example?