'''binary scan''' ''string formatString ?varName varName ...?''

The '''binary scan''' command parses fields from a binary string, returning the number of conversions performed. ''String'' gives the input to be parsed and ''formatString'' indicates how to parse it. Each ''varName'' gives the name of a variable; when a field is scanned from ''string'' the result is assigned to the corresponding variable.

As with '''[binary format]''', the ''formatString'' consists of a sequence of zero or more field specifiers separated by zero or more spaces. Each field specifier is a single type character followed by an optional numeric ''count''. Most field specifiers consume one argument to obtain the variable into which the scanned values should be placed. The type character specifies how the binary data is to be interpreted. The ''count'' typically indicates how many items of the specified type are taken from the data. If present, the count is a non-negative decimal integer or '''*''', which normally indicates that all of the remaining items in the data are to be used.

If there are not enough bytes left after the current cursor position to satisfy the current field specifier, then the corresponding variable is left untouched and '''binary scan''' returns immediately with the number of variables that were set. If there are not enough arguments for all of the fields in the format string that consume arguments, then an error is generated.

An example similar to the one given for '''[binary format]''' shows how field specifiers relate to arguments for '''binary scan''':

    binary scan $bytes s3s first second

This command (provided the binary string in the variable ''bytes'' is long enough) assigns a list of three integers to the variable ''first'' and assigns a single value to the variable ''second''. If ''bytes'' contains fewer than 8 bytes (i.e. four 2-byte integers), no assignment to ''second'' will be made, and if ''bytes'' contains fewer than 6 bytes (i.e. three 2-byte integers), no assignment to ''first'' will be made. Hence:

    puts [binary scan abcdefg s3s first second]
    puts $first
    puts $second

will print (assuming neither variable is set previously):

    1
    25185 25699 26213
    can't read "second": no such variable

It is '''important''' to note that the '''c''', '''s''', and '''S''' types (and '''i''' and '''I''' on 64-bit systems) are scanned into long-sized values. In doing this, values that have their high bit set (0x80 for chars, 0x8000 for shorts, 0x80000000 for ints) will be sign extended. Thus the following will occur:

    set signShort [binary format s1 0x8000]
    binary scan $signShort s1 val; # val == 0xFFFF8000

If you want to produce an unsigned value, then you can mask the return value to the desired size. For example, to produce an unsigned short value:

    set val [expr {$val & 0xFFFF}]; # val == 0x8000

Each type-count pair moves an imaginary cursor through the binary data, reading bytes from the current position. The cursor is initially at position 0 at the beginning of the data.

DMG, 2-Dec-03: It is also important to note that float types are always scanned in the native byte order ("endianness") of the machine doing the scanning. [IEEE binary float to string conversion] provides one way of converting floats stored in the other byte order. Another way is to binary scan the individual bytes, binary format them back in the proper (native) order, and then binary scan the result as a float.

----

DMG, 2-Dec-03: Question: Does anyone know of a way/hack to scan in null-terminated strings? I was somewhat surprised to see they were not part of the formatString set, as they naturally fall into how Tcl works (well, how it used to). For example, I'm trying to read a file that has a 30-byte space allocated to hold 2 null-terminated strings.

19-Oct-2004: Try

    set null_term_string [lindex [split $string \000 ] 0]

----

[sbron], 27-Sep-2005: I more frequently need unsigned results from [[binary scan]] than the default signed values.
I created my own proc that adds a few new field specifiers that return unsigned values: '''C''' - unsigned byte, '''u''' - unsigned little-endian short, '''U''' - unsigned big-endian short, '''l''' - 32-bit unsigned little-endian integer, and '''L''' - 32-bit unsigned big-endian integer.

    proc binscan {str fmtstr args} {
        # Create a format string using the built-in signed versions
        set format [string map {C c u s U S l i L I} $fmtstr]
        # Split the format string into the separate terms
        # (x, X and @ are skipped: they do not consume a variable)
        set i 0; set vars ""; set fmtlist ""
        foreach n [regexp -all -inline {[a-wA-W][0-9* ]*} $fmtstr] {
            lappend fmtlist $n
            lappend vars term([incr i])
        }
        # Execute the signed binary scan
        eval [linsert $vars 0 binary scan $str $format]
        # In Tcl 8.5 and later: binary scan $str $format {*}$vars
        # Define the mask values to apply to the special format specifiers
        array set mask {C 0xff u 0xffff U 0xffff l 0xffffffff L 0xffffffff}
        # Apply the mask and assign the results to the specified variables
        set i 0
        foreach n $fmtlist v $args {
            set type [string index $n 0]
            # Link to the variable in the calling stack frame
            upvar 1 $v var
            if {[info exists mask($type)]} {
                set list ""
                foreach t $term([incr i]) {
                    lappend list [expr {$t & $mask($type)}]
                }
                set var $list
            } else {
                set var $term([incr i])
            }
        }
    }

----

[DAG] - 30-Jan-2006 - Don't you think there is something wrong with the bit handling? If I try to scan several bit fields out of binary content, I can only get the first part of each byte, not the rest. Take the manual page example:

    binary scan \x07\x87\x05 b5b* var1 var2

will return 2 with 11100 stored in var1 and 1110000110100000 stored in var2, and

    binary scan \x70\x87\x05 B5B* var1 var2

will return 2 with 01110 stored in var1 and 1000011100000101 stored in var2. Now, in both cases, I get 5 bits in the first variable and 16 in the second.
Since the input is 24 bits long, I am missing 3, and there is no way to get them:

    binary scan \x70\x87\x05 B5B3B* var1 var2 var3

will return 3 with 01110 stored in var1, 100 stored in var2 and 00000101 stored in var3. This means that 5 bits were taken from byte 1 and the rest skipped, 3 bits were taken from byte 2 and the rest skipped, and all bits were taken from byte 3. There is therefore no way to get 2 results from

    binary scan \x87 B4B4 nibble1 nibble2

which would require the first half of the byte to go to one variable and the second half to the other, keeping the two parts separate. I'd like to have this feature; many algorithms require it, such as data compression or encryption. I often use Tcl to explore binary data structures from C/C++ libraries, like Palm databases and the like.

[Lars H]: Agreed. I too got bitten by this a couple of months ago; it took me hours to figure out. My current impression is that [binary scan] is generally too simple to be directly useful -- generally one also needs to post-process the data returned. (First read the entire byte, then split it up.) Of course, there is this corny method of backing up and then reading the byte again, from the other end:

    binary scan \x38 B4Xb4 nibble1 nibble2

This will always return one of them backwards, however.

----

See also:

   * [binary]
   * [binary format]
   * [format]
   * [scan]
   * [IEEE binary float to string conversion]

----

[Tcl syntax help] - [Category Command] - [Category String Processing] - [Category Binary Data]
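----

The "first read the entire byte, then split it up" approach mentioned by Lars H might look something like the following. This is only a minimal sketch: the proc name '''nibbles''' is made up for illustration and is not part of Tcl, and it assumes the input holds at least one byte.

```tcl
# Hypothetical helper (not a Tcl built-in): return the high and low
# nibbles of the first byte of $byte as two small integers.
proc nibbles {byte} {
    # Scan the whole byte as a signed 8-bit integer.
    if {[binary scan $byte c value] != 1} {
        error "need at least one byte"
    }
    # Mask to undo the sign extension, then separate the two halves.
    set value [expr {$value & 0xFF}]
    return [list [expr {$value >> 4}] [expr {$value & 0x0F}]]
}

# Bit-string variant: scan all eight bits at once, then cut the string in two.
binary scan \x87 B8 bits
set nibble1 [string range $bits 0 3]
set nibble2 [string range $bits 4 7]

puts "$nibble1 $nibble2"   ;# prints "1000 0111"
puts [nibbles \x87]        ;# prints "8 7"
```

Both variants read the byte only once, so neither half comes back reversed as it does with the B4Xb4 trick.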