** Summary **

'''[[binary scan]''' parses fields from a binary string, returning the number
of conversions performed.


** See Also **

   [binary]:   
   [binary format]:   
   [format]:   
   [scan]:   
   [IEEE binary float to string conversion]:   
   [NaN]:   
   [Tcl syntax]:   


** Synopsis **

    :   '''binary scan''' ''string formatString'' ?''varName varName ...''?


** Description **

''String'' gives the input to be parsed and ''formatString'' indicates how to
parse it.  Each ''varName'' gives the name of a variable; when a field is
scanned from ''string'' the result is assigned to the corresponding variable.


As with `[[[binary format]]`, ''formatString'' consists of a sequence of zero
or more field specifiers separated by zero or more spaces. Each field specifier
is a single type character followed by an optional numeric ''count''. Most
field specifiers consume one argument to obtain the variable into which the
scanned values should be placed. The type character specifies how the binary
data is to be interpreted.  The ''count'' typically indicates how many items of
the specified type are taken from the data. 

If present, the count is a non-negative decimal integer or '''`*`''', which
normally indicates that all of the remaining items in the data are to be used.
If there are not enough bytes left after the current cursor position to satisfy
the current field specifier, then the corresponding variable is left untouched
and `[[binary scan]` returns immediately with the number of variables that were
set.  If there are not enough variable names for all of the fields in the format
string that consume one, then an error is generated.
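A minimal sketch of both rules (a `*` count, and the early return when the data runs out), using made-up input bytes:

======
# Three made-up bytes of input: 0x01 0x02 0x03
set bytes "\x01\x02\x03"

# c* consumes every remaining byte, so all three values land in one variable
set n [binary scan $bytes c* all]
puts "$n {$all}"                 ;# prints: 1 {1 2 3}

# I needs four bytes but only three are available, so nothing is assigned
set m [binary scan $bytes I word]
puts "$m [info exists word]"     ;# prints: 0 0
======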

An example similar to the one for `[[[binary format]]` illustrates the relation
between field specifiers and arguments for `[[binary scan]`:

======
binary scan $bytes s3s first second
======

This command, provided the binary string in the variable `bytes` is long enough,
assigns a list of three integers to `$first` and a single value to `$second`.
If `$bytes` contains fewer than 8 bytes (i.e. four 2-byte integers), no
assignment to `$second` will be made, and if `$bytes` contains fewer than 6
bytes (i.e. three 2-byte integers), no assignment to `$first` will be made.
Hence:

======
puts [binary scan abcdefg s3s first second]
puts $first
puts $second
======

will, assuming neither variable is set previously, print:

======
1
25185 25699 26213
can't read "second": no such variable
======

It is '''important''' to note that the types `c`, `s`, and `S` (and
'''i''' and '''I''' on 64-bit systems) will be scanned into long data size
values.  In doing this, values that have their high bit set (0x80 for chars,
0x8000 for shorts, 0x80000000 for ints) will be sign extended.  Thus the
following will occur:

======
% set signShort [binary format s1 0x8000]
% binary scan $signShort s1 val
1
% set val
-32768
% format %x $val
ffff8000
======

If you want to produce an unsigned value, then you can mask the return value to
the desired size.  For example, to produce an unsigned short value:

======
% set val [expr {$val & 0xFFFF}]
32768
% format %x $val
8000
======

Since Tcl 8.5, one may use the '''u''' modifier to get unsigned
interpretations:

======
% set signShort [binary format s1 0x8000]
% binary scan $signShort su1 val
1
% set val
32768
% format %x $val
8000
======
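Assuming Tcl 8.5 or later, the same '''u''' modifier also works with the other integer types.  This sketch scans four made-up 0xFF bytes several ways:

======
# Four 0xFF bytes, so every field below sees its high bit set
set bytes "\xff\xff\xff\xff"

binary scan $bytes c  signedByte     ;# -1, sign-extended
binary scan $bytes cu unsignedByte   ;# 255
binary scan $bytes I  signedInt      ;# -1, sign-extended
binary scan $bytes Iu unsignedInt    ;# 4294967295
puts "$signedByte $unsignedByte $signedInt $unsignedInt"
======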

Each type-count pair moves an imaginary cursor through the binary data, reading
bytes from the current position.  The cursor is initially at position 0 at the
beginning of the data.
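A short sketch of the cursor in action, using the `x` (skip forward), `X` (back up), and `@` (absolute position) specifiers on four made-up bytes:

======
# Four made-up bytes: 0x01 0x02 0x03 0x04
set data "\x01\x02\x03\x04"

# c reads byte 0; x skips byte 1; c reads byte 2;
# X backs up one byte so the next c re-reads byte 2; @0 jumps back to byte 0
set count [binary scan $data cxcXc@0c first third thirdAgain firstAgain]
puts "$count: $first $third $thirdAgain $firstAgain"   ;# prints: 4: 1 3 3 1
======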

----

DMG 2003-12-02: It is also important to note that the scanning of float types
(`f` and `d`) is limited to the native byte order ("endianness") of the machine
doing the scan.  [IEEE binary float to string conversion] provides one way of
converting them.  Another way is to `[[binary scan]` the characters,
`[[[binary format]]` them in the proper order, and `[[binary scan]` the
now-native order.
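A sketch of that second approach for a big-endian 32-bit float, assuming the host order is taken from `$tcl_platform(byteOrder)`; the proc name `scanBigEndianFloat` is only illustrative:

======
# Hypothetical helper: scan a big-endian IEEE single on any host, without 8.5's R code
proc scanBigEndianFloat {bytes} {
    if {$::tcl_platform(byteOrder) eq "littleEndian"} {
        # Read the four bytes as characters, then re-format them in reverse order
        binary scan $bytes cccc b0 b1 b2 b3
        set bytes [binary format cccc $b3 $b2 $b1 $b0]
    }
    # The bytes are now in native order, so the native f format applies
    binary scan $bytes f value
    return $value
}

# 0x3FC00000 is 1.5 as a big-endian IEEE single
puts [scanBigEndianFloat "\x3f\xc0\x00\x00"]   ;# prints: 1.5
======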

[DKF]: Tcl 8.5 includes additional format types (due to [TIP] #129) that allow
float types of specified endian-ness to be handled.
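Assuming Tcl 8.5 or later, a sketch using those codes: `r`/`R` are little-/big-endian single-precision floats, and `q`/`Q` are the double-precision equivalents:

======
# 1.5 as a 32-bit IEEE float, written once in each byte order
set little [binary format r 1.5]   ;# r: little-endian single
set big    [binary format R 1.5]   ;# R: big-endian single

# Each scan names the byte order explicitly, so the result does not depend on the host
binary scan $little r fromLittle
binary scan $big    R fromBig
puts "$fromLittle $fromBig"        ;# prints: 1.5 1.5

# Q does the same for a big-endian 64-bit double
binary scan [binary format Q 0.25] Q quarter
======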


** Example: Padding Bytes **

[Arjen Markus]: Suppose you have a sequence of bytes that are read from a file.
Two of these, say at position 6 and 7 (counting from 0), make up an integer
number.  If the original data are in big-endian order, then

======
set two_bytes [string range $str 6 7]
binary scan "\0\0$two_bytes" I intvalue
======

will turn these two bytes into an integer (the `I` format reads 4 bytes, hence
the two leading nulls).

If they are in little-endian order, use:

======
set two_bytes [string range $str 6 7]
binary scan "${two_bytes}\0\0" i intvalue
======
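With Tcl 8.5 or later the padding is not needed: `@` moves the cursor to the field and the '''u''' modifier keeps the value unsigned.  A sketch with a made-up 8-byte record:

======
# Made-up 8-byte record; bytes 6 and 7 hold 0x01 0x02
set str "abcdef\x01\x02"

# Padding approach from above: two leading nulls make a 4-byte big-endian I field
binary scan "\0\0[string range $str 6 7]" I padded

# Tcl 8.5+: jump to offset 6 with @ and read one unsigned short
binary scan $str @6Su direct       ;# big-endian: 0x0102 = 258
binary scan $str @6su littleEnd    ;# little-endian: 0x0201 = 513
puts "$padded $direct $littleEnd"  ;# prints: 258 258 513
======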



** Misc **


DMG, 2003-12-02: Question: Does anyone know of a way/hack to scan in
null-terminated strings?  I was somewhat surprised to see they were not part of
the formatString set, as they naturally fall into how Tcl works (well, how it
used to).  For example, I'm trying to read a file that has a 30-byte space
allocated to hold 2 null-terminated strings.

2004-10-19: Try:

======
set null_term_string [lindex [split $string \000 ] 0]
======
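A sketch for the 30-byte case described above, assuming the field has already been read into a variable (here it is built with `binary format` purely for illustration).  `a30` returns the field unmodified, nulls included, so splitting on `\0` separates the two strings:

======
# Made-up 30-byte field: two null-terminated strings, padded with nulls
set field [binary format a30 "first\0second"]

binary scan $field a30 raw
set parts [split $raw \0]
set s1 [lindex $parts 0]
set s2 [lindex $parts 1]
puts "$s1 / $s2"   ;# prints: first / second
======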

----

[sbron] 2005-09-27: I more frequently need unsigned results from `[[binary
scan]` than the default signed values. I created my own proc that adds a few
new field specifiers that return unsigned values: '''C''' - unsigned byte,
'''u''' - unsigned little-endian short, '''U''' - unsigned big-endian short,
'''l''' - 32-bit unsigned little endian integer, and '''L''' - 32-bit unsigned
big-endian integer.

======
proc binscan {str fmtstr args} {
    # Create a format string using the built-in signed versions
    set format [string map {C c u s U S l i L I} $fmtstr]
    # Split the format string into the separate terms (x, X and @ take no variable)
    set i 0; set vars ""; set fmtlist ""
    foreach n [regexp -all -inline {[a-wA-W][0-9* ]*} $fmtstr] {
        lappend fmtlist $n
        lappend vars term([incr i])
    }

    # Execute the signed binary scan
    set count [eval [linsert $vars 0 binary scan $str $format]]
    # Tcl 8.5+: set count [binary scan $str $format {*}$vars]

    # Define the mask values to apply to the special format specifiers
    array set mask {C 0xff u 0xffff U 0xffff l 0xffffffff L 0xffffffff}
    # Apply the mask and assign the results to the specified variables
    set i 0
    foreach n $fmtlist v $args {
        # Stop at the first term that binary scan ran out of data for
        if {![info exists term([incr i])]} break
        set type [string index $n 0]
        # Link to the variable in the calling stack frame
        upvar 1 $v var
        if {[info exists mask($type)]} {
            set list ""
            foreach t $term($i) {
                lappend list [expr {$t & $mask($type)}]
            }
            set var $list
        } else {
            set var $term($i)
        }
    }
    # Like binary scan, return the number of variables that were set
    return $count
}
======

[Lars H] 2007-02-20: [TIP] [http://tip.tcl.tk/275%|%#275] adds the '''u'''
(unsigned) modifier to the standard binary scan command in Tcl 8.5, which
serves the same purpose.

[kostix] 2007-06-19: Another wrapper for `[[binary scan]` that mimics the [TIP]
#275 handling of unsigned integers (note that the other enhancements
introduced in 8.5 aren't emulated):

======
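proc bscan {data f args} {
    # Masks for the integer types that accept the "u" modifier
    set c 0xFF
    set s 0xFFFF
    set S 0xFFFF
    set i 0xFFFFFFFF
    set I 0xFFFFFFFF

    set outf ""
    set upos {}
    set pos -1
    set last ?

    foreach a [split $f ""] {
        if {[string equal $a u] && [string first $last csSiI] >= 0} {
            # Remember which variable needs masking, and with which mask
            lappend upos $pos [set $last]
        } else {
            append outf $a

            if {[string first $a aAbBhHcsSiIwWfdxX@] >= 0} {
                set last $a
                # x, X and @ move the cursor but do not consume a variable
                if {[string first $a xX@] < 0} {
                    incr pos
                }
            }
        }
    }

    set count [uplevel 1 [list binary scan $data $outf] $args]

    foreach {pos mask} $upos {
        # Skip variables that binary scan did not get far enough to set
        if {$pos >= $count} continue
        upvar 1 [lindex $args $pos] v
        # Mask every element, so counted fields such as su3 work too
        set masked {}
        foreach e $v {
            lappend masked [expr {$e & $mask}]
        }
        set v $masked
    }

    set count
}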
======

It runs about 10 times slower than `[[binary scan]` itself.

It may be used to provide 8.5 compatibility like this:

======
if {[package vsatisfies $tcl_version 8.5]} {
    interp alias {} bscan {} binary scan
} else {
    # define the above proc here
}
======

Note also that this proc isn't 100% compatible with the real `[[binary scan]`,
since it doesn't do any formal syntax checking of the format string.

----

[DAG] 2006-01-30: Don't you think there is something wrong with the bit
handling?  If I try to scan binary content into several bit-string fields, I
get only the first part of the bytes, but not the last.

Take the example from the manual page:

======
% binary scan \x07\x87\x05 b5b* var1 var2
2
% puts $var1
11100
% puts $var2
1110000110100000
======

[LV]: Here's an attempt to explain this visually.  Written most significant bit
first, the three input bytes are:

======none
00000111 10000111 00000101
======

Given that, let's see how `b5b*` ends up with the two values:

======none
00000111 10000111 00000101
   00000 01111111 00000000
   12345 90123456 12345678
 becomes
    var1              var2
   11100 11100001 10100000
======

However, check this out:

======none
% binary scan \x07\x87\x05 b* var3
1
% set var3
111000001110000110100000

1110 0000 1110 0001 1010 0000
7    0    7    8    5    0
======

So what appears to be happening is that Tcl doesn't display leading 0's in the
binary string.
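For comparison, a minimal sketch scanning one byte with both `B8` and `b8`: the bits are the same, only their order differs, so the zeros of 0x07 come last with `b`:

======
# 0x07 is 00000111 when written most significant bit first
binary scan \x07 B8 msbFirst
binary scan \x07 b8 lsbFirst
puts $msbFirst   ;# prints: 00000111
puts $lsbFirst   ;# prints: 11100000
======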

[[later...]] Well, I '''thought''' that was the case... however, check this
out:

======none
% binary scan \xff\xff\xff b5b* var7 var8
2
% puts $var7
11111
% puts $var8
1111111111111111
% string length $var8
16
======

Shouldn't there be 24 (8*3) bits displayed in this case?  No leading zeros are
present, so why am I not seeing them in the scan?

[[Still later...]] Okay, so Miguel just updated tcl-bugs 1663473 regarding the
''missing'' bits.  His comment is that, according to the docs, `b5` indicates
that 5 bits are to be read and the remaining 3 bits of that byte are discarded;
the docs say '''any extra bits in the last byte are ignored'''.  Note this extra
example that Miguel provides!

======
% binary scan \xff\xff\xff b8b* var3 var4
2
% string length $var3
8
% string length $var4
16
======

[LV]: Note that all 24 bits are present here... now watch this:

======
% binary scan \xff\xff\xff b9b* var3 var4
2
% string length $var3
9
% string length $var4
8
======

[LV]: See that?  By saying `b9`, one consumes one bit of the second byte; the
remaining bits of that byte are ignored, and `b*` then consumes the third byte.

It doesn't appear that binary scan provides the ability to scan parts of a byte
into two or more variables.

----

(''[DAG], continued from before LV's insertion'') will return `2` with `11100`
stored in `$var1` and `1110000110100000` stored in `$var2` and

======
binary scan \x70\x87\x05 B5B* var1 var2
======

will return `2` with `01110` stored in `$var1` and `1000011100000101` stored in
`$var2`.

Now, in both cases, I get 5 bits in `$var1` and 16 in `$var2`.  Since the
input is 24 bits long, I am missing 3, and there is no way to get them:

======
binary scan \x70\x87\x05 B5B3B* var1 var2 var3
======

will return `3` with `01110` stored in `$var1`, `100` stored in `$var2` and
`00000101` stored in `$var3`.  This means that 5 bits were taken from byte 1
and the rest skipped, 3 from byte 2 and the rest skipped, and all bits were
taken from byte 3.

There is no way, therefore, to have 2 results from

======
binary scan \x87 B4B4 nibble1 nibble2
======

which would require the first half of the byte for one variable and the second
half for the other, keeping the two parts separate.
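One workaround (a sketch, not a built-in feature) is to scan the whole byte as a bit string and split it with `string range`:

======
# Scan the whole byte, most significant bit first: 0x87 -> "10000111"
binary scan \x87 B8 bits
# Split the bit string into the two nibbles by hand
set nibble1 [string range $bits 0 3]
set nibble2 [string range $bits 4 7]
puts "$nibble1 $nibble2"   ;# prints: 1000 0111
======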

I'd like to have this feature; a lot of algorithms, such as data compression or
encryption, would require it.  I often use Tcl to explore binary data structures
from C/C++ libraries, like Palm databases and the like.

----

[LV]: Note that I mentioned this desire in the bug report, but the reply by
[DKF] was: "The problem is that reading at the bit level across multiple
variables would make the code to do bulk uses of `[[binary scan]` much harder
since you'd have to allow for someone reading one bit and then 100kB of 32-bit
words, all offset by one bit, even though this is a vanishingly rare case."

Feel free to visit
[https://sourceforge.net/tracker/?func=detail&atid=110894&aid=1663473&group_id=10894]
to add your own responses.


----

'''[[continuing discussion]]'''

[Lars H]: Agreed.  I too got bitten by this a couple of months ago; it took me
hours to figure out.  My current impression is that `[[binary scan]` is
generally too simple to be directly useful -- generally one also needs to
post-process the data returned.  (First read the entire byte, then split it up.)
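A sketch of that post-processing done numerically instead of on a bit string, assuming Tcl 8.5 or later for the '''u''' modifier:

======
# Read one byte as an unsigned integer (0x87 = 135)
binary scan \x87 cu byte
# High nibble by shifting, low nibble by masking
set high [expr {$byte >> 4}]
set low  [expr {$byte & 0x0F}]
puts "$high $low"   ;# prints: 8 7
======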

Of course, there is this corny method of backing up and then reading the byte
again, from the other end:

======
binary scan \x38 B4Xb4 nibble1 nibble2
======

This will always return one of them backwards, however.
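Assuming Tcl 8.5 or later, `string reverse` can turn the backwards nibble around again; a sketch:

======
binary scan \x38 B4Xb4 nibble1 nibble2
# b4 read the low nibble least significant bit first ("0001"); reverse it
set nibble2 [string reverse $nibble2]
puts "$nibble1 $nibble2"   ;# prints: 0011 1000
======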

----

[LV]: I strongly urge anyone finding such bugs, misfeatures, or surprising
behaviors in Tcl, Tk, or other extensions to report them as bugs, either in the
behavior or in the documentation.  Things can only get better if people report
them.

I've just reported the apparent discrepancy between the man page b5b* example
and the behavior as of Tcl 8.5a6, though right now, the response isn't all that
encouraging...

https://sourceforge.net/tracker/?func=detail&atid=110894&aid=1663473&group_id=10894

----

[NickH]: We frequently need to make several calls to binary scan to fully parse
input, usually because of discriminators or counters.  Whilst it is obviously
possible in principle to keep track of where we are in the scan, in practice it
is difficult and/or ugly, and the command already knows.

I would like to see the formats for both binary scan and binary format extended
to allow the extraction of the current byte position.  If this were done by
adding meaning to `@` with no count (currently an error), I can't see how anyone
would object (an alternative would be a new format character, possibly `=` or
`?`).

Typical use would then be something like:

======
set offset 0

binary scan @${offset}<lots of stuff>@ <lots of vars> offset
if {????} {
    binary scan @${offset}<lots of stuff>@ <lots of vars> offset
} else {
    binary scan @${offset}<lots of other stuff>@ <lots of vars> offset
}

puts "bytes read = $offset"
======
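Until something like that exists, the offset has to be tracked by hand.  A sketch of that under a made-up record layout (1-byte type, 2-byte big-endian length, then the payload), assuming Tcl 8.5 or later; `parseRecord` is a hypothetical helper, not part of Tcl, and it does no error checking:

======
# Hypothetical helper: parse one record starting at $offset
proc parseRecord {data offset} {
    binary scan $data @${offset}cuSu type len
    incr offset 3
    binary scan $data @${offset}a$len payload
    incr offset $len
    # Hand back the fields plus the offset where the next record starts
    return [list $type $payload $offset]
}

# Two made-up records: type 1 with "abc", type 2 with "hi"
set data [binary format cSa*cSa* 1 3 abc 2 2 hi]
set offset 0
while {$offset < [string length $data]} {
    lassign [parseRecord $data $offset] type payload offset
    puts "type=$type payload=$payload"
}
puts "bytes read = $offset"   ;# prints: bytes read = 11
======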


For `[[[binary format]]` this use would run contrary to the other specifiers,
but it could sometimes be useful for back-patching lengths instead of using
`[[[string length]]` and `[[[expr]]`.

----

[AMG]: It bothers me that `[[[binary scan]]` insists on writing to variables
rather than simply returning a list.  This prevents it from being directly used
in a functional context.  Well, I just discovered a handy workaround.  The key
insight is that [[brackets] perform ''script'' substitution, not merely
''command'' substitution.  This means it's legal to put more than one command
in a pair of brackets; since it's a script, the commands are delimited by
semicolons or newlines.  The substituted value is the result of the ''last''
command.  Here, see for yourself:

======
set bytes Hello!
puts [binary scan $bytes B* tempvar; set tempvar]
# 010010000110010101101100011011000110111100100001
======

More than one conversion?  Instead of [[[set]], use [[[list]] if a list is
desired.  Or use [[[format]] to concatenate the values and format them further.
Single-argument [[[lindex]] may also be useful; it simply returns its argument.

======
set bytes Tcl
puts [binary scan $bytes B*X*H* a b; list $a $b]
# 010101000110001101101100 54636c
puts [binary scan $bytes B*X*H* a b; format "%s = %8s" $a $b]
# 010101000110001101101100 =   54636c
puts [binary scan $bytes B*X*H* a b; lindex "$a = $b"]
# 010101000110001101101100 = 54636c
puts [binary scan $bytes B*X*H* a b; lindex [string map {0 o 1 i} $a]$b]
# oioioioooiioooiioiioiioo54636c
======

Obviously, [[[puts]] can be replaced by other commands.  I'm only using it for
demonstration purposes.
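Another workaround, sketched here assuming Tcl 8.5 or later for `{*}`: a small hypothetical helper, `scanList`, that scans into private variables and returns the values that were set as a list:

======
# Hypothetical helper: return the scanned values instead of setting caller variables
proc scanList {data format varCount} {
    # Build the names v0, v1, ... of local variables for binary scan to fill
    set names {}
    for {set i 0} {$i < $varCount} {incr i} {
        lappend names v$i
    }
    # binary scan returns how many of those variables it actually set
    set count [binary scan $data $format {*}$names]
    set result {}
    foreach name [lrange $names 0 [expr {$count - 1}]] {
        lappend result [set $name]
    }
    return $result
}

puts [scanList Tcl B*X*H* 2]   ;# prints: 010101000110001101101100 54636c
======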

----

[Chad]: Why is it that after

======
binary scan [binary format f 393.84] f n
======

the value of n is 393.839996338?

[AM] 2011-01-14: This is a classic problem: the decimal number 393.84 cannot
be represented exactly in the binary floating-point formats that most
contemporary computers use.  The value you see is the closest value a 32-bit
`f` field can hold.  Compare this to the difficulty of representing 1/3 in
decimal numbers.
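A sketch comparing the 32-bit `f` format with the 64-bit `d` format: a Tcl double round-trips through `d` unchanged, while `f` can only hold the nearest single-precision value:

======
# Round-trip 393.84 through a 32-bit float (f) and a 64-bit double (d)
binary scan [binary format f 393.84] f single
binary scan [binary format d 393.84] d double

puts [expr {$double == 393.84}]   ;# prints: 1
puts [expr {$single == 393.84}]   ;# prints: 0
puts [format %.2f $single]        ;# prints: 393.84
======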

[LV]: Are there any languages which provide an add-on module or mode of
operation where '''real''' variables are represented in a programmatic form
designed not to lose precision?  I was thinking that perhaps [APL] or [LISP]
was one such language.

Perhaps someone has - or could - create a package for Tcl that would permit one
to do math in that fashion for people who don't care about the lost computer
cycles but do care about the precision.

[CliC]: [http://racket-lang.org%|%Racket%|%], a Scheme dialect, has "rational"
types, so that e.g., 1/3 may be used in calculations with no loss of precision.
I'm not a heavy user of it, though. I just downloaded it to check out the
authors' "modern answer to [SICP]" programming tutorial.


<<categories>> Command | Binary Data | Parsing | String Processing