## Version 47 of Binary representation of numbers

Updated 2011-01-15 00:55:43 by CliC

Problem: how to convert numbers between their decimal and binary representation.

I.e., we are looking for a pair of procs {int2bits, bits2int} such that

• int2bits 6 -> {1 1 0}
• bits2int {1 1 0} -> 6

``` package require Tcl 8.5
proc bits2int {bits} {
return [expr 0b[join \$bits {}]]
}```

TS Caveat, negative integers are not always carefully handled (while {\$i>0}...construct). As Lars H pointed out.

Richard Suchenwirth suggested in c.l.t.

``` proc int2bits {i} {
#returns a bitslist, e.g. int2bits 10 => {1 0 1 0}
set res ""
while {\$i>0} {
set res [expr {\$i%2}]\$res
set i [expr {\$i/2}]
}
if {\$res==""} {set res 0}
split \$res ""
}
proc bits2int {bits} {
#returns integer equivalent of a bitlist
set res 0
foreach i \$bits {
set res [expr {\$res*2+\$i}]
}
set res
}```

Joseph Collard added optional field width:

``` proc int2bits {i {digits {} } } {
#returns a bitslist, e.g. int2bits 10 => {1 0 1 0}
# digits determines the length of the returned list (left truncated or added left 0 )
# use of digits allows concatenation of bits sub-fields

set res ""
while {\$i>0} {
set res [expr {\$i%2}]\$res
set i [expr {\$i/2}]
}
if {\$res==""} {set res 0}

if {\$digits != {} } {
append d [string repeat 0 \$digits ] \$res
set res [string range \$d [string length \$res ] end ]
}
split \$res ""
} ```

MS: Another approach is using the binary command:

``` proc int2bits {i} {
#returns a bitslist, e.g. int2bits 10 => {1 0 1 0}
binary scan [binary format I1 \$i] B* x
split [string trimleft \$x 0] {}
}
proc bits2int {bits} {
#returns integer equivalent of a bitlist
set bits [format %032s [join \$bits {}]]
binary scan [binary format B* \$bits] I1 x
set x
}```

Cameron Laird rightly noted the platform dependence of the "%032s" part. To correct that, do instead

```     # Miguel wrote this.
proc int2bits {i} {
#returns a bitslist, e.g. int2bits 10 => {1 0 1 0}
binary scan [binary format I1 \$i] B* x
split [string trimleft \$x 0] {}
}

set tmp [llength [int2bits -1]]
# What is this!?  Despite newcomers' instinct that there is some
# special @-related syntax for regular expressions, the subtlety
# of this next computation is pointed elsewhere.  Think of it this
# way:  bits2int is a function of *two* variables, operating in quite
# different spaces.  Along with the "bits" argument, bits2int depends
# on the arithmetic domain of operation, or the "width" (32-bit, 64-bit,
# ...) of native computations.
# It would be easy enough to write bits2int as a function whose
# implementation calculates arithmetic-width each time.  That's
# wasteful of cycles, in general, and arguably bad style.  What Miguel
# wanted when he wrote this definition was a bits2int with a particular
# proc body which is computed once on first scanning the definition.
#
# So the proc body itself is a variable rather then the stereotypical
#     "proc ... {
#             body
#      }"
# There are several ways to compute such a string.  He chose to write
# a template (the {# returns ...; set bits ...} part), and use regsub to
# substitute in the tiny sequence which is variable at this point.  Other
# approaches rely on [list], ""-grouping, [format], [subst], and so on.
#
# To repeat, though--there's nothing magic about '@' and regsub.
# It's just a mechanism for controlling the time of substitution or
# or evaluation--and any number of strings other than {@} would have
# worked as far as regsub's syntax is concerned.
#
# What I *really* should do is augment the [proc] Wiki page to explain
# mechanisms and situations for variable proc bodies.
regsub @ {
#returns integer equivalent of a bitlist
set bits [format %[email protected] [join \$bits {}]]
binary scan [binary format B* \$bits] I1 x
set x
} \$tmp tmp
proc bits2int {bits} \$tmp

unset tmp```

Note: I'm leaving this for now, as an illustration. But it is *not* correct, the 32-bit dependence is still there, implicit in the "I" format specification. A correct approach needs to wait until there are 64 bit format specs for binary

'Nother variation: use format to convert to octal (or hexadecimal, for that matter), and "string map ..." to transform octal digits to bit patterns.

``` proc int2bits x {
string map {
0 {0 0 0} 1 {0 0 1} 2 {0 1 0} 3 {0 1 1} 4 {1 0 0} 5 {1 0 1} 6 {1 1 0} 7 {1 1 1}
} [split [format %o \$x] ""]
} ;#RS - note however that the bit sequence is multiples of 3 long:
% int2bits 9
0 0 1 0 0 1
% int2bits 255
0 1 1 1 1 1 1 1 1```

Lars H: A simple variation on that, without leading zeros and about 30% faster, since it avoids the split:

``` proc int2bits x {
string map {
+0 0 +1 1 +2 {1 0} +3 {1 1} +4 {1 0 0} +5 {1 0 1} +6 {1 1 0} +7 {1 1 1}
0 { 0 0 0} 1 { 0 0 1} 2 { 0 1 0} 3 { 0 1 1} 4 { 1 0 0} 5 { 1 0 1} 6 { 1 1 0} 7 { 1 1 1}
} [format +%o \$x]
}
```

Helmut Giese in c.l.t:

``` proc val2Bin val {
set binRep [binary format c \$val]
binary scan \$binRep B* binStr
return \$binStr
}

# some tests
puts [val2Bin 10]
puts [val2Bin 129]
puts [val2Bin 0x7F]```

Arjen Markus The following fragment converts a hexadecimal representation to a floating-point number:

```   # Convert between float and hex
#
set hex "f3080000"
set bin [binary format h8 \$hex]
binary scan \$bin f float
puts "\$float"```

On a big-endian machine the result is "1.0" (on a little-endian machine it is bizarre: 4.6006..e-041, but that is simply because the bytes are reversed.)

For extracting "unsigned" floating-point values from very long (>32 bits) integers in hex, MS has this recommendation on c.l.t.:

``` proc conv {largeHex} {
set res 0.0
foreach hexDigit [split \$largeHex {}] {
set new 0x\$hexDigit
set res [expr {16.0*\$res + \$new}]
}
return \$res
}```

Arjen Markus Suppose you have a sequence of bytes that are read from a file. Two of these, say at position 6 and 7 (counting from 0), make up an integer number. Then:

```   set two_bytes [string range \$str 6 7]
binary scan "\0\0\$two_bytes" I intvalue```

will turn these two bytes into an integer (consisting of 4 bytes, hence the two leading nulls), if the original data are in big-endian order.

If they are in little-endian order, use:

```   set two_bytes [string range \$str 6 7]
binary scan "\${two_bytes}\0\0" i intvalue```

(Note: trailing nulls and a different format)

Eric Amundsen - Here's a solution that seems to work fine for 64-bit 2's complement integer math.

```    proc bin2int {binString} {
set result 0
for {set j 0} {\$j < [string length \$binString]} {incr j} {
set bit [string range \$binString \$j \$j]
set result [expr \$result << 1]
set result [expr \$result | \$bit]
}
return \$result
}

set binaryString 11001

puts [bin2int \$binaryString]```

KBK contributed this version in the Tcl chatroom on 2002-11-18:

``` proc fromBinary { digitString } {
set r 0
foreach d [split \$digitString {}] {
incr r \$r
incr r \$d
}
return \$r
}```

TV (24-4-'03) The above is along the lines which one could use in a reusable low level implementation. The complement in similar manner could be:

``` set a 16; set r {};
while {\$a != 0} {set h [expr \$a/2]; if {[expr \$h+\$h] != \$a} {set r "1\$r"} {set r "0\$r"} ; set a \$h }
puts \$r```

The result should be split to get a list from the string, and the domain is limited by the ceiling for expr based addition and division.

Addition and division by two can be fast for low level implementation, and the method can probably easily be used with an infinite math package (I've seen one but didn't try it out yet).

Lars H, 5 Feb 2005: The following two converters came out of a "challenge" at Googol magnitude. Since they are simply doing things to strings, they are not limited by expr precision!

``` proc dec2bin {num} {
while {[regexp {[0-9]} \$num]} {
set num\
[string map {o0 0 o1 1 o2 2 o3 3 o4 4 i0 5 i1 6 i2 7 i3 8 i4 9 0 ""}\
[string map {0 0o 1 0i 2 1o 3 1i 4 2o 5 2i 6 3o 7 3i 8 4o 9 4i} \$num]]
}
string map {i 1 o 0} \$num
}

proc bin2dec {num} {
set num h[string map {1 i 0 o} \$num]
while {[regexp {[io]} \$num]} {
set num\
[string map {0o 0 0i 1 1o 2 1i 3 2o 4 2i 5 3o 6 3i 7 4o 8 4i 9 ho h hi h1}\
[string map {0 o0 1 o1 2 o2 3 o3 4 o4 5 i0 6 i1 7 i2 8 i3 9 i4} \$num]]
}
string range \$num 1 end
}

proc int2bits {n} {split [dec2bin \$n] ""}
proc bits2int {n} {bin2dec [join \$n ""]}```

The idea of dec2bin is to alternate between a normal decimal notation and an a bi-quinary [L1 ] or "abacistic" notation where there are separate positions for fives (i = a five, o = no five). It is easy to generate the abacistic notation for half of a number in decimal notation -- just half every digit in place, e.g. 7 -> 3i -- and also easy to convert from abacistic to decimal notation, e.g. i3 -> 8. (Note that this doesn't combine is and os into the same decimal position as they were split off from.) is and os that don't have a digit to their right to recombine with constitute the bits that have been shifted out from the decimal number. bin2dec does the same thing backwards, with an extra h in the leftmost position to discard unnecessary zeroes.

``` % dec2bin 987654
11110001001000000110
% bin2dec 11110001001000000111
987655
%  bin2dec 1[string repeat 0 64]
18446744073709551616```

ET: This last example shows the problem with binary digits: They're impossible to read after just a few digits. While I never liked the language ADA, it did have an idea that I wish had caught on, the optional use of an underscore character in large numerical constants, to make the numbers readable. (And trival for a compiler or interpreter to scan/parse).

So, 1_234_567 is a number that is as readable as 1,234,567 and is much better than 1234567. I have a handy little converter, (which I stole from somewhere on this wiki and modifed):

``` proc {commas} {var {num 3} {char ","}} {
set len   [string length \$var]
set first [expr \$len - \$num]
set x     ""
while { \$len > 0} {
# grab left num chars
set lef [string range \$var \$first end]
if {[string length \$x] > 0} {
set x   "\${lef}\$char\${x}"
} else {
set x   \${lef}
}
# grab everything except left num chars
set var [string range  \$var 0 [expr \$first -1]]
set len   [string length \$var]
set first [expr \$len - \$num]
}
return \$x
}```

Here are some examples of its use:

``` dec2bin 987654
11110001001000000110

% commas [dec2bin 987654] 4 _
1111_0001_0010_0000_0110

% commas [dec2bin 987654] 4 { }
1111 0001 0010 0000 0110

% commas [dec2bin 987654] 1 { }
1 1 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0

%commas 123456789     ;# naturally, it defaults for use with large decimal integers
123,456,789

% commas 123456789 3 _  ;# and here's how I wish numbers could be entered, in tcl and in C etc.
123_456_789

% puts "0x[commas [format %08X 123456789] 4 _]"    ;# and for hex numbers as well
0x075B_CD15   ```

Larry Very interesting page! Is there aywhere on the page code that simply converts numbers into binary?

For example we have CW000-aa000 and we'd like the program to first search the CW string then convert into binary numbers at position 3-4-5 and 9-10-11.

Chad: Why is it that after

` binary scan [binary format f 393.84] f n`

the value of n is 393.839996338?

AM (14 january 2011) This is a classic problem: the decimal number 393.84 can not be represented exactly in the binary format that most contemporal computers use. The value you see is the closest binary value. Compare this to the difficulty of representing 1/3 in decimal numbers.

LV Are there any languages which provide an add-on module or mode of operation where real variables are represented in a programmatic form designed not to lose percision? I was thinking that perhaps APL or LISP was one such language.

Perhaps someone has - or could - create a package for Tcl that would permit one to do math in that fashion for people who don't care about the lost computer cycles but do care about the precision.

CliC Racket, a Scheme dialect [L2 ] has "rational" types, so that e.g., 1/3 may be used in calculations with no loss of precision. I'm not a heavy user of it, though; I just downloaded it to check out the authors' "modern answer to SICP" programming tutorial.