IEEE float dissection

IEEE 754 float and double formats are curious beasts. The procedure below is meant to help in understanding the binary encoding of these numbers: it dissects the bits and presents their meaning. It uses the binary format command to obtain the binary encoding of a float, then interprets the bits according to the spec. Tcl 8.6's knowledge of binary numbers (format %b and the 0b11010010 notation) is exploited to make the code a little easier to read.

Binary128 ("quadruple precision") is not supported because the binary command doesn't know about it yet. If you're really curious, this should be easy to remedy with a small Critcl or tcc4tcl proc.
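
The first step described above, getting at the raw bits, can be sketched in a few lines. This is only an illustration, assuming Tcl 8.6; floatbits is a hypothetical helper name and is not part of the code further down.

```tcl
# Sketch: return the 32 bits of x encoded as IEEE 754 binary32.
# "floatbits" is a hypothetical name used only for this illustration.
proc floatbits {x} {
    # R  = big-endian single-precision float
    # B* = every bit, most significant first
    binary scan [binary format R $x] B* bits
    return $bits
}

set bits [floatbits 1.0]
# Split into sign, exponent field (8 bits) and fraction field (23 bits).
# For 1.0 the exponent field is 01111111 (127, i.e. unbiased 0) and the
# fraction field is all zeros.
puts "[string index $bits 0] [string range $bits 1 8] [string range $bits 9 end]"
```

Using Q instead of R gives the 64-bit pattern, with an 11-bit exponent field and a 52-bit fraction field.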

Example Output

You can see what a value looks like at different precisions:

% set pi [expr {atan(1)*4}]
% seefloat $pi
Value: 3.141592653589793  (0 10000000 10010010000111111011011)
Sign: +
Exponent: 1
Mantissa: 4788187
Significand: 1 + (4788187 / (2.**23))
Expression:  (1 + (4788187 / (2.**23))) * 2.**1
Re-calculated value:  3.1415927410125732

% seefloat $pi double
Value: 3.141592653589793  (0 10000000000 1001001000011111101101010100010001000010110100011000)
Sign: +
Exponent: 1
Mantissa: 2570638124657944
Significand: 1 + (2570638124657944 / (2.**52))
Expression:  (1 + (2570638124657944 / (2.**52))) * 2.**1
Re-calculated value:  3.141592653589793
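
The "Expression" lines above follow the usual decoding rule for normal numbers: value = (-1)^sign * (1 + mantissa / 2^MBITS) * 2^(exponent field - bias). As a rough sketch, the binary32 bit pattern of pi shown above can be decoded by hand like this (Tcl 8.6 is assumed for the 0b notation, and the variable names are only illustrative):

```tcl
# Hand-decode the binary32 pattern of pi: 0 10000000 10010010000111111011011
set bits 01000000010010010000111111011011

set sign [string index $bits 0]
set e    [expr 0b[string range $bits 1 8]]     ;# biased exponent field
set m    [expr 0b[string range $bits 9 end]]   ;# 23-bit fraction field

# Bias for binary32 is 2**(8-1)-1 = 127; the leading 1 is implicit.
set value [expr {(-1)**$sign * (1 + $m / 2.0**23) * 2.0**($e - 127)}]

puts "exponent field $e, mantissa $m"
puts "value $value"
```

The result agrees with the re-calculated binary32 value printed by seefloat, which differs from the binary64 value from the eighth significant digit onward.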

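It also handles denormal numbers (these get an extra "Denormal number" line) and very small normal numbers, such as this rounding residue:

% seefloat .1+.2-.3 ;# a rounding residue very close to 0 (small, but not denormal)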
Value: 5.551115123125783e-17  (0 01001001 00000000000000000000000)
Sign: +
Exponent: -54
Mantissa: 0
Significand: 1 + (0 / (2.**23))
Expression:  (1 + (0 / (2.**23))) * 2.**-54
Re-calculated value:  5.551115123125783e-17

NaN and Inf, however, are not handled very well (TODO: see below).
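
A binary32 denormal (subnormal) is recognisable by its all-zero exponent field. The following is a minimal sketch of what such a value looks like at the bit level. It assumes Tcl 8.6 and that binary format R rounds its argument to the nearest binary32 value; 2**-149 is used because it is the smallest positive binary32 denormal.

```tcl
# Encode 2**-149 as binary32 and look at the bits.
binary scan [binary format R [expr {2.0**-149}]] B* bits
puts "[string index $bits 0] [string range $bits 1 8] [string range $bits 9 end]"

# Denormals have no implicit leading 1 and use the fixed exponent 1 - bias,
# which matches the "exp == 0" branch of seefloat.
set m [expr 0b[string range $bits 9 end]]
set value [expr {($m / 2.0**23) * 2.0**(1 - 127)}]
puts "mantissa $m, value $value"
```

Only the lowest fraction bit is set, so the mantissa is 1 and the value is (1 / 2**23) * 2**-126.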

Code

proc tobinary {s} { ;# literal bytes to binary sequence
    binary scan $s B* d
    return $d
}
        
proc truncfloat {n} {
    binary scan [binary format R $n] R f
    return $f
}
proc fitsinbinary32 {x} {
    expr {$x == [truncfloat $x]}
}

proc seefloat {x {type "binary32"}} {
    if {$x ne "NaN"} {
        set x [expr $x]
    }
    #  FMT:    [binary scan/format] code
    #  EBITS:  bits in exponent
    #  MBITS:  bits in mantissa
    set swbody {
        binary32 - float {
            set FMT     R
            set EBITS   8
            set MBITS   23
        }
        binary64 - double {
            set FMT     Q
            set EBITS   11
            set MBITS   52
        }
    }
    set ERRFMT "Unknown type '%s', must be one of [join [dict keys $swbody] ,\ ]."
    lappend swbody default {
        return -code error [format $ERRFMT $type]
    }
    # "--" keeps a type that starts with "-" from being parsed as a switch option
    switch -- $type $swbody
    # Exponent bias
    set EBIAS [expr {2**($EBITS-1)-1}]
    # Maximum exponent (means value is Inf)
    set EMAX  0b[string repeat 1 $EBITS]
    # Top bit of mantissa (quiet vs signalling NaN)
    set MTOP  [expr {2**($MBITS-1)}]

    set bits [tobinary [binary format $FMT $x]]

    set fbits [concat [
                    string range $bits 0 0
                ] [
                    string range $bits 1 $EBITS
                ] [
                    string range $bits $EBITS+1 end
                ]]

    set sgn   [string range $bits 0 0]
    set exp 0b[string range $bits 1 $EBITS]
    set man 0b[string range $bits $EBITS+1 end]

    set sgn [expr {$sgn ? "-" : "+"}]

    set denormal false
    set isnan    false
    set isinf    false

    set sig [format "%d / (2.**%d)" $man $MBITS]

    if {$exp == $EMAX} {
        if {$man == 0} {
            puts "${sgn}Inf"
        } else {
            if {$man & $MTOP} {
                puts "quiet NaN"
            } else {
                puts "signalling NaN"
            }
        }
        return
    } elseif {$exp == 0} {
        set denormal true
        set exp [expr {1 - $EBIAS}]
    } else {
        set exp [expr {$exp - $EBIAS}]
        set sig "1 + ($sig)"
    }

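    set expr "($sig) * 2.**$exp"
    # Carry the sign into the expression so negative inputs re-calculate correctly
    if {$sgn eq "-"} {
        set expr "-$expr"
    }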

    puts "Value: $x  ($fbits)"
    puts "Sign: $sgn"
    puts [format "Exponent: %d" $exp]
    puts [format "Mantissa: %d" $man]
    if {$denormal} {
        puts "Denormal number"
    }
    puts "Significand: $sig"
    puts "Expression:  $expr"
    puts "Re-calculated value:  [expr $expr]"
}
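
The two small helpers truncfloat and fitsinbinary32 are not called by seefloat itself. A short usage sketch, repeating their definitions so that it runs on its own, shows what they are good for:

```tcl
# Round-trip a value through binary32 and read it back as a double.
proc truncfloat {n} {
    binary scan [binary format R $n] R f
    return $f
}
# True when the round trip loses nothing, i.e. x is exact in binary32.
proc fitsinbinary32 {x} {
    expr {$x == [truncfloat $x]}
}

puts [fitsinbinary32 0.5]   ;# 0.5 = 2**-1 is exactly representable
puts [fitsinbinary32 0.1]   ;# 0.1 rounds differently in binary32 and binary64
puts [truncfloat 0.1]       ;# the binary32 neighbour of 0.1, read back as a double
```

The first call reports 1 and the second 0, which explains why the binary32 and binary64 dissections of the same input can show different re-calculated values.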

Discussion and Further Work

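This experiment has uncovered an obscure bug in binary format with binary32 targets. Inf is encoded as the largest finite binary32 value (exponent field 11111110, all mantissa bits set) instead of infinity: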

% seefloat Inf
Value: Inf  (0 11111110 11111111111111111111111)
Sign: +
Exponent: 127
Mantissa: 8388607
Significand: 1 + (8388607 / (2.**23))
Expression:  (1 + (8388607 / (2.**23))) * 2.**127
Re-calculated value:  3.4028234663852886e+38

See http://core.tcl.tk/tcl/tktview/85ce4bf92 for more detail and an attempted fix.
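
To check how a particular Tcl build behaves, the two encodings of Inf can be compared directly. This is only a sketch, assuming Tcl 8.6; the binary32 line depends on whether the build still has the behaviour described in the ticket, so no particular output is claimed for it here.

```tcl
# Print sign, exponent field and fraction field of Inf for both widths.
# A correctly encoded infinity has an all-ones exponent field and an
# all-zero fraction field.
foreach {fmt ebits} {R 8 Q 11} {
    binary scan [binary format $fmt Inf] B* bits
    set s [string index $bits 0]
    set e [string range $bits 1 $ebits]
    set f [string range $bits $ebits+1 end]
    puts "$fmt: $s $e $f"
}
```

If the R line shows the exponent field 11111110 with every fraction bit set, the build clamps Inf to the largest finite binary32 value, as in the transcript above.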

It would be interesting to also examine bounds on representation, and how this relates to IEEE binary float to string conversion.

See Also

Contributors

aspect kicked this page off in a fit of idle curiosity.


arjen - 2015-02-01 11:17:14

You might also want to have a look at the Tcllib module "math::machineparameters" - it is the Tcl equivalent to the LAPACK routine DLAMCH.