Arjen Markus (10 July 2003) A simple alternative to *a priori* analysis of numerical algorithms and interval arithmetic is to keep track of numerical errors while doing the calculations. The method is described in detail by Nicholas Higham.

The script below is a basic implementation of the technique (just two operations) to illustrate it.

# numerr.tcl --
#    Running error analysis
#
# Bibliography:
#    Nicholas J. Higham:
#       Accuracy and Stability of Numerical Algorithms
#       SIAM, 1996, ISBN 0-89871-355-2
#
# The idea:
#    Floating-point operations are not exact, but they
#    can be modelled as:
#       fl( x op y ) = (1+D) (x op y)
#    where |D| <= u, u is the unit roundoff or machine precision
#
# The procedures here return:
#    - The result fl( x op y )
#    - The accumulated error estimate
#    in the form of a two-element list
#
namespace eval ::math::numerr {
    namespace export + - * /
    variable roundoff 1.0e-16
}

# + --
#    Add two numbers and return the result with error estimate
#
# Arguments:
#    operand1   First operand (number with error estimate or regular number)
#    operand2   Second operand (ditto)
# Result:
#    Sum and error estimate
#
proc ::math::numerr::+ {operand1 operand2} {
    variable roundoff
    set add1 [lindex $operand1 0]
    set err1 [lindex $operand1 1]
    set add2 [lindex $operand2 0]
    set err2 [lindex $operand2 1]
    if { $err1 == "" } { set err1 0.0 }
    if { $err2 == "" } { set err2 0.0 }
    return [list [expr {$add1+$add2}] \
        [expr {abs($add1+$add2)*$roundoff+$err1+$err2}]]
}

# * --
#    Multiply two numbers and return the result with error estimate
#
# Arguments:
#    operand1   First operand (number with error estimate or regular number)
#    operand2   Second operand (ditto)
# Result:
#    Product and error estimate
#
proc ::math::numerr::* {operand1 operand2} {
    variable roundoff
    set mult1 [lindex $operand1 0]
    set err1 [lindex $operand1 1]
    set mult2 [lindex $operand2 0]
    set err2 [lindex $operand2 1]
    if { $err1 == "" } { set err1 0.0 }
    if { $err2 == "" } { set err2 0.0 }
    return [list [expr {$mult1*$mult2}] \
        [expr {abs($mult1*$mult2)*$roundoff + \
               abs($mult2)*$err1 + abs($mult1)*$err2}]]
}

# Test code
#
namespace import ::math::numerr::*

set sum 0.0
set val 100.0
for { set i 0 } { $i < 10 } { incr i } {
    set sum [+ $sum $val]
    puts "$sum -- [expr {[lindex $sum 1]/[lindex $sum 0]}]"
    set val [* $val 1.1]
}

When run, it produces this output:

 100.0 1e-14 -- 1e-16
 210.0 4.2e-14 -- 2e-16
 331.0 9.93e-14 -- 3e-16
 464.1 1.8564e-13 -- 4e-16
 610.51 3.05255e-13 -- 5e-16
 771.561 4.629366e-13 -- 6e-16
 948.7171 6.6410197e-13 -- 7e-16
 1143.58881 9.14871048e-13 -- 8e-16
 1357.947691 1.2221529219e-12 -- 9e-16
 1593.7424601 1.5937424601e-12 -- 1e-15

As you can see, the relative error increases with each iteration. The regularity looks surprising at first, but it is correct: each addition contributes one unit roundoff, so after n additions the relative error bound is simply n times 1e-16 :)
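The namespace exports - and / as well, but the script only implements + and *. For completeness, here is a sketch of the two missing operations in the same style; the error bound used for / is my own first-order estimate, not taken from the original script:

```tcl
# Same error model as above: fl( x op y ) = (1+D)(x op y), |D| <= u
namespace eval ::math::numerr {
    namespace export + - * /
    variable roundoff 1.0e-16
}

# - -- subtract two numbers; the input errors add up, just as for +
proc ::math::numerr::- {operand1 operand2} {
    variable roundoff
    set sub1 [lindex $operand1 0]
    set err1 [lindex $operand1 1]
    set sub2 [lindex $operand2 0]
    set err2 [lindex $operand2 1]
    if { $err1 == "" } { set err1 0.0 }
    if { $err2 == "" } { set err2 0.0 }
    return [list [expr {$sub1-$sub2}] \
        [expr {abs($sub1-$sub2)*$roundoff+$err1+$err2}]]
}

# / -- divide two numbers; to first order the error of x/y is
#      |x/y|*u + err_x/|y| + |x|*err_y/y^2
proc ::math::numerr::/ {operand1 operand2} {
    variable roundoff
    set div1 [lindex $operand1 0]
    set err1 [lindex $operand1 1]
    set div2 [lindex $operand2 0]
    set err2 [lindex $operand2 1]
    if { $err1 == "" } { set err1 0.0 }
    if { $err2 == "" } { set err2 0.0 }
    return [list [expr {$div1/$div2}] \
        [expr {abs($div1/$div2)*$roundoff + \
               $err1/abs($div2) + abs($div1)*$err2/($div2*$div2)}]]
}
```

Note that subtraction uses the same absolute error bound as addition; the well-known danger of subtraction (cancellation) shows up not in the absolute error but in the relative error, which explodes when the difference is small.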

TV I'm not sure it's the intention of the page, but it may be good to add that there are various types of floating-point numbers, of which the IEEE standardized format is probably the most commonly referred to, though I don't know statistically. The idea of the {mantissa exponent} tuple is essential, and normally both are formatted as two's complement binary numbers, though I think offset representations also exist. I guess most common CPUs/FPUs use a fixed-accuracy mantissa and exponent, though that too could in principle be made adaptive. That leaves you with two operations: the actual computation, possibly with a higher-accuracy intermediate result, and the readjustment to the resulting mantissa and exponent. In principle you could have {19000 1} + {00001 1} equal {19001 1} without any rounding, while for a different first number you could get rounding.

am Be assured that I know "all" about that :) The whole idea of this type of analysis is to keep track of rounding errors. The book by Higham contains more information on that than you may ever want to know!

TV It would seem to be an interesting book you describe. I know by and large how the CPU that performs the operations works, and in this case I think that is a good starting point.

The algorithm above is not irrelevant, even quite essential, since it says *something* about error propagation, though quite honestly not so much about its effect in the important add/multiply combination (e.g. in an FFT), and it offers little insight for those who want to know what causes inaccuracies and what doesn't.

The floating-point unit mostly has a mantissa and an exponent. It fits every result into the bit resolution of the mantissa by making sure the leading one of the result, seen from the left of the binary number, lands in the most significant position, and by counting the exponent up or down to reflect the corresponding multiplication factor.

Part of what happens above is that you start with a 'nice' (binary speaking) mantissa: 100 --> 1100100 bin, and a simple exponent: -7 (or some other reasonable zero-point bookkeeping; I don't know the common IEEE specs by heart).
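That normalization can be made visible from script level with a small frexp-style helper (a sketch of mine; the name is borrowed from C's frexp, it is not a built-in Tcl command). It splits a number into a mantissa in [0.5, 1) and a power-of-two exponent, so 100.0 indeed comes out with exponent 7, matching the seven bits of 1100100:

```tcl
# Split x into {m e} with x = m * 2**e and 0.5 <= |m| < 1
# (sketch; the name frexp is borrowed from the C library)
proc frexp {x} {
    if {$x == 0.0} { return {0.0 0} }
    set m [expr {double($x)}]
    set e 0
    ;# shift the mantissa into [0.5, 1) by exact halvings/doublings
    while {abs($m) >= 1.0} { set m [expr {$m/2.0}]; incr e }
    while {abs($m) <  0.5} { set m [expr {$m*2.0}]; incr e -1 }
    return [list $m $e]
}

puts [frexp 100.0]   ;# prints: 0.78125 7  (0.78125 = 0.1100100 bin)
```

The halvings and doublings only touch the exponent of the underlying double, so the decomposition itself is exact.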

Then you compute, in the binary tuple domain, and things start to happen. Addition is easy enough to oversee, but still:

   1100100...0   11111...1001
   1100100...0   11111...1001
  ----------------------------+
  11001000...0   11111...1001

Which to begin with requires an extra bit of precision in the mantissa, before it is brought back to the normal precision:

   1100100...0   11111...1010

increasing the exponent to make the number right, and losing one bit of accuracy, which however doesn't introduce any inaccuracy here.

At some point, you will get a number which does not 'fit' in the mantissa, so that when the result of the addition is stored, one bit is lost, even though the resulting inaccuracy is of course, relatively speaking, only pow(2,-#mantissa_bits). And there is not much to be done about it, except using more bits at the expense of computation power, or maybe using machine code on a CPU which gives you that bit and bookkeeping it in the overall computation.
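That loss can be demonstrated directly. Assuming IEEE doubles with a 53-bit mantissa, a small term is absorbed completely once the sum no longer fits:

```tcl
# Representing 1e16 + 1 exactly would need more than the 53 mantissa
# bits of a double, so the added 1.0 is rounded away completely
puts [expr {(1.0e16 + 1.0) - 1.0e16}]   ;# prints 0.0
# at 1e15 the sum still fits, and nothing is lost
puts [expr {(1.0e15 + 1.0) - 1.0e15}]   ;# prints 1.0
```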

Which brings us to the subject of *algorithmic* accuracy, that is, the sensitivity of the applied algorithm to the ever-present approximations of the 1/6 = 0.16666..7 kind, but then binary speaking. For that, one may want bijections as arithmetical operations, symmetry considerations, and noise analysis, some of which can perhaps be seen in the above.

With multiplications, it is a commonly known processor design criterion that you get twice the number of relevant bits after a multiplication, so that after 10 multiplies you nearly always run out of bits, practically speaking, and will need to round and therefore lose accuracy. That, too, can be taken up to the algorithm level, described well, and at least dealt with in many cases, for instance where one can take 'mild' multiplications with rounded numbers for constants in intermediate results, or fit higher accuracies, i.e. wider mantissas, in at the right places.
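A one-line illustration of the doubling (assuming IEEE doubles): both factors are correctly rounded to 53 bits, but their exact product would need up to 106 bits, so it has to be rounded again, and the twice-rounded result need not agree with the correctly rounded mathematical result:

```tcl
# 0.1*0.1 rounds the already-rounded 0.1 a second time; the result
# differs from the nearest double to 0.01 in the last bit
puts [expr {0.1 * 0.1 == 0.01}]   ;# prints 0
```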

The exponent, of course, should never overflow or underflow, because then you'd lose accuracy at a major rate. Luckily this is easily verifiable: just prevent intermediate exponents that are smaller than the minimum coming from the number of bits; then the inaccuracies come from the mantissa, and should be smaller than 1 lsb, or pow(2, -#bits + exponent).
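A quick demonstration of why the order of operations matters for the exponent range (powers of two, so the mantissa plays no role here):

```tcl
set tiny [expr {pow(2.0,-1000)}]
set huge [expr {pow(2.0,1000)}]
# naive left-to-right order: tiny*tiny = 2**-2000 is below the
# smallest representable double, so it underflows to zero for good
puts [expr {$tiny * $tiny * $huge * $huge}]       ;# prints 0.0
# pairing the factors keeps every intermediate exponent in range
puts [expr {($tiny * $huge) * ($tiny * $huge)}]   ;# prints 1.0
```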

Unless you want to discuss operators with a certain inaccuracy interval, for which I wouldn't know a good example, except maybe analog computations. You can of course take the whole inaccuracy of the numbers you use as the interval, but then you wouldn't be accurate, and wouldn't achieve much.

Often, practically, working with bit strings is more efficient: every bit you assign to the exponent you lose in the mantissa, and after all, word fetches and stores, datapaths, and ALU/FPU accuracies have some fixed word width, for instance 32 or 64 bits, so by using integer-based computations you make the most efficient use of that word width, if you can.

For any problem that is symmetrical around zero, with equal accuracy sensitivity over the whole number range, it is senseless to use exponents, and one had better scale the whole equation to fit nicely in the word width.
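A minimal fixed-point sketch of that idea (my own illustration; the 16 fractional bits are chosen arbitrarily): all values live in units of 1/65536 and are stored as plain integers, and only multiplication needs a rescaling step:

```tcl
set SCALE 65536   ;# 16 fractional bits: values are counted in 1/65536

proc toFix   {x} { global SCALE; expr {wide(round($x * $SCALE))} }
proc fromFix {f} { global SCALE; expr {double($f) / $SCALE} }
# a product of two scaled numbers carries the factor SCALE twice,
# so divide once to get back to the common scale
proc mulFix {a b} { global SCALE; expr {($a * $b) / $SCALE} }

set a [toFix 1.5]
set b [toFix 2.25]
puts [fromFix [mulFix $a $b]]   ;# prints 3.375
```

Addition and subtraction need no rescaling at all, which is exactly why such scaled-integer schemes are attractive for sum-heavy inner loops.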

The analysis when you *do* round off, which is normal of course, could take the usual 'rounding' and 'ceiling' considerations into account; they can tell exactly how much accuracy you may lose where in the computations.

I agree that when one looks at the algorithmic level, many things can be said about inaccuracies.