## Computers and real numbers

Arjen Markus 2004-07-14: I think it is appropriate to write a little tutorial on the subject of "real numbers". I do not mean this as a grand in-depth tutorial, just a few of the more basic facts.

Let us first start with some terminology: computers normally deal with floating-point numbers rather than the real numbers known from mathematics. And that is to a large degree the cause of much of the trouble!

Note:

Only a few things that I say here are specific to Tcl. Computers in general have a hard time dealing with real numbers.

## What is a floating-point number?

Floating-point numbers are approximations of the real numbers:

• They have a limited range: roughly 1.2e-38 to 3.4e+38 for "single precision" and roughly 2.2e-308 to 1.8e+308 for "double precision". Tcl uses double precision.
• They can approximate real numbers with a finite number of decimals only: roughly 15 to 16 significant decimal digits for double precision. Enough for many practical applications, but you can still get into serious trouble.
• Whereas we are used to decimal numbers (1.1 and 0.002, for instance), most present-day computers use a binary system, so they cannot represent 1.1 exactly. Strange? Yes, perhaps, but not if you compare this to 1/3 in our decimal system:
`1/3 = 0.333333... a never-ending sequence of threes!`
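You can see this in Tcl by printing a literal with more digits than the default formatting shows. The exact digits may vary by platform, but the pattern below is typical:

```
# Print with 20 significant digits: the stored double is not exactly 1.1.
puts [format %.20g 1.1]   ;# something like 1.1000000000000000888
puts [format %.20g 0.5]   ;# 0.5 is a power of two, so it IS exact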

## What does it mean?

What does this mean in practice? Well, have a look yourself:

```
set x 0.1
set y 0.3
set z [expr {$x+$y}]

puts "$x + $y = $z"
puts "Difference: [expr {$z-$x-$y}]"
```

On my PC, the difference is approximately 5.55e-17. This is due to the fact that both 0.1 and 0.3 are represented by floating-point numbers that are not exactly 0.1 and 0.3 and their sum is not exactly 0.4.

Does this matter? It depends.

### Real variables controlling a loop

What you should not do is program like this:

```
for {set x 0.1} {$x < 1.1} {set x [expr {$x+0.1}]} {
    puts $x
}
```

On my PC x runs from 0.1 to 1.1, not 1.0! For the sake of completeness: the problem does not always occur; counting from 1.0 to 10.0 in steps of 1.0, for example, is fine. The casual reader should be aware that "real" problems arise only when fractions are involved (decimal fractions are not necessarily representable exactly in binary) or when the numbers are so large that digits get truncated. In short: values that can also be represented exactly as integers are safe.
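A safer pattern is to drive the loop with an integer counter and derive the floating-point value from it, so that rounding errors cannot accumulate across iterations:

```
# Loop over 0.1, 0.2, ..., 1.0 using an integer counter.
# Each value still carries a tiny representation error,
# but the error does not build up from one iteration to the next.
for {set i 1} {$i <= 10} {incr i} {
    set x [expr {$i * 0.1}]
    puts $x
}
```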

Here is a more elaborate loop that shows the pitfall ...

```
set tcl_precision 17
foreach start {0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0} {
    set stop  [expr {$start+1.0}]
    set count 0
    for {set x $start} {$x < $stop} {set x [expr {$x+0.1}]} {
        incr count
    }
    puts "End: $x -- $count"
}
```

Here is the output I get:

```
End: 1.0999999999999999 -- 11
End: 1.2 -- 11
End: 1.2 -- 10
End: 1.3 -- 10
End: 1.4000000000000001 -- 10
End: 1.5000000000000002 -- 10
End: 1.6000000000000003 -- 10
End: 1.7000000000000004 -- 10
End: 1.8000000000000007 -- 10
End: 1.9000000000000008 -- 10
End: 2.0000000000000009 -- 10
```

Yes, tcl_precision is a trick: this is one of the few magic variables in Tcl. It controls the number of decimals that are used to convert numbers to readable strings.

The variation in the number of iterations (the value of "count" at the end) shows why you should not use "real" variables to control loops.

## The consequences of finite precision

Another remarkable thing about computer arithmetic is this:

x + y + z may not be equal to (x + y) + z or z + y + x

Just look at this code:

```
set x  1.0e30
set y  1.0
set z -1.0e30

puts "x+y+z = [expr {$x+$y+$z}]"
puts "x+z+y = [expr {$x+$z+$y}]"
```

The result:

```
x+y+z = 0.0
x+z+y = 1.0

Because of the finite precision, 1.0e30+1.0 becomes 1.0e30.
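A practical consequence of this: when summing many values of very different magnitude, adding the small ones first loses less precision. A minimal sketch (the exact figures depend on the platform, but the effect is typical):

```
# Summing one huge value and many tiny ones: the order matters.
set values [concat 1.0e16 [lrepeat 100 1.0]]

set sumFwd 0.0
foreach v $values {set sumFwd [expr {$sumFwd + $v}]}

set sumRev 0.0
foreach v [lreverse $values] {set sumRev [expr {$sumRev + $v}]}

# Large value first: each 1.0 is swallowed by 1.0e16.
# Small values first: they add up to 100 before meeting 1.0e16.
puts "large first: [format %.1f $sumFwd]"
puts "small first: [format %.1f $sumRev]"
```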

## Rounding

By now it must be clear: nothing concerning "real numbers" is easy with computers. That is even true for such a seemingly simple matter as rounding.

There are at least four methods to round a number:

• Round 0.1 ... 0.4 downwards to 0, 0.5 ... 0.9 upwards to 1
• Make sure the last digit becomes even
• Round to zero: 0.9 ==> 0
• Round to infinity: 0.1 ==> 1

The first two are the most common in manual computations, but quite often a computer will simply truncate the value:

`set e [expr {1/3}]`

does not become 0.33333333..., but simply 0: the operands are integers, so the result is an integer. To force a floating-point result:

`set e [expr {double(1)/3}]`

for instance.
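In Tcl's expr, a single floating-point operand is enough to switch the whole division to floating point:

```
puts [expr {1/3}]          ;# integer division: truncates to 0
puts [expr {double(1)/3}]  ;# floating-point division: 0.333...
puts [expr {1.0/3}]        ;# same effect: one double operand suffices
```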

Rounding is extremely important with financial computations: get it wrong and, alas, you lose money!

An example where rounding to even is preferable is this: round off the number 0.444445 to fewer and fewer decimals.

The first method would give: 0.44445, 0.4445, 0.445, 0.45, 0.5.

Rounding to even gives: 0.44444, 0.4444, 0.444, 0.44, 0.4, so the difference is smallest.
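Tcl's built-in round() rounds halves away from zero; round-half-to-even (also known as banker's rounding) has to be coded by hand. A sketch for rounding to the nearest integer (the name roundEven is made up for this example):

```
# Round x to the nearest integer, breaking ties toward the even neighbour.
proc roundEven {x} {
    set f    [expr {floor($x)}]
    set diff [expr {$x - $f}]
    if {$diff > 0.5} {return [expr {int($f) + 1}]}
    if {$diff < 0.5} {return [expr {int($f)}]}
    # Exactly halfway: pick the even neighbour
    set lo [expr {int($f)}]
    return [expr {$lo % 2 == 0 ? $lo : $lo + 1}]
}

puts [roundEven 0.5]   ;# 0
puts [roundEven 1.5]   ;# 2
puts [roundEven 2.5]   ;# 2
```

Note that 0.5, 1.5 and 2.5 happen to be exactly representable in binary, so the tie-breaking branch is actually reached here.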

Lars H 2004-11-09: I don't think that is the usual reason for the "round to even" rule. Consider instead the situation that several rounded numbers are to be added together (happens for example in matrix multiplication, where the products of matrix elements have to be rounded to fit within available precision). With "round to even", the distribution of the rounding error is (under reasonable assumptions about the randomness of input) symmetric around 0. With "round upwards" it would not be, and as a result the error would have a positive expectation value. For the sum of many rounded numbers, this means "round upwards" on average leads to a sum that is too large, whereas "round to even" on average leads to a sum with the right value; "round to even" thus avoids a source of systematic error.

escargo 2004-07-15: Don't forget the issue of how to round negative numbers; this can be important when calculations are performed when making comparisons. Rounding toward zero and rounding toward negative infinity are two common ways of rounding.
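Tcl's expr functions illustrate the difference for negative numbers: int() truncates toward zero, floor() rounds toward negative infinity.

```
puts [expr {int(-1.5)}]    ;# -1   -- toward zero
puts [expr {floor(-1.5)}]  ;# -2.0 -- toward negative infinity
puts [expr {round(-1.5)}]  ;# -2   -- halves rounded away from zero
puts [expr {ceil(-1.5)}]   ;# -1.0 -- toward positive infinity
```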

## Comparisons

A last slippery spot: comparing two numbers. It is said that you should not directly compare for equality:

`if { $x == 0.1 } { puts "Do not do this!" }`

Instead use a small enough margin:

`if { abs($x-0.1) < 1.0e-10 } { puts "This is more reliable" }`

This is not only true of equality but also of inequality, and even of less-than and greater-than comparisons - see the loops above!

One convenient way out of this conundrum is the math::fuzzy module of Tcllib. It takes care of determining a suitable margin and returns a consistent result.
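A short usage sketch (assuming Tcllib is installed; see the math::fuzzy documentation for the full command set):

```
package require math::fuzzy

set z [expr {0.1 + 0.2}]
puts [expr {$z == 0.3}]            ;# exact comparison: fails
puts [::math::fuzzy::teq $z 0.3]   ;# tolerant equality: succeeds
```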

FPX: If you don't want to depend on the math package in Tcllib, you can define a comparison function as

```
proc fequal {x y} {
    # Compare two floats for approximate equality, using a relative tolerance.
    set epsilon 1.0e-6
    if {abs($y) < $epsilon} {
        # y is (nearly) zero: fall back to an absolute comparison
        return [expr {abs($x) < $epsilon}]
    }
    return [expr {abs(($x - $y) / double($y)) < $epsilon}]
}
```

(The expressions are braced for safety and speed, and the relative error is taken in absolute value so the test also works for negative y.)

## There is more!

Further information - probably more than you want to know - is given by David Goldberg in What Every Computer Scientist Should Know About Floating-Point Arithmetic [1]

Another paper that may be more readable (i.e. less mathematics) is: The Perils of Floating Point by Bruce Bush [2]

The page A real problem has many more examples and some technical discussion of the subject.

From time to time you may get into serious trouble with straightforward computations. See the page Evaluating polynomial functions if you feel up to a preview. Luckily, in many cases smart people have already found a solution. Consult your local numerical analysis guru to find out how to tackle the problem.

TV: Well, apart from that rather strange logic, the basic rules aren't too hard: don't simulate higher-order approximations when what you approximate probably doesn't contain them; make sure every intermediate result in your computations stays reasonably within the accuracy of your floating-point numbers; prevent rounding, if you can, by multiplying/dividing with powers of 2; possibly use Maxima or some other means to leave fractions in place, or even use its infinite-precision computations; and if you dare, simply analyze your computation step by step (brace level by brace level), print intermediate results, and see where things get weird. Some problems of course are hard, but often reasonable thinking with a pocket calculator in hand can prevent a lot of errors.

RJM: Thanks Arjen for this contribution. After reading this page I wonder why BCD arithmetic has not become a common option in computing apps. Every microprocessor provides a BCD mode flag or equivalent operations, and when I worked with C compilers in the past, some of them provided an option "use BCD arithmetic". Of course, BCD is slower than binary calculations, but the same applies to the increasing popularity of scripting languages: today's computers are sooo fast.... Perhaps BCD is not implemented in Intel's floating-point processor (I don't know) - that would make a dramatic speed difference (footnote: at the same time I added an extra note close to the for loop example).

AM Computers used to have BCD arithmetic, but the circuitry is more involved and the analysis of binary arithmetic is much easier. So, when computer manufacturers started to adopt standards and make bigger and more powerful computers, the emphasis was put on binary arithmetic.

There is a tendency, it seems, though, to go back to BCD support in hardware!

TV (jul 15 '04) To get away from the sort of magical aura of real numbers, let's simply stay with what they are made of on most computer systems: a binary two's-complement mantissa and exponent, both with a fixed number of bits, with the assumption that the exponent makes all bits of the mantissa relevant as long as that fits in its range. Nothing much more to it than that, essentially; that answers almost any of the questions.

AM 2004-08-10: I have been thinking about a small library to deal with Fixed-point arithmetic ...

For rigorous background and generalizations from a mathematician, see "Computing over the Reals: Where Turing Meets Newton" [3 ].

AM 2004-10-01: What a splendid little article! Just what I needed. (I hope the book it refers to will be as clear as the article).

While reading it, I came up with a small thought experiment. Suppose:

• you have two "oracles" that will produce the digits of two numbers (rational or irrational, it does not matter). They produce them one at a time, so that you can add the two numbers to as much precision as you want.
• is it then possible to create an oracle that produces the sum of these two numbers in the same way, that is, digit by digit?

The answer astonished me: NO. Here is why:

• Suppose the first number is 0.999.....9 (the number of 9's is fixed but not known in advance) and the second one is 0.000....02 (again the number of zeros is fixed but not known in advance).
• Depending on the number of 9's and the number of 0's the first (!) digit of the sum will be 0 or 1.

Now this is with such a simple operation as addition. Just think what happens with multiplication or division.

TV: Isn't that, apart from the unclear reason for writing these things, sort of like asking for rules as with ordinary limits and series in mathematics?

Sarnold: I have tried to understand such things while writing BigFloat for Tcl (formerly MPA), a package dedicated to arbitrary-precision floating-point numbers. I find 'real numbers' just too complex for me. Even some physicists usually think that there is no need for anything other than 'double's.

AM: Your package does seem to handle real numbers quite nicely, though :)

Another paper of interest: [5] gives a nice introduction to interval arithmetic, and shows how to use it to rigorously solve ordinary differential equations. See also [4] (the author's homepage) for more on the subject.

LV: Other packages that attempt to handle larger floating numbers include Big Floating-Point Numbers and Tcl based Packed Decimal Arithmetic.