Thursday, May 5, 2011

Precision and Rounding

I deal with a lot of computer-generated data. Molecular dynamics simulations provide me with terabytes of data in the form of numbers, which are stored as floating point numbers. These floating point numbers represent real numbers with a certain machine-dependent maximum relative error, the machine epsilon. Here is an example in Python:


>>> a=0.1
>>> a
0.10000000000000001

So, the computer stores the real number "0.1" as the nearest representable floating point number, which prints as "0.10000000000000001". This is indeed an acceptable error.
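To see the exact value that is actually stored, one can convert the float to a Decimal (constructing a Decimal directly from a float works from Python 2.7 on):

>>> from decimal import Decimal
>>> Decimal(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')

Let's have a look at another example: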

>>> a=[6.04, 6.08, 5.99, 6.01, 6.04, 6.02]
>>> sum(a)/len(a)
6.0299999999999985

The average of the array "a" is computed as "6.0299999999999985". Here one cannot tell whether the result is exactly that value or just the average of the floating point numbers representing the values given in "a". In this case we have:

>>> a
[6.04, 6.0800000000000001, 5.9900000000000002, 6.0099999999999998, 6.04, 6.0199999999999996]
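Whether the trailing digits are partly an artefact of how the sum is accumulated can be checked with math.fsum (available from Python 2.6), which tracks the partial sums exactly; a minimal sketch:

from math import fsum

a = [6.04, 6.08, 5.99, 6.01, 6.04, 6.02]
# fsum returns the correctly rounded sum of the floats;
# the division then adds at most one more rounding step
avg = fsum(a) / len(a)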

The machine epsilon depends on the computer. In C it is defined as the macro DBL_EPSILON in the standard library header "float.h"; in Python one can retrieve it via:

>>> import sys
>>> sys.float_info.epsilon
2.2204460492503131e-16
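One can also determine it empirically by halving a value until adding it to 1.0 no longer changes the result:

>>> eps = 1.0
>>> while 1.0 + eps / 2 > 1.0:
...     eps = eps / 2
...
>>> eps
2.2204460492503131e-16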

Most programs I use do not respect this machine epsilon, this inherent error. For instance, they just bluntly return:

average +- standard deviation
6.0299999999999985 +- 0.028284271247461926
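These two figures can be reproduced in plain Python. Note that this assumes the population standard deviation, i.e. division by N rather than N-1, and that the trailing digits may vary from machine to machine:

>>> a = [6.04, 6.08, 5.99, 6.01, 6.04, 6.02]
>>> avg = sum(a) / len(a)
>>> ( sum((x - avg)**2 for x in a) / len(a) )**0.5
0.028284271247461926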

Reported this way, the numbers suggest information which is simply not true, namely that these digits are the average and the standard deviation. They are merely the nearest possible floating point representations of the actual real number results. If I calculate the average by hand I get the real number "6.03".

floating point number, real number
6.0299999999999985, 6.03

Besides, that wouldn't fit into a table reporting this result. So let's round the value to the least significant digit. Fortunately, I have a standard deviation which indicates the least significant digit. (This is just the way I learned it at school.)

Round up (yes, round up to make sure) the standard deviation or the known error to two significant digits:

0.028284271247461926 ~ 0.029

Round the average to the lowest decimal digit of the rounded standard deviation:

6.0299999999999985 ~ 6.030

Voilà:

6.030 +- 0.029


To automate this, some time ago I found an algorithm to round a number "num" to an arbitrary number of significant digits "nsig":


from math import floor, log10

def roundto(num, nsig):
    # round num to nsig significant digits
    if num == 0:
        return 0
    # floor(), not int(): int() truncates toward zero and picks
    # the wrong digit position for numbers smaller than 1
    lg = int(floor(log10(abs(num))))
    r = 10 ** (lg - (nsig - 1))
    # add 0.5 to the magnitude to round half away from zero (works for negatives too)
    sign = 1 if num >= 0 else -1
    return sign * int(abs(num) / r + 0.5) * r

>>> roundto(32.25235,4)
32.25
>>> roundto(32.25235,5)
32.252000000000002

Unfortunately, as one can see, the rounded number is again represented by a floating point number! Therefore an adequate print statement is required as well.
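For example, old-style string formatting can force a fixed number of decimals, here three to match the five significant digits above:

>>> "%.3f" % roundto(32.25235, 5)
'32.252'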

In Python, I recently found a built-in function that does almost the same, "round". The difference is that you specify the decimal place to round to, as opposed to the number of significant digits in "roundto".

>>> round(32.25235,2)
32.25
>>> round(32.25235,-1)
30.0
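Combining "round" with the logarithm trick from "roundto", the whole round-up recipe from above can be automated. The following is a minimal sketch under my own naming, not a library routine; it rounds the error up with "ceil" and rounds the average to the matching decimal place:

from math import ceil, floor, log10

def report(avg, err, nsig=2):
    # decimal position of the last of the nsig significant digits of the error
    pos = int(floor(log10(abs(err)))) - (nsig - 1)
    # round the error up, to be on the safe side
    err_up = ceil(err / 10 ** pos) * 10 ** pos
    # round the average to the same decimal place
    avg_rnd = round(avg, -pos)
    dec = max(-pos, 0)
    return "%.*f +- %.*f" % (dec, avg_rnd, dec, err_up)

>>> report(6.0299999999999985, 0.028284271247461926)
'6.030 +- 0.029'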

I reckon the people committed to implementing arithmetic and other mathematical operations on computers have given a lot of attention to preventing these machine epsilon errors from building up. At least the simulation algorithms are designed with this in mind.