Stopping your program at the first floating point error

January 12, 2012 in blog, blog-technical

If you know that somewhere in your program, there lurks a catastrophic numerical bug that puts NaNs or Infs into your results and you want to know where it first happens, the search can be a little frustrating. However, as before, the IEEE standard can help you; these illegal events (divide by zero, underflow or overflow, or invalid operations which cause NaNs) can be made to trigger exceptions, which will stop your code right at the point where it happens; then if you run your code through a debugger, you can find the very line where it happens.

We’ll discuss using the gnu compilers here; other compiler suites have similar options.

Let’s take a look at the following Fortran code:

program nantest
    real :: a, b, c
    a = 1.
    b = 2.
    c = a/b
    print *, c,a,b
    a = 0.
    b = 0.
    c = a/b
    print *, c,a,b
    a = 2.
    b = 1.
    c = a/b
    print *,c,a,b
end program nantest

If we compile this code with -ffpe-trap=invalid (I usually add ,zero,overflow , and even underflow if I think that’s causing me a problem in intermediate results), then the debugger can tell us the line where it all goes wrong:

$ gfortran -o nantest nantest.f90 -ffpe-trap=invalid,zero,overflow -g -static
$ gdb nantest
(gdb) run
Starting program: /scratch/ljdursi/Testing/fortran/nantest 
  0.50000000       1.0000000       2.0000000    
Program received signal SIGFPE, Arithmetic exception.
0x0000000000400384 in nantest () at nantest.f90:13
13          c = a/b
Current language:  auto; currently fortran

With the intel fortran compiler (ifort), using the option -fpe0 will do the same thing.

It’s a little tricker with C code; we have to actually insert a call to “feenableexcept()”, which enables floating point exceptions, and is defined in fenv.h;

#include <stdio.h>
#include <fenv.h>
int main(int argc, char **argv) {
    float a, b, c;
    feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
    a = 1.;
    b = 2.;
    c = a/b;
    printf("%f %f %f\n", a, b, c);
    a = 0.;
    b = 0.;
    c = a/b;
    printf("%f %f %f\n", a, b, c);
    a = 2.;
    b = 1.;
    c = a/b;
    printf("%f %f %f\n", a, b, c);
    return 0;

but the effect is the same:

$ gcc -o nantest nantest.c -lm -g
$ gdb ./nantest
(gdb) run
Starting program: /scratch/s/scinet/ljdursi/Testing/exception/nantest 
1.000000 2.000000 0.500000
Program received signal SIGFPE, Arithmetic exception.
0x00000000004005d0 in main (argc=1, argv=0x7fffffffe4b8) at nantest.c:17
17	    c = a/b;

either way, you have a much better handle on where the errors are occuring.

New Free “Intro to HPC” eBook from TACC

January 3, 2012 in blog, blog-technical

To start off the new year, Victor Eijkhout from the Texas Advanced Computing Centre has released a free ebook (you can also buy a printed copy from Lulu) covering the basics of computer architecture for those scientists who want to better understand how to make their code perform well on modern machines; the basic ideas of parallel programming; some numerical fundamentals like ODEs, basic PDEs, and linear algebra; and some application areas like molecular dynamics, graph computations, and monte carlo simulations.   It’s a wide-ranging resource, and well worth taking a look at.

Another good work for scientists wanting to understand the realities of high performance computing is How To Write Fast Numerical Code: A Small Introduction by Chellappa, Franchetti, and Puschel at CMU.  This smaller, more focussed tutorial comes out of a course one of the authors taught in 2005 and again in 2008, and the course materials themselves are an interesting resource.

Testing Roundoff

November 23, 2011 in blog, blog-technical

A talk has been circulating (HT: Hacker News) from a conference celebrating 50 years of scientific computing at Stanford where the author, William Kahan, discusses an old and sadly disused trick for testing the numerical stability of the implementation of an algorithm that should work with any C99 or Fortran2003 compiler without changing the underlying code.    It’s definitely a tool that’s worth having in your toolbox, so we’re going to talk about it here.

We’ll consider a simple numerical problem; imagine a projectile launched from height h=0 with velocity v0=5000 m s-1, and subject to the Earth’s gravitational accelleration, g = 9.81 m s-2. We’re going to ask when the (first) time is that the projectile hits a height h.

This is going to be an application of our friend the quadratic equation:
[latex] r = \frac{-b \pm \sqrt{b^2 – 4 a c}}{2 a}[/latex].
Now, because of the repeated subtraction, a naive implementation of this equation is known to undergo catastrophic cancellation near b2=4ac, or for where the discriminant is much less than b — in our case, near the ends and the peak of the projectile’s trajectory.   We’re going to demonstrate that below.

Now, before we show that such sensitivity can happen, we should ask — why would we care? If we test our code and know it gives “good enough” answers under the conditions that matter to us, does it really matter what could happen in other circumstances? The answer, of course, is yes. There are a lot of things we could want to do — increase the agressiveness of compiler optimizations when compiling our code, for instance — which will have the effect of numerically perturbing our computation; and we need to know if those small perturbations will have small, or large, effects on our answers.

It turns out that IEEE 754, the standard for floating point numbers, can give us some help with this. (Everyone who does numerical work should know at least a little bit about the floating point standard, or at least the issues involved with floating point numbers. What every computer scientist should know about floating point, particularly the first few sections, is an essential guide). The floating point standard – which almost all widely-used computing hardware should support – allows you to set certain properties of the mathematics “on the fly”. One particularly useful feature is the ability to set how the last digit of all floating point operations are rounded – to nearest (the default), to zero (eg, always truncate), to positive infinity (eg, always round up) or to negative infinity (always round down). In the C99 standard, this is implemented in the “fenv.h” header and the math library; in Fortran2003, this is part of the intrinsic IEEE_ARITHMETIC module, where you can call IEEE_SET_ROUNDING_MODE.

By changing the rounding, you are perturbing every floating point operation in your calculation. If this perturbation results in significant changes in your result, then your calculation is very fragile, and you may have to look into re-writing the calculation, using another algorithm, or resorting to using higher precision for that calculation (which will push the perturbations to less significant decimal places). If not, then you have some evidence that your calculation is robust to perturbations, at least in the last bit.

Below we have an example of how you’d do this in C. We have a simple routine which uses the obvious implementation of the quadratic equation to calculate the time when the projectile is at one meter, and we perform this calculation with all available rounding modes:


#include <stdio.h>
#include <math.h>
#include <fenv.h>
const int NOSOLN=-1;
const int SOLN  = 0;
int time(const float vo, const float g, const float ho, float *time) {
    float disc  = (vo*vo - 2.*g*ho);
    if (disc &lt; 0) return NOSOLN;
    disc = sqrt(disc);
    float root1 = (vo + disc)/g;
    float root2 = (vo - disc)/g;
    if ((root2 &gt;= 0.) &amp;&amp; root2 &lt; root1)
        *time = root2;
        *time = root1;
    return SOLN;
int main(int argc, char **argv) {
    const float g =9.81;
    const float vo=5000.;
    const int   ho=1.;
    int nroundings=4;
    char *names[]  ={"To nearest", "To +inf", "To -inf", "To zero"};
    for (int r=0; r&lt;nroundings; r++) {
        int status = fesetround(roundings[r]);
        if (status) {
            fprintf(stderr,"Could not set rounding to '%s'.\n", names[r]);
        } else {
            float soln;
            time(vo, g, ho, &amp;soln);
            printf("%s: %f\n", names[r], soln);
    return 0;

We compile the code with gcc (any C99 compiler should work):

$ gcc -O0 -Wall -std=c99 quadratic.c -o quadratic -lm

Note that we need to explicitly link in the math library, and to turn off optimization (so that the compiler doesn’t replace the repeated calls to time() with a single call). Running this, we find:

$ ./quadratic
To nearest: 0.000199
To +inf: 0.000149
To -inf: 0.000249
To zero: 0.000249

Changing the rounding modes changes the result by 50%! This shows that our current implementation – which is not giving obviously wrong answers – is extremely fragile in the presence of numerical noise, and we should exercise extreme caution with compiler flags, etc. (We’ll leave it as an exercise for the reader to come up with a better approach!)