SciNet’s publication about Niagara deployment

August 3, 2019 in blog, blog-general, blog-technical, for_press, for_researchers, for_users, frontpage, news, Road_to_Niagara

Have you ever wondered how a supercomputer is designed and brought to life?
Read SciNet’s latest paper on the deployment of Canada’s fastest supercomputer: Niagara.

Niagara is currently the fastest supercomputer accessible to academics in Canada.
In this paper we describe the transition process from our previous systems, the TCS and GPC, the procurement and deployment processes, as well as the unique features that make Niagara a one-of-a-kind machine in Canada.

Please cite this paper when using Niagara to run your computations, simulations or analysis:
“Deploying a Top-100 Supercomputer for Large Parallel Workloads: the Niagara Supercomputer”, Ponce et al, “Proceedings of PEARC’19: Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning)”, 34 (2019).

Learn more about SciNet’s research and publications by visiting the following link.

Road to Niagara 3: Hardware setup

March 5, 2018 in blog-technical, for_press, for_researchers, for_users, news, Road_to_Niagara, Uncategorized

This is the fourth of a series of posts on the transition to SciNet’s new supercomputer called “Niagara”, which will replace the General Purpose Cluster (GPC) and Tightly Coupled Cluster (TCS). The transition to Niagara will take place in the fall of 2017, and the system is planned to be available to users in early 2018.

The University of Toronto has awarded the contract for Niagara to Lenovo, and some of the details of the hardware specifications of the Niagara system have been released:

The system will have the following hardware components:

  • 1,500 nodes.
  • Each node will have 40 Intel Skylake cores (making a total of 60,000 cores) at 2.4 GHz.
  • Each node will have 200 GB (188 GiB)of DDR4 memory.
  • The interconnect between the nodes will be Mellanox EDR Infiniband in a Dragonfly+ topology.
  • A ~9PB usable shared parallel filesystem (GPFS) will be mounted on all nodes.
  • A 256TB Excelero burst buffer (NVMe fabric, up to 160 GB/s) will be available for fast I/O.
  • Peak theoretical speed: 4.61 PetaFLOPS

Niagara is estimated to be installed and operational towards in March 2018, and ready for users not too long after.

Even before official ready-date, there will a period in which select users can try out and port their codes to Niagara.

After the friendly-user period, all current users of the GPC (and former users of the TCS) will get access to Niagara.

The large core count, ample memory per core, and fast interconnect support Niagara’s intended purpose to enable large parallel compute jobs of 512 cores or more.

The software setup will also be tailored to large parallel computations. Nonetheless, there will still be a fair amount of backfill opportunity for smaller jobs.

The setup of Niagara is intended to be similar in spirit to the GPC, but different in form: scheduling per node, a home, scratch and possibly project directory defined in environment variables, a module system, and access to our team of analyst to help you get your codes running, and running well.

Gravitation waves detected, again!

June 15, 2016 in blog, blog-general, blog-technical, for_press, for_researchers, for_users, in_the_news, news, success_story, Testimonials

We congratulate the LIGO and Virgo collaborations to the second-ever observation of gravitational waves from colliding black holes.

SciNet is proud to have contributed to the computation of the waveform templates that were used in this latest discovery of LIGO. LIGO measured about 55 gravitational wave cycles for this new binary black hole system. This large number of cycles made detailed computations of the expected wave-shapes more important than for the first detected black hole merger that was announced in February.

Canada is a leader of numerical calculations of colliding black holes, research led by Professor Harald Pfeiffer, Canada Research Chair for Numerical Relativity and Gravitational Wave Astrophysics at the University of Toronto. Pfeiffer states: I am very grateful for the sustained support of the SciNet team during the last 7 years; their support and the access to computing time on SciNet’s supercomputers have been crucial for my research program and its profound contributions to the LIGO discovery.


Above: The in-spiral and collision of two black holes similar to GW151226. The top portion of the frame shows the horizons of the two holes, in this case, at the moment close to the merger of the black holes. The middle portion of the frame shows the gravitational waveform projected onto the LIGO Livingston detector. The bottom part shows the frequency of the gravitational waves, gradually increasing from about 35Hz to above 700Hz. For this system, LIGO could observe many more gravitational wave cycles than for the first discoved system (named GW150914).

Visualization done by University of Toronto Undergraduate student Aliya Babul & Prof. Harald Pfeiffer, within the SXS Collaboration/

Computing and Planet-Finding

October 17, 2012 in blog, blog-general, blog-technical

An Earth-sized planet has been found around the star closest to the Sun, Alpha Centauri – and while the astronomers used a telescope, it was only with big computing that they could first “see” the planet.

Read the rest of this entry →

Stopping your program at the first floating point error

January 12, 2012 in blog, blog-technical

If you know that somewhere in your program, there lurks a catastrophic numerical bug that puts NaNs or Infs into your results and you want to know where it first happens, the search can be a little frustrating. However, as before, the IEEE standard can help you; these illegal events (divide by zero, underflow or overflow, or invalid operations which cause NaNs) can be made to trigger exceptions, which will stop your code right at the point where it happens; then if you run your code through a debugger, you can find the very line where it happens.

We’ll discuss using the gnu compilers here; other compiler suites have similar options.

Let’s take a look at the following Fortran code:

program nantest
    real :: a, b, c
    a = 1.
    b = 2.
    c = a/b
    print *, c,a,b
    a = 0.
    b = 0.
    c = a/b
    print *, c,a,b
    a = 2.
    b = 1.
    c = a/b
    print *,c,a,b
end program nantest

If we compile this code with -ffpe-trap=invalid (I usually add ,zero,overflow , and even underflow if I think that’s causing me a problem in intermediate results), then the debugger can tell us the line where it all goes wrong:

$ gfortran -o nantest nantest.f90 -ffpe-trap=invalid,zero,overflow -g -static
$ gdb nantest
(gdb) run
Starting program: /scratch/ljdursi/Testing/fortran/nantest 
  0.50000000       1.0000000       2.0000000    
Program received signal SIGFPE, Arithmetic exception.
0x0000000000400384 in nantest () at nantest.f90:13
13          c = a/b
Current language:  auto; currently fortran

With the intel fortran compiler (ifort), using the option -fpe0 will do the same thing.

It’s a little tricker with C code; we have to actually insert a call to “feenableexcept()”, which enables floating point exceptions, and is defined in fenv.h;

#include <stdio.h>
#include <fenv.h>
int main(int argc, char **argv) {
    float a, b, c;
    feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
    a = 1.;
    b = 2.;
    c = a/b;
    printf("%f %f %f\n", a, b, c);
    a = 0.;
    b = 0.;
    c = a/b;
    printf("%f %f %f\n", a, b, c);
    a = 2.;
    b = 1.;
    c = a/b;
    printf("%f %f %f\n", a, b, c);
    return 0;

but the effect is the same:

$ gcc -o nantest nantest.c -lm -g
$ gdb ./nantest
(gdb) run
Starting program: /scratch/s/scinet/ljdursi/Testing/exception/nantest 
1.000000 2.000000 0.500000
Program received signal SIGFPE, Arithmetic exception.
0x00000000004005d0 in main (argc=1, argv=0x7fffffffe4b8) at nantest.c:17
17	    c = a/b;

either way, you have a much better handle on where the errors are occuring.

New Free “Intro to HPC” eBook from TACC

January 3, 2012 in blog, blog-technical

To start off the new year, Victor Eijkhout from the Texas Advanced Computing Centre has released a free ebook (you can also buy a printed copy from Lulu) covering the basics of computer architecture for those scientists who want to better understand how to make their code perform well on modern machines; the basic ideas of parallel programming; some numerical fundamentals like ODEs, basic PDEs, and linear algebra; and some application areas like molecular dynamics, graph computations, and monte carlo simulations.   It’s a wide-ranging resource, and well worth taking a look at.

Another good work for scientists wanting to understand the realities of high performance computing is How To Write Fast Numerical Code: A Small Introduction by Chellappa, Franchetti, and Puschel at CMU.  This smaller, more focussed tutorial comes out of a course one of the authors taught in 2005 and again in 2008, and the course materials themselves are an interesting resource.

Testing Roundoff

November 23, 2011 in blog, blog-technical

A talk has been circulating (HT: Hacker News) from a conference celebrating 50 years of scientific computing at Stanford where the author, William Kahan, discusses an old and sadly disused trick for testing the numerical stability of the implementation of an algorithm that should work with any C99 or Fortran2003 compiler without changing the underlying code.    It’s definitely a tool that’s worth having in your toolbox, so we’re going to talk about it here.

We’ll consider a simple numerical problem; imagine a projectile launched from height h=0 with velocity v0=5000 m s-1, and subject to the Earth’s gravitational accelleration, g = 9.81 m s-2. We’re going to ask when the (first) time is that the projectile hits a height h.

This is going to be an application of our friend the quadratic equation:
[latex] r = \frac{-b \pm \sqrt{b^2 – 4 a c}}{2 a}[/latex].
Now, because of the repeated subtraction, a naive implementation of this equation is known to undergo catastrophic cancellation near b2=4ac, or for where the discriminant is much less than b — in our case, near the ends and the peak of the projectile’s trajectory.   We’re going to demonstrate that below.

Now, before we show that such sensitivity can happen, we should ask — why would we care? If we test our code and know it gives “good enough” answers under the conditions that matter to us, does it really matter what could happen in other circumstances? The answer, of course, is yes. There are a lot of things we could want to do — increase the agressiveness of compiler optimizations when compiling our code, for instance — which will have the effect of numerically perturbing our computation; and we need to know if those small perturbations will have small, or large, effects on our answers.

It turns out that IEEE 754, the standard for floating point numbers, can give us some help with this. (Everyone who does numerical work should know at least a little bit about the floating point standard, or at least the issues involved with floating point numbers. What every computer scientist should know about floating point, particularly the first few sections, is an essential guide). The floating point standard – which almost all widely-used computing hardware should support – allows you to set certain properties of the mathematics “on the fly”. One particularly useful feature is the ability to set how the last digit of all floating point operations are rounded – to nearest (the default), to zero (eg, always truncate), to positive infinity (eg, always round up) or to negative infinity (always round down). In the C99 standard, this is implemented in the “fenv.h” header and the math library; in Fortran2003, this is part of the intrinsic IEEE_ARITHMETIC module, where you can call IEEE_SET_ROUNDING_MODE.

By changing the rounding, you are perturbing every floating point operation in your calculation. If this perturbation results in significant changes in your result, then your calculation is very fragile, and you may have to look into re-writing the calculation, using another algorithm, or resorting to using higher precision for that calculation (which will push the perturbations to less significant decimal places). If not, then you have some evidence that your calculation is robust to perturbations, at least in the last bit.

Below we have an example of how you’d do this in C. We have a simple routine which uses the obvious implementation of the quadratic equation to calculate the time when the projectile is at one meter, and we perform this calculation with all available rounding modes:


#include <stdio.h>
#include <math.h>
#include <fenv.h>
const int NOSOLN=-1;
const int SOLN  = 0;
int time(const float vo, const float g, const float ho, float *time) {
    float disc  = (vo*vo - 2.*g*ho);
    if (disc &lt; 0) return NOSOLN;
    disc = sqrt(disc);
    float root1 = (vo + disc)/g;
    float root2 = (vo - disc)/g;
    if ((root2 &gt;= 0.) &amp;&amp; root2 &lt; root1)
        *time = root2;
        *time = root1;
    return SOLN;
int main(int argc, char **argv) {
    const float g =9.81;
    const float vo=5000.;
    const int   ho=1.;
    int nroundings=4;
    char *names[]  ={"To nearest", "To +inf", "To -inf", "To zero"};
    for (int r=0; r&lt;nroundings; r++) {
        int status = fesetround(roundings[r]);
        if (status) {
            fprintf(stderr,"Could not set rounding to '%s'.\n", names[r]);
        } else {
            float soln;
            time(vo, g, ho, &amp;soln);
            printf("%s: %f\n", names[r], soln);
    return 0;

We compile the code with gcc (any C99 compiler should work):

$ gcc -O0 -Wall -std=c99 quadratic.c -o quadratic -lm

Note that we need to explicitly link in the math library, and to turn off optimization (so that the compiler doesn’t replace the repeated calls to time() with a single call). Running this, we find:

$ ./quadratic
To nearest: 0.000199
To +inf: 0.000149
To -inf: 0.000249
To zero: 0.000249

Changing the rounding modes changes the result by 50%! This shows that our current implementation – which is not giving obviously wrong answers – is extremely fragile in the presence of numerical noise, and we should exercise extreme caution with compiler flags, etc. (We’ll leave it as an exercise for the reader to come up with a better approach!)