Road to Niagara 3: Hardware setup

March 5, 2018

This is the fourth of a series of posts on the transition to SciNet’s new supercomputer called “Niagara”, which will replace the General Purpose Cluster (GPC) and Tightly Coupled System (TCS). The transition to Niagara will take place in the fall of 2017, and the system is planned to be available to users in early 2018.

The University of Toronto has awarded the contract for Niagara to Lenovo, and some details of the hardware specifications have been released. The system will have the following components:

  • 1,500 nodes.
  • Each node will have 40 Intel Skylake cores (making a total of 60,000 cores) at 2.4 GHz.
  • Each node will have 200 GB (188 GiB) of DDR4 memory.
  • The interconnect between the nodes will be Mellanox EDR InfiniBand in a Dragonfly+ topology.
  • A ~9 PB usable shared parallel filesystem (GPFS) will be mounted on all nodes.
  • A 256 TB Excelero burst buffer (NVMe fabric, up to 160 GB/s) will be available for fast I/O.
  • Peak theoretical speed: 4.61 PetaFLOPS.
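The headline figures in this list can be cross-checked with a little arithmetic. The sketch below is not part of the announcement: it assumes that each Skylake core can perform 32 double-precision floating-point operations per cycle (two AVX-512 fused multiply-add units), which depends on the exact processor model, and it uses the nominal 2.4 GHz clock rather than any turbo or AVX-specific frequency.

```python
# Back-of-the-envelope check of the published Niagara figures.
NODES = 1500
CORES_PER_NODE = 40
CLOCK_HZ = 2.4e9          # nominal clock from the spec list
MEM_PER_NODE_GIB = 188    # per-node memory from the spec list

# ASSUMPTION (not stated in the announcement): 32 double-precision
# FLOPs per cycle per core, i.e. two AVX-512 FMA units per core.
FLOPS_PER_CYCLE = 32

total_cores = NODES * CORES_PER_NODE
peak_pflops = total_cores * CLOCK_HZ * FLOPS_PER_CYCLE / 1e15
mem_per_core_gib = MEM_PER_NODE_GIB / CORES_PER_NODE

print(f"total cores:     {total_cores:,}")
print(f"peak speed:      {peak_pflops:.2f} PFLOPS")
print(f"memory per core: {mem_per_core_gib:.1f} GiB")
```

Under that assumption the numbers line up with the list: 60,000 cores, a peak of about 4.61 PFLOPS, and 4.7 GiB of memory per core.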

Niagara is expected to be installed and operational in March 2018, and ready for users not long after.

Even before the official ready date, there will be a period in which select users can try out Niagara and port their codes to it.

After the friendly-user period, all current users of the GPC (and former users of the TCS) will get access to Niagara.

The large core count, ample memory per core, and fast interconnect support Niagara’s intended purpose: enabling large parallel compute jobs of 512 cores or more.
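To make the 512-core figure concrete, the sketch below converts a core count into whole 40-core nodes. It assumes jobs are allocated entire nodes, in line with the per-node scheduling this post describes; the function name `nodes_needed` and the sample job sizes are purely illustrative.

```python
CORES_PER_NODE = 40   # from the hardware list
TOTAL_NODES = 1500    # from the hardware list


def nodes_needed(cores: int) -> int:
    """Smallest number of whole nodes providing at least `cores` cores."""
    # Integer ceiling division avoids floating-point rounding.
    return (cores + CORES_PER_NODE - 1) // CORES_PER_NODE


# Illustrative job sizes (not taken from the announcement).
for cores in (512, 1024, 4096):
    n = nodes_needed(cores)
    print(f"{cores} cores -> {n} nodes "
          f"({n * CORES_PER_NODE} cores allocated, "
          f"{n / TOTAL_NODES:.1%} of the system)")
```

A 512-core job, for example, would occupy 13 nodes, which is under 1% of the machine.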

The software setup will also be tailored to large parallel computations. Nonetheless, there will still be a fair amount of backfill opportunity for smaller jobs.

The setup of Niagara is intended to be similar in spirit to the GPC, but different in form: scheduling per node; home, scratch and possibly project directories defined in environment variables; a module system; and access to our team of analysts to help you get your codes running, and running well.
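As a rough illustration of directories defined in environment variables, a script can look up its storage locations at run time instead of hard-coding paths. This is only a sketch: `HOME` is standard, but the names `SCRATCH` and `PROJECT` are assumptions made for this example, since the actual variable names are not given here.

```python
import os
from pathlib import Path
from typing import Optional


def storage_dir(var: str) -> Optional[Path]:
    """Return the directory named by environment variable `var`, or None if unset or empty."""
    value = os.environ.get(var)
    return Path(value) if value else None


# HOME is standard; SCRATCH and PROJECT are hypothetical names used
# for illustration only.
for var in ("HOME", "SCRATCH", "PROJECT"):
    path = storage_dir(var)
    print(f"{var}: {path if path is not None else '(not set)'}")
```

Treating an unset variable as `None` lets the same script run whether or not a project directory exists for a given user.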