Mist GPU Cluster

February 12, 2020 in Systems

The Mist system is a cluster of 54 IBM servers, each with 4 NVIDIA V100 “Volta” GPUs that have 32 GB of memory each, with NVLink connections in between. Each node of the cluster has 256 GB of RAM. The nodes are connected by an InfiniBand EDR interconnect that provides GPU-Direct RDMA capability.

This system is a combination of the GPU extension to the Niagara cluster and the refresh of the GPU cluster of the Southern Ontario Smart Computing Innovation Platform (SOSCIP). The Niagara GPU portion is available to Compute Canada users, while the SOSCIP portion will be used by allocated SOSCIP projects. By combining the resources, users from either group are able to take advantage of any unused computing resources of the other group.

The user experience on Mist is similar to that on Niagara, in that it uses the same scheduler and software module framework. At the same time, it is a very different machine from Niagara, so virtually all details (software, scheduler parameters, compilers, performance) are different.

The system is currently in its beta testing phase, but will become accessible to Niagara users on Wednesday, February 12, 2020.

Specifics of the cluster:

  • Mist consists of 54 nodes.
  • Each node has 32 IBM Power9 CPU cores (1728 cores in total, each four-way threaded) and 4 NVIDIA V100 Volta GPUs (more than one million CUDA cores in total).
  • There is 256 GB of RAM per node.
  • EDR InfiniBand one-to-one network with GPU-Direct RDMA capability.
  • Shares file systems with the Niagara cluster (parallel filesystem: IBM Spectrum Scale, formerly known as GPFS).
  • No local disks.
  • Theoretical peak performance (“Rpeak”) of 1.6 PF (double precision), 3.2 PF (single precision).
  • Technical documentation can be found at SciNet’s documentation wiki.
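
The aggregate figures in the list above follow from the per-node numbers. As a rough sanity check, the short Python sketch below recomputes them. The constant and function names are illustrative only, and the figure of 5120 CUDA cores per V100 is an assumption taken from NVIDIA's published specifications rather than from this announcement.

```python
# Per-node figures as stated in the list above.
NODES = 54
CPU_CORES_PER_NODE = 32
THREADS_PER_CORE = 4       # "four-way threaded"
GPUS_PER_NODE = 4
GPU_MEMORY_GB = 32
RAM_PER_NODE_GB = 256

# Assumption (not stated in the announcement): NVIDIA's published
# CUDA core count for a single V100 GPU.
CUDA_CORES_PER_V100 = 5120


def cluster_totals():
    """Return cluster-wide totals derived from the per-node figures."""
    gpus = NODES * GPUS_PER_NODE
    cpu_cores = NODES * CPU_CORES_PER_NODE
    return {
        "cpu_cores": cpu_cores,
        "hardware_threads": cpu_cores * THREADS_PER_CORE,
        "gpus": gpus,
        "cuda_cores": gpus * CUDA_CORES_PER_V100,
        "gpu_memory_gb": gpus * GPU_MEMORY_GB,
        "ram_gb": NODES * RAM_PER_NODE_GB,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the sketch gives 1728 CPU cores and 216 GPUs, and the CUDA core total comes out just above one million, in line with the figures quoted in the list.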