Mist GPU Cluster

February 12, 2020 in Systems

The Mist system is a cluster of 54 IBM servers, each with four NVIDIA V100 “Volta” GPUs with 32 GB of memory each, interconnected with NVLink. Each node of the cluster has 256 GB of RAM. The nodes are connected by an InfiniBand EDR network providing GPUDirect RDMA capability.

This system is a combination of the GPU extension to the Niagara cluster and the refresh of the GPU cluster of the Southern Ontario Smart Computing Innovation Platform (SOSCIP). The Niagara GPU portion is available to Compute Canada users, while the SOSCIP portion will be used by allocated SOSCIP projects. By combining the resources, users from either group are able to take advantage of any unused computing resources of the other group.

The user experience on Mist is similar to that on Niagara, in that it uses the same scheduler and software module framework. At the same time, it is a very different machine from Niagara, so virtually all details (software, scheduler parameters, compilers, performance) are different.
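Since Mist uses the same scheduler and module framework as Niagara, a GPU job submission might look like the sketch below. This is only an illustration: the partition name, module name, and program name are placeholders, not the actual Mist configuration; consult the SciNet documentation wiki for the real settings.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4        # request all 4 V100s on a node
#SBATCH --time=01:00:00
#SBATCH --partition=compute      # placeholder partition name

module load cuda                 # placeholder module name
srun ./my_gpu_program            # placeholder program
```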

The system is currently in its beta testing phase, but will become accessible to Niagara users on Wednesday, February 12, 2020.

Specifics of the cluster:

  • Mist consists of 54 nodes.
  • Each node has 32 IBM POWER9 CPU cores (total core count: 1,728, all four-way threaded) and 4 NVIDIA V100 “Volta” GPUs (more than one million CUDA cores in total).
  • There is 256 GB of RAM per node.
  • EDR InfiniBand one-to-one network with GPUDirect RDMA capability.
  • Shares file systems with the Niagara cluster (parallel filesystem: IBM Spectrum Scale, formerly known as GPFS).
  • No local disks.
  • Theoretical peak performance (“Rpeak”) of 1.6 PF (double precision), 3.2 PF (single precision).
  • Technical documentation can be found at SciNet’s documentation wiki.
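The totals above can be checked with a quick back-of-the-envelope calculation. This sketch assumes the published V100 SXM2 figures of 5,120 CUDA cores per GPU and roughly 7.8 TFLOPS peak double precision, which are not stated in the text itself.

```python
# Back-of-the-envelope check of the Mist numbers quoted above.
# Assumes the published V100 SXM2 specs: 5120 CUDA cores per GPU
# and ~7.8 TFLOPS peak double precision.
nodes = 54
gpus_per_node = 4
cuda_cores_per_gpu = 5120
dp_tflops_per_gpu = 7.8

total_gpus = nodes * gpus_per_node                    # 216 GPUs
total_cuda_cores = total_gpus * cuda_cores_per_gpu
rpeak_pf = total_gpus * dp_tflops_per_gpu / 1000.0    # TF -> PF

print(total_cuda_cores)    # 1105920 -- indeed over one million
print(round(rpeak_pf, 2))  # 1.68 -- consistent with the quoted 1.6 PF
```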

Niagara Supercomputer

March 5, 2018 in Systems

Niagara is a homogeneous cluster of initially 61,920 cores, expanded in 2020 to 80,640 cores, owned by the University of Toronto and operated by SciNet. This system is intended to enable large parallel jobs of 1,024 cores and more. It is the most powerful supercomputer in Canada available for academic research. Compute allocations are handled through Compute Canada’s annual resource allocation competition. Niagara was designed to optimize throughput of a range of scientific codes running at scale, energy efficiency, and network and storage performance and capacity.

Niagara was officially launched on March 5, 2018. The expansion became available in March 2020.

Specifics of the cluster:

  • Niagara consists of 2,016 nodes.
  • Each node has 40 Intel Skylake cores at 2.4 GHz or Cascade Lake cores at 2.5 GHz, for a total of 80,640 cores.
  • There is 202 GB (188 GiB) of RAM per node.
  • EDR InfiniBand network in a so-called ‘Dragonfly+’ topology.
  • 6PB of scratch, 3PB of project space (parallel filesystem: IBM Spectrum Scale, formerly known as GPFS).
  • 256 TB burst buffer (Excelero + IBM Spectrum Scale).
  • No local disks.
  • Theoretical peak performance (“Rpeak”) of 6 PF.
  • Sustained LINPACK performance (“Rmax”) of 4 PF.
  • About 900 kW power consumption.
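The 6 PF Rpeak figure can be roughly reconstructed from the core count and clock speed. This sketch assumes 32 double-precision FLOPs per cycle per core (AVX-512 with two FMA units), which is an assumption about the Skylake-SP parts used, not a figure stated above.

```python
# Rough reconstruction of Niagara's quoted 6 PF Rpeak.
# Assumes 32 double-precision FLOPs per cycle per core
# (AVX-512 with two FMA units) -- an assumption, not a quoted spec.
cores = 80640
clock_ghz = 2.4
flops_per_cycle = 32

# cores * GHz gives total giga-cycles/s; times FLOPs/cycle gives
# GFLOPS; divide by 1e6 to convert GFLOPS to PFLOPS.
rpeak_pf = cores * clock_ghz * flops_per_cycle / 1e6

print(round(rpeak_pf, 1))  # 6.2 -- close to the quoted 6 PF
```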

High Performance Storage System (HPSS)

November 8, 2016 in for_researchers, for_users, Systems, Uncategorized

The High Performance Storage System (HPSS) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active filesystem on the compute clusters when it is needed.

SciNet’s HPSS currently stores nearly 9 PB of data.

For more information, see the technical documentation on the SciNet wiki.


SOSCIP GPU Cluster

September 3, 2016 in Systems, Uncategorized

The SOSCIP GPU Cluster (SGC) is a Southern Ontario Smart Computing Innovation Platform (SOSCIP) resource located at the University of Toronto’s SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario.

The SOSCIP GPU Cluster consists of 14 IBM Power System S822LC “Minsky” servers, each with two 10-core 3.25 GHz POWER8 CPUs and 512 GB of RAM. Like the POWER7, the POWER8 uses Simultaneous Multithreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has four NVIDIA Tesla P100 GPUs, each with 16 GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.
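The per-node thread count and cluster-wide GPU count follow directly from the figures above:

```python
# SMT and GPU totals for the SOSCIP GPU Cluster, from the figures above.
nodes = 14
cores_per_node = 2 * 10       # two 10-core POWER8 CPUs
smt_threads_per_core = 8      # POWER8 SMT-8

threads_per_node = cores_per_node * smt_threads_per_core
total_gpus = nodes * 4        # four P100s per node

print(threads_per_node)  # 160 hardware threads per node
print(total_gpus)        # 56 P100 GPUs in the cluster
```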

Allocations of this system are handled through the Southern Ontario Smart Computing Innovation Platform (SOSCIP).

A quickstart guide to using this GPU cluster can be found on SciNet’s technical documentation wiki.

Teach Cluster

August 19, 2016 in Systems

Teach is a cluster of 672 cores at SciNet that has been assembled from older re-purposed compute hardware. Access to this small, homogeneous cluster is provided primarily for local teaching purposes. It is configured similarly to the production Niagara system.

The cluster consists of 42 repurposed nodes each with 16 cores (two 8-core Intel Xeon 2.0GHz Sandybridge CPUs), with 64GB of RAM per node.

The nodes are interconnected with 2.6:1 blocking QDR InfiniBand for MPI communications and disk I/O to the SciNet Niagara filesystems.

The Teach cluster came online in the Fall of 2018 and is currently, in the winter term of 2019, used in three courses at the University of Toronto.

UofT faculty members interested in using this cluster in their teaching can contact us at support@scinet.utoronto.ca.