Mist GPU Cluster

February 12, 2020 in Systems

The Mist system is a cluster of 54 IBM servers, each with four NVIDIA V100 “Volta” GPUs with 32 GB of memory each, interconnected with NVLink. Each node of the cluster has 256 GB of RAM. The nodes are connected by an InfiniBand EDR network providing GPUDirect RDMA capability.

This system is a combination of the GPU extension to the Niagara cluster and the refresh of the GPU cluster of the Southern Ontario Smart Computing Innovation Platform (SOSCIP). The Niagara GPU portion is available to Compute Canada users, while the SOSCIP portion will be used by allocated SOSCIP projects. By combining the resources, users from either group are able to take advantage of any unused computing resources of the other group.

The user experience on Mist is similar to that on Niagara, in that it uses the same scheduler and software module framework. At the same time, it is a very different machine from Niagara, so virtually all details (software, scheduler parameters, compilers, performance) are different.
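Since Mist uses the same scheduler and module framework as Niagara, a GPU job submission might look like the sketch below. This is only an illustration: the partition name, module name, and program name are placeholders, not the actual Mist configuration; consult the SciNet documentation wiki for the real settings.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4        # request all 4 V100s on a node
#SBATCH --time=01:00:00
#SBATCH --partition=compute      # placeholder partition name

module load cuda                 # placeholder module name
srun ./my_gpu_program            # placeholder program
```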

The system is currently in its beta testing phase, but will become accessible to Niagara users on Wednesday, February 12, 2020.

Specifics of the cluster:

  • Mist consists of 54 nodes.
  • Each node has 32 IBM POWER9 CPU cores (total core count: 1,728, all four-way threaded) and 4 NVIDIA V100 “Volta” GPUs (more than one million CUDA cores in total).
  • There is 256 GB of RAM per node.
  • EDR InfiniBand one-to-one network with GPUDirect RDMA capability.
  • Shares file systems with the Niagara cluster (parallel filesystem: IBM Spectrum Scale, formerly known as GPFS).
  • No local disks.
  • Theoretical peak performance (“Rpeak”) of 1.6 PF (double precision), 3.2 PF (single precision).
  • Technical documentation can be found at SciNet’s documentation wiki.
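The totals above can be checked with a quick back-of-the-envelope calculation. This sketch assumes the published V100 SXM2 figures of 5,120 CUDA cores per GPU and roughly 7.8 TFLOPS peak double precision, which are not stated in the text itself.

```python
# Back-of-the-envelope check of the Mist numbers quoted above.
# Assumes the published V100 SXM2 specs: 5120 CUDA cores per GPU
# and ~7.8 TFLOPS peak double precision.
nodes = 54
gpus_per_node = 4
cuda_cores_per_gpu = 5120
dp_tflops_per_gpu = 7.8

total_gpus = nodes * gpus_per_node                    # 216 GPUs
total_cuda_cores = total_gpus * cuda_cores_per_gpu
rpeak_pf = total_gpus * dp_tflops_per_gpu / 1000.0    # TF -> PF

print(total_cuda_cores)    # 1105920 -- indeed over one million
print(round(rpeak_pf, 2))  # 1.68 -- consistent with the quoted 1.6 PF
```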

Niagara Supercomputer

March 5, 2018 in Systems

Niagara is a homogeneous cluster of initially 61,920 cores, expanded in 2020 to 80,640 cores, owned by the University of Toronto and operated by SciNet. This system is intended to enable large parallel jobs of 1,024 cores and more. It is the most powerful supercomputer in Canada available for academic research. Compute allocations are handled through Compute Canada’s annual resource allocation competition. Niagara was designed to optimize throughput of a range of scientific codes running at scale, energy efficiency, and network and storage performance and capacity.

Niagara was officially launched on March 5, 2018. The expansion became available in March 2020.

Specifics of the cluster:

  • Niagara consists of 2,016 nodes.
  • Each node has 40 Intel Skylake cores at 2.4 GHz or Cascade Lake cores at 2.5 GHz, for a total of 80,640 cores.
  • There is 202 GB (188 GiB) of RAM per node.
  • EDR InfiniBand network in a so-called ‘Dragonfly+’ topology.
  • 6PB of scratch, 3PB of project space (parallel filesystem: IBM Spectrum Scale, formerly known as GPFS).
  • 256 TB burst buffer (Excelero + IBM Spectrum Scale).
  • No local disks.
  • Theoretical peak performance (“Rpeak”) of 6 PF.
  • Sustained LINPACK performance (“Rmax”) of 4 PF.
  • About 900 kW power consumption.
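The 6 PF Rpeak figure can be roughly reconstructed from the core count and clock speed. This sketch assumes 32 double-precision FLOPs per cycle per core (AVX-512 with two FMA units), which is an assumption about the Skylake-SP parts used, not a figure stated above.

```python
# Rough reconstruction of Niagara's quoted 6 PF Rpeak.
# Assumes 32 double-precision FLOPs per cycle per core
# (AVX-512 with two FMA units) -- an assumption, not a quoted spec.
cores = 80640
clock_ghz = 2.4
flops_per_cycle = 32

# cores * GHz gives total giga-cycles/s; times FLOPs/cycle gives
# GFLOPS; divide by 1e6 to convert GFLOPS to PFLOPS.
rpeak_pf = cores * clock_ghz * flops_per_cycle / 1e6

print(round(rpeak_pf, 1))  # 6.2 -- close to the quoted 6 PF
```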

High Performance Storage System (HPSS)

November 8, 2016 in for_researchers, for_users, Systems, Uncategorized

The High Performance Storage System (HPSS) is a tape-backed hierarchical storage system that provides a significant portion of the allocated storage space at SciNet. It is a repository for archiving data that is not being actively used. Data can be returned to the active filesystem on the compute clusters when it is needed.

SciNet’s HPSS currently stores nearly 9 PB of data.

For more information, see the technical documentation on the SciNet wiki.


SOSCIP GPU Cluster

September 3, 2016 in Systems, Uncategorized

The SOSCIP GPU Cluster (SGC) is a Southern Ontario Smart Computing Innovation Platform (SOSCIP) resource located at the University of Toronto’s SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario.

The SOSCIP GPU Cluster consists of 14 IBM Power System S822LC “Minsky” servers, each with two 10-core 3.25 GHz POWER8 CPUs and 512 GB of RAM. Like the POWER7, the POWER8 uses Simultaneous Multithreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has four NVIDIA Tesla P100 GPUs, each with 16 GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.
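The per-node thread count and cluster-wide GPU count follow directly from the figures above:

```python
# SMT and GPU totals for the SOSCIP GPU Cluster, from the figures above.
nodes = 14
cores_per_node = 2 * 10       # two 10-core POWER8 CPUs
smt_threads_per_core = 8      # POWER8 SMT-8

threads_per_node = cores_per_node * smt_threads_per_core
total_gpus = nodes * 4        # four P100s per node

print(threads_per_node)  # 160 hardware threads per node
print(total_gpus)        # 56 P100 GPUs in the cluster
```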

Allocations of this system are handled through the Southern Ontario Smart Computing Innovation Platform (SOSCIP).

A quickstart guide to using this GPU cluster can be found on SciNet’s technical documentation wiki.

Teach Cluster

August 19, 2016 in Systems

Teach is a cluster of 672 cores at SciNet that has been assembled from older re-purposed compute hardware. Access to this small, homogeneous cluster is provided primarily for local teaching purposes. It is configured similarly to the production Niagara system.

The cluster consists of 42 repurposed nodes each with 16 cores (two 8-core Intel Xeon 2.0GHz Sandybridge CPUs), with 64GB of RAM per node.

The nodes are interconnected with 2.6:1 blocking QDR InfiniBand for MPI communications and disk I/O to the SciNet Niagara filesystems.

The Teach cluster came online in the Fall of 2018 and is currently, in the winter term of 2019, used in three courses at the University of Toronto.

UofT faculty members interested in using this cluster in their teaching can contact us at support@scinet.utoronto.ca.