Niagara at Scale – Oct 2021

September 12, 2021 in for_press, for_researchers, for_users, news

SciNet held a successful "Niagara at Scale" event in the spring of 2021, during which Niagara was reserved for 48 hours for large parallel computations approaching the size of the full cluster. Such "heroic computations" are Niagara's mandate, but they are hard to run within the regular batch scheduler.

Because of continued demand for such large computations, another "Niagara at Scale" event will be held for two days in October 2021. Exact dates will be announced in the near future.

Purpose of “Niagara at Scale”

These events enable pre-approved projects that require all or nearly all of the capacity of the Niagara supercomputer at once. Such heroic computations are Niagara's mandate: it is the "Large Parallel" cluster within the national systems of the Compute Canada Federation, and the fastest machine of its kind in Canada according to the TOP500 list. But computations of this size (massively parallel codes running on tens of thousands of cores) are hard or impossible to run within the regular batch scheduler.
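For a sense of what a job at this scale involves: Niagara is scheduled with Slurm, and a full-scale run spans hundreds of nodes. The sketch below shows what such a submission script could look like; the node count, wall time, and program name are placeholder assumptions, not taken from any actual proposal.

```shell
#!/bin/bash
#SBATCH --nodes=500              # hypothetical node count (~20,000 cores at 40 cores/node)
#SBATCH --ntasks-per-node=40     # one MPI rank per core
#SBATCH --time=12:00:00          # placeholder wall time
#SBATCH --job-name=at-scale-run

# my_solver is a stand-in for the user's massively parallel MPI application
mpirun ./my_solver input.cfg
```

A reservation event sidesteps the usual difficulty: in normal operation, the scheduler rarely has 500 nodes free simultaneously, so a job this wide can wait in the queue indefinitely.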

How to apply

Announcements of this event went out to Niagara users on August 16, 2021, with an application deadline of September 7. The selection of participants is currently being made.

Future events

There will be more Niagara at Scale events in the future. If you are a Niagara user who has massively parallel jobs or workflows that could take advantage of this opportunity, keep an eye out for future announcements.

SciNet Virtual Summer Training Program 2021

June 2, 2021 in for_educators, for_researchers, for_users, frontpage, news

For the second summer in a row, in lieu of its annual Ontario Summer School, SciNet will be offering weekly virtual summer training on Advanced Research Computing from June through to early September. Topics will include parallel programming, the Linux shell, cybersecurity, large-scale batch processing, and high-performance Python and R.

The program will start on June 14th, 2021 and currently consists of the following 8 courses (more may be added later), each with 3 online events of 90 minutes on successive days within one week.

  • Enable your Research with Cybersecurity!
  • Advanced Linux Command Line
  • Introduction to Supercomputing
  • Parallel Programming at Scale on Supercomputers with MPI
  • Python and High Performance Computing
  • Parallel Programming on Multicore Computers with OpenMP
  • R and High Performance Computing
  • Debugging and Performance

See the program site on the SciNet education website for further details.

Rouge AMD GPU Cluster

May 17, 2021 in news, Systems


The Rouge cluster was donated to the University of Toronto by AMD as part of its COVID-19 HPC Fund support program. The cluster consists of 20 x86_64 nodes, each with a single 48-core AMD EPYC 7642 CPU running at 2.3 GHz, 512 GB of RAM, and 8 Radeon Instinct MI50 GPUs.

The nodes are interconnected with 2 x HDR100 InfiniBand for internode communications and disk I/O to the SciNet Niagara filesystems. In total, the cluster contains 960 CPU cores and 160 GPUs.

The user experience on Rouge is similar to that on Niagara and Mist, in that it uses the same scheduler and software module framework.

The cluster was named after the Rouge River that runs through the eastern part of Toronto and surrounding cities.

The system is currently in its beta testing phase. Existing Niagara and Mist users affiliated with the University of Toronto can request early access by writing to support@scinet.utoronto.ca.

In tandem with this SciNet-hosted system, AMD, in collaboration with Penguin Computing, has also provided access to a cloud system of the same architecture.

Specifics of the cluster:

  • Rouge consists of 20 nodes.
  • Each node has a 48-core AMD EPYC 7642 CPU, 2-way hyperthreaded, and 8 AMD Radeon Instinct MI50 GPUs.
  • There is 512 GB of RAM per node.
  • HDR InfiniBand one-to-one network between nodes.
  • Shares file systems with the Niagara cluster (parallel filesystem: IBM Spectrum Scale, formerly known as GPFS).
  • No local disks.
  • Theoretical peak performance (“Rpeak”) of 1.6 PF (double precision), 3.2 PF (single precision).
  • Technical documentation can be found at SciNet’s documentation wiki.
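The aggregate figures above follow directly from the per-node specifications; a quick arithmetic check:

```python
# Per-node specifications of the Rouge cluster, as listed above
nodes = 20
cores_per_node = 48   # one AMD EPYC 7642 per node
gpus_per_node = 8     # AMD Radeon Instinct MI50

total_cores = nodes * cores_per_node
total_gpus = nodes * gpus_per_node

print(total_cores, total_gpus)  # 960 160
```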

Industry Post-doc Position in Dynamical Downscaling

March 16, 2021 in HPC Jobs, HPC Jobs Ontario, news

Professor W. R. Peltier at the University of Toronto Department of Physics in collaboration with Aquanty invites applications for a postdoctoral research associate to investigate climate change impacts in northern Canada. The research work will include dynamical downscaling of climate projections, with an emphasis on land-surface – climate interactions in Arctic regions. The successful candidate will use the Weather Research and Forecasting (WRF) model to downscale CMIP5 and CMIP6 projections, and to assess temporal and spatial changes in snow cover and permafrost distribution. This project is part of a larger initiative to investigate the impact of climate change on natural resources across Canada, and includes partners in academia, government, and industry.

The minimum requirements for this position are:

  • A doctorate in atmospheric science, meteorology, hydrology, physics or a similar quantitative field
  • Significant experience with the Python programming language, its numerical/scientific stack (e.g. numpy, xarray etc.) and version control (e.g. git)
  • Experience with Linux/Unix environments, shell scripting (e.g. bash) and high-performance/parallel computing
  • Demonstrated ability to publish novel research

The ideal candidate would also possess the following skills and experiences:

  • Research experience with WRF or a similar limited-area atmospheric model
  • Familiarity with land-surface models like Noah-MP or CLM, and the ability to make changes or updates to these model components
  • Interest in climate change impacts and application of research results
  • Commitment to maintainable and reusable software

It is also expected that the successful candidate will contribute to the formulation of research objectives
and the design of numerical experiments, as well as towards the writing and publication of their own
research and that of project partners.

The position will be supervised by Prof. Peltier at the University of Toronto, and there will be direct
technical interaction with Aquanty researchers. Due to the applied nature of the research project,
engagement with both the research community and with natural resources stake-holder groups is
expected.

The appointment will be for a 3-year period, and it is expected that the successful candidate will be
legally able to work in Canada, and will (pending the evolution of the COVID pandemic) eventually
(re-)locate to the Greater Toronto Area in order to maintain a presence at the University of Toronto.
Interested candidates should contact Dr. Andre R. Erler at aerler@aquanty.com (using the subject line
“Post-doc Application”) and include an academic CV and cover letter.

Niagara at Scale Pilot

March 5, 2021 in blog-general, for_press, for_researchers, for_users, frontpage, news

SciNet will be reserving the Niagara cluster for two days in March for the first-ever “Niagara at Scale”, from March 30th, 2021, at 12 noon EST, to April 1st, 2021, at 12 noon EST.

Purpose of the “Niagara at Scale” event

This event will enable pre-approved projects that require all or nearly all of the capacity of the Niagara supercomputer at once. Such heroic computations are Niagara's mandate: it is the "Large Parallel" cluster within the national systems of the Compute Canada Federation, and the fastest machine of its kind in Canada according to the TOP500 list. But computations of this size (massively parallel codes running on tens of thousands of cores) are hard or impossible to run within the regular batch scheduler.

How to apply

We already have some groups interested in participating, but we would like to extend our invitation to the whole Canadian high-performance computing community before committing to a particular date. Users who have massively parallel jobs or workflows that could take advantage of this opportunity are encouraged to contact us at support@scinet.utoronto.ca by Friday, March 12, 2021 (note: this is an extension of the original deadline of March 5).

In the email, please briefly describe your intended computation, as well as the size and duration of the jobs you would like to run at scale. Successful proposals will need to show evidence that their codes can run efficiently on at least 20,000 cores on Niagara, and should include strong and/or weak scaling data and plots.
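Strong-scaling evidence of the kind requested is usually summarized as speedup and parallel efficiency relative to a baseline run. A minimal sketch of that calculation, using made-up timing numbers purely for illustration:

```python
def strong_scaling(baseline_cores, baseline_time, runs):
    """Compute speedup and parallel efficiency for strong-scaling runs.

    `runs` is a list of (core_count, wall_time) pairs measured on the
    same fixed problem size as the baseline run.
    """
    results = []
    for cores, t in runs:
        speedup = baseline_time / t
        # ideal speedup is cores / baseline_cores; efficiency is the ratio
        efficiency = speedup / (cores / baseline_cores)
        results.append((cores, speedup, efficiency))
    return results

# Hypothetical timings for illustration only
for cores, s, e in strong_scaling(1000, 3600.0,
                                  [(2000, 1850.0), (20000, 210.0)]):
    print(f"{cores:6d} cores: speedup {s:5.1f}, efficiency {e:4.0%}")
```

A plot of efficiency against core count, from data like this, is exactly the kind of evidence a proposal would include.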

In addition, your codes must be able to checkpoint and restart, especially since jobs will be restricted to shorter wall times.
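The checkpoint/restart requirement amounts to the usual pattern: periodically write the computation's state to disk, and on job start resume from the latest checkpoint if one exists. A language-agnostic sketch in Python (the file name, state layout, and toy loop are illustrative assumptions, not the interface of any real code):

```python
import json
import os

CHECKPOINT = "state.json"  # illustrative file name

def load_or_init():
    """Resume from the latest checkpoint if one exists, else start fresh."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"step": 0, "value": 0.0}

def save(state):
    """Write the checkpoint atomically so a killed job never sees a partial file."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT)

state = load_or_init()
while state["step"] < 100:          # stand-in for the real computation
    state["value"] += state["step"]
    state["step"] += 1
    if state["step"] % 10 == 0:     # checkpoint every 10 steps
        save(state)

print(state["step"], state["value"])  # 100 4950.0
```

With this pattern, a job killed at the wall-time limit loses at most the work done since the last checkpoint, and the next job in the chain simply picks up where the previous one left off.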

Information session on March 10, 2021

We will hold an online information session regarding this program on March 10, 2021 at our SciNet User Group Meeting at noon EST. Attend to learn what kinds of computations this program is aimed at. We will also provide guidance on how to get your computation to such a large scale if your code does not yet scale to that size. For more information and to sign up for the event, go to https://scinet.courses/569

Future “Niagara at Scale” Events

The current event is a pilot project. If this initiative proves successful, we are planning to hold several of these events per year.

Job Opportunity: Network and Security Administrator at SciNet

February 3, 2021 in HPC Jobs, HPC Jobs Ontario

SciNet at the University of Toronto is looking for a Network and Security Administrator who will be responsible for maintaining, upgrading, and securing SciNet's networking infrastructure.

For more details see the Posting on the job site of the University of Toronto.

This posting closes on February 18th, 2021.

SciNet Virtual Summer Training Program

June 12, 2020 in blog, for_researchers, for_users

In lieu of its annual in-person Ontario Summer School, SciNet, in collaboration with CAMH, will this year be offering weekly virtual summer training on High Performance Computing from June through to August. The program consists of the following 11 courses, each with 3 online events of 90 minutes on successive days within one week: a lecture, a hands-on session, and a wrap-up session.

The first of these courses, "Introduction to Supercomputing", has just finished, with over 70 participants. Still to come in the coming weeks are:

  • Intro to Linux Shell
  • Parallel Programming on Multicore Computers with OpenMP
  • Parallel Programming at Scale on Supercomputers with MPI
  • Neuroimaging Analysis at Scale
  • Python for MRI Analysis
  • Python and High Performance Computing
  • Brain Network Modeling
  • R and High Performance Computing
  • Whole-Genome Association Analysis with PLINK
  • Debugging and Performance

See the program site on the SciNet education website for further details.

Mist GPU Cluster

February 12, 2020 in Systems

The Mist system is a cluster of 54 IBM servers, each with 4 NVIDIA V100 "Volta" GPUs with 32 GB of memory each, connected by NVLINK. Each node of the cluster has 256 GB of RAM. The nodes are connected by an EDR InfiniBand interconnect providing GPUDirect RDMA capability.

This system is a combination of the GPU extension to the Niagara cluster and the refresh of the GPU cluster of the Southern Ontario Smart Computing Innovation Platform (SOSCIP). The Niagara GPU portion is available to Compute Canada users, while the SOSCIP portion will be used by allocated SOSCIP projects. By combining the resources, users from either group are able to take advantage of any unused computing resources of the other group.

The user experience on Mist is similar to that on Niagara, in that it uses the same scheduler and software module framework. At the same time, it is a very different machine from Niagara, so virtually all details (software, scheduler parameters, compilers, performance) are different.

The system became generally accessible to Niagara users on Wednesday February 12, 2020.

Specifics of the cluster:

  • Mist consists of 54 nodes.
  • Each node has 32 IBM Power9 CPU cores (total core count: 1728, all four-way threaded) and 4 NVIDIA V100 Volta GPUs (more than one million CUDA cores in total).
  • There is 256 GB of RAM per node.
  • EDR InfiniBand one-to-one network with GPUDirect RDMA capability.
  • Shares file systems with the Niagara cluster (parallel filesystem: IBM Spectrum Scale, formerly known as GPFS).
  • No local disks.
  • Theoretical peak performance (“Rpeak”) of 1.6 PF (double precision), 3.2 PF (single precision).
  • Technical documentation can be found at SciNet’s documentation wiki.
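As a sanity check, the aggregate numbers in this list follow from the per-node figures. The per-GPU peak rates used below are the commonly quoted V100 numbers and are assumptions on our part, not taken from the list above:

```python
nodes = 54
gpus_per_node = 4
cuda_cores_per_v100 = 5120          # assumed per-GPU figure for the V100
fp64_tflops_per_v100 = 7.8          # assumed peak double-precision rate per V100

total_gpus = nodes * gpus_per_node                      # 216 GPUs
total_cuda_cores = total_gpus * cuda_cores_per_v100     # over one million CUDA cores
rpeak_fp64_pf = total_gpus * fp64_tflops_per_v100 / 1000

print(total_cuda_cores, round(rpeak_fp64_pf, 2))  # 1105920 1.68 (~1.6 PF, as listed)
```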

2020 International Summer School on HPC Challenges in Computational Sciences, University of Toronto, Canada, July 7-12

November 29, 2019 in for_press, for_researchers, for_users, frontpage, news


Update April 17, 2020: This event has been postponed to 2021.

Applications open November 29, 2019, and are due January 27, 2020

Who can apply: Graduate students and postdoctoral scholars from institutions in Canada, Europe, Japan and the United States, especially if you use advanced computing in your research. Students from underrepresented groups in computing are highly encouraged to apply (e.g., women, racial/ethnic minorities, persons with disabilities, etc.).

Who are the teachers: Leading computational scientists and HPC technologists from the U.S., Japan, Europe and Canada will teach classes and provide mentoring to attendees.

What will you learn: Topics include:

  • HPC challenges by discipline
  • HPC programming proficiencies
  • Performance analysis & profiling
  • Scientific visualization
  • Big Data Analytics
  • Mentoring
  • Networking
  • Machine Learning
  • Canadian, EU, Japanese and U.S. HPC-infrastructures

Preferred qualifications, but not required:

  • Familiarity with HPC: not necessarily an HPC expert, but a scholar who could benefit from including advanced computing tools and methods in their existing computational work
  • A graduate student with a strong research plan, or a postdoctoral fellow in the early stages of their research career
  • Use of parallel programming at least on a monthly basis; more frequent use preferred
  • A science or engineering background; however, applicants from other disciplines are welcome, provided their research activities include computational work

Cost: School fees, meals, and housing are covered for all accepted applicants, as are intercontinental flight costs.

Further information and application: https://ss20.ihpcss.org



Contacts

Reach out to the contact for your region listed to get questions answered about eligibility, the application process, or the summer school itself.

CANADA
SciNet HPC Consortium: www.scinethpc.ca

Ramses van Zon
SciNet, Univ. of Toronto, Canada
Email: rzon@scinet.utoronto.ca

EUROPE
PRACE: www.prace-ri.eu

Hermann Lederer
Max Planck Computing and Data Facility, Germany
Email: lederer@mpcdf.mpg.de

Simon Wong
ICHEC, Ireland
Email: simon.wong@ichec.ie

JAPAN
RIKEN: www.r-ccs.riken.jp/en

Toshiyuki Imamura
CCS, RIKEN
Email: Imamura.toshiyuki@riken.jp

UNITED STATES
XSEDE: www.xsede.org

Jay Alameda
NCSA, University of Illinois at Urbana-Champaign, United States
Email: alameda@illinois.edu

Job Opportunity: Scientific Applications Analyst (SciNet/SOSCIP)

November 21, 2019 in HPC Jobs, HPC Jobs Ontario

SOSCIP and SciNet are looking for a scientific applications analyst. Under the general direction of the CTO, this individual will provide senior IT services and training in GPU programming, data science applications, and scientific computing workflows for the SciNet and SOSCIP advanced computing consortia, which serve researchers at the University of Toronto and partner institutions, including industry researchers, faculty, postdoctoral fellows, and graduate students in all disciplines and fields (e.g. science and engineering, medicine, finance, languages, etc.). The incumbent will be involved in GPU-accelerated computing for data analytics and machine learning on large data sets (100 TB and up), will work with researchers and research teams to plan, develop, install, and optimize the SOSCIP GPU-accelerated cluster for various research programs, and will provide technical consultation to researchers on their system needs for research operations. The incumbent will also take part in delivering and developing training and education on GPU applications.

For more details see the Posting on the job site of the University of Toronto.

This posting closes on December 14th, 2019.