Training in Research Computing and Data Science

The full power of high performance computing systems can best be exploited by people with specialized knowledge. At the same time, computational skills and knowledge are becoming a necessity in all fields of study. Education and training in computational literacy, research computing and data science are therefore critical. SciNet has developed an education and training program (and an education site) for the wider scientific community, aimed at helping students and users obtain the skills and knowledge required to get the most out of advanced research computing resources. It is one of our most important activities, and it has grown tremendously throughout SciNet’s existence, particularly in the area of data science.

SciNet’s training program

SciNet’s training program consists of a large offering of cross-disciplinary, hands-on, skill-based workshops and lecture series (for examples, see the Courses Table below). It started in 2009 with sessions introducing advanced research computing resources and yearly intensive parallel programming workshops. As our user base grew to encompass fields relatively new to advanced research computing, such as medical science, biology, forestry, and economics, the program expanded to include data science topics such as introductory scientific computing in Python and R, machine learning, and workflow design, while continuing to cover advanced research computing and high performance computing.

Did you know…

  • Over 1600 people have registered for at least one of SciNet’s training or university courses (2013-2018).
  • Over 200 SciNet Certificates have been awarded (2013-2018).
  • Attendance at SciNet’s training program in 2018 was nearly ten times that of 2012.
  • The largest contribution to this growth came from training in data science.
  • Three graduate courses have been developed on the basis of SciNet’s training program, and are taught by SciNet HPC analysts.
  • Almost 400 students have taken these courses for graduate credit (2015-2019).

The growth of SciNet’s training and education program is illustrated by the chart below, which shows the total attendance (number of attendees times duration in hours) of all education and training events given by SciNet. The chart also highlights the growing popularity of our data science courses (including machine learning).

The skills that SciNet aims to transfer are rare and sought-after, and they complement and enhance the skills students learn in regular curricula. Users and students can earn a certificate in Scientific Computing, Data Science, or High Performance Computing once they have completed enough SciNet credit-hours. Because they document that the holder has highly competitive skills, the certificates are in high demand.

The demand for this kind of training has led to the development of graduate courses, organized in collaboration with other UofT departments.

Courses Table

The table below shows some of the workshops, mini-courses and term-long courses that SciNet offers.

For an up-to-date list of current offerings, check out SciNet’s courses website, which also contains past course materials and recordings.

Workshops:
  • Linux Shell
  • Linux Scripting
  • Advanced Linux Shell
  • Relational Database Basics
  • Scientific Data Visualization
  • Workflow Optimization
  • Parallel Debugging
  • Parallel I/O
  • Coarray Fortran
  • GPU Programming (CUDA)
  • Shared Memory Programming (OpenMP)
  • Distributed Memory Programming (MPI)
  • Scalable Data Analysis Workshop
  • Machine Learning Workshop
  • Data Analysis with R
  • Parallel R
  • High Performance Computing with Python
  • Storage and I/O in Large Scale Scientific Projects
  • Intro to Apache Spark
  • Parallel Profiling and Performance Tools
  • Research Data Management

Short lecture series:
  • Intro to Programming
  • Research Computing with Python
  • Neural Network Programming
  • Advanced Neural Networks
  • Machine Learning with Python
  • Advanced Parallel Scientific Computing

Term-long courses:
  • Scientific Computing for Physical Scientists
  • Quantitative Applications for Data Analysis
  • Intro to Computational Biostatistics with R

The diversity of academic backgrounds of the students taking our courses can be seen in the following chart, broken down by faculty within the University of Toronto.

[Chart: distribution of student-hours in SciNet courses by University of Toronto faculty]

Our Partners in Computational Training

Other units within the university also offer workshops on computational skills (see, e.g., the Map and Data Library’s Data & Digital Tools workshops). For those interested in becoming trainers, the Technical Skills Outreach Project is organizing train-the-trainer events to support the Carpentries at UofT, which focus on introductory workshops in computational skills.

Together with our partner consortia in Ontario, SHARCNET and CAC, SciNet is involved in the annual Ontario Summer Schools in High Performance and Advanced Research Computing. These schools give attendees opportunities to learn and share knowledge and experience in high performance and technical computing. Before the Covid-19 pandemic, each of the three consortia organized one week of in-person summer school. In each of the last two instances, the Toronto-based summer school had over 150 unique attendees. During the pandemic, these summer schools have been replaced with a variety of virtual events.

SciNet is also an organizer and sponsor of the International High Performance Computing Summer School (IHPCSS). This event is a graduate-level summer institute organized as a collaboration between SciNet, XSEDE, PRACE and RCCS/RIKEN. In 2015, we were the local organizers of the IHPCSS when it was held at the University of Toronto. The IHPCSS is an expenses-paid program open to graduate students from Canada, the US, Europe and Japan. Demand from Canadian students is consistently about ten times the number of available spots, further evidence of the demand for training in research computing. The event was held virtually in 2021, but is planned to be held in person in Athens, Greece, in 2022; see https://ss22.ihpcs.org for details.