Training in Research Computing and Data Science

The full power of high performance computing systems can best be exploited by people with specialized knowledge. The education and training of such people is absolutely critical, especially since the methodology in many disciplines has evolved to include a large computational component. SciNet has developed an education and training program (and an education site) for the wider scientific community aimed at helping students and users obtain the skills and knowledge required to get the most out of advanced research computing resources. It is one of our most important activities, and it has shown tremendous growth throughout SciNet’s existence, in particular in the area of data science.

SciNet’s training program

SciNet’s training program consists of a large offering of cross-disciplinary, hands-on, skill-based workshops and lecture series (see the courses table below). It started in 2009 with sessions introducing advanced research computing resources and yearly intensive parallel programming workshops. As our user base has grown to encompass fields relatively new to advanced research computing, such as medical science, biology, forestry, and economics, the program has expanded to include topics in data science such as introductory scientific computing in Python, R, machine learning, and workflow design, while still covering advanced research computing and high performance computing.

Did you know…

  • Over 1600 people have registered for at least one of SciNet’s training or university courses (2013-2018).
  • Over 200 SciNet Certificates have been awarded (2013-2018).
  • The attendance of SciNet’s training program in 2018 was nearly tenfold that of 2012.
  • The largest contribution to this growth comes from training in data science.
  • Three graduate courses have been developed on the basis of SciNet’s training program, and are taught by SciNet HPC analysts.
  • Almost 400 students have taken these courses for graduate credit (2015-2019).

The growth of SciNet’s training and education program is illustrated by the chart below, which shows the total attendance (number of attendees times duration in hours) of all education and training events given by SciNet. The chart also highlights the growing popularity of our data science courses (including machine learning).
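As a minimal sketch of how such an attendance total could be computed, the snippet below multiplies the number of attendees by the duration in hours for each event and sums the results. The `Event` class, the `attendee_hours` function, and the example numbers are hypothetical illustrations, not SciNet's actual records or tooling.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One training event (hypothetical structure for illustration)."""
    name: str
    attendees: int  # number of people who attended
    hours: float    # duration of the event in hours


def attendee_hours(events):
    """Total attendance: sum over events of attendees times duration in hours."""
    return sum(e.attendees * e.hours for e in events)


# Made-up example data, not real attendance figures.
events = [
    Event("Hypothetical workshop", attendees=20, hours=3.0),
    Event("Hypothetical lecture series", attendees=15, hours=8.0),
]

total = attendee_hours(events)
print(total)  # 20*3.0 + 15*8.0 = 180.0
```

Counting attendee-hours rather than registrations weights a term-long course more heavily than a short workshop with the same number of participants.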

The skills that SciNet aims to transfer are rare and sought-after, and complement and enhance the skills students learn in regular curricula. Users and students can get a certificate in Scientific Computing, Data Science, or High Performance Computing once they have completed enough SciNet credit-hours. As a document that proves the holder has highly competitive skills, the certificates are in high demand.

The demand for this kind of training has led to the development of graduate courses, organized in collaboration with other UofT departments.

Courses Table

The table below shows some of the workshops, mini-courses and term-long courses that SciNet offers:

Workshops:

  • Linux Shell
  • Linux Scripting
  • Advanced Linux Shell
  • Relational Database Basics
  • Scientific Data Visualization
  • Workflow Optimization
  • Parallel Debugging
  • Parallel I/O
  • Coarray Fortran
  • GPU Programming (CUDA)
  • Shared Memory Programming (OpenMP)
  • Distributed Memory Programming (MPI)
  • Scalable Data Analysis Workshop
  • Machine Learning Workshop
  • Data Analysis with R
  • Parallel R
  • High Performance Computing with Python
  • Storage and I/O in Large Scale Scientific Projects
  • Intro to Apache Spark
  • Parallel Profiling and Performance Tools
  • Research Data Management

Short lecture series:

  • Intro to Programming
  • Research Computing with Python
  • Neural Network Programming
  • Advanced Neural Networks
  • Machine Learning with Python
  • Advanced Parallel Scientific Computing

Term-long courses:

  • Scientific Computing for Physical Scientists
  • Quantitative Applications for Data Analysis
  • Intro to Computational Biostatistics with R

SciNet’s education site contains up-to-date information on courses, as well as course materials and recordings.

The diversity of academic backgrounds of the students taking our courses can be seen in the following chart, broken down by faculty within the University of Toronto.

[Chart: distribution of student-hours in SciNet teaching, by faculty]

Collaborations in research computing training

Together with our partner consortia, SHARCNET and CAC, SciNet is involved in the annual Ontario Summer Schools in High Performance Computing. These schools provide attendees with opportunities to learn and share knowledge and experience in high performance and technical computing. Each of the three consortia organizes one week of summer school. In the past two years, the number of unique attendees at the Toronto-based summer school was over 150.

SciNet is also an organizer and sponsor of the International High Performance Computing Summer School (IHPCSS). This ‘school’ is a graduate-level summer institute organized as a collaboration between SciNet, XSEDE, PRACE and RCCS/RIKEN. In 2015, we were the local organizers of the IHPCSS, when it was held at the University of Toronto. The IHPCSS is an expenses-paid program which is open to graduate students from Canada, the US, Europe and Japan. The demand from Canadian students is consistently about ten times larger than the number of available spots, which is further evidence of the demand for training in research computing.