SciNet News, April 2012

SYSTEM NEWS
  • SciNet is pleased to announce that it will host and run a new Blue Gene/Q system for the newly announced Southern Ontario Smart Computing and Innovation Partnership (SOSCIP). The system will consist of at least 40,000 cores with a peak theoretical speed exceeding 500 TFlops and should be operational in the fall of this year. This will be the first Blue Gene system in Canada. For more information see the University of Toronto release and elsewhere on this site.
  • Renovations needed to expand the SciNet machine room will be ongoing for the next 3-4 months and will necessitate some shutdowns for electrical and plumbing work. Great care is being taken to minimize the interruptions to SciNet researchers, but at this stage we anticipate two shutdowns of 1-2 days each.
  • The first shutdown associated with the machine room expansion will take place next week, starting Wed 18 April. We were already planning a shutdown in order to make changes to GPC networking, and we will now combine the two into a single, longer shutdown. Details will be sent in a separate email, but we expect systems will go down on the morning of 18 April and come back in the evening of 19 April.
  • GPC: In the last two weeks, a problem caused many multinode jobs to crash and led to file system issues. It
    took a lot of hard work to get the GPC into a stable state because the issue was caused by the unique interplay of the particular
    configurations of the GPC (involving both hardware and software). Many jobs, mainly smaller ones, still ran successfully, so rather than
    shutting down the GPC for an undetermined time, we rolled out solutions by rebooting nodes after jobs ended. This improved things for some users, while in a few other cases we were able to provide a temporary work-around.

    We have now found what seems to be a stable configuration for the GPC, and expect users to be able to run jobs as usual. However, if you encounter any further trouble, please contact us.
  • GPC: Due to the network changes we are making to the GigE nodes, if you run multinode ethernet MPI jobs, you still need to explicitly request the ethernet interface in your mpirun command:

    For OpenMPI: mpirun --mca btl self,sm,tcp
    For IntelMPI: mpirun -env I_MPI_FABRICS shm:tcp

    There is no need to do this if you run on IB, or if you run single-node MPI jobs on the ethernet (GigE) nodes. Please check the wiki
    page on ‘GPC MPI Versions’ for more details. We expect these changes to be finished by the end of the month.
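
As a rough illustration of the item above, the two option sets could be selected in a job script along these lines. This is only a sketch: the function name, the "openmpi"/"intelmpi" labels, the process count and the ./my_app executable are hypothetical, and only the option strings themselves come from the item above.

```shell
#!/bin/bash
# Sketch: choose the mpirun options that request TCP over ethernet
# for multinode jobs on the GigE nodes.
# ethernet_mpirun_opts and its "openmpi"/"intelmpi" labels are
# illustrative names, not part of any SciNet tooling.
ethernet_mpirun_opts() {
    case "$1" in
        openmpi)  echo "--mca btl self,sm,tcp" ;;
        intelmpi) echo "-env I_MPI_FABRICS shm:tcp" ;;
        *)        echo "unknown MPI flavour: $1" >&2; return 1 ;;
    esac
}

# Print (rather than run) the command for a hypothetical 16-process job.
echo "mpirun $(ethernet_mpirun_opts openmpi) -np 16 ./my_app"
```

The script only prints the command line, so it can be tried anywhere before pasting the result into a real job script.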

EVENTS COMING UP

Unless stated otherwise, all events take place at the SciNet Headquarters, Rm 235 of 256 McCaul Street, Toronto. All events below are free but we ask that you sign up on the courses website: https://support.scinet.utoronto.ca/courses.

  • Apr 16/24: DEADLINES COMPUTE CANADA SPECIAL RESOURCE ALLOCATION CALL
    Every fall, the Compute Canada Resource Allocation Committee issues
    an annual Call for Proposals for allocations of its distributed
    computing and data storage systems. This spring, two specialized
    Calls for Proposals are being issued outside of the regular
    cycle:

    1. Large Shared Memory System: Deadline: Monday, April 16, 2012 at 3:00 pm Eastern
    2. Humanities and Social Sciences Researchers: Deadline: Tuesday, April 24, 2012 at 3:00 pm Eastern

    See https://computecanada.org for details.

  • Wed Apr 11, noon: SCINET USER GROUP (SNUG) MEETING
    The SciNet Users Group (SNUG) meetings are every month on the second Wednesday, and involve pizza, user discussion, feedback, and one or two short talks on topics or technologies of interest to the SciNet community.

    This time, we will have:
    • TechTalk by Scott Northrup (SciNet) on “Infiniband on the GPC”
    • User discussion
    • Pizza!

    Sign up at https://support.scinet.utoronto.ca/courses/?q=node/49

  • Thu Apr 12: SCHEDULED SHUTDOWN OF THE TCS
    The TCS will be shut down on Thursday, 12 April for maintenance on the InfiniBand network. The system will go down at 11 am and should be available again by 5 pm.

  • Wed Apr 18-19: SCHEDULED SHUTDOWN OF ALL SCINET SYSTEMS
    This is the first shutdown needed to expand the capacity of the SciNet datacentre for the coming Blue Gene/Q system. We expect systems will go down on the morning of 18 April and come back in the evening of 19 April.

  • Mon Apr 23: INTRODUCTION TO SCIENTIFIC C++
    This is a one-day course that will introduce you to the various features of C++ with a focus on those that are useful for scientific software development. We will take the C-to-C++ route, so familiarity with C, in particular with pointers, is a prerequisite.

    We will cover:
    • a basic refresher of C;
    • the nice features of C++ (“a better C”);
    • object-oriented programming (classes, inheritance, …);
    • very basic generic programming with templates;
    • a discussion of some useful libraries out there.

    Sign up at our courses website.

  • Wed May 9, 10:30 am – 12:00 pm: INTRO TO SCINET
    Learn what SciNet resources are available, how to recompile your code and how to use the batch system, in approximately 90 minutes.

    Intended for new users, but experienced users may still pick up some valuable pointers.

    Sign up at our courses website.

    Note that attendees of the intro may be interested in the event that immediately follows it:

  • Wed May 9, 12:00 pm: MAY SNUG MEETING
    We are still looking for users willing to give a short talk (20-30 minutes) at the May SNUG about interesting work that they did on SciNet clusters and how they did it! If you are up for it, email support@scinet.utoronto.ca.

    More info on future SNUGs and sign-up at https://support.scinet.utoronto.ca/courses/?q=node/51

  • May 14-18: SCICOMP
    SciNet will host the annual meeting of ScicomP, the IBM HPC Systems Scientific Computing User Group. This meeting (which is part of the meeting of SPXXL, the user group of large IBM installations) is open to users and deals mostly with applications and science rather than just with technical aspects of the computers.

    This event will not be held at the SciNet Headquarters. For more information on this event, its schedule, location and registration, go to http://spscicomp.org/scicomp2012 .

ADDED TO THE WIKI IN MARCH

All new wiki content below is listed and linked on the main page:

http://wiki.scinethpc.ca/wiki/index.php/SciNet_User_Support_Library#What.27s_New_On_The_Wiki

  • Slides of the fourth lecture of High Performance Scientific Computing
  • Slides of the TechTalk on the Intel Math Kernel Library
  • Software installed on the Power 7 Linux cluster
  • A FAQ entry on how to deal with InfiniBand (IB) memory problems

WHAT ELSE HAPPENED AT SCINET IN MARCH?
  • Mar 12: SNUG meeting was held, with a TechTalk by Ramses van Zon on “The Intel Math Kernel Library”
  • Mar 26: “Introduction to the Linux Shell” was given.
  • Mar 28: “Intro to SciNet” was given.