By Henry Neeman
The OU Supercomputing Center for Education and Research (OSCER)
This is the second module in a trilogy that begins with "HPC on a Single Thread" and
concludes with "Techniques and Technologies". These three modules comprised much of the
core material for the 2010 Blue Waters Undergraduate Petascale Institute. They are intended
to be readily adapted and adopted by undergraduate faculty as the core content for an
undergraduate course on scientific parallel computing. These materials were, in turn,
adapted from the "Supercomputing in Plain English" materials originally developed at OSCER
for campus and regional education, outreach, and training.
Links to the module resources follow the content description below.
* Shared Memory Multithreading
This submodule is an introduction to using multiple, independent flows of execution. Topics
include: parallelism basics (definition, threads vs. processes, Amdahl's Law, speedup,
scalability, granularity, parallel overhead); recap of the jigsaw puzzle analogy from the
Overview submodule; the fork/join model; OpenMP (compiler directives, hello world, parallel
do/for, chunks, private vs. shared data, static vs. dynamic vs. guided scheduling,
synchronization, barriers, critical sections, race conditions explained via the analogy of
"The Pen Games," reductions, how to parallelize a serial code).
* Distributed Multiprocessing
By the time the participants reach this submodule, they have a fairly good grasp of how to
think about parallelism, but no experience with distributed parallelism or multiprocessing.
Topics include: an analogy for understanding distributed parallelism (desert islands), which
covers distributed operation, communication, message passing, independence, privacy, latency
vs. bandwidth; recap of parallelism issues; parallel strategies (client-server, task
parallelism, data parallelism, pipelining); MPI (structure of MPI calls, MPI program
structure, Single Program/Multiple Data strategy, hello world, MPI runs, compiling for MPI,
rank, determinism vs. indeterminism, MPI data types, tags, communicators, broadcasting,
reductions, non-blocking vs. blocking communication, communication hiding).
* Applications and Types of Parallelism
This submodule focuses on various kinds of parallelism, motivated by example application
types. Topics include: Monte Carlo simulation to illustrate client-server (the concept of
embarrassingly parallel or loosely coupled computing, Monte Carlo methods in layman's terms,
high energy physics as a motivating example, parallelization of Monte Carlo); N-body methods
to illustrate task parallelism (N-body problems, 1-, 2- and 3-body problems, big-O notation
for non-computer scientists, spatial vs. temporal complexity, force calculations, parallelizing
force calculations, data parallelism vs. task parallelism, reductions, collective
communications); transport problems to illustrate data parallelism (Riemann sums, mesh
discretizations, finite difference method, Navier-Stokes equation, ghost boundary zones, data
decomposition, Cartesian geometries, use of send/receive buffers).
* Multicore Madness
The purpose of this submodule is to frighten the participants, because multicore (and, soon,
many-core) are highly disruptive technologies that will require substantial redesign of many
existing software applications and will complicate the design of new software. Topics
include: implications of Moore's Law; recap of the storage hierarchy, including a practical
example of the disparity between CPU speed and RAM bandwidth; recap of tiling;
multicore/many-core basics (definition, RAM challenges, interconnect challenges); weather
forecasting example (Cartesian mesh, finite difference, ghost boundaries); software strategies
for weather forecasting (tiling won't work because of inadequate calculations per byte,
strategies for improving cache reuse, multiple subdomains per process, expanded ghost stencil
to improve both cache reuse and communication hiding, higher order numerical schemes to
increase the number of calculations per mesh zone per timestep, parallelization in Z to
improve the size of each subdomain, cache size limitations).
Presentation #5: Shared Memory Multithreading : Presentation in MS PowerPoint (.ppt) format.
Exercise #5: OpenMP : Exercise in MS Word (.doc) format.
Presentation #6: Distributed Memory Parallelism : Presentation in MS PowerPoint (.ppt) format.
Exercise #6: MPI Point to Point : Exercise in MS Word (.doc) format.
Presentation #7: Applications and Types of Parallelism : Presentation in MS PowerPoint (.ppt) format.
Exercise #7: MPI Collective Communications : Exercise in MS Word (.doc) format.
Presentation #8: Multicore Madness : Presentation in MS PowerPoint (.ppt) format.
Exercise #8: Hybrid MPI+OpenMP : Exercise in MS Word (.doc) format.
Source Code: OpenMP : Zip archive containing source codes for Exercise 05: OpenMP.
Source Code: MPI Point to Point : Zip archive containing source codes for Exercise 06: MPI Point to Point.
N Body :
Parallelization: Conway's Game of Life : Simulates the evolution of simple and complex life forms based on simple rules.
Parallelization: Area Under a Curve : Calculus integration program that finds the area under a curve, well suited to teaching the basics of OpenMP and MPI.
Introduction to OpenMP : Guided lesson to teach undergraduate and graduate students how to use OpenMP.