By Henry Neeman
The OU Supercomputing Center for Education and Research (OSCER)
This module is the first in a trilogy. The sequel is
"Multithreading and Multiprocessing", and
"Techniques and Technologies" concludes the series. These three modules comprised much of the core material for the 2010
Blue Waters Undergraduate Petascale Institute. The materials are intended to be readily
adapted and adopted by undergraduate faculty as the core content of an undergraduate course
on scientific parallel computing. They were, in turn, adapted from the
"Supercomputing in Plain English" materials originally developed at OSCER for campus and regional education, outreach and
training.
Links to the module resources follow the content description below.
What the heck is supercomputing? Who uses supercomputing, how, and why? How does
supercomputing work? What does the explosive growth of computing power mean for students,
faculty and professionals? This module consists of several submodules, exploring the following
topics:
* Overview: What the Heck is Supercomputing?
This submodule provides a broad overview of High Performance Computing (HPC), and is split
into several sections: what is supercomputing?; the fundamental issues of supercomputing
(i.e., storage hierarchy and parallelism); a quick hardware primer; introduction to storage
hierarchy issues; introduction to parallelism via an analogy (multiple people putting together
a jigsaw puzzle); an introduction to Moore's Law; the motivation for using HPC.
* The Tyranny of the Storage Hierarchy
Fundamentally, just two issues govern performance: the storage hierarchy, and parallelism.
This submodule focuses on the implications of a fundamental reality: fast implies expensive
implies small, and slow implies cheap implies large. Topics include: what is the storage
hierarchy?; registers; cache; RAM; the relationship between cache and RAM; cache hits and
misses; cache lines; cache mapping strategies (direct, fully associative, set associative);
cache conflicts; write-through vs. write-back; locality; tiling; hard disk; I/O strategies;
virtual memory. A key point in the broader context of HPC is that parallel performance can be
difficult to predict or understand without a thorough grounding in the storage hierarchy.
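As a rough illustration of why cache lines, direct mapping, and tiling matter, the sketch below simulates a toy direct-mapped cache and counts misses for three traversal orders of an array stored in row-major order. The cache geometry (64 lines of 8 elements each), the 64 x 64 array size, and all function names are assumptions chosen for illustration; they are not taken from the module's exercises, and a real cache is considerably more complicated.

```python
"""Toy direct-mapped cache: count misses for three traversal orders."""

LINE_ELEMS = 8    # elements per cache line (assumed)
NUM_LINES = 64    # number of lines in the cache (assumed)
N = 64            # the array is N x N, stored in row-major order (assumed)


def count_misses(addresses):
    """Return the number of cache misses for a sequence of element addresses."""
    slots = [None] * NUM_LINES        # memory line currently held by each slot
    misses = 0
    for addr in addresses:
        line = addr // LINE_ELEMS     # memory line containing this element
        slot = line % NUM_LINES       # direct mapping: exactly one legal slot
        if slots[slot] != line:       # miss: load the line, evicting the old one
            misses += 1
            slots[slot] = line
    return misses


def row_major():
    """Visit elements in the order they are laid out in memory."""
    return (i * N + j for i in range(N) for j in range(N))


def column_major():
    """Walk down each column: consecutive accesses are N elements apart."""
    return (i * N + j for j in range(N) for i in range(N))


def tiled_column_major(tile=8):
    """Column-major order, restricted to one tile x tile block at a time."""
    return (
        i * N + j
        for ti in range(0, N, tile)
        for tj in range(0, N, tile)
        for j in range(tj, tj + tile)
        for i in range(ti, ti + tile)
    )


if __name__ == "__main__":
    print("row-major misses:          ", count_misses(row_major()))
    print("column-major misses:       ", count_misses(column_major()))
    print("tiled column-major misses: ", count_misses(tiled_column_major()))
```

Under these assumptions, the row-major and tiled traversals each incur 512 misses (one per cache line), while the plain column-major traversal misses on all 4096 accesses, because rows 8 apart compete for the same cache slot. The tiled version performs the same accesses as the column-major one; only the order changes.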
* Instruction Level Parallelism
This submodule is an in-depth introduction to parallelism, and is a relatively gentle way to
ease into understanding how parallel computing behaves in practice. Topics include: recap of
parallelism; what is Instruction Level Parallelism (ILP)?; kinds of ILP; instructions and
cycles; scalar (non-ILP) operation; independence, dependence and order; superscalar; loops;
pipelining; loop performance; pipeline inhibitors; superpipelining; vectors.
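The relationship between instructions, cycles, and independence can be sketched with a toy model. The code below assumes that every instruction completes in one cycle once issued, that up to `width` ready instructions may issue per cycle, and that a filled pipeline accepts one new instruction per cycle. The function names and the model itself are illustrative assumptions, not a description of any real processor.

```python
"""Toy cycle counts for pipelining and superscalar issue."""


def unpipelined_cycles(n_instructions, stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * stages


def pipelined_cycles(n_instructions, stages):
    """Fill the pipeline once, then finish one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)


def issue_cycles(deps, width):
    """Cycles needed to issue all instructions on a `width`-wide machine.

    deps[i] is the set of EARLIER instruction indices whose results
    instruction i needs. An instruction is ready only when everything
    it depends on was issued in a previous cycle.
    """
    issued_at = {}                        # instruction index -> cycle issued
    remaining = list(range(len(deps)))
    cycle = 0
    while remaining:
        cycle += 1
        batch = []
        for i in remaining:
            if len(batch) == width:
                break
            if all(d in issued_at for d in deps[i]):
                batch.append(i)
        for i in batch:                   # record only after the cycle ends
            issued_at[i] = cycle
            remaining.remove(i)
    return cycle


if __name__ == "__main__":
    independent = [set(), set(), set(), set()]   # no instruction needs another
    chain = [set(), {0}, {1}, {2}]               # each needs the previous result
    print(unpipelined_cycles(100, 5), pipelined_cycles(100, 5))
    print(issue_cycles(independent, width=2), issue_cycles(chain, width=2))
```

In this model, 100 instructions on a 5-stage pipeline take 104 cycles instead of 500, and four independent instructions issue in 2 cycles on a 2-wide machine, whereas a chain of four dependent instructions still needs 4 cycles: extra issue width helps only when there is independent work to fill it.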
* Stupid Compiler Tricks
Although this submodule is ostensibly about high performance compilers, the first section is
actually an introduction to dependencies, to reinforce thinking about parallelism and its
implications. Topics include: dependency analysis (control vs. data dependencies, branch
dependencies, loop carried dependencies, call dependencies, I/O dependencies, reductions, data
dependencies, output dependencies, loop carried dependencies); tricks compilers play (scalar
optimizations such as copy propagation, constant folding, dead code removal, strength
reduction, common subexpression elimination, variable renaming; loop optimizations such as
hoisting loop invariant code, unswitching, iteration peeling, index set splitting, loop
interchange, unrolling, fusion, fission; inlining); tricks to play with compilers (command
line options for optimization; profiling).
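One way to see why loop carried dependencies restrict what a compiler may do is to run the same loop body in a different iteration order. The sketch below is a hand-written illustration (the function names and sample data are assumptions, not the module's exercise code): a loop whose iterations are independent gives the same answer in any order, while a loop in which iteration i reads the value written by iteration i-1 does not.

```python
"""Independent iterations versus a loop carried dependency."""


def independent_loop(b, c, order):
    """a[i] depends only on b[i] and c[i], so iteration order is irrelevant."""
    a = [0] * len(b)
    for i in order:
        a[i] = b[i] + c[i]
    return a


def loop_carried(b, order):
    """a[i] reads a[i-1], which an earlier iteration is expected to have written."""
    a = [0] * len(b)
    a[0] = b[0]
    for i in order:                 # order covers indices 1 .. len(b)-1
        a[i] = a[i - 1] + b[i]
    return a


if __name__ == "__main__":
    b = [1, 2, 3, 4]
    c = [10, 20, 30, 40]
    n = len(b)

    # Same result forwards and backwards: safe to reorder or run in parallel.
    print(independent_loop(b, c, range(n)))
    print(independent_loop(b, c, reversed(range(n))))

    # Different results: reordering breaks the running sum.
    print(loop_carried(b, range(1, n)))            # [1, 3, 6, 10]
    print(loop_carried(b, reversed(range(1, n))))  # [1, 3, 3, 4]
```

The second loop is a running sum, so only the forward order produces the intended result. A compiler that cannot prove iterations are independent has to assume the second case and keep the original order.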
Presentation #1: What is Supercomputing? : Overview presentation in MS PowerPoint (.ppt) format.
Exercise #1: Introduction to Batch : Exercise in MS Word (.doc) format.
Presentation #2: Tyranny of the Storage Hierarchy : Presentation in MS PowerPoint (.ppt) format.
Exercise #2: Tiling : Exercise in MS Word (.doc) format.
Presentation #3: Instruction Level Parallelism : Presentation in MS PowerPoint (.ppt) format.
Exercise #3: Arithmetic Operations : Exercise in MS Word (.doc) format.
Presentation #4: Stupid Compiler Tricks : Presentation in MS PowerPoint (.ppt) format.
Exercise #4: Loop Carried Dependencies : Exercise in MS Word (.doc) format.
Source Code: Introduction to Unix : Zip archive containing source codes for Exercise 01: Introduction to Unix.
Source Code: Tiling : Zip archive containing source codes for Exercise 02: Tiling.
Source Code: Arithmetic Operations : Zip archive containing source codes for Exercise 03: Arithmetic Operations.
Source Code: Loop Carried Dependencies : Zip archive containing source codes for Exercise 04: Loop Carried Dependencies.
Intro to Bash
Loop Carried Dependencies
Arithmetic Operations
Tiling
Parallelization: Conway's Game of Life : Simulates the evolution of simple and complex forms of life based on simple rules.
Parallelization: Area Under a Curve : Calculus integration program that finds the area under a curve. Well suited to teaching the basics of OpenMP and MPI.
Introduction to OpenMP : Guided lesson to teach undergraduate and graduate students how to use OpenMP.