
HPC on a Single Thread

By Henry Neeman
The OU Supercomputing Center for Education and Research (OSCER)

This module is the first in a trilogy; "Multithreading and Multiprocessing" is the sequel, and "Techniques and Technologies" concludes the series. These three modules comprised much of the core material for the 2010 Blue Waters Undergraduate Petascale Institute. They are intended to be readily adapted and adopted by undergraduate faculty as the core content for an undergraduate course on scientific parallel computing. They were, in turn, adapted from the "Supercomputing in Plain English" materials originally developed at OSCER for campus and regional education, outreach and training.

Links to the module resources follow the content description below.

What the heck is supercomputing? Who uses supercomputing, how, and why? How does supercomputing work? What does the explosive growth of computing power mean for students, faculty and professionals? This module consists of several submodules, exploring the following topics:

* Overview: What the Heck is Supercomputing?

This submodule provides a broad overview of High Performance Computing (HPC), split into several sections: what is supercomputing?; the fundamental issues of supercomputing (i.e., the storage hierarchy and parallelism); a quick hardware primer; an introduction to storage hierarchy issues; an introduction to parallelism via an analogy (multiple people putting together a jigsaw puzzle); an introduction to Moore's Law; and the motivation for using HPC.

* The Tyranny of the Storage Hierarchy

Fundamentally, just two issues govern performance: the storage hierarchy and parallelism. This submodule focuses on the implications of a fundamental reality: fast implies expensive implies small, and slow implies cheap implies large. Topics include: what is the storage hierarchy?; registers; cache; RAM; the relationship between cache and RAM; cache hits and misses; cache lines; cache mapping strategies (direct, fully associative, set associative); cache conflicts; write-through vs. write-back; locality; tiling; hard disk; I/O strategies; virtual memory. A key point in the broader context of HPC is that parallel performance can be difficult to predict or understand without a thorough grounding in the storage hierarchy.
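The tiling idea above can be sketched in a few lines. This is not the module's exercise code (that lives in the "Source Code: Tiling" archive below); it is a minimal illustration of the loop structure, and the names `matmul_tiled` and `block` are ours. Tiling reorders the loop nest so that each small tile of the operands is reused many times while it is still likely to be resident in cache; in pure Python the cache benefit is masked by interpreter overhead, so the point here is the traversal order, not the timing.

```python
def matmul_tiled(a, b, n, block=4):
    """Multiply two n x n matrices (lists of lists) tile by tile."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):          # tile row start
        for jj in range(0, n, block):      # tile column start
            for kk in range(0, n, block):  # tile depth start
                # Update one block x block tile of C using small
                # tiles of A and B that fit together in cache.
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, n)):
                        s = c[i][j]
                        for k in range(kk, min(kk + block, n)):
                            s += a[i][k] * b[k][j]
                        c[i][j] = s
    return c
```

The result is identical to the naive triple loop; only the order of the memory accesses changes, which is exactly why tiling is a pure performance transformation.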

* Instruction Level Parallelism

This submodule is an in-depth introduction to parallelism, and is a relatively gentle way to ease into understanding how parallel computing behaves in practice. Topics include: recap of parallelism; what is Instruction Level Parallelism (ILP)?; kinds of ILP; instructions and cycles; scalar (non-ILP) operation; independence, dependence and order; superscalar; loops; pipelining; loop performance; pipeline inhibitors; superpipelining; vectors.
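One pipeline inhibitor named above, the dependence chain in a reduction, can be shown concretely. This sketch is ours, not the module's: in `sum_dependent` every addition needs the previous total, so a pipelined or superscalar core cannot overlap the adds; `sum_unrolled` splits the sum into independent partial sums (the `lanes` count is an illustrative parameter), which is the transformation that lets hardware or a vectorizing compiler run several additions at once.

```python
def sum_dependent(xs):
    """One accumulator: each iteration depends on the previous add."""
    total = 0.0
    for x in xs:
        total += x          # must wait for the previous add to finish
    return total

def sum_unrolled(xs, lanes=4):
    """Several independent accumulators, combined at the end."""
    partial = [0.0] * lanes
    for i, x in enumerate(xs):
        partial[i % lanes] += x   # the lanes do not depend on each other
    return sum(partial)
```

Both functions compute the same sum; the second merely breaks the serial dependence chain, which is the essence of exploiting ILP in loops.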

* Stupid Compiler Tricks

Although this submodule is ostensibly about high performance compilers, the first section is actually an introduction to dependencies, to reinforce thinking about parallelism and its implications. Topics include: dependency analysis (control vs. data dependencies; branch, loop carried, call, and I/O dependencies; reductions; output dependencies); tricks compilers play (scalar optimizations such as copy propagation, constant folding, dead code removal, strength reduction, common subexpression elimination, and variable renaming; loop optimizations such as hoisting loop invariant code, unswitching, iteration peeling, index set splitting, loop interchange, unrolling, fusion, and fission; inlining); and tricks to play with compilers (command line options for optimization; profiling).
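Three of the scalar optimizations named above can be worked by hand. This before/after pair is our illustration, not module code: a compiler would derive the second function from the first automatically by propagating the copy `y = x`, folding the constant `2 * 3`, and reducing the strength of the squaring operation to a multiply.

```python
def before(x):
    y = x               # copy: a candidate for copy propagation
    c = 2 * 3           # constant expression: folded at compile time
    return y ** 2 * c   # y ** 2 -> y * y is strength reduction

def after(x):
    # Same value, cheaper operations: copy propagated, constant
    # folded, exponentiation reduced to multiplication.
    return x * x * 6
```

Optimizations like these matter for HPC because they shrink the work inside inner loops, where almost all of the runtime lives.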


Presentation #1: What is Supercomputing? : Overview presentation in MS PowerPoint (.ppt) format.

Exercise #1: Introduction to Batch : Exercise in MS Word (.doc) format.

Presentation #2: Tyranny of the Storage Hierarchy : Presentation in MS PowerPoint (.ppt) format.

Exercise #2: Tiling : Exercise in MS Word (.doc) format.

Presentation #3: Instruction Level Parallelism : Presentation in MS PowerPoint (.ppt) format.

Exercise #3: Arithmetic Operations : Exercise in MS Word (.doc) format.

Presentation #4: Stupid Compiler Tricks : Presentation in MS PowerPoint (.ppt) format.

Exercise #4: Loop Carried Dependencies : Exercise in MS Word (.doc) format.

Source Code: Introduction to Unix : Zip archive containing source codes for Exercise 01: Introduction to Unix.

Source Code: Tiling : Zip archive containing source codes for Exercise 02: Tiling.

Source Code: Arithmetic Operations : Zip archive containing source codes for Exercise 03: Arithmetic Operations.

Source Code: Loop Carried Dependencies : Zip archive containing source codes for Exercise 04: Loop Carried Dependencies.

Intro to Bash:

Loop Carried Dependencies:

Arithmetic Operations:

Tiling:

Related Resources:

Parallelization: Conway's Game of Life : Simulates the evolution of simple and complex forms of life based on simple rules.

  • Serial Version
  • Shared Memory Version (OpenMP)
  • Distributed Memory Version (MPI)
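As a minimal sketch of what the versions above parallelize, here is one Game of Life generation in serial form. This is our illustration, assuming a grid of 0/1 cells with dead cells beyond the border (the Shodor versions may use different boundary conditions): a live cell survives with two or three live neighbors, and a dead cell becomes live with exactly three.

```python
def life_step(grid):
    """Return the next generation of a 0/1 grid (dead borders)."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live neighbors in the 3x3 box, excluding the cell.
            live = sum(
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ) - grid[r][c]
            nxt[r][c] = 1 if live == 3 or (grid[r][c] and live == 2) else 0
    return nxt
```

The shared and distributed memory versions split the grid's rows among OpenMP threads or MPI processes; because each cell only reads its immediate neighbors, each process needs just the border rows of its neighbors each generation.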

Parallelization: Area Under a Curve : Calculus integration program that finds the area under a curve; well suited to teaching the basics of OpenMP and MPI.

  • Serial Version
  • Shared Memory Version (OpenMP)
  • Distributed Memory Version (MPI)
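The core idea the versions above parallelize can be sketched in a few lines. This is our illustration, not the Shodor program: the integral is approximated as a sum of thin rectangles, and the parallel versions simply split the index range among OpenMP threads or MPI ranks and combine the partial sums with a reduction.

```python
def area_under_curve(f, a, b, n):
    """Left-rectangle approximation of the integral of f on [a, b]."""
    width = (b - a) / n
    # Each term is independent of the others, which is why this loop
    # parallelizes trivially: split the range of i, then reduce.
    return sum(f(a + i * width) for i in range(n)) * width
```

This embarrassingly parallel structure, independent work plus a single reduction, is why the area-under-a-curve problem is a common first exercise for both OpenMP and MPI.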

Introduction to OpenMP : Guided lesson to teach undergraduate and graduate students how to use OpenMP.