This is a collection of curricular materials designed to enable the teaching and use of parallel and high-performance computing in the undergraduate science or engineering classroom. The development and testing of these materials were funded by the Blue Waters Undergraduate Petascale Education Program (BW-UPEP).

Social Networks Module : A module on social networks developed for the Undergraduate Petascale Education Program.

Biofilms Module : Materials for teaching the construction of a cellular automata model of microbial biofilms using Mathematica, as well as a parallel version using C with MPI.

- Overview
- The Tyranny of the Storage Hierarchy
- Instruction Level Parallelism
- Compiler Tricks

Multithreading and Multiprocessing :

- Shared Memory
- Multithreading (OpenMP)
- Distributed Multiprocessing (MPI)
- Applications and Types of Parallelism
- Multicore Madness

- High Throughput Computing
- GPGPU: Number Crunching in Your Graphics Card
- Grab Bag (Scientific Libraries, I/O Libraries, Visualization)

GalaxSeeHPC : Blue Waters Undergraduate Petascale Education module introducing the n-body problem, algorithms and approaches used to simulate n-body systems, a serial implementation, and issues and approaches to parallel implementation.

Introduction to OpenMP : Guided lesson to teach undergraduate and graduate students how to use OpenMP.

Intro to Scientific Computing : Jennifer Houchins' introduction to Scientific Computing Lab.

- Strategy
- Approximations
- Error Analysis
- Computer Arithmetic

Petakit : BW-UPEP module from the Earlham College Cluster Computing Group, led by Charlie Peck.

Stochastic Optimization : Module covering stochastic optimization.

Parallel BLAST Module : BW-UPEP module introducing the BLAST similarity search tool, its algorithm, and performance considerations. Serial and MPI versions of BLAST are benchmarked in this exercise.

Exercises for Multiple Modules : This directory contains student exercises and code examples that reinforce topics across multiple modules.

Scaling N-Body Simulations : Blue Waters Undergraduate Petascale Education Module exploring the computational issues involved with scaling the size of a simulated n-body system.

Living Links: Applications of Matrix Operations to Population Studies : This module demonstrates the application of matrix operations to the modeling of populations.

Age Structured Models : Materials for teaching the application, implementation, and analysis of age-structured models in the study of populations.

Area Under a Curve : Parallelization: Area Under a Curve (BW-UPEP Module)

Game of Life : Parallelization: Conway's Game of Life (BW-UPEP Module)

Parallelization: Infectious Disease : Epidemiology is the study of infectious disease. Infectious diseases are said to be "contagious" among people if they are transmittable from one person to another. Epidemiologists can use models to assist them in predicting the behavior of infectious diseases. This module will develop a simple agent-based infectious disease model, develop a parallel algorithm based on the model, provide a coded implementation for the algorithm, and explore the scaling of the coded implementation on high performance cluster resources.

Parallelization: Sieve of Eratosthenes : This module presents the sieve of Eratosthenes, a method for finding the prime numbers below a certain integer. One can model the sieve for small integers by hand. For bigger integers, it becomes necessary to use a coded implementation. This code can be either serial (sequential) or parallel. Students will explore the various forms of parallelism (shared memory, distributed memory, and hybrid) as well as the scaling of the algorithm on multiple cores in its various forms, observing the relationship between run time of the program and number of cores devoted to the program. An assessment rubric, two exercises, and two student project ideas allow the student to consolidate her/his understanding of the material presented in the module.

Dynamic Programming with CUDA, Part I : This module provides a quick review of dynamic programming, but the student is assumed to have seen it before. The parallel programming environment is NVIDIA's CUDA environment for graphics cards (GPGPU - general purpose graphics processing units). The CUDA environment simultaneously operates with a fast shared memory and a much slower global memory, and thus has aspects of shared-memory parallel computing and distributed computing. Specifics for programming in CUDA are included where appropriate, but the reader is also referred to the NVIDIA CUDA C Programming Guide, and the CUDA API Reference Manual.

A Beginner's Guide to High-Performance Computing : This guide presents some of the general ideas behind, and basic principles of, high-performance computing (HPC) as performed on a supercomputer. These concepts should remain valid even as the technical specifications of the latest machines continually change. Although this material is aimed at HPC supercomputers, if history is any guide, present HPC hardware and software will become desktop machines in less than a decade.

Dynamic Programming with CUDA, Pt II : This module is largely stand-alone. It is "Part II" only in the sense that it does not contain the overview of dynamic programming seen in Part I, and does not recapitulate the introduction to CUDA. We will continue to refer the reader to various NVIDIA references where appropriate, particularly the NVIDIA CUDA C Programming Guide, and the CUDA API Reference Manual, and where we introduce new CUDA-specific ideas, will linger a bit longer by way of introduction. The algorithms described here are completely independent of Part I, so that a reader who already has some familiarity with CUDA and dynamic programming may begin with this module with little difficulty.

Modeling an Able Invader : This module will introduce the basics of cellular automaton simulation with an application to studying the effect of fencing artificial watering points on adult cane toad invasion in Australia.

How Many People Does it Take To... : This module introduces the Party Problem, a problem in Ramsey theory (a subfield of mathematics), and compares the performance of a naive solution implemented as a sequential program, an OpenMP program, and a CUDA program.

Matrix Multiplication with CUDA : This module teaches matrix multiplication in the context of enumerating paths in a graph and the basics of programming in CUDA. It emphasizes the power of using shared memory when programming on GPGPU architectures.

Parallel Numerical Simulation of Boltzmann Transport : This module teaches the basic principles of semi-classical transport simulation based on the time-dependent Boltzmann transport equation (BTE) formalism, with performance considerations for parallel implementations of multi-dimensional transport simulation, and numerical methods for the efficient and accurate solution of the BTE for both electronic and thermal transport using simple finite difference discretization and the stable upwind method.

Scaling in nature and in the machine : The purposes of this module are:

- To understand some fundamental physical processes governing sand movement in rivers
- To implement a model for simulating these processes using C
- To learn how to visualize these simulations using Paraview
- To see how massive simulations can be achieved using supercomputers

Parallel Spectral Methods : This module teaches the principles of Fourier spectral methods, their utility in solving partial differential equations, and how to implement them in code. Performance considerations for several Fourier spectral implementations are discussed and methods for effective scaling on parallel computers are explained.

Parallelization: Binary Tree Traversal : This module teaches the use of binary trees to sort through large data sets, different traversal methods for binary trees, including parallel methods, and how to scale a binary tree traversal on multiple compute cores.

Suffix Trees: How to do Google search in bioinformatics? : This module will:

- Introduce the suffix tree data structure and its many applications in string matching and bioinformatics
- Describe how suffix trees are built on a serial computer
- Discuss the challenges associated with building the tree in parallel
- Explain one application in bioinformatics (pattern matching) that uses suffix trees
- Develop a method to implement pattern matching on a distributed memory parallel computer
- Describe how to analyze parallel performance and identify improvements

Probable Cause: Modeling with Markov Chains : Markov Chains have numerous applications in biology from ecology to bioinformatics. This module will explore some of these applications along with the need for high performance computing in solving some of the problems.

Scientific Visualization with CUDA : This module gives a basic introduction to the CUDA architecture and programming model, OpenGL for 3D graphics, and the interoperability between the two for interactive, high performance scientific visualization.

Introduction to GPU programming using CUDA : This module provides an introduction to the GPU architecture and the CUDA development environment, instructions on interfacing with the GPU hardware, and an emphasis on debugging C and CUDA codes with cuda-gdb.

Learning Automated Performance Analysis using PetaKit and the Bootable Cluster CD : This module teaches a method for evaluating the scalability of parallel programs. It includes a software toolchain, PetaKit, for automating the collection of performance data both for single runs and for parameter sweeps illustrating both strong and weak scaling.

Standalone BLAST Link : Download link for standalone BLAST, which produces errors in Standard XML.

Parallelization: Conway's Game of Life : Simulates the evolution of simple and complex forms of life based on simple rules.

- Serial Version
- Shared Memory Version (OpenMP)
- Distributed Memory Version (MPI)

Parallelization: Area Under a Curve : Calculus integration program that finds the area under a curve. Ideal for teaching the basics of OpenMP and MPI.

- Serial Version
- Shared Memory Version (OpenMP)
- Distributed Memory Version (MPI)