First, get a job:
qsub -I -l nodes=2:ppn=32:xe -l walltime=03:00:00
Open another terminal window. In that window, copy this code into your scratch directory:
cp -r ~bplist/2015/hybrid ~/scratch
We know that OpenMP is an easy-to-use API for shared-memory processors (SMPs), while MPI uses message passing to communicate between processes on distributed-memory systems. Using them together in a hybrid program reduces communication cost within a shared-memory node while still scaling the application across multiple nodes.
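As a rough illustration (a minimal sketch, not one of the exercise files), a hybrid program usually initializes MPI with a requested threading level, does its shared-memory work inside an OpenMP parallel region, and keeps the message passing outside that region:

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided, rank, size;

    /* FUNNELED: only the master thread will make MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local_sum = 0.0;

    /* Shared-memory work: each MPI task spawns OpenMP threads. */
    #pragma omp parallel reduction(+:local_sum)
    {
        local_sum += 1.0;   /* stand-in for real per-thread work */
    }

    /* Message passing happens outside the parallel region. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f (from %d tasks)\n", global_sum, size);

    MPI_Finalize();
    return 0;
}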
Here are aprun options which are relevant when running MPI with OpenMP:
-n : total number of MPI tasks for the job
-d : depth, the number of OpenMP threads per MPI task (set omp_set_num_threads(x) to the same value)
-N : MPI tasks per compute node; this is an optional flag
-S : the number of MPI tasks to allocate per NUMA node. There are 4 NUMA regions per node; threads running on the same region can improve performance.
The function call omp_set_num_threads(x) is the safest way to set the number of OpenMP threads when doing hybrid.
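For example (a minimal sketch, with 16 chosen only to match the aprun -d 16 example below), calling omp_set_num_threads before the parallel region keeps the program consistent with the launch line:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_set_num_threads(16);   /* keep this equal to aprun's -d value */

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)
            printf("running with %d threads\n", omp_get_num_threads());
    }
    return 0;
}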
For instance:
aprun -n 4 -d 16 ./file.o
or
aprun -n 4 -S 1 ./file.o
For a more thorough reference on aprun, see Blue Waters' aprun page.
As an example, we have a model of the spread of a rumor:
cd ~/scratch/hybrid
vi rumor-hybrid.c
make
As an exercise, add MPI to this OpenMP program that calculates pi (if you need to peek at one solution, see pi-key.c):
cd ~/scratch/hybrid
vi pi-openmp.c
cc -o pi-openmp.o pi-openmp.c
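One possible outline (just a sketch, and not necessarily what pi-key.c does) assumes the usual midpoint-rule integration of 4/(1+x*x): give each rank a strided share of the intervals, keep the OpenMP reduction for its share, and combine the partial sums with MPI_Reduce:

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    const long n = 100000000;          /* number of intervals (illustrative) */
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const double h = 1.0 / (double)n;
    double local_sum = 0.0;

    /* Each rank handles the stripe i = rank, rank+size, rank+2*size, ...
       and its OpenMP threads share that stripe via the reduction. */
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = rank; i < n; i += size) {
        double x = h * ((double)i + 0.5);
        local_sum += 4.0 / (1.0 + x * x);
    }

    double pi = 0.0;
    MPI_Reduce(&local_sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi is approximately %.16f\n", pi * h);
    MPI_Finalize();
    return 0;
}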
Another exercise is to add OpenMP to this MPI Sieve of Eratosthenes program (if you need to peek at one solution, see sieve-key.c):
cd ~/scratch/hybrid
vi sieve-mpi.c
cc -o sieve-mpi.o sieve-mpi.c
aprun -n 4 -d 16 ./sieve-mpi.o -n i, where i is the upper bound on the prime search.
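If you want a hint, one common place for OpenMP in such a sieve (a sketch only; sieve-key.c may take a different approach, and the variable names below are made up) is the loop that marks off multiples of the current prime within this rank's block:

/* Inside the MPI sieve's marking step (illustrative names):
   marked[] covers this rank's block, first is the index of the first
   multiple of prime in the block, and block_size is the block length. */
#pragma omp parallel for
for (long i = first; i < block_size; i += prime)
    marked[i] = 1;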
hellorank.c is a simple program for understanding the different aprun options for an OpenMP + MPI hybrid. It is from Blue Waters (a.k.a., Galen...). Compile with cc -o hellorank.o hellorank.c; one way to run it is aprun -n4 -S1 -d8 ./hellorank.o
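The real hellorank.c is in the directory you copied; roughly, a program like it (this sketch is not the Blue Waters source) has each MPI rank report its node name and every OpenMP thread it runs, so different -n, -d, and -S combinations can be compared directly:

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, len;
    char node[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(node, &len);

    /* Every OpenMP thread of every MPI task prints one line. */
    #pragma omp parallel
    printf("node %s, MPI rank %d, OpenMP thread %d of %d\n",
           node, rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}

Try a few different launch lines, such as the aprun -n4 -S1 -d8 example above, and compare where the tasks and threads end up.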