OpenMP Introduction

Thread vs. Process

A process is created by the OS to execute a program and is given resources (e.g., memory, registers); different processes do not inherently share their memory with one another. A thread is a subset of a process and shares the resources of its parent process. Multiple threads of the same process have access to the same memory (though each thread keeps its own local variables).

Getting to know OpenMP

OpenMP is an API built for shared-memory parallelism, usually realized through multi-threading. The OpenMP API comprises three distinct components: compiler directives, runtime library routines, and environment variables. One example of each appears later on this page: the parallel compiler directive, the omp_get_thread_num runtime library routine, and the OMP_NUM_THREADS environment variable.

Code to get

Copy code into your workspace:
cp -r ~instr006/BW_Institute/openmp-intro ~

Let's look at some code. To open the first example, use vim:
vim ~/openmp-intro/examples/hello-parallel.c
When it's time to compile, this will suffice:
cc -o a.out hello-parallel.c
When it's time to run:
./a.out
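
(Note: the cc wrapper on Blue Waters is expected to enable OpenMP on its own; if you compile with gcc or clang instead, add the -fopenmp flag, e.g. gcc -fopenmp -o a.out hello-parallel.c.)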

Code examples style (in this page)

example-shell-command
Example line of code (in the file, not the shell)
Comments are also interwoven
  1. parallel compiler directive

    USAGE:
    #pragma omp parallel [clauses]
    {
        Code inside here runs in parallel
        Among the clauses available: declaring private/shared vars
    }
    EXAMPLE:
    #pragma omp parallel private(var1, var2) shared(var3)
    {

    The parallel pragma starts a parallel region. It creates a team of N threads (where N is determined at runtime), all of which execute the next statement or block (a multi-statement block requires a {…} enclosure). At the end of that statement or block, the threads join back into one, and only the original thread continues. (Note that the pragma occupies its own line; the opening brace of the block goes on the line after it.)
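
    For reference, here is a minimal sketch of the kind of program hello-parallel.c contains (a hedged reconstruction; the file in your workspace may differ):

    #include <stdio.h>

    int main(void)
    {
        /* Everything inside the braces runs once per thread in the team. */
        #pragma omp parallel
        {
            printf("Hello from a thread!\n");
        }   /* implicit barrier: the threads join back into one here */
        return 0;
    }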

  2. OMP_NUM_THREADS environment variable

    USAGE: OMP_NUM_THREADS=number
    EXAMPLE: export OMP_NUM_THREADS=16    (sets the value in a bash shell)

    This environment variable tells the runtime library how many threads can be used to run the program. If dynamic adjustment of the number of threads is enabled, this number is the maximum number of threads that can be used; otherwise, it is the exact number of threads that will be used.

    The default value is the number of online processors on the machine; on Blue Waters, it seems to be 1, no matter your allocation.
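
    You can also set the value for a single run, without changing your shell environment (bash syntax; the thread count here is arbitrary):
    OMP_NUM_THREADS=8 ./a.out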

  3. omp_get_thread_num function

    EXAMPLE: uid = omp_get_thread_num()

    This function asks the thread that is executing it to identify itself by returning its unique number. [answers "Who am I?"]

    What would this return if called outside of a parallel block?

  4. omp_get_num_threads function

    EXAMPLE: threadCount = omp_get_num_threads()

    This function returns the number of threads in the team currently executing the parallel block from which it is called. [answers "How many of us?"]
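
    To see both routines together, here is a hedged sketch (not necessarily the workshop file) in which every thread reports its ID and the team size:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel
        {
            int uid = omp_get_thread_num();          /* "Who am I?"       */
            int threadCount = omp_get_num_threads(); /* "How many of us?" */
            printf("Hello from thread %d of %d\n", uid, threadCount);
        }
        return 0;
    }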

  5. More code to play with

    To open the second example, use vim:
    vim ~/openmp-intro/examples/hello-parallel-for.c
    When it's time to compile, this will suffice:
    cc -o b.out hello-parallel-for.c
    When it's time to run:
    ./b.out

  6. for compiler directive

    USAGE:
    #pragma omp for [clauses]
    for loop

    (The pragma applies directly to the for statement that immediately follows it; no braces are needed.)

    EXAMPLE:
    #pragma omp parallel
    {
        #pragma omp for
        for (i = 0; i < N; i++) {
    This directive has the iterations of the upcoming loop executed in parallel by the team of threads. The iterations (or chunks of them) can be assigned in a round-robin style before any iteration is processed (schedule(static)), or first-come, first-served as threads become free (schedule(dynamic)).

    Does the code in one iteration of the loop need to be independent of all other iterations? Why/not?

    This assumes a parallel region has already been initiated; otherwise the loop executes serially on a single processor. The loop iteration variable is private in scope throughout the loop's execution.

    "for" is the keyword in C/C++, but in FORTRAN, this would be "do".


Exercise

Go to the code in your workspace:
cd ~/openmp-intro

Team up with someone you haven't worked with before. Modify this code to run in parallel; you decide where it would be reasonable to do so. Compile it, run it, and show an instructor your work, explaining your thought process.

These are the commands to compile it. It compiles and runs as-is, but we want to parallelize it.
cc -c -fPIC fire.c -o sharedFire.o
cc -I. -L. sharedFire.o main.c -o sharedFire

This runs the code, where x is the probability of a neighboring tree catching fire (0..1) and t is the number of trees on each side of the square forest (2..80):

./sharedFire -p x -n t
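
For instance (parameter values chosen arbitrarily from the ranges above):
./sharedFire -p 0.5 -n 20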