Parallel Speedup Tutorial



Speedup (and efficiency)

"You're either workin', or talkin'."

Definition of Speedup

The speedup of a parallel code is how much faster it runs in parallel than in serial. If the time it takes to run a code on 1 processor is T1 and the time it takes to run the same code on N processors is TN, then the speedup is given by

S = T1 / TN.

This can depend on many things, but primarily depends on the ratio of the amount of time your code spends communicating to the amount of time it spends computing.
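As a minimal sketch of the definition, the snippet below computes S from two measured wall-clock times. The function name `speedup` and the timings are hypothetical, chosen only for illustration; they are not measurements from any real code.

```python
def speedup(t1: float, tn: float) -> float:
    """Return S = T1 / TN for two run times given in the same units."""
    if t1 <= 0 or tn <= 0:
        raise ValueError("run times must be positive")
    return t1 / tn


# Hypothetical timings in seconds, invented for this example.
t1 = 120.0  # run time on 1 processor
t8 = 20.0   # run time of the same code on 8 processors

print(f"Speedup on 8 processors: {speedup(t1, t8):.1f}")
```

With these made-up numbers the code runs 6 times faster on 8 processors, which is less than the ideal factor of 8.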

Definition of Efficiency

Efficiency is a measure of how much of your available processing power is being used. The simplest way to think of it is as the speedup per processor. This is equivalent to defining efficiency as the ratio of the time to run the code on 1 processor to the total processor time consumed by the parallel run, that is, N processors each occupied for a time TN.

E = S/N = T1 / (N TN)

This gives a more accurate measure of the true efficiency of a parallel program than CPU usage, as it takes into account redundant calculations as well as idle time.
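The following sketch tabulates speedup and efficiency for a series of runs. The function name `efficiency` and the timing table are assumptions made up for illustration; substitute your own measurements.

```python
def efficiency(t1: float, tn: float, n: int) -> float:
    """Return E = S / N = T1 / (N * TN)."""
    if n < 1 or t1 <= 0 or tn <= 0:
        raise ValueError("need n >= 1 and positive run times")
    return t1 / (n * tn)


# Hypothetical run times in seconds, keyed by processor count.
timings = {1: 120.0, 2: 64.0, 4: 36.0, 8: 20.0}
t1 = timings[1]

print(" N   TN (s)  speedup  efficiency")
for n, tn in timings.items():
    s = t1 / tn
    e = efficiency(t1, tn, n)
    print(f"{n:2d}  {tn:7.1f}  {s:7.2f}  {e:10.2f}")
```

An efficiency of 1.0 means every processor is doing useful work the whole time; in this invented data set the efficiency falls as processors are added, even though the speedup keeps rising.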

Factors that affect speedup

The primary issue with speedup is the communication-to-computation ratio: the less time your code spends communicating relative to computing, the better it scales. To get a higher speedup, you can

  • Communicate less
  • Compute more
  • Make connections faster
  • Communicate faster
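To see why this ratio matters, consider a deliberately simple model, which is an assumption of this sketch and not a property of any particular code: the computation divides evenly over N processors, and each parallel run pays a fixed communication cost. The names `modeled_speedup`, `t_compute`, and `t_comm` are hypothetical.

```python
def modeled_speedup(t_compute: float, t_comm: float, n: int) -> float:
    """Speedup under an assumed model: TN = t_compute / n + t_comm.

    The serial run (n == 1) is assumed to need no communication.
    """
    tn = t_compute / n + (t_comm if n > 1 else 0.0)
    return t_compute / tn


# Same computation (100 s), two different communication costs.
for t_comm in (1.0, 10.0):
    s = modeled_speedup(100.0, t_comm, 10)
    print(f"communication = {t_comm:4.1f} s -> speedup on 10 processors = {s:.2f}")
```

In this model, cutting the communication time from 10 s to 1 s raises the speedup on 10 processors from 5 to about 9. Increasing the computation while holding the communication fixed has a similar effect, which is the sense in which "compute more" helps.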

The amount of time the computer requires to make a connection to another computer is referred to as its latency, and the rate at which data can be transferred is the bandwidth. Both can have an impact on the speedup of a parallel code.
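A common first approximation, used here as an assumption and not as a description of any specific network, is that sending a message costs one latency plus the message size divided by the bandwidth. The function name and the latency and bandwidth values below are hypothetical.

```python
def transfer_time(n_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Estimated time to send one message: latency + size / bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


# Hypothetical network: 50 microseconds latency, 100 MB/s bandwidth.
LATENCY = 50e-6
BANDWIDTH = 100e6

# Send 1 MB as one message, or as 1000 messages of 1 kB each.
one_big = transfer_time(1_000_000, LATENCY, BANDWIDTH)
many_small = 1000 * transfer_time(1_000, LATENCY, BANDWIDTH)

print(f"one 1 MB message:    {one_big * 1e3:.2f} ms")
print(f"1000 x 1 kB messages: {many_small * 1e3:.2f} ms")
```

Under these assumed numbers the same data takes roughly six times longer when split into many small messages, because the latency is paid once per message. Large transfers are limited mainly by bandwidth, small ones mainly by latency.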

Collective communication can also help speed up your code. As an example, imagine you are trying to tell a number of people about a party. One method would be to tell each person individually; another would be to tell people to "spread the word". Collective communication refers to improving communication speed by having any node that already has the information take part in sending it to the other nodes. Not all protocols allow for collective communication, and even protocols that do may not require a vendor to implement it efficiently. An example is the broadcast routine in MPI. Many vendor-specific versions of MPI provide broadcast routines that use a "tree" method of communication. The more common implementations found on most clusters, LAM-MPI and MPICH, simply have the sending machine contact each receiving machine in turn.
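The difference between the two broadcast strategies can be sketched by counting communication rounds. This is an idealized count, assuming every message takes the same time and each node sends at most one message per round; it does not reproduce how any particular MPI library implements its broadcast. The function names are hypothetical.

```python
def linear_broadcast_rounds(n_nodes: int) -> int:
    """The sender contacts each of the other nodes in turn."""
    return max(n_nodes - 1, 0)


def tree_broadcast_rounds(n_nodes: int) -> int:
    """Every node that already has the data passes it to one new node per round."""
    informed = 1  # only the original sender has the data at the start
    rounds = 0
    while informed < n_nodes:
        informed *= 2  # the number of informed nodes doubles each round
        rounds += 1
    return rounds


for n in (2, 16, 1000):
    print(f"{n:5d} nodes: linear = {linear_broadcast_rounds(n):4d} rounds, "
          f"tree = {tree_broadcast_rounds(n):2d} rounds")
```

Under these assumptions the linear method needs N - 1 rounds while the "spread the word" method needs only about log2(N), so the gap widens quickly as the number of nodes grows.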


©1994-2025 Shodor