Shodor

a national resource for computational science education


BLAST-ing in Parallel: Enabling an Essential Computational Tool to Keep Pace with the Explosive Growth in Biological Sequence Data

By Jeffrey D. Krause at Shodor, Durham, North Carolina
and Michael Ly at the University of Illinois, Champaign-Urbana, Illinois

With Contributions from Aaron Weeden and Jennifer Houchins
Shodor, Durham, North Carolina

This module explores the inner workings of the BLAST similarity search tool, considering the algorithm and the impact of various search conditions and settings on performance. Various approaches to parallelizing the computation and their performance impacts are considered. Benchmarking of the mpiBLAST parallel code is carried out at different scales.

Three different versions of BLAST are run in this module:

  • Server-based BLAST at the National Center for Biotechnology Information (NCBI).
  • Stand-alone BLAST, which is downloaded, built, and benchmarked under various conditions.
  • mpiBLAST, which is likewise built and benchmarked under various conditions and at various scales.
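
When benchmarking at different scales, it can help to reduce raw wall-clock times to speedup and parallel efficiency. The following is a minimal sketch, not part of the module itself: the function name `speedup_and_efficiency` is hypothetical, and the timings in the example are placeholder values, not measurements from any BLAST run.

```python
def speedup_and_efficiency(timings):
    """Compute speedup and parallel efficiency from wall-clock timings.

    timings: dict mapping process count -> wall-clock seconds.
    The run with the fewest processes is used as the baseline.
    Returns a dict mapping process count -> (speedup, efficiency).
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    results = {}
    for procs in sorted(timings):
        # Speedup: how many times faster than the baseline run.
        speedup = base_time / timings[procs]
        # Efficiency: speedup divided by the increase in process count.
        efficiency = speedup * base_procs / procs
        results[procs] = (speedup, efficiency)
    return results


# Placeholder timings for illustration only (NOT measured results).
example = {1: 100.0, 2: 50.0, 4: 40.0}
for procs, (s, e) in speedup_and_efficiency(example).items():
    print(f"{procs:>3} procs: speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency near 1.0 indicates near-linear scaling. Values well below 1.0 suggest that overheads such as communication, I/O, or load imbalance are dominating at that scale.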

This module:

  • Introduces the problem of biological sequence alignment
  • Develops a basic brute-force algorithm for optimal alignment with implementations in various interpreted and compiled programming languages
  • Considers algorithmic complexity and scaling
  • Develops the ideas underlying the BLAST approach for similarity searches of large sequence databases
  • Describes various parallelization approaches to speeding up the BLAST computations and evaluates the performance gains realized under various conditions
  • Outlines a student activity to download, build and run mpiBLAST with various query and database sequences to evaluate the scaling performance of this code

The module documents can be downloaded below.

Resources:

Parallel_BLAST.doc : MS Word document describing: 1) the biology of sequence similarity, 2) the basic BLAST algorithm, 3) an activity to characterize the performance of NCBI's server-based BLAST, 4) an activity to build and benchmark NCBI's stand-alone BLAST, and 5) an activity to build, benchmark and scale mpiBLAST.

Parallel BLAST (PDF) : The module document in PDF format.