Project Title | BioInformatics Software Engineering with Mothur |
Summary | Mothur is a popular bioinformatics package for analyzing 16S microbial rRNA gene sequences. Working with faculty in computer science and biology, the intern will design, implement, and test modifications to the shared memory and distributed memory portions of Mothur's analysis tools. The intern will work with C/C++, OpenMP, MPI, and the Linux/GCC toolchain on Earlham's clusters and Blue Waters. |
Job Description | Earlham College has been using Mothur to analyze a variety of 16S samples. In doing so, we have identified a number of technical issues related to the shared memory and distributed memory algorithms and implementations in Mothur's analysis tools. Some of these issues are exacerbated by large data sets of the sort used in broad metagenomic studies. The runtime performance data collected to date indicate that improvements are possible given changes to the underlying approaches to shared memory, distributed memory, and possibly hybrid parallelism. The principal outcome of this project will be a more robust and performant codebase with respect to parallelism. Another outcome will be additions to Mothur's wiki detailing the technical and operational aspects of the parallelism associated with each of Mothur's tools. Development, testing, and benchmarking will be done on Earlham's clusters and Blue Waters using the Linux/GCC toolchain with C/C++, OpenMP, and MPI. One goal of the structured benchmarking is to develop a guide that, given the characteristics of an input data set, helps scientists estimate the runtime and resource requirements for a given workflow. |
Conditions/Qualifications | Familiarity with 16S microbial rRNA gene sequencing and analysis. Familiarity with C/C++, OpenMP, MPI, and the Linux/GCC toolchain. |
Start Date | 07/01/2014 |
End Date | 04/30/2015 |
Location | Charlie Peck, Cluster Computing Group, Earlham College, Richmond, Indiana |
Interns | Kristin Muterspaw |