Project Title | Topological Analysis for Plant Phenomics Data |
Summary | The undergraduate intern for this project will contribute to the development of new algebraic topological models and methods for analyzing large-scale plant phenomics data. The scientific objective is to computationally discover novel genomic markers for specific phenotypic performance measures in maize and wheat. The research component will entail algorithm and code development in a heterogeneous parallel environment (MPI, OpenMP, GPU/CUDA), participation in interdisciplinary collaboration with biologists and mathematicians, and engagement in scholarly activities. The educational component will entail curriculum development to support simplified topological data analysis of plant phenomics data. |
Job Description | This is an interdisciplinary project at the intersection of parallel computing, computational mathematics, and computational biology. The goal of the project is to develop a new parallel tool for topological data analysis (TDA), an emerging technique in computational mathematics, and to apply it to plant phenomics data (genotypes and phenotypes). Preliminary results have demonstrated the high potential of TDA methods for other problems in bioinformatics; this project will extend TDA to a new problem in the field. Given the massive volumes of phenomics data, parallelization is essential. The project will involve: modeling the problem from its application context; developing parallel algorithms (a serial prototype has already been developed in the lab) and mapping them to different architectures (a distributed-memory cluster of multicore nodes, each with a GPU card); and testing and evaluating the results. The student will program in C/C++ using MPI and OpenMP, with some parts written in CUDA. The student will collaborate with a plant biologist and a mathematician. We plan to test our code on real-world data sets acquired from maize (corn) genomes, which contain 13M SNPs for over 6,000 crop individuals (at the time of this writing). We will also develop an undergraduate curriculum module introducing our parallel implementation of TDA for plant biology. The expected outcomes of the project include: i) a new parallel TDA tool; ii) research papers reporting the findings, with the undergraduate student as one of the authors (lead or co-author); and iii) an undergraduate curriculum module that introduces TDA to beginners, along with a visual, interactive tool for analyzing plant data sets (test cases will be included). The curriculum module will be submitted for review to the Journal of Computational Science Education by the end of the internship project.
More importantly, the project will provide an opportunity for the undergraduate student to participate in and contribute to an active interdisciplinary research project and, in the process, learn about the important role of high-performance and scalable computing in modern-day scientific discovery. |
Conditions/Qualifications | Programming languages: C/C++, with a willingness to learn basic MPI, OpenMP, and CUDA programming. Familiarity with Unix-based systems. |
Start Date | 05/16/2015 |
End Date | 05/31/2016 |
Location | School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164 |
Interns | Ritche Long |