Project Title | Deep Learning Enabled Structural Knowledge Discovery |
Summary | Protein structure studies pose fundamental questions that modern machine learning methods, enabled by supercomputing, can help to answer. Our Blue Waters Intern Team devised a novel dimension mapping method that leverages popular convolutional neural network models and the Blue Waters facility, with preliminary success. We propose to expand the scope and scale of our research by applying the method to areas such as protein model evaluation and docking. We also propose to investigate visualization methods that help interpret the results of our models, so that we can better understand their biological meaning. |
Job Description | Task 0: Prepare the intern for an effective computational research career. 0A: HPC environment, including job submission and performance optimization; 0B: parallel programming skills and scientific knowledge discovery workflow, including Git and Jupyter Notebook; 0C: basics of machine learning and deep learning, including TensorFlow, Caffe, etc. Task 1: Improve upon our current architecture to develop an effective structure-based knowledge discovery neural network model for molecular deep learning. 1A: study the mathematical foundation of space-filling curve dimension reduction in the context of deep learning architectures. 1B: assess the performance implications of the chosen training parameters. Task 2: Screen KRas-ligand binding with candidates from small-molecule libraries, and produce a shortlist for closer scrutiny. 2A: identify a set of local biological, chemical and quantum features that can be used as extra channels to improve the performance of our models. 2B: develop visualization tools to help understand and interpret training results. The interns will be involved in every step of this project, including data collection, coding, debugging, system administration and data analysis. The interns will also take part in dissemination by writing and submitting papers for technical audiences. Last year our student interns presented at PEARC17 and SC17. We have found the conference experience extremely valuable for students and hope to continue it. |
Use of Blue Waters | We have evaluated options for performing machine learning on supercomputers and concluded that a pragmatic approach is to build a container with NVIDIA support on a local workstation, then run the Docker image on the supercomputer. Ideally the supercomputer would run the NVIDIA Docker service directly, but no major computing center supports that because of security concerns. Two prominent alternatives are Shifter and Singularity. Singularity requires package and image conversion, which is time-consuming and non-trivial. Shifter is easier: it uses a different command set but accepts images in the same format. The Shifter service at Blue Waters was upgraded to a new version in December 2017 and is well suited to this project. |
Conditions/Qualifications | The intern should demonstrate the essential capabilities needed to conduct computational research in a Linux environment. Required skills: Linux system administration, C++ and Python programming, and parallel programming. In addition, the student should have excellent communication skills, because we will work with biologists to better understand domain issues. Familiarity with data analysis and visualization tools such as scikit-learn and HTML5 is also desirable. We encourage students with backgrounds in biology, chemistry, physics and mathematics to join the team. |
Start Date | 05/31/2018 |
End Date | 05/31/2019 |
Location | Department of Computer Science, Hood College, Frederick, Maryland. |
Interns | Xiange Wang |
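
The job description refers to space-filling curve dimension reduction without naming a specific curve. As an illustration only, the sketch below assumes a 2D Hilbert curve and uses it to flatten a square grid into a 1D sequence in which consecutive elements are always neighbours in the grid. The function names `hilbert_d2xy` and `flatten_along_hilbert` are hypothetical, and the project's actual mapping, curve, and dimensionality may differ.

```python
def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid.

    Assumption: n is a power of two and 0 <= d < n * n.
    This is the standard iterative index-to-coordinate conversion.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current sub-square so the pieces join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def flatten_along_hilbert(grid):
    """Read a square grid (list of rows) in Hilbert-curve order.

    Returns a 1D list; neighbours in the list are neighbours in the grid.
    """
    n = len(grid)
    order = (hilbert_d2xy(n, d) for d in range(n * n))
    return [grid[y][x] for x, y in order]


if __name__ == "__main__":
    # Illustrative data only: a 2 x 2 grid of labels.
    print(flatten_along_hilbert([[1, 2], [3, 4]]))  # [1, 3, 4, 2]
```

A row-by-row flattening would place the end of one row next to the start of the next, which are far apart in the grid. The curve ordering avoids such jumps, which is the usual motivation for this kind of mapping when feeding spatial data to models that expect a different dimensionality.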