Computing Challenges for Modeling and Simulating Macromolecular Assemblies

Presenter: Edward C. Uberbacher
Other Authors: Philip LoCascio, Sergey Passovets, Andrey Gorin, Pratul Agarwal
ORNL

Both NIH, through its structural genomics initiative, and DOE, in its Genomics:GTL Program, have recognized that a major next step in understanding and utilizing complex biological systems is the capability to rapidly model and simulate the dynamics of large assemblies of macromolecules. Molecular machines are the basis for life’s chemistry. They malfunction in human disease, can be used to produce energy in microbes and from biomass, and offer the possibility of innovative new chemistries at the interface of nanoscience and biology. Computational tools to characterize, model and simulate these machines will revolutionize biomedical research and biotechnology in the U.S. The ORNL Computational Biology Program is focusing on the key steps necessary to build and simulate large molecular machines:

(1) Computing accurate building blocks: Building accurate models of molecular machines relies on the capability to build accurate structures of the component proteins and nucleic acids. Major initiatives such as the NIH Structural Genomics Program are investing heavily on the assumption that this problem can be solved computationally in a timely way. While approximate structures can be obtained with existing infrastructure using techniques such as homology modeling, creating accurate computational models of proteins and nucleic acids is generally beyond current capabilities. We are developing conformational search methods, constrained by multiple homology examples, that can greatly improve the accuracy of derived protein models. These searches explore the neighborhood of the approximate structure, are larger than problems currently attempted, and will rely on large shared-memory spaces and massively parallel, vectorized algorithms.

(2) Putting the pieces together: Given relatively accurate starting models, the components must be docked properly to create models of the molecular assembly. Flexible docking searches, which allow molecules to flex as they come together, are generally too expensive to be used. We are focusing on improved algorithms, the use of other knowledge to reduce the complexity of the docking search, and large shared-memory conformational searches.

(3) Seeing how they work: Once models are built, simulating complex molecular machines at biologically meaningful time scales requires thoughtful problem design and current and next-generation capability computers. For example, rational engineering to improve cellulases, which are key to biomass energy initiatives, will require a very detailed understanding of the molecular mechanisms of catalysis and processivity. Molecular dynamics simulations of cellulases are estimated to require tens of millions of node hours for each individual step in the catalytic process. Codes that scale sufficiently are being developed.
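The abstract does not specify how the template-constrained conformational search in step (1) works. The following is only a minimal sketch of the general idea, under assumptions that are not from the source: a reduced model with one pseudo-atom per residue, templates that are already aligned residue-for-residue, pairwise distance restraints whose targets are the mean template distance and whose weights are 1/(variance + 0.1), and a Metropolis Monte Carlo search started from the approximate model. The names derive_restraints, energy and restrained_search are hypothetical; this is not the ORNL method.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical reduced model: one 3-D point per residue.
Coord = Tuple[float, float, float]
# (i, j) -> (target distance, weight)
Restraints = Dict[Tuple[int, int], Tuple[float, float]]


def derive_restraints(templates: Sequence[Sequence[Coord]], min_sep: int = 2) -> Restraints:
    """Turn several aligned homologous templates into pairwise distance restraints.

    Assumed scheme: target = mean distance over templates; weight = 1 / (variance + 0.1),
    so distances on which the templates agree are restrained more tightly.
    """
    n = len(templates[0])
    restraints: Restraints = {}
    for i in range(n):
        for j in range(i + min_sep, n):
            ds = [math.dist(t[i], t[j]) for t in templates]
            mean = sum(ds) / len(ds)
            var = sum((d - mean) ** 2 for d in ds) / len(ds)
            restraints[(i, j)] = (mean, 1.0 / (var + 0.1))
    return restraints


def energy(coords: Sequence[Coord], restraints: Restraints) -> float:
    """Weighted squared deviation of the model from the template-derived distances."""
    return sum(w * (math.dist(coords[i], coords[j]) - d0) ** 2
               for (i, j), (d0, w) in restraints.items())


def restrained_search(start: Sequence[Coord], restraints: Restraints, steps: int = 5000,
                      step_size: float = 0.3, temperature: float = 1.0,
                      seed: int = 0) -> Tuple[List[Coord], float]:
    """Metropolis Monte Carlo search in the neighborhood of an approximate model.

    Each step displaces one pseudo-atom by a Gaussian kick; uphill moves are accepted
    with probability exp(-dE / T). Returns the lowest-energy model seen and its energy.
    """
    rng = random.Random(seed)
    coords: List[Coord] = [tuple(c) for c in start]
    e = energy(coords, restraints)
    best, best_e = list(coords), e
    for _ in range(steps):
        k = rng.randrange(len(coords))
        old = coords[k]
        coords[k] = tuple(x + rng.gauss(0.0, step_size) for x in old)
        new_e = energy(coords, restraints)
        if new_e <= e or rng.random() < math.exp(-(new_e - e) / temperature):
            e = new_e  # accept the move
            if e < best_e:
                best, best_e = list(coords), e
        else:
            coords[k] = old  # reject: restore the previous position
    return best, best_e
```

Averaging over several templates, rather than copying one, is what lets the search weight regions where homologs agree more heavily than variable regions. A production search of the scale described above would evaluate energies incrementally and in parallel rather than recomputing every restraint per move.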