Addressing Unknown Constants and Metabolic Network Behaviors through Petascale Computing: Understanding H2 Production in Green Algae

Presenter: C.S. Chang, NREL
Authors: Christopher S. Chang, Peter Graf and Michael Seibert
National Renewable Energy Laboratory

The genomics revolution has produced a massive and growing quantity of whole-genome DNA sequences, which encode the metabolic catalysts necessary for life. However, gene annotations are rarely complete, and measurement of the kinetic constants associated with the encoded enzymes cannot possibly keep pace, so careful modeling is needed to explore plausible network behaviors. Key challenges are (1) formulating quantitatively the kinetic laws that govern each transformation in a fixed model network, (2) characterizing the stable solution (if any) of the associated ordinary differential equations, (3) fitting that solution to metabolomics data as they become available, and (4) optimizing a model output over the space of possible kinetic parameters with respect to properties such as robustness of network response or maximum consumption/production. This SciDAC-2 project addresses this large-scale uncertainty in the genome-scale metabolic network of the water-splitting, H2-producing green alga Chlamydomonas reinhardtii. Each metabolic transformation is formulated as an irreversible steady-state process, so that the vast literature on known enzyme mechanisms may be incorporated directly. To start, glycolysis, the tricarboxylic acid cycle, and basic fermentation pathways have been encoded in Systems Biology Markup Language (SBML), with careful annotation and consistency with the KEGG database, yielding a model with 4 compartments, 85 species, 35 reactions, and 89 kinetic constants.
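
As an illustration of challenges (1) and (2), the following minimal Python sketch writes an irreversible rate law for one transformation and integrates a toy pathway until it reaches a stable steady state. Everything in it is an assumption made for illustration: the Michaelis-Menten form, the two-species pathway, the parameter values, and the hand-written Runge-Kutta integrator are not taken from the Chlamydomonas model.

```python
def michaelis_menten(s, vmax, km):
    """Irreversible Michaelis-Menten rate law: v = vmax * s / (km + s)."""
    return vmax * s / (km + s)


def rhs(y, p):
    """Right-hand side for a hypothetical pathway: (source) -> S -> P -> (sink)."""
    s, prod = y
    v1 = michaelis_menten(s, p["vmax"], p["km"])   # enzymatic step S -> P
    return [p["v_in"] - v1,                        # dS/dt: constant influx minus conversion
            v1 - p["k_out"] * prod]                # dP/dt: conversion minus first-order drain


def rk4_step(f, y, p, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(y, p)
    k2 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k1)], p)
    k3 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k2)], p)
    k4 = f([yi + h * ki for yi, ki in zip(y, k3)], p)
    return [yi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def integrate_to_steady_state(f, y0, p, h=0.01, tol=1e-10, max_steps=1_000_000):
    """Step forward in time until every time derivative is below tol."""
    y = list(y0)
    for _ in range(max_steps):
        if max(abs(d) for d in f(y, p)) < tol:
            return y
        y = rk4_step(f, y, p, h)
    raise RuntimeError("no steady state reached within max_steps")


# Hypothetical parameter values chosen so that v_in < vmax (a steady state exists).
params = {"v_in": 1.0, "vmax": 2.0, "km": 0.5, "k_out": 4.0}
steady = integrate_to_steady_state(rhs, [0.0, 0.0], params)
```

For this toy pathway the steady state can be checked by hand: setting dS/dt = 0 gives S = v_in * km / (vmax - v_in), and setting dP/dt = 0 gives P = v_in / k_out.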

We have developed a system that takes an SBML model as input and automatically produces C code that, when executed, optimizes the model's kinetic parameters according to artificial test criteria. The generation of this optimizer from the model consists of several steps. First, the model is parsed and converted to a system of ordinary differential equations (ODEs), including Jacobian and sensitivity matrices. These are then translated to C functions and embedded in code that uses the ODE solver package CVODES, resulting in a library that can efficiently simulate the model and calculate derivatives with respect to parameters. This library is in turn incorporated in code that calculates the objective functions implied by challenges (2)-(4) above, as well as their derivatives. Finally, these routines are built into code that uses the optimization package TAO to optimize the model with respect to the kinetic parameters. We illustrate the system and present numerical results. Further development, including the overlay of a parallel multistart algorithm, will allow optimization of thousands of parameters on high-performance systems ranging from distributed grids to unified petascale architectures.
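
The steps described above (simulate the model together with its parameter sensitivities, build an objective and its derivative from them, then optimize from several starting points) can be sketched on a deliberately tiny problem. This is a sketch under stated assumptions, not the project's generated code: a one-parameter decay model stands in for the metabolic network, a hand-written Runge-Kutta integrator stands in for CVODES, gradient descent with backtracking stands in for TAO, the multistart loop is serial rather than parallel, and the data are synthetic.

```python
def rk4_step(f, state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(state)
    k2 = f([x + 0.5 * h * d for x, d in zip(state, k1)])
    k3 = f([x + 0.5 * h * d for x, d in zip(state, k2)])
    k4 = f([x + h * d for x, d in zip(state, k3)])
    return [x + h / 6.0 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)]


def simulate(k, times, y0=1.0, h=0.01):
    """Return [(y, dy/dk)] at each sample time for the toy model dy/dt = -k*y."""
    def f(state):
        y, s = state
        # Forward sensitivity equation: ds/dt = (df/dy)*s + df/dk = -k*s - y
        return [-k * y, -k * s - y]
    state, t, out = [y0, 0.0], 0.0, []
    for t_next in times:
        n = max(1, round((t_next - t) / h))    # whole number of steps per interval
        step = (t_next - t) / n
        for _ in range(n):
            state = rk4_step(f, state, step)
        t = t_next
        out.append((state[0], state[1]))
    return out


def objective_and_gradient(k, times, data):
    """Sum-of-squares misfit J(k) and dJ/dk assembled from the sensitivities."""
    traj = simulate(k, times)
    J = sum((y - d) ** 2 for (y, _), d in zip(traj, data))
    g = sum(2.0 * (y - d) * s for (y, s), d in zip(traj, data))
    return J, g


def fit(k0, times, data, max_iter=200, gtol=1e-9):
    """Gradient descent with Armijo backtracking, keeping k positive."""
    k = k0
    J, g = objective_and_gradient(k, times, data)
    for _ in range(max_iter):
        if abs(g) < gtol:
            break
        alpha = 1.0
        while alpha > 1e-12:
            k_try = k - alpha * g
            if k_try > 0.0:
                J_try, g_try = objective_and_gradient(k_try, times, data)
                if J_try <= J - 1e-4 * alpha * g * g:   # sufficient decrease
                    k, J, g = k_try, J_try, g_try
                    break
            alpha *= 0.5
        else:
            break   # no acceptable step length was found
    return k, J


# Synthetic "measurements" generated with a hypothetical true rate constant.
TIMES = [0.5, 1.0, 1.5, 2.0]
K_TRUE = 1.0
DATA = [y for y, _ in simulate(K_TRUE, TIMES)]

# Serial multistart: the runs are independent, so they could be farmed out in parallel.
results = [fit(k0, TIMES, DATA) for k0 in (0.2, 1.5, 4.0)]
best_k, best_J = min(results, key=lambda r: r[1])
```

Because each start is an independent optimization, the list comprehension over starting points is the natural place to distribute work across processors. The sensitivity returned by `simulate` can be checked against the analytic value dy/dk = -t * y0 * exp(-k*t) for this toy model.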