Identify the minimal set of biochemical reactions to add to a draft metabolic model to enable it to produce biomass in a specified medium.

Draft metabolic models usually have missing reactions due to incomplete or incorrect functional genome annotations. As a result, these models are unable to generate biomass on media where the organism is typically capable of growing. Gapfilling algorithms can be used to overcome this problem. These algorithms tentatively bridge gaps in metabolic pathways by identifying the minimal number of biochemical reactions to add to the draft metabolic model to enable it to produce biomass in a specified medium. Gapfilling is an optimization procedure that can produce multiple solutions.

Starting with a draft metabolic model, either generated by the Build Metabolic Model app or imported, we can apply the Gapfill Metabolic Model app to identify and fill the gaps in the model's metabolic pathways that might prevent the production of biomass for the organism or community. This is achieved in two alternative ways: (i) relaxing reversibility constraints on the model's reactions; (ii) adding new reactions to the existing model. In this gapfilling process, the model is augmented to include all of the more than 13,000 biochemical reactions contained in the ModelSEED [1] database (available for download from GitHub). The database consists of reactions from KEGG, MetaCyc, EcoCyc, Plant BioCyc, Plant Metabolic Networks, and Gramene.

During the gapfilling process, all reactions determined to be thermodynamically reversible [2-4] are adjusted to be reversible in the gapfilled metabolic model. Finally, flux balance analysis (FBA) [5] is performed to generate a flux profile that prioritizes the production of biomass while minimizing the flux through all reactions and reaction directions that were added in the gapfilling process. This method is consistent with previously published algorithms for gapfilling reaction networks [6, 7]. All reactions and reaction directions generated by these algorithms that were not included in the draft model and have a nonzero flux are then added to the gapfilled model. This gapfilling solution subsequently permits growth of the metabolic model in the specified media condition. To see the reactions and reaction directions added by the gapfilling process, click the Reactions tab in the output metabolic model, and sort the table by clicking the Gapfilling column title.
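
The selection rule described above (keep every reaction direction that carries nonzero flux and was not in the draft model) can be sketched in a few lines of Python. This is only an illustration, not the app's implementation: the reaction IDs, the flux values, the `select_gapfilled` helper, and the numerical tolerance are all assumptions made for the example.

```python
# Hypothetical flux profile from the final FBA step, keyed by
# (reaction ID, direction). IDs and values are made up for illustration.
flux_profile = {
    ("rxn_A", "forward"): 4.2,
    ("rxn_B", "forward"): 0.0,
    ("rxn_B", "backward"): 1.3,
    ("rxn_C", "forward"): 0.0,
    ("rxn_D", "forward"): 2.0,
}

# Reaction directions already present in the draft model (also made up).
draft_directions = {("rxn_A", "forward"), ("rxn_B", "forward")}

TOLERANCE = 1e-9  # assumed numerical threshold for treating a flux as nonzero


def select_gapfilled(flux_profile, draft_directions, tol=TOLERANCE):
    """Return reaction directions that carry flux but are absent from the draft."""
    return sorted(
        key
        for key, flux in flux_profile.items()
        if abs(flux) > tol and key not in draft_directions
    )


gapfilled = select_gapfilled(flux_profile, draft_directions)
print(gapfilled)
```

In this toy case the backward direction of `rxn_B` and the forward direction of `rxn_D` are the only entries that satisfy both conditions, so they are the ones that would be added to the gapfilled model.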

The detailed 2-Step gapfilling algorithm is described below:

The objective functions (2.1) and (2.4) minimize the number of reactions that are not present in the model but must be added for biomass to be produced under the specified conditions. Because gapfilling is applied to a false negative prediction (the model cannot produce biomass where the organism is expected to grow), at least one reaction will need to be added.
In the formulation, all reactions are treated as reversible, and every reversible reaction is decomposed into two component reactions, one in the forward direction and the other in the backward direction. This allows each direction to be added independently by the algorithm. As a result, the reactions represented in the formulation are the forward and backward components of the reactions in the database. In the objective function, *r*_{gapfilling} represents the total number of reactions in the database; in objective function (2.1), *v*_{i} is the flux through reaction *i*; in objective function (2.4), *Z*_{i} is a binary variable equal to zero if the flux through reaction *i* is zero and one otherwise; and *λ*_{gapfill,i} is a constant value stating the energy cost associated with adding reaction *i* to the model. If reaction *i* is already present in the model, *λ*_{gapfill,i} is zero. Otherwise, *λ*_{gapfill,i} is calculated using equation (2.8).

Each of the *P* variables in equation (2.8) is binary, representing a penalty applied when adding different types of reactions to the model: they are equal to one if the penalty applies to the type of the particular reaction and equal to zero otherwise.

- *P*_{KEGG,i} is related to reactions not in KEGG.
- *P*_{structure,i} is related to the addition of reactions involving metabolites with unknown structure.
- *P*_{known-ΔG,i} is related to reactions for which ΔG cannot be calculated.
- *P*_{unfavorable,i} is related to reactions operating in an unfavorable direction.
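
To make the bookkeeping concrete, the following Python sketch shows one way the forward/backward decomposition and the "zero cost if already in the model" rule could be represented. It is a minimal illustration under stated assumptions: the `Direction` class, the `decompose` and `gapfill_cost` helpers, and the reaction `rxn_X` are hypothetical, and the value `2.0` is a placeholder standing in for whatever equation (2.8) would yield, not a computed penalty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Direction:
    """One directional component of a database reaction (hypothetical type)."""
    reaction_id: str
    direction: str        # "forward" or "backward"
    stoichiometry: tuple  # sorted (metabolite, coefficient) pairs


def decompose(reaction_id, stoichiometry):
    """Split one database reaction into forward and backward components."""
    forward = tuple(sorted(stoichiometry.items()))
    # The backward component is the same reaction with every coefficient negated.
    backward = tuple((met, -coef) for met, coef in forward)
    return [
        Direction(reaction_id, "forward", forward),
        Direction(reaction_id, "backward", backward),
    ]


def gapfill_cost(component, model_directions, penalty_cost):
    """Cost coefficient: zero if already in the model, else a precomputed value."""
    key = (component.reaction_id, component.direction)
    if key in model_directions:
        return 0.0
    return penalty_cost[key]


# Made-up example: A -> B is in the draft model in the forward direction only.
components = decompose("rxn_X", {"A": -1, "B": 1})
model_directions = {("rxn_X", "forward")}
# Placeholder for the value equation (2.8) would give; not a real penalty.
penalty_cost = {("rxn_X", "backward"): 2.0}

costs = [gapfill_cost(c, model_directions, penalty_cost) for c in components]
print(costs)
```

Keeping the two directions as separate entries is what lets a solution add only the backward direction of a reaction the model already contains, which corresponds to relaxing its reversibility constraint.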

Equations (2.2) and (2.5) implement the mass balance constraints related to the steady-state assumption of FBA. Here, *N*_{reactionDB} is the stoichiometric matrix, and *v* is the flux vector through the reaction database.

Equation (2.6) enforces the bounds on reaction fluxes (*v*_{i}) and the values of the reaction use variables (*Z*_{i}). This equation ensures that each reaction flux, *v*_{i}, is zero unless *Z*_{i} is one. The *v*_{max,i} term in equation (2.6) is central to the simulation using FBA. If *v*_{max,i} corresponds to a reaction associated with a knocked-out gene, *v*_{max,i} is set to zero. If *v*_{max,i} corresponds to the uptake of a nutrient not in the medium, *v*_{max,i} is also set to zero.

Equations (2.3) and (2.7) constrain the biomass flux, *v*_{bio}, to a nonzero value to ensure growth.
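
For orientation, the two optimization problems can be written out from the verbal descriptions of equations (2.1) through (2.7). The block below is a reconstruction from that prose, not a copy of the original equations, so the exact form may differ. In particular, the grouping into two steps is inferred from the equation numbers cited in the text, and the symbol for the positive lower bound on biomass flux, written here as v_{bio,min}, is an assumption introduced for the example. Equation (2.8) is not reconstructed because its form cannot be recovered from the text.

```latex
% Step 1: linear problem, equations (2.1)-(2.3)
\begin{align}
\text{Minimize} \quad & \sum_{i=1}^{r_{gapfilling}} \lambda_{gapfill,i}\, v_{i} \tag{2.1} \\
\text{subject to} \quad & N_{reactionDB} \cdot v = 0 \tag{2.2} \\
& v_{bio} \ge v_{bio,min}, \qquad v_{bio,min} > 0 \tag{2.3}
\end{align}

% Step 2: mixed-integer problem, equations (2.4)-(2.7)
\begin{align}
\text{Minimize} \quad & \sum_{i=1}^{r_{gapfilling}} \lambda_{gapfill,i}\, Z_{i} \tag{2.4} \\
\text{subject to} \quad & N_{reactionDB} \cdot v = 0 \tag{2.5} \\
& 0 \le v_{i} \le v_{max,i}\, Z_{i}, \qquad Z_{i} \in \{0, 1\} \tag{2.6} \\
& v_{bio} \ge v_{bio,min}, \qquad v_{bio,min} > 0 \tag{2.7}
\end{align}
```

Written this way, the role of each term is easy to check against the text: a reaction already in the model has a cost coefficient of zero and is therefore free to carry flux, while constraint (2.6) forces the flux of any reaction with a use variable of zero to vanish.
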
The result of the gapfilling optimization includes a list of irreversible reactions from the model that should be made reversible, and a set of reactions not in the model that should be added to fix a false negative prediction.
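
The two kinds of output described here can be separated with a simple check on whether the reaction already exists in the model. The Python sketch below is illustrative only: `classify_solution` is a hypothetical helper, the reaction IDs are made up, and it assumes the solution lists only reaction directions that are not yet in the model.

```python
def classify_solution(solution, model_directions):
    """Split a gapfilling solution into reversibility changes and new reactions.

    solution: iterable of (reaction ID, direction) pairs chosen by gapfilling,
        assumed to contain only directions absent from the model.
    model_directions: set of (reaction ID, direction) pairs already in the model.
    """
    model_reactions = {rxn for rxn, _ in model_directions}
    make_reversible, add_new = [], []
    for rxn, direction in sorted(solution):
        if rxn in model_reactions:
            # The reaction exists in one direction only, so adding the other
            # direction amounts to making it reversible.
            make_reversible.append(rxn)
        else:
            add_new.append((rxn, direction))
    return make_reversible, add_new


# Made-up model and solution for illustration.
model_directions = {("rxn_A", "forward"), ("rxn_B", "forward")}
solution = [("rxn_D", "forward"), ("rxn_B", "backward")]

make_reversible, add_new = classify_solution(solution, model_directions)
print(make_reversible)
print(add_new)
```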

**Team members who developed & deployed algorithm in KBase:**
Chris Henry, Janaka Edirisinghe, Sam Seaver, Neal Conrad. For questions, e-mail help@kbase.us

Related Publications

- [1] Henry, C.S., et al., High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol, 2010. 28(9): p. 977-82. http://www.nature.com/nbt/journal/v28/n9/full/nbt.1672.html
- [2] Henry, C.S., et al., Genome-scale thermodynamic analysis of Escherichia coli metabolism. Biophys J, 2006. 90(4): p. 1453-61. http://www.cell.com/biophysj/abstract/S0006-3495(06)72335-9
- [3] Jankowski, M.D., et al., Group contribution method for thermodynamic analysis of complex metabolic networks. Biophys J, 2008. 95(3): p. 1487-99. http://www.cell.com/biophysj/abstract/S0006-3495(08)70215-7
- [4] Henry, C.S., et al., iBsu1103: a new genome-scale metabolic model of Bacillus subtilis based on SEED annotations. Genome Biol, 2009. 10(6): p. R69. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2009-10-6-r69
- [5] Orth, J.D., I. Thiele, and B.O. Palsson, What is flux balance analysis? Nat Biotechnol, 2010. 28(3): p. 245-8. http://www.nature.com/nbt/journal/v28/n3/abs/nbt.1614.html
- [6] Latendresse, M., Efficiently gap-filling reaction networks. BMC Bioinformatics, 2014. 15: p. 225. http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-225
- [7] Dreyfuss, J.M., et al., Reconstruction and validation of a genome-scale metabolic model for the filamentous fungus Neurospora crassa using FARM. PLoS Comput Biol, 2013. 9(7): p. e1003126. http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003126

App Specification:

https://github.com/cshenry/fba_tools/tree/584206644abfeb5f3184783aaa27b3a0993ca583/ui/narrative/methods/gapfill_metabolic_model

**Module Commit:** 584206644abfeb5f3184783aaa27b3a0993ca583