Construct draft metabolic models based on annotated genomes.
The Build Metabolic Model App was implemented in KBase to enable users to build genome-scale metabolic models (GEMs) based on the ModelSEED Pipeline for genomes they have imported or generated with other tools in the system. This overview of the ModelSEED pipeline [1] details the steps for automated reconstruction of GEMs using the Build Metabolic Model App in KBase.
Before a draft metabolic model can be built for a genome uploaded into KBase, the genome must first be annotated or re-annotated using the RAST functional ontology ([Annotate Microbial Assembly](https://narrative.kbase.us/#catalog/apps/RAST_SDK/annotate_contigset/release) or [Annotate Microbial Genome](https://narrative.kbase.us/#catalog/apps/RAST_SDK/reannotate_microbial_genome/release)). This is necessary because the SEED functional annotations generated by RAST [2] are linked directly to the biochemical reactions in the [ModelSEED biochemistry database](https://github.com/ModelSEED/ModelSEEDDatabase/blob/master/Biochemistry/), which is used by KBase for metabolic modeling. It is important to note that the RAST annotation services are **not developed to annotate eukaryotic organisms**. Different modeling strategies have been implemented for plants and fungi. For plants, users may instead use the [Annotate Plant Enzymes with OrthoFinder App](https://narrative.kbase.us/#catalog/apps/kb_orthofinder/annotate_plant_transcripts/release). For fungi, please refer to the standalone [Build Fungal Model App](https://narrative.kbase.us/#catalog/apps/kb_fungalmodeling/built_fungal_model/release) for details.
Once a genome has been annotated using the RAST functional ontology, it can be fed into the pipeline for preliminary reconstruction, wherein the RAST annotations are used to generate draft metabolic models. A draft metabolic model consists of a reaction network complete with gene-protein-reaction (GPR) associations, predicted Gibbs free energy of reaction values, and the biomass reaction. The biomass reaction includes non-universal cofactors, lipids, and cell wall components. The biomass reaction is organism-specific. It is built from a biomass reaction template, which uses the SEED subsystems and RAST functional annotations to assign non-universal biomass components (e.g., cofactors, cell wall components). These components represent biological functions that are either exhibited by a large set of organisms or specific to a small set of organisms.
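The template-driven assembly of the biomass reaction can be pictured with a minimal Python sketch. Everything named here is hypothetical and chosen only for illustration: the template structure, the role names, the component names, and the list of universal components are assumptions, and the actual ModelSEED templates are considerably more detailed.

```python
# Hypothetical template: each non-universal biomass component lists the
# functional roles that must be annotated in the genome before it is added.
BIOMASS_TEMPLATE = {
    "cell_wall_polymer": {"role_cell_wall_1", "role_cell_wall_2"},
    "cofactor_x": {"role_cofactor_x_synthase"},
}

# Components assumed to be present in every model (illustrative only).
UNIVERSAL_COMPONENTS = ["protein", "DNA", "RNA"]


def build_biomass_components(genome_roles):
    """Return the biomass components supported by the annotated roles."""
    components = list(UNIVERSAL_COMPONENTS)
    for component in sorted(BIOMASS_TEMPLATE):
        required_roles = BIOMASS_TEMPLATE[component]
        # Add a component only if all of its required roles are annotated.
        if required_roles <= genome_roles:
            components.append(component)
    return components


genome_roles = {"role_cell_wall_1", "role_cell_wall_2", "role_unrelated"}
print(build_biomass_components(genome_roles))
# ['protein', 'DNA', 'RNA', 'cell_wall_polymer']
```

In this toy genome the cell wall roles are annotated, so the cell wall component is added, while the cofactor is left out because its required role is missing.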
For an organism-specific biomass component to be added to the biomass reaction, the genome must contain the proper subsystems and annotations specified in the template. The GPR associations represent the mapping between the biochemical reactions and the standardized functional roles assigned to genes during the RAST annotation. This mapping allows the pipeline to differentiate between cases where protein products from multiple genes form a complex to catalyze a reaction, and cases where protein products from multiple genes can independently catalyze the same reaction. The draft model includes all reactions associated with one or more enzymes encoded in the genome that are identified in the annotations. Additionally, spontaneous reactions are added during this step. All templates used for model reconstruction can be found on [GitHub](https://github.com/ModelSEED/ModelSEEDTemplates).
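The distinction between complexes and independently acting enzymes can be sketched in Python as follows. This is an illustration rather than the pipeline's implementation: the function names, role names, and gene IDs are hypothetical, and the sketch assumes that a complex is included only when every one of its roles is annotated in the genome.

```python
from itertools import product


def build_gpr(complexes, genes_by_role):
    """Return a GPR as a list of gene tuples.

    Genes inside one tuple form a complex (AND); separate tuples are
    alternatives that can each catalyze the reaction (OR).
    """
    alternatives = []
    for roles in complexes:
        # Skip a complex if any of its roles is missing from the genome.
        if not all(role in genes_by_role for role in roles):
            continue
        # Several genes with the same role yield alternative complexes.
        gene_choices = [sorted(genes_by_role[role]) for role in roles]
        alternatives.extend(product(*gene_choices))
    return alternatives


def gpr_to_string(alternatives):
    """Render the GPR in the usual 'and'/'or' notation."""
    return " or ".join("(" + " and ".join(genes) + ")" for genes in alternatives)


# One two-subunit complex (roleA + roleB) and one single-role enzyme (roleC).
complexes = [("roleA", "roleB"), ("roleC",)]
genes_by_role = {"roleA": {"g1"}, "roleB": {"g2", "g3"}, "roleC": {"g4"}}

print(gpr_to_string(build_gpr(complexes, genes_by_role)))
# (g1 and g2) or (g1 and g3) or (g4)
```

A reaction whose GPR comes back empty would not be added to the draft model on the basis of annotation alone.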
With the 2.0.0 release, ATP production in the model reconstruction procedure was improved. The pipeline now constructs a core model, tests it for proper ATP production, and then ensures that ATP production does not rise to unrealistic levels when the core model is expanded to a genome-scale model. The gapfilling approach was similarly improved to ensure that gapfilling does not cause a model to start over-producing ATP. Model reconstruction using the classic pipeline, without the new ATP production method, is still available and can be used by turning ON the advanced parameter classic mode (OFF by default).
Gapfilling is the process by which the App identifies the minimal set of biochemical reactions to add to a draft metabolic model to enable it to produce biomass in a specified medium. This step is optional, but it is recommended and runs by default. A checkbox in the advanced options of the Build Metabolic Model App can be unchecked to allow model reconstruction without gapfilling. Gapfilling can be done later if desired. To gapfill the draft metabolic model or to perform additional gapfilling analysis, please see the [Gapfill Metabolic Model App](https://narrative.kbase.us/#appcatalog/app/fba_tools/gapfill_metabolic_model/release).
The quality of a draft metabolic model depends on the completeness of the annotated genome used for the preliminary reconstruction. Because most genomes are not completely annotated, draft metabolic models usually contain gaps that prevent the production of some biomass components. In the gapfilling step, an optimization algorithm identifies the minimal set of reactions that must be added to each model to fill these gaps [3, 4]. The gapfilling algorithm is described in detail in the [Gapfill Metabolic Model App page](https://narrative.kbase.us/#appcatalog/app/fba_tools/gapfill_metabolic_model/release). Reactions to be used by gapfilling are selected from the [ModelSEED biochemistry database](https://github.com/ModelSEED/ModelSEEDDatabase/tree/master/Biochemistry). This curated database contains mass- and charge-balanced reactions, standardized to aqueous conditions at neutral pH. The ModelSEED reaction database integrates biochemistry contained in KEGG, MetaCyc, EcoCyc, Plant BioCyc, Plant Metabolic Networks, and Gramene. Gapfilling ensures that every model is capable of simulating cell growth.
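The idea of a minimal gapfilling set can be illustrated with a toy Python sketch. It is not the App's algorithm: the App uses a flux-based optimization, whereas this sketch swaps in a simple reachability test (network expansion) and a brute-force search, which is only practical for very small candidate sets. All reaction and compound names are hypothetical.

```python
from itertools import combinations


def producible(reactions, media):
    """Network expansion: all compounds reachable from the media."""
    available = set(media)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            # A reaction fires once all of its substrates are available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def gapfill(draft, candidates, media, biomass_precursors):
    """Smallest set of candidate reaction IDs that makes every biomass
    precursor producible, or None if no combination works."""
    names = sorted(candidates)
    for size in range(len(names) + 1):  # try the smallest sets first
        for subset in combinations(names, size):
            reactions = list(draft.values()) + [candidates[n] for n in subset]
            if biomass_precursors <= producible(reactions, media):
                return list(subset)
    return None


# Each reaction is (substrates, products); the draft lacks a route to "pyr".
draft = {"rxn1": ({"glc"}, {"g6p"}), "rxn3": ({"pyr"}, {"ala"})}
candidates = {
    "rxnA": ({"g6p"}, {"pyr"}),
    "rxnB": ({"glc"}, {"xyz"}),
    "rxnC": ({"xyz"}, {"pyr"}),
}

print(gapfill(draft, candidates, media={"glc"}, biomass_precursors={"ala"}))
# ['rxnA']
```

Adding rxnB and rxnC together would also close the gap, but the single reaction rxnA is the smaller solution, which is the one a minimal gapfilling prefers.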
Once model reconstruction is complete, Flux Balance Analysis (FBA) can be applied to assess the capacity of reactions to carry flux and reaction essentiality.
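For reference, FBA is conventionally formulated as a linear program. The notation below follows the standard textbook form and is not taken from the App's own documentation: $S$ is the stoichiometric matrix of the model, $v$ is the vector of reaction fluxes, $c$ selects the objective (typically the biomass reaction), and $v^{\min}_j$ and $v^{\max}_j$ are the flux bounds on reaction $j$, set by reversibility and by the growth medium.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_j \le v_j \le v^{\max}_j \quad \text{for every reaction } j .
\end{aligned}
```

Under this formulation, a reaction can carry flux if some feasible $v$ has $v_j \neq 0$, and a reaction is essential for growth if adding the constraint $v_j = 0$ forces the optimal biomass flux to zero.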
For additional information about metabolic modeling, visit the Metabolic Modeling in KBase FAQ.
Team members who developed and deployed the algorithm in KBase: Chris Henry, Janaka Edirisinghe, Sam Seaver, José P. Faria, and Neal Conrad. For questions, please [contact us](https://www.kbase.us/support/).
Related Publications
- [1] Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol. 2010;28: 977–982. doi:10.1038/nbt.1672, https://www.ncbi.nlm.nih.gov/pubmed/20802497
- [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206–D214. doi:10.1093/nar/gkt1226, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965101/
- [3] Latendresse M. Efficiently gap-filling reaction networks. BMC Bioinformatics. 2014;15: 225. doi:10.1186/1471-2105-15-225, https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-225
- [4] Dreyfuss JM, Zucker JD, Hood HM, Ocasio LR, Sachs MS, Galagan JE. Reconstruction and Validation of a Genome-Scale Metabolic Model for the Filamentous Fungus Neurospora crassa Using FARM. PLOS Computational Biology. 2013;9: e1003126. doi:10.1371/journal.pcbi.1003126, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003126
- [5] Mahadevan R, Schilling CH. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng. 2003;5: 264–276. https://www.ncbi.nlm.nih.gov/pubmed/14642354
App Specification:
https://github.com/cshenry/fba_tools/tree/b083384ac00d4f9d7cb796a664ee3ffd017cf248/ui/narrative/methods/build_multiple_metabolic_models
Module Commit: b083384ac00d4f9d7cb796a664ee3ffd017cf248