Propagate Model to New Genome

Translate the metabolic model of one organism to another, using a mapping of similar proteins between their genomes.

Accurate metabolic models are based on well-curated genome annotations, which are labor-intensive to produce and thus highly valuable. This App lets users construct a metabolic model of a new organism from either a previously published or a user-curated model of another organism, rather than constructing and curating it ab initio.

The quality of the propagated model depends on how closely related the source and target genomes are, because gene-to-gene functional connections between the two are inferred from sequence homology. Therefore, the App has two required inputs: (i) a source model to be propagated and (ii) a proteome comparison map between the source and target organisms.

(i) The source model can be either one created externally and then imported into KBase or one generated within KBase with the Build Metabolic Model App. (See the KBase documentation for instructions on importing a genome into KBase.)

If you are creating a new model in KBase with this App, we recommend first searching the literature for published models of organisms closely related to the target organism. To aid in this process, the Insert Genome Into Species Tree App can identify closely related organisms to look for in the literature.

Because the purpose of this App is to take advantage of the additional information in a curated model that our automatically generated models may lack, we do not recommend using a model generated by the Build Metabolic Model App as a source unless it has been curated. Curation can be performed in KBase with the Edit Metabolic Model App.

(ii) Generating the necessary proteome comparison requires the Compare Two Proteomes App, which identifies genes encoding potentially orthologous proteins in the two organisms. Best results are obtained when the source and target genomes are closely related, because higher homology between protein pairs supports higher-confidence functional inferences. Additionally, the propagated model may suggest missing annotations in the target genome, thus improving the quality of its annotation.
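To illustrate what a source-to-target ortholog map looks like, here is a minimal sketch of reciprocal best hits, a common way to call putative orthologs from pairwise similarity scores. This is an assumption for illustration only: the score dictionary and function names are hypothetical, and the sketch does not describe how the Compare Two Proteomes App computes its comparison.

```python
from typing import Dict, Tuple

# Hypothetical similarity scores (e.g., alignment bit scores) keyed by
# (source_protein, target_protein); higher means more similar.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores, by_source: bool) -> Dict[str, str]:
    """Best-scoring partner for each protein on one side of the comparison."""
    best: Dict[str, Tuple[float, str]] = {}
    for (src, tgt), score in scores.items():
        query, hit = (src, tgt) if by_source else (tgt, src)
        # Strict ">" keeps the first-seen partner when scores tie.
        if query not in best or score > best[query][0]:
            best[query] = (score, hit)
    return {query: hit for query, (_, hit) in best.items()}


def reciprocal_best_hits(scores: Scores) -> Dict[str, str]:
    """Source->target pairs in which each protein is the other's best hit."""
    forward = best_hits(scores, by_source=True)
    reverse = best_hits(scores, by_source=False)
    return {src: tgt for src, tgt in forward.items() if reverse.get(tgt) == src}


scores = {
    ("a1", "b1"): 300.0,
    ("a1", "b2"): 120.0,
    ("a2", "b2"): 90.0,
    ("a3", "b2"): 150.0,
}
print(reciprocal_best_hits(scores))
```

Here a2's best hit is b2, but b2's best hit is a3, so only a1/b1 and a3/b2 are kept. Requiring the match in both directions is what makes this stricter than one-way best hits.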

The method applies the following rules in order to decide whether to propagate a reaction or not:

If the reaction is associated with one or more genes in the original (source) model, but none of those genes has an ortholog in the new genome, the reaction is deleted.
If the reaction is associated with no genes in the original model, the action taken depends on whether the "Keep reactions with no genes" option is checked: the reaction is retained if the option is checked and removed otherwise. By checking the box, the user assumes that both species have the same gaps in the enzymatic annotation of their genomes.
If the reaction is associated with one or more genes in the original model and at least one of them has a homolog in the new genome, the reaction is retained, and all genes with homologs are translated and retained.
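The three rules above can be sketched as a single filtering pass over the source model's reactions. This is a simplified illustration, not the App's implementation: the Reaction class, the ortholog_map dictionary (source gene to target gene), and the function name are hypothetical, and gene-protein-reaction (AND/OR) logic is flattened into a plain list of genes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Reaction:
    rxn_id: str
    genes: List[str] = field(default_factory=list)  # gene IDs in the source genome


def propagate_reactions(
    reactions: List[Reaction],
    ortholog_map: Dict[str, str],
    keep_reactions_without_genes: bool,
) -> List[Reaction]:
    """Apply the three propagation rules to each reaction of the source model.

    ortholog_map is a hypothetical {source_gene: target_gene} dict standing in
    for the proteome comparison.
    """
    propagated = []
    for rxn in reactions:
        if not rxn.genes:
            # No genes in the source model: keep only if the option is checked.
            if keep_reactions_without_genes:
                propagated.append(Reaction(rxn.rxn_id, []))
            continue
        # Translate every source gene that has an ortholog in the new genome.
        translated = [ortholog_map[g] for g in rxn.genes if g in ortholog_map]
        if not translated:
            # Genes exist, but none has an ortholog: the reaction is deleted.
            continue
        # At least one ortholog: retain the reaction with the translated genes.
        propagated.append(Reaction(rxn.rxn_id, translated))
    return propagated


source_model = [
    Reaction("rxn00001", ["geneA"]),
    Reaction("rxn00002", ["geneB", "geneC"]),
    Reaction("rxn00003", []),
    Reaction("rxn00004", ["geneD"]),
]
orthologs = {"geneA": "target_1", "geneC": "target_7"}
for r in propagate_reactions(source_model, orthologs, keep_reactions_without_genes=False):
    print(r.rxn_id, r.genes)
```

With the option unchecked, rxn00003 (no genes) and rxn00004 (no ortholog for geneD) are dropped, and rxn00002 keeps only the translated geneC. Checking the option would additionally retain rxn00003.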

App Output
There are eight separate tabs for browsing the data in the model:

Overview: this tab shows a summary of key information about the model, including the associated genome, number of reactions, and number of compounds.
Reactions: this tab shows detailed reaction information including reaction ID, name, biochemical equation, the associated gene IDs, and whether or not the reaction was added by the gapfilling stage.
Compounds: this tab shows information about compounds in the model, including chemical formula and charge.
Genes: this tab shows gene IDs and associated reaction IDs.
Compartments: this tab shows the subcellular localization of the compounds and enzymes. Typically, there are three types of compartments in microbes: Cytosol (c), Periplasm (p), and Extracellular (e). Reactions and compounds belonging to each compartment are identified using compartment notation (e.g., rxn00001[c0], cpd00001[c0]). The integer associated with the compartment (e.g., the 0 in c0) represents the index number of the model. For a single-species model, this number will always be zero, but if individual models are merged into a community model, each sub-model will then be assigned a distinct index.
Biomass: This tab shows the biomass composition of the model. Typically, biomass is represented in the model as an equation that consumes the biomass precursor compounds, along with the ATP needed to assemble them, to produce 1 gram of biomass. On the Biomass tab, the coefficient of each biomass component is listed in the Coefficient column. Negative coefficients represent compounds on the left side of the biomass equation, and positive coefficients represent compounds on the right side.
Gapfilling: This tab shows the reactions that were added to fill metabolic gaps resulting from missing or inconsistent annotations. During gapfilling, an optimization algorithm adds a minimal number of reactions and compounds so that the biochemical network can produce biomass. Currently, this tab is empty because gapfilled reactions are now indicated in the Reactions tab.
Pathways: this tab shows the KEGG maps that represent the metabolic network of the model. Click the name of a map (e.g., TCA cycle) to see which of its reactions are present in the model (shown in blue) and which are absent.
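The compartment notation described for the Compartments tab (e.g., rxn00001[c0]) can be split into its parts with a short parser. This is a sketch that assumes IDs follow exactly the bracketed ID[letter+index] form shown in that description; the function, class, and pattern names are hypothetical.

```python
import re
from typing import NamedTuple

COMPARTMENT_NAMES = {"c": "Cytosol", "p": "Periplasm", "e": "Extracellular"}

# Base ID, then "[" + compartment letters + model index + "]".
_ID_PATTERN = re.compile(r"^(?P<base>[^\[\]]+)\[(?P<code>[a-z]+)(?P<index>\d+)\]$")


class CompartmentalizedId(NamedTuple):
    base_id: str
    compartment: str
    model_index: int


def parse_compartment_id(identifier: str) -> CompartmentalizedId:
    """Split e.g. 'cpd00001[e0]' into base ID, compartment code, and model index."""
    match = _ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not in ID[compartment+index] form: {identifier!r}")
    return CompartmentalizedId(match["base"], match["code"], int(match["index"]))


parsed = parse_compartment_id("rxn00001[c0]")
print(parsed, COMPARTMENT_NAMES[parsed.compartment])
```

In a single-species model the index is always 0; after merging into a community model, the same parser would report the distinct index of each sub-model (e.g., 1 for cpd00001[e1]).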
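The sign convention described for the Biomass tab can be shown by splitting a coefficient table into the two sides of the biomass equation. The compound names and coefficient values below are hypothetical placeholders, not values from any KBase model.

```python
from typing import Dict, Tuple


def split_biomass_equation(
    coefficients: Dict[str, float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Negative coefficients -> left side (consumed); positive -> right side (produced)."""
    reactants = {cpd: -c for cpd, c in coefficients.items() if c < 0}
    products = {cpd: c for cpd, c in coefficients.items() if c > 0}
    return reactants, products


def format_equation(coefficients: Dict[str, float]) -> str:
    """Render the coefficients as a left => right equation string."""
    reactants, products = split_biomass_equation(coefficients)
    left = " + ".join(f"({c:g}) {cpd}" for cpd, c in reactants.items())
    right = " + ".join(f"({c:g}) {cpd}" for cpd, c in products.items())
    return f"{left} => {right}"


# Hypothetical coefficients: ATP hydrolysis drives assembly of 1 unit of biomass.
example = {"atp": -40.0, "h2o": -40.0, "protein": -0.5, "adp": 40.0, "pi": 40.0, "biomass": 1.0}
print(format_equation(example))
```

Magnitudes are reported as positive stoichiometries on each side; the sign in the Coefficient column only indicates which side a compound belongs to.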
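The idea behind gapfilling described for the Gapfilling tab, adding the fewest reactions that let the network make biomass, can be illustrated with a toy version. This swaps the optimization used in KBase for a much simpler technique: a brute-force search over candidate reaction sets, with "can make biomass" approximated by network expansion (every biomass precursor reachable from the seed compounds). All names, compounds, and reactions are hypothetical, and the approach is only practical for tiny examples.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

# Toy reaction: (substrates, products), read left to right only.
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def rxn(substrates: Iterable[str], products: Iterable[str]) -> Reaction:
    return frozenset(substrates), frozenset(products)


def reachable_compounds(seeds: Iterable[str], reactions: List[Reaction]) -> Set[str]:
    """Network expansion: repeatedly fire reactions whose substrates are all available."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def minimal_gapfill(
    model_rxns: List[Reaction],
    candidates: Dict[str, Reaction],
    seeds: Iterable[str],
    biomass_precursors: Iterable[str],
) -> Optional[Tuple[str, ...]]:
    """Smallest set of candidate reaction IDs making all precursors reachable.

    Tries subsets in order of increasing size; returns None if none suffices.
    """
    seed_set, targets = set(seeds), set(biomass_precursors)
    names = sorted(candidates)
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            network = model_rxns + [candidates[n] for n in combo]
            if targets <= reachable_compounds(seed_set, network):
                return combo
    return None


model = [rxn(["glc"], ["g6p"])]
candidates = {
    "R1": rxn(["g6p"], ["f6p"]),
    "R2": rxn(["f6p"], ["pyr"]),
    "R3": rxn(["g6p"], ["pyr"]),
    "R4": rxn(["xyz"], ["pyr"]),
}
print(minimal_gapfill(model, candidates, ["glc"], ["g6p", "pyr"]))
```

Both R1 + R2 and R3 alone close the gap to pyr, but the search prefers the single reaction, mirroring the "minimal number of reactions" goal. Real gapfilling instead solves a flux-based optimization problem, which scales to genome-scale networks and respects stoichiometry.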

For additional information about metabolic modeling, visit the Metabolic Modeling in KBase FAQ.

Team members who developed and deployed the algorithm in KBase: Chris Henry, Janaka Edirisinghe, Sam Seaver, and Neal Conrad. For questions, please contact us.

Related Publications

Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163, https://www.nature.com/articles/nbt.4163

App Specification:

https://github.com/cshenry/fba_tools/tree/b083384ac00d4f9d7cb796a664ee3ffd017cf248/ui/narrative/methods/propagate_model_to_new_genome

Module Commit: b083384ac00d4f9d7cb796a664ee3ffd017cf248