Modeling Central Metabolism and Energy Biosynthesis across Microbial Life

Reference: Edirisinghe JN, Weisenhorn P, Conrad N, Xia F, Overbeek R, Stevens RL, Henry CS. Modeling central metabolism and energy biosynthesis across microbial life. BMC Genomics. 2016;17. doi:10.1186/s12864-016-2887-8

Authors and affiliations

Janaka N. Edirisinghe¹,², Pamela Weisenhorn¹, Neal Conrad¹, Fangfang Xia¹, Rick L. Stevens¹,², Christopher S. Henry¹,²,*

* Corresponding author: CSH (chenry@mcs.anl.gov)

¹ Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
² Computation Institute, University of Chicago, Chicago, IL 60637, USA

Narrative Overview

Automatically generated bacterial metabolic models, and even some curated models, predict energy yields poorly because key energy biosynthesis pathways and the electron transport chain (ETC) are poorly represented. Further compounding the problem, the complex interlinking pathways in genome-scale metabolic models, together with the extensive gapfilling needed to support complex biomass reactions, often result in unrealistic yield predictions or unrealistic physiological flux profiles.

To overcome this challenge, we developed methods and tools (used in this Narrative) to build high-quality core metabolic models (CMMs) that accurately represent energy biosynthesis, based on a well-studied, phylogenetically diverse set of model organisms. We compare these models to explore the variability of core pathways across all microbial life, and, by analyzing the ability of our core models to synthesize ATP and essential biomass precursors, we evaluate the extent to which the core metabolic pathways and functional ETCs are known for all microbes. Of our models, 6,600 (80%) have some type of aerobic ETC, 5,100 (62%) have an anaerobic ETC, and 1,279 (15%) have no ETC at all. Using our manually curated ETC and energy biosynthesis pathways with no gapfilling at all, we predict accurate ATP yields for nearly 70% (5,586) of the models under aerobic and anaerobic growth conditions.

Table 1. Number of core models able to grow (predict ATP yields) on various aerobic and anaerobic media without any gapfilling reactions added

| Medium | Core models able to grow |
|---|---|
| Glucose-aerobic | 5,291 (64%) |
| Glucose-anaerobic | 4,440 (54%) |
| Glucose-anaerobic-Nitrate | 4,951 (61%) |
| Glucose-anaerobic-DMSO | 4,533 (56%) |
| Glucose-anaerobic-TMAO | 4,476 (55%) |
| Glycerol-anaerobic-Nitrate | 2,681 (33%) |
| Succinate-aerobic | 1,736 (21%) |

Over 65% (5253) of core models required two or fewer gapfilling reactions to produce all of the required biomass precursors.

Table 2. Number of gapfilled reactions that are required in Core Metabolic Models in order to produce all biomass precursors


| Gapfilled reactions required | Number of models | Percentage of models (%) |
|---|---|---|
| 0 | 3,415 | 41 |
| 1 | 1,270 | 15 |
| 2 | 568 | 6 |
| 3 to 5 | 1,336 | 16 |
| 6 to 10 | 1,448 | 14 |
| 11 to 20 | 317 | 4 |
| 21 to 33 | 16 | <1 |
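The count of models needing two or fewer gapfilled reactions can be re-derived from Table 2. The short Python sketch below stores the table's bins and model counts (and nothing else) and sums the models whose whole bin lies at or below a chosen cut-off; the helper name is ours, for illustration only.

```python
# Counts copied from Table 2: (min, max) gapfilled reactions -> number of core models.
TABLE2 = {
    (0, 0): 3415,
    (1, 1): 1270,
    (2, 2): 568,
    (3, 5): 1336,
    (6, 10): 1448,
    (11, 20): 317,
    (21, 33): 16,
}

def models_needing_at_most(max_reactions: int) -> int:
    """Sum the models whose entire bin lies at or below max_reactions."""
    return sum(n for (low, high), n in TABLE2.items() if high <= max_reactions)

print(models_needing_at_most(0))  # 3415 models need no gapfilling
print(models_needing_at_most(2))  # 5253 models need two or fewer reactions
```

The same helper answers other cut-offs; for example, `models_needing_at_most(5)` also counts the 3 to 5 bin.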

This study revealed gaps in our knowledge of the central pathways that result in 2,495 (30%) CMMs being unable to produce ATP under any of the tested conditions. We then established a methodology for the systematic identification and correction of inconsistent annotations using core metabolic models coupled with phylogenetic analysis.

We predict accurate energy yields based on our improved annotations of energy biosynthesis pathways and the implementation of diverse ETC reactions across the microbial tree of life. We highlight missing annotations that are essential to energy biosynthesis in our models. We also examine the diversity of these pathways across all microbial life and enable the scientific community to explore the results of this large-scale analysis of over 8,000 microbial genomes.

In this Narrative, we demonstrate the construction of core models from the core template, gapfilling analysis, and flux balance analysis (FBA) on different media formulations.

Narrative Contents

[Core Model construction pipeline](#Core-Model-construction-pipeline)
[Gene annotations of _Escherichia coli_ K12](#Gene-annotations-of-Escherichia-coli-K12)
[Build a draft Core Metabolic Model and gapfill in minimal media aerobically](#Build-a-draft-Core-Metabolic-Model-and-gapfill-in-minimal-media-aerobically)
[Core Model of _Escherichia coli_ K12](#Core-Model-of-Escherichia-coli-K12)
[Run Flux Balance Analysis on Glucose-aerobic Minimal Media under aerobic conditions](#Run-Flux-Balance-Analysis-on-Glucose-aerobic-Minimal-Media-under-aerobic-conditions)
[Run Flux Balance Analysis on Glucose-anaerobic Minimal Media, simulating the growth of the organism under anaerobic conditions](#Run-Flux-Balance-Analysis-on-Glucose-anaerobic-Minimal-Media,-simulating-the-growth-of-the-organism-under-anaerobic-conditions)
[Run Flux Balance Analysis on Glucose-anaerobic-Nitrate minimal media under anaerobic conditions](#Run-Flux-Balance-Analysis-on-Glucose-anaerobic-Nitrate-minimal-media-under-anaerobic-conditions)
[ATP yield predictions of core models under aerobic and anaerobic conditions](#ATP-yield-predictions-of-core-models-under-aerobic-and-anaerobic-conditions)
[Run Flux Balance Analysis on glucose minimal media aerobically using biomass as the objective function](#Run-Flux-Balance-Analysis-on-glucose-minimal-media-aerobically-using-biomass-as-the-objective-function)
[Distribution of gapfill reactions in core models](#Distribution-of-gapfill-reactions-in-core-models)
[References](#References)

Core Model construction pipeline

The pipeline starts by assigning gene annotations to the assembled microbial genomes using the RAST annotation pipeline [4]. Next, the CMMs are constructed from a manually curated core metabolic template (CMT) consisting of biochemical reactions derived from a phylogenetically diverse set of model organisms, including Escherichia coli, Bacillus subtilis, Pseudomonas aeruginosa, Clostridium acetobutylicum, and Paracoccus denitrificans. In the final step, flux balance analysis (FBA) is performed using either biomass biosynthesis or ATP hydrolysis as the objective function.
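To make the template step concrete, here is a minimal, hypothetical sketch of annotation-driven model construction: each template reaction lists the functional roles (annotations) that can catalyze it, and the draft model keeps the reactions whose roles appear in the genome's annotations. The reaction IDs, role strings, and the "any matching role" rule are illustrative assumptions, not the actual KBase template format or reconstruction algorithm (which must also handle multi-subunit complexes).

```python
from dataclasses import dataclass, field

@dataclass
class TemplateReaction:
    """A reaction in a hypothetical core metabolic template."""
    rxn_id: str
    equation: str
    roles: set[str] = field(default_factory=set)  # annotations that catalyze it

def build_draft_model(template, genome_roles):
    """Keep template reactions with at least one catalyzing role in the genome.

    Simplifying assumption: one matching role is enough; real reconstruction
    also has to reason about complexes made of several subunits.
    """
    return [rxn for rxn in template if rxn.roles & genome_roles]

# Hypothetical two-reaction template.
template = [
    TemplateReaction("rxn_pgi", "G6P <=> F6P", {"Glucose-6-phosphate isomerase"}),
    TemplateReaction("rxn_nar", "NO3 + QH2 -> NO2 + Q + H2O",
                     {"Respiratory nitrate reductase alpha chain"}),
]
genome_roles = {"Glucose-6-phosphate isomerase"}  # e.g. from a RAST-style annotation

draft = build_draft_model(template, genome_roles)
print([rxn.rxn_id for rxn in draft])  # ['rxn_pgi']
```

Because the draft contains only annotation-supported reactions, any pathway step whose gene is missing or misannotated simply drops out, which is the kind of gap the later gapfilling and ATP-yield analyses expose.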

Figure 1. Core model construction pipeline


Gene annotations of Escherichia coli K12

We start by assigning gene annotations to the assembled E. coli genome, using the RAST annotation pipeline.


Above, the Escherichia coli K12 genome is shown in a genome viewer. This viewer provides a concise, text-based overview of the genome as well as its contigs and genes.

In the Contigs and Genes tabs, each entry is clickable, opening either a browser for the contig or another tab with expanded information about the gene.

Click a column header (e.g., Length) to sort the entries by that field; click the same header again to reverse the sort order.

This Escherichia coli genome is fairly complete and has a single contig. Click the contig to see neighboring genes and potential operons in this species.

To further explore this genome, click the genome name at the top of the viewer. This will open a Landing Page for the genome in a new tab in your browser. The Landing Page provides more details about the organism, its genome, and annotations.

Build a draft Core Metabolic Model and gapfill in minimal media aerobically

Metabolic models generally require an objective function (OF) that is optimized during flux balance analysis to predict flux profiles. For our Core Metabolic Models, we explored two OFs: a biomass biosynthesis objective function and an ATP hydrolysis objective function. Although CMMs do not include the amino acids, nucleotides, lipids, and cofactors that are typically part of the biomass biosynthesis objective function of genome-scale models, they do include the central carbon precursor metabolites for these compounds. The biomass biosynthesis OF for our CMMs was therefore based on the biomass precursor stoichiometry derived by Varma and Palsson [2,3] and used in one of the earliest models of E. coli. When analyzing CMMs with the biomass biosynthesis OF, we found that gapfilling was occasionally required to enable synthesis of all essential biomass precursors. To study energy biosynthesis in our models without gapfilling, we developed a second OF for our CMMs consisting only of the ATP hydrolysis reaction: ATP + H2O -> ADP + Pi + H+. Using this OF, we computed ATP production yields in all models without any gapfilling; these computations were therefore based solely on reactions derived from existing RAST annotations.
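The ATP hydrolysis objective is a single, fully balanced reaction. The sketch below checks its element and charge balance, assuming the common ionization states near pH 7 (ATP⁴⁻, ADP³⁻, HPO₄²⁻). These formulas are standard chemistry but are our assumption here; the KBase biochemistry database may represent the compounds in different protonation states.

```python
from collections import Counter

# Assumed formulas and charges (deprotonated forms near pH 7).
SPECIES = {
    "ATP": ({"C": 10, "H": 12, "N": 5, "O": 13, "P": 3}, -4),
    "H2O": ({"H": 2, "O": 1}, 0),
    "ADP": ({"C": 10, "H": 12, "N": 5, "O": 10, "P": 2}, -3),
    "Pi":  ({"H": 1, "O": 4, "P": 1}, -2),  # HPO4(2-)
    "H+":  ({"H": 1}, 1),
}

# ATP hydrolysis objective: ATP + H2O -> ADP + Pi + H+
# Negative coefficients are consumed, positive coefficients are produced.
ATP_HYDROLYSIS = {"ATP": -1, "H2O": -1, "ADP": 1, "Pi": 1, "H+": 1}

def imbalance(reaction):
    """Return (nonzero net element counts, net charge); ({}, 0) means balanced."""
    elements, charge = Counter(), 0
    for species, coeff in reaction.items():
        formula, z = SPECIES[species]
        for element, count in formula.items():
            elements[element] += coeff * count
        charge += coeff * z
    return {e: n for e, n in elements.items() if n != 0}, charge

print(imbalance(ATP_HYDROLYSIS))  # ({}, 0)
```

Dropping water from the equation, for example, leaves one H and one O unbalanced and a net charge of -1, which is the kind of error such a check catches.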

First, we use the Build Metabolic Model app to build an initial draft core metabolic model from the gene annotations in the Escherichia coli K12 genome, selecting Core metabolism in the Template for reconstruction field. This app has two steps; when the first step (Build Metabolic Model) finishes, the second step (Gapfill Metabolic Model) starts automatically.

The gapfill step lets you specify a media condition (i.e., the metabolites available in the environment in which you want to analyze your organism's growth). If you leave the Media field blank, "complete" media is used by default. Complete media is a special type of media that does not include an exact list of compounds; instead, it consists of all metabolites for which a transporter is available in the KBase biochemistry database. (Transporters are reactions that move metabolites across cell membranes.) For core models, we use minimal media for our simulations (e.g., Glucose minimal media or Glycerol minimal media).
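The difference between a defined minimal medium and "complete" media can be pictured as a choice of which exchange reactions may take up compounds. The sketch below is a hypothetical illustration, not KBase's implementation: it assumes the sign convention used in this Narrative's Exchange fluxes description (positive flux = uptake), an arbitrary default rate limit of 100, and made-up compound names and uptake limits.

```python
def exchange_bounds(transportable, media=None, max_rate=100.0):
    """Return {compound: (lower, upper)} bounds on each exchange flux.

    Assumed sign convention: positive flux = uptake, negative = secretion.
    media=None mimics "complete" media: every transportable compound may be
    taken up. Otherwise only compounds listed in `media` (mapped to their
    maximum uptake rates) can enter the cell; any compound may be secreted.
    """
    bounds = {}
    for cpd in transportable:
        max_uptake = max_rate if media is None else media.get(cpd, 0.0)
        bounds[cpd] = (-max_rate, max_uptake)
    return bounds

# Hypothetical compound names and uptake limits.
transportable = ["glucose", "glycerol", "o2", "nitrate", "co2"]
glucose_aerobic = {"glucose": 10.0, "o2": 20.0}
glucose_anaerobic = {"glucose": 10.0}

print(exchange_bounds(transportable, glucose_aerobic)["o2"])    # (-100.0, 20.0)
print(exchange_bounds(transportable, glucose_anaerobic)["o2"])  # (-100.0, 0.0)
print(exchange_bounds(transportable)["glycerol"])               # (-100.0, 100.0)
```

Under this view, switching between the Glucose-aerobic and Glucose-anaerobic formulations changes only the bounds, not the model's reactions.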

In addition to the media formulations available in KBase, you can upload your own custom media. In this example, Escherichia coli K12 was tested for growth in a minimal media condition called Glucose-aerobic.

Gapfilling starts from the working assumption that a model may be unable to make all required biomass components from the compounds in the minimal media. However, because core models are built from a highly curated template, many of them, including the core model of Escherichia coli K12, require no gapfilling reactions (see Table 2) to produce biomass when glucose is the sole carbon source in minimal media.
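Gapfilling looks for the smallest set of database reactions that, once added, lets the model make every biomass precursor. KBase solves this as a flux-based optimization problem; the sketch below swaps in a much simpler, topology-only stand-in (network expansion plus a brute-force search over candidate subsets) on a tiny hypothetical network, only to illustrate the "minimal number of added reactions" idea. All reaction and compound names are invented.

```python
from itertools import combinations

def reachable(seeds, reactions):
    """Network expansion: compounds producible from the seed compounds.

    `reactions` maps an ID to (substrates, products); a reaction fires once all
    substrates are available. Stoichiometry and cofactor balance are ignored,
    unlike real flux-based gapfilling.
    """
    pool = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= pool and not products <= pool:
                pool |= products
                changed = True
    return pool

def gapfill(model, database, media, biomass_precursors):
    """Smallest set of database reaction IDs that makes all precursors reachable."""
    for size in range(len(database) + 1):  # try 0, 1, 2, ... added reactions
        for added in combinations(sorted(database), size):
            network = {**model, **{rid: database[rid] for rid in added}}
            if biomass_precursors <= reachable(media, network):
                return set(added)
    return None  # no combination of candidate reactions is sufficient

# Hypothetical toy network: the model cannot make "pep" from glucose.
model = {"R1": ({"glc"}, {"g6p"}), "R2": ({"f6p"}, {"pep"})}
database = {"D1": ({"g6p"}, {"f6p"}), "D2": ({"glc"}, {"xyz"})}
print(gapfill(model, database, {"glc"}, {"g6p", "pep"}))  # {'D1'}
```

The brute-force search grows exponentially with the number of candidate reactions, which is why practical gapfilling is formulated as an optimization problem instead.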

Below, you will see the input cells for running the Build Metabolic Model app on our annotated E. coli genome.


Core Model of Escherichia coli K12

An initial draft core model of Escherichia coli K12 is produced from the RAST annotations. The model was not gapfilled because the gapfilling option was not selected.


Above is the core model for Escherichia coli K12.

There are eight tabs for browsing the data in the model: Overview, Reactions, Compounds, Genes, Compartments, Biomass, Gapfilling, and Pathways. Their contents are as follows:

Overview - Summary of key information about the model, including the associated genome, number of reactions, and number of compounds.
Reactions - Detailed reaction information, including reaction ID, enzyme name, biochemical equation, and associated gene IDs.
Compounds - Information about the compounds in the model, including chemical formula and charge.
Genes - Gene IDs and associated reaction IDs.
Compartments - The subcellular localization of compounds and enzymes. Microbial models typically have three compartments: Cytosol (c0), Periplasm (p0), and Extracellular (e0). Reactions and compounds in each compartment are identified by a compartment suffix, e.g., rxn00001[c0], cpd00001[c0].
Biomass - The biomass composition of the model. Biomass is typically represented as a reaction that consumes biomass precursors and ATP to produce one gram of biomass. The coefficient of each biomass component is listed in the Coefficient column: negative coefficients denote compounds on the left (consumed) side of the biomass equation, and positive coefficients denote compounds on the right (produced) side.
Gapfilling - The reactions that were added to fill metabolic gaps in the model. These gaps arise from missing or inconsistent annotations. During gapfilling, an optimization algorithm adds the minimal number of reactions and compounds needed for the biochemical network to produce biomass. In this case, the model was not gapfilled.
Pathways - KEGG maps [5,6,7] that represent the metabolic network of the model. Click the name of a map (e.g., TCA cycle) to see which of its reactions are present (colored blue) or absent.
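Two conventions from the tab descriptions above, compartment suffixes such as `cpd00001[c0]` and signed biomass coefficients, are easy to handle programmatically. The sketch below is illustrative only: the regular expression assumes IDs of the form `<id>[<letter><digits>]`, and the biomass coefficients are made-up placeholders, not the E. coli core biomass composition.

```python
import re

# IDs such as "rxn00001[c0]" or "cpd00001[e0]": base ID plus compartment tag.
ID_PATTERN = re.compile(r"^(?P<base>[A-Za-z0-9_]+)\[(?P<compartment>[a-z]\d+)\]$")
COMPARTMENTS = {"c": "Cytosol", "p": "Periplasm", "e": "Extracellular"}

def parse_id(tagged_id):
    """Split a compartment-tagged ID into (base ID, compartment name, index)."""
    match = ID_PATTERN.match(tagged_id)
    if match is None:
        raise ValueError(f"not a compartment-tagged ID: {tagged_id!r}")
    comp = match.group("compartment")
    return match.group("base"), COMPARTMENTS.get(comp[0], comp), int(comp[1:])

def split_biomass(coefficients):
    """Negative coefficients are consumed (left side); positive are produced (right side)."""
    consumed = {cpd: -v for cpd, v in coefficients.items() if v < 0}
    produced = {cpd: v for cpd, v in coefficients.items() if v > 0}
    return consumed, produced

print(parse_id("cpd00001[c0]"))  # ('cpd00001', 'Cytosol', 0)

# Placeholder coefficients for illustration only.
biomass = {"precursor_a[c0]": -1.5, "atp[c0]": -10.0, "adp[c0]": 10.0, "biomass[c0]": 1.0}
consumed, produced = split_biomass(biomass)
print(sorted(consumed))  # ['atp[c0]', 'precursor_a[c0]']
```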

Run Flux Balance Analysis on Glucose-aerobic Minimal Media under aerobic conditions

We have built a core metabolic model; now we can use the Run Flux Balance Analysis app to perform FBA, which calculates the flow of metabolites through the model. FBA results can be used to predict the growth rate of an organism under given conditions or the production rates of particular metabolites of interest. Here we use ATP hydrolysis (ATP + H2O -> ADP + Pi + H+) as the objective function.
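For reference, FBA solves the standard linear program below, written in its textbook form: $S$ is the stoichiometric matrix (metabolites by reactions), $v$ the vector of reaction fluxes, $lb$ and $ub$ the flux bounds (the media formulation enters through the bounds on exchange reactions), and $c$ selects the objective reaction, here ATP hydrolysis. KBase's solver settings and any additional constraints it applies are not shown.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& lb_j \le v_j \le ub_j \quad \text{for every reaction } j,
\end{aligned}
\qquad
c_j =
\begin{cases}
1 & \text{if } j \text{ is the ATP hydrolysis reaction},\\
0 & \text{otherwise.}
\end{cases}
```

Replacing $c$ with an indicator for the biomass reaction gives the biomass biosynthesis objective used later in this Narrative.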

To perform FBA, you must specify the media condition you want to investigate with your metabolic model. In this example, we select Glucose-aerobic minimal media, which simulates growth on glucose minimal media under aerobic conditions.


Flux balance analysis results (below) are organized into a table with six tabs: Overview, Reaction fluxes, Exchange fluxes, Genes, Biomass, and Pathways. The objective value is 26.5 mmol of ATP/mmol of glucose. This high ATP yield reflects oxidative phosphorylation driven by the aerobic electron transport chains: under aerobic conditions, glucose is fully oxidized to CO2 and H2O.

Overview - Among the summary information in this tab is the objective value, i.e., the maximum achievable flux through the objective reaction of the metabolic model (the biomass reaction when biomass is the objective; the ATP hydrolysis reaction in this example). An objective value of 0, or very close to 0, means that the model did not grow (or, with the ATP hydrolysis objective, produced no ATP) on the specified media. This tab also lists other information, including the genome, media formulation, number of reactions, and number of compounds associated with the FBA.
Reaction fluxes - Numerical flux values, minimum and maximum flux bounds, biochemical equations, and associated genes for each reaction in the model. These are the fluxes through all internal reactions that allow for growth and byproduct formation, and they can be further broken down into biological pathways of interest (see the Pathways tab). A user may ask, for example, “How much fatty acid is being produced?” or “What are the high-flux reactions or pathways?”
Exchange fluxes - The rates at which nutrients are taken up and byproducts are secreted. Positive exchange flux values represent uptake of compounds, and negative exchange flux values represent excretion.
Genes - Gene knockout information, if any. Because this example uses the wild-type strain of Escherichia coli K12, there is no knockout information to display.
Biomass - The objective function reaction; in this case, we use ATP hydrolysis (bio2) as the objective function.
Pathways - KEGG maps that represent the metabolic network of the model. Click the name of a map (e.g., TCA cycle) to see which reactions are present (blue) and their fluxes (positive fluxes in shades of red, negative fluxes in shades of green).
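The exchange-flux sign convention described in these tabs can be applied programmatically. A hypothetical helper, following the convention stated above (positive = uptake, negative = excretion); note that some other tools, such as COBRApy, write exchange reactions so that uptake has negative flux, so check the convention before comparing numbers across tools. The flux values below are invented for illustration (they follow the stoichiometry of complete glucose oxidation) and are not results from the E. coli core model.

```python
def classify_exchanges(exchange_fluxes, tol=1e-9):
    """Split exchange fluxes into uptake and excretion.

    Assumes this Narrative's convention: positive = uptake, negative = excretion.
    Fluxes with magnitude below `tol` are treated as inactive.
    """
    uptake = {cpd: f for cpd, f in exchange_fluxes.items() if f > tol}
    excretion = {cpd: -f for cpd, f in exchange_fluxes.items() if f < -tol}
    return uptake, excretion

# Invented values: glucose + 6 O2 -> 6 CO2 + 6 H2O.
fluxes = {"glucose": 1.0, "o2": 6.0, "co2": -6.0, "h2o": -6.0, "nitrate": 0.0}
uptake, excretion = classify_exchanges(fluxes)
print(uptake)     # {'glucose': 1.0, 'o2': 6.0}
print(excretion)  # {'co2': 6.0, 'h2o': 6.0}
```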

For more information on the Run Flux Balance Analysis app, see:

Narrative tutorial on metabolic modeling

App details page, which includes an explanation of all parameters

Metabolic modeling FAQ

Run Flux Balance Analysis on Glucose-anaerobic Minimal Media, simulating the growth of the organism under anaerobic conditions

We now run FBA on the Ecoli_Glucose model using glucose minimal media under anaerobic conditions (in the absence of oxygen), selecting the Glucose-anaerobic media formulation.


The objective value is now 2.75 mmol of ATP/mmol of glucose, substantially lower than the 26.5 mmol of ATP/mmol of glucose obtained under aerobic conditions. Because no oxygen is present in the media, oxidative phosphorylation is inactive and the electron transport chains are not used to produce energy. Under this condition, the organism produces energy solely through fermentation.
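Because both objective values are expressed per mmol of glucose, they can be compared directly. The arithmetic below uses only the two values quoted in this Narrative and shows that the aerobic yield is roughly 9.6 times the fermentative one. The `atp_yield` helper and the uptake rate of 10 are hypothetical; they only show how such a yield relates to raw FBA fluxes (ATP hydrolysis flux divided by glucose uptake flux).

```python
def atp_yield(atp_hydrolysis_flux, glucose_uptake_flux):
    """ATP yield in mmol ATP per mmol glucose from two FBA fluxes."""
    if glucose_uptake_flux <= 0:
        raise ValueError("glucose uptake must be positive")
    return atp_hydrolysis_flux / glucose_uptake_flux

aerobic_yield = 26.5    # mmol ATP / mmol glucose, Glucose-aerobic (this Narrative)
anaerobic_yield = 2.75  # mmol ATP / mmol glucose, Glucose-anaerobic (this Narrative)

print(f"{aerobic_yield / anaerobic_yield:.1f}-fold")  # 9.6-fold

# With a hypothetical glucose uptake flux of 10, the aerobic yield corresponds
# to an ATP hydrolysis flux of 265 in the same units.
print(atp_yield(265.0, 10.0))  # 26.5
```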

Run Flux Balance Analysis on Glucose-anaerobic-Nitrate minimal media under anaerobic conditions

Facultative anaerobes such as Escherichia coli can grow under both aerobic and anaerobic conditions. During anaerobic respiration, they can reduce a number of alternative electron acceptors, such as nitrate (NO₃⁻), dimethyl sulfoxide (DMSO), and trimethylamine N-oxide (TMAO). If no anaerobic electron acceptor is present in the medium, these organisms can still grow using fermentation alone (as shown above). We now run FBA on the 'Ecoli_Glucose' model anaerobically, with nitrate (NO₃⁻) present as an anaerobic electron acceptor.
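The qualitative effect of oxygen and alternative electron acceptors on ATP yield can be reproduced with a toy FBA problem. The sketch below uses `scipy.optimize.linprog` on a hypothetical three-route network (fermentation, O₂ respiration, nitrate respiration to nitrite). Its ATP stoichiometries (2, 20 and 12 ATP per glucose) are invented round numbers, not the curated values in the core template; glucose uptake is fixed at 1, so the objective reads as a yield in mmol ATP/mmol glucose.

```python
import numpy as np
from scipy.optimize import linprog

# Columns: EX_glc, EX_o2, EX_no3, FERM, RESP_O2, RESP_NO3, ATPM (ATP hydrolysis)
# Rows (metabolites): glc, o2, no3, atp.  Invented toy stoichiometry.
S = np.array([
    [1, 0, 0, -1, -1,  -1,  0],   # glc
    [0, 1, 0,  0, -6,   0,  0],   # o2: 6 O2 per glucose respired
    [0, 0, 1,  0,  0, -12,  0],   # no3: 12 NO3- -> NO2- per glucose
    [0, 0, 0,  2, 20,  12, -1],   # atp: toy yields per route
], dtype=float)

def max_atp_yield(o2_max, no3_max):
    """Maximize ATPM flux with glucose uptake fixed at 1 (positive = uptake)."""
    bounds = [(1, 1), (0, o2_max), (0, no3_max),
              (0, None), (0, None), (0, None), (0, None)]
    c = np.zeros(S.shape[1])
    c[-1] = -1.0  # linprog minimizes, so minimize -v_ATPM
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun

for name, o2, no3 in [("Glucose-aerobic", 1000, 0),
                      ("Glucose-anaerobic", 0, 0),
                      ("Glucose-anaerobic-Nitrate", 0, 1000)]:
    print(f"{name}: {max_atp_yield(o2, no3):.1f} mmol ATP/mmol glucose (toy model)")
```

The toy returns 20, 2 and 12 for the three media: the optimizer routes all glucose through the highest-yielding pathway that the medium permits, which mirrors the ranking aerobic > nitrate respiration > fermentation.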


ATP yield predictions of core models under aerobic and anaerobic conditions


Figure 2. Predictions of ATP yields using FBA on selected core models

Run Flux Balance Analysis on glucose minimal media aerobically using biomass as the objective function

As explained earlier, core models have two objective functions: ATP hydrolysis and biomass biosynthesis. About 41% of core models (3,415), including the core model of Escherichia coli, need no gapfilling reactions to produce the essential biomass precursors in the OF. However, other core models (see Table 2) require gapfilling reactions to satisfy the biomass objective function. We now run FBA with biomass (bio2) as the objective function on glucose minimal media, without the gapfilling option.
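With a biomass objective, growth is zero whenever even one precursor cannot be made, so a common diagnostic is to maximize a demand (drain) reaction for each precursor separately. The sketch below shows this on a hypothetical three-metabolite toy model using `scipy.optimize.linprog`; the network, IDs and bounds are invented and are not the E. coli core model or its bio2 reaction.

```python
import numpy as np
from scipy.optimize import linprog

METABOLITES = ["glc", "pyr", "akg"]  # glucose, pyruvate, 2-oxoglutarate

def max_flux(reactions, target):
    """Maximize flux through `target` subject to S v = 0 and flux bounds.

    `reactions` maps an ID to (stoichiometry dict, upper bound or None);
    every reaction is irreversible (lower bound 0) in this toy sketch.
    """
    ids = list(reactions)
    S = np.array([[reactions[r][0].get(m, 0.0) for r in ids] for m in METABOLITES],
                 dtype=float)
    c = np.array([-1.0 if r == target else 0.0 for r in ids])  # linprog minimizes
    bounds = [(0, reactions[r][1]) for r in ids]
    res = linprog(c, A_eq=S, b_eq=np.zeros(len(METABOLITES)), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun

# Hypothetical toy model: glucose -> pyruvate, but no route to 2-oxoglutarate.
model = {
    "EX_glc": ({"glc": 1}, 1.0),               # glucose uptake, at most 1
    "GLYC":   ({"glc": -1, "pyr": 2}, None),   # lumped glycolysis
    "BIO":    ({"pyr": -1, "akg": -1}, None),  # toy biomass needs both precursors
}
print("biomass:", max_flux(model, "BIO"))

# Which precursor is blocked? Maximize a demand reaction for each one.
for met in ["pyr", "akg"]:
    with_demand = {**model, f"DM_{met}": ({met: -1}, None)}
    print(met, max_flux(with_demand, f"DM_{met}"))
```

Here biomass flux is zero, pyruvate can be drained (up to 2 per glucose) and 2-oxoglutarate cannot, which points to the missing step. Adding a hypothetical reaction `"TCA": ({"pyr": -1, "akg": 1}, None)`, a stand-in for a gapfilled reaction, raises the maximum biomass flux to 1.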

Distribution of gapfill reactions in core models


The Escherichia coli core model was able to grow (objective value 0.12) without any gapfilling reactions. However, because some models do require gapfilling to produce biomass precursors, we analyzed the distribution of the number of gapfilling reactions required by each model and organized the results by phylogeny.

Figure 3. Number of gapfilled reactions required in CMMs to produce all biomass precursors, with CMMs grouped by taxonomy. Blue bars represent gene-associated reactions and red bars represent gapfilled reactions for all CMMs used in this study; bar height indicates the number of reactions.

References

[1] Schuetz R, Kuepfer L, Sauer U: Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Molecular Systems Biology 2007, 3:119.
[2] Varma A, Palsson BO: Metabolic capabilities of Escherichia coli. 2. Optimal growth patterns. Journal of Theoretical Biology 1993, 165(4):503-522.
[3] Varma A, Palsson BO: Metabolic capabilities of Escherichia coli. 1. Synthesis of biosynthetic precursors and cofactors. Journal of Theoretical Biology 1993, 165(4):477-502.
[4] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, et al.: The RAST Server: rapid annotations using subsystems technology. BMC Genomics 2008, 9:75.
[5] Kanehisa M, Sato Y, Furumichi M, Morishima K, Tanabe M: New approach for understanding genome variations in KEGG. Nucleic Acids Research 2019, 47:D590-D595.
[6] Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K: KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Research 2017, 45:D353-D361.
[7] Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 2000, 28:27-30.

