Dissecting Complexity: The Hidden Impact of Application Parameters on Bioinformatics Research¶

Mikaela Cashman* [1], Myra B. Cohen [2], Alexis L. Marsh [2], Robert W. Cottingham [1]

[1] Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
[2] Department of Computer Science, Iowa State University, Ames, Iowa, USA
* Address correspondence

Narrative Overview¶

This narrative is part of supplementary data for the above manuscript and covers the motivating example (Tables 1 and 2, and Figures 2 and 3). Please refer to additional narratives for other experimentation (to be linked upon publication).

This study uses the Escherichia coli (EC) str. K-12 substr. MG1655 genome (reannotated) to build a metabolic model representation, and performs Flux Balance Analyis (FBA) on a gapfilled version of that model with different values for max uptake of oxgyen in order to determine this parameter's effect on the objective value (biomass).

To rerun, start by "making a copy" of this narrative, then go to the cell of the app you wish to rerun, click "Reset" and accept the warning, then "Run". Note that if the app is marked as an older version, that references the version used in the manuscript, it is possible with an updated version results may differ. Thus to reproduce results from the paper choose to not update. To learn more about KBase narratives please visit the KBase quick start guide here, guide on running apps here, and guide on common workflows here.

Manuscript Abstract¶

*Biology is a quest; an ongoing inquiry about the nature of life. How do the different forms of life interact? What makes up an ecosystem? How does a tiny bacterium work? To answer these questions biologists turn increasingly to sophisticated computational tools. Many of these tools are highly configurable, allowing customization in support of a wide range of uses. For example, algorithms can be tuned for precision, efficiency, type of inquiry, or for specific categories of organisms or their component subsystems. Ideally, configurability provides useful flexibility. However, the complex landscape of configurability may be fraught with pitfalls. This paper examines that landscape in bioinformatics tools. We propose a methodology, SOMATA, to facilitate systematic exploration of the vast choice of application parameters, and apply it to three different tools on a range of scientific inquires. We further argue that the tools themselves are complex ecosystems. If biologists explore these, ask questions, and experiment just as they do with their biological counterparts, they will benefit by both finding improved solutions to their problems as well as increasing repeatability and transparency. We end with a call to the community for an increase in shared responsibility and communication between tool developers and the biologists that use them in the context of complex system decomposition.*

Narrative Introduction¶

A systems view in the biological context is a useful approach to study complex and potentially interacting systems. In the associated manuscript, we take a systems view of software tools themselves. Instead of using software as a single tool, we demonstrate how users can approach software with the same mindset as the organisms they study; as a complex system that can be optimized based on one's environment and goals. We demonstrate this idea with an example showing how application parameters in KBase tools can change how a researcher perceives and interacts with bioinformatics tools while conducting a common experimental scenario. In this scenario we are trying to understand how different chemical compounds in a growth media change the metabolic pathways utilized in Escherichia coli.

Experimental Design¶

Before running extensive laboratory experiments we might want to exploit a common computational method simulating an organism's growth using a genome-scale metabolic network [a] . We can run a Flux Balance Analysis (FBA) which calculates the flow of metabolites through the metabolic network in a specific growth media to optimize for growth (measured by flux through the biomass reaction) using a linear programming algorithm. The FBA tool used here provides input options such as the FBA model and the media, as well as algorithmic options such as the reaction to maximize, custom flux bounds, expression thresholds, and maximum uptakes of different nutrient sources. One output of this tool is the objective value (OV) which represents the maximum flow through the biomass reaction as a measurement of growth. For more information about this tool please refer to [b] .

For the purpose of this example we will explore the effect of the max oxygen uptake parameter which is described in the documentation as the "maximum number of moles of oxygen permitted for uptake (default uptake rates varies from 0 to 100 for all nutrients)". This parameter has no fixed default value since the maximum is constrained by the media input. We will change this to a value of 0 to see what the behavior is.

Through this narrative, we will recreate the data used in Tables 1 and 2, and Figures 2 and 3. Figure 3 can be seen below where we model the configuration changes as state changes to the system. In the initial state (before running FBA) we have a static model with no defined fluxs through its reactions. After running different configurations through, we arrive at new states with different outputs (measured by growth/OV and changes to metabolic reactions). Hence, the configuration (and thus the parameters) have a direct impact on the individual reaction fluxes and the pathways through the network which leads to different behavior (e.g. growth). We will revisit this Figure in the discussion.

Narrative Overview¶

This narrative executes the following workflow:

Step 1: Model creation begins with the EC genome selected from KBase’s public NCBI RefSeq genome database.
Step 2: The genome was reannotated with RAST in order to have the most completely annotated genome possible.
Step 3: Using the "build metabolic model" application, the draft metabolic model was created.
Step 4: Gapfil draft metabolic model ("gapfill metabolic model").
Step 5: Perform FBA on the model with different max oxgyen uptake rates ("Run Flux Balance Analysis").
Results and Discussion

Step 1. Select Genome¶

We start by selecting a genome. In this case, the authors used a publicly available genome from KBase’s public NCBI RefSeq genome database. However, steps 2-5 will still proceed as shown in this narrative if you elect to upload your own genome. The Escherichia coli (EC) str. K-12 substr. MG1655 genome can be viewed below.

_{return to overview}

Step 2. Reannotate with RAST¶

Next we perform reannotation with RAST. Genome-scale metabolic models are constructed from the genome annotations. Automated annotation tools allow for a higher percentage of an organism's genes to have annotations thereby resulting in a more complete metabolic network.

_{return to overview}

Step 3. Build the draft metabolic model from the genome¶

Next we bulid the metabolic model in order to convert the annotated genome into a genome-scale metabolic model.

_{return to overview}

Step 4. Gapfill Draft Model¶

Next we gapfill the model. Before this step, most genome-scale metabolic model's metabolic network is not complete enough to be simutated to grow when using flux balance analysis (produce biomass) and a given media. This method adds the minimal number of reactions to the network such that the model can grow in that media.

_{return to overview}

Step 5. Run FBAs¶

Next, we perform FBA on the model with the the configurations of interest. Here we analyze different max oxgyen uptake rates to determine this parameter's effect on the objective value. In this example, we include max oxygen uptake set to the default, as well as manually set to 0, 10, 40, and 100. Run configurations and output can be found below. Output files can also be found in the data tab on the left.

_{return to overview}

Default Max Oxygen¶

Max Oxygen = 0¶

Max Oxygen = 10¶

Max Oxygen = 40¶

Max Oxygen = 100¶

Results and Discussion¶

If we do not manually change any of the parameters (i.e. run the default configuration) E. coli produced an OV of 0.164692 mmol/g CDW hr. When we changed the max oxygen parameter to a value of 0, E. coli no longer grows (OV of 0 is returned). Next we increased this value in increments of 10 to observe how the OV changes. Results for values from 0 to 100 can be seen in the Table below (Table 1 in the manuscript). We see the OV increases linearly until it reaches the maximum (also the default growth) of 0.164692 when max oxygen is set to 40. If we set this value past 40, there is no added effect.

Parameter	OV
Default	0.164692
Max O 0	0
Max O 10	0.053666
Max O 20	0.107332
Max O 30	0.160998
Max O 40	0.164692
Max O 50	0.164692
Max O 60	0.164692
Max O 70	0.164692
Max O 80	0.164692
Max O 90	0.164692
Max O 100	0.164692

Impacts on Reaction Flux¶

In addition to the changes observed in the OV, there are also changes in the fluxes of reactions in the metabolic reaction network. These fluxes can be positive meaning the net flow of metabolites occurs from left to right in the reaction equation (equation 1 below), negative meaning the net flow occurs right to left (equation 2 below), or zero indicating the flux is equal on both sides resulting in a net of zero.

$${1.}\space{Glucose}\rightarrow{Glucose-6-Phosphate}$$$$ {2.}\space{Glucose}\leftarrow{Glucose-6-Phosphate}$$

To illustrate changes in fluxes caused by adjusting max oxgyen uptake rate, we compared the fluxes when max oxgyen is set to 10 versus the the default setting. In this case, 431 of the 1,591 reactions present in the network (27.09%) result in different fluxes the mmajority of which being and increase or decrease in flux with no directional change. However 12 reactions had a significant change as seen in the table below (Table 2 in the manuscript). Five reactions change from a negative flux to a flux of zero, three from zero to negative, two zero to positive, and two changed from positive to zero. These 12 reactions represent significant changes that occur to metabolism of E. coli when oxygen levels are varied by means of the tool parameter.

No.	KBase ID	ID KEGG ID / EC Number	Effect on Flux	Associated Pathways
1.	rxn00515.c0	R00722 2.7.4.6	neg to zero	rn00230 Purine metabolism rn01100 Metabolic pathways rn01232 Nucleotide metabolism
2.	rxn00616.c0	R00848 1.1.99.5	neg to zero	rn00564 Glycerophospholipid metabolism rn01110 Biosynthesis of secondary metabolites
3.	rxn00715.c0	R00970 2.7.1.48	neg to zero	rn00240 Pyrimidine metabolism
4.	rxn00935.c0	R01257 1.1.99.16	neg to zero	rn00620 Pyruvate metabolism
5.	rxn04792.c0	R06983 1.1.1.284	neg to zero	rn00680 Methane metabolism rn01100 Metabolic pathways rn01120 Microbial metabolism in diverse environments rn01200 Carbon metabolism
6.	rxn00611.c0	R00842 1.1.1.8, 1.1.1.94, 1.1.1.261	zero to neg	rn00564 Glycerophospholipid metabolism rn01110 Biosynthesis of secondary metabolites
7.	rxn01673.c0	R02326 2.7.4.6	zero to neg	rn00240 Pyrimidine metabolism rn01100 Metabolic pathways rn01232 Nucleotide metabolism
8.	rxn09188.c0	R10507 1.5.99.8	zero to neg	rn00330 Arginine and proline metabolism rn00332 Carbapenem biosynthesis rn01100 Metabolic pathways rn01110 Biosynthesis of secondary metabolites
9.	rxn00931.c0	R01251 1.5.1.2	zero to pos	rn00330 Arginine and proline metabolism rn01100 Metabolic pathways rn01110 Biosynthesis of secondary metabolites rn01230 Biosynthesis of amino acids
10.	rxn08657.c0	1.1.3.15	zero to pos	ec00630 Glyoxylate and dicarboxylate metabolism ec01100 Metabolic pathways ec01110 Biosynthesis of secondary metabolites ec01120 Microbial metabolism in diverse environments
11.	rxn04938.c0	R07140 11.1.1.284	pos to zero	rn00480 Glutathione metabolism rn00680 Methane metabolism rn01100 Metabolic pathways rn01120 Microbial metabolism in diverse environments
12.	rxn08656.c0	1.1.3.15	pos to zero	ec00630 Glyoxylate and dicarboxylate metabolism ec01100 Metabolic pathways ec01110 Biosynthesis of secondary metabolites ec01120 Microbial metabolism in diverse environments

Changes to the Pyruvate Metabolism Pathways¶

We can observe these changes to the reactions and corresponding pathways directly in the KEGG pathway maps in the figure below (Figure 2 in the manuscript) which depicts Pyruvate Metabolism. The reaction through EC 1.1.5.4 changes from a negative net flux (left) to a net flux of zero (right). We also see reactions change in the magnitude of their flux. For example ECs 2.3.1.54, 2.7.1.40, and 2.7.2.1 change from a higher net negative flux (left) to a lower net flux (right). EC 2.3.1.8 changes from a higher net positive flux (left) to a lower net flux (right).

Green, increased exchange flux; red, decreased exchange flux; gray, no net change in exchange flux due to metabolic rerouting; white, no change predicted.

State Diagram Representation¶

Coming back to our original state diagram representation, we can now see how each state change results in biological differences in the resulting models. In the initial state (before running FBA) we have a static model with no defined fluxs through its reactions. After running the default configuration through, we arrive at a state with a growth of 0.164692 with fluxes through its 1,591 reactions. But if we take a different transition, for example through max oxygen of 10 then we arrive a different state with a growth of 0.0536659 where the flux through 431 reactions differ leading to changes in the pathways. The same occurs if we set max oxygen to 30 with a growth of 0.160998 resulting again in different reactions and paths. Hence, the configuration (and thus the parameters) have a direct impact on the individual reaction fluxes and the pathways through the network which leads to different behavior (e.g. growth).

Summary¶

By approaching this software tool with a systems view, we have demonstrated that not only can the ultimate objective change based on the parameters used (e.g. objective value), but the internal biological state (the direction and magnitude of the internal metabolic reactions) can change in significant ways as well. In the associated manuscript we demonstrate via case studies how using a systems mindset allows the scientist to explore and understand dependencies between application parameters and the final scientific result. This improved understanding can build confidence and lead to better science.

_{return to overview}

References¶

[a] Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nature Biotechnology. 2010;28(3):1546–1696.

[b] FAQ: Metabolic Modeling; 2022. Available from: https: //docs.kbase.us/workflows/metabolic-models/faq-metabolic-modeling

To reference this narrative's work please cite:¶

Mikaela Cashman, Myra B. Cohen, Alexis L. Marsh, Robert W. Cottingham, Dissecting Complexity: The Hidden Impact of Application Parameters on Bioinformatics Research. bioRxiv 2022.12.20.521257; doi: https://doi.org/10.1101/2022.12.20.521257

Created Object Name	Type	Description
v11_Ecoli_MG1655_RASTannotated.GFModel	FBAModel	FBAModel-15 v11_Ecoli_MG1655_RASTannotated.GFModel
v11_Ecoli_MG1655_RASTannotated.GFModel.gf.1	FBA	FBA-13 v11_Ecoli_MG1655_RASTannotated.GFModel.gf.1

Dissecting Complexity: The Hidden Impact of Application Parameters on Bioinformatics Research¶

Narrative Overview¶

Manuscript Abstract¶

Narrative Introduction¶

Experimental Design¶

Narrative Overview¶

Step 1. Select Genome¶

Step 2. Reannotate with RAST¶

Step 3. Build the draft metabolic model from the genome¶

Step 4. Gapfill Draft Model¶

Step 5. Run FBAs¶

Default Max Oxygen¶

Max Oxygen = 0¶

Max Oxygen = 10¶

Max Oxygen = 40¶

Max Oxygen = 100¶

Results and Discussion¶

Impacts on Reaction Flux¶

Changes to the Pyruvate Metabolism Pathways¶

State Diagram Representation¶

Summary¶

References¶

To reference this narrative's work please cite:¶

Apps