Generated November 10, 2022

Applying a mixture of KBase and PDB to unravel a mystery in plant metabolism

In the previous presentation, described in this narrative, we demonstrated how KBase tools can be applied to discover an entirely new metabolic pathway and to propose candidate genes for the steps in that pathway. We then showed how our new KBase-PDB tools allow users to: (1) upload and view AlphaFold structures of all candidate genes; (2) rapidly query PDB for experimental structures corresponding to the candidate genes and to the entire genome of interest; and (3) query PDB for experimental structures cocrystallized with a compound of interest in our pathway. Later in this session, we'll see how tools in PDB complement this workflow by enabling users to align their AlphaFold structures against experimental structures of interest in PDB itself.

Now we're going to change gears and explore how these tools can also be applied to study plant genomes, again demonstrating excellent synergy with the tools and data available in PDB itself. In this story, we'll proceed through the following steps:

  1. Introduce Our Metabolic Mystery
  2. Build a Functioning Metabolic Model of Arabidopsis
  3. Predict and Visualize Flux for Arabidopsis Model
  4. Apply PDB annotation app to scan Arabidopsis genome for relevant structures in PDB
  5. Identify proteins of interest to resolve our metabolic mystery
  6. Import structures from PDB and view in KBase
  7. Align candidate gene structures with experimental homologs to understand function (later session)
  8. Experimental validation
  9. New theories for function of uncharacterized COG3236 in Arabidopsis and E. coli

1. Introduce Our Metabolic Mystery

Let's go back to 2012 and the publication of a paper that rigorously reviewed the state of annotation for B vitamin pathways in plants: "Plant B Vitamin Pathways and their Compartmentation: a Guide for the Perplexed".

This article highlighted a particular metabolic mystery in plant riboflavin biosynthesis at the time:

figure1
See the pink-highlighted step in riboflavin biosynthesis, PyrR (EC 1.1.1.193), whose genes in plants were unknown at the time. Below is an Arabidopsis genome that was pre-annotated using the Annotate Plant Enzymes with OrthoFinder app in KBase. Because this app can take a while to run, we chose not to run it here. The genome was edited to reflect our lack of knowledge of the 1.1.1.193 step in riboflavin biosynthesis. I'll demonstrate how to explore the genome data below.

2. Build a Functioning Metabolic Model of Arabidopsis

Since the genome above has already been annotated, we can proceed directly to building a metabolic model with the PlantSEED app in KBase (Reconstruct Plant Metabolism). This app constructs a functioning metabolic model of Arabidopsis, complete with organelle compartments and biomass biosynthesis reactions. Its output shows how our metabolic mystery surfaces in KBase: if you search the model for the function in question, EC 1.1.1.193, you'll find the corresponding reaction, but with no associated genes. This indicates that the reaction was added by gapfilling and that the genes catalyzing it are unknown.
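The "reaction present, genes absent" pattern described above is easy to check programmatically once a model's gene-protein-reaction (GPR) rules are in hand. The sketch below is illustrative only: it assumes a hypothetical, simplified model represented as a dict from reaction ID to GPR string (not the KBase FBAModel schema), and all reaction and gene IDs are made up. Note that in general a geneless reaction is not always gapfilled (spontaneous reactions can also lack genes); in this narrative it signals gapfilling.

```python
# Hypothetical, simplified model: reaction ID -> gene-protein-reaction (GPR)
# rule as a string. An empty rule means no genes are associated.
# This is NOT the KBase FBAModel schema; all IDs are made up.
from typing import Dict, List


def orphan_reactions(gpr_by_reaction: Dict[str, str]) -> List[str]:
    """Return the IDs of reactions that have no associated genes."""
    return sorted(rxn for rxn, gpr in gpr_by_reaction.items() if not gpr.strip())


toy_model = {
    "rxn_upstream": "gene_1",
    "rxn_ec_1_1_1_193": "",  # in the model but geneless (here: gapfilled)
    "rxn_downstream": "gene_2 or gene_3",
}
print(orphan_reactions(toy_model))  # ['rxn_ec_1_1_1_193']
```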

Reconstruct the metabolic network of a plant based on an annotated genome.
This app completed without errors in 3m 46s.
Objects
Created Object Name | Type | Description
Athaliana_2012_FBAModel | FBAModel | FBAModel: Athaliana_2012_FBAModel
Links
Output from Reconstruct Plant Metabolism
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/130357

3. Predict and Visualize Flux for Arabidopsis Model

Now that we have a functioning model, let's run flux balance analysis to see the fluxes through the reactions in this model. I'll review the fluxes below and also show how they can be visualized using our Escher app.
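Flux balance analysis is a linear program: choose reaction fluxes v that maximize an objective (typically biomass) subject to the steady-state mass balance S·v = 0 and lower/upper bounds on each flux. The sketch below solves a three-reaction toy network with SciPy's `linprog`; the network, bounds, and names are invented for illustration and are not how the KBase Run Flux Balance Analysis app is implemented. It also blocks one reaction to show why a model can need a gapfilled reaction like the geneless EC 1.1.1.193 step in order to produce biomass.

```python
# Toy flux balance analysis: maximize "biomass" flux subject to S @ v = 0
# and per-reaction bounds. The network is invented for illustration only.
import numpy as np
from scipy.optimize import linprog

# Rows = metabolites A, B; columns = reactions uptake (-> A), R1 (A -> B),
# biomass (B -> ).
S = np.array([
    [1, -1, 0],   # A: made by uptake, used by R1
    [0, 1, -1],   # B: made by R1, used by biomass
])
BOUNDS = [(0, 10), (0, 1000), (0, 1000)]  # uptake capped at 10 units
BIOMASS = 2  # column index of the objective reaction


def run_fba(stoich, bounds, objective):
    """Return the flux vector that maximizes reaction `objective`."""
    c = np.zeros(stoich.shape[1])
    c[objective] = -1.0  # linprog minimizes, so minimize -v_objective
    res = linprog(c, A_eq=stoich, b_eq=np.zeros(stoich.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


fluxes = run_fba(S, BOUNDS, BIOMASS)
print("optimal fluxes:", fluxes)  # every reaction carries the uptake limit

# Block R1, as if this reaction were absent from the model: A can no longer
# be converted to B, so no biomass flux is possible.
blocked = list(BOUNDS)
blocked[1] = (0, 0)
print("biomass without R1:", run_fba(S, blocked, BIOMASS)[BIOMASS])
```

Because the toy network is a single linear chain, steady state forces all three fluxes to be equal, so the optimum is set entirely by the uptake bound; genome-scale models have many alternate routes, which is why their optimal flux distributions are generally not unique.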

Predict metabolite fluxes in a metabolic model of an organism grown on a given media using flux balance analysis (FBA).
This app completed without errors in 1m 4s.
Objects
Created Object Name | Type | Description
ArabidopsisAutotrophicFBA | FBA | FBA-13 ArabidopsisAutotrophicFBA
Report
Summary
A flux balance analysis (FBA) was performed on the metabolic model 130357/16/1 growing in 130357/6/1 media.
Output from Run Flux Balance Analysis
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/130357
Display Metabolic Pathways
This app completed without errors in 40s.

4. Apply PDB annotation app to scan Arabidopsis genome for relevant structures in PDB

This app searches PDB for structures of proteins homologous to the proteins encoded in our Arabidopsis genome.

Queries PDB API with genome proteins and annotates proteins with associated PDB metadata
This app completed without errors in 6h 2m 52s.
Objects
Created Object Name | Type | Description
Athaliana_TAIR10_2012.pdb | Genome | Saving PDB annotation for Athaliana_TAIR10_2012

5. Identify proteins of interest to resolve our metabolic mystery

Searching the PDB-annotated genome for our EC of interest, 1.1.1.193, we find two candidate genes for the missing function: AT3G47390 and AT4G20960.

Both genes hit microbial proteins in PDB that share this function, but those hits carry very different EC numbers. So which candidate is right? Let's query PDB specifically for these two genes to get a broader set of hits.
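Finding genes by EC number comes down to matching the EC string in each gene's function annotations, taking care that one EC does not match as a prefix of another (1.1.1.19 versus 1.1.1.193). The sketch below assumes a hypothetical, simplified mapping from gene ID to function strings rather than the actual KBase Genome object layout; the gene IDs and function strings are made up.

```python
import re
from typing import Dict, List


def genes_with_ec(functions_by_gene: Dict[str, List[str]], ec: str) -> List[str]:
    """Return IDs of genes whose function strings mention the EC number."""
    # Require that the EC is not preceded by a digit or dot and not followed
    # by another digit (optionally after a dot), so 1.1.1.19 never matches
    # inside 1.1.1.193, while a sentence-ending period is still allowed.
    pattern = re.compile(r"(?<![\d.])" + re.escape(ec) + r"(?!\.?\d)")
    return sorted(
        gene for gene, functions in functions_by_gene.items()
        if any(pattern.search(f) for f in functions)
    )


# Hypothetical annotations; IDs and function strings are made up.
toy_annotations = {
    "gene_a": ["Hypothetical reductase (EC 1.1.1.193)"],
    "gene_b": ["Candidate enzyme; EC 1.1.1.193."],
    "gene_c": ["Unrelated dehydrogenase (EC 1.1.1.19)"],
}
print(genes_with_ec(toy_annotations, "1.1.1.193"))  # ['gene_a', 'gene_b']
```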

6. Query PDB for more hits for our candidate proteins

The PDB metadata import app scans PDB across our whole genome of interest, but by necessity it can return only a limited number of hits for each gene. The targeted query app returns more hits per gene, so now that the broad scan has given us specific gene candidates, we can query those genes directly with the query app.
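The narrative does not show either app's internals. As a hedged sketch of what a targeted per-gene query can look like, the function below builds a sequence-similarity request in the JSON format of the public RCSB Search API (v2), where `request_options.paginate.rows` controls how many hits are returned. Whether the KBase Query RCSB app accepts exactly this format is an assumption, and the cutoff values and sequence are placeholders.

```python
import json

# Public RCSB Search API v2 endpoint; the query below would be POSTed here
# as JSON. No request is sent in this sketch.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def sequence_query(protein_seq, rows=25, evalue_cutoff=0.1, identity_cutoff=0.0):
    """Build a sequence-similarity search returning up to `rows` polymer entities."""
    return {
        "query": {
            "type": "terminal",
            "service": "sequence",
            "parameters": {
                "evalue_cutoff": evalue_cutoff,
                "identity_cutoff": identity_cutoff,
                "sequence_type": "protein",
                "value": protein_seq,
            },
        },
        "return_type": "polymer_entity",
        "request_options": {"paginate": {"start": 0, "rows": rows}},
    }


# Placeholder: substitute the candidate protein's amino-acid sequence.
query = sequence_query("PLACEHOLDER_SEQUENCE", rows=100)
print(json.dumps(query, indent=2))
```

With the `requests` library, such a query could be sent as `requests.post(SEARCH_URL, json=query)`. In this sketch, raising `rows` is one way to retrieve more hits per gene than a capped genome-wide scan would.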

Given a json format query constraint, query RCSB databases for a list of protein structures
This app completed without errors in 30s.
Summary
Query has resulted in 24 structures in RCSB DB.

7. Experimental validation

Both of these genes were subsequently tested by complementing an E. coli PyrR knockout with each gene. Neither gene alone restored function, but the two genes combined did. This supports the picture that also emerged from our structure studies. We can now formally add the new annotations to our Arabidopsis genome.
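The complementation result (neither gene alone, both together) corresponds to an AND in a gene-protein-reaction (GPR) rule, the usual way metabolic models express that a reaction needs several gene products. The sketch below uses a hypothetical nested-tuple rule format, not KBase's, and encoding this step as an AND rule is an illustrative assumption rather than the annotation actually written to the genome.

```python
from typing import Set, Tuple, Union

# Hypothetical rule format: a gene ID, or ("and" | "or", sub_rule, sub_rule, ...)
GPR = Union[str, Tuple]


def gpr_active(rule: GPR, genes_present: Set[str]) -> bool:
    """Evaluate whether a GPR rule is satisfied by the genes present."""
    if isinstance(rule, str):
        return rule in genes_present
    op, *operands = rule
    results = (gpr_active(r, genes_present) for r in operands)
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")


# Assumed encoding: the step requires both candidate genes.
both_required = ("and", "AT3G47390", "AT4G20960")
for present in ({"AT3G47390"}, {"AT4G20960"}, {"AT3G47390", "AT4G20960"}):
    print(sorted(present), gpr_active(both_required, present))
```

Only the last case prints `True`, mirroring the complementation outcome: under an AND rule, either gene alone leaves the reaction inactive.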

v1 - KBaseGenomes.Genome-11.0
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/130357

Released Apps

  1. Reconstruct Plant Metabolism
    • [1] Seaver SMD, Lerma-Ortiz C, Conrad N, Mikaili A, Sreedasyam A, Hanson AD, et al. PlantSEED enables automated annotation and reconstruction of plant primary metabolism with improved compartmentalization and comparative consistency. Plant J. 2018;95: 1102–1113. doi:10.1111/tpj.14003
    • [2] Seaver SMD, Gerdes S, Frelin O, Lerma-Ortiz C, Bradbury LMT, Zallot R, et al. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource. Proc Natl Acad Sci USA. 2014;111: 9645–9650. doi:10.1073/pnas.1401329111
    • [3] GitHub source:
    • [4] Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 2015;16. doi:10.1186/s13059-015-0721-2
    • [5] Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol. 2010;28: 977–982. doi:10.1038/nbt.1672
    • [6] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206–D214. doi:10.1093/nar/gkt1226
    • [7] Latendresse M. Efficiently gap-filling reaction networks. BMC Bioinformatics. 2014;15: 225. doi:10.1186/1471-2105-15-225
    • [8] Dreyfuss JM, Zucker JD, Hood HM, Ocasio LR, Sachs MS, Galagan JE. Reconstruction and Validation of a Genome-Scale Metabolic Model for the Filamentous Fungus Neurospora crassa Using FARM. PLOS Computational Biology. 2013;9: e1003126. doi:10.1371/journal.pcbi.1003126
    • [9] Mahadevan R, Schilling CH. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng. 2003;5: 264–276.

Apps in Beta

  1. Escher Pathway Viewer
    • [1] King ZA, Dräger A, Ebrahim A, Sonnenschein N, Lewis NE, Palsson BØ. Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways. PLOS Computational Biology. 2015;11(8): e1004321.
    • [2] Rowe E, Palsson BØ, King ZA. Escher-FBA: a web application for interactive flux balance analysis. BMC Systems Biology. 2018;12(1): 84.
  2. PDB - Import PDB Metadata into KBase Genome
    no citations
  3. Query RCSB databases for protein structures
    no citations
  4. Run Flux Balance Analysis
    • Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol. 2010;28: 977–982. doi:10.1038/nbt.1672
    • Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nature Biotechnology. 2010;28: 245–248. doi:10.1038/nbt.1614