Applying a combination of KBase and PDB to unravel a mystery in plant metabolism

In the previous presentation, described in this narrative, we demonstrated how KBase tools can be applied to discover an entirely new metabolic pathway and to propose gene candidates for steps in that pathway. We then showed how our new KBase-PDB tools allow users to: (1) upload and view AlphaFold structures of all candidate genes; (2) rapidly query PDB for experimental structures corresponding to candidate genes and to the entire genome of interest; and (3) query PDB for experimental structures that involve co-crystallization with a compound of interest in our pathway. Later in this session, we'll see how tools in PDB complement this workflow by enabling users to align their AlphaFold structures against experimental structures of interest in PDB itself.

Now we're going to change gears and explore how these tools can also be applied to study plant genomes, again demonstrating excellent synergy with the tools and data available in PDB itself. In this story, we're going to proceed through the following steps:

Introduce Our Metabolic Mystery
Build a Functioning Metabolic Model of Arabidopsis
Predict and Visualize Flux for Arabidopsis Model
Apply PDB annotation app to scan Arabidopsis genome for relevant structures in PDB
Identify proteins of interest to resolve our metabolic mystery
Import structures from PDB and view in KBase
Align candidate gene structures with experimental homologs to understand function (later session)
Experimental validation
New theories for function of uncharacterized COG3236 in Arabidopsis and E. coli

1. Introduce Our Metabolic Mystery

Let's go back to 2012 and the publication of a paper that rigorously reviewed the state of annotation for B vitamin pathways in plants: "Plant B Vitamin Pathways and their Compartmentation: a Guide for the Perplexed".

This article highlights a particular metabolic mystery in riboflavin biosynthesis in plants at the time:

See the pink-highlighted step in riboflavin biosynthesis, PyrR (EC 1.1.1.193), which was unknown in plants at the time. Below is the Arabidopsis genome, pre-annotated using the Annotate Plant Enzymes with OrthoFinder app in KBase. Because this app can take a while to run, we are not running it here. The genome was edited to reflect our lack of knowledge of the 1.1.1.193 step in riboflavin biosynthesis. I'll demonstrate how to explore the genome data below.

2. Build a Functioning Metabolic Model of Arabidopsis

Since the above genome version has already been annotated, we can proceed directly to building a metabolic model using the PlantSEED app in KBase. This app constructs a functioning metabolic model of Arabidopsis with full organelles and biomass biosynthesis reactions. The output of this app demonstrates how our metabolic mystery is exposed in KBase. If you search for the function in question, 1.1.1.193, you'll find the function is in the model, but it lacks any associated genes. This indicates that the reaction is gapfilled and the genes for the reaction are unknown.
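The check described above (a reaction that is present in the model but has no associated genes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the reaction IDs, the gene IDs, and the placeholder EC numbers other than 1.1.1.193 are all hypothetical and are not the KBase or PlantSEED model format.

```python
# Hypothetical, simplified model: reaction ID -> EC number and associated genes.
# This is NOT the KBase/PlantSEED object format; all IDs are placeholders.
toy_model = {
    "rxn_A": {"ec": "1.1.1.193", "genes": []},            # in the model, no genes
    "rxn_B": {"ec": "0.0.0.1", "genes": ["gene_1"]},      # placeholder EC number
    "rxn_C": {"ec": "0.0.0.2", "genes": ["gene_2", "gene_3"]},
}


def reactions_without_genes(model, ec=None):
    """Return sorted IDs of reactions with no gene association.

    If `ec` is given, only reactions with that EC number are considered.
    """
    return sorted(
        rxn_id
        for rxn_id, rxn in model.items()
        if not rxn["genes"] and (ec is None or rxn["ec"] == ec)
    )


print(reactions_without_genes(toy_model, ec="1.1.1.193"))
```

A reaction returned by this kind of filter is a candidate for having been added by gapfilling, which is how the missing 1.1.1.193 step shows up in the model output.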

3. Predict and Visualize Flux for Arabidopsis Model

Now that we have a functioning model, let's run flux balance analysis to see the fluxes through the reactions in this model. I'll review the fluxes below and also show how they can be visualized using our Escher app.
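For readers new to flux balance analysis, the optimization it solves is usually written as the linear program below. The notation is ours rather than anything taken from the KBase app: S is the stoichiometric matrix of the model, v is the vector of reaction fluxes, c holds the objective weights (typically selecting the biomass reaction), and the lower and upper bounds on each flux encode reversibility and media constraints. The exact constraints the KBase app applies may differ from this textbook form.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for every reaction } j
\end{aligned}
```

The steady-state constraint S v = 0 requires every metabolite to be produced as fast as it is consumed, so a flux through the gapfilled 1.1.1.193 reaction is needed whenever the objective depends on riboflavin.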

4. Apply PDB annotation app to scan Arabidopsis genome for relevant structures in PDB

This app attempts to find structures available in PDB that are homologous to any genes in our Arabidopsis genome.

5. Identify proteins of interest to resolve our metabolic mystery

Searching for our EC of interest, 1.1.1.193, we find two candidate genes for the missing function: AT3G47390 and AT4G20960.

Both hit proteins of the same function in microbes, but with very different EC numbers. So which one is right? Let's query PDB specifically for these two genes to get a broader set of hits.
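The EC search above amounts to filtering the app's hit table by EC number and grouping the matches by gene. A minimal sketch follows, assuming a flat list of hit records; the field names, gene IDs, PDB IDs, and the second EC number are placeholders invented for illustration and are not results from the app.

```python
from collections import defaultdict

# Hypothetical hit records; the real app's output columns may differ.
# Gene IDs, PDB IDs and the EC number 9.9.9.9 are placeholders, not real results.
hits = [
    {"gene": "gene_1", "pdb_id": "XXX1", "ec": "1.1.1.193"},
    {"gene": "gene_2", "pdb_id": "XXX2", "ec": "1.1.1.193"},
    {"gene": "gene_2", "pdb_id": "XXX3", "ec": "9.9.9.9"},
    {"gene": "gene_3", "pdb_id": "XXX4", "ec": "9.9.9.9"},
]


def genes_hitting_ec(hit_records, ec):
    """Group the PDB IDs of hits annotated with `ec` by the query gene."""
    by_gene = defaultdict(list)
    for hit in hit_records:
        if hit["ec"] == ec:
            by_gene[hit["gene"]].append(hit["pdb_id"])
    return dict(by_gene)


print(genes_hitting_ec(hits, "1.1.1.193"))
```

Every gene that appears as a key in the result is a candidate for the missing function, which is how more than one candidate can surface from a single EC search.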

6. Query PDB for more hits for our candidate proteins

The PDB metadata import app provides a broad scan for hits in PDB for our genome of interest, but by necessity it can return only a limited number of hits for each individual gene queried. The targeted query app returns more hits for each gene. Now that we have specific gene candidates from our broad query, we can query these genes specifically with our query app.
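To make the broad-versus-targeted distinction concrete, here is a sketch of how a sequence query against PDB could be parameterized. It assumes the JSON layout of the RCSB Search API (v2) sequence service as we understand it; the field names should be checked against the current RCSB documentation, and the KBase apps may query PDB in a different way. The helper name, the hit counts (1 and 50), the cutoffs, and the dummy sequence are all illustrative assumptions. The code only builds the request body and makes no network call.

```python
import json


def build_sequence_query(sequence, max_hits, evalue_cutoff=0.1, identity_cutoff=0.0):
    """Build a sequence-search request body; `max_hits` caps the hits returned.

    Hypothetical helper: the layout follows our reading of the RCSB Search API v2
    and is not taken from the KBase apps.
    """
    return {
        "query": {
            "type": "terminal",
            "service": "sequence",
            "parameters": {
                "evalue_cutoff": evalue_cutoff,
                "identity_cutoff": identity_cutoff,
                "sequence_type": "protein",
                "value": sequence,
            },
        },
        "request_options": {
            "scoring_strategy": "sequence",
            "paginate": {"start": 0, "rows": max_hits},
        },
        "return_type": "polymer_entity",
    }


# Dummy placeholder sequence, NOT the sequence of either candidate gene.
dummy_sequence = "M" + "A" * 29

broad = build_sequence_query(dummy_sequence, max_hits=1)      # genome-wide scan: few hits per gene
targeted = build_sequence_query(dummy_sequence, max_hits=50)  # candidate gene: many more hits

print(json.dumps(targeted, indent=2))
```

The only difference between the two requests is the number of rows requested, which reflects the trade-off described above: a genome-wide scan has to keep the per-gene hit count small, while a query for a handful of candidates can afford a much longer hit list.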

7. Experimental validation

Both of these genes were subsequently tested by complementation of an E. coli PyrR knockout (KO) with each gene. Neither gene showed function on its own, but function was observed when both genes were combined. This supports the story indicated by our structure studies. Now we can officially add the new annotations to our Arabidopsis genome.