In the previous presentation, described in this narrative, we demonstrated how KBase tools can be applied to discover an entirely new metabolic pathway and to propose gene candidates for steps in that pathway. We then showed how our new KBase-PDB tools allow users to: (1) upload and view AlphaFold structures of all candidate genes; (2) rapidly query PDB for experimental structures corresponding to candidate genes and the entire genome of interest; and (3) query PDB for experimental structures that involve co-crystallization with a compound of interest in our pathway. Later in this session, we'll see how tools in PDB complement this workflow by enabling users to align their AlphaFold structures against experimental structures of interest in PDB itself.
Now we're going to change gears and explore how these tools can also be applied to study plant genomes, again demonstrating excellent synergy with the tools and data available in PDB itself. In this story, we're going to proceed through the following steps:
Now let's go back to 2012, with the publication of this paper rigorously reviewing the state of annotation for B vitamin pathways in plants: "Plant B Vitamin Pathways and their Compartmentation: a Guide for the Perplexed".
This article highlights a particular metabolic mystery in riboflavin biosynthesis in plants at the time:
Since the above genome version has already been annotated, we can proceed directly to building a metabolic model using the PlantSEED app in KBase. This app constructs a functioning metabolic model of Arabidopsis with full organelles and biomass biosynthesis reactions. The output of this app demonstrates how our metabolic mystery is exposed in KBase. If you search for the function in question, 1.1.1.193, you'll find the function is in the model, but it lacks any associated genes. This indicates that the reaction is gapfilled and the genes for the reaction are unknown.
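To make the "function present, genes missing" check concrete, here is a minimal sketch of the same search done programmatically. It assumes a simplified, hypothetical representation of the model (a list of dictionaries with `id`, `ec`, and `genes` keys); this is not the actual KBase model format, and the reaction IDs, gene names, and the EC numbers other than 1.1.1.193 are placeholders.

```python
# Hypothetical, simplified stand-in for a metabolic model. Each reaction
# records its EC numbers and the genes associated with it. All IDs, gene
# names, and the "0.0.0.x" EC numbers are placeholders for illustration.
reactions = [
    {"id": "rxn_A", "ec": ["1.1.1.193"], "genes": []},
    {"id": "rxn_B", "ec": ["0.0.0.1"], "genes": ["gene_1"]},
    {"id": "rxn_C", "ec": ["0.0.0.2"], "genes": ["gene_2", "gene_3"]},
]


def find_by_ec(reactions, ec_number):
    """Return all reactions annotated with the given EC number."""
    return [r for r in reactions if ec_number in r["ec"]]


def has_no_genes(reaction):
    """True when a reaction carries no gene association at all."""
    return len(reaction["genes"]) == 0


# Look up the function in question and keep only the reactions with no genes.
matches = find_by_ec(reactions, "1.1.1.193")
orphans = [r["id"] for r in matches if has_no_genes(r)]
print(orphans)
```

A reaction that appears in `orphans` is in the model without any supporting gene, which is the signature of a gapfilled reaction described above.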
Now that we have a functioning model, let's run flux balance analysis to see the fluxes through the reactions in this model. I'll review the fluxes below and also show how they can be visualized using our Escher app.
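For readers new to flux balance analysis, the problem being solved can be written in its standard textbook form. The symbols here are not taken from this narrative: $S$ is the stoichiometric matrix, $v$ the vector of reaction fluxes, $c$ a weight vector that typically selects the biomass reaction, and $v^{\min}$, $v^{\max}$ the flux bounds. The exact objective and bounds used by the KBase app are not stated here, so treat this as the general formulation only.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
& v^{\min} \le v \le v^{\max}
\end{aligned}
```

The constraint $S v = 0$ enforces steady state, so every metabolite is produced as fast as it is consumed; the fluxes reviewed in this step are the entries of an optimal $v$.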
This app attempts to find structures available in PDB that are homologous to any genes in our Arabidopsis genome.
Searching for our EC of interest, 1.1.1.193, we find two candidate genes for the missing function: AT3G47390 and AT4G20960.
Both genes hit microbial proteins of the same function, but with wildly different EC numbers. So which one is right? Let's query PDB specifically for these two genes to get a broader set of hits.
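The search above amounts to filtering the table of PDB hits by EC number and collecting the genes that remain. A rough sketch follows, assuming a hypothetical table layout (`gene`, `pdb_id`, `ec` columns). The PDB IDs and the non-target EC numbers are placeholders, not real app output.

```python
from collections import defaultdict

# Hypothetical rows from a PDB hit table. The column names, PDB IDs, and
# "0.0.0.x" EC numbers are placeholders; only the two gene IDs and the EC
# of interest come from the narrative.
hits = [
    {"gene": "AT3G47390", "pdb_id": "XXX1", "ec": ["1.1.1.193"]},
    {"gene": "AT4G20960", "pdb_id": "XXX2", "ec": ["1.1.1.193", "0.0.0.1"]},
    {"gene": "GENE_OTHER", "pdb_id": "XXX3", "ec": ["0.0.0.2"]},
]


def genes_for_ec(hits, ec_number):
    """Group the PDB hits annotated with ec_number by the gene they matched."""
    by_gene = defaultdict(list)
    for row in hits:
        if ec_number in row["ec"]:
            by_gene[row["gene"]].append(row["pdb_id"])
    return dict(by_gene)


candidates = genes_for_ec(hits, "1.1.1.193")
print(sorted(candidates))
```

Grouping by gene rather than by structure keeps the result focused on the question at hand: which genes in the genome have any structural hit carrying the missing function.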
The PDB metadata import app provides a broad scan for hits in PDB for our genome of interest, but by necessity, it can return only limited hits for each individual gene queried. The targeted query app will return more hits for each gene. Now that we have specific gene candidates from our broad query, we can query these genes specifically with our query app.
Both of these genes were subsequently tested by complementing an E. coli PyrR KO with each gene. Individually, the genes failed to show functionality, but when both genes were combined, function was observed. This supports the conclusion also indicated by our structure studies. Now we can officially add the new annotations to our Arabidopsis genome.
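The complementation result implies an "and" relationship between the two genes: the function appears only when both are present. One way to express that logic is sketched below. Encoding the requirement as a set of genes is an assumption made for illustration, not the format KBase uses to store annotations, and `reaction_supported` is a hypothetical helper.

```python
# Assumption: both gene products are required, so the association is
# encoded as a set of genes that must all be present.
REQUIRED_GENES = frozenset({"AT3G47390", "AT4G20960"})


def reaction_supported(required, expressed):
    """True only if every required gene is among the expressed genes."""
    return required <= set(expressed)


# Mirror the three complementation experiments: each gene alone, then both.
for expressed in (["AT3G47390"], ["AT4G20960"], ["AT3G47390", "AT4G20960"]):
    print(expressed, reaction_supported(REQUIRED_GENES, expressed))
```

Under this encoding, each single-gene case evaluates to false and only the combined case evaluates to true, matching the experimental outcome described above.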