Annotate Plant Coding Sequences with Metabolic Functions App

Not yet updated for Release 3.0

The instructions in this document are for Release 2.0. The December 2016 release looks a bit different, though the overall operation is similar. This document will be updated soon.

Description of tutorial

This tutorial will guide you through the steps needed to run the Annotate Plant Coding Sequences with Metabolic Functions app in the KBase Narrative Interface.

In this tutorial, we will:

  • Import plant coding sequences or find a genome to annotate using the Narrative Interface Data Browser.
  • Add the genome to our Narrative.
  • Find and insert the Annotate Plant Coding Sequences with Metabolic Functions app into our Narrative.
  • Examine the annotated genome.

Description of the app

The Annotate Plant Coding Sequences with Metabolic Functions app performs functional annotation of plant cDNA sequences. The input FASTA file should contain cDNA sequences obtained by assembling coding sequences from next-generation sequencing of a transcriptome. (Support for protein data types will be available soon.) This app assigns metabolic functions to the sequences. These functions are derived from the PlantSEED Subsystems ontology (see the “Supplementary information” section at the end of the tutorial for sources of biochemistry integrated into PlantSEED). The resulting annotated genome can be viewed and subsequently used as input to other KBase apps such as Build Metabolic Model. The genome can also be downloaded in several formats.

For more information, please see the app details page.

Description of the input

This app takes transcript sequences (cDNA sequences) as input, either as a FASTA file or as a Genome object already stored in KBase.
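Because the app expects cDNA (nucleotide) sequences rather than proteins, it can help to check a FASTA file before uploading it. The following is a minimal sketch of such a check, not part of KBase itself; the function names, the allowed alphabet (A, C, G, T, N), and the example records are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List

# Assumption: only unambiguous bases plus N count as valid cDNA characters.
NUCLEOTIDES = set("ACGTN")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {sequence_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the identifier.
            header = line[1:].split()
            current_id = header[0] if header else f"unnamed_{len(records)}"
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line.upper()
    return records


def non_nucleotide_ids(records: Dict[str, str]) -> List[str]:
    """Return IDs of records that are empty or contain non-nucleotide letters."""
    return [
        seq_id
        for seq_id, seq in records.items()
        if not seq or set(seq) - NUCLEOTIDES
    ]


# Hypothetical example input: one cDNA record and one protein record.
example = """>transcript_1 hypothetical example
ATGGCTTCTAAG
GGTTAA
>transcript_2
MASKGEELFT
""".splitlines()

records = read_fasta(example)
suspect = non_nucleotide_ids(records)
print("Records that do not look like cDNA:", suspect)
```

Any record flagged here probably contains protein or otherwise non-nucleotide characters and is worth inspecting before upload.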

In KBase, there are a number of ways to obtain genome data to analyze:

  1. Upload your own coding sequences data as a FASTA file from your local machine.
  2. Use example data available in KBase from the slideout Data Browser.
  3. Use the Public tab of the Data Browser to select “Plant Genomes” and search for a genome of interest.
  4. Use a plant genome that you have already used in another Narrative or that another user has shared with you.

In this tutorial, we will use example data available from KBase’s reference data collection.

Description of the output

The output of this app is a new “Genome” object in which sequences have been newly annotated with metabolic functions.

Point and click instructions for using this app

Note: This tutorial assumes that you have already created a new Narrative. For instructions on how to accomplish this and other tasks such as finding or uploading data to your Narrative, please refer to the Narrative Interface User Guide.

Step 1. Add data that you want to analyze

Here, we will demonstrate the Annotate Plant Coding Sequences with Metabolic Functions app by adding a genome from example data in KBase.

First, click the Add Data (or the “+”) button in the Data Panel on the left of your screen. (If you don’t see this button, make sure you have the Analyze tab selected.) The Data Browser will slide out, with tabs that show several data sources.

Under the Example tab, scroll to find the heading called “Example Genomes.” Add the Sbicolor.JGI-v2.1 genome to your Narrative by hovering over it and clicking the Add button that appears to its left.


Once the genome is added, exit the Data Browser by clicking either the Close button at the bottom right of the browser window or the arrow at the top of the Data Panel. (Note that you also can close the Data Browser by clicking anywhere in the main Narrative panel in the center.)

Notice that your Data Panel now shows the genome that you added.

You can find out more about this data object by mousing over the record in the Data Panel and clicking on the “…” that appears. An expanded view of the data will open:


The icons in this view let you see a data summary, download the object, see its provenance, and more. (Please see the Explore Data section of the Narrative Interface User Guide for more information.)

For now, we will examine this genome by dragging it from the Data Panel and dropping it onto the main Narrative panel. You will see something like this:


Step 2. Add and run the app

Now that you have your input data, you can add the Annotate Plant Coding Sequences with Metabolic Functions app to your Narrative. Take a closer look at the Apps Panel directly below your data.

You can search for apps using the search box at the top of the panel, or just scroll until you find the one you want. Locate the Annotate Plant Coding Sequences with Metabolic Functions app in the list. Click on its name or icon to add it to the main Narrative panel.


To annotate our example sorghum genome, we must fill out the fields in the app cell. The parameters for the app are described in depth on the app details page.

For the first input field (Plant Coding Sequences), select Sbicolor.JGI-v2.1 from the dropdown menu. Next, specify the name of the output genome object. Here, we will use “Sbicolor.JGI-v2.1_annotate.” This genome object will include the annotation data and be in a format compatible with other KBase apps.

Notice that as you fill in the required parameter fields, the red arrows next to those fields change to green checkmarks. Once all required fields have a green checkmark, the app is ready to run.


Click the Run button at the bottom of the cell to launch the analysis. Depending on the queue size, this job should take approximately 5-10 minutes. You can check your job status by clicking on the Jobs tab near the top left of the Narrative.


Step 3. Look at the output

Once the job is complete, the annotated Genome object will appear in the Data Panel under the Analyze tab. An output cell will also appear under the original app in the main Narrative panel. This output cell has three tabs labeled Overview, Contigs, and Genes that allow you to browse the genomic data.

If, for example, you would like to see how many genes were annotated with metabolic functions, search for “EC” (for Enzyme Commission). The search returns 5,284 entries filtered from 39,441 total entries. This indicates that roughly 13.4% of the entries in this genome received new annotations based on metabolic functions.
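The percentage quoted above follows directly from the two counts reported by the search. As a quick check of the arithmetic (the variable names are only illustrative):

```python
# Counts taken from the search result described in the tutorial text.
ec_matches = 5284
total_entries = 39441

fraction = ec_matches / total_entries
print(f"{fraction:.1%} of entries match 'EC'")
```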


You can also see how many genes are annotated with a specific enzyme. For example, searching for phenylalanine will reveal nine genes that are annotated with Phenylalanine ammonia-lyase (EC 4.3.1.24).


Similarly, if you would like to search for enzyme 5-O-(4-coumaroyl)-D-quinate/shikimate 3′-hydroxylase that participates in phenylpropanoid biosynthesis, you will find four genes that are annotated with this enzyme in the annotated sorghum genome.
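The searches above are done in the output cell, but the same idea can be sketched outside the Narrative. The example below is a rough illustration only: the gene IDs and function strings are made up, and the regular expression is an assumption about how EC numbers are typically written, not a description of how the KBase search works.

```python
import re
from typing import List, Tuple

# Assumed pattern: "EC", an optional space or colon, then four dot-separated
# fields, where the last field may be a dash for partial EC numbers.
EC_PATTERN = re.compile(r"EC[ :]?(\d+\.\d+\.\d+\.(?:\d+|-))")

# Hypothetical (gene ID, function) pairs standing in for an annotated genome.
genes: List[Tuple[str, str]] = [
    ("gene_A", "Phenylalanine ammonia-lyase (EC 4.3.1.24)"),
    ("gene_B", "Phenylalanine ammonia-lyase (EC4.3.1.24)"),
    ("gene_C", "hypothetical protein"),
    ("gene_D", "Chorismate mutase (EC 5.4.99.5)"),
]


def search(entries: List[Tuple[str, str]], term: str) -> List[str]:
    """Return IDs whose function contains the term, ignoring case."""
    needle = term.lower()
    return [gene_id for gene_id, function in entries if needle in function.lower()]


def ec_numbers(function: str) -> List[str]:
    """Extract every EC number mentioned in a function string."""
    return EC_PATTERN.findall(function)


hits = search(genes, "phenylalanine")
with_ec = [gene_id for gene_id, function in genes if ec_numbers(function)]
print("Matches for 'phenylalanine':", hits)
print("Genes carrying an EC number:", with_ec)
```

Matching on the extracted EC number rather than on the enzyme name avoids missing entries that spell the name differently.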


Step 4. Download the results

To download your annotated genome, locate it in your Data Panel. Open the expanded view of the Genome object by clicking on the “…” or the white space surrounding the object name (but not on the name itself). Use the Export/Download data icon to download the data in a format of your choice. For genomes, the options are GenBank or JSON formats.
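Once the genome has been downloaded as JSON, it can be inspected with a short script. The sketch below is hedged: it assumes the file contains a top-level "features" list whose items may carry a "function" string. That layout is an assumption for illustration and should be checked against the file you actually download. A small made-up file stands in for the real export so that the example runs on its own.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_functions(genome_path: Path) -> Counter:
    """Tally the function strings found in an exported genome JSON file.

    Assumption: the file holds an object with a top-level "features" list
    whose items may carry a "function" string. Adjust the keys to match
    the file you actually downloaded.
    """
    with genome_path.open(encoding="utf-8") as handle:
        genome = json.load(handle)
    counts: Counter = Counter()
    for feature in genome.get("features", []):
        counts[feature.get("function") or "(no function)"] += 1
    return counts


# Stand-in file with made-up content so the sketch is self-contained.
mock_genome = {
    "features": [
        {"id": "gene_1", "function": "Phenylalanine ammonia-lyase (EC 4.3.1.24)"},
        {"id": "gene_2", "function": "Phenylalanine ammonia-lyase (EC 4.3.1.24)"},
        {"id": "gene_3"},
    ]
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mock_genome.json"
    path.write_text(json.dumps(mock_genome), encoding="utf-8")
    counts = count_functions(path)

for function, n in counts.most_common():
    print(n, function)
```

To use it on a real download, pass the path of the exported file to count_functions instead of the mock file.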


Step 5. Save and share your Narrative

Save your Narrative by clicking on the ‘save’ icon in the top right corner of the screen. Once it is saved, click on the ‘share’ button above to let others view your analysis.

Biological example / use case

As the number of sequenced plant genomes grows, apps that can analyze, annotate, and model these genomes become increasingly important. To keep gene annotations consistent, PlantSEED supports subsystems-based annotation, which enables efficient metabolic model reconstruction for plant genomes. The Annotate Plant Coding Sequences with Metabolic Functions app is designed to map modeling-compatible annotations onto a genome taken from Gramene or Phytozome, or onto your own coding sequences.

By following the point and click instructions above, you have already created an input object that can be used in the Build Metabolic Model app. Alternatively, you can use the Build Plant Metabolic Model app, which includes the Annotate Plant Coding Sequences with Metabolic Functions app as its first step.

For this biological use case, we used publicly available sorghum transcriptome data, annotated it with metabolic functions using PlantSEED, and then used it to generate metabolic models specific to different tissues [1]. The data were obtained using next-generation sequencing (NGS) technology to examine the differential expression of sorghum genes in roots and shoots in response to polyethylene glycol (PEG) and exogenous abscisic acid (ABA) [1].

Narrative for this biological use case

We have created a public Narrative demonstrating this use case. You can access it at SorghumTranscriptomeModelsBasedOnOsmoticStress.

Note that you must be signed in to your KBase account to access the Narrative. It will open in view-only mode, allowing you to see the results but not run the analyses yourself. If you want to run the analyses, copy the Narrative to your account using instructions in the Access and Copy Narratives section of the Narrative Interface User Guide. You also can create your own Narrative using sorghum transcriptome data in response to ABA.

Further analysis

The resulting annotated genome can be used as an input to other KBase apps such as Build Metabolic Model. If you wish to proceed to modeling a plant genome or transcriptome, we recommend reading the tutorial for the Build Plant Metabolic Model app.


  1. Dugas DV, Monaco MK, Olson A, Klein RR, Kumari S, Ware D, Klein PE: Functional annotation of the transcriptome of Sorghum bicolor in response to osmotic stress and abscisic acid. BMC Genomics 2011, 12(1):514.

Supplementary information

See a table listing the biochemical sources integrated into PlantSEED here.