Annotate Microbial Contigs App

Not yet updated for Release 3.0

The instructions in this document are for Release 2.0. The December 2016 release looks a bit different, though the overall operation is similar. This document will be updated soon.

Description of tutorial

This tutorial will guide you through the steps needed to run the Annotate Microbial Contigs app in the KBase Narrative Interface.

In this tutorial, we will:

  • Use the Data Browser to select a contig set from KBase’s reference data collection and add it to our Narrative.
  • Find and insert the Annotate Microbial Contigs app.
  • Use this app to annotate our genome.
  • Examine the resulting annotation and describe how to download the output.
  • Demonstrate another example in which we import a genome from NCBI and find prophage elements within it using the Annotate Microbial Contigs app.
  • Describe how the output of this app can be used in further KBase analyses.

Description of the app

This app uses components from the RAST (Rapid Annotations using Subsystems Technology) toolkit to annotate an assembled bacterial or archaeal genome. The required input is a contig set in FASTA format. For more information, see the details page for this app.

Description of the input

The Annotate Microbial Contigs app reads a contig set as input. Contigs can be loaded into your Narrative several ways:

  1. Upload a set of contigs in FASTA format
  2. Upload a GenBank file — Note that KBase’s uploader will parse the GenBank file and create two outputs in your Narrative: a contig set and a genome object containing the original feature calls and annotations. (For the purpose of this tutorial, we will ignore the genome object that is created.)
  3. Import contigs directly from NCBI via FTP — This app converts the FASTA file of contigs (the “contig set”) into a Genome typed object, which is called a “genome” in your Data Panel.
  4. Generate a contig set using the Assemble Contigs from Reads app

For instructions on uploading contigs using the first three ways, see the Contigs section of the Data Upload and Download Guide.
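
Before uploading a FASTA file, it can help to check locally that it parses and to see how many contigs it holds. The following is a minimal sketch, assuming a plain, uncompressed FASTA file. The function names read_fasta and summarize are our own for illustration and are not part of KBase.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {contig_id: sequence}."""
    contigs: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header with no contig ID")
            # The contig ID is the first whitespace-delimited token.
            current_id = header[0]
            if current_id in contigs:
                raise ValueError(f"duplicate contig ID: {current_id}")
            contigs[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            contigs[current_id] += line.upper()
    return contigs


def summarize(contigs: Dict[str, str]) -> Dict[str, int]:
    """Report basic size statistics for a parsed contig set."""
    lengths = [len(seq) for seq in contigs.values()]
    return {
        "contig_count": len(lengths),
        "total_bases": sum(lengths),
        "longest_contig": max(lengths, default=0),
    }


# A tiny made-up example; replace with open("my_contigs.fasta") for a real file.
example = """>contig_1 example header
ACGTACGT
ACGT
>contig_2
GGCC
""".splitlines()

print(summarize(read_fasta(example)))
# {'contig_count': 2, 'total_bases': 16, 'longest_contig': 12}
```

A file that raises an error here is likely to cause trouble during upload as well, so this kind of check can save a round trip.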

In the point and click portion of this tutorial, we will use example data available in KBase. For the biological use case, we will import contigs from GenBank.

Description of the output

The output of this app is a “Genome.” In KBase, a “Genome” or “Genome typed object” is a special data type that contains the feature calls and annotation data for a genome.

Point and click instructions for using this app

Note: This tutorial assumes that you have already created a new Narrative. For instructions on how to accomplish this and other tasks such as finding or uploading data to your Narrative, please refer to the Narrative Interface User Guide.

Step 1. Add data that you want to analyze

Before running the app, you need to copy or upload the needed input data. We will demonstrate the Annotate Microbial Contigs app by adding and annotating a set of example contigs available in KBase’s reference data collection.

To add the example data, click the Add Data (or the red “+”) button in the Data Panel on the left of your screen. (If you don’t see this button, make sure you have the Analyze tab selected.) The Data Browser will slide out, with tabs that show several data sources. Choose the Example tab and find the heading called Example Contig Sets.

Locate the example contig set called “Rhodobacter_CACIA_14H1_contigs” and add it to your Narrative by mousing over it and clicking the Add button that appears at its left.

Exit the Data Browser by clicking the Close button at the bottom right of the browser window or the arrow at the top of the Data Panel. (You can also close the Data Browser by clicking anywhere in the main Narrative panel in the center.)

Notice that your Data Panel at left now shows the contig set that you just added to your Narrative.

You can find out more about this data object by hovering over it and clicking the “…” that appears. An expanded view of the data will open.

The icons in this view let you see a data summary, download the object, see its provenance, and more. (Please see the Explore Data section of the Narrative Interface User Guide for more information.)

Be sure to save your Narrative frequently, using the Save button at the top right of the screen.

Step 2. Add and run the app

Now that you have your input data, you can add the Annotate Microbial Contigs app to your Narrative. Take a closer look at the Apps Panel directly below your data.

You can search for apps using the search box at the top of the panel, or just scroll until you find the one you want. Locate the Annotate Microbial Contigs app in the list. Click on its name or icon to add it to the main Narrative panel.

To annotate our example Rhodobacter genome, we must fill out the fields in the app before running it. The app details page provides an in-depth description of all the parameters for this app. For the first field (Contig Set), select the Rhodobacter_CACIA_14H1_contigs object that you copied to your Narrative.

Next, specify the Scientific Name of the organism you are working on. This field has no restrictions, but we strongly suggest that you enter a meaningful genus, species, and strain name when possible because the name can affect which programs are run. We will use “Rhodobacter str. CACIA_14H1” in this example.

Since Rhodobacter is a bacterium, we will leave the Domain field set to B (Bacteria). Note that this app should be used to annotate only bacterial and archaeal genomes. More apps to support annotation of eukaryotic and viral genomes are coming soon.

Next, you need to specify a numeric value representing the genetic code. The default is 11, which is the appropriate value for all but a handful of bacterial genomes (e.g., for Mycoplasmas, use code 4). The genetic code is a required field because the gene calling algorithms need to be able to find the appropriate stop codons. If you have any questions as to what value to use, we suggest visiting the NCBI taxonomy browser.
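
To see why the genetic code matters for gene calling, consider how stop codons differ between two NCBI translation tables. Table 11 uses TAA, TAG, and TGA as stops, while table 4 (which includes the Mycoplasma/Spiroplasma code) reads TGA as tryptophan. The sketch below is only an illustration of that difference. It is not how the app's gene callers work, and the function name first_orf_length is our own.

```python
# Stop codons under two NCBI translation tables.
# Table 11: bacterial, archaeal, and plant plastid code.
# Table 4: includes the Mycoplasma/Spiroplasma code, where TGA encodes Trp.
STOP_CODONS = {
    11: {"TAA", "TAG", "TGA"},
    4: {"TAA", "TAG"},
}


def first_orf_length(dna: str, genetic_code: int) -> int:
    """Count codons from the start of `dna` up to the first in-frame stop.

    The stop codon itself is not counted. If no stop codon is found,
    every complete codon is counted.
    """
    stops = STOP_CODONS[genetic_code]
    codons = 0
    for i in range(0, len(dna) - 2, 3):
        if dna[i:i + 3].upper() in stops:
            break
        codons += 1
    return codons


# A toy sequence with an in-frame TGA as its third codon:
# ATG AAA TGA GGC TTT TAA
seq = "ATGAAATGAGGCTTTTAA"

print(first_orf_length(seq, 11))  # 2 (translation stops at TGA)
print(first_orf_length(seq, 4))   # 5 (TGA is read through; stops at TAA)
```

With the wrong code, a gene caller would end this toy reading frame three codons early, which is why the field is required.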

Finally, we need to choose a name for the output genome object. This will be created by the app and will include feature calls and annotation data. This output object will be in the appropriate format to be used by other apps (see the “Further Analysis” section of this tutorial). We will name our output object “Rhodobacter_CACIA_14H1_genome.”

App Quick Tip

Understanding the default behavior and advanced options

The Annotate Microbial Contigs app runs a default annotation pipeline that works well in most cases. However, for more specialized tasks, you can customize the annotation pipeline to suit your needs. Clicking on the Advanced options link at the bottom of the app cell will display a series of checkboxes that let you enable or disable certain options. For more information on each option, please refer to the app details page.

Notice that as you fill in the required parameter fields, the red arrows next to those fields change to green checkmarks. Once all required fields have a green checkmark, the app is ready to run.
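
The required-field check can be pictured as a simple validation pass over the values entered in the app cell. The sketch below is an illustration only. The key names are hypothetical stand-ins for the fields described above, not the app's real parameter identifiers, and we assume here that all five fields are required.

```python
from typing import Dict, List

# Hypothetical keys mirroring the fields in the app cell. Treating all
# five as required is an assumption made for this sketch.
REQUIRED_FIELDS = [
    "contig_set",
    "scientific_name",
    "domain",
    "genetic_code",
    "output_genome",
]


def missing_fields(params: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or left blank."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = params.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


params = {
    "contig_set": "Rhodobacter_CACIA_14H1_contigs",
    "scientific_name": "Rhodobacter str. CACIA_14H1",
    "domain": "B",
    "genetic_code": 11,
    "output_genome": "",  # not filled in yet
}
print(missing_fields(params))  # ['output_genome']

params["output_genome"] = "Rhodobacter_CACIA_14H1_genome"
print(missing_fields(params))  # []
```

An empty result corresponds to every required field showing a green checkmark.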

Click the Run button at the bottom of the cell to launch the annotation job. A blue box will appear around the cell, and a message at the bottom will indicate the job was submitted. Depending on the queue size (how many other calculations have been requested by users recently), this job should take about 5 minutes.

You can check your job status by selecting the Jobs tab near the top left of the Narrative Interface. The display is updated every few seconds, so you can easily see where your annotation request is in the queue and whether it is running or complete.

Step 3. Look at the output

Once your annotation job has finished, you can peruse the output or download the annotated genome to your computer in several common formats. We suggest that you begin by exploring the genome from within the Narrative Interface. You can browse your annotated genome three ways:

First, after the app has run, an output cell appears under the app cell in the main Narrative panel. The output cell features three tabs for reviewing the data: Overview, Contigs, and Genes. You can sort table entries under these tabs by clicking on a column header to sort by that field (e.g., Length). Clicking the same column header again will reverse the sort order. You can even sort by more than one column simultaneously by clicking one column header and then Shift-clicking on others.

To see more details about an entry under the Contigs and Genes tabs, click on the entry to open an expanded view of it in another tab.

The second way to examine your newly annotated genome is to look at it in your Data Panel where it will appear once the analysis job finishes.

From the Data Panel, you can click on the name of the annotated genome object to open a viewer for the genome. This viewer, which contains the same content as your output cell, can also be opened by dragging the genome from your Data Panel and dropping it into the main Narrative.

Within the Contigs and Genes sections of the genome viewer, you can sort the table entries by clicking on a column header to sort by that field (e.g., Length). Clicking the same column header again will reverse the sort order. You can even sort by more than one column at a time by clicking one column header and then Shift-clicking other column headers.

Third, if you mouse over the genome object in your Data Panel and click on the “…” or the white space surrounding its name, information about the object and the app that created it will be displayed. Clicking on the “Explore Data” binocular icon in this expanded view will open a Data Landing page, allowing you to peruse the data or launch further analyses. (This functionality is still in development.)
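
The multi-column sorting described above for the Contigs and Genes tables behaves like a sort on a primary column with ties broken by the next column. If you want to reproduce that ordering on data outside the Narrative, a minimal Python sketch looks like this. The rows and column names are made up for illustration.

```python
from operator import itemgetter

# Hypothetical rows standing in for entries in the Genes table.
rows = [
    {"id": "gene_3", "type": "CDS", "length": 900},
    {"id": "gene_1", "type": "RNA", "length": 76},
    {"id": "gene_2", "type": "CDS", "length": 1500},
    {"id": "gene_4", "type": "RNA", "length": 1540},
]


def sort_rows(rows, keys):
    """Sort by several columns.

    `keys` is a list of (column, descending) pairs, primary column first.
    """
    result = list(rows)
    # Python's sort is stable, so applying the keys from least to most
    # significant yields a correct multi-column ordering.
    for column, descending in reversed(keys):
        result.sort(key=itemgetter(column), reverse=descending)
    return result


# Primary: Type ascending. Secondary: Length descending.
by_type_then_length = sort_rows(rows, [("type", False), ("length", True)])
print([row["id"] for row in by_type_then_length])
# ['gene_2', 'gene_3', 'gene_4', 'gene_1']
```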

App Quick Tip

Comparing your genome to a KEGG or NCBI genome using genome comparison tools

A common request from users is to see locus IDs or other gene IDs in the genome output in KBase. This is a capability we are working on, but in the meantime, we suggest the following approaches to make locus tags or other gene IDs available within your Narrative.

1.) Import a GenBank file into your Narrative with the IDs you wish to use. Gene IDs are kept intact during the import process.

2.) Either use your imported genome directly, or if that is not possible, use the “Compare Two Proteomes” app to map the genes in your genome to the genes in the imported genome, creating a mapping between your gene IDs and the gene IDs you wish to see.
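
This tutorial does not say how the “Compare Two Proteomes” app builds its mapping. One common way to pair genes between two genomes is reciprocal best hits, in which two genes are matched only if each is the other's top-scoring hit. The sketch below illustrates that general idea with made-up gene IDs and similarity scores. It should not be read as the app's actual method.

```python
from typing import Dict, Tuple

Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """For each query gene, keep the subject gene with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Dict[str, str]:
    """Map genes in genome A to genes in genome B when each is the other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if backward.get(b) == a}


# Hypothetical similarity scores (higher is better).
a_vs_b = {
    ("kb_gene_1", "locus_0001"): 950.0,
    ("kb_gene_1", "locus_0002"): 120.0,
    ("kb_gene_2", "locus_0002"): 610.0,
    ("kb_gene_3", "locus_0002"): 400.0,
}
b_vs_a = {
    ("locus_0001", "kb_gene_1"): 948.0,
    ("locus_0002", "kb_gene_2"): 605.0,
    ("locus_0002", "kb_gene_3"): 398.0,
}

id_map = reciprocal_best_hits(a_vs_b, b_vs_a)
print(id_map)
# {'kb_gene_1': 'locus_0001', 'kb_gene_2': 'locus_0002'}
```

Here kb_gene_3 is left unmapped because its best hit, locus_0002, matches kb_gene_2 more strongly in the reverse direction.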

Be sure to save your Narrative frequently, using the Save button at the top right of the screen.

Step 4. Download the results

Remember that you can download an object from your Data Panel by opening the expanded view of the object and clicking on the Export/Download data icon. For a genome, this will reveal options for exporting the object in either GenBank or JSON format. (Be aware that the download functionality is still in development and may not yet work as expected.)
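
Once you have a JSON export on your computer, a few lines of Python are enough to summarize it. The sketch below assumes the exported file has a top-level "features" list whose entries each carry a "type" field. That layout is an assumption for illustration, so check it against your own export before relying on it. The function names are our own.

```python
import json
from collections import Counter


def load_genome(path):
    """Read an exported genome JSON file into a Python dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_feature_types(genome):
    """Tally features by their "type" field (assumed layout, see above)."""
    return Counter(
        feature.get("type", "unknown") for feature in genome.get("features", [])
    )


# A made-up stand-in for load_genome("my_genome.json").
example_genome = {
    "id": "example_genome",
    "features": [
        {"id": "feature_1", "type": "CDS"},
        {"id": "feature_2", "type": "CDS"},
        {"id": "feature_3", "type": "RNA"},
        {"id": "feature_4", "type": "prophage"},
    ],
}

counts = count_feature_types(example_genome)
print(dict(counts))
# {'CDS': 2, 'RNA': 1, 'prophage': 1}
```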

Biological example / use case

Now that you are familiar with the steps in the Annotate Microbial Contigs app, we will show you how to import a dataset from NCBI and find prophage elements within the genome using this app.

First, create a new Narrative by selecting the Narratives tab at the top of the left sidebar and then clicking the “+ New Narrative” button. (Once your Narrative opens, you can name it immediately by clicking on “Untitled” at the top of the screen or wait until you save it.)

Next, click the Add Data button to open the Data Browser slideout. Select the Import tab and choose Genome from the dropdown menu. (Here, a “genome” actually refers to a GenBank file.) Click Next.

In a new window, notice there are two tabs at the top: Upload GenBank File and Import From FTP. We will import a GenBank file from FTP, so select the second tab. Enter the following link into the FTP File field:

ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_O157_H7_EDL933_uid57831/NC_002655.gbk

This is the GenBank file for the E. coli O157:H7 EDL933 chromosome, so in the Genome Object ID field, enter “O157H7_chrm_gbk.” Now click Import.

Note: If you have trouble using the FTP importer, you can also click on the FTP link above, download the GenBank file to your computer, and then upload it using the Upload GenBank File tab.
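
If you prefer to fetch the GenBank file from a script rather than a browser, the following is a minimal sketch using only the Python standard library. The helper names are our own, and the URL is the one given in this tutorial, which may no longer resolve if NCBI has reorganized its FTP site. The download function is defined but not called here, so running the block makes no network request.

```python
import posixpath
import shutil
import urllib.request
from urllib.parse import urlsplit

# The FTP link given in this tutorial.
GENBANK_URL = (
    "ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/"
    "Escherichia_coli_O157_H7_EDL933_uid57831/NC_002655.gbk"
)


def local_filename(url):
    """Derive a local file name from the last component of the URL path."""
    return posixpath.basename(urlsplit(url).path)


def download(url, destination=None):
    """Stream the remote file to disk and return the local path."""
    destination = destination or local_filename(url)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination


print(local_filename(GENBANK_URL))  # NC_002655.gbk
# To fetch the file, call: download(GENBANK_URL)
```

The resulting file can then be uploaded with the Upload GenBank File tab.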

Once complete, the import job will generate two data objects in your Narrative: a Contig Set object (the contigs) and a Genome object containing the original NCBI feature data. Close the Data Browser and note that the objects now appear in your Data Panel at left.

In this example, we will annotate the contigs from scratch. If you wish to retain the original gene calls and IDs in a future analysis, use the Annotate Microbial Genome app instead of this one.

Now that you have your data, find Annotate Microbial Contigs in the list of apps and click on its name to add it to your Narrative.

In the Contig Set field, select O157H7_chrm_gbk.contigset, which you just imported. For Scientific Name, enter “Escherichia coli O157:H7 EDL933.” The defaults for domain and genetic code are fine for this analysis; we don’t need to change those. Finally, choose a name for the output genome object. Here, we will use “EDL933_with_PhiSpy.”

Before running the app, click the show advanced options link at the bottom of the app cell to expose the advanced options. Check the box for Find prophage elements with phispy. PhiSpy is a program that runs several different heuristic searches to find prophage elements within a genome.

Now you are ready to launch the analysis. Click Run at the bottom of the app cell.

Because we have chosen to search for prophage elements with PhiSpy, this annotation job will take longer than the previous example we demonstrated. Depending on the queue size, the job should complete in 10-15 minutes.

Once the output appears in your Narrative, you can click on the Genes tab to see each feature that was annotated in the genome. The Type column in the table displays each feature by type: RNA, CDS, prophage, etc.

Search for the prophage elements that PhiSpy found by typing “prophage” into the Search gene box. Then double-click on the Type column heading to display the features in reverse alphabetical order. You should be able to see 16 prophage elements if you also look on the second page of the output display.
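
The search-then-sort step can be mimicked on any table of features. The sketch below assumes the search box matches the term against every column, which would explain why sorting by Type is a useful follow-up: genes whose function merely mentions “prophage” would match too, and a reverse sort brings the rows whose Type is prophage to the top. Both the matching behavior and the rows are assumptions made for illustration.

```python
def search_rows(rows, term):
    """Keep rows in which any field contains `term` (case-insensitive)."""
    needle = term.lower()
    return [
        row
        for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]


# Hypothetical feature rows.
rows = [
    {"id": "feature_1", "type": "CDS", "function": "Phage tail fiber protein"},
    {"id": "feature_2", "type": "prophage", "function": "predicted prophage region"},
    {"id": "feature_3", "type": "RNA", "function": "tRNA-Ala"},
    {"id": "feature_4", "type": "CDS", "function": "Prophage integrase"},
]

hits = search_rows(rows, "prophage")
# Reverse alphabetical order by type puts "prophage" ahead of "CDS".
hits.sort(key=lambda row: row["type"], reverse=True)

print([(row["id"], row["type"]) for row in hits])
# [('feature_2', 'prophage'), ('feature_4', 'CDS')]
```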

In the future, you will be able to download the prophage elements in FASTA format.

Further analysis

The genome object generated in this tutorial is the starting point for many apps in the KBase Narrative Interface. For instance, if you are interested in using your newly annotated genome for metabolic modeling, we suggest reading the tutorial for the Reconstruct a Genome-scale Metabolic Model app. If you would like to use your genome to begin an evolutionary study, check out tutorials on Insert Genome into Species Tree or Compare Genomes from Pangenome.

Other app tutorials

Find more app tutorials here!

Contact us

If you have questions, you can contact us.