This tutorial will guide you through the steps needed to run the Annotate Microbial Genome app in the KBase Narrative Interface to reannotate a bacterial or archaeal genome.
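In this tutorial, you will add a reference genome to your Narrative, run the Annotate Microbial Genome app on it, examine the reannotated genome, save your Narrative, and download the results.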
This app uses components from RASTtk (Rapid Annotations using Subsystems Technology toolkit) to annotate a prokaryotic genome, update genome annotations, or perform computations on a set of genomes so that their annotations are consistent. In KBase, all Genome-typed objects include feature calls and annotation data. This app lets users reannotate already-annotated genomes so that their annotations are consistent with those of other KBase genomes. It also prepares imported genomes for further processing by other KBase apps.
A Genome typed object can be generated by uploading a GenBank file, importing a GenBank file from NCBI via FTP, retrieving a Genome typed object from KBase, or using the output of the Annotate Microbial Contigs app.
Unlike the Annotate Microbial Contigs app, this app does not require metadata fields because the metadata are assumed to be included in the uploaded GenBank file or original Genome typed object in KBase. The output of this app is a Genome typed object with the new annotations.
For more information, please see the app details page.
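This app takes a “Genome” as input. In KBase, a “Genome” or “Genome typed object” is an object type that contains a genome’s feature calls and annotation data. You can load a genome into your Narrative in several ways:

- by uploading a GenBank file
- by importing a GenBank file from NCBI via FTP
- by retrieving a Genome typed object from KBase
- by using the output of the Annotate Microbial Contigs app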
This tutorial will take you through the steps for running the Annotate Microbial Genome app using example data from KBase’s reference data collection.
Once you are ready to upload your own data, see the Genome section of the Data Upload and Download Guide.
The output of this app is a new Genome object with updated annotations. Your results might differ slightly from those shown in this tutorial because the annotation pipeline and its underlying databases continue to be improved.
The current default parameters for this app reflect our main use case, which is for users who wish to map KBase annotations onto a genome for which the genes and other genomic features have already been called. Therefore, the scripts that are turned on by default are those that have no impact on the calling of features.
Once you have added the app cell to your Narrative, you can use the Advanced options to turn on any of the scripts that have been turned off by default. The app will find the contigs either in your Narrative or in KBase and use them to call the selected genomic features de novo. If you are re-calling features from a GenBank file that you have uploaded, the app will use the contig set that was created upon upload (so don’t delete that object). Since the default behavior of the app is to inherit the old feature calls, if you decide to re-call the CDS or RNA features, we recommend that you also turn on the Resolve overlapping features option. This will prevent the generation of duplicate feature calls for CDS and RNA features in your new genome object.
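The following Python toy example shows why re-calling CDS features on top of inherited calls can create duplicates, and how an overlap check removes them. It is a sketch under stated assumptions, not RASTtk’s actual algorithm. The `Feature` record, the “inherited”/“recalled” labels, and the rule “keep the inherited call and drop any re-called feature on the same strand that overlaps it” are all hypothetical.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """Simplified feature call: contig, 1-based inclusive coordinates, strand."""
    contig: str
    start: int
    end: int
    strand: str
    source: str  # hypothetical label: "inherited" or "recalled"


def overlaps(a: Feature, b: Feature) -> bool:
    """Return True if both features sit on the same contig and strand and their intervals intersect."""
    return (
        a.contig == b.contig
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def merge_calls(inherited: List[Feature], recalled: List[Feature], resolve: bool) -> List[Feature]:
    """Combine inherited and re-called features, optionally dropping overlapping re-calls."""
    if not resolve:
        # Without resolution, a gene found by both the old and new calls appears twice.
        return inherited + recalled
    # Keep only re-called features that do not overlap any inherited feature.
    kept = [new for new in recalled if not any(overlaps(new, old) for old in inherited)]
    return inherited + kept


if __name__ == "__main__":
    old = [Feature("contig_1", 100, 400, "+", "inherited")]
    new = [
        Feature("contig_1", 100, 400, "+", "recalled"),   # same gene, called again
        Feature("contig_1", 900, 1200, "-", "recalled"),  # genuinely new call
    ]
    print(len(merge_calls(old, new, resolve=False)))  # 3
    print(len(merge_calls(old, new, resolve=True)))   # 2
```

The real option may use different rules, for example for partial overlaps or for features on opposite strands. The app’s details page describes its actual behavior.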
For a more detailed description of the behavior of each script in the Advanced options, please refer to the details page for this app. (Note: This page can also be accessed in the Narrative Interface by hovering over the app name in the Apps Panel, clicking the “…” to show expanded information, and then clicking “more.”)
Note: This tutorial assumes that you have already created a new Narrative. For instructions on how to accomplish this and other tasks such as finding or uploading data to your Narrative, please refer to the Narrative Interface User Guide.
Step 1. Add data that you want to analyze
The first step in running this app is to copy or upload the needed input data to your Narrative. We will demonstrate the Annotate Microbial Genome app by copying a genome from KBase’s public reference data.
First, click the Add Data (or “+”) button, which can be found under the Analyze tab in the left panel of your screen. The Data Browser panel will slide out with tabs that show several data sources. Choose the Public tab to display a list of publicly available data objects. Genomes are displayed by default, but the data types dropdown menu allows you to search for other types of data as well.
In this example, we will search for and use the “Escherichia coli str. K-12 substr. MG1655” genome (but feel free to select another genome or E. coli substrain of your choice). You can enter the entire name as a search string, or simply search for “K-12” to locate the genome in the list quickly.
Hover your cursor over the data object called “Escherichia coli str. K-12 substr. MG1655” and click the Add button that appears to its left to add it to your Narrative.
Exit the Data Browser by clicking the Close button at the bottom right or the arrow at the top of the Data Panel. (Note that you also can close the Data Browser by clicking anywhere in the main Narrative panel in the center.)
Notice that your Data Panel now shows the genome that you have added.
You can find out more about this data object by mousing over the record in the Data Panel and clicking the “…” that appears. An expanded view of the data will open.
The icons in this view let you see a data summary, download the object, see its provenance, and more. (Please see the Explore Data section of the Narrative Interface User Guide for more information.)
For now, we’ll examine this genome by dragging it from the Data Panel and dropping it into the main Narrative panel to open a genome viewer.
Within the Contigs and Genes sections of the genome viewer, you can sort the table entries by clicking on a column header to sort by that field (e.g., Length). Clicking the same column header again will reverse the sort order. You can even sort by more than one column at a time by clicking one column header and then Shift-clicking other column headers.
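The Python sketch below shows the ordering that a multi-column sort produces. The first column chosen decides the order, and later columns only break ties. It is not how the genome viewer is implemented, and the row fields (`id`, `contig`, `length`) are hypothetical.

```python
from operator import itemgetter
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def multi_sort(rows: List[Row], keys: List[Tuple[str, bool]]) -> List[Row]:
    """Sort rows by several (column, descending) pairs; earlier pairs take priority.

    Python's sort is stable, so sorting by the lowest-priority column first and
    the highest-priority column last yields the combined ordering.
    """
    result = list(rows)
    for column, descending in reversed(keys):
        result.sort(key=itemgetter(column), reverse=descending)
    return result


genes = [
    {"id": "gene_a", "contig": "contig_2", "length": 900},
    {"id": "gene_b", "contig": "contig_1", "length": 300},
    {"id": "gene_c", "contig": "contig_1", "length": 1200},
    {"id": "gene_d", "contig": "contig_2", "length": 450},
]

# Contig ascending, then length descending within each contig.
ordered = multi_sort(genes, [("contig", False), ("length", True)])
print([g["id"] for g in ordered])  # ['gene_c', 'gene_b', 'gene_a', 'gene_d']
```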
Step 2. Add and run the app
Now that you have your input data, you can add the Annotate Microbial Genome app to your Narrative. Take a closer look at the Apps Panel directly below your data.
You can search for apps using the search box at the top of the panel, or just scroll until you find the one you want. Locate the Annotate Microbial Genome app in the list. Click on its name or icon to add it to the main Narrative panel.
To reannotate the example E. coli genome in your Narrative, you must first fill out the fields in the app cell. For details on the parameters for this app, see its details page. Some of the key parameters are discussed below.
For the first field, Genome, select “Escherichia_coli_str._K-12_substr._MG1655” from the dropdown menu. Next, provide an output name for the newly reannotated genome. Here, we will use “Escherichia_coli_str._K-12_substr._MG1655_v2”. If you give the output the same name as the input, the app saves it as a new version of the existing Genome object rather than as a separate object.
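KBase stores Narrative data objects as named, versioned objects. The `ToyWorkspace` class below is a hypothetical in-memory model that illustrates the naming behavior described above. It is not the API of KBase’s Workspace service.

```python
from typing import Dict, List, Optional


class ToyWorkspace:
    """Hypothetical in-memory store that keeps every saved version of a named object."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[object]] = {}

    def save(self, name: str, data: object) -> int:
        """Save data under name and return its 1-based version number."""
        history = self._versions.setdefault(name, [])
        history.append(data)
        return len(history)

    def get(self, name: str, version: Optional[int] = None) -> object:
        """Return the latest version of name, or a specific 1-based version."""
        history = self._versions[name]
        return history[-1] if version is None else history[version - 1]


ws = ToyWorkspace()
ws.save("Ecoli_MG1655", "original annotations")     # version 1
ws.save("Ecoli_MG1655", "RASTtk annotations")       # same name: version 2 of the same object
ws.save("Ecoli_MG1655_v2", "RASTtk annotations")    # new name: a separate object, version 1
```

Under this model, reusing the input name leaves the original annotations available as an earlier version. A distinct output name, such as the “_v2” suffix used here, keeps both genomes side by side in the Data Panel.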
When all fields have been filled out, a green checkmark will appear beside each one indicating that the app is ready to run.
Clicking on Run will launch the job to reannotate the genome. A blue box will appear around the cell, and a message at the bottom will indicate the job was submitted.
Depending on the queue size (how many other calculations have been requested by users recently), this job should take approximately 1 to 3 minutes. You can check your job status by selecting the Jobs tab near the top left of the Narrative Interface.
If the job finishes successfully, a new Genome object will be added to your Data Panel and an output cell will appear under the app cell in your Narrative.
Step 3. Look at the output
Once your annotation request has completed, you can peruse the output or download the annotated genome to your computer in several common formats. We suggest that you begin by exploring the genome from within the Narrative Interface. If your Data Panel isn’t visible, click on the Analyze tab.
You can browse your annotated genome in three ways:
First, find the Annotate Microbial Genome output cell in the main Narrative panel. Its table contains three tabs for reviewing the genomic data: Overview (shown below), Contigs, and Genes. The Contigs tab lists the large assembled pieces of the genome (sometimes a single contig covers the whole genome), along with the number of genes on each contig. The Genes tab lets you browse all the annotated genes.
To see more details about an entry under the Contigs and Genes tabs, click on the entry to open an expanded view of it in another tab.
You can sort table entries under these tabs by clicking on a column header to sort by that field (e.g., Length). Clicking the same column header again will reverse the sort order. You can even sort more than one column simultaneously by clicking one column header and then Shift-clicking on others.
The second way to examine your newly annotated genome is to look at it in your Data Panel where it will appear once the analysis job finishes.
From the Data Panel, you can click on the name of the annotated genome object to open a viewer for the genome. This viewer, which contains the same content as your output cell, can also be opened using the drag-and-drop technique noted earlier in this tutorial.
Third, if you mouse over the genome object in your Data Panel and click on the “…” or the white space surrounding its name, information about the object and the app that created it will be displayed. Clicking on the “Explore Data” binocular icon in this expanded view will open a Data Landing page, allowing you to peruse the data or launch further analyses. (This functionality is still in development.)
Now that we have a genome with standard KBase annotations, we can use it as input for further analyses. Click on the new genome in the Data Panel to see the apps that can be used on it. Two may be of particular interest. The Insert Genomes into Species Tree app shows you closely related organisms (an unsurprising result for a well-studied strain such as E. coli K-12). The Reconstruct Genome-scale Metabolic Model app infers a metabolic model from the genome and uses gapfilling algorithms to find missing reactions needed for growth. See the “Further Analysis” section at the end of this tutorial for more information.
Step 4. Save your Narrative
Once you are satisfied with your analysis, save your results by clicking the save icon in the top right corner of the screen.
Step 5. Download the results
Remember that you can download an object from your Data Panel by opening the expanded view of the object and clicking on the Export/Download data icon. For a genome, this will reveal options for exporting the object in either GenBank or JSON format. (Be aware that the download functionality is still in development and may not yet work as expected.)
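If you export the genome as JSON, you can inspect it outside KBase. The Python sketch below counts features per contig, similar to the gene counts shown in the Contigs tab. It assumes the exported document has a top-level `features` list. It also assumes each feature has a `location` list of `[contig_id, start, strand, length]` entries, following the KBase Genome type’s location layout. Check this against your own export before relying on it. The function name and sample data are hypothetical.

```python
import json
from collections import Counter


def features_per_contig(genome_json: str) -> Counter:
    """Count features per contig in an exported Genome JSON document.

    Assumes each entry in "features" has a "location" list whose items are
    [contig_id, start, strand, length]; verify this against your own file.
    """
    genome = json.loads(genome_json)
    counts: Counter = Counter()
    for feature in genome.get("features", []):
        # A feature may have several location segments; count it once per contig.
        contigs = {segment[0] for segment in feature.get("location", [])}
        counts.update(contigs)
    return counts


# Tiny made-up example in the assumed layout.
sample = json.dumps({
    "features": [
        {"id": "gene_1", "location": [["contig_1", 190, "+", 66]]},
        {"id": "gene_2", "location": [["contig_1", 337, "+", 2463]]},
        {"id": "gene_3", "location": [["contig_2", 10, "-", 300]]},
    ]
})
print(features_per_contig(sample).most_common())  # [('contig_1', 2), ('contig_2', 1)]
```

For a downloaded file, pass the file’s contents (for example, `open(path).read()`) instead of `sample`.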
Automated metabolic model reconstruction in KBase requires a standard nomenclature to map proteins to their associated metabolic reactions. The infrastructure relies on the annotations themselves rather than on the sequences. This helps distinguish paralogs that protein similarity-based searches might misidentify. This approach usually causes confusion and considerable inconvenience for modelers, but we have found that it produces more accurate initial models. The Annotate Microbial Genome app was therefore designed to map modeling-compatible annotations onto a genome that you have already curated.
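The Python toy example below shows why mapping through annotations needs a standard nomenclature. Reactions are looked up by the exact annotation string, so two similar-sequence paralogs with different annotations map to different reactions. An annotation worded differently from the standard maps to nothing. The role-to-reaction table, reaction labels, and gene IDs are hypothetical. Real KBase modeling templates are far larger and use their own identifiers.

```python
from typing import Dict, List

# Hypothetical role-to-reaction table (illustrative labels, not KBase reaction IDs).
ROLE_TO_REACTIONS: Dict[str, List[str]] = {
    "Citrate synthase (si)": ["rxn_citrate_synthase"],
    "2-methylcitrate synthase": ["rxn_2_methylcitrate_synthase"],
}


def reactions_for(annotations: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each gene to reactions by looking up its exact annotation string."""
    return {gene: ROLE_TO_REACTIONS.get(role, []) for gene, role in annotations.items()}


annotations = {
    "gene_1": "Citrate synthase (si)",     # hypothetical paralog pair: similar sequences,
    "gene_2": "2-methylcitrate synthase",  # but distinct annotations and reactions
    "gene_3": "citrate synthase",          # non-standard wording, so no reaction is found
}
print(reactions_for(annotations))
```

A sequence-similarity approach might assign gene_2 the citrate synthase reaction. The exact-match lookup keeps the two functions apart, but only if every genome uses the same annotation vocabulary. Reannotating with this app provides that shared vocabulary.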
By using the point-and-click instructions above, you have successfully created an input object that can be used in the reconstruction of a metabolic model. The steps for creating your own modeling-compatible genome are identical, except that you would most likely be uploading a GenBank file that you have created or importing one from NCBI. (Please note: the uploader is still in early development and may not yet work correctly.)
You can download the E. coli K-12 GenBank file from NCBI to your computer from the following link: ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.gbk
You can also use the link to add the genome via the Data Browser’s Import tab. The steps below describe both ways.
To import the GenBank file, click on the Add Data or “+” button and then select the Import tab. Now choose Genome from the dropdown menu and click Next.
The next window will have two tabs at the top: Upload GenBank File and Import from FTP. If uploading a file from your computer, simply select the file and choose a name for the Genome object that will be created.
If importing from FTP, copy the FTP link (ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.gbk) into the Name of FTP file field and choose a name for the genome. Finally, provide a name for the Contig-Set and click the Import button.
Once your genome is imported, you can repeat the steps from the point-and-click instructions above to annotate it.
If you are interested in building a model with your annotated microbial genome, we recommend reading the tutorial on the Reconstruct Genome-scale Metabolic Model app for step-by-step instructions. You may also find that you want a set of uniform genome annotations for evolutionary analyses. If so, check out the app tutorials for Compare Genomes from Pangenome and Insert Genomes into Species Tree.