Compare Two Proteomes App

Not yet updated for Release 3.0

The instructions in this document are for Release 2.0. The December 2016 release looks a bit different, though the overall operation is similar. This document will be updated soon.

Description of tutorial

This tutorial will take you through the steps for creating a Proteome Comparison data object in KBase’s Narrative Interface using the Compare Two Proteomes app and browsing the results.

In this tutorial, we will:

  • Select two microbial genomes to compare using the Narrative Interface Data Browser.
  • Add the genomes to our Narrative.
  • Find and insert the Compare Two Proteomes app into our Narrative.
  • Use this app to generate a proteome comparison file that provides information about the relationship between the proteins encoded by the two selected genomes and their synteny.
  • Examine the resulting synteny plot.
  • Describe how the results can be used in other analyses.

Description of the app

This app performs an all-vs-all proteome comparison for a pair of microbial species. The algorithm determines sets of similar proteins based on the Best Unambiguous Sets (BUS) algorithm. As a pre-processing step, BUS eliminates all edges whose weight is less than 80% of that of the maximum-weight edge, both in amino acid identity and in sequence length. Based on the resulting unambiguous matches, the algorithm builds blocks of conserved gene order (synteny) when neighboring genes in one species have one-to-one matches to neighboring genes in the other species; these blocks of conserved synteny are used to resolve additional ambiguities by preferentially keeping matches within synteny blocks. Finally, the algorithm searches for subsets of genes that are locally optimal, such that all best matches of genes within the group are contained within the group and no genes outside the group have matches within the group. The output of this app is visualized as a dot plot matrix showing pairs of similar proteins determined from the BUS algorithm.
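To make the pre-processing step more concrete, here is a minimal Python sketch of an 80% edge filter followed by the selection of unambiguous matches. It is an illustration only, not the app's actual implementation. It assumes that each hit carries a single numeric weight and that the maximum is taken per protein on both sides of the comparison; the function names, protein IDs, and weights are all hypothetical.

```python
from collections import defaultdict


def filter_edges(hits, threshold=0.8):
    """Drop hits weaker than `threshold` times the best hit of either protein.

    `hits` is a list of (protein_in_genome1, protein_in_genome2, weight).
    Assumption: one numeric weight per hit (the real app may combine
    identity and length differently).
    """
    best = defaultdict(float)
    for a, b, weight in hits:
        best[("g1", a)] = max(best[("g1", a)], weight)
        best[("g2", b)] = max(best[("g2", b)], weight)
    return [
        (a, b, weight)
        for a, b, weight in hits
        if weight >= threshold * best[("g1", a)]
        and weight >= threshold * best[("g2", b)]
    ]


def unambiguous_pairs(edges):
    """Return pairs where each protein has exactly one remaining partner."""
    partners_of_a = defaultdict(list)
    partners_of_b = defaultdict(list)
    for a, b, _ in edges:
        partners_of_a[a].append(b)
        partners_of_b[b].append(a)
    return sorted(
        (a, b)
        for a, b, _ in edges
        if len(partners_of_a[a]) == 1 and len(partners_of_b[b]) == 1
    )


# Hypothetical hits between proteins a1..a3 (genome 1) and b1..b4 (genome 2).
hits = [
    ("a1", "b1", 100.0),
    ("a1", "b2", 50.0),   # far below a1's best hit, so it is removed
    ("a2", "b2", 90.0),
    ("a3", "b3", 80.0),
    ("a3", "b4", 75.0),   # close to a3's best hit, so a3 stays ambiguous
]
kept = filter_edges(hits)
pairs = unambiguous_pairs(kept)
print(pairs)
```

In this toy example, a3 keeps two candidate partners after filtering. That is the kind of ambiguity the app's later steps (synteny blocks and locally optimal subsets) are described as resolving.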

Description of the input

The input of this app is two KBase Genome objects for related microbial species. In KBase, a “Genome” or a “Genome typed object” is a special file type that contains feature calls and annotation data for a genome. There are at least four ways to add genomes to your Narrative:

  1. Use KBase’s uploader to import a GenBank file. Note that the uploader will parse the file and create two output objects in your Narrative: a Contig Set and a Genome containing the original feature calls and annotations.
  2. Import a GenBank file from NCBI via FTP.
  3. Add genomes directly into your Narrative from KBase’s reference data collection.
  4. Work from a Genome object that was created by the Annotate Microbial Contigs app.

This tutorial will demonstrate the third option, adding KBase reference data. For instructions on importing genomes from GenBank, see the Genome section of the Data Upload and Download Guide.

Description of the output

The output for this app is a Proteome Comparison object, which is a special file type containing information about the relationship between the proteins and their synteny.

Point and click instructions for using this app

Note: This tutorial assumes that you have already created a new Narrative. For instructions on how to accomplish this and other tasks such as finding or uploading data to your Narrative, please refer to the Narrative Interface User Guide.

Step 1. Add data that you want to analyze

The first step in running this app is to copy or upload the needed input data. In the sidebar running along the left of the Narrative Interface, find the Data Panel near the top. (Note that both the Data Panel and the Apps Panel are accessible under the Analyze tab.)

The Data Panel is where you will eventually see all the data objects that you add to your Narrative. To select a genome from KBase’s reference data collection, click on the Add Data button in the panel. This will open the Data Browser, a slideout window with various importing options.

Choose the Public tab to display a list of publicly available data objects. Genomes are displayed by default, but the data types dropdown menu allows you to search for other types of data as well.

In this example, we will search for and use the “Escherichia coli str. K-12 substr. MG1655” genome with one contig (but feel free to select another genome or E. coli substrain of your choice). Copy this genome into your Narrative by clicking on the Add button that appears when you mouse over it.


Next, search for “Escherichia coli O157:H7” and add this genome as well. Exit the Data Browser by clicking the Close button at the bottom right or the arrow at the top of the Data Panel. (Note that you can also close the Data Browser by clicking anywhere in the main Narrative panel in the center.)

Notice that your Data Panel now contains the two datasets that you added. You’ll also see that the Add Data button has become a red circle with a plus sign.


You can find out more about these genomes by mousing over the records in the Data Panel and clicking the “…” that appears. An expanded view of the data will open with icons allowing you to see a data summary, download the object, explore its provenance, and more. (Please see the Explore Data section of the Narrative Interface User Guide for more information.)

Try this later…
Once you complete this tutorial and are ready to analyze your own genomes, you may want to use genomes that you have annotated with the Annotate Microbial Contigs app.

Step 2. Add and run the app

Now that you have data in your Narrative, you can select an app to analyze it. Locate the Apps Panel directly below your data. Find this app by either scrolling through the list or searching for it at the top of the apps list.

Click on the app name or its icon to add it as a new cell to the main Narrative panel.

Next, fill in the parameters of the app cell. In the first field for Genome 1 ID, select the K-12 genome from the dropdown menu. For Genome 2 ID, select the O157:H7 genome. Now provide a name for your output data (the proteome comparison). In this example, we will use “Ecoli_Comparison” as the Output Proteome Comparison ID.

Notice that as you fill in the required parameter fields, the red arrows next to those fields change to green checkmarks. Once all required fields have a green checkmark, the app is ready to run.


Click the Run button at the bottom of the cell to launch the analysis job. A blue box will appear around the cell indicating that the analysis is running. To check the status of your job, click on the Jobs tab near the top left of your Narrative. This analysis should only take a couple of minutes to complete.


Be sure to save your Narrative frequently, using the Save button at the top right of the screen.

Step 3. Look at the output

The graphical output of this app is a synteny plot that will appear below the app cell. The first genome corresponds to the X-axis, and the second genome to the Y-axis. Use the buttons near the top left of the output cell to zoom and navigate the plot.
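As a rough illustration of how such a plot is laid out, the following Python sketch maps matched gene pairs to (x, y) positions, with the first genome on the X-axis and the second on the Y-axis, and draws them as text. It assumes that genes are positioned by their order along each genome. The gene IDs, the matches, and the function names are hypothetical and are not taken from the app.

```python
def dot_plot_points(genome1_order, genome2_order, pairs):
    """Map each matched pair to (index in genome 1, index in genome 2)."""
    x_index = {gene: i for i, gene in enumerate(genome1_order)}
    y_index = {gene: i for i, gene in enumerate(genome2_order)}
    return sorted(
        (x_index[a], y_index[b])
        for a, b in pairs
        if a in x_index and b in y_index
    )


def render(points, width, height):
    """Draw the points as text, with the origin at the bottom left."""
    grid = [["." for _ in range(width)] for _ in range(height)]
    for x, y in points:
        grid[height - 1 - y][x] = "*"
    return "\n".join("".join(row) for row in grid)


# Hypothetical gene orders and matches.
genome1_order = ["a1", "a2", "a3", "a4"]
genome2_order = ["b1", "b2", "b3", "b4"]
pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b4"), ("a4", "b3")]

points = dot_plot_points(genome1_order, genome2_order, pairs)
plot = render(points, len(genome1_order), len(genome2_order))
print(plot)
```

Genes that keep the same order in both genomes fall along a rising diagonal, while a local inversion (here a3 and a4) shows up as a short falling diagonal.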


To display the IDs of the genes that a point represents, mouse over that point.


If you click on a plot point (which represents a gene), a two-column table will appear on the right side of the plot. The left column represents the first genome, and the right column represents the second genome. The genes in red correspond to the region of the plot where you clicked. A line linking the left and right columns displays synteny.
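The conserved gene order that these linking lines display can be thought of as runs of consecutive matches. The Python sketch below groups matched (x, y) gene-index pairs into such runs. It is a simplified illustration under stated assumptions: neighbors must be immediately adjacent in both genomes, with no gaps tolerated, and a run may go in either direction. It is not the BUS implementation, and the function name and example points are hypothetical.

```python
def synteny_blocks(points):
    """Group (x, y) index pairs into runs of adjacent, collinear matches.

    A run continues while x increases by 1 and y changes by +1 (same
    orientation) or -1 (inverted), keeping one direction per run.
    """
    blocks = []
    current = []
    direction = 0  # 0 = not yet known, +1 = forward, -1 = inverted
    for point in sorted(points):
        if current:
            dx = point[0] - current[-1][0]
            dy = point[1] - current[-1][1]
            if dx == 1 and dy in (1, -1) and direction in (0, dy):
                current.append(point)
                direction = dy
                continue
            blocks.append(current)  # the run is broken, so close it
        current = [point]
        direction = 0
    if current:
        blocks.append(current)
    return blocks


# Hypothetical matches: a forward run, an inverted run, and a lone match.
points = [(0, 0), (1, 1), (2, 2), (3, 6), (4, 5), (5, 4), (7, 7)]
blocks = synteny_blocks(points)
for block in blocks:
    print(block)
```

With these points the sketch yields three groups: a forward block of three genes, an inverted block of three genes, and a single isolated match.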


The arrows at the top and bottom of the column allow you to walk the plot. If you zoom in, you will see a red box that represents the window size.

Clicking on a gene ID in the column will open a Data Landing page for that gene in another browser tab. Data Landing pages (which are still in development) provide both known and contextual information about a data object, allowing users to examine various particulars about the data and, eventually, compare it to other data objects.


Further analysis / next steps

The Proteome Comparison object that this app creates can be used as input for the Compare Two Metabolic Models app and the Propagate Genome-scale Model to Close Genome app. Each of these apps has its own tutorial.