This tutorial describes how to create a phylogenetic tree of closely related genomes in the KBase Narrative Interface using the Insert Genomes into Species Tree app and then navigate and curate the resulting species tree and genome set.
In this tutorial, you will:
The Insert Genomes into Species Tree app enables a user to determine evolutionary relationships between organisms based on the differences in their genomic sequences by creating both a species tree and a genome set of closely related organisms. A set of reference alignments based on 49 highly conserved Clusters of Orthologous Groups (COG) families is used to find the corresponding sequences in a specific genome. The sequences from the selected genome are then inserted into the reference alignments, the closest neighbors are extracted and concatenated, and a tree is rendered from them using FastTree2 (an approximate maximum likelihood method). Note that if the genome you insert is already part of the reference set of alignments used to build the tree, the tree-building algorithm will duplicate that genome in both the tree and the genome set generated by this app.
The 49 COG domains used by this app are:
For more information, please see the details page for this app.
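The insertion procedure described above can be illustrated with a small Python sketch. This is not the app's implementation: the family names, genome labels, and sequences are invented toy data, and the gap-aware mismatch fraction used to rank neighbors is an assumption, since this tutorial does not state how the app measures closeness. The sketch only covers concatenating per-family aligned sequences and picking the nearest reference genomes; the actual app then passes the resulting alignment to FastTree2.

```python
from typing import Dict, List


def concatenate_alignments(per_family: Dict[str, Dict[str, str]],
                           genomes: List[str]) -> Dict[str, str]:
    """Join each genome's aligned sequence across all families.

    A genome missing from a family is padded with gap characters so that
    every concatenated row keeps the same length.
    """
    concatenated = {genome: "" for genome in genomes}
    for family in sorted(per_family):
        rows = per_family[family]
        width = len(next(iter(rows.values())))  # all rows in a family share one width
        for genome in genomes:
            concatenated[genome] += rows.get(genome, "-" * width)
    return concatenated


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatching columns, skipping columns where either row has a gap."""
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else 1.0


def closest_neighbors(query: str, concatenated: Dict[str, str], k: int) -> List[str]:
    """Return the k genomes closest to the query (ties broken by name)."""
    others = [genome for genome in concatenated if genome != query]
    others.sort(key=lambda g: (p_distance(concatenated[query], concatenated[g]), g))
    return others[:k]


# Toy data: two hypothetical families (the real app uses 49 COG families).
toy_alignments = {
    "family_A": {"query": "MKVLA", "refA": "MKVLA", "refB": "MRVLG", "refC": "MRILG"},
    "family_B": {"query": "TTGA-", "refA": "TTGAC", "refB": "TSGAC", "refC": "SSGTC"},
}
genome_names = ["query", "refA", "refB", "refC"]

concatenated = concatenate_alignments(toy_alignments, genome_names)
neighbors = closest_neighbors("query", concatenated, k=2)
print(neighbors)  # ['refA', 'refB']
```

In this toy example, the `k` argument plays the role of the neighbor count that you choose when configuring the app.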
This app takes one or more “Genomes” as input. In KBase, a “Genome” or “Genome typed object” is a special object type that contains the feature calls and annotation data for a genome. You can load genome data into KBase for analysis in a number of ways:
This tutorial will take you through the steps for running the Insert Genomes into Species Tree app using example data from KBase’s reference data collection.
Once you’re ready to upload your own data, see the section of the Genome Data Upload and Download Guide for instructions on uploading a genome from GenBank.
The output of this app is a species tree of related organisms, along with a genome set containing those organisms.
Note: This tutorial assumes that you have already created a new Narrative. For instructions on how to accomplish this and other tasks such as finding or uploading data to your Narrative, please refer to the Narrative Interface User Guide.
Step 1. Add data that you want to analyze
Before we run the app, we need to copy or upload the input data. For these point-and-click instructions, we will start by copying an annotated genome into our Narrative from the KBase reference data collection.
First, click the Add Data (or “+”) button in the Data Panel on the left of your screen. (If you don’t see this button, make sure you have the Analyze tab selected.) The Data Browser will slide out, with tabs that show several data sources. Choose the Public tab to see a list of publicly available KBase reference data. Genomes are displayed by default, but the data types dropdown menu allows you to search for other types of data as well.
With Genomes selected, search for “Escherichia coli str. K-12 substr. MG1655”. Add the genome to your Narrative by mousing over it and then clicking the Add button that appears to its left. (Here, we will use the MG1655 substrain with 4,520 genes, but feel free to use another substrain or genome if you choose.)
Exit the Data Browser by clicking either the Close button at the bottom right of the browser window or the arrow at the top of the Data Panel. (Note that you can also close the Data Browser by clicking anywhere in the main Narrative panel in the center.)
Notice that your Data Panel now displays the annotated genome you added:
You can find out more about this genome by clicking the “…” that appears when mousing over the object in the Data Panel or by dragging it into the main Narrative panel to create a Genome Viewer cell. Please see the Explore Data section of the Narrative Interface User Guide for more information.
Step 2. Add and run the app
Now that you have your input data, you can add the Insert Genomes into Species Tree app to your Narrative. Take a closer look at the Apps Panel directly below your data.
You can search for apps using the search box at the top of the Apps Panel or just scroll until you find the one you want. Locate the Insert Genomes into Species Tree app and click on its name or icon to add it as a new cell in the main Narrative panel.
To run the app on the sample genome you copied, you must first fill out the fields in the app cell. In the first field (Genome), select the newly added E. coli genome. Next, specify the number of neighboring genomes to include in the tree; for simplicity, we will use 10 in this example. Finally, provide a name for the tree that will be generated. Here, we will use “E_coli_tree”.
Finally, click the green Run button at the top right of the app cell to launch the analysis job.
This app typically takes about 3 to 20 minutes to run, depending on how many other jobs are queued or running.
Step 3. View the results
When the job finishes, you will see an output cell below the app. Also notice that your Data Panel now contains the Tree object.
Examine the output cell containing the species tree of the E. coli K-12 genome and its 10 closest relatives. If you click on an internal parent node, all of its children will collapse and the node will turn green. (Note that this may reconfigure the tree topology.) To display the children again, click the green node. Clicking on a terminal node will bring up (in a new browser tab) a page of information about the corresponding genome. (Note that Data Landing pages, such as this one, are still in development.)
If you click on the green Change layout button in the top right corner of the cell (you may need to scroll to see it), the tree layout will switch to a circular format.
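The collapse-and-expand behavior described above can be modeled with a few lines of Python. This is only a sketch of the idea, not the code behind the KBase tree viewer: the `Node` class, the `toggle` method, and the genome labels are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node; a node without children is a terminal node (leaf)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    collapsed: bool = False

    def toggle(self) -> None:
        """Mimic a click: internal nodes collapse or expand, leaves are unaffected."""
        if self.children:
            self.collapsed = not self.collapsed


def visible_labels(node: Node) -> List[str]:
    """Labels currently shown: a collapsed node hides everything beneath it."""
    if not node.children or node.collapsed:
        return [node.name]
    labels: List[str] = []
    for child in node.children:
        labels.extend(visible_labels(child))
    return labels


# Hypothetical three-genome tree: ((genome_1, genome_2)clade_A, genome_3)root
clade_a = Node("clade_A", [Node("genome_1"), Node("genome_2")])
root = Node("root", [clade_a, Node("genome_3")])

expanded_view = visible_labels(root)   # all three genomes are shown
clade_a.toggle()                       # first click collapses the clade
collapsed_view = visible_labels(root)  # the clade is shown as a single node
clade_a.toggle()                       # second click restores its children
restored_view = visible_labels(root)
```

Storing the collapsed state on the node itself means the underlying tree is never modified, which is why a second click can always restore the hidden children.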
Step 4. Download the results
Download options for the data generated by this app are still in development. We hope to make this capability available soon.
The Insert Genomes into Species Tree app allows users to rapidly create trees for genomes in KBase’s reference data collection without having to select the close relatives or generate the alignment and tree themselves. This capability enables users to easily and rapidly assess speciation events and to quickly select genomes for further analyses.
In this use case, we will generate a tree for the Mycoplasma, a group of obligate intracellular bacterial pathogens that lack a cell wall. According to the NCBI Taxonomy Browser, they belong to a phylum-level clade called the Tenericutes, so named because its members lack a cell wall (the term “Tenericute” was coined in the early 1980s to mean soft cuticle). We will assess the evolutionary history of the Mycoplasma using the Insert Genomes into Species Tree app.
First, create a new Narrative (or continue working with the one you already created). Using the Data Browser, add a Mycoplasma capricolum genome from KBase’s public reference data.
Next, add the Insert Genomes into Species Tree app to your Narrative, and select the Mycoplasma capricolum genome from the dropdown menu for the Genome field. We will use 100 neighbors and call the resulting tree “Mycoplasma_tree”. Name the genome set “Mycoplasma_set”.
Click Run to start the analysis, which may take up to 20 minutes.
When the job completes, you will see a very large tree of Mycoplasma strains and their associated relatives.
If you collapse some shorter branches, you will notice that the Mycoplasma and other wall-less organisms (the ones with “plasma” in their names) share a branch with Lactobacillus, a low G+C Gram-positive bacterium that has a cell wall. The tree thus suggests that the ancestor of the Mycoplasma was an organism with a cell wall, and that the Mycoplasma became host-associated and ultimately lost the cell wall during their evolutionary history.
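The reasoning above amounts to finding the sister group of the wall-less organisms: the leaves that branch off next to the smallest clade containing every “plasma” genome. A minimal Python sketch of that lookup follows. The nested-tuple tree, its topology, and the labels are invented for illustration and are not output from the app; a real analysis would read the tree produced by your own run.

```python
from typing import List, Optional, Set, Tuple, Union

# A tree is either a leaf label or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """All leaf labels beneath a (sub)tree, in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    found: List[str] = []
    for child in tree:
        found.extend(leaves(child))
    return found


def find_sisters(tree: Tree, targets: Set[str]) -> Optional[List[str]]:
    """Leaves of the sibling subtrees of the smallest clade holding all targets.

    Returns None when no proper subtree contains every target, that is,
    when the targets only come together at the node passed in.
    """
    if isinstance(tree, str):
        return None
    for index, child in enumerate(tree):
        if targets <= set(leaves(child)):
            deeper = find_sisters(child, targets)
            if deeper is not None:
                return deeper
            # `child` is the smallest clade with all targets; report its siblings.
            return [leaf
                    for other, sibling in enumerate(tree) if other != index
                    for leaf in leaves(sibling)]
    return None


# Toy topology with made-up labels (not real app output).
toy_tree: Tree = (
    (("Mycoplasma_sp_1", "Ureaplasma_sp_1"), "Lactobacillus_sp_1"),
    ("Escherichia_sp_1", "Salmonella_sp_1"),
)

wall_less = {label for label in leaves(toy_tree) if "plasma" in label}
sisters = find_sisters(toy_tree, wall_less)
print(sisters)  # ['Lactobacillus_sp_1']
```

If the sister group consists of organisms with cell walls, as in this toy example, the simplest explanation is that the shared ancestor had a cell wall and the wall-less lineage lost it later.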
The Genome Set object generated by this app can be used as input for the Compare Genomes from Pangenome app. We encourage you to take a moment to familiarize yourself with the tutorial on building a pangenome. Then, as an exercise, see whether you can curate the genome set and build a pangenome in the Narrative Interface to identify genes involved in cell wall biosynthesis.