Classify Microbes with GTDB-Tk - v2.3.2
kb_gtdbtk

v.1.4.0

Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB)

Description

GTDB-Tk v2.3.2 is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes. It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes. If the input is a GenomeSet, the App will optionally overwrite the Taxonomy information in the Genome object(s).

Notes:

As of v2.0.0, the GTDB-Tk Classify workflow by default uses class-level subtrees rather than the entire Bacterial tree. This uses far less memory. An option to use the full tree is provided.

As of v2.0.0, intermediate files in the classification workflow are deleted. An option to keep those intermediate files is provided.

Individual Genome objects must first be placed into a GenomeSet to be used as input. This is to avoid accidentally running the App multiple times, once for each Genome, which is very inefficient. Similarly, individual Assembly objects must be placed into an AssemblySet object. The utility Apps Build GenomeSet or Build AssemblySet can be used for this purpose.


Tool and Data Sources:

GTDB-Tk v2.3.2 is installed from https://github.com/Ecogenomics/GTDBTk

This App uses the GTDB-Tk reference data for release 207 (gtdbtk_r207_v2_data.tar.gz) and release 214 (gtdbtk_r214_data.tar.gz), along with the accompanying metadata files ar53_metadata_r[207/214].tsv and bac120_metadata_r[207/214].tsv. These reference data correspond to the GTDB R07-RS207 and R08-RS214 releases, respectively.

The Genome Taxonomy Database (GTDB) is constructed from RefSeq and GenBank genomes, and releases are indexed to RefSeq releases. All genomes are quality controlled using CheckM, and those statistics can be found on the GTDB website.


Usage

GTDB-Tk Classify will place your Binned Contigs, Assemblies, or Genomes (from a GenomeSet or a Species Tree) into the GTDB Taxonomy. Additionally, if your queries are Genome objects, the KBase implementation will allow the user to copy the GTDB species representative Genome objects to their Narrative, as well as any members in the Species Tree from the GTDB-Tk classify placement. Options for those copy operations, as well as whether to overwrite any existing taxonomic classification in the query genome objects, are available, as described in the Parameters section below.

In addition to the visual reports (see below) and the GTDB species representative Genome objects, this App has the option of generating KBase Species Tree objects, which can be used in subsequent analyses such as Pangenome Calculations with the mOTUpan App.


Report

GTDB Species Tree Tab

dendrogram / ultrametric tree

Queries are placed into the GTDB Species Tree. The App generates both a uniform dendrogram and a version of the tree with branch lengths based on evolutionary distance, and the User can choose which of the two images is shown in the report. Both versions are available for download in bitmap (PNG) and vector (PDF) formats. A key indicating the taxonomic membership of each leaf is provided and corresponds to the outer circle of the plot. The output includes one tree for any Archaea and typically one tree per Bacterial phylum. Queries are shown in yellow, and the proximal lineages found by GTDB-Tk Classify are shown in lavender. The remainder of the tree is trimmed to include one species representative per branch attached to the primary branches within which the queries and proximal species are found. These genomes are selected following the same weighting used by GTDB to select species representatives, with an added term, given the greatest weight, that favors the GTDB genus representatives.

phylogenetic tree

As an alternative to the uniform dendrogram version of the tree, a proper phylogenetic tree with branch lengths indicating evolutionary distance is available and can be configured to be the version shown in the report. Regardless of which version is shown, it can be downloaded as the tree without the label "ultrametric".

rectangular tree

Rectangular tree images are also generated and available for download.

bacterial trees

While the Archaea are all placed within a single tree, Bacterial queries are placed within separate trees, typically one phylum per tree. There is also often a "backbone" tree at a higher taxonomic level where queries that don't fall within class trees can be found.

Krona Plot Tab

The classifications are also available as a Krona visualization.

Bacteria and Archaea Classification Tabs

A table with the classifications is also available in the App report. Bacterial and Archaeal classifications are in separate tabs. Columns for the consensus classification (often only to a higher taxonomic level than species) and ANI assignment are given.

The classification tabs also include the tree-placement-based classification, the terms used to determine the placement, other notes, and the GTDB IDs of additional proximal species representative genomes. Please see the GTDB-Tk paper for the full details.
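As a rough illustration of why the consensus classification often stops above species level, the sketch below parses a GTDB-style lineage string and reports the most specific rank that was actually assigned. It assumes the standard GTDB rank prefixes (d__, p__, c__, o__, f__, g__, s__) separated by semicolons; the function names and the example lineage are illustrative only, and the exact column layout of the report table may differ.

```python
# GTDB rank prefixes, ordered from domain down to species.
RANKS = [("d", "domain"), ("p", "phylum"), ("c", "class"), ("o", "order"),
         ("f", "family"), ("g", "genus"), ("s", "species")]


def parse_lineage(lineage):
    """Split a GTDB-style lineage string into a {rank_name: taxon} dict.

    Ranks that are missing or empty (for example "s__") are stored as None.
    """
    names = dict(RANKS)
    result = {name: None for _, name in RANKS}
    for field in lineage.split(";"):
        prefix, sep, taxon = field.strip().partition("__")
        if sep and prefix in names and taxon:
            result[names[prefix]] = taxon
    return result


def deepest_rank(lineage):
    """Return (rank_name, taxon) for the most specific assigned rank, or None."""
    parsed = parse_lineage(lineage)
    for _, name in reversed(RANKS):
        if parsed[name] is not None:
            return name, parsed[name]
    return None


# Illustrative lineage (not taken from an actual App run): species is unassigned.
example = ("d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;"
           "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__")
print(deepest_rank(example))  # ('genus', 'Escherichia')
```

A query whose lineage ends in an empty "s__" field, as in the example, was placed only to genus level.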

Bacteria and Archaea Marker Gene Tabs

Lastly there are two tabs, one for Bacteria and one for Archaea, with the phylogenetic marker genes used in the taxonomic placement by GTDB-Tk. Often, not all markers are found in a given query, and the presence or absence of a marker is indicated in this table.
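To make the presence/absence information easier to scan for a single query, one could summarize it as a count and a list of missing markers. The sketch below assumes the table has already been read into a mapping from marker ID to a true/false flag; that input format, the function name, and the marker IDs are hypothetical and are not part of the App's output specification.

```python
def summarize_markers(presence):
    """Summarize marker presence for one query genome.

    `presence` maps a marker ID to True (found) or False (not found).
    """
    missing = sorted(marker for marker, found in presence.items() if not found)
    total = len(presence)
    found = total - len(missing)
    fraction = found / total if total else 0.0
    return {"found": found, "total": total,
            "fraction": fraction, "missing": missing}


# Hypothetical marker IDs, for illustration only.
query = {"marker_1": True, "marker_2": False, "marker_3": True, "marker_4": True}
summary = summarize_markers(query)
print(f"{summary['found']}/{summary['total']} markers found; "
      f"missing: {summary['missing']}")
```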


Generated objects

If the query objects are Genomes (or refer to Genome objects such as a Species Tree query), then the user has the choice of generating the following output objects from the GTDB-Tk Classify App.

Genome objects

The user has the option of copying the GTDB species representative Genome objects into the current workspace. This can be done for the proximal species representatives as well as the more distal lineages found in the trimmed Species Tree (see above). If this is done, any Genome Sets and Species Tree objects will refer to the local Genome copies. Therefore, subsequent analyses, such as Pangenome calculations, will use these local copies.

Genome Sets

Genome Sets are produced for the proximal species representatives identified by GTDB-Tk Classify. The User may wish to adjust which genomes are in the Genome Set and can do so with Apps such as Remove Genome from GenomeSet or Trim Species Tree to GenomeSet prior to further analysis.

Species Trees

Species Tree objects can also be produced by GTDB-Tk Classify. One Species Tree is generated that contains just the query genomes and the proximal species hits, and another that includes the greater set of distal genomes from the trimmed tree. There will be one for Archaea and one for each Bacterial phylum.


Downloadable Files

The entire working directory used in the GTDB-Tk Classify calculations and all output are available for download as a zip archive with the file name "GTDB-Tk_classify_wf.zip". Additionally, several key output files generated by the KBase App version of GTDB-Tk Classify can be downloaded, including the PNG and PDF versions of the trees described above, the Newick versions of the proximal and trimmed trees, and a file with the GTDB lineage for each query and each member of the species trees. This set of downloadable files is available for the Archaea with the prefix "gtdbtk.ar53.classify" and for Bacteria, one per phylum, with the prefix "gtdbtk.bac120.classify.tree.#" where "#" is just an index for the phyla.
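Once the individual files have been downloaded, the prefixes above can be used to sort them into the Archaeal set and the per-index Bacterial sets. The following is a minimal sketch of that grouping, assuming the files sit in a local folder; the function name, the group labels, and the file-name suffixes in the example are hypothetical, and only the two prefixes come from the description above.

```python
import re
from collections import defaultdict

ARCHAEA_PREFIX = "gtdbtk.ar53.classify"
BACTERIA_PREFIX = "gtdbtk.bac120.classify.tree."


def group_by_prefix(filenames):
    """Group file names into 'archaea' and 'bacteria tree <index>' buckets.

    Names that match neither prefix are ignored.
    """
    groups = defaultdict(list)
    for path in filenames:
        name = path.rsplit("/", 1)[-1]  # drop any leading directory
        if name.startswith(ARCHAEA_PREFIX):
            groups["archaea"].append(name)
        elif name.startswith(BACTERIA_PREFIX):
            # The "#" in the prefix is read here as a run of leading digits.
            match = re.match(r"\d+", name[len(BACTERIA_PREFIX):])
            key = f"bacteria tree {match.group()}" if match else "bacteria (no index)"
            groups[key].append(name)
    return dict(groups)


# Hypothetical file names; the real suffixes may differ.
downloads = [
    "gtdbtk.ar53.classify.example.pdf",
    "gtdbtk.bac120.classify.tree.1.example.png",
    "gtdbtk.bac120.classify.tree.2.example.png",
    "notes.txt",
]
for group, names in sorted(group_by_prefix(downloads).items()):
    print(group, names)
```

In practice the list of names could come from `os.listdir()` on the download folder.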


Team members who implemented the App in KBase: Dylan Chivian and Paramvir Dehal. For questions, please contact us.

If you use GTDB-Tk in KBase, please cite

Primary References

GTDB-Tk is described in:

The Genome Taxonomy Database (GTDB) is described in:

We also strongly encourage you to cite the following 3rd party dependencies:

Related Publications


App Specification:

https://github.com/kbaseapps/kb_gtdbtk/tree/fe4ea607625541d265c245416f9ec33885d83434/ui/narrative/methods/run_kb_gtdbtk_classify_wf

Module Commit: fe4ea607625541d265c245416f9ec33885d83434