Classify Taxonomy of Metagenomic Reads with Kaiju - v1.9.0
kb_kaiju

v.1.3.4

Allows users to perform taxonomic classification of shotgun metagenomic read data with Kaiju.

This App makes the tool "Kaiju: Fast and sensitive taxonomic classification for metagenomics" available through KBase. Kaiju was written by Peter Menzel and Anders Krogh at the Bioinformatics Centre, part of the Section for Computational and RNA Biology at the University of Copenhagen.

From the Kaiju homepage:

Kaiju is a program for sensitive taxonomic classification of high-throughput sequencing reads from metagenomic whole genome sequencing or metatranscriptomics experiments.

Each sequencing read is assigned to a taxon in the NCBI taxonomy by comparing it to a reference database containing microbial and viral protein sequences. By using protein-level classification, Kaiju achieves a higher sensitivity compared with methods based on nucleotide comparison.

Kaiju can use either the set of available complete genomes from NCBI RefSeq or the microbial subset of the NCBI BLAST non-redundant protein database nr, optionally also including fungi and microbial eukaryotes.

Reads are translated into amino acid sequences, which are then searched in the database using a modified backward search on a memory-efficient implementation of the Burrows-Wheeler transform, which finds maximum exact matches (MEMs), optionally allowing mismatches in the protein alignment.
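The two ideas in the paragraph above, translating reads into amino acids and searching them with a backward search over a Burrows-Wheeler transform, can be illustrated with a toy sketch. It is not Kaiju's implementation. It assumes six-frame translation with the standard genetic code and splitting at stop codons, builds the transform naively from sorted rotations, and counts only exact matches rather than finding MEMs or allowing mismatches. All function names and the tiny "database" string are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one reading frame; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_fragments(read):
    """Translate all six frames and split at stop codons (an assumption here)."""
    read = read.upper()
    fragments = []
    for strand in (read, read.translate(COMPLEMENT)[::-1]):
        for offset in range(3):
            protein = translate(strand[offset:])
            fragments.extend(f for f in protein.split("*") if f)
    return fragments


def bwt(text):
    """Naive Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    text += "$"  # sentinel that sorts before every amino acid letter
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_exact_matches(bwt_str, pattern):
    """Count occurrences of pattern by backward search over the transform."""
    counts = Counter(bwt_str)
    first_row = {}  # first_row[c] = number of characters smaller than c
    total = 0
    for char in sorted(counts):
        first_row[char] = total
        total += counts[char]
    lo, hi = 0, len(bwt_str)
    for char in reversed(pattern):  # extend the match one character leftwards
        if char not in first_row:
            return 0
        lo = first_row[char] + bwt_str[:lo].count(char)
        hi = first_row[char] + bwt_str[:hi].count(char)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("MKVLAAGMA")  # hypothetical one-protein "database"
    for fragment in six_frame_fragments("ATGGCCTAA"):
        print(fragment, count_exact_matches(index, fragment))
```

A production index would replace the repeated `count` calls with precomputed rank tables and would build the transform from a suffix array; the sketch only shows why each pattern character narrows a single interval of the sorted rotations.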

The search can process up to millions of reads per minute using, for example, only 10 GB RAM with a reference database comprising 4821 complete microbial genomes.

Kaiju offers at least four reference databases for classification, which are downloaded from the Kaiju webserver page (last updated early 2022). The databases are:

Subsampling

Large datasets can take a long time to process, and there are situations where it is worth the wait. Sometimes, however, users just want to see how the App works or only need the higher taxonomic levels. At the higher taxonomic levels, the results are just as good when you run against a small fraction of the data, and the run is much faster. The ability to randomly subsample reads was added as a preprocessing step to the Kaiju App. It can greatly speed up the App in situations where it is being tested or used only for high taxonomic levels. See Randomly Subsample Reads for more information on the subsampling process.
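To make the idea of random subsampling concrete, here is a minimal sketch using reservoir sampling, which draws a fixed-size uniform sample from a stream of reads without loading them all into memory. It is an illustration only and does not reflect how the Randomly Subsample Reads App is implemented; the function name, the `seed` parameter, and the use of plain strings as read records are assumptions.

```python
import random


def subsample_reads(records, k, seed=0):
    """Return a uniform random sample of up to k records from an iterable."""
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(1000)]  # hypothetical read identifiers
    sample = subsample_reads(reads, 100, seed=42)
    print(len(sample), "of", len(reads), "reads kept")
```

For paired-end libraries, each record would need to hold both mates so that pairs stay together in the sample.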

Notes

Team members who wrapped the app for KBase: Dylan Chivian (lead), Sean Jungbluth. For questions, please use the Help Board.

App Specification:

https://github.com/kbaseapps/kb_kaiju/tree/83aa257ccd7d0391e118c6f41f6410319954c376/ui/narrative/methods/run_kaiju

Module Commit: 83aa257ccd7d0391e118c6f41f6410319954c376