Allows users to perform taxonomic classification of shotgun metagenomic read data with Kaiju.
This App makes the tool Kaiju ("Fast and sensitive taxonomic classification for metagenomics") available through KBase. Kaiju was written by Peter Menzel and Anders Krogh at the Bioinformatics Centre, part of the Section for Computational and RNA Biology at the University of Copenhagen.
From the Kaiju homepage:
Kaiju is a program for sensitive taxonomic classification of high-throughput sequencing reads from metagenomic whole genome sequencing or metatranscriptomics experiments.
Each sequencing read is assigned to a taxon in the NCBI taxonomy by comparing it to a reference database containing microbial and viral protein sequences. By using protein-level classification, Kaiju achieves a higher sensitivity compared with methods based on nucleotide comparison.
Kaiju can use either the set of available complete genomes from NCBI RefSeq or the microbial subset of the NCBI BLAST non-redundant protein database nr, optionally also including fungi and microbial eukaryotes.
Reads are translated into amino acid sequences, which are then searched in the database using a modified backward search on a memory-efficient implementation of the Burrows-Wheeler transform, which finds maximum exact matches (MEMs), optionally allowing mismatches in the protein alignment.
The search can process up to millions of reads per minute using, for example, only 10 GB RAM with a reference database comprising 4821 complete microbial genomes.
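The translation step described above can be illustrated with a short Python sketch. It is not Kaiju's code (Kaiju is a compiled program): it only shows a read being translated in all six reading frames with the standard genetic code and split into fragments at stop codons, which are the pieces that would then be searched against the protein database. The function names and the `min_len` parameter are hypothetical and chosen for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate one reading frame; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_fragments(read, min_len=1):
    """Translate a read in six frames and split each frame at stop codons.

    min_len is a hypothetical filter for dropping very short fragments.
    """
    read = read.upper()
    fragments = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            protein = translate(strand[offset:])
            fragments.extend(f for f in protein.split("*") if len(f) >= min_len)
    return fragments


print(six_frame_fragments("ATGGCCTAA"))
```

Raising `min_len` in this sketch discards fragments that are too short to give a meaningful match.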
Kaiju offers four reference databases for classification, which are downloaded from the Kaiju webserver page (last updated 2019-06-25). The databases are:
- RefSeq Complete Genomes: 50.9M protein sequences from completely assembled bacterial, archaeal, and viral genomes from NCBI RefSeq.
- proGenomes: 19.7M protein sequences from a representative set of genomes derived from NCBI RefSeq bacterial, archaeal, and viral genomes.
- NCBI BLAST nr: 164M protein sequences from nr: Bacteria, Archaea, and Viruses.
- NCBI BLAST nr+euk: 178M protein sequences from nr: Bacteria, Archaea, Viruses, Fungi and microbial eukaryotes.
Large datasets can take a long time to process, and sometimes the wait is worthwhile. Often, however, users only want to try out the App or only need results at the higher taxonomic levels. At those levels, running against a small random fraction of the reads gives results that are just as good, and it is much faster. For this reason, the ability to randomly subsample reads was added as a preprocessing step for the Kaiju App. It can greatly speed up runs where the App is being tested or is used only for high taxonomic levels. See Randomly Subsample Reads for more information on the subsampling process.
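To make the idea of random subsampling concrete, here is a minimal Python sketch that draws a fixed number of reads uniformly at random from FASTQ input using reservoir sampling. It is an illustration under stated assumptions, not the implementation behind the Randomly Subsample Reads App: the function names are hypothetical, records are assumed to be exactly four lines each, and paired-end libraries would need both mates sampled together.

```python
import random


def read_fastq(lines):
    """Yield FASTQ records as 4-tuples (header, sequence, separator, quality)."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def subsample_reads(records, k, seed=0):
    """Return k records chosen uniformly at random (reservoir sampling).

    Only k records are held in memory, so the input can be a stream.
    The order of the returned records is not the input order.
    """
    rng = random.Random(seed)
    reservoir = []
    for n, rec in enumerate(records):
        if n < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, n)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


# Tiny in-memory example: 100 synthetic reads, keep 10 of them.
demo_lines = []
for i in range(100):
    demo_lines += [f"@read{i}\n", "ACGT\n", "+\n", "IIII\n"]
sample = subsample_reads(read_fastq(demo_lines), k=10, seed=1)
print(len(sample))
```

A fixed `seed` makes the subsample reproducible, which is useful when comparing runs.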
- Kaiju v1.7.2 updates: the flags -i and -j are now required to run the kaiju binary; kaijuReport was renamed to kaiju2table.
- Krona Snapshots: You may not be able to take a snapshot of the Krona plot. This is a known issue with Krona in some versions of Chrome and Firefox on Windows 7 and 10. If this happens, try a different browser.
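For readers who want to see what the v1.7.2 note means on the command line, the following Python sketch assembles argument lists for the kaiju and kaiju2table binaries. It is not the KBase wrapper code. The helper names and all file paths are hypothetical, and the sketch assumes -i is the first read file and -j the mate file of a paired-end library, passed only when a second file is given.

```python
def build_kaiju_cmd(nodes_dmp, db_fmi, reads_fwd, out_path, reads_rev=None, threads=1):
    """Assemble an argument list for the kaiju classifier (hypothetical helper)."""
    cmd = ["kaiju", "-t", nodes_dmp, "-f", db_fmi, "-i", reads_fwd]
    if reads_rev is not None:
        # Assumption: -j carries the second file of a paired-end library.
        cmd += ["-j", reads_rev]
    cmd += ["-z", str(threads), "-o", out_path]
    return cmd


def build_kaiju2table_cmd(nodes_dmp, names_dmp, rank, table_path, kaiju_outputs):
    """Assemble an argument list for kaiju2table, the renamed kaijuReport."""
    return ["kaiju2table", "-t", nodes_dmp, "-n", names_dmp,
            "-r", rank, "-o", table_path, *kaiju_outputs]


# All paths below are placeholders.
classify = build_kaiju_cmd("nodes.dmp", "kaiju_db.fmi", "reads_1.fastq",
                           "kaiju.out", reads_rev="reads_2.fastq", threads=4)
summarize = build_kaiju2table_cmd("nodes.dmp", "names.dmp", "genus",
                                  "kaiju_summary.tsv", ["kaiju.out"])
print(" ".join(classify))
print(" ".join(summarize))
```

With Kaiju installed and on the PATH, either list could be passed to `subprocess.run(cmd, check=True)`.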
Team members who wrapped the app for KBase: Dylan Chivian (lead), Sean Jungbluth. For questions, please use the Help Board.
- Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun. 2016;7: 11257. doi:10.1038/ncomms11257, http://www.ncbi.nlm.nih.gov/pubmed/27071849
- Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011;12: 385. doi:10.1186/1471-2105-12-385, http://www.ncbi.nlm.nih.gov/pubmed/21961884
- Kaiju Homepage: http://kaiju.binf.ku.dk/
- Kaiju DBs from: http://kaiju.binf.ku.dk/server
- Github for Kaiju: https://github.com/bioinformatics-centre/kaiju
- Krona homepage: https://github.com/marbl/Krona/wiki
- Github for Krona: https://github.com/marbl/Krona
Module Commit: 67bd2909742437ae322ede9d41201ff0fc36d524