Assemble metagenomic reads using the MEGAHIT assembler.
This is a KBase wrapper for the ultra-fast and memory-efficient metagenome assembler MEGAHIT. MEGAHIT is a single-node assembler optimized for large and complex metagenomic next-generation sequencing (NGS) read sets, such as those from soil. It can also be used to assemble reads from less complex metagenomes. It uses succinct de Bruijn graphs (SdBG) to achieve low-memory assembly. The output is an assembly object, which can be used for downstream analysis such as binning genomes from a metagenome assembly.
MEGAHIT uses a multiple k-mer strategy. Minimum k-mer size, maximum k-mer size, and the k-mer step size for each iteration can be set in the parameters. Values for k must be odd numbers while the step size must be an even number.
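As a rough illustration of the multiple k-mer strategy, the Python sketch below builds a k-list from the three parameters and enforces the odd/even constraints. The function name `build_k_list` is hypothetical, and the rule that the list always ends at the maximum k-mer size is an assumption made for this sketch, not a statement about MEGAHIT's internals.

```python
def build_k_list(k_min, k_max, k_step):
    """Hypothetical helper: list the k values for each assembly iteration.

    Assumption: iteration starts at k_min, advances by k_step, and the
    final iteration always uses k_max.
    """
    if k_min % 2 == 0 or k_max % 2 == 0:
        raise ValueError("minimum and maximum k-mer sizes must be odd")
    if k_step <= 0 or k_step % 2 != 0:
        raise ValueError("k-mer step size must be a positive even number")
    if k_min > k_max:
        raise ValueError("minimum k-mer size must not exceed the maximum")

    # An odd start plus an even step keeps every intermediate k odd.
    k_list = list(range(k_min, k_max, k_step))
    k_list.append(k_max)  # assumed: k_max is always the last value
    return k_list


if __name__ == "__main__":
    print(build_k_list(21, 141, 10))
```

Under these assumptions, a minimum of 21, a maximum of 141, and a step of 10 give 21, 31, 41, and so on up to 131, followed by 141.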
User feedback and extensive use on KBase have led to these observations about setting input parameters for MEGAHIT:
- MEGAHIT decides the order in which optional parameters are applied. If k-min, k-max, and k-step are all left blank, the k-list [21,29,39,59,79,99,119,141] is used. This list differs from the one generated from the default values for the minimum k-mer size, maximum k-mer size, and k-mer step size.
- If only one or two of the minimum k-mer size, maximum k-mer size, and k-mer step size are specified, the default values are used for the remaining parameter(s).
- The original documentation claims that the default k-mer step size (k-step) is 12. In practice, it is set to 10.
- The maximum k-mer size is reset if it is greater than the longest read length + 20. If the reset value would be an even number, the maximum k-mer size is instead set to the longest read length + 19.
- It is possible to set parameters so extreme that MEGAHIT ends with an error.
- If the minimum k-mer size is larger than the longest read, there are no reads to assemble and MEGAHIT ends in an error.
- If the minimum k-mer size is set high, there may be very few reads to assemble. In these cases, it is possible to set min_count to a value that yields no assembled contigs, and MEGAHIT ends with an error.
- If the minimum k-mer size is set high, there may be very few reads to assemble. If contigs do get assembled but none pass the minimum contig length filter, MEGAHIT will end cleanly but produce no contigs.
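The read-length rules in the observations above can be sketched in Python as follows. Both function names are hypothetical, and the sketch assumes the reset value is the longest read length + 20 (or + 19 when that sum is even), as the observations describe. It is not taken from the MEGAHIT source.

```python
def adjust_k_max(k_max, longest_read_len):
    """Hypothetical helper: apply the maximum k-mer size reset rule."""
    limit = longest_read_len + 20
    if k_max <= limit:
        return k_max  # within the limit, so no reset is needed
    if limit % 2 == 0:
        return longest_read_len + 19  # keep the reset value odd
    return limit


def k_min_is_usable(k_min, longest_read_len):
    """Hypothetical check: a k_min longer than every read leaves nothing to assemble."""
    return k_min <= longest_read_len


if __name__ == "__main__":
    print(adjust_k_max(141, 100))     # 100 + 20 is even, so the result is 119
    print(adjust_k_max(141, 101))     # 101 + 20 is odd, so the result is 121
    print(k_min_is_usable(151, 100))  # False: expect MEGAHIT to end in an error
```

A pre-flight check of this kind may help catch parameter combinations that would otherwise fail only after the job has started.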
KBase module authors:
- Michael Sneddon
- Roman Sutormin
- Dylan Chivian
Related Publications
- Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31: 1674-1676. doi:10.1093/bioinformatics/btv033, http://www.ncbi.nlm.nih.gov/pubmed/25609793
App Specification:
https://github.com/kbaseapps/kb_megahit/tree/e2e6d557da8a92d8b9a6cb4893a2a7293a3068bf/ui/narrative/methods/run_megahit
Module Commit: e2e6d557da8a92d8b9a6cb4893a2a7293a3068bf