Group assembled metagenomic contigs into lineages (Bins) using depth-of-coverage and nucleotide composition
CONCOCT clusters metagenomic contigs into "bins", each of which should correspond to a putative genome.
CONCOCT takes a metagenome assembly and the reads that produced it, and bins the contigs using two sources of information: nucleotide composition and source strain abundance, measured as depth of coverage by aligning the reads to the contigs.
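The sketch below shows the kind of per-contig feature vector that a composition-plus-coverage binner works with. It is a minimal Python example. The vector is a canonical k-mer frequency profile, in which a k-mer and its reverse complement are counted together, followed by the mean depth of coverage in each sample. The function names, the pseudocount of 1 and the per-base depth input are illustrative assumptions, not the CONCOCT or KBase API. CONCOCT applies its own normalization and dimensionality reduction before clustering.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> list[str]:
    """All k-mers over ACGT, keeping one representative per reverse-complement pair."""
    seen: set[str] = set()
    result: list[str] = []
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        canon = min(kmer, revcomp(kmer))
        if canon not in seen:
            seen.add(canon)
            result.append(canon)
    return result


def composition_profile(seq: str, k: int = 4, pseudocount: float = 1.0) -> list[float]:
    """Normalized canonical k-mer frequencies (assumed pseudocount avoids zeros)."""
    index = {kmer: i for i, kmer in enumerate(canonical_kmers(k))}
    counts = [pseudocount] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other ambiguity codes
            counts[index[min(kmer, revcomp(kmer))]] += 1
    total = sum(counts)
    return [c / total for c in counts]


def mean_coverage(per_base_depth: list[int]) -> float:
    """Average read depth across a contig's positions in one sample."""
    return sum(per_base_depth) / len(per_base_depth) if per_base_depth else 0.0


def feature_vector(seq: str, depths_per_sample: list[list[int]], k: int = 4) -> list[float]:
    """Composition features followed by one mean-coverage value per sample."""
    return composition_profile(seq, k) + [mean_coverage(d) for d in depths_per_sample]
```

With k = 4 there are 136 canonical k-mers, so a contig covered by two samples gets a 138-dimensional vector. Contigs from the same genome should sit close together in both the composition part and the coverage part.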
Configuration:
Assembly Object: The Assembly object is a collection of assembled genome fragments, called "contigs". These are the items that CONCOCT will bin. Currently only a single Metagenome Assembly object is accepted by the CONCOCT App.
Input BinnedContig Object Name: The name for the BinnedContig object created by CONCOCT. This object represents the directory of binned contigs and can be used for downstream analysis.
Read Library Object: The read libraries are aligned to the assembly using the selected read mapper. The resulting depth of coverage provides abundance information for each contig, which roughly tracks the abundance of the source species.
Minimum Contig Length: Contigs shorter than this length are excluded from binning. Very short contigs slow down the analysis and do not give statistically meaningful nucleotide composition profiles. A value of 2500 bp is a reasonable cutoff, but values as low as 1000 bp can be used.
Contig Split Size: Contigs are split into pieces of this length before clustering, to minimize the bias imposed by very large contigs. A split size of 5000-10000 bp is reasonable.
Contig Split Overlap: The number of bases by which consecutive pieces overlap when contigs are split before clustering. A value of 0 (no overlap) is most often used.
Kmer Length: Length of the k-mers used for nucleotide composition profiling. The default value of 4 is a reasonable starting point. Note: longer k-mers produce many more composition features and slow down the analysis.
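The Minimum Contig Length, Contig Split Size and Contig Split Overlap settings work together as a preprocessing step. The sketch below is minimal and assumes the contigs are held in a plain dict mapping names to sequences. Contigs below the minimum length are dropped. The remaining contigs are cut into pieces of `split_size` bases, with a new piece starting every `split_size - overlap` bases, and any short remainder is merged into the last piece. The function name `filter_and_split`, the `name.index` piece naming and the merge-the-remainder rule are hypothetical and serve only as illustration. CONCOCT provides its own script for this step.

```python
def filter_and_split(contigs: dict[str, str], min_length: int = 2500,
                     split_size: int = 10000, overlap: int = 0) -> dict[str, str]:
    """Drop short contigs, then cut the rest into (possibly overlapping) pieces."""
    if not 0 <= overlap < split_size:
        raise ValueError("overlap must be >= 0 and smaller than split_size")
    step = split_size - overlap  # distance between the starts of consecutive pieces
    pieces: dict[str, str] = {}
    for name, seq in contigs.items():
        if len(seq) < min_length:
            continue  # too short: excluded from binning
        # Number of pieces so that the last one is at least split_size long
        # (or the whole contig, if the contig is shorter than split_size).
        n_pieces = max(1, (len(seq) - overlap) // step)
        for i in range(n_pieces):
            start = i * step
            # The last piece runs to the end of the contig, absorbing any remainder.
            end = len(seq) if i == n_pieces - 1 else start + split_size
            pieces[f"{name}.{i}"] = seq[start:end]
    return pieces
```

For example, with `split_size=10000` and `overlap=0`, a 25,000 bp contig becomes one 10,000 bp piece and one 15,000 bp piece. A piece is shorter than the split size only when the whole contig is.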
Output:
Output BinnedContig Object Name: The BinnedContig Object represents the directory of binned contigs created by CONCOCT. This object can be used for downstream analysis.
Output Bin Summary Report: A summary of the number of bins produced, the number of contigs that were binned, and the total number of contigs in the assembly.
Downloadable files: The entire output of the CONCOCT run may be downloaded as a zip file. This zip file also contains a table of read-depth coverage per contig ("*.depth.txt").
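The bin summary report reduces to three counts. The sketch below is minimal and rests on two assumptions. First, the contig-to-bin assignments are available as a two-column CSV with a `contig_id,cluster_id` header. This input format is hypothetical here, since the KBase app builds its report from its own outputs. Second, the total number of contigs in the assembly is known separately. The function name `bin_summary` is also illustrative.

```python
import csv
import io


def bin_summary(assignments_csv: str, total_contigs: int) -> dict[str, int]:
    """Count bins and binned contigs from an assumed contig_id,cluster_id CSV."""
    reader = csv.DictReader(io.StringIO(assignments_csv))
    # Map each contig to its bin; a repeated contig_id is counted only once.
    contig_to_bin = {row["contig_id"]: row["cluster_id"] for row in reader}
    return {
        "bins": len(set(contig_to_bin.values())),
        "binned_contigs": len(contig_to_bin),
        "total_contigs": total_contigs,
    }
```

Keeping `total_contigs` as a separate argument reflects that contigs filtered out by the minimum length never appear in the assignments, so they cannot be counted from that file.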
Implemented for KBase by Sean Jungbluth
Related Publications
- Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, Lahti L, Loman NJ, Andersson AF, Quince C. Binning metagenomic contigs by coverage and composition. Nature Methods. 2014;11:1144-1146. doi:10.1038/nmeth.3103
- CONCOCT source: https://github.com/BinPro/CONCOCT
App Specification:
https://github.com/kbaseapps/kb_concoct/tree/936c9bfed7e07dd1ddcb76ac4fa775b73d60afbb/ui/narrative/methods/run_kb_concoct
Module Commit: 936c9bfed7e07dd1ddcb76ac4fa775b73d60afbb