Group assembled metagenomic contigs into lineages (bins) using depth-of-coverage, nucleotide composition, and marker genes.
MaxBin2 clusters metagenomic contigs (assembled contiguous genome fragments) into "bins", each of which corresponds to a putative population genome. It performs binning with an Expectation-Maximization (EM) algorithm that uses nucleotide composition, source strain abundance (measured as depth-of-coverage after aligning the reads to the contigs), and phylogenetic marker genes.
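To make the composition and abundance signals more concrete, here is a minimal Python sketch of two per-contig features. The MaxBin papers describe using tetranucleotide frequencies for composition; merging each 4-mer with its reverse complement, and computing depth-of-coverage as the mean of per-base read depths, are assumptions made here for illustration. All function names are hypothetical and this is not MaxBin2's own code.

```python
import math
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


# All canonical tetranucleotides: 136 of the 256 possible 4-mers.
CANONICAL_4MERS = sorted({canonical("".join(p)) for p in product(BASES, repeat=4)})


def tetranucleotide_profile(seq: str) -> list[float]:
    """Relative frequency of each canonical 4-mer in a contig.

    4-mers containing non-ACGT symbols (e.g. N) are skipped.
    """
    counts = dict.fromkeys(CANONICAL_4MERS, 0)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set(BASES):
            counts[canonical(kmer)] += 1
    total = sum(counts.values()) or 1  # avoid dividing by zero on tiny contigs
    return [counts[k] / total for k in CANONICAL_4MERS]


def mean_depth(per_base_depth: list[int]) -> float:
    """Average depth-of-coverage over a contig from per-base read depths."""
    return sum(per_base_depth) / len(per_base_depth) if per_base_depth else 0.0


def euclidean(a: list[float], b: list[float]) -> float:
    """Distance between two composition profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Contigs from the same genome tend to have small profile distances and similar depths, which is the intuition the EM step builds on. Because the 4-mers are canonicalized, a contig and its reverse complement yield identical profiles.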
MaxBin2 takes a metagenome assembly, together with the read sequence data that produced it, and groups the contigs into putative genomes called "bins". Users only need to provide the assembled metagenomic sequences and either read coverage information or the sequencing reads themselves to recover the underlying genomes of the microbes in their metagenomes. For convenience, MaxBin also reports genome-related statistics, including estimated completeness, GC content, and genome size, in the binning summary page.
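The EM idea can be illustrated with a deliberately simplified, coverage-only model. Each bin is treated as a Poisson distribution over integer-rounded contig depth, and EM alternates between computing each contig's bin probabilities (E-step) and re-estimating each bin's mean depth and mixing weight (M-step). MaxBin's publications describe modeling coverage with Poisson distributions, but MaxBin2 also combines composition distances and marker-gene information, which this sketch omits. The function `poisson_mixture_em` and its initial means are illustrative assumptions, not MaxBin2's implementation.

```python
import math


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(K = k) for a Poisson distribution with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def poisson_mixture_em(coverages: list[int], init_means: list[float], n_iter: int = 50):
    """Fit a mixture of Poisson bins to contig coverages by EM.

    Returns (means, weights, posteriors), where posteriors[i][j] is the
    probability that contig i belongs to bin j.
    """
    means = list(init_means)
    k = len(means)
    weights = [1.0 / k] * k
    posteriors: list[list[float]] = []
    for _ in range(n_iter):
        # E-step: bin probabilities per contig, normalized with log-sum-exp.
        posteriors = []
        for c in coverages:
            logs = [math.log(w) + poisson_logpmf(c, m) for w, m in zip(weights, means)]
            top = max(logs)
            exps = [math.exp(x - top) for x in logs]
            z = sum(exps)
            posteriors.append([e / z for e in exps])
        # M-step: re-estimate mixing weights and per-bin mean coverage.
        for j in range(k):
            r = max(sum(p[j] for p in posteriors), 1e-12)  # guard empty bins
            weights[j] = r / len(coverages)
            weighted = sum(p[j] * c for p, c in zip(posteriors, coverages))
            means[j] = max(weighted / r, 1e-9)  # keep log(mean) defined
    return means, weights, posteriors
```

For example, coverages `[10, 11, 9, 10, 12, 50, 52, 48, 51, 49]` with initial means `[5.0, 60.0]` separate into two bins with means close to 10.4 and 50, and each contig's posterior for its own bin is near 1.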
Team members who developed and deployed the App in KBase: Tianhao Gu, William J Riehl, and Dylan Chivian. For questions, please contact us.
App Configuration Information:
- Assembly Object: The Assembly object is a collection of assembled genome fragments, called "contigs". These are the items that MaxBin2 will bin. Currently, only a single Assembly object is accepted by the MaxBin2 App.
- BinnedContig Object Name: The BinnedContig object is created by MaxBin2. It stores the bin assignments for each of the contigs.
- Read Library Object: The read libraries are aligned to the contigs using Bowtie2 to obtain each contig's depth-of-coverage, which provides abundance information that roughly tracks the abundance of the source species.
- Probability Threshold: The minimum probability, computed by the Expectation-Maximization (EM) algorithm, that a contig must reach to be assigned to a bin. Contigs whose probability falls below this threshold are labeled "unclassified".
- Marker Set: Phylogenetic marker genes are used as an input to the EM algorithm. The user can choose between a set of 107 markers that correspond primarily to bacterial lineages and a set of 40 markers found in both bacteria and archaea. If the samples are expected to contain mostly bacteria, select the set of 107.
- Minimum Contig Length: Some assemblies contain short contigs that may slow down analysis. Such contigs are also of reduced value because they may not contain complete genes and may be too short for statistically meaningful nucleotide composition profiles. A value of 1000 bp is a reasonable cutoff; setting a larger value (e.g., 2500 bp) can further reduce contamination of bins by small contigs.
- Plot Marker: If selected, the phylogenetic marker genes found in each bin are plotted, and the plot can be downloaded.
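Two of the settings above can be sketched as simple steps before and after the EM step. The helpers below are hypothetical: `filter_by_length` mirrors the Minimum Contig Length setting, and `assign_bins` mirrors the Probability Threshold setting, assuming each contig has a list of per-bin probabilities and is assigned to its most probable bin only if that probability reaches the threshold. The `bin.001`-style names are illustrative and do not reflect MaxBin2's actual output naming.

```python
UNCLASSIFIED = "unclassified"


def filter_by_length(contigs: dict[str, str], min_length: int = 1000) -> dict[str, str]:
    """Keep only contigs that are at least `min_length` bp long."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_length}


def assign_bins(posteriors: dict[str, list[float]], threshold: float) -> dict[str, str]:
    """Assign each contig to its most probable bin, or 'unclassified' when
    that probability is below `threshold`."""
    assignments = {}
    for contig, probs in posteriors.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            assignments[contig] = f"bin.{best + 1:03d}"  # hypothetical naming
        else:
            assignments[contig] = UNCLASSIFIED
    return assignments
```

Raising the threshold trades completeness for purity: fewer contigs are binned, but those that are binned were assigned with higher confidence.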
App Output:
- Output Object: The BinnedContig object contains the bin assignments for each contig in the Assembly object.
- Output Bin Summary Report: A table with the characteristics of each bin, including the bin name, its depth of coverage (abundance), the estimated completeness (based on the phylogenetic marker genes found in the bin), an estimate of the genome size (combining the completeness estimate with the observed contig lengths for that bin), and the GC content (%) of the binned contigs.
- Downloadable files: The entire output of the MaxBin2 run may be downloaded as a zip file.
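The bin summary statistics can be approximated as follows. This sketch assumes completeness is the fraction of the selected marker set detected in a bin, and that the genome size estimate is the total binned contig length divided by completeness. Both are our reading of the report description above, not MaxBin2's exact formulas, and the function names are hypothetical.

```python
def gc_percent(seqs: list[str]) -> float:
    """GC content (%) over all A/C/G/T bases of a bin's contigs."""
    gc = acgt = 0
    for s in seqs:
        s = s.upper()
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(b) for b in "ACGT")  # ignore N and other symbols
    return 100.0 * gc / acgt if acgt else 0.0


def completeness(markers_found: set[str], marker_set: set[str]) -> float:
    """Fraction of the chosen marker set (e.g. 107 or 40 genes) found in the bin."""
    if not marker_set:
        return 0.0
    return len(markers_found & marker_set) / len(marker_set)


def estimated_genome_size(seqs: list[str], comp: float) -> float:
    """Scale the observed binned length up by completeness (assumed formula)."""
    total = sum(len(s) for s in seqs)
    return total / comp if comp > 0 else float("nan")
```

For example, a bin holding 2,000 bp of contigs with half of its markers detected would be estimated at about 4,000 bp.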
Related Publications:
- Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32: 605-607. doi:10.1093/bioinformatics/btv638, https://academic.oup.com/bioinformatics/article/32/4/605/1744462
- Wu Y-W, Tang Y-H, Tringe SG, Simmons BA, Singer SW. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome. 2014;2: 26. doi:10.1186/2049-2618-2-26, https://microbiomejournal.biomedcentral.com/articles/10.1186/2049-2618-2-26
- MaxBin2 source: https://sourceforge.net/projects/maxbin2/
- MaxBin source: http://downloads.jbei.org/data/microbial_communities/MaxBin/README.txt
App Specification:
https://github.com/kbaseapps/kb_maxbin/tree/3574f0b2cc343f4f530bd52b86c8cf8b6b1427f3/ui/narrative/methods/run_maxbin2
Module Commit: 3574f0b2cc343f4f530bd52b86c8cf8b6b1427f3