Calculate pangenome for microbial genomes, including MAGs of varying quality
Description
mOTUpan is a pangenome calculation tool for analyzing a group of related genomes, MAGs, bins, or SAGs, some of which may be incomplete. The core gene classification and the posterior completeness of each genome are re-estimated in each iterative round of mOTUpan. The KBase mOTUpan App runs CheckM to score initial genome completeness if the Genome object does not already contain quality scores. Additionally, the Pangenome object will carry "core" or "accessory" classifications based on mOTUpan's estimates, which account for genome completeness when determining core likelihood (please see the mOTUpan publication below).
The report for the mOTUpan App is from the View Pangenome Circle Plot App. For an explanation of the input parameters, the output figures, subsets of the Pangenome, and the FeatureSet objects it can produce, please see its App Documentation.
Notes
If you use a Species Tree as the input for the Pangenome Calculation, you may wish to follow this analysis with the Phylogenetic Pangenome Accumulation App.
If not all of your input genomes are near-complete isolates, and especially if some are not high-quality, you should probably run multiple iterations of mOTUpan to get better posterior genome completeness estimates. The maximum number of iterations is set by the "max-iter" parameter.
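As a rough illustration of why extra iterations can help, the Python sketch below alternates between classifying gene clusters as core and re-estimating each genome's completeness from the resulting core set. It is a toy simplification and not mOTUpan's implementation: the function name `classify_core`, the smoothed accessory frequency, the clamping bounds, and the zero log-likelihood-ratio threshold are all assumptions made for this example. Please see the mOTUpan publication for the actual Bayesian model.

```python
import math


def _clamp(x, lo=0.01, hi=0.99):
    """Keep probabilities away from 0 and 1 so the logarithms stay finite."""
    return min(max(x, lo), hi)


def classify_core(presence, prior_completeness, max_iter=10):
    """Toy iterative core/accessory classification (illustrative only).

    presence: dict mapping genome id -> set of gene-cluster ids it contains.
    prior_completeness: dict mapping genome id -> completeness in [0, 1]
        (for example, a CheckM completeness score divided by 100).
    Returns (core_clusters, posterior_completeness).
    """
    genomes = sorted(presence)
    clusters = sorted(set().union(*presence.values()))
    comp = {g: _clamp(prior_completeness[g]) for g in genomes}
    core = set()

    for _ in range(max_iter):
        new_core = set()
        for c in clusters:
            count = sum(c in presence[g] for g in genomes)
            # Assumed accessory model: smoothed observed frequency of the cluster.
            freq = (count + 0.5) / (len(genomes) + 1)
            llr = 0.0  # log-likelihood ratio, core vs. accessory
            for g in genomes:
                p_core = comp[g]         # core gene: observed if the genome covers it
                p_acc = freq * comp[g]   # accessory gene: must also be carried
                if c in presence[g]:
                    llr += math.log(p_core / p_acc)
                else:
                    llr += math.log((1 - p_core) / (1 - p_acc))
            if llr > 0:
                new_core.add(c)
        if new_core == core or not new_core:
            core = new_core
            break  # converged, or nothing left to estimate completeness from
        core = new_core
        # Posterior completeness: fraction of core clusters found in each genome.
        comp = {g: _clamp(len(core & presence[g]) / len(core)) for g in genomes}
    return core, comp


# Hypothetical input: three near-complete genomes and one lower-quality MAG.
presence = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3"},
    "C": {"g1", "g2", "g5"},
    "D": {"g1", "g3"},
}
prior = {"A": 0.99, "B": 0.95, "C": 0.90, "D": 0.60}
core, posterior = classify_core(presence, prior, max_iter=10)
print(sorted(core), posterior)
```

In this sketch, a single iteration classifies clusters using only the prior completeness scores, while later iterations reuse the completeness values implied by the current core set. That is the intuition behind raising "max-iter" when the input includes lower-quality genomes.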
Tool Source:
mOTUpan v0.3.2 is installed via pip. Source code is from https://github.com/moritzbuck/mOTUlizer
MMseqs2 release 14-7e284 is installed from https://github.com/soedinglab/MMseqs2/releases/tag/14-7e284
Team members who implemented the App in KBase: Dylan Chivian. For questions, please contact us.
References
mOTUpan is described in:
- Moritz Buck, Maliheh Mehrshad, Stefan Bertilsson. mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation. NAR Genom Bioinform. 2022 Aug 15;4(3):lqac060. doi: 10.1093/nargab/lqac060.
We also strongly encourage you to cite the following 3rd party dependency:
- Martin Steinegger, Johannes Söding. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017 Nov;35(11):1026-1028. doi: 10.1038/nbt.3988
Related Publications
- Moritz Buck, Maliheh Mehrshad, Stefan Bertilsson. mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation. NAR Genom Bioinform. 2022 Aug 15;4(3):lqac060. doi: 10.1093/nargab/lqac060. https://doi.org/10.1093/nargab/lqac060
- Martin Steinegger, Johannes Söding. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017 Nov;35(11):1026-1028. doi: 10.1038/nbt.3988. https://doi.org/10.1038/nbt.3988
App Specification:
https://github.com/kbaseapps/kb_motupan/tree/6496176e5048e69826234cbb7b78c97c9383dbd3/ui/narrative/methods/run_kb_motupan
Module Commit: 6496176e5048e69826234cbb7b78c97c9383dbd3