Dereplicate genomes based on ANI and quality
dRep is a genome dereplication pipeline. It is useful, for example, for reducing a genome set to representative genomes for downstream analysis, or for individual assembly workflows. dRep can first filter genomes by length, then run CheckM and filter by completeness and contamination, then run pairwise Mash comparisons for primary clustering, then calculate intra-cluster pairwise ANI for secondary clustering, and finally dereplicate the secondary clusters based on a parameterized genome score.
See the dRep documentation on Read the Docs for how to use the advanced parameters.
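The filtering and scoring steps described above can be illustrated with a minimal Python sketch. It is not dRep's code: the `Genome` record, the `passes_filters`, `score`, and `dereplicate` helpers, the threshold values, and the two-term score are all assumptions made for illustration, and the secondary-cluster labels are taken as already computed. The real genome score has more terms, so consult the documentation for the actual formula and defaults.

```python
from dataclasses import dataclass


@dataclass
class Genome:
    name: str
    length: int           # genome length in bp
    completeness: float   # CheckM-style completeness, percent
    contamination: float  # CheckM-style contamination, percent
    cluster: str          # secondary-cluster label, assumed precomputed


def passes_filters(g, min_length=50_000, min_completeness=75.0,
                   max_contamination=25.0):
    """Length filter first, then quality filter (illustrative thresholds)."""
    return (g.length >= min_length
            and g.completeness >= min_completeness
            and g.contamination <= max_contamination)


def score(g, completeness_weight=1.0, contamination_weight=5.0):
    """Simplified, hypothetical genome score: reward completeness,
    penalize contamination. Weights are illustrative assumptions."""
    return (completeness_weight * g.completeness
            - contamination_weight * g.contamination)


def dereplicate(genomes):
    """Keep the highest-scoring genome that passes the filters
    in each secondary cluster."""
    best = {}
    for g in genomes:
        if not passes_filters(g):
            continue
        current = best.get(g.cluster)
        if current is None or score(g) > score(current):
            best[g.cluster] = g
    return best


genomes = [
    Genome("bin_a", 2_100_000, 95.0, 2.0, "1_1"),
    Genome("bin_b", 2_000_000, 98.0, 6.0, "1_1"),
    Genome("bin_c", 1_500_000, 80.0, 1.0, "2_1"),
    Genome("bin_d", 30_000, 90.0, 0.0, "2_1"),     # fails the length filter
    Genome("bin_e", 1_800_000, 60.0, 1.0, "3_1"),  # fails the completeness filter
]

winners = dereplicate(genomes)
for cluster, g in sorted(winners.items()):
    print(cluster, g.name, score(g))
```

In this toy input, `bin_b` is more complete than `bin_a` but loses on the contamination penalty, and cluster `3_1` yields no representative because its only member is filtered out before scoring.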
Related Publications
- <a href="https://github.com/MrOlm/drep">GitHub</a>
- <a href="https://drep.readthedocs.io/en/latest/index.html">Read the Docs</a>
- <a href="https://www.nature.com/articles/ismej2017126">Olm, M., Brown, C., Brooks, B. <i>et al</i>. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. <i>ISME J</i> <b>11</b>, 2864–2868 (2017).</a>
- <a href="https://doi.org/10.1101/108142">bioRxiv</a>
- Hyatt D. et al. (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11, 119.
- Eddy S.R. (2011) Accelerated profile HMM searches. PLoS Comput. Biol., 7, e1002195.
- Matsen F. et al. (2010) pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics, 11, 538.
- Ondov, B.D., Treangen, T.J., Melsted, P. et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 17, 132 (2016). https://doi.org/10.1186/s13059-016-0997-x
- Parks, Donovan H et al. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome research vol. 25,7 (2015): 1043-55. doi:10.1101/gr.186072.114
- Microbial species delineation using whole genome sequences. Neha J. Varghese; Supratim Mukherjee; Natalia Ivanova; Konstantinos T. Konstantinidis; Kostas Mavrommatis; Nikos C. Kyrpides; Amrita Pati. Nucleic Acids Research 2015; doi: 10.1093/nar/gkv657
- Kurtz, S., Phillippy, A., Delcher, A.L. et al. Versatile and open software for comparing large genomes. Genome Biol 5, R12 (2004). https://doi.org/10.1186/gb-2004-5-2-r12
App Specification:
https://github.com/kbaseapps/kb_dRep/tree/805c450d9fa4767567aa452ca1a744bb25c15813/ui/narrative/methods/run_dereplicate
Module Commit: 805c450d9fa4767567aa452ca1a744bb25c15813