Generated August 23, 2022

Draft genome sequences of six endospore-forming isolates of the class Bacilli from soil in a suburban, wooded, developed space

Introduction

Soil bacteria, and endospore-forming bacteria in particular, play critical environmental roles in biomass degradation and plant health, and many have also been sources of commercially important secondary metabolites, including antibiotics (1, 2). Sequencing multiple isolates of known species can also identify genes and gene clusters not found in other members of the same species, increasing our understanding of microbial diversity.

The publication by [Anna L. McLoon, Thomas T. Awad, Molly F. Bogardus, Meredith G. Buono, Kaitlyn A. Devine, Rebecca M. Draper, Brianna Femenella, Hannah M. Gallagher, Laura A. Morelock, Mishal Razi, Jacqueline R. Rennick, Abigail K. Sheridan, Righlee J. Thibault, Katie L. Touchette, and Grace E. Zuchowski] can be found here: [URL]

Table of Contents

  1. Background and Experimental Methods
  2. QC, Assembly, and Annotation
  3. Taxonomic Identification
  4. References
This scaffolding narrative was created by: [Anna L. McLoon]

Access to Narrative workflows and sequence data

| Strain | KBase Narrative used for analyses | BioProject accession | SRA BioSample | GenBank accession |
|---|---|---|---|---|
| Bacillus pseudomycoides strain SC107 | https://kbase.us/n/110380/32/ | PRJNA862062 | SAMN29936620 | JANIOB000000000 |
| Rossellomorea sp. strain SC111 | https://kbase.us/n/110406/24/ | PRJNA862062 | SAMN29936621 | JANIOC000000000 |
| Peribacillus frigoritolerans strain SC112 | https://kbase.us/n/110408/28/ | PRJNA862062 | SAMN29936622 | JANIOD000000000 |
| Priestia megaterium strain SC114 | https://kbase.us/n/110407/21/ | PRJNA862062 | SAMN29936623 | JANIOE000000000 |
| Paenibacillus sp. strain SC116 | https://kbase.us/n/110414/22/ | PRJNA862062 | SAMN29936624 | JANIOF000000000 |
| Lysinibacillus fusiformis strain SC117 | https://kbase.us/n/109725/40/ | PRJNA862062 | SAMN29936625 | JANIOG000000000 |

Background and Experimental Methods

Sample Collection

We collected a soil sample on January 5, 2022, near a willow (Salix sp.) tree in a heavily impacted wooded area between a parking lot and a marsh overgrown with Phragmites reeds on the Siena College campus (42°43'11.0"N 73°44'59.6"W; 42.719711, -73.749876). The soil in the area is Stafford loamy fine sand; it was frozen at the time of collection.
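The two coordinate forms given for the sampling site should describe the same point. A minimal Python sketch of the standard degrees-minutes-seconds to decimal-degrees conversion (the helper name `dms_to_decimal` is our own, hypothetical choice) shows that they agree to within about 0.00002°, the small difference reflecting rounding of the seconds to 0.1".

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees, minutes and seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # By convention, southern latitudes and western longitudes are negative.
    if hemisphere.upper() in ("S", "W"):
        value = -value
    return value


if __name__ == "__main__":
    # Sampling-site coordinates as given in the text
    lat = dms_to_decimal(42, 43, 11.0, "N")
    lon = dms_to_decimal(73, 44, 59.6, "W")
    print(f"{lat:.6f}, {lon:.6f}")  # compare with 42.719711, -73.749876
```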

Isolation

Subsamples of soil were suspended in water and boiled for 10 minutes to kill vegetative cells and enrich for endospores, then cultured overnight at 37 °C on TSA plates containing 5% sheep blood. Six isolates were recultured and checked for purity by Gram staining and other tests. All six grow well on TSA at temperatures between 22 and 37 °C. All six isolates belong to the class Bacilli (3), several of them to recently proposed genera (4, 5).

Genome Sequencing

For genomic DNA isolation, cells were grown in tryptic soy broth at 37 °C for 3–4 hours, until the culture was visibly dense. Cells were resuspended in 50 mM EDTA, and peptidoglycan was digested for 1 hour at 37 °C by adding 30 µl of 20 mg/ml lysozyme; genomic DNA was then purified with a Promega Wizard Genomic DNA Purification Kit. Genomic DNA was sequenced as 151 bp paired-end reads on an Illumina NextSeq 2000 by the Microbial Genome Sequencing Center, Pittsburgh, PA.

QC, Assembly, and Annotation

Adapter sequences were removed by the sequencing facility. All subsequent analyses were carried out within the KBase environment (6, 7), in parallel for each isolate in its own Narrative, using default parameters. Read quality was assessed with FastQC v0.11.9 (8), and reads were trimmed with Trimmomatic v0.36 (9). Trimmed reads were assembled with SPAdes v3.15.3 (10, 11) and annotated with RAST v1.073 (13–15) and/or Prokka v1.14.5 (12). Taxonomy was assigned with GTDB-Tk v1.7.0 (16–18) and TYGS (20, 21), and metabolic predictions were made with DRAM v0.1.0 (19). Genome assemblies range in size from 4.38 to 6.07 Mb and are AT rich, with 35.3 to 43.5% GC (Table 1).
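"AT rich" here means that A and T make up more than half of the bases, i.e. GC content is below 50%. The sketch below computes %GC for a sequence and checks the Table 1 values against that threshold. It assumes GC content is counted over unambiguous A/C/G/T bases only, skipping N; that is our assumption, not necessarily how QUAST or KBase count.

```python
def gc_percent(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols such as N are skipped."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


def is_at_rich(gc: float) -> bool:
    """A+T exceeds half of the bases exactly when GC% is below 50."""
    return gc < 50.0


# %GC values as reported in Table 1
TABLE1_GC = {"SC107": 35.33, "SC111": 43.42, "SC112": 40.51,
             "SC114": 37.45, "SC116": 43.46, "SC117": 37.09}

if __name__ == "__main__":
    print(round(gc_percent("ATGCNNAT"), 2))  # the two Ns are ignored
    print(all(is_at_rich(v) for v in TABLE1_GC.values()))
```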

 

Table 1: Data summary

| Strain | Species | Number of reads | # contigs | Total length (bp) | N50 (bp) | %GC | Predicted genes (Prokka via KBase) |
|---|---|---|---|---|---|---|---|
| SC107 | Bacillus pseudomycoides | 6,363,180 | 97 | 5,714,291 | 270,478 | 35.33 | 5885 |
| SC111 | Rossellomorea sp. | 7,539,986 | 36 | 4,381,526 | 242,411 | 43.42 | 4503 |
| SC112 | Peribacillus frigoritolerans | 8,203,330 | 54 | 5,332,774 | 241,338 | 40.51 | 5193 |
| SC114 | Priestia megaterium | 7,160,128 | 38 | 6,065,181 | 4,198,608 | 37.45 | 6305 |
| SC116 | Paenibacillus sp. | 7,022,018 | 22 | 5,444,016 | 2,729,508 | 43.46 | 4852 |
| SC117 | Lysinibacillus fusiformis | 8,267,586 | 30 | 4,809,961 | 474,952 | 37.09 | 4670 |
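A rough upper bound on mean sequencing depth can be obtained from Table 1 by dividing the total sequenced bases by the assembly length. This sketch assumes that "Number of reads" counts individual reads (both mates of each pair) and that every read is the full 151 bp before trimming; if the column instead counts read pairs, set `reads_are_pairs=True` and the estimate doubles. The function and constant names are hypothetical.

```python
READ_LENGTH_BP = 151  # from the sequencing description; trimming makes real reads shorter

# Number of reads and total assembly length (bp), copied from Table 1
TABLE1 = {
    "SC107": (6_363_180, 5_714_291),
    "SC111": (7_539_986, 4_381_526),
    "SC112": (8_203_330, 5_332_774),
    "SC114": (7_160_128, 6_065_181),
    "SC116": (7_022_018, 5_444_016),
    "SC117": (8_267_586, 4_809_961),
}


def estimated_depth(n_reads: int, assembly_length: int,
                    read_length: int = READ_LENGTH_BP,
                    reads_are_pairs: bool = False) -> float:
    """Upper-bound mean depth: total sequenced bases divided by assembly length."""
    bases = n_reads * read_length * (2 if reads_are_pairs else 1)
    return bases / assembly_length


if __name__ == "__main__":
    for strain, (n_reads, length) in TABLE1.items():
        print(f"{strain}: ~{estimated_depth(n_reads, length):.0f}x")
```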

 

Assembly quality was assessed with QUAST (QUality ASsessment Tool). All statistics are based on contigs of size >= 500 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs). Lengths are in bp.

| Statistic | SC107 | SC111 | SC112 | SC114 | SC116 | SC117 |
|---|---|---|---|---|---|---|
| Assembly | Unknown_107_trimmed_SPAdesAssembly | Unknown_111_SPAdesAssembly | Unknown_112_trimmed_SPAdesAssembly | Unknown_114_trimmed_SPAdesAssembly | Unknown_116_SPAdes.assembly | Unknown_117_trimmed_SPAdesAssembly |
| # contigs (>= 0 bp) | 97 | 36 | 54 | 38 | 22 | 30 |
| # contigs (>= 1000 bp) | 78 | 34 | 49 | 33 | 19 | 27 |
| # contigs (>= 10000 bp) | 46 | 29 | 36 | 26 | 13 | 23 |
| # contigs (>= 100000 bp) | 16 | 14 | 18 | 6 | 7 | 14 |
| # contigs (>= 1000000 bp) | 0 | 0 | 0 | 1 | 1 | 1 |
| Total length (>= 0 bp) | 5714291 | 4381526 | 5332774 | 6065181 | 5444016 | 4809961 |
| Total length (>= 1000 bp) | 5702513 | 4380170 | 5329626 | 6061940 | 5441622 | 4807599 |
| Total length (>= 10000 bp) | 5554900 | 4373074 | 5277792 | 6031872 | 5428326 | 4796337 |
| Total length (>= 100000 bp) | 4547690 | 3452272 | 4503069 | 5155661 | 5104603 | 4361854 |
| Total length (>= 1000000 bp) | 0 | 0 | 0 | 4198608 | 2729508 | 1587707 |
| # contigs | 97 | 36 | 54 | 38 | 22 | 30 |
| Largest contig | 699747 | 379195 | 495998 | 4198608 | 2729508 | 1587707 |
| Total length | 5714291 | 4381526 | 5332774 | 6065181 | 5444016 | 4809961 |
| GC (%) | 35.33 | 43.42 | 40.51 | 37.45 | 43.47 | 37.09 |
| N50 | 270478 | 242411 | 241338 | 4198608 | 2729508 | 474952 |
| N75 | 144666 | 137046 | 156090 | 367872 | 432679 | 135982 |
| L50 | 7 | 7 | 8 | 1 | 1 | 3 |
| L75 | 14 | 13 | 15 | 2 | 4 | 9 |
| # N's per 100 kbp | 41.00 | 18.03 | 55.07 | 16.24 | 10.58 | 10.19 |
| # predicted genes (unique) | 1771 | 2963 | 2864 | 2626 | 3364 | 1856 |
| # predicted genes (>= 0 bp) | 1766 + 5 part | 2960 + 3 part | 2864 + 0 part | 2624 + 3 part | 3357 + 7 part | 1856 + 0 part |
| # predicted genes (>= 300 bp) | 1716 + 5 part | 2856 + 3 part | 2797 + 0 part | 2560 + 3 part | 3265 + 7 part | 1828 + 0 part |
| # predicted genes (>= 1500 bp) | 604 + 3 part | 666 + 1 part | 757 + 0 part | 822 + 0 part | 838 + 3 part | 708 + 0 part |
| # predicted genes (>= 3000 bp) | 166 + 2 part | 111 + 0 part | 150 + 0 part | 196 + 0 part | 169 + 3 part | 200 + 0 part |
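The N50/L50 and N75/L75 values in the QUAST reports follow the usual definition: sort contigs from longest to shortest and walk down the list until the running total reaches half (or three quarters) of the assembly length; Nx is the length of the contig at which that happens and Lx is the number of contigs needed. A minimal Python sketch, assuming the 500 bp minimum contig length noted in the QUAST reports (the function name `nx_lx` is hypothetical, not QUAST code):

```python
from typing import Iterable, Tuple


def nx_lx(contig_lengths: Iterable[int], fraction: float = 0.5,
          min_length: int = 500) -> Tuple[int, int]:
    """Return (Nx, Lx) for contigs of at least min_length bp.

    fraction=0.5 gives (N50, L50); fraction=0.75 gives (N75, L75).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    lengths = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    if not lengths:
        raise ValueError("no contigs at or above min_length")
    target = fraction * sum(lengths)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= target:
            return length, count
    raise AssertionError("unreachable: the running total always reaches the target")


if __name__ == "__main__":
    toy = [80, 70, 50, 40, 30, 20]          # toy assembly, 290 bp in total
    print(nx_lx(toy, min_length=0))         # (70, 2)
    print(nx_lx(toy, 0.75, min_length=0))   # (40, 4)
    # SC114: largest contig vs. all remaining sequence lumped together (illustrative only)
    print(nx_lx([4_198_608, 6_065_181 - 4_198_608]))  # (4198608, 1)
```

For SC114, the largest contig (4,198,608 bp) alone exceeds half of the 6,065,181 bp assembly, which is why its N50 equals its largest contig and its L50 is 1.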

Taxonomic Identification

To assign each isolate to a genus and, where possible, a species, we ran GTDB-Tk v1.7.0 (16–18) on the draft genome assemblies. Each identification was independently obtained with the TYGS type strain server (21). Full taxonomies are given in the individual Narratives; genus and species are summarized here:

| Isolate | Taxonomic identification |
|---|---|
| SienaCollegeUnknown_107 | Bacillus pseudomycoides, strain SC107 |
| SienaCollegeUnknown_111 | Rossellomorea sp., strain SC111 |
| SienaCollegeUnknown_112 | Peribacillus frigoritolerans, strain SC112 |
| SienaCollegeUnknown_114 | Priestia megaterium, strain SC114 |
| SienaCollegeUnknown_116 | Paenibacillus sp., strain SC116 |
| SienaCollegeUnknown_117 | Lysinibacillus fusiformis, strain SC117 |
GTDB-Tk classification against the Genome Taxonomy Database (GTDB) release R06-RS202 updated the taxonomy and taxon_assignment fields of the following Genome objects:
  • Unknown_107_annotated_prokka
  • Unknown_111_annotated_prokka
  • Unknown_112_annotated_Prokka
  • Unknown_114_annotated_prokka
  • Unknown_116_annotated_prokka
  • Unknown_117_annotated_Prokka
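Two isolates (SC111 and SC116) are reported only to genus. As a simplified sketch of the kind of species rule GTDB-Tk applies (our own approximation, not GTDB-Tk code): a genome receives the species name of its closest GTDB representative only when the average nucleotide identity (ANI) is within that species' circumscription radius (typically 95%) and the alignment fraction (AF) is at least 0.65; otherwise the species is left unassigned. The thresholds as defaults, the function name, and the example inputs are illustrative assumptions.

```python
from typing import Optional


def assign_species(genus: str, closest_species: Optional[str],
                   ani: Optional[float], af: Optional[float],
                   ani_radius: float = 95.0, min_af: float = 0.65) -> str:
    """Simplified GTDB-style species call; returns '<genus> sp.' when thresholds are not met."""
    if closest_species is None or ani is None or af is None:
        return f"{genus} sp."  # no reference genome close enough to compare
    if ani >= ani_radius and af >= min_af:
        return closest_species
    return f"{genus} sp."


if __name__ == "__main__":
    # Hypothetical ANI/AF values for illustration; not results for these isolates.
    print(assign_species("Priestia", "Priestia megaterium", 98.0, 0.90))
    # "Paenibacillus hypotheticus" is a made-up closest-reference name.
    print(assign_species("Paenibacillus", "Paenibacillus hypotheticus", 88.0, 0.70))
```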
A GenomeSet object, 6_bacilli_genomes, was built with KButil_Build_GenomeSet; it contains 6 genomes.
The GenomeSet was then inserted into a KBase SpeciesTree. The resulting tree files are available only in the live Narrative (https://narrative.kbase.us/narrative/122484):
  • Tree_with_6_isolates.newick
  • Tree_with_6_isolates-labels.newick
  • Tree_with_6_isolates.png
  • Tree_with_6_isolates.pdf

References

  1. Mandic-Mulec I, Stefanic P, van Elsas JD. 2015. Ecology of Bacillaceae. Microbiol Spectr 3:TBS-0017-2013.
  2. Grady EN, MacDonald J, Liu L, Richman A, Yuan Z-C. 2016. Current knowledge and perspectives of Paenibacillus: a review. Microb Cell Fact 15:203.
  3. De Vos P, Garrity GM, Jones D, Krieg NR, Ludwig W, Rainey FA, Schleifer K-H, Whitman WB (ed). Bergey's Manual of Systematic Bacteriology, 2nd ed. Springer.
  4. Gupta RS, Patel S, Saini N, Chen S. 2020. Robust demarcation of 17 distinct Bacillus species clades, proposed as novel Bacillaceae genera, by phylogenomics and comparative genomic analyses: description of Robertmurraya kyonggiensis sp. nov. and proposal for an emended genus Bacillus limiting it only to the members of the Subtilis and Cereus clades of species. Int J Syst Evol Microbiol 70:5753–5798.
  5. Patel S, Gupta RS. 2020. A phylogenomic and comparative genomic framework for resolving the polyphyly of the genus Bacillus: Proposal for six new genera of Bacillus species, Peribacillus gen. nov., Cytobacillus gen. nov., Mesobacillus gen. nov., Neobacillus gen. nov., Metabacillus gen. nov. and Alkalihalobacillus gen. nov. Int J Syst Evol Microbiol 70:406–438.
  6. Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, Dehal P, Ware D, Perez F, Canon S, Sneddon MW, Henderson ML, Riehl WJ, Murphy-Olson D, Chan SY, Kamimura RT, Kumari S, Drake MM, Brettin TS, Glass EM, Chivian D, Gunter D, Weston DJ, Allen BH, Baumohl J, Best AA, Bowen B, Brenner SE, Bun CC, Chandonia J-M, Chia J-M, Colasanti R, Conrad N, Davis JJ, Davison BH, DeJongh M, Devoid S, Dietrich E, Dubchak I, Edirisinghe JN, Fang G, Faria JP, Frybarger PM, Gerlach W, Gerstein M, Greiner A, Gurtowski J, Haun HL, He F, Jain R, Joachimiak MP, Keegan KP, Kondo S, Kumar V, Land ML, Meyer F, Mills M, Novichkov PS, Oh T, Olsen GJ, Olson R, Parrello B, Pasternak S, Pearson E, Poon SS, Price GA, Ramakrishnan S, Ranjan P, Ronald PC, Schatz MC, Seaver SMD, Shukla M, Sutormin RA, Syed MH, Thomason J, Tintle NL, Wang D, Xia F, Yoo H, Yoo S, Yu D. 2018. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol 36:566–569.
  7. Allen B, Drake M, Harris N, Sullivan T. 2017. Using KBase to Assemble and Annotate Prokaryotic Genomes. Curr Protoc Microbiol 46:1E.13.1-1E.13.18.
  8. Andrews S. 2019. FastQC: a quality control tool for high throughput sequence data, v0.11.9. http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
  9. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120.
  10. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477.
  11. Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. 2020. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics 70:e102.
  12. Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30:2068–2069.
  13. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O. 2008. The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9:75.
  14. Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, Edwards RA, Gerdes S, Parrello B, Shukla M, Vonstein V, Wattam AR, Xia F, Stevens R. 2014. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res 42:D206-214.
  15. Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, Olson R, Overbeek R, Parrello B, Pusch GD, Shukla M, Thomason JA, Stevens R, Vonstein V, Wattam AR, Xia F. 2015. RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep 5:8365.
  16. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. 2020. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36:1925–1927.
  17. Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil P-A, Hugenholtz P. 2018. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36:996–1004.
  18. Parks DH, Chuvochina M, Chaumeil P-A, Rinke C, Mussig AJ, Hugenholtz P. 2020. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol 38:1079–1086.
  19. Shaffer M, Borton MA, McGivern BB, Zayed AA, La Rosa SL, Solden LM, Liu P, Narrowe AB, Rodríguez-Ramos J, Bolduc B, Gazitúa MC, Daly RA, Smith GJ, Vik DR, Pope PB, Sullivan MB, Roux S, Wrighton KC. 2020. DRAM for distilling microbial metabolism to automate the curation of microbiome function. Nucleic Acids Res 48:8883–8900.
  20. Meier-Kolthoff JP, Göker M. 2019. TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy. Nat Commun 10:2182.
  21. Meier-Kolthoff JP, Carbasse JS, Peinado-Olarte RL, Göker M. 2022. TYGS and LPSN: a database tandem for fast and reliable genome-based classification and nomenclature of prokaryotes. Nucleic Acids Res 50:D801–D807.

Apps

  1. Assess Quality of Assemblies with QUAST - v4.4
    • [1] Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–1075. doi:10.1093/bioinformatics/btt086
    • [2] Mikheenko A, Valin G, Prjibelski A, Saveliev V, Gurevich A. Icarus: visualizer for de novo assembly evaluation. Bioinformatics. 2016;32:3321–3323. doi:10.1093/bioinformatics/btw379
  2. Build GenomeSet - v1.7.6
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  3. Classify Microbes with GTDB-Tk - v1.7.0
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925–1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996–1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;38:1079–1086. DOI:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
  4. Insert Set of Genomes Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. doi:10.1371/journal.pone.0009490