Generated March 21, 2025

Draft genome assemblies from seven Bacillaceae isolates from woodland soil

Abstract

We isolated seven endospore-forming bacteria from campus woodland soil and sequenced their genomes using Illumina sequencing. We share the draft genome assemblies for strains Bacillus wiedmannii SC129, Bacillus pseudomycoides SC131, Bacillus pumilus SC133, Peribacillus butanolivorans SC135, Bacillus thuringiensis SC136, Priestia megaterium SC138, and Bacillus wiedmannii SC141. Draft genomes range from 3,645,032 to 5,969,865 bp in size and from 34.8% to 41.2% in GC content.

Introduction

Endospore-forming bacteria are frequently found in a variety of environments, including terrestrial soils, and many are considered to be plant growth-promoting bacteria (1). The publication by Anna L. McLoon, Julia Barker, Gillian Churan, Anthony Cucca, Lillian Gardner, Justin Gejo, Francesca Gerbasi, Michaela Higgins, Lillian Kronau, Jalin Le, Alyssa Lunman, Kelly Maune, Veezen Denise Mondelo, Madeline Naef, Caitlin Rigby, and Caitlin Spiliotis can be found here: [URL]

Table of Contents

  1. Background and Experimental Methods
  2. Import and annotation
  3. QC, Assembly, and Annotation
  4. Taxonomic Classification
  5. References

Narratives and data were generated by: Anna L. McLoon, Julia Barker, Gillian Churan, Anthony Cucca, Lillian Gardner, Justin Gejo, Francesca Gerbasi, Michaela Higgins, Lillian Kronau, Jalin Le, Alyssa Lunman, Kelly Maune, Veezen Denise Mondelo, Madeline Naef, Caitlin Rigby, Caitlin Spiliotis.

This umbrella narrative was created by Anna L. McLoon.

Background and Experimental Methods

Sample Collection

On January 23, 2024, we collected two soil cores from wooded areas of a suburban college campus: one from a sloped, wooded area between buildings (42.7205, -73.7534) and one from a wooded area near a small wetland (42.7199, -73.7488). At the wooded slope between campus buildings (42.7205, -73.7534), strain Bacillus wiedmannii SC129 was isolated from the organic horizon, and strains Priestia megaterium SC138 and Bacillus wiedmannii SC141 were isolated from the mineral horizon. At the wooded area near the small wetland (42.7199, -73.7488), strains Bacillus pumilus SC133 and Peribacillus butanolivorans SC135 were isolated from the organic horizon, and strains Bacillus pseudomycoides SC131 and Bacillus thuringiensis SC136 were isolated from the mineral horizon. We separated the mineral and organic horizons, and to isolate individual endospores, we mixed approximately 100 μl of soil with 1 ml of sterile water, vortexed, and heated at 95°C for 10 minutes.

We characterized the sites and soil samples by measuring temperature, leaf litter depth, soil moisture, and soil pH using both water and calcium chloride solution (2-4):

 

Table 1: Characterization of soil samples

| Site | Horizon | Coordinates | Temperature | Leaf litter depth | % moisture | pH (water) | pH (calcium chloride) |
|---|---|---|---|---|---|---|---|
| Site 1 | organic | 42.7204552, -73.7533970 | 3°C / 37.4°F | 2 cm | 38.9 | 7.45 | 6.48 |
| Site 1 | mineral | 42.7204552, -73.7533970 | 3°C / 37.4°F | 2 cm | 18.72 | 6.3 | 5.19 |
| Site 2 | organic | 42.7198559, -73.7488097 | 2.0°C / 37.4°F | 4 cm | 33.4 | 7.04 | 6.22 |
| Site 2 | mineral | 42.7198559, -73.7488097 | 2.0°C / 37.4°F | 4 cm | 20.9 | 6.04 | 5.01 |

 

Isolation

Suspended cells were spread onto tryptic soy agar + 5% sheep blood using a sterile swab and incubated overnight at 37°C as described previously (5,6). Isolates were colony purified on tryptic soy agar plates. Tryptic soy broth cultures were grown for approximately 4 hours, then DNA was isolated using a Promega DNA wizard kit, following peptidoglycan digestion for an hour (1.15 μg/μl) in 50 mM EDTA.

Genome Sequencing

Genomic DNA libraries were prepared using the tagmentation-based Illumina DNA prep kit and IDT 10 bp unique dual indices and sequenced as paired-end 151 bp reads by SeqCenter (Pittsburgh, PA) using an Illumina NovaSeq X Plus sequencer. Demultiplexing, quality control, and adapter trimming were performed by SeqCenter with bcl-convert v4.2.4. Accession numbers and individual KBase narratives are listed in Table 2.

Table 2: Access to Narrative workflows and sequence data

| Strain | KBase narrative used for analyses | BioProject accession number | SRA BioSample | GenBank accession |
|---|---|---|---|---|
| Bacillus wiedmannii strain SC129 | https://kbase.us/n/172257/32/ | PRJNA862062 | SAMN43273684 | JBLOJC000000000 |
| Bacillus pseudomycoides strain SC131 | https://kbase.us/n/172258/23/ | PRJNA862062 | SAMN43273685 | JBLOJB000000000 |
| Bacillus pumilus strain SC133 | https://kbase.us/n/172259/45/ | PRJNA862062 | SAMN43273686 | JBLOJA000000000 |
| Peribacillus butanolivorans strain SC135 | https://kbase.us/n/172260/62/ | PRJNA862062 | SAMN43273687 | JBLOIZ000000000 |
| Bacillus thuringiensis strain SC136 | https://kbase.us/n/172261/23/ | PRJNA862062 | SAMN43273688 | JBLOIY000000000 |
| Priestia megaterium strain SC138 | https://kbase.us/n/172262/51/ | PRJNA862062 | SAMN43273689 | JBLOIX000000000 |
| Bacillus wiedmannii strain SC141 | https://kbase.us/n/172263/26/ | PRJNA862062 | SAMN43273690 | JBLOIW000000000 |

QC, Assembly, Annotation, and Taxonomic Classification

Sequence reads from each isolate were imported into separate narratives in the KBase environment for analysis (7,8). We checked read quality with FastQC v0.12.1, trimmed reads with Trimmomatic v0.36, assembled genomes de novo using SPAdes v3.15.3, and annotated the assemblies using RASTtk v1.073, Prokka v1.14.5, and DRAM v0.1.2 (9-18). We determined probable species identities using TYGS and GTDB-Tk v1.7.0, and results were concordant between programs (19-23). All programs were run using default parameters. The draft assemblies consist of 27 to 100 contigs and range in size from 3,645,032 to 5,969,865 bp (see the data summary table). For the relative placement of the strains in a phylogenetic tree, see the tree built below in this umbrella narrative. We also ran the draft genome assemblies through the antiSMASH secondary metabolite prediction program (24, 25).

Draft genomes range in size from 3,645,032 bp to 5,969,865 bp and in GC content from 34.8% to 41.2%. Each strain is predicted to make 8 to 16 unique secondary metabolites, including terpenes, RiPPs, non-ribosomally synthesized peptides, and NI-siderophores, among others. While many strains are predicted to make at least one well-characterized natural product, such as paeninodin or petrobactin, many of the operons identified have low or no similarity to known secondary metabolites (24–28). Therefore, these strains and genomes represent useful additions to our knowledge of soil microbes and potential sources of beneficial natural products.
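The summary statistics reported for each assembly (number of contigs, total length, N50, and % GC) can be recomputed from a contig FASTA file. The following is a minimal sketch in plain Python, not the KBase implementation; the function names and the toy sequences are hypothetical and used only for illustration. N50 is taken here as the length of the contig at which the running total, counted from the longest contig down, first reaches half of the assembly length.

```python
def read_contigs(fasta_text):
    """Return a list of contig sequences from FASTA-formatted text."""
    contigs, current = [], []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A header line closes the previous record, if any.
            if current:
                contigs.append("".join(current))
            current = []
        elif line:
            current.append(line.upper())
    if current:
        contigs.append("".join(current))
    return contigs


def n50(lengths):
    """Length of the contig at which the cumulative sum reaches half the total."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def gc_percent(contigs):
    """Percentage of G and C bases across all contigs."""
    total = sum(len(c) for c in contigs)
    gc = sum(c.count("G") + c.count("C") for c in contigs)
    return 100 * gc / total if total else 0.0


def assembly_stats(fasta_text):
    """Summarize an assembly given as FASTA text."""
    contigs = read_contigs(fasta_text)
    lengths = [len(c) for c in contigs]
    return {
        "contigs": len(contigs),
        "total_length": sum(lengths),
        "n50": n50(lengths),
        "gc_percent": gc_percent(contigs),
    }


# Toy input (hypothetical sequences, not data from this study).
toy_fasta = ">a\nGGCC\nAATT\n>b\nGCAT\n>c\nAT\n"
stats = assembly_stats(toy_fasta)
print(stats)
```

For a real assembly, the text of the downloaded contig FASTA file would be passed to `assembly_stats` in place of `toy_fasta`.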

 

Table 3: Data summary

| Strain | SC129 | SC131 | SC133 | SC135 | SC136 | SC138 | SC141 |
|---|---|---|---|---|---|---|---|
| Site | 42.7204552, -73.7533970 | 42.7198559, -73.7488097 | 42.7198559, -73.7488097 | 42.7198559, -73.7488097 | 42.7198559, -73.7488097 | 42.7204552, -73.7533970 | 42.7204552, -73.7533970 |
| Soil horizon | organic | mineral | organic | organic | mineral | mineral | mineral |
| Species | Bacillus wiedmannii | Bacillus pseudomycoides | Bacillus pumilus | Peribacillus butanolivorans | Bacillus thuringiensis | Priestia megaterium | Bacillus wiedmannii |
| Number of reads | 5,545,784 | 6,877,022 | 6,439,688 | 6,974,014 | 6,648,606 | 5,923,568 | 5,481,716 |
| Number of contigs | 31 | 100 | 40 | 46 | 31 | 41 | 27 |
| Total length (bp) | 5,485,886 | 5,216,294 | 3,645,032 | 5,859,099 | 5,969,865 | 5,853,104 | 5,671,044 |
| N50 (bp) | 781,638 | 195,779 | 218,892 | 299,378 | 634,513 | 903,089 | 927,123 |
| % GC | 35.2 | 35.5 | 41.2 | 37.9 | 34.8 | 37.5 | 35.1 |
| Predicted genes (Prokka via KBase) | 5,742 | 5,258 | 3,695 | 5,635 | 5,953 | 6,063 | 5,762 |
| Number of predicted secondary metabolite operons (antiSMASH) | 12 | 12 | 10 | 8 | 16 | 8 | 10 |

Predicted natural product types (specific known cluster name and similarity), by strain:
  • SC129: terpene (molybdenum cofactor 17%), 4 NRPS (anachelin 10%, bacillibactin 85%), 3 RiPP-like, betalactone (fengycin 40%), LAP, NI-siderophore (petrobactin 100%), cyclic lactone autoinducer/thiopeptide
  • SC131: lassopeptide (paeninodin 100%), LAP, 5 NRPS (bacillibactin 85%, desmamide A/B/C 18%), lanthipeptide class II (plantaricin W 66%), terpene, betalactone (fengycin 40%), 2 RiPP-like
  • SC133: 2 betalactone (fengycin 53%), other (bacilysin 85%), T1PKS (zwittermicin A 18%), NRPS (lichenysin 92%), NI-siderophore (schizokinen 60%), terpene, T3PKS, RRE-containing, RiPP-like
  • SC135: 2 terpene, 2 lassopeptide (paeninodin 100%), betalactone (fengycin 46%), NI-siderophore (schizokinen 60%), LAP, T3PKS
  • SC136: LAP, 6 NRPS (bacillibactin 85%), NI-siderophore (petrobactin 100%), betalactone (fengycin 40%), 2 RiPP-like, NRPS-like, terpene (molybdenum cofactor 17%), ranthipeptide, 2 RRE-containing (thuricin CD 100%)
  • SC138: 3 terpene (carotenoid 50%, surfactin 13%), phosphonate, RRE-containing, T3PKS, NI-siderophore (schizokinen 62%), lanthipeptide class I
  • SC141: LAP, 3 NRPS (bacillibactin 85%), NI-siderophore (petrobactin 100%), betalactone (fengycin 40%), lassopeptide (paeninodin 100%), terpene, 2 RiPP-like
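
The read counts and assembly lengths in the data summary table allow a rough estimate of sequencing depth for each strain. The sketch below is only a back-of-the-envelope calculation under stated assumptions that are not confirmed by the source: that "number of reads" counts individual reads rather than read pairs, that every read is the full 151 bp, and that the assembly length approximates the genome size. Because reads were trimmed, the values should be read as approximate upper bounds. The function name `max_depth` is hypothetical.

```python
READ_LENGTH_BP = 151  # paired-end read length stated in the sequencing methods

# (number of reads, total assembly length in bp), copied from the data summary table
STRAINS = {
    "SC129": (5_545_784, 5_485_886),
    "SC131": (6_877_022, 5_216_294),
    "SC133": (6_439_688, 3_645_032),
    "SC135": (6_974_014, 5_859_099),
    "SC136": (6_648_606, 5_969_865),
    "SC138": (5_923_568, 5_853_104),
    "SC141": (5_481_716, 5_671_044),
}


def max_depth(reads, assembly_bp, read_len=READ_LENGTH_BP):
    """Approximate depth = total sequenced bases / assembly length.

    Assumes untrimmed, full-length reads, so this is an upper bound.
    """
    return reads * read_len / assembly_bp


for strain, (reads, length) in STRAINS.items():
    print(f"{strain}: at most about {max_depth(reads, length):.0f}x")
```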

Build GenomeSet (v1.7.6)
Allows users to create a GenomeSet object.
This app completed without errors in 46s.
Objects

| Created Object Name | Type | Description |
|---|---|---|
| Soil_endospore_formers | GenomeSet | KButil_Build_GenomeSet |

Summary
genomes in output set Soil_endospore_formers: 7

Insert Set of Genomes Into SpeciesTree (v2.2.0)
Add a user-provided GenomeSet to a KBase SpeciesTree.
This app completed without errors in 5m 39s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/189485
  • soil_endospore_former_tree.newick
  • soil_endospore_former_tree-labels.newick
  • soil_endospore_former_tree.png
  • soil_endospore_former_tree.pdf
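
The Newick files listed above store the tree as nested parentheses, with leaf names followed by branch lengths. As a minimal sketch of how the leaf names could be listed once a file has been downloaded from the live Narrative, the helper below extracts them with a regular expression. It assumes unquoted labels that contain no parentheses, colons, commas, or semicolons, which may not hold for every tree; the function name and the example tree are hypothetical.

```python
import re


def leaf_labels(newick):
    """Return leaf labels from a Newick string with simple, unquoted labels.

    A leaf label is the text that directly follows "(" or "," and runs
    until the next ":", ",", ")" or ";". Labels written after ")" belong
    to internal nodes (for example support values) and are skipped.
    """
    return [m.strip() for m in re.findall(r"[(,]\s*([^():,;]+)", newick)]


# Hypothetical toy tree, not the tree produced in this Narrative.
toy_tree = "((strain_A:0.1,strain_B:0.2)0.9:0.3,strain_C:0.4);"
print(leaf_labels(toy_tree))
```

For trees with quoted or otherwise complex labels, a dedicated phylogenetics library would be a safer choice than this regular expression.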

References

 

  1. Tsotetsi T, Nephali L, Malebe M, Tugizimana F. 2022. Bacillus for Plant Growth Promotion and Stress Resilience: What Have We Learned? Plants (Basel) 11:2482.
  2. Dow E, McLoon AL, Goller CC, Wood-Charlson E, Allen B, Schirmer A, Meier C. 2025. Soil microbiome sample collection protocol (adapted from NEON). protocols.io DOI:
  3. Dow E, McLoon AL, Wood-Charlson E, Goller CC, Schirmer A, Allen B, Meier C. 2025. Soil moisture measurements (adapted from NEON protocols). protocols.io DOI:
  4. Dow E, McLoon AL, Goller CC, Allen B, Wood-Charlson E, Schirmer A, Meier C. 2025. Soil pH measurements (adapted from NEON protocols). protocols.io DOI:
  5. McLoon AL, Awad TT, Bogardus MF, Buono MG, Devine KA, Draper RM, Femenella B, Gallagher HM, Morelock LA, Razi M, Rennick JR, Sheridan AK, Thibault RJ, Touchette KL, Zuchowski GE. 2022. Draft Genome Sequences for 6 Isolates of Endospore-Forming Class Bacilli Species Isolated from Soil from a Suburban, Wooded, Developed Space. Microbiol Resour Announc 11:e0087422.
  6. McLoon AL, Ackaah Asante P, Anderson T, Cahill K, Cochrane D, Cohen K, German J, Hrubes CM, LaCroix I, McNamee K, Mossakowski A, Nichter AM, Pepe JL, Schofield AT. 2024. Five draft genome assemblies from Bacillaceae isolated from a degraded wetland environment. Microbiol Resour Announc 13:e0084523.
  7. Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, Dehal P, Ware D, Perez F, Canon S, Sneddon MW, Henderson ML, Riehl WJ, Murphy-Olson D, Chan SY, Kamimura RT, Kumari S, Drake MM, Brettin TS, Glass EM, Chivian D, Gunter D, Weston DJ, Allen BH, Baumohl J, Best AA, Bowen B, Brenner SE, Bun CC, Chandonia J-M, Chia J-M, Colasanti R, Conrad N, Davis JJ, Davison BH, DeJongh M, Devoid S, Dietrich E, Dubchak I, Edirisinghe JN, Fang G, Faria JP, Frybarger PM, Gerlach W, Gerstein M, Greiner A, Gurtowski J, Haun HL, He F, Jain R, Joachimiak MP, Keegan KP, Kondo S, Kumar V, Land ML, Meyer F, Mills M, Novichkov PS, Oh T, Olsen GJ, Olson R, Parrello B, Pasternak S, Pearson E, Poon SS, Price GA, Ramakrishnan S, Ranjan P, Ronald PC, Schatz MC, Seaver SMD, Shukla M, Sutormin RA, Syed MH, Thomason J, Tintle NL, Wang D, Xia F, Yoo H, Yoo S, Yu D. 2018. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol 36:566–569.
  8. Allen B, Drake M, Harris N, Sullivan T. 2017. Using KBase to Assemble and Annotate Prokaryotic Genomes. Curr Protoc Microbiol 46:1E.13.1-1E.13.18.
  9. Andrews S. 2019. FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc (0.11.9).
  10. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120.
  11. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477.
  12. Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. 2020. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics 70:e102.
  13. Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30:2068–2069.
  14. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O. 2008. The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9:75.
  15. Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, Edwards RA, Gerdes S, Parrello B, Shukla M, Vonstein V, Wattam AR, Xia F, Stevens R. 2014. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res 42:D206–D214.
  16. Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, Olson R, Overbeek R, Parrello B, Pusch GD, Shukla M, Thomason JA, Stevens R, Vonstein V, Wattam AR, Xia F. 2015. RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep 5:8365.
  17. Shaffer M, Borton MA, McGivern BB, Zayed AA, La Rosa SL, Solden LM, Liu P, Narrowe AB, Rodríguez-Ramos J, Bolduc B, Gazitúa MC, Daly RA, Smith GJ, Vik DR, Pope PB, Sullivan MB, Roux S, Wrighton KC. 2020. DRAM for distilling microbial metabolism to automate the curation of microbiome function. Nucleic Acids Res 48:8883–8900.
  18. Li W, O’Neill KR, Haft DH, DiCuccio M, Chetvernin V, Badretdin A, Coulouris G, Chitsaz F, Derbyshire MK, Durkin AS, Gonzales NR, Gwadz M, Lanczycki CJ, Song JS, Thanki N, Wang J, Yamashita RA, Yang M, Zheng C, Marchler-Bauer A, Thibaud-Nissen F. 2021. RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation. Nucleic Acids Res 49:D1020–D1028.
  19. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. 2019. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics btz848.
  20. Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil P-A, Hugenholtz P. 2018. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36:996–1004.
  21. Parks DH, Chuvochina M, Chaumeil P-A, Rinke C, Mussig AJ, Hugenholtz P. 2020. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol 38:1079–1086.
  22. Meier-Kolthoff JP, Carbasse JS, Peinado-Olarte RL, Göker M. 2022. TYGS and LPSN: a database tandem for fast and reliable genome-based classification and nomenclature of prokaryotes. Nucleic Acids Res 50:D801–D807.
  23. Meier-Kolthoff JP, Göker M. 2019. TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy. Nat Commun 10:2182.
  24. Blin K, Shaw S, Augustijn HE, Reitz ZL, Biermann F, Alanjary M, Fetter A, Terlouw BR, Metcalf WW, Helfrich EJN, van Wezel GP, Medema MH, Weber T. 2023. antiSMASH 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation. Nucleic Acids Res 51:W46–W50.
  25. Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, Fischbach MA, Weber T, Takano E, Breitling R. 2011. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res 39:W339–W346.
  26. Barbeau K, Zhang G, Live DH, Butler A. 2002. Petrobactin, a photoreactive siderophore produced by the oil-degrading marine bacterium Marinobacter hydrocarbonoclasticus. J Am Chem Soc 124:378–379.
  27. Koppisch AT, Browder CC, Moe AL, Shelley JT, Kinkel BA, Hersman LE, Iyer S, Ruggiero CE. 2005. Petrobactin is the primary siderophore synthesized by Bacillus anthracis str. Sterne under conditions of iron starvation. Biometals 18:577–585.
  28. Zhu S, Hegemann JD, Fage CD, Zimmermann M, Xie X, Linne U, Marahiel MA. 2016. Insights into the Unique Phosphorylation of the Lasso Peptide Paeninodin. J Biol Chem 291:13662–13678.

Apps

  1. Build GenomeSet - v1.7.6
    • Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  2. Insert Set of Genomes Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2 Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5. doi:10.1371/journal.pone.0009490