Generated May 2, 2022

Complete Genome Sequence of Bacillus cereus Strain CPT56D-587-MTF Isolated from a Nitrate- and Metal-Contaminated Subsurface Environment

Introduction

Bacillus cereus strain CPT56D-587-MTF (also referred to as CPTF) was isolated from a nitrate- and heavy-metal-contaminated site at the Oak Ridge Field Research Center (ORFRC) in Oak Ridge, Tennessee, USA (1). Strain CPTF shares 100% 16S rRNA gene identity with the most abundant 16S rRNA gene V4-region amplicon sequence variant (ASV) in soils from the highly contaminated Area 3, the site immediately adjacent to the contamination source, the former S-3 ponds.

Publication

Goff, Jennifer L., Lauren M. Lui, Torben N. Nielsen, Michael P. Thorgersen, Elizabeth G. Szink, John-Marc Chandonia, Farris L. Poole, Jizhong Zhou, Terry C. Hazen, Adam P. Arkin, and Michael W. W. Adams. "Complete Genome Sequence of Bacillus cereus Strain CPT56D-587-MTF, Isolated from a Nitrate- and Metal-Contaminated Subsurface Environment." Microbiology Resource Announcements (e00145-22). https://doi.org/10.1128/mra.00145-22.

External Data Availability

The whole-genome sequencing project has been deposited in GenBank under assembly accession GCA_021391515.1. The raw sequence reads have been deposited in the SRA under BioProject accession PRJNA791653.

Table of Contents

  1. Background and Experimental Methods
  2. QC and Assembly
  3. Import and Annotation
  4. Taxonomic Classification
  5. Metabolic Modeling and Flux Balance Analysis
  6. References
Narrative created by Jennifer L. Goff

Background and Experimental Methods

Sample Collection


A soil sample (long. -84.27335°, lat. 35.977268°, depth 535.94 cm) was collected in October 2020 from Oak Ridge Reservation (ORR) Area 3, which lies immediately adjacent to the former S-3 ponds. Samples were stored at -20 °C until use.

Isolation

  1. Initial enrichment was carried out anoxically in R2A medium amended with 10 mM nitrite and 100 mM KH2PO4 (pH adjusted to 5.5) and inoculated with ~1 g of soil.
  2. After one week of incubation at room temperature, isolates were obtained by streak-plating onto LB agar plates, and isolated colonies were selected for further characterization.
  3. Purity of the isolate was confirmed by Gram staining and microscopy, by streak-plating and observation of colony morphology, and by Sanger sequencing of the 16S rRNA gene.
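For anyone preparing the enrichment medium, the millimolar amendments translate into masses per liter via mass = concentration × molar mass × volume. This is an illustrative sketch only: the text does not name the nitrite salt, so sodium nitrite (NaNO2) is assumed here, and the helper `grams_needed` is hypothetical.

```python
# Molar masses (g/mol). NaNO2 is an ASSUMED nitrite source; the protocol
# only specifies "10 mM nitrite".
MOLAR_MASS_G_PER_MOL = {
    "KH2PO4": 136.09,
    "NaNO2": 69.00,
}


def grams_needed(conc_mM: float, molar_mass: float, volume_L: float = 1.0) -> float:
    """Mass of solute (g) giving conc_mM in volume_L: (mmol/L / 1000) * g/mol * L."""
    return conc_mM / 1000.0 * molar_mass * volume_L


# Amendments to 1 L of R2A medium as described in step 1.
amendments_mM = {"KH2PO4": 100.0, "NaNO2": 10.0}

for salt, conc in amendments_mM.items():
    grams = grams_needed(conc, MOLAR_MASS_G_PER_MOL[salt])
    print(f"{salt}: {conc:g} mM -> {grams:.2f} g per liter")
```

Under these assumptions, 1 L of medium needs about 13.61 g of KH2PO4 and 0.69 g of NaNO2; the pH adjustment to 5.5 is a separate step not modeled here.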

Genome Sequencing

  1. To obtain a cell pellet for genomic DNA extraction, strain CPTF was grown in R2A medium at 30 °C with shaking at 200 rpm for ~24 hours.
  2. For the first round of digestion, the cell pellet was resuspended in 750 µL of PBS, 25 µL of MetaPolyzyme (Sigma-Aldrich), and 25 µL of Qiagen lytic enzyme solution and incubated at 37 °C for 30 minutes.
  3. The second round of digestion was performed with 167 µL of 6X Qiagen Buffer B1 (300 mM Tris-Cl pH 8.0, 300 mM EDTA pH 8.0, 3% Tween 20, 3% Triton X-100), 35 µL of Proteinase K, and 2 µL of RNase A, with incubation at 50 °C for 30 minutes at 50 rpm.
  4. The lysate was processed with the Genomic-Tip 20/G kit (Qiagen) according to the manufacturer’s directions. The presence of high-molecular-weight (HMW) DNA was confirmed by running the DNA on a 0.5% agarose gel alongside the Quick-Load 1 kb Extend DNA Ladder (New England BioLabs).
  5. The HMW DNA was prepared for nanopore sequencing. End repair was performed using the NEBNext® Companion Module for Oxford Nanopore Technologies® Ligation Sequencing (New England BioLabs) according to the manufacturer’s instructions. The Native Barcoding Expansion (EXP-NBD104, Oxford Nanopore Technologies) and the Ligation Sequencing Kit (SQK-LSK109, Oxford Nanopore Technologies) were used for barcoding and adapter ligation.
  6. For Illumina library preparation, the HMW DNA was sheared with a needle. The Illumina library was made using the Illumina DNA Prep kit according to the manufacturer’s instructions.
  7. The nanopore library was sequenced on an R9.4.1 flow cell on a MinION device (Oxford Nanopore Technologies). The Illumina library was sequenced by Novogene on a NovaSeq 6000 with 2 × 150 bp reads.
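The 6X Buffer B1 in step 3 is diluted by everything already in the tube, so its working strength depends on the total volume. Assuming all reagents from steps 2 and 3 accumulate in the same tube and the cell pellet contributes negligible volume (neither is stated explicitly in the protocol), the C1 × V1 = C2 × V2 relation gives the final concentrations. The function name below is hypothetical.

```python
def final_concentration(stock: float, stock_volume_uL: float, total_volume_uL: float) -> float:
    """Solve C1 * V1 = C2 * V2 for the final concentration C2."""
    return stock * stock_volume_uL / total_volume_uL


# Volumes (uL) from steps 2 and 3. ASSUMPTIONS: all reagents accumulate in one
# tube and the cell pellet contributes negligible volume.
volumes_uL = {
    "PBS": 750,
    "MetaPolyzyme": 25,
    "Qiagen lytic enzyme": 25,
    "6X Buffer B1": 167,
    "Proteinase K": 35,
    "RNase A": 2,
}
total_uL = sum(volumes_uL.values())  # 1004 uL

# Composition of the 6X Buffer B1 as given in step 3.
buffer_b1_6x = {"Tris-Cl (mM)": 300, "EDTA (mM)": 300, "Tween 20 (%)": 3, "Triton X-100 (%)": 3}

b1_strength = final_concentration(6, volumes_uL["6X Buffer B1"], total_uL)
print(f"Total volume: {total_uL} uL; Buffer B1 strength: {b1_strength:.2f}X")
for component, stock in buffer_b1_6x.items():
    final = final_concentration(stock, volumes_uL["6X Buffer B1"], total_uL)
    print(f"  {component}: {final:.2f}")
```

Under these assumptions the buffer ends up at roughly 1X: about 50 mM Tris-Cl and EDTA and about 0.5% of each detergent.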

Genome Assembly

QC and Assembly

  1. For Illumina data, adapters were removed in-house by Novogene, and the reads were further trimmed and quality-filtered with BBTools (https://jgi.doe.gov/data-and-tools/bbtools) as described by Lui et al. (2021) (2).
  2. Nanopore base calling, adapter removal, demultiplexing, and quality filtering were performed with Guppy v4.0.
  3. The genome was assembled using the nanopore and Illumina reads as inputs to the hybrid assembler Unicycler v0.4.8 (3) using default parameters.
  4. Unicycler logs were checked to confirm that the assembly passed quality thresholds and that the contigs were circularized.
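Besides its logs, Unicycler tags circularized sequences in the headers of its final FASTA output. A scripted version of the circularity check in step 4 might look like the sketch below, assuming headers of the form `>1 length=... depth=...x circular=true`; the function name and the example headers are hypothetical and are not taken from the CPTF assembly.

```python
from typing import Dict


def circularity_by_contig(fasta_text: str) -> Dict[str, bool]:
    """Return {contig_id: True/False} based on a 'circular=true' header token."""
    status: Dict[str, bool] = {}
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                continue  # skip malformed, empty headers
            status[fields[0]] = "circular=true" in fields[1:]
    return status


# Hypothetical headers in Unicycler's style (not the actual CPTF output).
example = """>1 length=5000 depth=1.00x circular=true
ACGT
>2 length=800 depth=3.10x circular=true
ACGT
>3 length=300 depth=0.90x
ACGT
"""
status = circularity_by_contig(example)
not_circular = [cid for cid, circ in status.items() if not circ]
print(status)        # {'1': True, '2': True, '3': False}
print(not_circular)  # ['3']
```

Any contig listed in `not_circular` would warrant a closer look at the assembly graph before calling the genome complete.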

Genome Statistics


The completed genome contains 6,548,342 bp in 9 contigs with a G+C content of 35.37%. Contig 1 is the circularized chromosome. Contigs 2-9 are predicted plasmids.
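Total length and G+C content are simple aggregates over all contigs: G+C (%) = 100 × (G + C) / (A + C + G + T). The minimal sketch below uses toy sequences, not the CPTF contigs, and counts only unambiguous A/C/G/T bases in the denominator; that is an assumption, since tools differ in how they treat N and other ambiguity codes.

```python
from typing import Dict, Tuple


def genome_stats(contigs: Dict[str, str]) -> Tuple[int, float]:
    """Return (total length in bp, G+C percent over unambiguous A/C/G/T bases)."""
    total_bp = 0
    gc = 0
    acgt = 0
    for seq in contigs.values():
        s = seq.upper()
        total_bp += len(s)
        g_c = s.count("G") + s.count("C")
        gc += g_c
        acgt += g_c + s.count("A") + s.count("T")
    if acgt == 0:
        raise ValueError("no unambiguous bases to compute G+C content")
    return total_bp, 100.0 * gc / acgt


# Toy contigs for illustration only (not CPTF sequences).
toy = {"contig_1": "ATGCGCAT", "contig_2": "aattgcNN"}
length_bp, gc_percent = genome_stats(toy)
print(f"{length_bp:,} bp, G+C {gc_percent:.2f}%")  # 16 bp, G+C 42.86%
```

Note that the two N bases count toward the length but not toward the G+C denominator in this sketch.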

Import and Annotation

  1. The CPTF genome assembly was imported into KBase using the default parameters in the Import FASTA File as Assembly from Staging Area application.
  2. The genome was annotated in KBase with the Annotate Microbial Assembly application (based on RASTtk v1.073) using default parameters.
  3. The circular genome was visualized using the KBase Circular Genome Visualization Tool.
  4. The quality of the genome was assessed using the KBase Assess Genome Quality with CheckM-v1.0.18 application.
Import a FASTA file from your staging area into your Narrative as an Assembly data object
This app completed without errors in 1m 40s.
Objects
  • CPT56D-587-MTF_contigs_KBase.fasta_assembly (type: Assembly; description: Imported Assembly)
Annotate a bacterial or archaeal assembly using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 7m 20s.
Objects
  • CPT56D-587-MTF (type: Genome; description: Annotated genome)
Summary
The RAST algorithm was applied to annotating a genome sequence comprised of 9 contigs containing 6548342 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 7702 new features were called, of which 737 are non-coding.
Output genome has the following feature types:
	Coding gene                     6965 
	Non-coding repeat                604 
	Non-coding rna                   133 
Overall, the genes have 4024 distinct functions. 
The genes include 2027 genes with a SEED annotation ontology across 1434 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/105874
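The RAST feature counts are internally consistent: the non-coding features (repeats plus RNAs) sum to the 737 reported, and all three classes sum to the 7,702 new features. The short check below reproduces that bookkeeping from the numbers in the summary and, as an illustrative extra not reported by RAST, the coding-gene density per kilobase of the 6,548,342 bp assembly.

```python
# Counts copied from the RAST summary above.
feature_counts = {"Coding gene": 6965, "Non-coding repeat": 604, "Non-coding rna": 133}
reported_new_features = 7702
reported_non_coding = 737
genome_length_bp = 6_548_342

total_features = sum(feature_counts.values())
non_coding = total_features - feature_counts["Coding gene"]
# Illustrative extra: coding genes per kilobase of assembled sequence.
genes_per_kb = feature_counts["Coding gene"] / (genome_length_bp / 1000)

assert total_features == reported_new_features
assert non_coding == reported_non_coding
print(f"{total_features} features, {non_coding} non-coding, {genes_per_kb:.2f} coding genes/kb")
```

This prints 7702 features, 737 non-coding, and about 1.06 coding genes per kb, in line with the roughly one gene per kilobase typical of bacterial genomes.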
Generate a map and annotations of circular genomes using CGView.
This app completed without errors in 5m 40s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/105874
  • KBase_derived_CPT56D-587-MTF.png
  • KBase_derived_CPT56D-587-MTF.jpg
  • KBase_derived_CPT56D-587-MTF.svg
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 6m 59s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/105874
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
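The CheckM completeness and contamination values for CPTF are only available in the live Narrative's summary table, so none are reproduced here. As a hedged illustration of how such values are commonly interpreted, the sketch below applies the completeness and contamination thresholds of the MIMAG standard (Bowers et al. 2017). MIMAG's high-quality tier also requires 23S, 16S, and 5S rRNA genes and at least 18 tRNAs, which this hypothetical function does not check. The input values in the usage line are made up.

```python
def mimag_tier(completeness: float, contamination: float) -> str:
    """Tier a genome using only the CheckM-based MIMAG thresholds (percent values).

    The rRNA and tRNA requirements of the MIMAG high-quality tier are NOT checked.
    """
    if completeness > 90 and contamination < 5:
        return "high-quality (completeness/contamination criteria only)"
    if completeness >= 50 and contamination < 10:
        return "medium-quality"
    return "below medium-quality thresholds"


# Hypothetical values, not the CheckM result for CPTF.
print(mimag_tier(99.2, 0.8))
```

For a closed isolate genome such as this one, the rRNA and tRNA counts from the RAST annotation would be the natural source for the remaining MIMAG criteria.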

Taxonomic Identification

  1. Taxonomic identification was performed using the KBase Classify Microbes with GTDB-Tk-v1.7.0 application.
  2. A phylogenetic tree was constructed using the Insert Genome Into Species Tree v2.2.0 application.
Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) release R06-RS202.
This app completed without errors in 1h 13m 52s.
Add one or more Genomes to a KBase SpeciesTree.
This app completed without errors in 4m 57s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/105874
  • CPT56D-587-MTF_genome_tree.newick
  • CPT56D-587-MTF_genome_tree-labels.newick
  • CPT56D-587-MTF_genome_tree.png
  • CPT56D-587-MTF_genome_tree.pdf
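GTDB-Tk reports each classification as a semicolon-separated string of rank-prefixed names (d__, p__, c__, o__, f__, g__, s__). A small parser such as the one below, a sketch that is not part of the KBase app, can split that string into ranks. The example string uses placeholder names and is not the classification obtained for CPTF.

```python
from typing import Dict

RANK_PREFIXES = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
                 "f": "family", "g": "genus", "s": "species"}


def parse_gtdb_classification(classification: str) -> Dict[str, str]:
    """Split 'd__X;p__Y;...' into {rank: name}; empty names (e.g. 's__') become ''."""
    ranks: Dict[str, str] = {}
    for token in classification.split(";"):
        prefix, sep, name = token.strip().partition("__")
        if sep and prefix in RANK_PREFIXES:
            ranks[RANK_PREFIXES[prefix]] = name
    return ranks


# Placeholder names only; this is NOT the classification reported for CPTF.
example = "d__Bacteria;p__Phylum_X;c__Class_X;o__Order_X;f__Family_X;g__Genus_X;s__Genus_X species_x"
ranks = parse_gtdb_classification(example)
print(ranks["genus"], "|", ranks["species"])  # Genus_X | Genus_X species_x
```

An empty species field (a bare `s__`) is kept as an empty string, which makes it easy to flag genomes that GTDB-Tk could place only to genus.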

Metabolic Modeling and Flux Balance Analysis

  1. A draft metabolic model based on the annotated genome was constructed using the KBase Build Metabolic Model v2.0.0 application.
  2. Flux balance analysis (FBA) of the draft metabolic model was performed using the KBase Run Flux Balance Analysis v2.0.0 application.
Construct a draft metabolic model based on an annotated genome.
This app completed without errors in 2m 24s.
Objects
  • CPT56D-587-MTF_metabolic_model (type: FBAModel, FBAModel-14; description: CPT56D-587-MTF_metabolic_model)
  • CPT56D-587-MTF_metabolic_model.gf.1 (type: FBA, FBA-13; description: CPT56D-587-MTF_metabolic_model.gf.1)
Report
Summary
RefGlucoseMinimal media.
Output from Build Metabolic Model
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/105874
Predict metabolite fluxes in a metabolic model of an organism grown on a given media using flux balance analysis (FBA).
This app completed without errors in 42s.
Objects
  • CPT56D-587-MTF_FBA (type: FBA, FBA-13; description: CPT56D-587-MTF_FBA)
Report
Summary
A flux balance analysis (FBA) was performed on the metabolic model 105874/10/1 growing in Complete media.
Output from Run Flux Balance Analysis
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/105874
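For context, FBA as described by Orth et al. (2010), cited under Run Flux Balance Analysis in the Apps list, solves a linear program over the model's stoichiometric matrix S (metabolites × reactions) and flux vector v. The standard formulation, assuming steady state, is:

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
```

Here c selects the objective reaction (typically a biomass reaction), S v = 0 enforces mass balance for every metabolite, and the bounds v_min and v_max encode reaction reversibility and the nutrient uptake permitted by the chosen media (Complete media for the FBA run above).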

References

  1. Brooks SC. 2001. Waste characteristics of the former S-3 ponds and outline of uranium chemistry relevant to NABIR Field Research Center studies. NABIR Field Research Center, Oak Ridge, Tenn. doi: 10.2172/814525
  2. Lui LM, Nielsen TN, Arkin AP. 2021. A method for achieving complete microbial genomes and improving bins from metagenomics data. PLoS Comput Biol 17:e1008972. doi: 10.1371/journal.pcbi.1008972
  3. Wick RR, Judd LM, Gorrie CL, Holt KE. 2017. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol 13:e1005595. doi: 10.1371/journal.pcbi.1005595

Apps

  1. Annotate Microbial Assembly with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT: The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656-664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955-964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793-803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-6654. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176-1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673-679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  2. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043-1055. doi:10.1101/gr.186072.114
  3. Build Metabolic Model
    • [1] Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol. 2010;28: 977-982. doi:10.1038/nbt.1672
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Latendresse M. Efficiently gap-filling reaction networks. BMC Bioinformatics. 2014;15: 225. doi:10.1186/1471-2105-15-225
    • [4] Dreyfuss JM, Zucker JD, Hood HM, Ocasio LR, Sachs MS, Galagan JE. Reconstruction and Validation of a Genome-Scale Metabolic Model for the Filamentous Fungus Neurospora crassa Using FARM. PLoS Comput Biol. 2013;9: e1003126. doi:10.1371/journal.pcbi.1003126
    • [5] Mahadevan R, Schilling CH. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng. 2003;5: 264-276.
  4. Circular Genome Visualization Tool
    no citations
  5. Classify Microbes with GTDB-Tk - v1.7.0
    • Chaumeil PA, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2020;36: 1925-1927. doi:10.1093/bioinformatics/btz848
    • Parks DH, Chuvochina M, Waite DW, et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol. 2018;36: 996-1004. doi:10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020. doi:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021;6: 946-959. doi:10.1038/s41564-021-00918-8
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11: 538. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9: 5114. doi:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2 - approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5: e9490. doi:10.1371/journal.pone.0009490
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7: e1002195. doi:10.1371/journal.pcbi.1002195
  6. Import FASTA File as Assembly from Staging Area
    no citations
  7. Insert Genome Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5. doi:10.1371/journal.pone.0009490
  8. Run Flux Balance Analysis
    • Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol. 2010;28: 977-982. doi:10.1038/nbt.1672
    • Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nat Biotechnol. 2010;28: 245-248. doi:10.1038/nbt.1614