Generated November 10, 2021

Draft Genome Sequence of Bacillus sp. EB106-08-02-XG196 Isolated from High Nitrate Contaminated Sediment

Xiaoxuan Ge, Michael P. Thorgersen, Farris L. Poole II, Adam M. Deutschbauer, John-Marc Chandonia, Pavel S. Novichkov, Paul D. Adams, Adam P. Arkin, Terry C. Hazen, Michael W. W. Adams

Submitted to Microbiology Resource Announcements

Table of Contents

  1. Methods
  2. Quality Control and Domain Annotation
  3. Classification
  4. References

This Narrative contains data for isolate EB106-08-02-XG196, also referred to as XG196, which is described in the above manuscript. Data for another isolate from Oak Ridge Reservation (ORR) EB106 sediment, XG77, are in another Narrative, here.

Methods

Isolation

An 8-meter-deep borehole of 8.9 cm diameter (designated EB-106) located 21.1 meters downstream from the S-3 ponds area was drilled at ORR. The sediment was collected and cut into 22 cm segments, all under anaerobic conditions, as reported elsewhere (Ge et al., 2019). For microbial enrichment, sediment samples (1 g) were incubated anaerobically in 5 ml of a defined medium containing 1.3 mM KCl, 2 mM MgSO4, 0.1 mM CaCl2, 0.3 mM NaCl, 30 mM NaHCO3, 5 mM NaH2PO4 and 20 mM NaNO3, with added vitamins and minerals as described (Widdel and Bak, 1992). A mixture of organic compounds (2 mM of formate, acetate, ethanol, lactate, succinate and glucose together with 0.1 g/L yeast extract) was used as the carbon source. A mixture of metals (MM) containing 5 µM cadmium acetate (Cd(CH3COO)2·2H2O), 100 µM manganous chloride (MnCl2·2H2O), 30 µM cobalt chloride (CoCl2·6H2O), 100 µM nickel chloride (NiCl2·6H2O), 10 µM cupric chloride (CuCl2·2H2O), 10 µM ferrous ammonium sulfate (Fe(NH4)2(SO4)2·6H2O) and 100 µM uranyl acetate (UO2(CH3COO)2·2H2O) was used to mimic the metal contamination in the groundwater near the ORR S-3 ponds (Table S1).

Table S1

Metal (1×)  Compound added         Final conc. (µM)
Mn2+        MnCl2·2H2O             100
Fe2+        Fe(NH4)2(SO4)2·6H2O    10
Co2+        CoCl2·6H2O             30
Ni2+        NiCl2·6H2O             150
Cu2+        CuCl2·2H2O             10
Cd2+        Cd(CH3COO)2·2H2O       5
U6+         UO2(CH3COO)2·2H2O      100

DNA Extraction

The ZymoBead Genomic DNA kit was used to extract genomic DNA. More than 1 µg of purified genomic DNA was sent to the U.S. Department of Energy (DOE) Joint Genome Institute (JGI) for Illumina sequencing.

KBase Pipeline

This pipeline was performed in another KBase Narrative, which contains other unpublished data. Relevant objects from that Narrative have been copied to this one.
A summary of the methods follows and the provenance of each object can be found by opening up the "Data explorer" window (click on the binoculars icon under each object in the data panel).

Read Trimming with Trimmomatic

The Illumina sequencing reads were trimmed using Trimmomatic 0.36, with parameters "-phred33 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 ILLUMINACLIP:TruSeq3-PE.fa" (Bolger et al., 2014).
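To make the quality-trimming parameters above concrete, the following is a minimal Python sketch of what LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15 and MINLEN:36 do to a single read. It is a simplified illustration, not Trimmomatic's actual implementation: the function name `trim_read` and the list-of-Phred-scores input are assumptions made for this example, and adapter clipping (ILLUMINACLIP) is not modelled.

```python
def trim_read(quals, leading=3, trailing=3, window=4, min_avg=15, min_len=36):
    """Return (start, end) of the retained region of a read, or None if dropped.

    quals is a list of per-base Phred quality scores (hypothetical input format).
    """
    start, end = 0, len(quals)

    # LEADING: remove bases from the 5' end while their quality is below the threshold.
    while start < end and quals[start] < leading:
        start += 1

    # TRAILING: remove bases from the 3' end while their quality is below the threshold.
    while end > start and quals[end - 1] < trailing:
        end -= 1

    # SLIDINGWINDOW: scan from the 5' end and cut the read at the first window
    # whose mean quality falls below min_avg.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break

    # MINLEN: discard reads that are too short after trimming.
    if end - start < min_len:
        return None
    return start, end


# Two poor leading bases, 40 good bases, a low-quality tail, one very poor last base.
example = [2, 2] + [30] * 40 + [10] * 6 + [2]
print(trim_read(example))      # (2, 42)
print(trim_read([30] * 20))    # None: shorter than MINLEN
```

In this sketch the steps run in the order LEADING, TRAILING, SLIDINGWINDOW, MINLEN, which matches the order in which they are listed in the parameter string.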

Assembly with SPAdes

The trimmed reads were assembled de novo using SPAdes v3.12.0 with parameters "-k 21,33,55,77" (Bankevich et al., 2012).

Annotation with Prokka

Genes were identified using Prokka v1.12, with default parameters (Seemann, 2014).

Quality Control and Domain Annotation

Genome quality control was performed with CheckM using default parameters. CheckM provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous and single-copy within a phylogenetic lineage. More documentation describing CheckM is here.
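As a rough illustration of how completeness and contamination can be estimated from collocated marker sets, here is a simplified Python sketch of the idea. It is not CheckM's code: the function name, the marker identifiers and the copy counts are hypothetical, and real CheckM runs select lineage-specific marker sets automatically.

```python
def marker_set_estimates(marker_sets, copy_counts):
    """Estimate completeness and contamination (in percent) from marker sets.

    marker_sets: list of lists of marker gene IDs (each list is one collocated set).
    copy_counts: dict mapping marker gene ID -> number of copies found in the genome.
    """
    completeness = 0.0
    contamination = 0.0
    for markers in marker_sets:
        # Fraction of this set's markers found at least once.
        present = sum(1 for m in markers if copy_counts.get(m, 0) >= 1)
        # Extra copies of single-copy markers suggest contamination.
        extra = sum(max(copy_counts.get(m, 0) - 1, 0) for m in markers)
        completeness += present / len(markers)
        contamination += extra / len(markers)
    n_sets = len(marker_sets)
    return 100 * completeness / n_sets, 100 * contamination / n_sets


# Hypothetical marker sets and counts, for illustration only.
sets = [["a", "b"], ["c", "d"], ["e", "f", "g", "h"]]
counts = {"a": 1, "b": 1, "c": 2, "d": 1, "e": 1, "f": 1, "g": 1}  # "h" is missing
comp, cont = marker_set_estimates(sets, counts)
print(f"completeness {comp:.1f}%, contamination {cont:.1f}%")
```

Averaging per set rather than per gene keeps a single missing or duplicated genomic region, which tends to affect a whole collocated set at once, from dominating the estimate.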

The genome was then annotated using the Annotate Domains in a Genome App using all domain libraries. This app annotates domains from COGs, CDD, NCBI-curated domains, SMART, PRK, Pfam, and TIGRFAMs databases. More detail on annotating domains in KBase is here. Note that the 4966 genes listed in the Annotate Domains output include only the protein-coding genes with annotated domains; in total, the genome contains 5750 protein-encoding genes.
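The two gene counts quoted above imply the share of protein-coding genes that received at least one domain annotation. A short Python calculation, using only the numbers stated in the text:

```python
genes_with_domains = 4966      # genes listed in the Annotate Domains output
protein_coding_genes = 5750    # total protein-encoding genes in the genome

fraction = genes_with_domains / protein_coding_genes
without_domains = protein_coding_genes - genes_with_domains

print(f"{fraction:.1%} of protein-coding genes have an annotated domain")  # 86.4%
print(f"{without_domains} genes have no annotated domain")                 # 784
```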

Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 7m 21s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/60201
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Annotate a Genome object with protein domains from widely used domain libraries.
This app completed without errors in 3h 39m 50s.
Objects
Created Object Name Type Description
EB106-08-02-XG196.domains DomainAnnotation Domain Annotations
Summary
Search Domains output:
Getting DomainModelSet from storage.
Getting Genome from storage.
Running domain search against library 2959/35/1
Running domain search against library 2959/18/2
Running domain search against library 2959/24/2
Running domain search against library 2959/25/2
Running domain search against library 2959/39/1
Running domain search against library 2959/7/8
Running domain search against library 2959/36/1
Running domain search against library 2959/34/1
Running domain search against library 2959/37/1
Running domain search against library 2959/38/1
Output from Annotate Domains in a Genome
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/60201

Classification

Finally, we classified the genome. As discussed in the manuscript, our initial classification was done by 16S rRNA alignment. We also built species trees for XG196 using two more KBase apps that rely on phylogenetic marker genes other than the 16S rRNA:

  • GTDB-Tk was run on the genome with default parameters. This app assigns objective taxonomic classifications to bacterial and archaeal genomes, using a set of domain-specific phylogenetic marker genes. More info about the app is here.

  • We used the Insert Genome into Species Tree App, using default parameters, to make a species tree called "EB106-08-02-XG196.tree" using 49 marker genes. More info about this app is here.

All classification methods produced consistent results: the most similar genome to XG196 that has been previously described is Bacillus niacini.

Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) ver 1.1.0
This app completed without errors in 42m 58s.
Add one or more genomes to a KBase species tree.
This app completed without errors in 3m 44s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/60201
  • EB106-08-02-XG196.tree.newick
  • EB106-08-02-XG196.tree-labels.newick
  • EB106-08-02-XG196.tree.png
  • EB106-08-02-XG196.tree.pdf

We imported the final genome into GenBank. Due to compatibility issues, we had to re-run the annotation pipeline in GenBank instead of using the same annotations created in KBase.

v1 - KBaseGenomes.Genome-11.0
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/60201
Annotate or re-annotate bacterial or archaeal genome using RASTtk.
This app completed without errors in 2m 38s.
Objects
Created Object Name Type Description
EB106-08-02-XG196-genbank-RAST.genome Genome Annotated genome
Summary
Some RAST tools will not run unless the taxonomic domain is Archaea, Bacteria, or Virus. 
These tools include: call selenoproteins, call pyrroysoproteins, call crisprs, and call prophage phispy features.
You may not get the results you were expecting with your current domain of Unknown.
The RAST algorithm was applied to annotating an existing genome: Bacillus sp. EB106-08-02-XG196. 
The sequence for this genome is comprised of 55 contigs containing 6010169 nucleotides. 
The input genome has 5721 existing coding features and 234 existing non-coding features.
Input genome has the following feature types:
	Non-coding assembly_gap           12 
	Non-coding gene                   78 
	Non-coding misc_binding           20 
	Non-coding misc_feature            4 
	Non-coding ncRNA                   6 
	Non-coding rRNA                   14 
	Non-coding regulatory             42 
	Non-coding tRNA                   57 
	Non-coding tmRNA                   1 
	gene                            5721 
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 5721 coding features and 234 non-coding features, 0 new features were called, of which 0 are non-coding.
Output genome has the following feature types:
	Coding gene                     5721 
	Non-coding assembly_gap           12 
	Non-coding gene                   78 
	Non-coding misc_binding           20 
	Non-coding misc_feature            4 
	Non-coding ncRNA                   6 
	Non-coding rRNA                   14 
	Non-coding regulatory             42 
	Non-coding tRNA                   57 
	Non-coding tmRNA                   1 
Overall, the genes have 2881 distinct functions. 
The genes include 3670 genes with a SEED annotation ontology across 1414 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
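The feature totals reported in the RAST summary above (5721 coding and 234 non-coding features) can be cross-checked against the per-type counts. The Python sketch below tallies the feature-type lines as they appear in the summary; the helper name `tally` and the parsing pattern are assumptions made for this example, not part of RASTtk.

```python
import re

# Feature-type lines copied from the "Input genome" section of the RAST summary.
SUMMARY = """\
Non-coding assembly_gap           12
Non-coding gene                   78
Non-coding misc_binding           20
Non-coding misc_feature            4
Non-coding ncRNA                   6
Non-coding rRNA                   14
Non-coding regulatory             42
Non-coding tRNA                   57
Non-coding tmRNA                   1
gene                            5721
"""


def tally(summary):
    """Return (coding, non_coding) feature totals from summary lines."""
    coding = non_coding = 0
    for line in summary.splitlines():
        # Optional "Non-coding" prefix, then a feature type, then a count.
        match = re.match(r"\s*(Non-coding\s+)?(\S+)\s+(\d+)\s*$", line)
        if not match:
            continue
        count = int(match.group(3))
        if match.group(1):
            non_coding += count
        else:
            coding += count
    return coding, non_coding


print(tally(SUMMARY))  # (5721, 234)
```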
v1 - KBaseGenomes.Genome-11.0
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/60201
Examine the general functional distribution or specific functional gene families for a GenomeSet.
This app completed without errors in 43s.

References

  1. Ge, X., Vaccaro, B.J., Thorgersen, M.P., Poole, F.L., Majumder, E.L., Zane, G.M., et al. (2019). Iron- and aluminium-induced depletion of molybdenum in acidic environments impedes the nitrogen cycle. Environmental microbiology 21(1), 152-163.
  2. Widdel, F., and Bak, F. (1992). "Gram-negative mesophilic sulfate-reducing bacteria," in The prokaryotes (Springer), 3352-3378.
  3. Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15), 2114-2120.
  4. Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., et al. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology 19(5), 455-477.
  5. Seemann, T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics 30(14), 2068-2069.

Released Apps

  1. Annotate Microbial Genome with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT - The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656-664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955-964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793-803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-54. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176-1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673-679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  2. Insert Genome Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5. doi:10.1371/journal.pone.0009490
  3. View Function Profile for Genomes - v1.4.0
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163

Apps in Beta

  1. Annotate Domains in a Genome
    • Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10: 421. doi:10.1186/1471-2105-10-421
    • Eddy SR. Accelerated Profile HMM Searches. PLOS Computational Biology. 2011;7: e1002195. doi:10.1371/journal.pcbi.1002195
    • El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A, Sonnhammer ELL, Hirsh L, Paladin L, Piovesan D, Tosatto SCE, Finn RD. The Pfam protein families database in 2019. Nucleic Acids Research. 2019;47: D427-D432. doi:10.1093/nar/gky995
    • Haft DH, Selengut JD, Richter RA, Harkins D, Basu MK, Beck E. TIGRFAMs and Genome Properties in 2013. Nucleic Acids Res. 2013;41: D387-D395. doi:10.1093/nar/gks1234
    • Letunic I, Bork P. 20 years of the SMART protein domain annotation resource. Nucleic Acids Res. 2018;46: D493-D496. doi:10.1093/nar/gkx922
    • Letunic I, Doerks T, Bork P. SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 2015;43: D257-260. doi:10.1093/nar/gku949
    • Marchler-Bauer A, Bo Y, Han L, He J, Lanczycki CJ, Lu S, et al. CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 2017;45: D200-D203. doi:10.1093/nar/gkw1129
    • Selengut JD, Haft DH, Davidsen T, Ganapathy A, Gwinn-Giglio M, Nelson WC, et al. TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes. Nucleic Acids Res. 2007;35: D260-264. doi:10.1093/nar/gkl1043
    • Tatusov RL, Koonin EV, Lipman DJ. A Genomic Perspective on Protein Families. Science. 1997;278: 631-637. doi:10.1126/science.278.5338.631
  2. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043-1055. doi:10.1101/gr.186072.114
  3. GTDB-Tk Classify - v1.6.0
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea [published online ahead of print, 2020 Apr 27]. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195