Draft Genome Sequence of Bacillus sp. EB106-08-02-XG196 Isolated from High Nitrate Contaminated Sediment

Xiaoxuan Ge, Michael P. Thorgersen, Farris L. Poole II, Adam M. Deutschbauer, John-Marc Chandonia, Pavel S. Novichkov, Paul D. Adams, Adam P. Arkin, Terry C. Hazen, Michael W. W. Adams

Submitted to Microbiology Resource Announcements

Methods

Isolation

An 8-meter-deep borehole of 8.9 cm diameter (designated EB-106), located 21.1 meters downstream from the S-3 ponds area, was drilled at the Oak Ridge Reservation (ORR). The sediment was collected and cut into 22 cm segments, all under anaerobic conditions, as reported elsewhere (Ge et al., 2019). For microbial enrichment, sediment samples (1 g) were incubated anaerobically in 5 ml of a defined medium containing 1.3 mM KCl, 2 mM MgSO₄, 0.1 mM CaCl₂, 0.3 mM NaCl, 30 mM NaHCO₃, 5 mM NaH₂PO₄ and 20 mM NaNO₃, with added vitamins and minerals as described (Widdel and Bak, 1992). A mixture of organic compounds (2 mM of formate, acetate, ethanol, lactate, succinate and glucose together with 0.1 g/L yeast extract) was used as the carbon source. A mixture of metals (MM) containing 5 µM cadmium acetate (Cd(CH₃COO)₂·2H₂O), 100 µM manganous chloride (MnCl₂·2H₂O), 30 µM cobalt chloride (CoCl₂·6H₂O), 100 µM nickel chloride (NiCl₂·6H₂O), 10 µM cupric chloride (CuCl₂·2H₂O), 10 µM ferrous ammonium sulfate (Fe(NH₄)₂(SO₄)₂·6H₂O) and 100 µM uranyl acetate (UO₂(CH₃COO)₂·2H₂O) was used to mimic the metal contamination in the groundwater near the ORR S-3 ponds (Table S1).

Table S1

Metal (1×)	Compound added	Final conc. (µM)
Mn²⁺	MnCl₂·2H₂O	100
Fe²⁺	Fe(NH₄)₂(SO₄)₂·6H₂O	10
Co²⁺	CoCl₂·6H₂O	30
Ni²⁺	NiCl₂·6H₂O	150
Cu²⁺	CuCl₂·2H₂O	10
Cd²⁺	Cd(CH₃COO)₂·2H₂O	5
U⁶⁺	UO₂(CH₃COO)₂·2H₂O	100

DNA Extraction

Genomic DNA was extracted using the ZymoBead Genomic DNA kit. More than 1 µg of purified genomic DNA was sent to the U.S. Department of Energy (DOE) Joint Genome Institute (JGI) for Illumina sequencing.

KBase Pipeline

This pipeline was performed in another KBase Narrative, which contains other unpublished data. Relevant objects from that Narrative have been copied to this one.
A summary of the methods follows and the provenance of each object can be found by opening up the "Data explorer" window (click on the binoculars icon under each object in the data panel).

Read Trimming with Trimmomatic

The Illumina sequencing reads were trimmed using Trimmomatic 0.36, with parameters "-phred33 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 ILLUMINACLIP:TruSeq3-PE.fa" (Bolger et al., 2014).
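To make the per-read quality steps concrete, the following Python sketch applies LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15 and MINLEN:36 to a single read, in that order. It is a simplified illustration under stated assumptions and not Trimmomatic's actual implementation: adapter clipping (ILLUMINACLIP) is omitted, edge-case handling may differ from the real tool, and the function names (decode_phred33, trim_read) are hypothetical.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality string (Phred+33 encoding) to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_read(seq, quals, leading=3, trailing=3, window=4, min_avg=15, min_len=36):
    """Simplified per-read quality trimming (illustrative only).

    Returns (sequence, qualities) after trimming, or None if the read
    ends up shorter than min_len and would be discarded.
    """
    # LEADING: drop bases from the 5' end while quality is below the threshold.
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    seq, quals = seq[start:], quals[start:]

    # TRAILING: drop bases from the 3' end while quality is below the threshold.
    end = len(quals)
    while end > 0 and quals[end - 1] < trailing:
        end -= 1
    seq, quals = seq[:end], quals[:end]

    # SLIDINGWINDOW: scan from the 5' end and cut at the start of the first
    # window whose mean quality falls below min_avg.
    cut = len(quals)
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) < min_avg * window:
            cut = i
            break
    seq, quals = seq[:cut], quals[:cut]

    # MINLEN: discard reads that are now too short.
    if len(seq) < min_len:
        return None
    return seq, quals
```

For example, a 44-base read whose last four bases have quality 2 is shortened to 40 bases by the TRAILING step and kept, whereas a read that drops to quality 10 after base 30 is cut to 30 bases by the sliding window and then discarded by MINLEN:36.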

Assembly with SPAdes

The trimmed reads were assembled de novo using SPAdes v3.12.0 with parameters "-k 21,33,55,77" (Bankevich et al., 2012).
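The "-k 21,33,55,77" setting lists the k-mer sizes at which SPAdes builds its de Bruijn graphs. As a rough illustration of what these sizes mean for individual reads, the sketch below counts how many k-mers a read of a given length contributes at each k (a read of length L contains L - k + 1 k-mers). The helper names and the 150-base read length in the usage note are assumptions chosen for illustration; the read length is not stated in the source.

```python
K_VALUES = (21, 33, 55, 77)  # k-mer sizes passed to SPAdes via -k


def kmers(seq, k):
    """Return all overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(read_length, k_values=K_VALUES):
    """Number of k-mers a read of the given length contributes at each k."""
    return {k: max(read_length - k + 1, 0) for k in k_values}
```

Under these assumptions, kmer_counts(150) gives {21: 130, 33: 118, 55: 96, 77: 74}, while a read trimmed down to the 36-base minimum gives {21: 16, 33: 4, 55: 0, 77: 0}: the shortest surviving reads contribute only to the graphs built with the two smaller k values.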

Annotation with Prokka

Genes were identified using Prokka v1.12, with default parameters (Seemann, 2014).

Quality Control and Domain Annotation

Genome quality control was performed with CheckM using default parameters. CheckM provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous and single-copy within a phylogenetic lineage. More documentation describing CheckM is here.
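The idea behind scoring with collocated marker sets can be sketched in a few lines of Python. This is a toy illustration of the published scoring scheme as we understand it, not CheckM's code: each marker set contributes the fraction of its genes that were found (completeness) and the number of extra copies per gene in the set (contamination), and both are averaged over all sets. The function name, marker names, and counts below are hypothetical.

```python
def completeness_contamination(marker_sets, gene_counts):
    """Estimate completeness and contamination (both in percent).

    marker_sets: list of sets of marker gene names (collocated sets).
    gene_counts: dict mapping marker gene name -> copies found in the genome.
    """
    completeness = 0.0
    contamination = 0.0
    for marker_set in marker_sets:
        # Markers from this set that were found at least once.
        present = sum(1 for g in marker_set if gene_counts.get(g, 0) >= 1)
        # Copies beyond the first for each marker in this set.
        extra = sum(max(gene_counts.get(g, 0) - 1, 0) for g in marker_set)
        completeness += present / len(marker_set)
        contamination += extra / len(marker_set)
    n_sets = len(marker_sets)
    return 100 * completeness / n_sets, 100 * contamination / n_sets


# Toy input: three marker sets; marker C is duplicated and marker E is missing.
toy_sets = [{"A", "B"}, {"C"}, {"D", "E", "F", "G"}]
toy_counts = {"A": 1, "B": 1, "C": 2, "D": 1, "E": 0, "F": 1, "G": 1}
comp, cont = completeness_contamination(toy_sets, toy_counts)
```

With this toy input, completeness is about 91.7% and contamination about 33.3%. Weighting by set rather than by individual gene keeps a single missing or duplicated genomic region from being counted once for every marker it carries.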

The genome was then annotated using the Annotate Domains in a Genome App using all domain libraries. This app annotates domains from COGs, CDD, NCBI-curated domains, SMART, PRK, Pfam, and TIGRFAMs databases. More detail on annotating domains in KBase is here. Note that the 4966 genes listed in the Annotate Domains output include only the protein-coding genes with annotated domains; in total, the genome contains 5750 protein-encoding genes.

Classification

Finally, we classified the genome. As discussed in the manuscript, our initial classification was done by 16S rRNA alignment. We also built species trees for XG196 using two more KBase apps that rely on phylogenetic marker genes other than the 16S rRNA:

GTDB-Tk was run on the genome with default parameters. This app assigns objective taxonomic classifications to bacterial and archaeal genomes, using a set of domain-specific phylogenetic marker genes. More info about the app is here.
We used the Insert Genome into Species Tree App, using default parameters, to make a species tree called "EB106-08-02-XG196.tree" using 49 marker genes. More info about this app is here.

All classification methods produced consistent results: the most similar genome to XG196 that has been previously described is Bacillus niacini.

We imported the final genome into GenBank. Due to compatibility issues, we had to re-run the annotation pipeline in GenBank instead of using the annotations created in KBase.

References

Ge, X., Vaccaro, B.J., Thorgersen, M.P., Poole, F.L., Majumder, E.L., Zane, G.M., et al. (2019). Iron- and aluminium‐induced depletion of molybdenum in acidic environments impedes the nitrogen cycle. Environmental microbiology 21(1), 152-163.
Widdel, F., and Bak, F. (1992). "Gram-negative mesophilic sulfate-reducing bacteria," in The prokaryotes. (Springer), 3352-3378.
Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15), 2114-2120.
Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., et al. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology 19(5), 455-477.
Seemann, T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics 30(14), 2068-2069.