Generated June 30, 2023

Complete genome sequence of Bradyrhizobium NP1, isolated from forest soil

Introduction


We report the complete genome sequence of Bradyrhizobium strain NP1, which was isolated from forest soil that had been subject to chronic warming. The diverse genus Bradyrhizobium is predicted to contain approximately 800 species (1) and includes non-symbiotic species that dominate forest soil (2). Bradyrhizobium NP1 was isolated from a Long-Term Ecological Research site in the Harvard Forest (HRF), in Petersham, MA (42.54, -72.18).

Authors: Trevor Fisher (a), Francesca Durmazolu (a), Kristen M. DeAngelis (b), Maureen A. Morrow (a)

Affiliations


(a) Department of Biology, State University of New York at New Paltz, New Paltz, New York, USA
(b) Department of Microbiology, University of Massachusetts, Amherst, Massachusetts, USA

Table of Contents

  1. Experimental Methods
  2. Assembly and Annotation
  3. Taxonomic Identification
  4. References
Narrative created by: Maureen Morrow and Trevor Fisher

Experimental Methods


We report the complete genome sequence of Bradyrhizobium strain NP1. This bacterium was isolated from forest soil that had been subject to chronic warming. The genome of this novel isolate is presented as a single circular contig of 7,712,921 base pairs with 64.14% GC content.

Sample Collection

Bradyrhizobium NP1 was isolated from the Prospect Hill long-term warming experiment (3) at the Harvard Forest Long-Term Ecological Research site, in Petersham, MA (42.54, -72.18). In May 2021, organic horizon soil was collected from a heated plot and stored at 4°C until use.

Isolation

In August 2021, soil was plated on dilute nutrient broth supplemented with ammonium nitrate (Table 1) and incubated at room temperature (22°C). NP1 was a slow-growing colony that appeared after 10 weeks and was therefore chosen for analysis.

Table 1. Isolation medium

Ingredient                Amount per liter
Difco™ Nutrient Broth     0.08 g
NH4NO3                    0.50 g
1 M CaCl2                 0.60 ml
Agar                      6.00 g
Gellan gum                6.40 g
Cycloheximide             50.0 g



16S sequence analysis

NP1 was identified as a Bradyrhizobium using the online NCBI BLASTn tool (standard database) with a 16S rRNA gene sequence (GenBank accession number OR045828). The gene was amplified by PCR (universal primers 27F and 1492R) from genomic DNA extracted with the Quick-DNA Fecal/Soil Microbe Miniprep Kit (Zymo, Irvine, CA). The sequence has 99.6% identity with at least 100 species of Bradyrhizobium.

Genome Sequencing

NP1 was streaked to purity and grown in 10% Tryptic Soy broth (Becton, Dickinson and Company, Sparks, MD) with 1X MEM Vitamin Solution (Gibco, Grand Island, NY) at 30°C for 2 days with shaking. Genomic DNA was extracted with the DNeasy Blood and Tissue kit (Qiagen, Hilden, Germany) using a 1-hour lysozyme pretreatment. The DNA was quantified using a Qubit 4.0 fluorometer (dsDNA HS Assay, Invitrogen, Waltham, MA).

Whole-genome sequencing (WGS) was performed using the Illumina DNA Prep kit and IDT 10 bp UDI indices on an Illumina NextSeq 2000 (2 × 151 bp reads) by SeqCenter (Pittsburgh, PA). Demultiplexing, quality control, and adapter trimming were performed with the proprietary bcl-convert (v3.9.30), resulting in 7,223,840 reads. The reads were trimmed with Trimmomatic (v0.36) (4) in the DOE Systems Biology Knowledgebase (KBase) platform (5) using default parameters. The resultant 7,115,010 reads had an average read length of 145.82 ± 17.45 bp (134X coverage).
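The coverage figure quoted above can be reproduced with simple arithmetic. The following is a minimal sketch, assuming coverage is defined as total sequenced bases (read count times mean read length) divided by the 7,712,921 bp genome length reported in this Narrative; the function name `mean_coverage` is hypothetical and is not part of any KBase app.

```python
def mean_coverage(read_count: int, mean_read_length: float, genome_length: int) -> float:
    """Approximate sequencing depth as total bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return read_count * mean_read_length / genome_length


# Values taken from the text: trimmed Illumina reads, mean read length,
# and the length of the assembled circular contig.
GENOME_LENGTH = 7_712_921
illumina_depth = mean_coverage(7_115_010, 145.82, GENOME_LENGTH)
print(f"Illumina coverage: {illumina_depth:.1f}X")
```

Under these assumptions the estimate comes out at roughly 134.5X, which is consistent with the reported 134X.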

The same DNA sample was sequenced at Plasmidsaurus (Eugene, OR). The library was constructed with the Oxford Nanopore Technologies Ligation Sequencing Kit version SQK-LSK115 and was sequenced on GridION 10.4.1 flowcells (FLO-MIN114) using the “Super accuracy” basecaller in MinKNOW. The reads were filtered with Filtlong (v.0.2.10, https://github.com/rrwick/Filtlong) in KBase (5) to remove reads shorter than 1,000 nucleotides and the lowest-quality 5%. A total of 52,003 reads were obtained (average read length, 8,030.49 ± 6,726.47 nucleotides; 54X coverage).
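The long-read filtering step can be illustrated with a simplified stand-in. The sketch below is not Filtlong's algorithm: Filtlong uses its own read-scoring scheme, whereas this example simply drops reads below a minimum length and then discards the lowest-ranked 5% of the remaining reads by arithmetic mean Phred score. The `Read` class, `mean_quality`, and `filter_reads` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Read:
    """Hypothetical container for one long read (not a Filtlong type)."""
    name: str
    sequence: str
    qualities: List[int]  # per-base Phred scores


def mean_quality(read: Read) -> float:
    """Arithmetic mean of Phred scores; a deliberate simplification."""
    if not read.qualities:
        return 0.0
    return sum(read.qualities) / len(read.qualities)


def filter_reads(reads: List[Read], min_length: int = 1000, drop_percent: int = 5) -> List[Read]:
    """Drop short reads, then drop the lowest-quality drop_percent of the rest."""
    long_enough = [r for r in reads if len(r.sequence) >= min_length]
    n_drop = len(long_enough) * drop_percent // 100
    worst = sorted(long_enough, key=mean_quality)[:n_drop]
    worst_ids = {id(r) for r in worst}
    # Preserve the input order of the reads that survive both filters.
    return [r for r in long_enough if id(r) not in worst_ids]
```

Calling `filter_reads(reads)` with the defaults mirrors the thresholds described in the text (1,000 nucleotides and 5%); a real analysis would run Filtlong itself rather than this approximation.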

Assembly and Annotation

Assembly


A hybrid assembly was generated with Unicycler (v0.4.8) (6), with rotation between multiple rounds of polishing. Assembly quality was assessed with QUAST (v4.4) and CheckM (v1.0.18) (7).

The assembly resulted in a single circular contig of 7,712,921 base pairs (N50 = 7,712,921) with 64.14% GC content. See the QUAST report.
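For readers unfamiliar with the two summary statistics, the following minimal sketch shows how N50 and GC content are commonly computed. It assumes the usual definitions (N50 as the length of the contig at which the running total of lengths, sorted longest first, reaches half the assembly size; GC content as the G+C share of A/C/G/T bases) and is not QUAST's implementation. The function names are hypothetical.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length of the contig at which the cumulative sum reaches half the total."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("at least one contig is required")


def gc_percent(sequence: str) -> float:
    """Percentage of G and C among unambiguous (A/C/G/T) bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt
```

With a single contig, as in this assembly, `n50([7_712_921])` simply returns the contig length, which is why the N50 equals the genome size.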

CheckM showed 99.98% completeness and 1.01% estimated contamination. See CheckM report

The assembled sequence was annotated with RASTtk (v1.073)(8) and is predicted to encode 7,808 proteins.

Accession numbers


The 16S rRNA gene sequence is available under GenBank accession number OR045828.
The assembled genome sequence was deposited in GenBank under the accession number CP127385.
The raw sequence reads are available under BioProject PRJNA975924 and SRA SRX20568407 (Illumina reads) and SRX20568406 (Nanopore reads).

v1 - KBaseGenomeAnnotations.Assembly-5.1
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/147325
Run QUAST (QUality ASsessment Tool) on a set of Assemblies to assess their quality.
This app completed without errors in 1m 17s.
Summary
All statistics are based on contigs of size >= 500 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).

Assembly                           NP1_hybrid_unicycler.contigs
# contigs (>= 0 bp)                1
# contigs (>= 1000 bp)             1
# contigs (>= 10000 bp)            1
# contigs (>= 100000 bp)           1
# contigs (>= 1000000 bp)          1
Total length (>= 0 bp)             7712921
Total length (>= 1000 bp)          7712921
Total length (>= 10000 bp)         7712921
Total length (>= 100000 bp)        7712921
Total length (>= 1000000 bp)       7712921
# contigs                          1
Largest contig                     7712921
Total length                       7712921
GC (%)                             64.14
N50                                7712921
N75                                7712921
L50                                1
L75                                1
# N's per 100 kbp                  0.00
# predicted genes (unique)         7533
# predicted genes (>= 0 bp)        7534 + 0 part
# predicted genes (>= 300 bp)      6584 + 0 part
# predicted genes (>= 1500 bp)     911 + 0 part
# predicted genes (>= 3000 bp)     81 + 0 part
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 6m 3s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/147325
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Annotate or re-annotate genome/assembly using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 10m 24s.
Objects
Created Object Name Type Description
NP1_RastAnnotated Genome RAST re-annotated genome
Summary
The RAST algorithm was applied to annotate a genome sequence comprising 1 contig containing 7,712,921 nucleotides. No initial gene calls were provided. Standard features were called using glimmer3 and prodigal. A scan was conducted for the following additional feature types: rRNA, tRNA, selenoproteins, pyrrolysoproteins, repeat regions, and CRISPR. The genome features were functionally annotated using the following algorithms: Kmers V2, Kmers V1, and protein similarity. In addition to the remaining original 0 coding features and 0 non-coding features, 7,955 new features were called, of which 147 are non-coding.

Output genome has the following feature types:
Coding gene          7808
Non-coding repeat    95
Non-coding rna       52

The number of distinct functions can exceed the number of genes because some genes have multiple functions.

Taxonomic Identification

The initial classification was based on a BLASTn alignment of the 16S rRNA gene PCR product, whose sequence had an average quality score of 42.5 (4Peaks, v1.8; GenBank accession number OR045828).
The alignment produced a 99.6% match with at least 100 strains of Bradyrhizobium (see table). We also built species trees using two KBase apps that rely on sets of phylogenetic marker genes other than the 16S rRNA gene, GTDB-Tk (9) and Insert Genome Into SpeciesTree, with default parameters (see below). No species-level match was produced.

Table: 16S BLASTn result 5/26/23:


NP1_16S_BLASTn.png

Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) ver R06-RS202
This app completed without errors in 40m 13s.
Objects
Created Object Name Type Description
NP1_RastAnnotated Genome Taxonomy and taxon_assignment updated with GTDB
Add one or more Genomes to a KBase SpeciesTree.
This app completed without errors in 3m 56s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/147325
  • NP1_SpeciesTree.newick
  • NP1_SpeciesTree-labels.newick
  • NP1_SpeciesTree.png
  • NP1_SpeciesTree.pdf

References

  1. Ormeño-Orrillo E, Martínez-Romero E. 2019. A Genomotaxonomy View of the Bradyrhizobium Genus. Frontiers in Microbiology 10.
  2. VanInsberghe D, Maas KR, Cardenas E, Strachan CR, Hallam SJ, Mohn WW. 2015. Non-symbiotic Bradyrhizobium ecotypes dominate North American forest soils. The ISME Journal 9:2435–2441.
  3. Melillo JM, Frey SD, DeAngelis KM, Werner WJ, Bernard MJ, Bowles FP, Pold G, Knorr MA, Grandy AS. 2017. Long-term pattern and magnitude of soil carbon feedback to the climate system in a warming world. Science 358:101–105.
  4. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120.
  5. Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, Dehal P, Ware D, Perez F, Canon S, Sneddon MW, Henderson ML, Riehl WJ, Murphy-Olson D, Chan SY, Kamimura RT, Kumari S, Drake MM, Brettin TS, Glass EM, Chivian D, Gunter D, Weston DJ, Allen BH, Baumohl J, Best AA, Bowen B, Brenner SE, Bun CC, Chandonia J-M, Chia J-M, Colasanti R, Conrad N, Davis JJ, Davison BH, DeJongh M, Devoid S, Dietrich E, Dubchak I, Edirisinghe JN, Fang G, Faria JP, Frybarger PM, Gerlach W, Gerstein M, Greiner A, Gurtowski J, Haun HL, He F, Jain R, Joachimiak MP, Keegan KP, Kondo S, Kumar V, Land ML, Meyer F, Mills M, Novichkov PS, Oh T, Olsen GJ, Olson R, Parrello B, Pasternak S, Pearson E, Poon SS, Price GA, Ramakrishnan S, Ranjan P, Ronald PC, Schatz MC, Seaver SMD, Shukla M, Sutormin RA, Syed MH, Thomason J, Tintle NL, Wang D, Xia F, Yoo H, Yoo S, Yu D. 2018. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology 36:566–569.
  6. Wick RR, Judd LM, Gorrie CL, Holt KE. 2017. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol 13:e1005595.
  7. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–1055.
  8. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O. 2008. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics 9:75.
  9. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. 2020. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36:1925–1927.

Apps

  1. Annotate Genome/Assembly with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206–D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT--The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656–664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955–964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793–803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643–6654. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176–1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673–679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  2. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043–1055. doi:10.1101/gr.186072.114
  3. Assess Quality of Assemblies with QUAST - v4.4
    • [1] Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29: 1072–1075. doi:10.1093/bioinformatics/btt086
    • [2] Mikheenko A, Valin G, Prjibelski A, Saveliev V, Gurevich A. Icarus: visualizer for de novo assembly evaluation. Bioinformatics. 2016;32: 3321–3323. doi:10.1093/bioinformatics/btw379
  4. Classify Microbes with GTDB-Tk - v1.7.0
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925–1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996–1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
  5. Insert Genome Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2--Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5. doi:10.1371/journal.pone.0009490