Generated December 5, 2022

Classifying the Genome of Bacillus mobilis for Electronic Waste Recycling

Abstract

In this experiment, a sample of bacteria was isolated from the environment and grown in a laboratory setting. It then underwent a series of procedures to isolate the microbe's DNA, quantify it, and sequence its genome in order to determine the species of the unknown bacterium. Using this procedure, the organism was identified as Bacillus mobilis, and it was then analyzed to understand its potential role in electronic waste recycling.

Introduction

Bacillus mobilis is a Gram-positive, mesophilic, facultative anaerobe that is able to survive in a variety of conditions (1). It belongs to the genus Bacillus and can withstand harsh environments, including high acidity and the presence of toxic substances such as arsenic (2). For this experiment, NC State researchers Dr. Amy Grunden and Jason Whitham retrieved the organism from a community of microbes inhabiting the soil of an acid mine drainage site. The drainage area was around room temperature and supported the growth of pokeweed plants, which are known to grow in low-pH soils. This environment supported the growth of B. mobilis, as it is able to metabolize a variety of nutrients and produce acids (1). With these characteristics, B. mobilis is a potential candidate for assisting in electronic waste recycling by degrading electronic waste and isolating precious metals.

The publication by K. Maynard, G. Thompson, and V. Prew can be found here: [https://docs.google.com/document/d/1KB54U7xRq45fDnwUfIRbQAbvu3hKqzntOp6mZgozJTc/edit?usp=sharing]

Table of Contents

  1. Background and Experimental Methods
  2. Import and annotation
  3. QC, Assembly, and Annotation
  4. Taxonomic Classification
  5. References
Narrative created by: Kaylie Maynard, Gabriel Thompson, Versace Prew

Background and Experimental Methods

The microbes studied were collected from an acid mine drainage site in collaboration with NC State researchers Dr. Amy Grunden and Jason Whitham. Soil was taken from areas at about room temperature where pokeweed plants grew, and a mixed culture of microbes was isolated from it. The microbes were then cultured in Tryptic Soy Broth (TSB) and on Tryptic Soy Agar (TSA) plates, which were streaked to obtain single colonies.

The first characterization of the microbe was a Biolog GEN III phenotypic characterization. After the plates were incubated at 28℃, cells were picked from a single colony on the agar plate and suspended in Biolog inoculating fluid. This was intended to ensure that a single species was isolated for the study of its DNA. The cell density of the suspension was then adjusted with a turbidimeter to a target reading of 95% transmittance. Standardizing the number of cells per unit volume in this way ensures that the dye results reflect metabolic activity rather than differences in cell number. The suspension was pipetted into a Biolog GEN III MicroPlate, a 96-well plate in which each well contains a different phenotypic test, and metabolic activity in each well was assessed over an incubation window using a purple metabolic dye. The tests include bacterial ‘food’ such as carbon, nitrogen, sulfur, and phosphorus sources, as well as environments with a range of pH levels and chemical exposures. Examples of the conditions tested include:

  • Sugars (fructose, galactose, sucrose, etc.)
  • Salts (various concentrations of sodium chloride, sodium lactate)
  • Acids (N-acetylneuraminic acid, aspartic acid, mucic acid)
  • pH levels (5, 6)
Of the 96 wells, 71 are carbon-source assays, 23 are chemical-sensitivity assays, and two serve as positive and negative controls. The degree to which the microbe could grow in each well was determined by the intensity of the purple metabolic dye (4).
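A minimal sketch of how well readings might be scored against the plate's negative control. It assumes each well's dye intensity has been exported as a number; the well IDs, readings, the use of A1 as the negative control, and the 0.2 threshold are hypothetical illustration values, not data from this experiment and not Biolog's official scoring rule.

```python
# Minimal sketch: score Biolog GEN III wells relative to the negative control.
# ASSUMPTIONS (hypothetical): readings are numeric dye intensities per well,
# A1 is the negative control, and a well counts as "positive" when it exceeds
# the negative control by an illustrative, fixed threshold.

NEGATIVE_CONTROL = "A1"
THRESHOLD = 0.2  # hypothetical margin above background, not a Biolog value


def score_wells(readings: dict[str, float], threshold: float = THRESHOLD) -> dict[str, str]:
    """Label each non-control well as 'positive', 'borderline', or 'negative'."""
    background = readings[NEGATIVE_CONTROL]
    calls = {}
    for well, value in readings.items():
        if well == NEGATIVE_CONTROL:
            continue
        signal = value - background  # dye development above the blank well
        if signal >= threshold:
            calls[well] = "positive"
        elif signal >= threshold / 2:
            calls[well] = "borderline"
        else:
            calls[well] = "negative"
    return calls


if __name__ == "__main__":
    # Hypothetical readings for a handful of wells.
    example = {"A1": 0.10, "A2": 0.85, "A3": 0.18, "A4": 0.26}
    for well, call in sorted(score_wells(example).items()):
        print(well, call)
```

Subtracting the negative control first means the calls depend on dye development rather than on the plate's background color.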

Next, high-molecular-weight (HMW) genomic DNA (gDNA) was isolated by enzymatic lysis using the NEB Monarch HMW kit for tissues and cells. After the cells were lysed, the DNA was collected on glass beads, which bind DNA by means of its electric charge. The DNA was washed several times before being eluted into solution and incubated in the lab.

As another means of microbial identification, the 16S rRNA gene was sequenced by instructors Dr. Carlos Goller and Dr. Carly Sjogren. Compared with Nanopore sequencing, 16S Sanger sequencing is a lower-throughput process that detects fluorescently labeled nucleotides rather than measuring changes in electrical current. Like the PCR-based COVID-19 tests, this workflow relies on polymerase chain reaction (PCR) amplification of the target gene, with primers defining the region of DNA to be amplified. After amplification, the DNA was cleaned using the Qiagen DNA Clean Up Kit. The resulting 16S sequence was then analyzed with BLAST (5). Several species were close matches, and this process identified the microbe as a member of the genus Bacillus. The closest species-level matches were:

  • Bacillus mobilis
  • Bacillus toyonensis
  • Bacillus cereus
  • Bacillus thuringiensis
  • Bacillus wiedmannii
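One of the statistics BLAST reports for each hit is percent identity. The sketch below computes that quantity for two sequences that have already been aligned (gaps written as "-"). The sequences are short made-up fragments, and the sketch deliberately ignores how BLAST finds and scores the alignment or computes E-values.

```python
# Minimal sketch: percent identity of a pairwise alignment, i.e. identical
# columns divided by alignment length, with gap columns counted in the length.
# It does NOT perform the alignment itself.


def percent_identity(query_aln: str, subject_aln: str) -> float:
    """Return percent identity of two equal-length aligned sequences."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    if not query_aln:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(query_aln.upper(), subject_aln.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(query_aln)


if __name__ == "__main__":
    # Hypothetical 16S fragments, not sequences from this study.
    query = "AGAGTTTGATCCTGGCTCAG"
    hit = "AGAGTTTGATCATGGCTC-G"
    print(f"{percent_identity(query, hit):.1f}% identity")
```

When several species share nearly identical 16S sequences, their hits end up with nearly identical percent identities, which is why a close match list like the one above can point to a genus without settling the species.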
Long-read sequencing was then performed on our previously isolated DNA using an Oxford Nanopore Technologies MinION sequencer. In the MinION, single strands of DNA pass through nanopores, and bases are identified from the resulting changes in electrical current. Short DNA “barcodes” are attached to the ends of the DNA strands so that each read can be traced back to its sample. In this run, the DNA from all three groups was combined and sequenced simultaneously. The sequencer ran for a total of 72 hours and produced 7.58 Gb of bases that passed quality control.
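Because the three groups' DNA was pooled on one run, each read has to be assigned back to its sample by its barcode. The sketch below illustrates the idea with a simple Hamming-distance match on the first bases of each read. The barcode sequences, group names, and mismatch limit are hypothetical, and production Nanopore demultiplexing software handles sequencing errors, indels, and adapters far more robustly than this naive prefix check.

```python
# Minimal sketch: assign pooled reads to samples by a barcode at the read start.
# ASSUMPTIONS (hypothetical): barcodes are fixed-length sequences at the very
# beginning of each read, and up to MAX_MISMATCHES substitutions are tolerated.
# Indels, adapters, and reverse-complement barcodes are ignored here.

MAX_MISMATCHES = 2

# Hypothetical 12-nt barcodes for the three pooled groups.
BARCODES = {
    "group1": "ACGTACGTACGT",
    "group2": "TTGGCCAATTGG",
    "group3": "GATCGATCCTAG",
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(read: str, barcodes: dict[str, str] = BARCODES,
                max_mismatches: int = MAX_MISMATCHES) -> str:
    """Return the best-matching sample name, or 'unclassified'."""
    best_name, best_dist, tie = "unclassified", max_mismatches + 1, False
    for name, code in barcodes.items():
        if len(read) < len(code):
            continue  # read too short to contain this barcode
        dist = hamming(read[: len(code)], code)
        if dist < best_dist:
            best_name, best_dist, tie = name, dist, False
        elif dist == best_dist:
            tie = True  # two barcodes match equally well: ambiguous
    return "unclassified" if tie or best_dist > max_mismatches else best_name


if __name__ == "__main__":
    pooled = ["ACGTACGTACGTTTTGCA", "TTGGCCAAATGGCCCATG", "NNNNNNNNNNNNACGT"]
    for read in pooled:
        print(assign_read(read), read)
```

Rejecting ties keeps a read out of both samples when its barcode is equally close to two of them, at the cost of leaving more reads unclassified.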

However, Nanopore sequencing did not yield a usable genome assembly. This may be due to a lack of long reads (~10-100 kb), which Nanopore-based assemblers are designed for. Although we obtained a large amount of raw data, the mean read length was only ~750 b, which is very short for Nanopore sequencing. The lack of long, contiguous pieces of DNA may have prevented the assembly of DNA scaffolds and, ultimately, a genome.
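To make the read-length comparison concrete, the sketch below computes the mean read length and the N50 (the read length at which the running total of bases, counting longest reads first, reaches half of all sequenced bases) from a list of read lengths. The lengths are invented for illustration; in practice they would be parsed from the run's FASTQ files.

```python
# Minimal sketch: mean read length and N50 from a list of read lengths.
# The example lengths are invented; real values would come from FASTQ files.


def mean_length(lengths: list[int]) -> float:
    """Arithmetic mean read length in bases."""
    return sum(lengths) / len(lengths)


def n50(lengths: list[int]) -> int:
    """Length at which the longest-first running total reaches half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    raise ValueError("empty input")


if __name__ == "__main__":
    # Hypothetical short-read-dominated run (lengths in bases).
    reads = [300, 400, 500, 600, 700, 750, 800, 900, 1000, 1550]
    print(f"mean = {mean_length(reads):.0f} b, N50 = {n50(reads)} b")
```

With a mean near 750 b and an N50 below 1 kb, almost no read would span a typical bacterial repeat, which is consistent with the assembly difficulties described above.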

QC, Assembly, and Annotation

Throughout the process, we encountered considerable trouble importing and assembling the data from the Nanopore sequencer, likely because of the short read lengths described above.

To work around this issue, a reference genome of Bacillus mobilis was used for annotation and a pseudo-taxonomic identification. This genome was chosen because B. mobilis was the closest match to our microbe in 16S Sanger sequencing. The sequence was obtained from the NIH National Center for Biotechnology Information (NCBI), a repository of genomes and protein sequences, among other information. For more information, see https://www.ncbi.nlm.nih.gov/genome/?term=Bacillus+mobilis.

For gene annotation, the RASTtk and Prokka apps in KBase were used. The names of various heavy metals were then used as keywords to search the annotation tables for genes involved in heavy metal uptake.
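The keyword search of the annotation tables can be sketched as a case-insensitive filter over (gene ID, function) rows. The rows and keyword stems below are hypothetical examples, not output from the RASTtk or Prokka runs in this narrative; a real search would read the exported feature table.

```python
# Minimal sketch: find annotated genes whose function text mentions a metal.
# ASSUMPTIONS (hypothetical): annotations are available as (gene_id, function)
# pairs, e.g. exported from a genome's feature table; the stems are illustrative.
import re

# Matching is anchored at the start of a word, so "arsen" also catches
# "arsenate" and "arsenical". "lead" is left out because it would also match
# unrelated words such as "leader peptide".
METAL_STEMS = ["arsen", "cadmium", "copper", "zinc", "mercur", "cobalt", "nickel"]


def find_metal_genes(rows, stems=METAL_STEMS):
    """Map each stem to the gene IDs whose function text contains it."""
    hits = {}
    for gene_id, function in rows:
        for stem in stems:
            if re.search(r"\b" + re.escape(stem), function, flags=re.IGNORECASE):
                hits.setdefault(stem, []).append(gene_id)
    return hits


if __name__ == "__main__":
    # Hypothetical annotation rows, not results from this narrative.
    rows = [
        ("gene_0001", "Arsenical resistance operon repressor"),
        ("gene_0002", "Copper-translocating P-type ATPase"),
        ("gene_0003", "Cobalt-zinc-cadmium resistance protein CzcD"),
        ("gene_0004", "Hypothetical protein"),
    ]
    for stem, genes in find_metal_genes(rows).items():
        print(stem, genes)
```

A gene can appear under several stems, as the cobalt-zinc-cadmium example shows, so the hit lists should be merged before counting distinct genes.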

# Import the reference genome FASTA file from the Narrative staging area
# as a KBase Assembly object.
from biokbase.narrative.jobs.appmanager import AppManager
AppManager().run_app_batch(
    [{
        "app_id": "kb_uploadmethods/import_fasta_as_assembly_from_staging",
        "tag": "release",
        "version": "5b9346463df88a422ff5d4f4cba421679f63c73f",
        "params": [{
            # FASTA file in the staging area and the name of the new Assembly object
            "staging_file_subdir_path": "GCF_900177005.1_Bcereus.16-00174_genomic.fna",
            "assembly_name": "GCF_900177005.1_Bcereus.16-00174_genomic.fna_assembly"
        }],
        "shared_params": {
            "type": "sag",
            "min_contig_length": 500  # minimum contig length (bp) kept on import
        }
    }],
    cell_id="7d89c37a-dee2-4d63-be88-826960509c36",
    run_id="d646d395-6ab2-4cd8-ade3-171c5ab5e14f"
)
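The import call above sets `min_contig_length` to 500, which presumably discards contigs shorter than 500 bases. The sketch below illustrates that effect with a small hand-written FASTA parser that counts contigs and total bases before and after such a cutoff. It is an illustration under that assumption, not the code KBase runs, and the toy FASTA text is made up; the same counting applies to contig and nucleotide totals like those in the RAST summary that follows.

```python
# Minimal sketch: parse FASTA text, drop short contigs, and summarize.
# This imitates the effect assumed for min_contig_length=500 on import
# (discarding shorter contigs); it is not KBase's implementation.


def parse_fasta(text: str) -> dict[str, str]:
    """Return {record_id: sequence} from FASTA-formatted text."""
    records: dict[str, str] = {}
    current = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            current = line[1:].split()[0]  # ID is the first word of the header
            chunks = []
        else:
            chunks.append(line)  # sequences may span several lines
    if current is not None:
        records[current] = "".join(chunks)
    return records


def summarize(records: dict[str, str], min_len: int = 500) -> tuple[int, int]:
    """Return (contig count, total bases) for contigs of at least min_len."""
    kept = [seq for seq in records.values() if len(seq) >= min_len]
    return len(kept), sum(len(seq) for seq in kept)


if __name__ == "__main__":
    # Toy FASTA: two 600-bp contigs and one 120-bp fragment (made up).
    toy = ">c1 desc\n" + "ACGT" * 150 + "\n>c2\n" + "GC" * 300 + "\n>c3\n" + "A" * 120 + "\n"
    print(summarize(parse_fasta(toy)))
```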
Annotate or re-annotate genome/assembly using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 7m 12s.
Objects
Created object: AnnotatedAssembly (type: Genome), RAST re-annotated genome
Summary
The RAST algorithm was applied to annotating a genome sequence comprised of 106 contigs containing 5689796 nucleotides. No initial gene calls were provided. Standard features were called using: glimmer3; prodigal. A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr. The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity. In addition to the remaining original 0 coding features and 0 non-coding features, 6102 new features were called, of which 192 are non-coding. Output genome has the following feature types:
  • Coding gene: 5910
  • Non-coding crispr_array: 1
  • Non-coding crispr_repeat: 4
  • Non-coding crispr_spacer: 3
  • Non-coding repeat: 118
  • Non-coding rna: 66
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
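The counts in the RAST summary are internally consistent: the non-coding feature types add up to the 192 non-coding features, and coding plus non-coding features give the 6102 new features. The short check below simply re-adds the numbers copied from that summary.

```python
# Re-add the feature counts reported in the RAST summary.
noncoding = {
    "crispr_array": 1,
    "crispr_repeat": 4,
    "crispr_spacer": 3,
    "repeat": 118,
    "rna": 66,
}
coding_genes = 5910

noncoding_total = sum(noncoding.values())
print("non-coding features:", noncoding_total)               # 192
print("all new features:", coding_genes + noncoding_total)   # 6102
```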
Annotate Assembly and Re-annotate Genomes with Prokka annotation pipeline.
This app completed without errors in 3m 7s.
Objects
Created object: AnnotatedGenome (type: Genome), Annotated Genome
Summary
Annotated Genome saved to: gabe_t_:narrative_1668541084214/AnnotatedGenome
  • Number of genes predicted: 5788
  • Number of protein coding genes: 5715
  • Number of genes with non-hypothetical function: 3097
  • Number of genes with EC-number: 1173
  • Number of genes with Seed Subsystem Ontology: 0
  • Average protein length: 278 aa
Output from Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/131578
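For comparison with RAST, the Prokka counts can be turned into proportions. The arithmetic below uses only the numbers in the Prokka summary. The two tools call genes with different pipelines (RAST reports 5910 coding genes, Prokka 5715 protein-coding genes), so their totals are not expected to match exactly.

```python
# Proportions derived from the Prokka summary counts.
genes_predicted = 5788
protein_coding = 5715
non_hypothetical = 3097
with_ec_number = 1173

for label, count in [
    ("protein-coding", protein_coding),
    ("non-hypothetical function", non_hypothetical),
    ("EC number assigned", with_ec_number),
]:
    print(f"{label}: {100 * count / genes_predicted:.1f}% of predicted genes")
```

Roughly half of the predicted genes receive a non-hypothetical function, so a keyword search for heavy metal genes can only find genes whose function has actually been assigned.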

References

1) Y. Liu. “Bacillus mobilis 0711P9-1”. BacDive. 2017. https://bacdive.dsmz.de/strain/140964

This source provides basic characteristics of our species, Bacillus mobilis, which were used to help interpret our results.

2) A.S. Ayangbenro, O.O. Babalola. “Genomic analysis of Bacillus cereus NWUAB01 and its heavy metal removal from polluted soil.” Scientific Reports. 10, 19660 (2020). https://doi.org/10.1038/s41598-020-75170-x

This research provides information about the genus of our microorganism and how its members can be used to remove heavy metals from polluted soil.

3) USA.gov. “Bacillus mobilis”. National Library of Medicine. https://www.ncbi.nlm.nih.gov/genome/?term=Bacillus+mobilis

4) “Gen III MicroPlate Instructions for use.” BiOLOG.com. 1-8 (October 2016). https://www.biolog.com/wp-content/uploads/2020/04/00P_185_GEN_III_MicroPlate_IFU.pdf

This document gives information about using Biolog plates and what the test demonstrates about the organism.

5) Basic Local Alignment Search Tool, National Library of Medicine https://blast.ncbi.nlm.nih.gov/Blast.cgi

Apps

  1. Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5
    • Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30: 2068-2069. doi:10.1093/bioinformatics/btu153
  2. Annotate Genome/Assembly with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT: The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656-664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955-964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793-803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-6654. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijuter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176-1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673-679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  3. Classify Microbes with GTDB-Tk - v1.7.0
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195