Generated December 10, 2024

Draft Genome Sequence of Exiguobacterium indicum strain LL15 from an urban freshwater lake in Northern Kentucky

Elisha J. Redman and Joshua T. Cooper

Department of Biological Sciences, Northern Kentucky University, 1 Nunn Drive, Highland Heights, KY 41099, USA

Introduction

Exiguobacterium indicum has often been reported from a wide variety of environmental samples, ranging from extreme to common habitats (1, 2). Here we report the 3.1-Mb draft genome of E. indicum strain LL15, isolated during an investigation of freshwater lakes in residential-urban environments.

Table of Contents

  1. Background and Experimental Methods
  2. Import and annotation
  3. QC, Assembly, and Annotation
  4. References
Narrative created by: Elisha Redman and Joshua T. Cooper

External Data Availability

  • The data were deposited in the Sequence Read Archive (SRA) as SRR19987099 under BioProject PRJNA734631, and BioSample SAMN28844364.
  • This Whole Genome Shotgun project has been deposited at GenBank under the accession JANFNX000000000. The version described in this paper is version JANFNX010000000.
  • Linked publication:

The publication may not be available at the time of the static Narrative creation. This can be added after the fact; please contact [email protected] to update the DOI landing page when this is done.

from biokbase.narrative.jobs.appmanager import AppManager

# Bulk-import the paired-end Illumina FASTQ files from the KBase staging
# area as a single PairedEndLibrary object named "LL15reads".
AppManager().run_app_bulk(
    [{
        "app_id": "kb_uploadmethods/import_fastq_noninterleaved_as_reads_from_staging",
        "tag": "release",
        "version": "31e93066beb421a51b9c8e44b1201aa93aea0b4e",
        "params": [{
            "fastq_fwd_staging_file_name": "LL15_S182_R1_001.fastq.gz",  # forward (R1) reads
            "fastq_rev_staging_file_name": "LL15_S182_R2_001.fastq.gz",  # reverse (R2) reads
            "name": "LL15reads",
            "sequencing_tech": "Illumina",
            "single_genome": 1,             # reads come from a single isolate genome
            "read_orientation_outward": 0,  # standard (inward-facing) read pairs
            "insert_size_std_dev": None,    # insert size not specified
            "insert_size_mean": None
        }]
    }],
    cell_id="25ece73e-368d-46da-90ef-3fa85fffcf88",
    run_id="d3c5920f-acd6-4914-820f-d39065521349"
)
v1 - KBaseFile.PairedEndLibrary-2.1
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/124597

Background and Experimental Methods

Sample Collection

Exiguobacterium indicum strain LL15 was isolated from a surface grab sample collected from Camp Ernst Lake in Burlington, KY, in a sterile Whirl-Pak bag.

Isolation

The water sample was serially diluted and spread onto tryptic soy agar (TSA) plates, which were then incubated for 48 hours at 25°C. Individual colonies were purified to single isolates on TSA using the quadrant streak method.

Genome Sequencing

Genomic DNA was extracted with the UltraClean® Microbial DNA Isolation Kit (Qiagen, Maryland, USA) from a tryptic soy broth culture inoculated with a single isolated colony. DNA was quantified with the Qubit 3.0 Broad Range Kit (Invitrogen), and the sample was sequenced at the Microbial Genome Sequencing Center (MiGS, University of Pittsburgh, PA) using Illumina Nextera chemistry, producing 150-bp paired-end reads on an Illumina NextSeq 2000. Raw data were processed in the KBase v2.3.2 web interface (3), from quality control through genome assembly, using default settings unless otherwise noted. Read quality was assessed with FastQC v0.11.9 (4), and reads were trimmed with Trimmomatic v0.36 (5) using the Nextera adapter sequences. The genome was assembled with SPAdes v3.15.3 (6), a de Bruijn graph assembler, into 10 contigs with a total length of 3,102,801 bp and an N50 of 2,180,415 bp (the length of the largest contig, which alone holds more than half of the assembled bases). Genome completeness and contamination were estimated with CheckM v1.0.18 (7); the LL15 genome was 99.34% complete with 0.0% contamination.
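
The N50 follows the usual definition: sort contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size. The minimal Python sketch below implements that calculation. It uses a hypothetical toy contig set, not the LL15 contigs, and shows that a single contig holding more than half of the assembled bases sets the N50 by itself:

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


# Hypothetical toy assembly (not the LL15 contigs): the 60-bp contig alone
# covers 60 of 100 bases, so it determines the N50.
print(n50([60, 20, 10, 5, 5]))  # 60
```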

QC, Assembly, and Annotation

Proteins were initially predicted with RASTtk v1.073 and Prokka v1.14.5. The NCBI Prokaryotic Genome Annotation Pipeline (PGAP) (8) was used to generate the final protein predictions so that the annotation complied with GenBank submission requirements.

A quality control application for high throughput sequence data.
This app completed without errors in 2m 24s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/124597
  • LL15reads_trim_paired_108124_5_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • LL15reads_trim_paired_108124_5_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
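FastQC's per-base sequence quality plot summarizes the Phred scores observed at each read position. The simplified Python sketch below computes the per-position mean. It assumes Phred+33 encoding, which current Illumina instruments use. FastQC also reports medians and quartiles and by default groups positions in longer reads, so treat this as an illustration rather than its implementation:

```python
def phred33_scores(quality_string):
    """Decode a Phred+33 (Sanger / Illumina 1.8+) quality string into integers."""
    return [ord(ch) - 33 for ch in quality_string]


def mean_quality_per_position(quality_strings):
    """Mean Phred score at each read position, averaged over the reads that
    reach that position (reads may differ in length after trimming)."""
    totals, counts = [], []
    for qs in quality_strings:
        for i, q in enumerate(phred33_scores(qs)):
            if i == len(totals):  # first read long enough to reach position i
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Hypothetical quality strings ('I' = Q40, '5' = Q20, '#' = Q2).
print(mean_quality_per_position(["II#", "I5"]))  # [40.0, 30.0, 2.0]
```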
Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 16m 27s.
Objects
Created Object Name | Type | Description
LL15reads_trim_paired | PairedEndLibrary | Trimmed Reads
LL15reads_trim_unpaired_fwd | SingleEndLibrary | Trimmed Unpaired Forward Reads
LL15reads_trim_unpaired_rev | SingleEndLibrary | Trimmed Unpaired Reverse Reads
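Trimmomatic's SLIDINGWINDOW step scans each read from the 5' end and cuts it once the average quality within a window falls below a threshold. The simplified Python sketch below shows that idea. The window size (4) and threshold (15) are illustrative values, not necessarily the settings the KBase app used, and the code does not reproduce Trimmomatic's exact cut-point logic:

```python
def sliding_window_trim(qualities, window=4, min_mean=15):
    """Return the number of leading bases to keep.

    Scans 5'->3' and cuts the read at the start of the first window whose mean
    Phred quality falls below min_mean. Simplified illustration only; it is
    not Trimmomatic's exact SLIDINGWINDOW implementation.
    """
    if len(qualities) < window:
        # Too short for a full window: judge the whole read as one window.
        if qualities and sum(qualities) / len(qualities) >= min_mean:
            return len(qualities)
        return 0
    for start in range(len(qualities) - window + 1):
        if sum(qualities[start:start + window]) / window < min_mean:
            return start
    return len(qualities)


# Hypothetical read whose quality drops off at the 3' end.
read_quals = [30, 30, 30, 30, 30, 10, 10, 10, 10]
print(sliding_window_trim(read_quals))  # 5
```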
A quality control application for high throughput sequence data.
This app completed without errors in 1m 46s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/124597
  • LL15reads_108124_2_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • LL15reads_108124_2_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
Assemble reads using the SPAdes assembler.
This app completed without errors in 13m 30s.
Objects
Created Object Name | Type | Description
SPAdes.Assembly | Assembly | Assembled contigs
Summary
Assembly saved to: redmane:narrative_1643665960882/SPAdes.Assembly
Assembled into 10 contigs. Average length: 310,280.1 bp.
Contig length distribution (number of contigs per length range):
  • 6 contigs: 933.0 to 218,881.2 bp
  • 3 contigs: 218,881.2 to 436,829.4 bp
  • 0 contigs in each of the seven ranges from 436,829.4 to 1,962,466.8 bp
  • 1 contig: 1,962,466.8 to 2,180,415.0 bp
Links
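The bin boundaries in the SPAdes summary are consistent with ten equal-width intervals between the shortest (933 bp) and longest (2,180,415 bp) contigs. The report does not state its binning method, so the Python sketch below assumes equal-width binning. It also assumes that a length falling exactly on an internal edge goes into the upper bin, and it clamps the longest contig into the last bin:

```python
def equal_width_bins(min_len, max_len, n_bins=10):
    """Split [min_len, max_len] into n_bins equal-width (start, end) ranges."""
    width = (max_len - min_len) / n_bins
    return [(min_len + i * width, min_len + (i + 1) * width) for i in range(n_bins)]


def count_per_bin(lengths, n_bins=10):
    """Count how many lengths fall into each equal-width bin between min and max."""
    lo, hi = min(lengths), max(lengths)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in lengths:
        # The longest contig sits exactly on the upper edge; clamp it into the last bin.
        idx = min(int((x - lo) / width), n_bins - 1) if width > 0 else 0
        counts[idx] += 1
    return counts


# Edges from the shortest and longest LL15 contigs reported above.
for start, end in equal_width_bins(933, 2180415):
    print(f"{start:,.1f} to {end:,.1f} bp")
```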
Annotate or re-annotate genome/assembly using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 12m 44s.
Objects
Created Object Name | Type | Description
LL15_RASTtkGenomeAnnotation | Genome | RAST re-annotated genome
Summary
The RAST algorithm was applied to annotate a genome sequence composed of 10 contigs containing 3,102,801 nucleotides. No initial gene calls were provided. Standard gene features were called using prodigal and glimmer3. A scan was conducted for the following additional feature types: rRNA, tRNA, selenoproteins, pyrrolysoproteins, repeat regions, and CRISPRs. The genome features were functionally annotated using the following algorithms: Kmers V2, Kmers V1, and protein similarity. The input had no original coding or non-coding features; 3,304 new features were called, of which 61 are non-coding. The output genome has the following feature types:
  • Coding gene: 3,243
  • Non-coding prophage: 4
  • Non-coding repeat: 11
  • Non-coding RNA: 46
Overall, the genes have 2,105 distinct functions. The genes include 1,461 genes with a SEED annotation ontology across 1,050 distinct SEED functions. The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Links
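The RAST feature counts can be checked for internal consistency with simple arithmetic. The non-coding categories should sum to the reported 61 non-coding features, and coding plus non-coding features should give the 3,304 new calls. The short Python check below uses only the counts from the RAST summary:

```python
# Feature counts copied from the RAST summary for LL15.
rast_features = {
    "coding gene": 3243,
    "non-coding prophage": 4,
    "non-coding repeat": 11,
    "non-coding rna": 46,
}

non_coding = sum(n for kind, n in rast_features.items() if kind.startswith("non-coding"))
total = sum(rast_features.values())
print(non_coding, total)  # 61 3304
```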
Annotate Assembly and Re-annotate Genomes with Prokka annotation pipeline.
This app completed without errors in 3m 39s.
Objects
Created Object Name | Type | Description
LL15_ProkkaAnnotation | Genome | Annotated Genome
Summary
Annotated Genome saved to: redmane:narrative_1643665960882/LL15_ProkkaAnnotation
  • Number of genes predicted: 3,248
  • Number of protein-coding genes: 3,202
  • Number of genes with non-hypothetical function: 1,882
  • Number of genes with EC number: 796
  • Number of genes with SEED Subsystem Ontology: 0
  • Average protein length: 289 aa
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 7m 15s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/124597
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
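CheckM estimates completeness and contamination from sets of collocated, single-copy marker genes. Completeness is the fraction of markers found in each set, averaged over sets. Contamination counts extra copies (copies minus one) of each marker, again normalized per set and averaged. The Python sketch below follows those marker-set formulas as described by Parks et al. (ref. 7). The marker sets and copy counts are hypothetical toy inputs. The real lineage workflow first chooses lineage-specific marker sets by placing the genome in a reference tree and detects markers with HMMER, and neither step is reproduced here:

```python
from typing import Dict, List, Tuple


def checkm_style_estimates(marker_sets: List[List[str]],
                           hits: Dict[str, int]) -> Tuple[float, float]:
    """Return (completeness %, contamination %) from collocated marker sets.

    hits maps a marker gene ID to the number of copies found in the genome.
    """
    completeness = 0.0
    contamination = 0.0
    for s in marker_sets:
        present = sum(1 for g in s if hits.get(g, 0) >= 1)
        extra = sum(max(hits.get(g, 0) - 1, 0) for g in s)  # copies beyond the first
        completeness += present / len(s)
        contamination += extra / len(s)
    n = len(marker_sets)
    return 100 * completeness / n, 100 * contamination / n


# Hypothetical marker sets and hit counts (not LL15 data).
toy_sets = [["a", "b"], ["c", "d", "e", "f"]]
toy_hits = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 1}
print(checkm_style_estimates(toy_sets, toy_hits))  # (87.5, 12.5)
```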

Taxonomic Identification

The taxonomy of strain LL15 was explored using GTDB-Tk v1.7.0, a toolkit designed to classify bacterial and archaeal genomes, including metagenome-assembled genomes (MAGs) obtained from environmental samples. Insert Genome Into SpeciesTree v2.2.0 placed strain LL15 in a species tree with related public genomes available in KBase. The average coverage of the genome was 135.4×, determined with Bowtie2 v2.3.2 read alignment and Qualimap v1.1.2. DRAM v0.1.0 annotated the assembly and summarized the metabolic capabilities predicted for strain LL15.
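Mean coverage is the average per-base read depth across the reference. Equivalently, it is the total number of aligned bases divided by the genome length. The Python sketch below implements that definition alongside the familiar estimate C = N × L / G (reads × read length / genome size). The inputs are hypothetical, and Qualimap's handling of duplicates, clipped bases, and zero-depth positions may differ from this sketch:

```python
def mean_depth(per_base_depth):
    """Mean coverage: average read depth over all positions of the reference."""
    if not per_base_depth:
        raise ValueError("per_base_depth must not be empty")
    return sum(per_base_depth) / len(per_base_depth)


def expected_coverage(n_reads, read_length, genome_length):
    """Back-of-the-envelope estimate C = N * L / G (reads x read length / genome size)."""
    return n_reads * read_length / genome_length


# Hypothetical inputs, not LL15 data.
print(mean_depth([10, 20, 30]))                      # 20.0
print(expected_coverage(2_000_000, 150, 3_000_000))  # 100.0
```

At the reported mean coverage of about 135.4× over the 3,102,801-bp assembly, this definition implies roughly 4.2 × 10^8 aligned bases.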
Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) ver R06-RS202
This app completed without errors in 36m 10s.
Links
Add one or more Genomes to a KBase SpeciesTree.
This app completed without errors in 7m 16s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/124597
  • RASTtee.newick
  • RASTtee-labels.newick
  • RASTtee.png
  • RASTtee.pdf
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 14m 22s.
No output found.
Display BAM quality control information for a ReadsAlignment or ReadsAlignmentSet using QualiMap.
This app completed without errors in 2m 16s.
Annotate your assembly with DRAM. Annotations will then be distilled to create an interactive functional summary per assembly.
This app completed without errors in 40m 25s.
Objects
Created Object Name | Type | Description
SPAdes.Assembly_DRAM | Genome | Annotated Genome
Dram_Assembly | GenomeSet | LL15_Dram
Summary
Here are the results from your DRAM run.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/124597
  • annotations.tsv - DRAM annotations in a tab-separated table format
  • genes.fna - Genes as nucleotides predicted by DRAM with brief annotations
  • genes.faa - Genes as amino acids predicted by DRAM with brief annotations
  • genes.gff - GFF file of all DRAM annotations
  • rrnas.tsv - Tab-separated table of rRNAs as detected by barrnap
  • trnas.tsv - Tab-separated table of tRNAs as detected by tRNAscan-SE
  • genbank.tar.gz - Compressed folder of output genbank files
  • product.tsv - DRAM product in tabular format
  • metabolism_summary.xlsx - DRAM metabolism summary tables
  • genome_stats.tsv - DRAM genome statistics table

References

  1. Chaturvedi P, Shivaji S. 2006. Exiguobacterium indicum sp. nov., a psychrophilic bacterium from the Hamta glacier of the Himalayan mountain ranges of India. International Journal of Systematic and Evolutionary Microbiology 56:2765–2770.
  2. Dastager SG, Mawlankar R, Sonalkar VV, Thorat MN, Mual P, Verma A, Krishnamurthi S, Tang S-K, Li W-J. 2015. Exiguobacterium enclense sp. nov., isolated from sediment. International Journal of Systematic and Evolutionary Microbiology 65:1611–1616.
  3. Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, Dehal P, Ware D, Perez F, Canon S, Sneddon MW, Henderson ML, Riehl WJ, Murphy-Olson D, Chan SY, Kamimura RT, Kumari S, Drake MM, Brettin TS, Glass EM, Chivian D, Gunter D, Weston DJ, Allen BH, Baumohl J, Best AA, Bowen B, Brenner SE, Bun CC, Chandonia J-M, Chia J-M, Colasanti R, Conrad N, Davis JJ, Davison BH, DeJongh M, Devoid S, Dietrich E, Dubchak I, Edirisinghe JN, Fang G, Faria JP, Frybarger PM, Gerlach W, Gerstein M, Greiner A, Gurtowski J, Haun HL, He F, Jain R, Joachimiak MP, Keegan KP, Kondo S, Kumar V, Land ML, Meyer F, Mills M, Novichkov PS, Oh T, Olsen GJ, Olson R, Parrello B, Pasternak S, Pearson E, Poon SS, Price GA, Ramakrishnan S, Ranjan P, Ronald PC, Schatz MC, Seaver SMD, Shukla M, Sutormin RA, Syed MH, Thomason J, Tintle NL, Wang D, Xia F, Yoo H, Yoo S, Yu D. 2018. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol 36(7):566–569.
  4. FastQC source: Bioinformatics Group at the Babraham Institute, UK. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
  5. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30: 2114–2120. doi:10.1093/bioinformatics/btu170
  6. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology. 2012;19: 455-477. doi: 10.1089/cmb.2012.0021
  7. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043–1055. doi:10.1101/gr.186072.114
  8. Tatusova T, DiCuccio M, Badretdin A, Chetvernin V, Nawrocki EP, Zaslavsky L, Lomsadze A, Pruitt KD, Borodovsky M, Ostell J. 2016. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res 44(14):6614–6624.

Apps

  1. Align Reads using Bowtie2 - v2.3.2
    • Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9: 357–359. doi:10.1038/nmeth.1923
    • Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10: R25. doi:10.1186/gb-2009-10-3-r25
  2. Annotate and Distill Assemblies with DRAM
    • DRAM source code
    • DRAM documentation
    • DRAM Tutorial
    • DRAM publication
  3. Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5
    • Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30: 2068–2069. doi:10.1093/bioinformatics/btu153
  4. Annotate Genome/Assembly with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206–D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT: The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656–664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955–964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793–803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643–6654. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176–1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673 679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  5. Assemble Reads with SPAdes - v3.15.3
    • Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology. 2012;19: 455-477. doi: 10.1089/cmb.2012.0021
    • Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics. 2020 Jun;70(1):e102. doi: 10.1002/cpbi.102.
  6. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043–1055. doi:10.1101/gr.186072.114
    • CheckM source:
    • Additional info:
  7. Assess Read Quality with FastQC - v0.12.1
    • FastQC source: Bioinformatics Group at the Babraham Institute, UK.
  8. Assess Reads Alignment Quality using Qualimap - v2.2.1
    • Okonechnikov K, Conesa A, García-Alcalde F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics. 2016;32: 292–294. doi:10.1093/bioinformatics/btv566
  9. Classify Microbes with GTDB-Tk - v2.3.2
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics, Volume 38, Issue 23, 1 December 2022, Pages 5315–5316. DOI: https://doi.org/10.1093/bioinformatics/btac672
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925–1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Donovan H Parks, Maria Chuvochina, Christian Rinke, Aaron J Mussig, Pierre-Alain Chaumeil, Philip Hugenholtz. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D785–D794. DOI: https://doi.org/10.1093/nar/gkab776
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996–1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
    • Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
    • Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016 Jun 20;17(1):132. DOI: 10.1186/s13059-016-0997-x
  10. Insert Genome Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5. doi:10.1371/journal.pone.0009490
  11. Trim Reads with Trimmomatic - v0.36
    • Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30: 2114–2120. doi:10.1093/bioinformatics/btu170