Generated July 6, 2020

Ecogenomics of groundwater viruses suggests niche differentiation linked to specific environmental tolerance

Ankita Kothari, Simon Roux, Hanqiao Zhang, Anatori Prieto, John-Marc Chandonia, Sarah Spencer, Xiaoqin Wu, Adam M. Deutschbauer, Adam P. Arkin, Eric J. Alm, Romy Chakraborty, Aindrila Mukhopadhyay

Submitted to mBio

Table of Contents

  1. Methods
  2. References

This Narrative contains data for the 261 ENIGMA isolates described in the paper. See Table S2 for a complete list.


Computational Pipeline


Libraries were sequenced on an Illumina NextSeq producing 2x150 bp paired-end reads. Each sample contained 2,071,301 ± 409,888 reads, excluding one failed sample with < 2,000 reads.
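
As an illustration of how a per-sample summary like the one above could be computed, the sketch below averages read counts while skipping samples below a minimum. The counts, the isolate names, and the function name are hypothetical, and the use of the sample (n-1) standard deviation is an assumption; the text does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize_read_counts(read_counts, min_reads=2000):
    """Return (mean, sample SD, number excluded) over samples with >= min_reads reads."""
    kept = [n for n in read_counts.values() if n >= min_reads]
    return mean(kept), stdev(kept), len(read_counts) - len(kept)


# Hypothetical counts for illustration only; these are not data from the study.
example_counts = {
    "isolate_A": 2_000_000,
    "isolate_B": 2_400_000,
    "isolate_C": 1_600_000,
    "isolate_D": 1_500,  # treated as a failed sample (< 2,000 reads)
}

avg, sd, excluded = summarize_read_counts(example_counts)
print(f"{avg:,.0f} ± {sd:,.0f} reads ({excluded} sample(s) excluded)")
```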

The raw reads for each isolate were uploaded to this Narrative.
The PairedEndLibrary objects are called ISOLATE_NAME-alm-2017-05-19.reads

Adapter removal with Cutadapt

The program Cutadapt v1.12 was used to remove adapter sequences with parameters -a CTGTCTCTTAT -A CTGTCTCTTAT (Martin, 2011).
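
A minimal sketch of how the Cutadapt call could be assembled for one isolate, assuming paired-end FASTQ input. The -a and -A values come from the text above; the file names are hypothetical, and -o and -p are Cutadapt's standard options for the two output files. The command is only built and printed here, not executed.

```python
import shlex

ADAPTER = "CTGTCTCTTAT"  # adapter sequence given in the text, used for both reads


def cutadapt_command(r1, r2, out_r1, out_r2):
    """Build a paired-end Cutadapt argument list (file names are placeholders)."""
    return [
        "cutadapt",
        "-a", ADAPTER,   # 3' adapter on read 1
        "-A", ADAPTER,   # 3' adapter on read 2
        "-o", out_r1,    # trimmed read 1 output
        "-p", out_r2,    # trimmed read 2 output
        r1, r2,
    ]


cmd = cutadapt_command(
    "isolate_R1.fastq.gz", "isolate_R2.fastq.gz",
    "isolate_R1.cut.fastq.gz", "isolate_R2.cut.fastq.gz",
)
print(shlex.join(cmd))
```

With Cutadapt installed, the list could be passed to subprocess.run(cmd, check=True).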

Read Trimming with Trimmomatic

The Illumina sequencing reads were trimmed using Trimmomatic 0.36, with parameters "-phred33 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 ILLUMINACLIP:TruSeq3-PE.fa" (Bolger et al., 2014).
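
To make the quality-trimming parameters concrete, the sketch below applies simplified versions of LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15 and MINLEN:36 to a single read's Phred scores. It is an illustration of the idea only, not Trimmomatic's implementation: the exact cut position Trimmomatic chooses may differ, ILLUMINACLIP is not modeled, and the function names are hypothetical.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality string (Phred+33 encoding) to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_read(quals, leading=3, trailing=3, window=4, min_avg=15, min_len=36):
    """Return the (start, end) slice of bases to keep, or None if the read is dropped."""
    start, end = 0, len(quals)
    # LEADING: remove bases below the threshold from the 5' end.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: remove bases below the threshold from the 3' end.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW (simplified): cut at the start of the first window
    # whose mean quality falls below min_avg.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) < min_avg * window:
            end = i
            break
    # MINLEN: drop reads that end up shorter than min_len.
    return (start, end) if end - start >= min_len else None


# Hypothetical read: 2 poor bases, 40 good bases, 6 mediocre bases, 1 poor base.
example_quals = [2, 2] + [30] * 40 + [10] * 6 + [2]
kept = trim_read(example_quals)
print(kept)
```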

Assembly with SPAdes

The trimmed reads were assembled de novo using SPAdes v3.9.0 with parameters "-k 21,33,55,77" (Bankevich et al., 2012).
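
A sketch of the corresponding SPAdes call, assuming paired-end trimmed reads. The k-mer list is taken from the text; the file and directory names are hypothetical, and -1, -2 and -o are SPAdes' standard options for the read files and output directory. The command is built and printed, not run.

```python
import shlex

KMERS = "21,33,55,77"  # k-mer sizes given in the text


def spades_command(r1, r2, out_dir):
    """Build a paired-end SPAdes argument list (paths are placeholders)."""
    return ["spades.py", "-1", r1, "-2", r2, "-k", KMERS, "-o", out_dir]


spades_cmd = spades_command(
    "isolate_R1.trimmed.fastq.gz", "isolate_R2.trimmed.fastq.gz", "isolate_spades"
)
print(shlex.join(spades_cmd))
```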

The contigs for each isolate were uploaded to this Narrative.

Annotation with Prokka

Genes were identified using Prokka v1.12, with default parameters (Seemann, 2014).
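
A sketch of a default-parameter Prokka call for one assembly. Only --outdir and --prefix are set so that outputs are named per isolate; the isolate and file names are hypothetical, and whether the original runs set these options is not stated in the text. The command is built and printed, not run.

```python
import shlex


def prokka_command(contigs_fasta, isolate_name):
    """Build a Prokka argument list with otherwise default parameters."""
    return [
        "prokka",
        "--outdir", f"{isolate_name}_prokka",  # hypothetical output directory
        "--prefix", isolate_name,              # output file name prefix
        contigs_fasta,
    ]


prokka_cmd = prokka_command("isolate_contigs.fasta", "isolate")
print(shlex.join(prokka_cmd))
```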

The annotated genome for each isolate was uploaded to this Narrative.

GTDB-Tk classification

GTDB-Tk was run within KBase to obtain a taxonomic assignment for each genome; the app cell below gives the details and references.

Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) ver 1.1.0
This app completed without errors in 6h 31m 13s.


  1. Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17(1), 10-12. doi:10.14806/ej.17.1.200
  2. Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15), 2114-2120.
  3. Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., et al. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology 19(5), 455-477.
  4. Seemann, T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics 30(14), 2068-2069.


  1. GTDB-Tk classify
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927.
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018).
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea [published online ahead of print, 2020 Apr 27]. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195