Ecogenomics of groundwater viruses suggests niche differentiation linked to specific environmental tolerance

Ankita Kothari, Simon Roux, Hanqiao Zhang, Anatori Prieto, John-Marc Chandonia, Sarah Spencer, Xiaoqin Wu, Adam M. Deutschbauer, Adam P. Arkin, Eric J. Alm, Romy Chakraborty, Aindrila Mukhopadhyay

Submitted to mBio

Table of Contents

  1. Methods
  2. References

This Narrative contains data for the 261 ENIGMA isolates described in the paper. See Table S2 for a complete list.


Computational Pipeline


Libraries were sequenced on an Illumina NextSeq producing 2x150 bp paired-end reads. Each sample contained 2,071,301 ± 409,888 reads, excluding one failed sample with < 2,000 reads.

The raw reads for each isolate were uploaded to this Narrative.
The PairedEndLibrary objects are named ISOLATE_NAME-alm-2017-05-19.reads, where ISOLATE_NAME is the name of the isolate.

Adapter removal with Cutadapt

The program Cutadapt v1.12 was used to remove adapter sequences with parameters -a CTGTCTCTTAT -A CTGTCTCTTAT (Martin, 2011).
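As an illustration of how the adapter-removal step could be scripted, the sketch below builds a paired-end Cutadapt command line using the two adapter parameters quoted above. The function name and all file names are hypothetical; the `-o`/`-p` output options are standard Cutadapt paired-end options but are not stated in the text, so the exact invocation used for the paper may have differed.

```python
import shlex
from typing import List

# Adapter sequence quoted in the text, applied to read 1 (-a) and read 2 (-A).
ADAPTER = "CTGTCTCTTAT"


def cutadapt_command(r1: str, r2: str, out1: str, out2: str) -> List[str]:
    """Build (but do not run) a paired-end Cutadapt command.

    r1/r2 are the input FASTQ files; out1/out2 are the trimmed outputs.
    All paths are placeholders chosen by the caller (assumption).
    """
    return [
        "cutadapt",
        "-a", ADAPTER,   # 3' adapter on the first read of each pair
        "-A", ADAPTER,   # 3' adapter on the second read of each pair
        "-o", out1,      # trimmed first reads (assumed output option)
        "-p", out2,      # trimmed second reads (assumed output option)
        r1, r2,
    ]


if __name__ == "__main__":
    cmd = cutadapt_command("isolate.R1.fastq", "isolate.R2.fastq",
                           "isolate.cut.R1.fastq", "isolate.cut.R2.fastq")
    # Print the command for inspection; pass cmd to subprocess.run(cmd, check=True)
    # to execute it where Cutadapt is installed.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when isolate names contain unusual characters.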

Read Trimming with Trimmomatic

The Illumina sequencing reads were trimmed using Trimmomatic 0.36, with parameters "-phred33 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 ILLUMINACLIP:TruSeq3-PE.fa" (Bolger et al., 2014).
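The trimming step could be scripted along the following lines. This is a minimal sketch, not the authors' script: the jar path, function name, and output file naming are assumptions, and paired-end (`PE`) mode is inferred from the 2x150 bp libraries. The trimming steps are copied verbatim, in the order quoted above. Note that Trimmomatic's ILLUMINACLIP step normally takes additional threshold fields after the adapter file; the text does not give them, so they are left out here rather than guessed.

```python
import shlex
from typing import List

# Trimming steps exactly as quoted in the text, in the same order.
TRIM_STEPS = [
    "LEADING:3",
    "TRAILING:3",
    "SLIDINGWINDOW:4:15",
    "MINLEN:36",
    "ILLUMINACLIP:TruSeq3-PE.fa",  # threshold fields not stated in the source
]


def trimmomatic_command(r1: str, r2: str, prefix: str,
                        jar: str = "trimmomatic-0.36.jar") -> List[str]:
    """Build (but do not run) a paired-end Trimmomatic command.

    The jar location and the output names derived from `prefix`
    are illustrative assumptions.
    """
    outputs = [
        f"{prefix}_1P.fastq",  # read 1, mate survived
        f"{prefix}_1U.fastq",  # read 1, mate dropped
        f"{prefix}_2P.fastq",  # read 2, mate survived
        f"{prefix}_2U.fastq",  # read 2, mate dropped
    ]
    return ["java", "-jar", jar, "PE", "-phred33", r1, r2, *outputs, *TRIM_STEPS]


if __name__ == "__main__":
    cmd = trimmomatic_command("isolate.cut.R1.fastq", "isolate.cut.R2.fastq",
                              "isolate.trim")
    print(shlex.join(cmd))
```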

Assembly with SPAdes

The trimmed reads were assembled de novo using SPAdes v3.9.0 with parameters "-k 21,33,55,77" (Bankevich et al., 2012).
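For illustration, an assembly call with the k-mer sizes quoted above might be built as in the sketch below. The function name, file names, and output directory are hypothetical, and the text does not say whether unpaired reads or any other SPAdes options were supplied, so only the paired reads and the `-k` list are passed.

```python
import shlex
from typing import List

# k-mer sizes quoted in the text.
KMERS = "21,33,55,77"


def spades_command(r1: str, r2: str, outdir: str) -> List[str]:
    """Build (but do not run) a SPAdes command for one paired-end library.

    r1/r2 are the trimmed paired reads; outdir is where SPAdes writes
    its results. All paths are placeholders (assumption).
    """
    return [
        "spades.py",
        "-k", KMERS,   # k-mer sizes from the text
        "-1", r1,      # forward reads
        "-2", r2,      # reverse reads
        "-o", outdir,  # output directory
    ]


if __name__ == "__main__":
    cmd = spades_command("isolate.trim_1P.fastq", "isolate.trim_2P.fastq",
                         "isolate_spades")
    print(shlex.join(cmd))
```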

The contigs for each isolate were uploaded to this Narrative.

Annotation with Prokka

Genes were identified using Prokka v1.12, with default parameters (Seemann, 2014).
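A default-parameter Prokka run could be scripted roughly as follows. This is a sketch under stated assumptions: the function name and paths are hypothetical, and `--outdir` and `--prefix` are included only to control where output is written and how files are named; they do not change the annotation settings.

```python
import shlex
from typing import List


def prokka_command(contigs: str, outdir: str, prefix: str) -> List[str]:
    """Build (but do not run) a Prokka command with default annotation settings.

    contigs is the assembled FASTA file; outdir and prefix only affect
    output location and file naming (illustrative choices).
    """
    return [
        "prokka",
        "--outdir", outdir,  # directory for annotation output
        "--prefix", prefix,  # base name for output files
        contigs,
    ]


if __name__ == "__main__":
    cmd = prokka_command("isolate_spades/contigs.fasta",
                         "isolate_prokka", "isolate")
    print(shlex.join(cmd))
```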

The annotated genome for each isolate was uploaded to this Narrative.

GTDB-Tk classification

GTDB-Tk was run within KBase to obtain a taxonomic assignment for each genome; see the information and reference in the app cell below.

Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) ver 1.1.0


