Generated January 26, 2022

Dethiosulfovibrio faecalis sp. nov., a novel proteolytic, non-sulfur-reducing bacterium isolated from a marine aquaculture solid waste bioreactor

Steven Grabowski, Ethel A. Apolinario, Nicholas Schneider, Christopher W. Marshall and Kevin R. Sowers

Abstract

A new Dethiosulfovibrio strain, designated F2BT, was isolated from an anaerobic digester used to treat solid waste from a marine recirculating aquaculture system. The motile, Gram-negative, non-spore-forming curved rods were 2–7 μm in length and 1–2 μm in diameter. Growth occurred at temperatures ranging from 20 to 40 °C, with a maximum growth rate at 40 °C. The pH range for growth was 6.0 to 8.0, with a maximum growth rate at pH 7.5. The isolate was halotolerant, growing at NaCl concentrations from 0 to 1.6 M, with a maximum growth rate at 0.4 M. Like the five other described Dethiosulfovibrio spp., this obligately anaerobic isolate was fermentative and capable of utilizing peptides, amino acids and some organic acids for growth, but unlike all other described strains it did not reduce thiosulfate or elemental sulfur to hydrogen sulfide during fermentation of organic substrates. The G+C content of 55% is similar to that of the other described Dethiosulfovibrio spp. Average nucleotide identity of whole genomes showed 93% or less sequence similarity between strain F2BT and the five other described Dethiosulfovibrio spp. Differences in the physiological and phylogenetic characteristics between the new strain and other Dethiosulfovibrio spp. indicate that F2BT is a novel species of this genus, for which the name Dethiosulfovibrio faecalis sp. nov. is proposed. The type strain is F2BT (= DSM 112078T = KCTC 25378T).

Import reads and assemblies from Dethiosulfovibrio species

Import a GenBank file from your staging area into your Narrative as a Genome data object
This app completed without errors in 2m 10s.
Objects
Created Object Name Type Description
D_salsuginis_genome Genome Imported Genome
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1m 57s.
Objects
Created Object Name Type Description
dsm12590_S10_R1_001.fastq.gz_reads SingleEndLibrary Imported Reads
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 29m 22s.
Objects
Created Object Name Type Description
dsm12590_reads PairedEndLibrary Imported Reads
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 23m 44s.
Objects
Created Object Name Type Description
dsm12538_reads PairedEndLibrary Imported Reads
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 25m 8s.
Objects
Created Object Name Type Description
F2B_S9_reads PairedEndLibrary Imported Reads
Output from Import GenBank File as Genome from Staging Area
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/107808
Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 22m 46s.
Objects
Created Object Name Type Description
dsm12538_trim_reads_paired PairedEndLibrary Trimmed Reads
dsm12538_trim_reads_unpaired_fwd SingleEndLibrary Trimmed Unpaired Forward Reads
dsm12538_trim_reads_unpaired_rev SingleEndLibrary Trimmed Unpaired Reverse Reads
Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 16m 7s.
Objects
Created Object Name Type Description
dsm12537_trim_reads_paired PairedEndLibrary Trimmed Reads
dsm12537_trim_reads_unpaired_fwd SingleEndLibrary Trimmed Unpaired Forward Reads
dsm12537_trim_reads_unpaired_rev SingleEndLibrary Trimmed Unpaired Reverse Reads
Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 13m 31s.
Objects
Created Object Name Type Description
dsm12590_trim_reads_paired PairedEndLibrary Trimmed Reads
dsm12590_trim_reads_unpaired_fwd SingleEndLibrary Trimmed Unpaired Forward Reads
dsm12590_trim_reads_unpaired_rev SingleEndLibrary Trimmed Unpaired Reverse Reads
Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 9m 16s.
Objects
Created Object Name Type Description
f2_paired PairedEndLibrary Trimmed Reads
f2_unpaired_fwd SingleEndLibrary Trimmed Unpaired Forward Reads
f2_unpaired_rev SingleEndLibrary Trimmed Unpaired Reverse Reads
Assemble reads using the SPAdes assembler.
This app completed without errors in 12m 13s.
Objects
Created Object Name Type Description
dsm12590_trim_SPAdes3.15.Assembly Assembly Assembled contigs
Summary
Assembly saved to: nickschneider:narrative_1574347966169/dsm12590_trim_SPAdes3.15.Assembly
Assembled into 68 contigs.
Avg Length: 39339.117647058825 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
	45 -- 508.0 to 33332.1 bp
	9 -- 33332.1 to 66156.2 bp
	6 -- 66156.2 to 98980.3 bp
	4 -- 98980.3 to 131804.4 bp
	1 -- 131804.4 to 164628.5 bp
	0 -- 164628.5 to 197452.6 bp
	1 -- 197452.6 to 230276.7 bp
	0 -- 230276.7 to 263100.8 bp
	0 -- 263100.8 to 295924.9 bp
	2 -- 295924.9 to 328749.0 bp
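The bin edges in this distribution are consistent with ten equal-width bins spanning the shortest to the longest contig. That binning rule is inferred from the reported numbers, not stated by the app. A minimal Python sketch under that assumption (the function name `equal_width_edges` is hypothetical):

```python
def equal_width_edges(min_len, max_len, n_bins=10):
    """Return the n_bins + 1 edges of equal-width bins spanning [min_len, max_len]."""
    width = (max_len - min_len) / n_bins
    return [min_len + i * width for i in range(n_bins + 1)]


# Shortest and longest contig reported for dsm12590_trim_SPAdes3.15.Assembly.
edges = equal_width_edges(508.0, 328749.0)

# Print each bin as "low to high"; one decimal place hides floating-point noise.
for low, high in zip(edges, edges[1:]):
    print(f"{low:.1f} to {high:.1f} bp")
```

Formatting the edges to one decimal place keeps floating-point rounding noise out of the printed values.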
Assemble reads using the SPAdes assembler.
This app completed without errors in 13m 13s.
Objects
Created Object Name Type Description
dsm12537_trim_SPAdes3.15.Assembly Assembly Assembled contigs
Summary
Assembly saved to: nickschneider:narrative_1574347966169/dsm12537_trim_SPAdes3.15.Assembly
Assembled into 41 contigs.
Avg Length: 65377.07317073171 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
	24 -- 534.0 to 36397.0 bp
	4 -- 36397.0 to 72260.0 bp
	5 -- 72260.0 to 108123.0 bp
	2 -- 108123.0 to 143986.0 bp
	2 -- 143986.0 to 179849.0 bp
	0 -- 179849.0 to 215712.0 bp
	0 -- 215712.0 to 251575.0 bp
	1 -- 251575.0 to 287438.0 bp
	1 -- 287438.0 to 323301.0 bp
	2 -- 323301.0 to 359164.0 bp
Assemble reads using the SPAdes assembler.
This app completed without errors in 14m 59s.
Objects
Created Object Name Type Description
dsm12538_trim_SPAdes3.15.Assembly Assembly Assembled contigs
Summary
Assembly saved to: nickschneider:narrative_1574347966169/dsm12538_trim_SPAdes3.15.Assembly
Assembled into 41 contigs.
Avg Length: 65437.51219512195 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
	27 -- 508.0 to 36373.6 bp
	3 -- 36373.6 to 72239.2 bp
	3 -- 72239.2 to 108104.8 bp
	2 -- 108104.8 to 143970.4 bp
	1 -- 143970.4 to 179836.0 bp
	0 -- 179836.0 to 215701.6 bp
	0 -- 215701.6 to 251567.2 bp
	1 -- 251567.2 to 287432.8 bp
	0 -- 287432.8 to 323298.4 bp
	4 -- 323298.4 to 359164.0 bp
# Auto-generated KBase Narrative cell; it runs only inside a Narrative session,
# where the biokbase package is available.
from biokbase.narrative.jobs.appmanager import AppManager
# Import the D. peptidovorans reference FASTA from the staging area as an Assembly object.
AppManager().run_app_bulk(
    [{
        "app_id": "kb_uploadmethods/import_fasta_as_assembly_from_staging",
        "tag": "release",
        "version": "31e93066beb421a51b9c8e44b1201aa93aea0b4e",
        "params": [{
            "staging_file_subdir_path": "GCF_000172975.1_ASM17297v1_genomic.fna",
            "assembly_name": "D_peptidovorans_GCF_000172975.1_ASM17297v1_genomic.fna",
            "type": "finished isolate",
            "min_contig_length": 500
        }]
    }],
    cell_id="21283f73-95a9-41a2-a0e1-ea7c83644218",
    run_id="99fdd762-111e-4d25-90eb-71044318c109"
)
v1 - KBaseGenomeAnnotations.Assembly-5.0
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/107808
v2 - KBaseGenomeAnnotations.Assembly-5.0
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/107808

Assemble strain F2B genome

Assemble reads using the SPAdes assembler.
This app completed without errors in 8m 45s.
Objects
Created Object Name Type Description
f2b_trim_SPAdes3.15.Assembly Assembly Assembled contigs
Summary
Assembly saved to: nickschneider:narrative_1574347966169/f2b_trim_SPAdes3.15.Assembly
Assembled into 43 contigs.
Avg Length: 62118.93023255814 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
	23 -- 610.0 to 39008.7 bp
	8 -- 39008.7 to 77407.4 bp
	7 -- 77407.4 to 115806.1 bp
	2 -- 115806.1 to 154204.8 bp
	0 -- 154204.8 to 192603.5 bp
	0 -- 192603.5 to 231002.2 bp
	0 -- 231002.2 to 269400.9 bp
	2 -- 269400.9 to 307799.6 bp
	0 -- 307799.6 to 346198.3 bp
	1 -- 346198.3 to 384597.0 bp
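As a quick consistency check, the contig count multiplied by the average contig length should give the total assembly size. The sketch below does this for the four SPAdes assemblies, using the counts and averages from their summaries. The dictionary layout and variable names are illustrative only.

```python
# (contig count, average contig length in bp) as reported by the SPAdes app.
assemblies = {
    "dsm12590_trim_SPAdes3.15.Assembly": (68, 39339.117647058825),
    "dsm12537_trim_SPAdes3.15.Assembly": (41, 65377.07317073171),
    "dsm12538_trim_SPAdes3.15.Assembly": (41, 65437.51219512195),
    "f2b_trim_SPAdes3.15.Assembly": (43, 62118.93023255814),
}

# Total length = number of contigs x mean contig length, rounded to whole bp.
total_bp = {name: round(n * avg) for name, (n, avg) in assemblies.items()}

for name, size in total_bp.items():
    print(f"{name}: {size} bp")
```

For the F2B assembly this gives 2671114 bp, which matches the nucleotide count reported in the RASTtk annotation summary for the same assembly.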

Annotate F2B assembly

Annotate a bacterial or archaeal assembly using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 8m 6s.
Objects
Created Object Name Type Description
F2B_trim_genome Genome Annotated genome
Summary
The RAST algorithm was applied to annotating a genome sequence comprised of 43 contigs containing 2671114 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2727 new features were called, of which 123 are non-coding.
Output genome has the following feature types:
	Coding gene                     2604 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat           7 
	Non-coding crispr_spacer           6 
	Non-coding prophage                2 
	Non-coding repeat                 52 
	Non-coding rna                    55 
Overall, the genes have 1689 distinct functions. 
The genes include 1410 genes with a SEED annotation ontology across 921 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Output from Annotate Microbial Assembly with RASTtk - v1.073
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/107808
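The feature counts in the RASTtk summary for F2B_trim_genome can be checked against each other: the non-coding rows should add up to the 123 non-coding features, and coding plus non-coding should equal the 2727 newly called features. A minimal sketch, assuming the table is available as plain text (the helper name `parse_feature_counts` is hypothetical):

```python
# Feature-type table copied from the RASTtk summary for F2B_trim_genome.
report = """\
Coding gene                     2604
Non-coding crispr_array            1
Non-coding crispr_repeat           7
Non-coding crispr_spacer           6
Non-coding prophage                2
Non-coding repeat                 52
Non-coding rna                    55
"""


def parse_feature_counts(text):
    """Map each feature label to its integer count (the last token on the line)."""
    counts = {}
    for line in text.splitlines():
        if line.strip():
            label, count = line.rsplit(None, 1)
            counts[label] = int(count)
    return counts


counts = parse_feature_counts(report)
non_coding = sum(n for label, n in counts.items() if label.startswith("Non-coding"))
total = counts["Coding gene"] + non_coding
print(f"coding={counts['Coding gene']} non-coding={non_coding} total={total}")
```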
Annotate Assembly and Re-annotate Genomes with Prokka annotation pipeline.
This app completed without errors in 4m 32s.
Objects
Created Object Name Type Description
f2b_trim_prokka Genome Annotated Genome
Summary
Annotated Genome saved to: nickschneider:narrative_1574347966169/f2b_trim_prokka
Number of genes predicted: 2603
Number of protein coding genes: 2518
Number of genes with non-hypothetical function: 1579
Number of genes with EC-number: 677
Number of genes with Seed Subsystem Ontology: 0
Average protein length: 327 aa.
Output from Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/107808
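The Prokka counts are easier to compare across genomes as proportions. The sketch below expresses them as percentages of protein-coding genes. That choice of denominator is an assumption made for illustration, since the summary does not say which gene set the function and EC counts refer to.

```python
# Counts copied from the Prokka summary for f2b_trim_prokka.
prokka = {
    "genes": 2603,
    "protein_coding": 2518,
    "non_hypothetical": 1579,
    "with_ec_number": 677,
}


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Assumed denominator: protein-coding genes (see the note above).
non_hypothetical_pct = percent(prokka["non_hypothetical"], prokka["protein_coding"])
ec_pct = percent(prokka["with_ec_number"], prokka["protein_coding"])
print(f"non-hypothetical: {non_hypothetical_pct}%  with EC number: {ec_pct}%")
```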
Map short reads to a reference sequence with SAMtools
This app completed without errors in 16m 1s.
Summary
Indexing contigs.
Mapping reads to contigs.
Getting bam stats.
4867692 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
1197232 + 0 supplementary
0 + 0 duplicates
4846089 + 0 mapped (99.56% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/107808
  • mapped_reads.bam - BAM file
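The mapped percentage in the summary can be recomputed from the two counts it is derived from. A minimal sketch that parses those two flagstat-style lines (the helper name `qc_passed` is hypothetical):

```python
import re

# Two lines copied from the flagstat-style summary above.
flagstat = """\
4867692 + 0 in total (QC-passed reads + QC-failed reads)
4846089 + 0 mapped (99.56% : N/A)
"""


def qc_passed(text, label):
    """Return the QC-passed count from the line whose description starts with `label`."""
    for line in text.splitlines():
        match = re.match(r"(\d+) \+ (\d+) (.+)", line)
        if match and match.group(3).startswith(label):
            return int(match.group(1))
    raise ValueError(f"no flagstat line for {label!r}")


total = qc_passed(flagstat, "in total")
mapped = qc_passed(flagstat, "mapped")
mapped_pct = round(100.0 * mapped / total, 2)
print(f"{mapped} of {total} records mapped ({mapped_pct}%)")
```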

Annotate other Dethiosulfovibrio genome assemblies

Annotate bacterial or archaeal assemblies and/or assembly sets using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 50m 7s.
Objects
Created Object Name Type Description
dsm12538_trim_SPAdes3.15.Assembly.RAST Genome Annotated genome
dsm12537_trim_SPAdes3.15.Assembly.RAST Genome Annotated genome
dsm12590_trim_SPAdes3.15.Assembly.RAST Genome Annotated genome
f2b_trim_SPAdes3.15.Assembly.RAST Genome Annotated genome
D_salsuginis_genome_assembly.RAST Genome Annotated genome
D_peptidovorans_GCF_000172975.1_ASM17297v1_genomic.fna.RAST Genome Annotated genome
dethiosulfo_all_RAST GenomeSet Genome Set
Summary
The RAST algorithm was applied to annotating a genome sequence comprised of 41 contigs containing 2682938 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2764 new features were called, of which 171 are non-coding.
Output genome has the following feature types:
	Coding gene                     2593 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          28 
	Non-coding crispr_spacer          27 
	Non-coding repeat                 58 
	Non-coding rna                    57 
Overall, the genes have 1695 distinct functions. 
The genes include 1416 genes with a SEED annotation ontology across 925 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
dsm12538_trim_SPAdes3.15.Assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 41 contigs containing 2680460 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2756 new features were called, of which 171 are non-coding.
Output genome has the following feature types:
	Coding gene                     2585 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          28 
	Non-coding crispr_spacer          27 
	Non-coding repeat                 57 
	Non-coding rna                    58 
Overall, the genes have 1695 distinct functions. 
The genes include 1414 genes with a SEED annotation ontology across 925 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
dsm12537_trim_SPAdes3.15.Assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 68 contigs containing 2675060 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2769 new features were called, of which 180 are non-coding.
Output genome has the following feature types:
	Coding gene                     2589 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          28 
	Non-coding crispr_spacer          27 
	Non-coding repeat                 68 
	Non-coding rna                    56 
Overall, the genes have 1693 distinct functions. 
The genes include 1420 genes with a SEED annotation ontology across 925 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
dsm12590_trim_SPAdes3.15.Assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 43 contigs containing 2671114 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2727 new features were called, of which 124 are non-coding.
Output genome has the following feature types:
	Coding gene                     2603 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat           7 
	Non-coding crispr_spacer           6 
	Non-coding prophage                2 
	Non-coding repeat                 52 
	Non-coding rna                    56 
Overall, the genes have 1689 distinct functions. 
The genes include 1409 genes with a SEED annotation ontology across 921 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
f2b_trim_SPAdes3.15.Assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 68 contigs containing 2684322 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2962 new features were called, of which 296 are non-coding.
Output genome has the following feature types:
	Coding gene                     2666 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          15 
	Non-coding crispr_spacer          14 
	Non-coding repeat                214 
	Non-coding rna                    52 
Overall, the genes have 1563 distinct functions. 
The genes include 1243 genes with a SEED annotation ontology across 844 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
D_salsuginis_genome_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 3 contigs containing 2576359 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2824 new features were called, of which 288 are non-coding.
Output genome has the following feature types:
	Coding gene                     2536 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat          68 
	Non-coding crispr_spacer          66 
	Non-coding prophage                2 
	Non-coding repeat                 81 
	Non-coding rna                    69 
Overall, the genes have 1667 distinct functions. 
The genes include 1461 genes with a SEED annotation ontology across 914 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
D_peptidovorans_GCF_000172975.1_ASM17297v1_genomic.fna succeeded!

Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/107808
  • annotation_report.dethiosulfo_all_RAST - Microbial Annotation Report
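To compare the six annotated genomes side by side, the contig counts, genome sizes and coding-gene counts from the summaries above can be collected into one table. The sketch below also derives coding genes per megabase. That metric is added here for illustration and is not something the app reports.

```python
# (contigs, nucleotides, coding genes) copied from the six RASTtk summaries above.
genomes = {
    "dsm12538_trim_SPAdes3.15.Assembly": (41, 2682938, 2593),
    "dsm12537_trim_SPAdes3.15.Assembly": (41, 2680460, 2585),
    "dsm12590_trim_SPAdes3.15.Assembly": (68, 2675060, 2589),
    "f2b_trim_SPAdes3.15.Assembly": (43, 2671114, 2603),
    "D_salsuginis_genome_assembly": (68, 2684322, 2666),
    "D_peptidovorans_GCF_000172975.1_ASM17297v1_genomic.fna": (3, 2576359, 2536),
}


def genes_per_mb(nucleotides, coding_genes):
    """Coding genes per million base pairs (illustrative metric)."""
    return coding_genes / (nucleotides / 1_000_000)


density = {name: genes_per_mb(nt, cds) for name, (_, nt, cds) in genomes.items()}

# Print one aligned row per genome.
print(f"{'assembly':<56}{'contigs':>8}{'bp':>10}{'CDS':>6}{'CDS/Mb':>8}")
for name, (contigs, nt, cds) in genomes.items():
    print(f"{name:<56}{contigs:>8}{nt:>10}{cds:>6}{density[name]:>8.1f}")
```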

Create genome tree

Add a user-provided GenomeSet to a KBase SpeciesTree.
This app completed without errors in 5m 43s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/107808
  • all_dethio_treev2.newick
  • all_dethio_treev2-labels.newick
  • all_dethio_treev2.png
  • all_dethio_treev2.pdf
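The .newick files hold the tree as nested parentheses, so the genomes placed in the tree can be listed without a phylogenetics library. The sketch below extracts leaf labels from a Newick string using only the standard library. The example tree is a made-up toy, not the content of all_dethio_treev2.newick, and quoted labels are not handled.

```python
import re


def leaf_names(newick):
    """Return leaf labels of a Newick string, in order of appearance.

    A leaf label directly follows '(' or ','. Labels that follow ')' are
    internal node names or support values and are skipped.
    """
    pattern = r"[(,]\s*([^():,;\s][^():,;]*)"
    return [name.strip() for name in re.findall(pattern, newick)]


# Hypothetical toy tree with branch lengths and one support value (0.98).
toy_tree = "((strain_A:0.1,strain_B:0.2)0.98:0.05,strain_C:0.3);"
leaves = leaf_names(toy_tree)
print(leaves)
```

For a real file, the string would come from reading the file contents, for example with `open(path).read()`.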

Taxonomic assignment and genome relatedness of F2B

Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) ver R06-RS202
This app completed without errors in 1h 7m 56s.
Objects
Created Object Name Type Description
f2b_trim_SPAdes3.15.Assembly.RAST Genome Taxonomy and taxon_assignment updated with GTDB
Allows users to compute fast whole-genome Average Nucleotide Identity (ANI) estimation.
This app completed without errors in 4m 37s.
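ANI values are typically read against a species cutoff: a genome whose ANI to every described relative falls below the cutoff is a candidate novel species. The sketch below assumes the commonly used 95% cutoff, which this Narrative does not specify. The ANI values are placeholders set to the 93% upper bound stated in the abstract, not the measured FastANI results.

```python
SPECIES_ANI_THRESHOLD = 95.0  # percent; commonly used cutoff, assumed here


def is_distinct_species(ani_to_references, threshold=SPECIES_ANI_THRESHOLD):
    """True if the query's ANI to every reference genome is below the threshold."""
    return all(ani < threshold for ani in ani_to_references.values())


# Placeholder values: each is the 93% upper bound from the abstract,
# not a measured ANI from the FastANI run.
ani_upper_bounds = {
    "dsm12538_trim_SPAdes3.15.Assembly.RAST": 93.0,
    "dsm12537_trim_SPAdes3.15.Assembly.RAST": 93.0,
    "dsm12590_trim_SPAdes3.15.Assembly.RAST": 93.0,
    "D_salsuginis_genome_assembly.RAST": 93.0,
    "D_peptidovorans_GCF_000172975.1_ASM17297v1_genomic.fna.RAST": 93.0,
}

novel = is_distinct_species(ani_upper_bounds)
print(f"F2B below species threshold against all references: {novel}")
```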

Pangenome analysis of Dethiosulfovibrio

v1 - KBaseSearch.GenomeSet-2.1
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/107808
Allows users to compute a pangenome from a set of individual genomes.
This app completed without errors in 2m 28s.
Objects
Created Object Name Type Description
F2B_trim_pangenome_output Pangenome Pangenome
Summary
Pangenome saved to cmarshall:narrative_1643148751808/F2B_trim_pangenome_output
Create a Pangenome object by performing OrthoMCL orthologous groups construction on a set of Genomes.
This app completed without errors in 26m 3s.
Objects
Created Object Name Type Description
orthoMCL_trim_pangenome Pangenome Pangenome object
Summary
Input genomes: 6 Output orthologs: 3816
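Ortholog families in a pangenome are commonly grouped by how many genomes carry them. The sketch below labels each family as core (present in all genomes), singleton (present in one) or accessory (anything in between). The input is a hypothetical toy mapping, not the OrthoMCL output, and the function name is illustrative.

```python
from collections import Counter


def classify_families(families, n_genomes):
    """Label each ortholog family by the number of distinct genomes containing it."""
    labels = {}
    for family, genome_ids in families.items():
        present = len(set(genome_ids))
        if present == n_genomes:
            labels[family] = "core"
        elif present == 1:
            labels[family] = "singleton"
        else:
            labels[family] = "accessory"
    return labels


# Toy input: hypothetical family IDs and genome names.
toy_families = {
    "fam1": ["g1", "g2", "g3"],
    "fam2": ["g1", "g3"],
    "fam3": ["g2"],
}
labels = classify_families(toy_families, n_genomes=3)
summary = Counter(labels.values())
print(dict(summary))
```

With the real pangenome, `n_genomes` would be 6 and the mapping would hold the 3816 ortholog families.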
Annotate domains in every Genome within a GenomeSet using protein domains from widely used domain libraries.
This app completed without errors in 5h 2m 25s.
Summary
Search Domains output (this block appears six times in the log with identical content):
Getting DomainModelSet from storage.
Getting Genome from storage.
Running domain search against library 2959/1/7
Running domain search against library 2959/6/6
Running domain search against library 2959/7/6
Running domain search against library 2959/4/6
Running domain search against library 2959/5/7
Examine the general functional distribution or specific functional gene families for a GenomeSet.
This app completed without errors in 1m 4s.
Annotate your genome(s) with DRAM. Annotations will then be distilled to create an interactive functional summary per genome.
This app completed without errors in 28m 48s.
Summary
Here are the results from your DRAM run.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/107808
  • annotations.tsv - DRAM annotations in a tab separate table format
  • genes.faa - Genes as amino acids predicted by DRAM with brief annotations
  • product.tsv - DRAM product in tabular format
  • metabolism_summary.xlsx - DRAM metabolism summary tables
  • genome_stats.tsv - DRAM genome statistics table

Apps

  1. Annotate and Distill Genomes with DRAM
    • DRAM source code
    • DRAM documentation
    • DRAM publication
  2. Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5
    • Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30: 2068–2069. doi:10.1093/bioinformatics/btu153
  3. Annotate Domains in a GenomeSet
    • Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389–3402. doi:10.1093/nar/25.17.3389
    • Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10: 421. doi:10.1186/1471-2105-10-421
    • Eddy SR. Accelerated Profile HMM Searches. PLOS Computational Biology. 2011;7: e1002195. doi:10.1371/journal.pcbi.1002195
    • Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016;44: D279–D285. doi:10.1093/nar/gkv1344
    • Haft DH, Selengut JD, Richter RA, Harkins D, Basu MK, Beck E. TIGRFAMs and Genome Properties in 2013. Nucleic Acids Res. 2013;41: D387–D395. doi:10.1093/nar/gks1234
    • Letunic I, Bork P. 20 years of the SMART protein domain annotation resource. Nucleic Acids Res. 2018;46: D493–D496. doi:10.1093/nar/gkx922
    • Letunic I, Doerks T, Bork P. SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 2015;43: D257-260. doi:10.1093/nar/gku949
    • Marchler-Bauer A, Bo Y, Han L, He J, Lanczycki CJ, Lu S, et al. CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 2017;45: D200–D203. doi:10.1093/nar/gkw1129
    • Selengut JD, Haft DH, Davidsen T, Ganapathy A, Gwinn-Giglio M, Nelson WC, et al. TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes. Nucleic Acids Res. 2007;35: D260-264. doi:10.1093/nar/gkl1043
    • Tatusov RL, Koonin EV, Lipman DJ. A Genomic Perspective on Protein Families. Science. 1997;278: 631–637. doi:10.1126/science.278.5338.631
  4. Annotate Microbial Assembly with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206–D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT: The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656–664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955–964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793–803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-54. doi:10.1093/nar/gkp698.
    • [9] van Belkum A, Sluijuter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176–1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673–679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  5. Annotate Multiple Microbial Assemblies with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206–D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT: The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656–664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955–964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793–803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-54. doi:10.1093/nar/gkp698.
    • [9] van Belkum A, Sluijuter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176–1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673–679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  6. Assemble Reads with SPAdes - v3.15.3
    • Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology. 2012;19: 455-477. doi: 10.1089/cmb.2012.0021
    • Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics. 2020 Jun;70(1):e102. doi: 10.1002/cpbi.102.
  7. Build Pangenome with OrthoMCL - v2.0
    • Li L, Stoeckert CJ, Roos DS. OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes. Genome Res. 2003;13: 2178-2189. doi:10.1101/gr.1224503
  8. Classify Microbes with GTDB-Tk - v1.7.0
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020. DOI:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
  9. Compute ANI with FastANI
    • [1] Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High-throughput ANI Analysis of 90K Prokaryotic Genomes Reveals Clear Species Boundaries. 2017; doi:10.1101/225342
    • [2] Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM. DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol. 2007;57: 81-91. doi:10.1099/ijs.0.64483-0
    • FastANI module and source code:
  10. Compute Pangenome
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  11. Import FASTQ/SRA File as Reads from Staging Area
    no citations
  12. Import GenBank File as Genome from Staging Area
    no citations
  13. Insert Set of Genomes Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5(3): e9490. doi:10.1371/journal.pone.0009490
  14. Map Reads to a Reference Sequence
    • [1] Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14): 1754-1760. doi:10.1093/bioinformatics/btp324
  15. Trim Reads with Trimmomatic - v0.36
    • Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30: 2114-2120. doi:10.1093/bioinformatics/btu170
  16. View Function Profile for Genomes - v1.4.0
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163