Generated December 6, 2023

Complete Genome Sequence of Acidovorax temperans strain LMJ from a contaminated Chlamydomonas reinhardtii culture plate.

Mautusi Mitra1 and Ana Stanescu2

1University of West Georgia, Department of Natural Sciences; 2University of West Georgia, Department of Computing and Mathematics, 1601 Maple Street, Carrollton, GA 30118, USA.

Abstract

We isolated a new strain of Acidovorax temperans, strain LMJ (hereafter strain LMJ), from a contaminated Tris-Acetate-Phosphate (TAP) medium plate of the green micro-alga Chlamydomonas reinhardtii strain LMJ.SG0182, a Chlamydomonas Library Project (CLiP) strain. We sequenced the whole genome of strain LMJ using PacBio Sequel II technology and submitted it to NCBI together with the SRA reads and PacBio methylation motif data. Here we present the whole-genome sequence of this strain, which offers insights into its coding and non-coding genes and its nearest taxonomic neighbors.

Key words

Acidovorax, pyomelanin, xenobiotics-degrader, heavy metal-tolerant

Introduction

A novel Acidovorax temperans strain, designated LMJ, was isolated from a contaminated Tris-Acetate-Phosphate (TAP) medium culture plate of Chlamydomonas reinhardtii at the University of West Georgia (geolocation: Carrollton, Georgia; 1,102 ft (336 m); 33.5730 N, 85.1037 W) (1). Strain LMJ has one circular chromosome and two circular plasmids, pEP1 and pJG1. We report the complete genome sequence of LMJ and offer insights into its genomic characteristics.

External Data Availability

  • The whole-genome sequence, along with the PacBio DNA methylation motifs, has been deposited in GenBank under accession number GCA_028596105.1.
  • The raw sequence reads have been deposited in the SRA under the accession number SRR23501622.
  • Linked publication: (1)

Table of Contents

  1. Background and Experimental Methods
  2. QC and Assembly
  3. Import and Annotation
  4. Taxonomic Classification
  5. References

Background and Experimental Methods

Sample Collection

Acidovorax temperans strain LMJ was isolated from a contaminated Tris-Acetate-Phosphate (TAP) medium culture plate of the green micro-alga Chlamydomonas reinhardtii strain LMJ.SG0182, a Chlamydomonas Library Project (CLiP) strain.

Isolation

Genomic DNA was isolated from Acidovorax temperans strain LMJ (colony #10) grown in Lysogeny Broth (LB) medium, using the Qiagen Blood & Cell Culture DNA Mini Kit.

Genome Sequencing

After assessment of genomic DNA purity and quantification, the DNA sample was shipped to the Georgia Genomics and Bioinformatics Core (GGBC) at the University of Georgia (Athens, GA). At GGBC, a PacBio Single Molecule, Real-Time (SMRT) SMRTbell sequencing library was prepared according to the PacBio technical manual for template preparation and sequencing (see the QC section for details). The SMRTbell library was barcoded and sequenced together with two other barcoded microbial SMRTbell libraries in a single SMRT Cell, using PacBio SMRT Continuous Long Read (CLR) sequencing on a PacBio Sequel II instrument. SMRT Link version 9 was used to manage the workflow from sample setup to result analysis.

QC and Assembly

QC

  1. Quantitative and qualitative QC assessment was performed on the DNA sample at GGBC using Qubit, Nanodrop, and Fragment Analyzer.
  2. DNA was sheared using a Covaris® g-TUBE®. After shearing, the approximate fragment size range was determined with a Bioanalyzer® 12000 chip, and the DNA was quantified on a Nanodrop system.
  3. The 12 kb fragments were purified and concentrated using 0.45X AMPure PB beads.
  4. Damage in the sheared DNA was repaired with DNA Damage Repair reagents from Pacific Biosciences, and the PacBio Template Prep Kit was used to repair the ends of the fragmented DNA. After end repair, the DNA was purified with 0.45X AMPure PB beads.
  5. Blunt hairpin adapters were ligated to the DNA fragments, followed by exonuclease treatment (ExoIII and ExoVII) to remove failed ligation products. Size selection and purification were then performed with three consecutive 0.45X AMPure PB bead purification steps at room temperature to remove enzymes (exonucleases, ligases, etc.) and ligation products smaller than 0.4 kb (e.g., adapter dimers).
  6. SMRTbell™ library quality was assessed using a Bioanalyzer® 12000 chip for sizing, and the library was quantified by fluorescence using a Qubit® High Sensitivity kit.
  7. Sequencing primer v4 was annealed to the SMRTbell template, and DNA sequencing polymerase was bound to the primer-annealed SMRTbell templates using the Sequel® II Binding Kit 2.0. The polymerase-bound SMRTbell® complexes were then purified with AMPure® PB beads.
  8. A dilution of the DNA Internal Control Complex (SMRTbell templates pre-bound with polymerase, available from Pacific Biosciences) containing 30X DNA Internal Control Complex was added to the SMRTbell template, allowing independent detection of any problems during binding or the sequencing run.
  9. Before sequencing, the SMRTbell template-polymerase complex was loaded by MagBead loading into a 96-well sample plate at the concentrations and volumes specified by the Pacific Biosciences Binding Calculator.
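Several steps above use 0.45X AMPure PB bead cleanups. By the usual convention, "0.45X" is the ratio of bead suspension volume to sample volume, so the bead volume scales with the sample. The short Python sketch below shows that arithmetic; the helper name and the 100 µL sample volume are hypothetical and are not values taken from the GGBC protocol.

```python
def bead_volume_ul(sample_volume_ul: float, ratio: float = 0.45) -> float:
    """Bead suspension volume (µL) for a bead:sample volume ratio such as 0.45X."""
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must be positive")
    return sample_volume_ul * ratio


if __name__ == "__main__":
    # Hypothetical 100 µL sheared-DNA sample cleaned up at 0.45X.
    print(f"{bead_volume_ul(100.0):.1f} µL of beads")  # 45.0 µL of beads
```

A lower ratio retains only longer fragments, which is why the same 0.45X ratio doubles as a size-selection step.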

Genome Assembly

Reads were first assembled using the SMRT Link version 9 software tools, which include HGAP version 4.0. The pipeline was run with default parameters and an estimated genome size of 5.3 Mb (based on the sizes of complete Acidovorax genomes available in NCBI). After assembly, the assembly metrics for each sample, along with HMM-predicted genes, were determined with QUAST version 5.0.2. The genome was also assembled independently with Flye version 2.9.1 for comparison.
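QUAST reports contiguity metrics such as N50 for each assembly. The Python sketch below implements the usual N50 definition (the length L such that contigs of length L or longer hold at least half of all assembled bases); it is not QUAST's own code. The contig lengths are the replicon sizes from the Genome Statistics table, and for a complete genome dominated by one chromosome the N50 is simply the chromosome length.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # defensive; the loop always returns for non-empty input


if __name__ == "__main__":
    # Replicon sizes (bp) from the Genome Statistics table.
    contigs = [4_510_507, 187_379, 32_883]
    print(f"contigs={len(contigs)} total={sum(contigs):,} N50={n50(contigs):,}")
```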

Genome Statistics

The LMJ genome consists of one circularized chromosome and two circularized plasmids, totaling 4,730,769 bp, with an overall G+C content of 63.18%. Genome coverage was calculated as the number of mapped subread bases divided by the genome size: 12,367,684,489 / 4,730,769 ≈ 2614X. The coverage reported for the polished assembly in the PacBio assembly report is 2535.7X.
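The coverage formula above is a single division; the Python sketch below reproduces it with the numbers given in the text. The function name is hypothetical. The 2535.7X figure in the PacBio assembly report comes from a separate calculation by the PacBio software on the polished assembly, so the two values are not expected to agree exactly.

```python
def mean_coverage(mapped_subread_bases: int, genome_size_bp: int) -> float:
    """Mean depth = mapped subread bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return mapped_subread_bases / genome_size_bp


if __name__ == "__main__":
    coverage = mean_coverage(12_367_684_489, 4_730_769)
    print(f"{coverage:.1f}X")  # 2614.3X
```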

Replicon       GenBank      Topology   Size (bp)   GC content
Chromosome     CP117193.1   circular   4,510,507   63.314%
Plasmid pEP1   CP117194.1   circular     187,379   60.103%
Plasmid pJG1   CP117195.1   circular      32,883   61.834%
Total                                  4,730,769   63.180%
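The overall G+C content is a length-weighted average of the per-replicon values, not a simple mean of the three percentages. The Python sketch below recomputes the total size and overall GC content from the table; the dictionary and function names are hypothetical.

```python
from typing import Dict, Tuple

# Size (bp) and GC content (%) per replicon, copied from the table above.
REPLICONS: Dict[str, Tuple[int, float]] = {
    "Chromosome CP117193.1": (4_510_507, 63.314),
    "Plasmid pEP1 CP117194.1": (187_379, 60.103),
    "Plasmid pJG1 CP117195.1": (32_883, 61.834),
}


def length_weighted_gc(replicons: Dict[str, Tuple[int, float]]) -> Tuple[int, float]:
    """Return (total length, overall GC %) weighting each replicon by its length."""
    total_bp = sum(size for size, _ in replicons.values())
    gc_bases = sum(size * gc_pct / 100 for size, gc_pct in replicons.values())
    return total_bp, 100 * gc_bases / total_bp


if __name__ == "__main__":
    total, gc = length_weighted_gc(REPLICONS)
    print(f"{total:,} bp, {gc:.2f}% GC")  # 4,730,769 bp, 63.18% GC
```

Because the chromosome contributes over 95% of the bases, the overall value sits close to the chromosome's 63.314%.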

Import and Annotation

  1. The LMJ genome was imported into KBase with the Import GenBank File as Genome from Staging Area application. Chromosome CP117193.1 was imported with Genome Type set to Finished Isolate and NCBI Tax ID 80878 (Acidovorax temperans). The two plasmids, CP117194.1 and CP117195.1, were imported with Genome Type set to Plasmid.
  2. The genome was annotated in KBase with the Annotate Microbial Genome application (based on RASTtk v1.073) using default parameters.
  3. The circular genome was visualized with the KBase Circular Genome Visualization Tool using default parameters, except that the Linear option was unchecked.
  4. The quality of the genome was assessed using the KBase Assess Genome Quality with CheckM-v1.0.18 application.
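Before import, it can be useful to check which feature types a GenBank file already carries, since the RAST reports count "existing" gene and misc_feature features from the imported records. In the GenBank flat-file layout, feature keys start in column 6 of the FEATURES table and qualifiers are indented to column 22. The Python sketch below relies only on that layout; the function name and the truncated example record are hypothetical, and a full parser such as Biopython's SeqIO would be the more robust choice for real files.

```python
from collections import Counter


def count_feature_keys(genbank_text: str) -> Counter:
    """Count feature keys (gene, CDS, ...) in the FEATURES table of GenBank flat-file text."""
    counts: Counter = Counter()
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if in_features and line and not line.startswith(" "):
            # Any top-level keyword (ORIGIN, CONTIG, //) ends the feature table.
            in_features = False
            continue
        # Feature keys start in column 6; qualifier lines are indented to column 22.
        if in_features and line.startswith("     ") and len(line) > 5 and line[5] != " ":
            counts[line[5:21].strip()] += 1
    return counts


# Hypothetical, heavily truncated record used only for illustration.
EXAMPLE = """\
LOCUS       EXAMPLE                  120 bp    DNA     circular BCT
FEATURES             Location/Qualifiers
     source          1..120
                     /organism="hypothetical organism"
     gene            10..99
                     /locus_tag="EX_0001"
     CDS             10..99
                     /locus_tag="EX_0001"
     misc_feature    100..110
                     /note="hypothetical"
ORIGIN
        1 atgaaacgta
//
"""

if __name__ == "__main__":
    print(dict(count_feature_keys(EXAMPLE)))
```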
Annotate or re-annotate bacterial or archaeal genome using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 2m 53s.
Objects
Created object: CP117193.1.gb_microbial_genome_annotation_RASTtk_v1.073 (type: Genome; description: RAST annotation)
Summary
The RAST algorithm was applied to annotating an existing genome: Acidovorax temperans. 
The sequence for this genome consists of 1 contig containing 4510507 nucleotides.
The input genome has 4157 existing coding features and 0 existing non-coding features.
NOTE: Older input genomes did not properly separate coding and non-coding features.
Input genome has the following feature types:
	gene                            4157 
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 4157 coding features and 0 non-coding features, 0 new features were called, of which 0 are non-coding.
Output genome has the following feature types:
	Coding gene                     4157 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Annotate or re-annotate bacterial or archaeal genome using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 1m 9s.
Objects
Created object: CP117194.1.gb_microbial_genome_annotation_RASTtk_v1.073_plasmid_pEP1 (type: Genome; description: RAST annotation)
Summary
Some RAST tools will not run unless the taxonomic domain is Archaea, Bacteria, or Virus. 
These tools include: call selenoproteins, call pyrrolysoproteins, call crisprs, and call prophage phispy features.
You may not get the results you were expecting with your current domain of Unknown.
The RAST algorithm was applied to annotating an existing genome: Acidovorax temperans. 
The sequence for this genome consists of 1 contig containing 187379 nucleotides.
The input genome has 204 existing coding features and 0 existing non-coding features.
NOTE: Older input genomes did not properly separate coding and non-coding features.
Input genome has the following feature types:
	gene                             204 
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 204 coding features and 0 non-coding features, 0 new features were called, of which 0 are non-coding.
Output genome has the following feature types:
	Coding gene                      204 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Annotate or re-annotate bacterial or archaeal genome using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 3m 34s.
Objects
Created object: CP117195.1.gb_microbial_genome_annotation_RASTtk_v1.073_plasmid_pJG1 (type: Genome; description: RAST annotation)
Summary
Some RAST tools will not run unless the taxonomic domain is Archaea, Bacteria, or Virus. 
These tools include: call selenoproteins, call pyrrolysoproteins, call crisprs, and call prophage phispy features.
You may not get the results you were expecting with your current domain of Unknown.
The RAST algorithm was applied to annotating an existing genome: Acidovorax temperans. 
The sequence for this genome consists of 1 contig containing 32883 nucleotides.
The input genome has 35 existing coding features and 1 existing non-coding features.
Input genome has the following feature types:
	Non-coding misc_feature            1 
	gene                              35 
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 35 coding features and 1 non-coding features, 0 new features were called, of which 0 are non-coding.
Output genome has the following feature types:
	Coding gene                       35 
	Non-coding misc_feature            1 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Generate a map and annotations of circular genomes using CGView.
This app completed without errors in 3m 0s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/161632
  • KBase_derived_CP117193.1.gb_genome.png
  • KBase_derived_CP117193.1.gb_genome.jpg
  • KBase_derived_CP117193.1.gb_genome.svg
Generate a map and annotations of circular genomes using CGView.
This app completed without errors in 1m 5s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/161632
  • KBase_derived_CP117194.1.gb_genome.png
  • KBase_derived_CP117194.1.gb_genome.jpg
  • KBase_derived_CP117194.1.gb_genome.svg
Generate a map and annotations of circular genomes using CGView.
This app completed without errors in 1m 36s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/161632
  • KBase_derived_CP117195.1.gb_genome.png
  • KBase_derived_CP117195.1.gb_genome.jpg
  • KBase_derived_CP117195.1.gb_genome.svg
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 5m 20s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/161632
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1m 41s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/161632
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1m 12s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/161632
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM

Taxonomic Classification

  1. Taxonomic identification was performed using the KBase Classify Microbes with GTDB-Tk-v2.3.2 application on a GenomeSet generated with the Build GenomeSet-v1.7.6 application.
  2. A phylogenetic tree was constructed using the KBase Insert Genome Into Species Tree-v2.2.0 application with parameters:
    • Neighbor Public Genome Count = 200
  3. A second phylogenetic tree, based on the 16S rRNA gene, was constructed using the KBase Build Phylogenetic Tree from MSA using FastTree2-v2.1.11 application with the following input:
    • the top 114 sequences retrieved from NCBI with blastn, aligned with CLUSTAL in MEGA 11 using default parameters
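GTDB-Tk expresses a classification as a semicolon-separated lineage in which each rank carries a prefix (d__ domain, p__ phylum, c__ class, o__ order, f__ family, g__ genus, s__ species), and an empty name after a prefix means no assignment at that rank. The Python sketch below splits such a string into ranks. The function name is hypothetical, and the example lineage is a placeholder, not the classification obtained for strain LMJ.

```python
from typing import Dict

RANKS = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}


def parse_gtdb_lineage(lineage: str) -> Dict[str, str]:
    """Split a 'd__...;p__...;...;s__...' string into {rank: name}; '' means unassigned."""
    parsed: Dict[str, str] = {}
    for token in lineage.strip().split(";"):
        prefix, sep, name = token.strip().partition("__")
        if not sep or prefix not in RANKS:
            raise ValueError(f"unexpected lineage token: {token!r}")
        parsed[RANKS[prefix]] = name
    return parsed


if __name__ == "__main__":
    # Placeholder lineage, NOT the classification reported for strain LMJ.
    example = "d__Bacteria;p__PhylumX;c__ClassX;o__OrderX;f__FamilyX;g__GenusX;s__"
    for rank, name in parse_gtdb_lineage(example).items():
        print(f"{rank:8} {name or '(unassigned)'}")
```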
Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB)
This app completed without errors in 22m 54s.
Objects
Created objects (each with taxonomy and taxon_assignment updated with GTDB):
  • CP117195.1.gb_genome (Genome)
  • CP117194.1.gb_genome (Genome)
  • CP117193.1.gb_genome (Genome)
  • LMJ.gb_genome.GenomeSet (GenomeSet)
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/161632
  • gtdbtk.backbone.bac120.classify.tree - whole tree, GTDB-formatted Newick
  • gtdbtk.backbone.bac120.classify-ITOL.tree - whole tree, ITOL-formatted Newick
  • gtdbtk.bac120.classify.tree.1.tree - whole tree, GTDB-formatted Newick
  • gtdbtk.bac120.classify.tree.1-ITOL.tree - whole tree, ITOL-formatted Newick
  • gtdbtk.backbone.bac120.classify-proximals.tree - Newick
  • gtdbtk.backbone.bac120.classify-trimmed.tree - Newick
  • gtdbtk.backbone.bac120.classify-lineages.map - GTDB lineage
  • gtdbtk.backbone.bac120.classify-trimmed.tree-rectangle.PNG - image of gtdbtk.backbone.bac120.classify-trimmed.tree
  • gtdbtk.backbone.bac120.classify-trimmed.tree-rectangle.PDF - image of gtdbtk.backbone.bac120.classify-trimmed.tree
  • gtdbtk.backbone.bac120.classify-trimmed.tree-circle.PNG - image of gtdbtk.backbone.bac120.classify-trimmed.tree
  • gtdbtk.backbone.bac120.classify-trimmed.tree-circle.PDF - image of gtdbtk.backbone.bac120.classify-trimmed.tree
  • gtdbtk.backbone.bac120.classify-trimmed.tree-circle-ultrametric.PNG - image of gtdbtk.backbone.bac120.classify-trimmed.tree
  • gtdbtk.backbone.bac120.classify-trimmed.tree-circle-ultrametric.PDF - image of gtdbtk.backbone.bac120.classify-trimmed.tree
  • gtdbtk.bac120.classify.tree.1-proximals.tree - Newick
  • gtdbtk.bac120.classify.tree.1-trimmed.tree - Newick
  • gtdbtk.bac120.classify.tree.1-lineages.map - GTDB lineage
  • gtdbtk.bac120.classify.tree.1-trimmed.tree-rectangle.PNG - image of gtdbtk.bac120.classify.tree.1-trimmed.tree
  • gtdbtk.bac120.classify.tree.1-trimmed.tree-rectangle.PDF - image of gtdbtk.bac120.classify.tree.1-trimmed.tree
  • gtdbtk.bac120.classify.tree.1-trimmed.tree-circle.PNG - image of gtdbtk.bac120.classify.tree.1-trimmed.tree
  • gtdbtk.bac120.classify.tree.1-trimmed.tree-circle.PDF - image of gtdbtk.bac120.classify.tree.1-trimmed.tree
  • gtdbtk.bac120.classify.tree.1-trimmed.tree-circle-ultrametric.PNG - image of gtdbtk.bac120.classify.tree.1-trimmed.tree
  • gtdbtk.bac120.classify.tree.1-trimmed.tree-circle-ultrametric.PDF - image of gtdbtk.bac120.classify.tree.1-trimmed.tree
  • GTDB-Tk_classify_wf.zip - GTDB-Tk Classify WF output
Add one or more Genomes to a KBase SpeciesTree.
This app completed without errors in 5m 13s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/161632
  • Tree_Acidovorax_temperans_strain_LMJ_GCF_028596105.1_NeighborPublicGenomeCount200.newick
  • Tree_Acidovorax_temperans_strain_LMJ_GCF_028596105.1_NeighborPublicGenomeCount200-labels.newick
  • Tree_Acidovorax_temperans_strain_LMJ_GCF_028596105.1_NeighborPublicGenomeCount200.png
  • Tree_Acidovorax_temperans_strain_LMJ_GCF_028596105.1_NeighborPublicGenomeCount200.pdf
Allows users to import a file from the staging area into a Narrative as a multiple sequence alignment (MSA) data object.
This app completed without errors in 24s.
Objects
Created object: MSA_clustal_seq114_score2528_len1500 (type: MSA; description: imported MSA)
Summary
A multiple sequence alignment with 114 sequences and an alignment length of 1684 was produced.
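The import summary reports two numbers: the number of sequences and the alignment length. In a valid alignment every row has the same length once gap characters are counted. The Python sketch below checks this and returns both numbers, assuming the alignment is stored as aligned FASTA; that format is an assumption, since the file's format is not stated here, and Clustal-format files would need a different parser (for example Biopython's AlignIO). The function name and toy alignment are hypothetical.

```python
from typing import Dict, List, Tuple


def summarize_aligned_fasta(text: str) -> Tuple[int, int]:
    """Return (number of sequences, alignment length) for aligned FASTA text."""
    rows: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            if current in rows:
                raise ValueError(f"duplicate sequence name: {current!r}")
            rows[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            rows[current].append(line)
    if not rows:
        raise ValueError("no sequences found")
    # Gap characters ('-') count toward the length, so all rows must match.
    lengths = {len("".join(parts)) for parts in rows.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows differ in length {sorted(lengths)}; not a valid alignment")
    return len(rows), lengths.pop()


if __name__ == "__main__":
    toy = ">seq1\nACGT-ACG\n>seq2\nACGTTACG\n>seq3\nAC-TTACG\n"
    print(summarize_aligned_fasta(toy))  # (3, 8)
```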
Build a phylogenetic reconstruction from a Multiple Sequence Alignment (MSA) using FastTree2.
This app completed without errors in 1m 15s.
Objects
Created object: Tree_MSA_clustal_seq114_score2528_len1500 (type: Tree)
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/161632
  • Tree_MSA_clustal_seq114_score2528_len1500.newick
  • Tree_MSA_clustal_seq114_score2528_len1500-labels.newick
  • Tree_MSA_clustal_seq114_score2528_len1500.png
  • Tree_MSA_clustal_seq114_score2528_len1500.pdf
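Since the 16S tree was built from a 114-sequence alignment, one would expect the downloaded .newick file to contain 114 tips. The Python sketch below extracts tip labels from a Newick string so that count can be checked. It assumes unquoted labels without embedded parentheses, commas, or colons; quoted labels and bracketed comments, both allowed by Newick, are not handled, and a full parser such as ETE 3 (cited below) is the safer option. The function name is hypothetical.

```python
from typing import List


def newick_leaf_names(newick: str) -> List[str]:
    """Return tip labels from a Newick string (unquoted labels only)."""
    text = newick.strip().rstrip(";")
    leaves: List[str] = []
    prev = None  # last structural character seen: '(', ',' or ')'
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in "(),":
            prev = ch
            i += 1
        elif ch == ":":
            # Skip the branch length that follows a colon.
            i += 1
            while i < len(text) and text[i] not in "(),":
                i += 1
        else:
            j = i
            while j < len(text) and text[j] not in "(),:":
                j += 1
            label = text[i:j].strip()
            # A label right after ')' names an internal node (often a support value).
            if label and prev != ")":
                leaves.append(label)
            i = j
    return leaves


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2)0.95:0.3,C:0.4);"
    print(newick_leaf_names(tree))  # ['A', 'B', 'C']
```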

References

  1. Mitra M, Nguyen KMAK, Box TW et al. Isolation and characterization of a novel bacterial strain from a Tris-Acetate-Phosphate agar medium plate of the green micro-alga Chlamydomonas reinhardtii that can utilize common environmental pollutants as a carbon source [version 1; peer review: 3 approved]. F1000Research 2020, 9:656 (https://doi.org/10.12688/f1000research.24680.1)

Apps

  1. Annotate Microbial Genome with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002;12: 656-664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955-964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793-803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-6654. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176-1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673-679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  2. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043-1055. doi:10.1101/gr.186072.114
  3. Build Phylogenetic Tree from MSA using FastTree2 - v2.1.11
    • Price MN, Dehal PS, Arkin AP. FastTree 2--Approximately Maximum-Likelihood Trees for Large Alignments. PLOS ONE. 2010;5: e9490. doi:10.1371/journal.pone.0009490
    • Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol. 2009;26: 1641-1650. doi:10.1093/molbev/msp077
    • Huerta-Cepas J, Serra F, Bork P. ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol Biol Evol. 2016;33: 1635-1638. doi:10.1093/molbev/msw046
  4. Circular Genome Visualization Tool
    no citations
  5. Classify Microbes with GTDB-Tk - v2.3.2
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics, Volume 38, Issue 23, 1 December 2022, Pages 5315-5316. DOI: https://doi.org/10.1093/bioinformatics/btac672
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Donovan H Parks, Maria Chuvochina, Christian Rinke, Aaron J Mussig, Pierre-Alain Chaumeil, Philip Hugenholtz. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D785-D794. DOI: https://doi.org/10.1093/nar/gkab776
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020. DOI:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
    • Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
    • Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016 Jun 20;17(1):132. DOI: 10.1186/s13059-016-0997-x
  6. Import Multiple Sequence Alignment (MSA) from File in Staging Area
    no citations
  7. Insert Genome Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2--Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5(3): e9490. doi:10.1371/journal.pone.0009490