Generated December 4, 2024

Complete genome sequence of a novel Microbacterium sp. strain Clip185.

Mautusi Mitra1 and Ana Stanescu2

1 University of West Georgia, School of Field Investigations and Experimental Sciences
2 University of West Georgia, School of Computing, Analytics, and Modeling, 1601 Maple Street, Carrollton, GA 30118, USA

Abstract

We have isolated a new species of Microbacterium, an Actinobacterium, from a contaminated Tris-Acetate-Phosphate (TAP) medium culture plate of the green micro-alga Chlamydomonas reinhardtii strain LMJ.RY0402.185141 (a Chlamydomonas Library Project (CLiP) strain). We have provisionally named this bacterium Microbacterium sp. strain Clip185 (hereafter, strain Clip185). We sequenced the whole genome of strain Clip185 using PacBio Sequel II Continuous Long Read technology and submitted it to NCBI along with the SRA and PacBio methylation motif data. Additionally, we submitted the PacBio methylome to REBASE (Ref#35996). Here we present the whole-genome sequence of this new Microbacterium species, which offers insights into its coding and non-coding genes and its nearest taxonomic neighbors.

Keywords

Microbacterium, Actinobacterium, decaprenoxanthin, xenobiotics-degrader, heavy metal-tolerant

Introduction

A novel species of Microbacterium, strain Clip185 (hereafter Clip185), was isolated from a contaminated Tris-Acetate-Phosphate (TAP) medium culture plate of Chlamydomonas reinhardtii at the University of West Georgia (geolocation: Carrollton, Georgia; elevation 1,102 ft (336 m); 33.5730 N, 85.1037 W) (1). Strain Clip185 has one circular chromosome with a genome size of 3.3 Mb. We report the complete genome sequence of Clip185 and offer insights into its genomic coding potential.

External Data Availability

  • The whole-genome sequence, along with the PacBio DNA methylation motifs, has been deposited in GenBank under the accession number GCA_028743715.1.
  • The PacBio methylome has been submitted to REBASE, Ref#35996.
  • The raw sequence reads have been deposited in the SRA under the accession number SRR23538270.
  • Linked publication: (1)

Table of Contents

  1. Background and Experimental Methods
  2. QC and Assembly
  3. Import and Annotation
  4. Taxonomic Classification
  5. References

Background and Experimental Methods

Sample Collection

Microbacterium sp. strain Clip185 was isolated from a contaminated Tris-Acetate-Phosphate (TAP) medium culture plate of the green micro-alga Chlamydomonas reinhardtii strain LMJ.RY0402.185141, a Chlamydomonas Library Project (CLiP) strain.

Isolation

Genomic DNA was isolated from the Lysogeny Broth (LB) medium-grown Microbacterium sp. strain Clip185 (colony #37) using the Qiagen Blood and Cell Culture DNA Mini Kit.

Genome Sequencing

After the purity and quantity of the genomic DNA were determined, the DNA sample was shipped to the Georgia Genomics and Bioinformatics Core (GGBC) at the University of Georgia (Athens, GA). At GGBC, the sample was processed to prepare a PacBio SMRTbell (Single Molecule Real-Time) sequencing library according to the protocol given in the PacBio technical manual for template preparation and sequencing (see the QC section for details). The SMRTbell library was barcoded and sequenced together with two additional barcoded microbial SMRTbell libraries in a single SMRT Cell using PacBio SMRT Continuous Long Read sequencing on the PacBio Sequel II instrument. SMRT Link v9 was used as the interface to manage the workflow from sample setup to result analysis.


QC and Assembly

QC

  1. Quantitative and qualitative QC assessment was performed on the DNA sample at GGBC using Qubit, Nanodrop, and Fragment Analyzer.
  2. DNA was sheared using a Covaris® g-TUBE®. After shearing, the approximate size range of the fragments was determined with a Bioanalyzer® 12000 chip, and the DNA was quantified on a Nanodrop system.
  3. Fragments of approximately 12 kb were purified and concentrated using 0.45X AMPure PB beads.
  4. DNA damage in the sheared DNA was repaired with DNA Damage Repair reagents provided by Pacific Biosciences, and the PacBio Template Prep Kit was used to repair the ends of the fragmented DNA. Following end repair, the DNA was purified with 0.45X AMPure PB beads.
  5. Blunt hairpin adapters were ligated to the DNA fragments, and failed ligation products were removed by exonuclease (ExoII and ExoVII) treatment. This was followed by size selection and purification using three consecutive 0.45X AMPure PB bead purification steps at room temperature to adequately remove enzymes (exonucleases, ligases, etc.) and ligation products smaller than 0.4 kb (e.g., adapter dimers).
  6. The quality of the SMRTbell™ library was assessed using a Bioanalyzer® 12000 chip for sizing, and the library was quantified via fluorescence using a Qubit® High Sensitivity kit.
  7. Sequencing primer v4 was annealed to the SMRTbell template. DNA sequencing polymerases were bound to the primer-annealed SMRTbell templates using the Sequel® II Binding Kit 2.0. The polymerase-bound SMRTbell® complexes were then purified with AMPure® PB beads.
  8. A dilution containing 30X DNA Internal Control Complex (SMRTbell templates already bound with polymerase, available from Pacific Biosciences) was added to the SMRTbell template so that any problems arising during binding or the sequencing run could be identified independently.
  9. Prior to sequencing, the SMRTbell template-polymerase complex was loaded by MagBead loading into a 96-well sample plate at the concentrations and volumes specified by the Pacific Biosciences Binding Calculator.

Genome Assembly

Reads were first assembled using SMRT Link v9, which has HGAP v4.0 built in. The pipeline was run with default settings and a pre-specified estimated genome size of 3.3 Mb (based on the complete genome sizes of various Microbacterium sp. available on NCBI). After assembly, the assembly metrics for each sample, along with HMM-predicted genes, were determined by running QUAST v5.0.2. The genome was also assembled with Flye v2.9.1 and Canu v2.2 for statistical confidence.
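
One of the standard assembly metrics reported by tools such as QUAST is the N50. As a minimal sketch of how that metric is defined (this is not the QUAST implementation, and the fragmented contig lengths below are hypothetical values used only for illustration), it can be computed as follows:

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")

# A single circularized contig, as reported for Clip185: N50 equals its length.
single = n50([3_305_635])

# Hypothetical fragmented assembly of the same total size (not real data).
fragmented = n50([1_500_000, 900_000, 600_000, 305_635])

print(single, fragmented)
```

For an assembly that resolves into one contig, the N50 is simply the chromosome length, which is why agreement between independent assemblers on a single circular contig is a useful sanity check.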

Genome Statistics

The Clip185 genome consists of one circularized chromosome comprising a total of 3,305,635 bp with an overall G+C content of 69.5%. Genome coverage was calculated as Number of Subread Bases (mapped) / Genome Size = 10,942,194,255 / 3,305,635 ≈ 3,310X. The genome coverage reported by hgap.depth_coverage_mean in the PacBio coverage report is 3193.6X.
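
The coverage formula above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable and function names are illustrative, and the two input numbers are taken directly from the text.

```python
subread_bases_mapped = 10_942_194_255  # mapped subread bases (from the text)
genome_size_bp = 3_305_635             # chromosome length in bp (from the text)

def fold_coverage(total_bases, genome_size):
    """Average fold coverage = sequenced bases / genome size."""
    return total_bases / genome_size

coverage = fold_coverage(subread_bases_mapped, genome_size_bp)
print(f"{coverage:,.0f}X")  # prints 3,310X
```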

GenBank Accession   Topology   Size (bp)    GC Content (%)
CP117996.1          Circular   3,305,635    69.5
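
The GC content value is the percentage of G and C bases among all called bases. A minimal sketch of that calculation is shown below; the function name and the toy sequence are illustrative assumptions (the toy sequence is not Clip185 data), and ambiguous bases such as N are ignored.

```python
def gc_content(sequence):
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / (gc + at)

# Toy sequence for illustration only: 7 of its 10 bases are G or C.
toy = "GCGCGCGATA"
print(f"{gc_content(toy):.1f}%")  # prints 70.0%
```

Applied to the full chromosome sequence, the same calculation would be expected to give the reported value of about 69.5%.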

Import and Annotation

  1. The Clip185 genome was imported into KBase using the Import from Staging Area application. More specifically, chromosome CP117996.1 was imported using the Type Genome and the NCBI Tax ID 51671: Microbacterium sp.
  2. The genome was annotated in KBase using the microbial Annotate Genome/Assembly with RASTtk - v1.073 application with default parameters.
  3. The circular genome was visualized using the KBase Circular Genome Visualization Tool with default parameters, except for Linear, which remained unchecked.
  4. The quality of the genome was assessed using the KBase Assess Genome Quality with CheckM - v1.0.18 application.

Taxonomic Classification

  1. Taxonomic identification was performed using the KBase Classify Microbes with GTDB-Tk - v2.3.2 application on a GenomeSet generated with the Build GenomeSet - v1.7.6 application.
  2. A phylogenetic tree was constructed using the KBase Insert Genome Into Species Tree - v2.2.0 application with parameters: Neighbor Public Genome Count = 200.
  3. A second phylogenetic tree, based on the 16S rRNA gene, was constructed using the KBase Build Phylogenetic Tree from MSA using FastTree2 - v2.1.11 application. The input alignment comprised the top 50 sequences obtained from NCBI using BLASTN, aligned using MUSCLE in MEGA 11.

References

  1. Mitra M, Nguyen KMAK, Box TW, et al. Isolation and characterization of a heavy metal- and antibiotic-tolerant novel bacterial strain from a contaminated culture plate of Chlamydomonas reinhardtii, a green micro-alga. F1000Research 2021, 10:533. https://f1000research.com/articles/10-533

# KBase Narrative code cell: import the assembly FASTA from the staging area.
from biokbase.narrative.jobs.appmanager import AppManager
AppManager().run_app_batch(
    [{
        "app_id": "kb_uploadmethods/import_fasta_as_assembly_from_staging",
        "tag": "release",
        "version": "5b9346463df88a422ff5d4f4cba421679f63c73f",
        "params": [{
            "staging_file_subdir_path": "GCF_028743715.1_ASM2874371v1_genomic.fna",
            "assembly_name": "GCF_028743715.1_ASM2874371v1_genomic.fna_assembly"
        }],
        "shared_params": {
            "type": "draft isolate",
            "min_contig_length": 500
        }
    }],
    cell_id="ba02057d-8da4-4709-805b-35ed1caf1972",
    run_id="64b02c3c-dcca-489b-a771-bee8ed06af48"
)
Annotate or re-annotate genome/assembly using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 6m 32s.
Objects
Created Object Name               Type       Description
Clip185_Annotation_RASTtk-v1.073  Genome     RAST re-annotated genome
Summary
The RAST algorithm was applied to annotating a genome sequence comprising 1 contig containing 3,305,635 nucleotides. No initial gene calls were provided. Standard features were called using glimmer3 and prodigal. A scan was conducted for the following additional feature types: rRNA, tRNA, selenoproteins, pyrrolysoproteins, repeat regions, and CRISPR. The genome features were functionally annotated using the following algorithms: Kmers V2, Kmers V1, and protein similarity. In addition to the remaining original 0 coding features and 0 non-coding features, 3,273 new features were called, of which 57 are non-coding. The output genome has the following feature types:
  • Coding gene: 3,216
  • Non-coding repeat: 6
  • Non-coding RNA: 51
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
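
The feature counts in this summary can be cross-checked with simple arithmetic. The sketch below only restates the numbers reported above; the dictionary layout and variable names are illustrative.

```python
# Feature counts as reported in the RASTtk summary.
feature_counts = {
    "Coding gene": 3216,
    "Non-coding repeat": 6,
    "Non-coding rna": 51,
}

total_features = sum(feature_counts.values())
non_coding = sum(
    n for kind, n in feature_counts.items() if kind.startswith("Non-coding")
)

# Both totals should match the summary: 3273 new features, 57 non-coding.
print(total_features, non_coding)
```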
Generate a map and annotations of circular genomes using CGView.
This app completed without errors in 2m 12s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/193023
  • KBase_derived_Clip185_Annotation_RASTtk-v1.073.png
  • KBase_derived_Clip185_Annotation_RASTtk-v1.073.jpg
  • KBase_derived_Clip185_Annotation_RASTtk-v1.073.svg
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 4m 33s.
Files
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB)
This app completed without errors in 48m 39s.
Objects
Created Object Name               Type       Description
GCF_000202635.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_000633215.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_001262495.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_001314225.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_001427145.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_001427525.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_001428485.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_001552475.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_001592125.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_001652465.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_001887285.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_900104345.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_000746195.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
Clip185_Annotation_RASTtk-v1.073  Genome     Taxonomy and taxon_assignment updated with GTDB
GCF_000799385.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_000802305.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_000956415.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_000956465.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_000956475.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_000956535.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
GCF_000956575.1                   Genome     Taxonomy unchanged, taxon_assignment added GTDB
Clip185_Output_Genome_Set         GenomeSet  Taxonomy and taxon_assignment updated with GTDB
Files
  • gtdbtk.backbone.bac120.classify.tree - whole tree GTDB formatted Newick
  • gtdbtk.backbone.bac120.classify-ITOL.tree - whole tree ITOL formatted Newick
  • gtdbtk.bac120.classify.tree.5.tree - whole tree GTDB formatted Newick
  • gtdbtk.bac120.classify.tree.5-ITOL.tree - whole tree ITOL formatted Newick
  • gtdbtk.backbone.bac120.classify-proximals.tree - Newick
  • gtdbtk.backbone.bac120.classify-trimmed.tree - Newick
  • gtdbtk.backbone.bac120.classify-lineages.map - GTDB lineage
  • gtdbtk.backbone.bac120.classify-trimmed.tree-rectangle.PNG - gtdbtk.backbone.bac120.classify-trimmed.tree - Image
  • gtdbtk.backbone.bac120.classify-trimmed.tree-rectangle.PDF - gtdbtk.backbone.bac120.classify-trimmed.tree - Image
  • gtdbtk.backbone.bac120.classify-trimmed.tree-circle.PNG - gtdbtk.backbone.bac120.classify-trimmed.tree - Image
  • gtdbtk.backbone.bac120.classify-trimmed.tree-circle.PDF - gtdbtk.backbone.bac120.classify-trimmed.tree - Image
  • gtdbtk.backbone.bac120.classify-trimmed.tree-circle-ultrametric.PNG - gtdbtk.backbone.bac120.classify-trimmed.tree - Image
  • gtdbtk.backbone.bac120.classify-trimmed.tree-circle-ultrametric.PDF - gtdbtk.backbone.bac120.classify-trimmed.tree - Image
  • gtdbtk.bac120.classify.tree.5-proximals.tree - Newick
  • gtdbtk.bac120.classify.tree.5-trimmed.tree - Newick
  • gtdbtk.bac120.classify.tree.5-lineages.map - GTDB lineage
  • gtdbtk.bac120.classify.tree.5-trimmed.tree-rectangle.PNG - gtdbtk.bac120.classify.tree.5-trimmed.tree - Image
  • gtdbtk.bac120.classify.tree.5-trimmed.tree-rectangle.PDF - gtdbtk.bac120.classify.tree.5-trimmed.tree - Image
  • gtdbtk.bac120.classify.tree.5-trimmed.tree-circle.PNG - gtdbtk.bac120.classify.tree.5-trimmed.tree - Image
  • gtdbtk.bac120.classify.tree.5-trimmed.tree-circle.PDF - gtdbtk.bac120.classify.tree.5-trimmed.tree - Image
  • gtdbtk.bac120.classify.tree.5-trimmed.tree-circle-ultrametric.PNG - gtdbtk.bac120.classify.tree.5-trimmed.tree - Image
  • gtdbtk.bac120.classify.tree.5-trimmed.tree-circle-ultrametric.PDF - gtdbtk.bac120.classify.tree.5-trimmed.tree - Image
  • GTDB-Tk_classify_wf.zip - GTDB-Tk Classify WF output
Add one or more Genomes to a KBase SpeciesTree.
This app completed without errors in 5m 55s.
Files
  • Cilp185_Output_Tree.newick
  • Cilp185_Output_Tree-labels.newick
  • Cilp185_Output_Tree.png
  • Cilp185_Output_Tree.pdf
Build a phylogenetic reconstruction from a Multiple Sequence Alignment (MSA) using FastTree2.
This app completed without errors in 1m 53s.
Objects
Created Object Name               Type       Description
top50_mega_muscle_msa_tree        Tree       top50_mega_muscle_msa_tree Tree
Files
  • top50_mega_muscle_msa_tree.newick
  • top50_mega_muscle_msa_tree-labels.newick
  • top50_mega_muscle_msa_tree.png
  • top50_mega_muscle_msa_tree.pdf

Apps

  1. Annotate Genome/Assembly with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT - The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656-664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955-964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793-803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-6654. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176-1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673-679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  2. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043-1055. doi:10.1101/gr.186072.114
  3. Build Phylogenetic Tree from MSA using FastTree2 - v2.1.11
    • Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLOS ONE. 2010;5: e9490. doi:10.1371/journal.pone.0009490
    • Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol. 2009;26: 1641-1650. doi:10.1093/molbev/msp077
    • Huerta-Cepas J, Serra F, Bork P. ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol Biol Evol. 2016;33: 1635-1638. doi:10.1093/molbev/msw046
  4. Circular Genome Visualization Tool
    no citations
  5. Classify Microbes with GTDB-Tk - v2.3.2
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics, Volume 38, Issue 23, 1 December 2022, Pages 5315-5316. DOI: https://doi.org/10.1093/bioinformatics/btac672
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Donovan H Parks, Maria Chuvochina, Christian Rinke, Aaron J Mussig, Pierre-Alain Chaumeil, Philip Hugenholtz. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D785-D794. DOI: https://doi.org/10.1093/nar/gkab776
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020. DOI:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
    • Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
    • Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016 Jun 20;17(1):132. DOI: 10.1186/s13059-016-0997-x
  6. Insert Genome Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5: e9490. doi:10.1371/journal.pone.0009490