Generated June 8, 2021

Draft Genome Sequence of Staphylococcus succinus strain GN1, isolated from the floor of the basement of a house built in 1916 in Milwaukee, WI

This is the narrative for the Microbiology Resource Announcements publication describing the isolated strain GN1.

[Figure: Gram stain of GN1]

Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 28m 49s.
Objects
Created Object Name Type Description
23_reads PairedEndLibrary Imported Reads
Import a FASTA file from your staging area into your Narrative as an Assembly data object
This app completed without errors in 26m 19s.
Objects
Created Object Name Type Description
CLI_Spades_assembly Assembly Imported Assembly
A quality control application for high throughput sequence data.
This app completed without errors in 2m 49s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/51640
  • 23_reads_51640_4_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • 23_reads_51640_4_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
Assemble reads using the SPAdes assembler.
This app completed without errors in 14m 53s.
Objects
Created Object Name Type Description
Grant_kbase_SPAdes.contigs Assembly Assembled contigs
Summary
Assembly saved to: grantnickolson:narrative_1574277230463/kbase_SPAdes.contigs
Assembled into 11 contigs.
Avg Length: 257923.363636 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
	9 -- 593.0 to 143196.1 bp
	0 -- 143196.1 to 285799.2 bp
	0 -- 285799.2 to 428402.3 bp
	0 -- 428402.3 to 571005.4 bp
	0 -- 571005.4 to 713608.5 bp
	0 -- 713608.5 to 856211.6 bp
	0 -- 856211.6 to 998814.7 bp
	1 -- 998814.7 to 1141417.8 bp
	0 -- 1141417.8 to 1284020.9 bp
	1 -- 1284020.9 to 1426624.0 bp
v1 - KBaseGenomeAnnotations.Assembly-5.0
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/51640
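The contig length distribution reported by SPAdes above appears to be a 10-bin histogram with equal-width bins between the shortest and the longest contig. The following Python sketch reproduces the bin edges and the average contig length under that assumption. The total of 2837157 bp is taken from the RASTtk report for the same 11 contigs; the variable names are illustrative and are not part of KBase or SPAdes.

```python
# Values copied from the SPAdes and RASTtk summaries in this Narrative.
n_contigs = 11
total_bp = 2837157                    # total nucleotides (RASTtk report)
min_len, max_len = 593.0, 1426624.0   # shortest and longest contig (bp)
counts = [9, 0, 0, 0, 0, 0, 0, 1, 0, 1]  # contigs per bin, as reported

# Assumption: equal-width bins spanning min_len to max_len.
n_bins = len(counts)
bin_width = (max_len - min_len) / n_bins
edges = [round(min_len + i * bin_width, 1) for i in range(n_bins + 1)]

avg_len = total_bp / n_contigs
print(f"Avg Length: {avg_len:.6f} bp")
for count, lo, hi in zip(counts, edges, edges[1:]):
    print(f"{count} -- {lo} to {hi} bp")
```

Nine of the 11 contigs fall in the first bin, so most of the sequence is carried by the two long contigs in the last three bins.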
Annotate a bacterial or archaeal assembly using components from the RAST (Rapid Annotations using Subsystems Technology) toolkit (RASTtk).
This app completed without errors in 2m 49s.
Objects
Created Object Name Type Description
staph_rast_annotate Genome Annotated genome
Summary
The RAST algorithm was applied to annotating a genome sequence comprised of 11 contigs containing 2837157 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2831 new features were called, of which 73 are non-coding.
Output genome has the following feature types:
	Coding gene                     2758 
	Non-coding repeat                 25 
	Non-coding rna                    48 
Overall, the genes have 2083 distinct functions. 
The genes include 1247 genes with a SEED annotation ontology across 1063 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
v1 - KBaseGenomes.Genome-10.0
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/51640
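The feature counts in the RASTtk summary above can be cross-checked against each other. This is a minimal Python sketch using only the numbers reported for the GN1 assembly; the dictionary layout and variable names are illustrative assumptions, not a KBase data structure.

```python
# Counts copied from the RASTtk summary for the 11-contig GN1 assembly.
feature_counts = {
    "Coding gene": 2758,
    "Non-coding repeat": 25,
    "Non-coding rna": 48,
}
new_features = 2831      # "2831 new features were called"
new_noncoding = 73       # "of which 73 are non-coding"
genome_bp = 2837157      # nucleotides in the assembly

noncoding = sum(n for name, n in feature_counts.items()
                if name.startswith("Non-coding"))
total = sum(feature_counts.values())

# The per-type table should add up to the totals stated in the report.
print("table total matches report:", total == new_features)
print("non-coding total matches report:", noncoding == new_noncoding)

# Coding genes per kilobase of assembled sequence.
genes_per_kb = 1000 * feature_counts["Coding gene"] / genome_bp
print(f"coding genes per kb: {genes_per_kb:.3f}")
```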
Annotate Assembly and Re-annotate Genomes with Prokka annotation pipeline.
This app completed without errors in 2m 55s.
Objects
Created Object Name Type Description
kbase_spades_prokka_annot Genome Annotated Genome
Summary
Annotated Genome saved to: grantnickolson:narrative_1574277230463/kbase_spades_prokka_annot
Number of genes predicted: 2767
Number of protein coding genes: 2716
Number of genes with non-hypothetical function: 2111
Number of genes with EC-number: 111
Number of genes with Seed Subsystem Ontology: 0
Average protein length: 293 aa.
Output from Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/51640
Output from Annotate Microbial Assembly
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/51640
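The Prokka summary above is a run of "label: number" pairs, so the values can be pulled out with a regular expression and compared with the RASTtk gene count. This is a sketch only: the pattern is an assumption based on the text shown in this Narrative, not a documented KBase report format.

```python
import re

# Prokka summary text as it appears in this Narrative.
summary = (
    "Annotated Genome saved to: "
    "grantnickolson:narrative_1574277230463/kbase_spades_prokka_annot "
    "Number of genes predicted: 2767 "
    "Number of protein coding genes: 2716 "
    "Number of genes with non-hypothetical function: 2111 "
    "Number of genes with EC-number: 111 "
    "Number of genes with Seed Subsystem Ontology: 0 "
    "Average protein length: 293 aa."
)

# Capture each label and the integer that follows it.
pattern = r"(Number of [^:]+|Average protein length): (\d+)"
stats = {label: int(value) for label, value in re.findall(pattern, summary)}

coding = stats["Number of protein coding genes"]
named = stats["Number of genes with non-hypothetical function"]
print(f"non-hypothetical fraction: {named / coding:.3f}")

# RASTtk called 2758 coding genes on the same assembly.
rast_coding = 2758
print("RASTtk minus Prokka coding genes:", rast_coding - coding)
```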
Add one or more genomes to a KBase species tree.
This app completed without errors in 13m 29s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/51640
  • Grant_Staph_Tree.newick
  • Grant_Staph_Tree-labels.newick
  • Grant_Staph_Tree.png
  • Grant_Staph_Tree.pdf
Annotate or re-annotate bacterial or archaeal genomes and/or genome sets using RASTtk.
This app completed without errors in 1h 41m 15s.
Summary
The RAST algorithm was applied to annotating an existing genome: Staphylococcus succinus. 
The sequence for this genome is comprised of 1 contigs containing 2745675 nucleotides. 
The input genome has 2540 existing coding features and 176 existing non-coding features.
Input genome has the following feature types:
	Non-coding gene                   80 
	Non-coding misc_feature            1 
	Non-coding ncRNA                   3 
	Non-coding rRNA                   19 
	Non-coding regulatory             15 
	Non-coding tRNA                   57 
	Non-coding tmRNA                   1 
	gene                            2540 
The existing gene features were cleared due to selection of gene calling with Glimmer3 or Prodigal.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 176 non-coding features, 2709 new features were called, of which 101 are non-coding.
Output genome has the following feature types:
	Coding gene                     2608 
	Non-coding gene                   80 
	Non-coding misc_feature            1 
	Non-coding ncRNA                   3 
	Non-coding prophage                3 
	Non-coding rRNA                   19 
	Non-coding regulatory             15 
	Non-coding repeat                 18 
	Non-coding rna                    80 
	Non-coding tRNA                   57 
	Non-coding tmRNA                   1 
Overall, the genes have 2051 distinct functions. 
The genes include 1216 genes with a SEED annotation ontology across 1057 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Staphylococcus_succinus__GCF_001902315.1_ succeeded!

The RAST algorithm was applied to annotating an existing genome: Staphylococcus equorum. 
The sequence for this genome is comprised of 1 contigs containing 2822193 nucleotides. 
The input genome has 2635 existing coding features and 185 existing non-coding features.
Input genome has the following feature types:
	Non-coding gene                   84 
	Non-coding ncRNA                   3 
	Non-coding rRNA                   22 
	Non-coding regulatory             15 
	Non-coding repeat_region           2 
	Non-coding tRNA                   58 
	Non-coding tmRNA                   1 
	gene                            2635 
The existing gene features were cleared due to selection of gene calling with Glimmer3 or Prodigal.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 185 non-coding features, 2864 new features were called, of which 155 are non-coding.
Output genome has the following feature types:
	Coding gene                     2709 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat          17 
	Non-coding crispr_spacer          15 
	Non-coding gene                   84 
	Non-coding ncRNA                   3 
	Non-coding prophage                2 
	Non-coding rRNA                   22 
	Non-coding regulatory             15 
	Non-coding repeat                 36 
	Non-coding repeat_region           2 
	Non-coding rna                    83 
	Non-coding tRNA                   58 
	Non-coding tmRNA                   1 
Overall, the genes have 2094 distinct functions. 
The genes include 1237 genes with a SEED annotation ontology across 1060 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Staphylococcus_equorum__GCF_001432245.1_ succeeded!

The RAST algorithm was applied to annotating an existing genome: Staphylococcus saprophyticus. 
The sequence for this genome is comprised of 76 contigs containing 2419582 nucleotides. 
The input genome has 2370 existing coding features and 55 existing non-coding features.
Input genome has the following feature types:
	Non-coding gene                   22 
	Non-coding ncRNA                   3 
	Non-coding rRNA                    3 
	Non-coding regulatory             11 
	Non-coding tRNA                   15 
	Non-coding tmRNA                   1 
	gene                            2370 
The existing gene features were cleared due to selection of gene calling with Glimmer3 or Prodigal.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 55 non-coding features, 2481 new features were called, of which 51 are non-coding.
Output genome has the following feature types:
	Coding gene                     2430 
	Non-coding gene                   22 
	Non-coding ncRNA                   3 
	Non-coding rRNA                    3 
	Non-coding regulatory             11 
	Non-coding repeat                 34 
	Non-coding rna                    17 
	Non-coding tRNA                   15 
	Non-coding tmRNA                   1 
Overall, the genes have 1899 distinct functions. 
The genes include 1168 genes with a SEED annotation ontology across 1005 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Staphylococcus_saprophyticus__GCF_001074355.1_ succeeded!

The RAST algorithm was applied to annotating an existing genome: Staphylococcus capitis subsp. capitis. 
The sequence for this genome is comprised of 2 contigs containing 2503265 nucleotides. 
The input genome has 2329 existing coding features and 183 existing non-coding features.
Input genome has the following feature types:
	Non-coding gene                   85 
	Non-coding ncRNA                   3 
	Non-coding rRNA                   19 
	Non-coding regulatory             13 
	Non-coding tRNA                   62 
	Non-coding tmRNA                   1 
	gene                            2329 
The existing gene features were cleared due to selection of gene calling with Glimmer3 or Prodigal.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 183 non-coding features, 2491 new features were called, of which 101 are non-coding.
Output genome has the following feature types:
	Coding gene                     2390 
	Non-coding gene                   85 
	Non-coding ncRNA                   3 
	Non-coding prophage                2 
	Non-coding rRNA                   19 
	Non-coding regulatory             13 
	Non-coding repeat                 18 
	Non-coding rna                    81 
	Non-coding tRNA                   62 
	Non-coding tmRNA                   1 
Overall, the genes have 2017 distinct functions. 
The genes include 1097 genes with a SEED annotation ontology across 1010 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Staphylococcus_capitis_subsp._capitis__GCF_001028645.1_ succeeded!

The RAST algorithm was applied to annotating an existing genome: Staphylococcus cohnii subsp. cohnii. 
The sequence for this genome is comprised of 16 contigs containing 2826849 nucleotides. 
The input genome has 2677 existing coding features and 170 existing non-coding features.
Input genome has the following feature types:
	Non-coding gene                   78 
	Non-coding ncRNA                   3 
	Non-coding rRNA                   17 
	Non-coding regulatory             14 
	Non-coding tRNA                   57 
	Non-coding tmRNA                   1 
	gene                            2677 
The existing gene features were cleared due to selection of gene calling with Glimmer3 or Prodigal.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 170 non-coding features, 2891 new features were called, of which 135 are non-coding.
Output genome has the following feature types:
	Coding gene                     2756 
	Non-coding gene                   78 
	Non-coding ncRNA                   3 
	Non-coding prophage                1 
	Non-coding rRNA                   17 
	Non-coding regulatory             14 
	Non-coding repeat                 64 
	Non-coding rna                    70 
	Non-coding tRNA                   57 
	Non-coding tmRNA                   1 
Overall, the genes have 2075 distinct functions. 
The genes include 1297 genes with a SEED annotation ontology across 1041 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Staphylococcus_cohnii_subsp._cohnii__GCF_000972575.1_ succeeded!

The RAST algorithm was applied to annotating an existing genome: Staphylococcus gallinarum. 
The sequence for this genome is comprised of 272 contigs containing 3171720 nucleotides. 
The input genome has 3095 existing coding features and 194 existing non-coding features.
Input genome has the following feature types:
	Non-coding gene                   89 
	Non-coding ncRNA                   3 
	Non-coding rRNA                   28 
	Non-coding regulatory             16 
	Non-coding tRNA                   57 
	Non-coding tmRNA                   1 
	gene                            3095 
The existing gene features were cleared due to selection of gene calling with Glimmer3 or Prodigal.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 194 non-coding features, 3667 new features were called, of which 439 are non-coding.
Output genome has the following feature types:
	Coding gene                     3228 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          12 
	Non-coding crispr_spacer          11 
	Non-coding gene                   89 
	Non-coding ncRNA                   3 
	Non-coding prophage                3 
	Non-coding rRNA                   28 
	Non-coding regulatory             16 
	Non-coding repeat                342 
	Non-coding rna                    70 
	Non-coding tRNA                   57 
	Non-coding tmRNA                   1 
Overall, the genes have 2164 distinct functions. 
The genes include 1365 genes with a SEED annotation ontology across 1090 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Staphylococcus_gallinarum__GCF_000875895.1_ succeeded!

The RAST algorithm was applied to annotating an existing genome: Staphylococcus warneri SG1. 
The sequence for this genome is comprised of 9 contigs containing 2560716 nucleotides. 
The input genome has 2424 existing coding features and 167 existing non-coding features.
Input genome has the following feature types:
	Non-coding gene                   78 
	Non-coding ncRNA                   3 
	Non-coding rRNA                   16 
	Non-coding regulatory             11 
	Non-coding tRNA                   58 
	Non-coding tmRNA                   1 
	gene                            2424 
The existing gene features were cleared due to selection of gene calling with Glimmer3 or Prodigal.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 167 non-coding features, 2628 new features were called, of which 153 are non-coding.
Output genome has the following feature types:
	Coding gene                     2475 
	Non-coding gene                   78 
	Non-coding ncRNA                   3 
	Non-coding prophage                2 
	Non-coding rRNA                   16 
	Non-coding regulatory             11 
	Non-coding repeat                 75 
	Non-coding rna                    76 
	Non-coding tRNA                   58 
	Non-coding tmRNA                   1 
Overall, the genes have 2036 distinct functions. 
The genes include 1135 genes with a SEED annotation ontology across 1034 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Staphylococcus_warneri_SG1__GCF_000332735.1_ succeeded!

staph_rast_annotate failed!

The RAST algorithm was applied to annotating an existing genome: Staphylococcus arlettae CVD059. 
The sequence for this genome is comprised of 57 contigs containing 2565675 nucleotides. 
The input genome has 2453 existing coding features and 176 existing non-coding features.
Input genome has the following feature types:
	Non-coding gene                   80 
	Non-coding misc_feature            3 
	Non-coding ncRNA                   3 
	Non-coding rRNA                   17 
	Non-coding regulatory             13 
	Non-coding tRNA                   59 
	Non-coding tmRNA                   1 
	gene                            2453 
The existing gene features were cleared due to selection of gene calling with Glimmer3 or Prodigal.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 176 non-coding features, 2789 new features were called, of which 220 are non-coding.
Output genome has the following feature types:
	Coding gene                     2569 
	Non-coding gene                   80 
	Non-coding misc_feature            3 
	Non-coding ncRNA                   3 
	Non-coding rRNA                   17 
	Non-coding regulatory             13 
	Non-coding repeat                159 
	Non-coding rna                    61 
	Non-coding tRNA                   59 
	Non-coding tmRNA                   1 
Overall, the genes have 1959 distinct functions. 
The genes include 1207 genes with a SEED annotation ontology across 1027 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Staphylococcus_arlettae_CVD059__GCF_000295715.1_ succeeded!

The RAST algorithm was applied to annotating an existing genome: Staphylococcus saprophyticus subsp. saprophyticus ATCC 15305. 
The sequence for this genome is comprised of 3 contigs containing 2577899 nucleotides. 
The input genome has 2442 existing coding features and 178 existing non-coding features.
Input genome has the following feature types:
	Non-coding gene                   81 
	Non-coding ncRNA                   3 
	Non-coding rRNA                   19 
	Non-coding regulatory             16 
	Non-coding tRNA                   58 
	Non-coding tmRNA                   1 
	gene                            2442 
The existing gene features were cleared due to selection of gene calling with Glimmer3 or Prodigal.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 178 non-coding features, 2646 new features were called, of which 123 are non-coding.
Output genome has the following feature types:
	Coding gene                     2523 
	Non-coding gene                   81 
	Non-coding ncRNA                   3 
	Non-coding rRNA                   19 
	Non-coding regulatory             16 
	Non-coding repeat                 44 
	Non-coding rna                    79 
	Non-coding tRNA                   58 
	Non-coding tmRNA                   1 
Overall, the genes have 2021 distinct functions. 
The genes include 1157 genes with a SEED annotation ontology across 1015 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Staphylococcus_saprophyticus_subsp._saprophyticus_ATCC_15305__GCF_000010125.1_ succeeded!

The RAST algorithm was applied to annotating an existing genome: Staphylococcus haemolyticus JCSC1435. 
The sequence for this genome is comprised of 4 contigs containing 2697861 nucleotides. 
The input genome has 2568 existing coding features and 179 existing non-coding features.
Input genome has the following feature types:
	Non-coding gene                   80 
	Non-coding ncRNA                   3 
	Non-coding rRNA                   16 
	Non-coding regulatory             16 
	Non-coding repeat_region           3 
	Non-coding tRNA                   60 
	Non-coding tmRNA                   1 
	gene                            2568 
The existing gene features were cleared due to selection of gene calling with Glimmer3 or Prodigal.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 179 non-coding features, 2869 new features were called, of which 248 are non-coding.
Output genome has the following feature types:
	Coding gene                     2621 
	Non-coding gene                   80 
	Non-coding ncRNA                   3 
	Non-coding prophage                3 
	Non-coding rRNA                   16 
	Non-coding regulatory             16 
	Non-coding repeat                169 
	Non-coding repeat_region           3 
	Non-coding rna                    76 
	Non-coding tRNA                   60 
	Non-coding tmRNA                   1 
Overall, the genes have 2076 distinct functions. 
The genes include 1158 genes with a SEED annotation ontology across 1030 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Staphylococcus_haemolyticus_JCSC1435__GCF_000009865.1_ succeeded!

Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/51640
  • annotation_report.rast_genome_set_from_tree - Microbial Annotation Report
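The per-genome RAST reports above can be condensed into one comparison table. The Python sketch below uses only the contig counts, genome sizes, and coding gene counts reported in this Narrative. Because staph_rast_annotate failed in the batch run, the GN1 row uses the numbers from the earlier single-genome RASTtk run. The shortened labels and the genes-per-kb column are illustrative additions.

```python
# (label, contigs, nucleotides, coding genes), copied from the RAST reports.
genomes = [
    ("GN1 (single-genome RASTtk run)", 11, 2837157, 2758),
    ("S. succinus GCF_001902315.1", 1, 2745675, 2608),
    ("S. equorum GCF_001432245.1", 1, 2822193, 2709),
    ("S. saprophyticus GCF_001074355.1", 76, 2419582, 2430),
    ("S. capitis subsp. capitis GCF_001028645.1", 2, 2503265, 2390),
    ("S. cohnii subsp. cohnii GCF_000972575.1", 16, 2826849, 2756),
    ("S. gallinarum GCF_000875895.1", 272, 3171720, 3228),
    ("S. warneri SG1 GCF_000332735.1", 9, 2560716, 2475),
    ("S. arlettae CVD059 GCF_000295715.1", 57, 2565675, 2569),
    ("S. saprophyticus ATCC 15305 GCF_000010125.1", 3, 2577899, 2523),
    ("S. haemolyticus JCSC1435 GCF_000009865.1", 4, 2697861, 2621),
]

# Sort from largest to smallest genome and print one row per genome.
by_size = sorted(genomes, key=lambda g: g[2], reverse=True)
print(f"{'Genome':<45}{'Contigs':>8}{'Size (bp)':>11}{'CDS':>7}{'CDS/kb':>8}")
for label, contigs, size_bp, cds in by_size:
    density = 1000 * cds / size_bp
    print(f"{label:<45}{contigs:>8}{size_bp:>11}{cds:>7}{density:>8.3f}")
```

Sorted this way, GN1 is the second-largest genome in the set, after S. gallinarum.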
Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB)
This app completed without errors in 31m 45s.
v1 - KBaseTrees.Tree-1.0
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/51640
v1 - KBaseSearch.GenomeSet-2.1
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/51640
Allows users to compute a pangenome from a set of individual genomes.
This app completed without errors in 6m 40s.
Objects
Created Object Name Type Description
Grant_Staph_Pangenome Pangenome Pangenome
Summary
Pangenome saved to grantnickolson:narrative_1574277230463/Grant_Staph_Pangenome
Output from Compute Pangenome
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/51640
Compare isofunctional and homologous gene families for all genomes in a Pangenome.
This app completed without errors in 2m 55s.
Objects
Created Object Name Type Description
Grant_Staph_Comparison_Pangenome GenomeComparison GenomeComparison
Summary
GenomeComparison saved to grantnickolson:narrative_1574277230463/Grant_Staph_Comparison_Pangenome
Output from Compare Genomes from Pangenome
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/51640
View a microbial Pangenome as a circle plot.
This app completed without errors in 2m 59s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/51640
  • pan_circle_plot.png
  • pan_circle_plot.pdf
Examine the general functional distribution or specific functional gene families for a GenomeSet.
This app completed without errors in 1m 41s.

Apps

  1. Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5
    • Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30: 2068-2069. doi:10.1093/bioinformatics/btu153
  2. Annotate Microbial Assembly with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT - The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656-664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955-964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793-803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-6654. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176-1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673-679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  3. Annotate Multiple Microbial Genomes with RAST - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT - The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656-664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955-964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793-803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-6654. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176-1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673-679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  4. Assemble Reads with SPAdes - v3.13.0
    • [1] Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology. 2012;19: 455-477. doi: 10.1089/cmb.2012.0021
  5. Assess Read Quality with FastQC - v0.11.5
    • FastQC source: Bioinformatics Group at the Babraham Institute, UK.
  6. Compare Genomes from Pangenome
    • Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R. The microbial pan-genome. Curr Opin Genet Dev. 2005;15: 589-594. doi:10.1016/j.gde.2005.09.006
    • Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial pan-genome. Proc Natl Acad Sci U S A. 2005;102: 13950-13955. doi:10.1073/pnas.0506758102
    • Rasko DA, Rosovitz MJ, Myers GSA, Mongodin EF, Fricke WF, Gajer P, et al. The Pangenome Structure of Escherichia coli: Comparative Genomic Analysis of E. coli Commensal and Pathogenic Isolates. J Bacteriol. 2008;190: 6881-6893. doi:10.1128/JB.00619-08
  7. Compute Pangenome
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  8. GTDB-Tk classify
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea [published online ahead of print, 2020 Apr 27]. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2 - approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
  9. Import FASTA File as Assembly from Staging Area
    no citations
  10. Import FASTQ/SRA File as Reads from Staging Area
    no citations
  11. Insert Genome Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5. doi:10.1371/journal.pone.0009490
  12. Pangenome Circle Plot - v1.2.0
    • Li L, Stoeckert CJ, Roos DS. OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes. Genome Res. 2003;13: 2178-2189. doi:10.1101/gr.1224503
  13. View Function Profile for Genomes - v1.4.0
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163