Generated October 29, 2023
from biokbase.narrative.jobs.appmanager import AppManager
AppManager().run_app_batch(
    [{
        "app_id": "kb_uploadmethods/import_fasta_as_assembly_from_staging",
        "tag": "release",
        "version": "5b9346463df88a422ff5d4f4cba421679f63c73f",
        "params": [{
            "staging_file_subdir_path": "RC80_final_assembly_40X.fasta",
            "assembly_name": "RC80_final_assembly_40X.fasta_assembly"
        }],
        "shared_params": {
            "type": "draft isolate",
            "min_contig_length": 500
        }
    }],
    cell_id="b0e48bc2-47c8-4edd-a859-3149c2668304",
    run_id="b57ea939-337c-4771-8467-8659466cb037"
)
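The batch call above describes one import job per entry in "params", with "shared_params" applied to every job. As a minimal sketch, the same structure can be built for several staging files with plain Python. The helper name `build_import_batch` and the second file name are hypothetical; only the field names and the "<file>_assembly" naming pattern come from the cell above, and the one-job-per-entry reading is an assumption.

```python
def build_import_batch(staging_files,
                       assembly_type="draft isolate",
                       min_contig_length=500,
                       version="5b9346463df88a422ff5d4f4cba421679f63c73f"):
    """Build an app-info list shaped like the run_app_batch argument above.

    Hypothetical helper: it only assembles dictionaries and does not
    contact KBase or import biokbase.
    """
    return [{
        "app_id": "kb_uploadmethods/import_fasta_as_assembly_from_staging",
        "tag": "release",
        "version": version,
        # One entry per staging file (assumed to be one job each).
        "params": [
            {
                "staging_file_subdir_path": path,
                "assembly_name": f"{path}_assembly",
            }
            for path in staging_files
        ],
        # Settings common to every job in the batch.
        "shared_params": {
            "type": assembly_type,
            "min_contig_length": min_contig_length,
        },
    }]


# "another_assembly.fasta" is a made-up file name for illustration.
batch = build_import_batch(
    ["RC80_final_assembly_40X.fasta", "another_assembly.fasta"]
)
```

Inside a live Narrative, a list like `batch` would take the place of the literal list passed to `AppManager().run_app_batch(...)`.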
# Welcome to the Narrative
from IPython.display import IFrame
IFrame("https://www.kbase.us/narrative-welcome-cell/", width="100%", height="300px")
v1 - KBaseTrees.Tree-1.0
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/158078
v1 - KBaseSearch.GenomeSet-2.1
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/158078
v1 - KBaseGenomes.Pangenome-4.2
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/158078
Create a Pangenome object by performing OrthoMCL orthologous groups construction on a set of Genomes.
This app completed without errors in 5h 0m 43s.
Objects
Created Object Name Type Description
RC80_pangenome_6 Pangenome Pangenome object
Summary
Input genomes: 7
Output orthologs: 13847
Add one or more Genomes to a KBase SpeciesTree.
This app completed without errors in 3m 32s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/158078
  • RC80_tree_6.newick
  • RC80_tree_6-labels.newick
  • RC80_tree_6.png
  • RC80_tree_6.pdf
Allows users to compute fast whole-genome Average Nucleotide Identity (ANI) estimation.
This app completed without errors in 53s.
v1 - KBaseGenomes.Pangenome-4.2
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/158078
Add one or more Genomes to a KBase SpeciesTree.
This app completed without errors in 3m 36s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/158078
  • RC80_tree.newick
  • RC80_tree-labels.newick
  • RC80_tree.png
  • RC80_tree.pdf
Compare isofunctional and homologous gene families for all genomes in a Pangenome.
This app completed without errors in 1m 31s.
Objects
Created Object Name Type Description
RC80_genome_comparison GenomeComparison GenomeComparison
Summary
GenomeComparison saved to rizzogab:narrative_1695998834800/RC80_genome_comparison
Search for matches to dbCAN HMMs of CAZy carbohydrate active enzyme families using HMMER 3
This app is still in progress.
No output found.
Output from Compare Genomes from Pangenome
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/158078
Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) ver R06-RS202
This app completed without errors in 58m 55s.
Objects
Created Object Name Type Description
RC_80_PROKKAgenome Genome Taxonomy and taxon_assignment updated with GTDB
Annotate Assembly and Re-annotate Genomes with Prokka annotation pipeline.
This app completed without errors in 4m 43s.
Objects
Created Object Name Type Description
RC_80_PROKKAgenome Genome Annotated Genome
Summary
Annotated Genome saved to: rizzogab:narrative_1695998834800/RC_80_PROKKAgenome
  • Number of genes predicted: 6917
  • Number of protein coding genes: 6797
  • Number of genes with non-hypothetical function: 3608
  • Number of genes with EC-number: 1442
  • Number of genes with Seed Subsystem Ontology: 0
  • Average protein length: 244 aa
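The counts in the Prokka summary above can be turned into a few derived figures. This is a small arithmetic sketch, not part of the Prokka output; using the protein-coding gene count as the denominator for the percentages is an assumption made here for illustration.

```python
# Counts copied from the Prokka summary above.
genes_predicted = 6917
protein_coding = 6797
non_hypothetical = 3608
with_ec_number = 1442

# Predicted genes that are not protein coding (e.g. RNA genes).
non_coding = genes_predicted - protein_coding

# Percentages relative to protein-coding genes (assumed denominator).
pct_non_hypothetical = 100 * non_hypothetical / protein_coding
pct_with_ec = 100 * with_ec_number / protein_coding

print(f"non-coding genes: {non_coding}")
print(f"non-hypothetical function: {pct_non_hypothetical:.1f}%")
print(f"with EC number: {pct_with_ec:.1f}%")
```

Under these assumptions, roughly half of the protein-coding genes carry a non-hypothetical function and about one in five has an EC number.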
Construct a draft metabolic model based on an annotated genome. This app is now obsolete, replaced by the new ModelSEED2 app: MS2 - Build Prokaryotic Metabolic Models.
This app completed without errors in 1m 36s.
Objects
Created Object Name Type Description
RC80_metabolic_model FBAModel FBAModel-15 RC80_metabolic_model
RC80_metabolic_model.gf.1 FBA FBA-13 RC80_metabolic_model.gf.1
Report
Summary
94026/Glc.O2.atp media.
Output from Build Metabolic Model
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/158078
v2 - KBaseGenomes.Genome-11.1
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/158078
v1 - KBaseGenomes.Genome-11.1
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/158078
v1 - KBaseGenomes.Genome-11.1
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/158078
Output from Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/158078
Annotate or re-annotate genome/assembly using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 7m 54s.
Objects
Created Object Name Type Description
RC_80_StaphRASTgenome Genome RAST re-annotated genome
Summary
The RAST algorithm was applied to annotating a genome sequence comprised of 1 contigs containing 5977337 nucleotides. No initial gene calls were provided. Standard features were called using: glimmer3; prodigal. A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr. The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity. In addition to the remaining original 0 coding features and 0 non-coding features, 7858 new features were called, of which 502 are non-coding.
Output genome has the following feature types:
  • Coding gene: 7356
  • Non-coding crispr_array: 5
  • Non-coding crispr_repeat: 139
  • Non-coding crispr_spacer: 134
  • Non-coding repeat: 105
  • Non-coding rna: 119
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
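The feature counts in the RASTtk summary above are internally consistent, which a short check makes explicit. This is only a sanity-check sketch over the numbers quoted in the report; the dictionary layout is an illustrative choice, not a KBase data structure.

```python
# Feature counts copied from the RASTtk summary above.
feature_counts = {
    "Coding gene": 7356,
    "Non-coding crispr_array": 5,
    "Non-coding crispr_repeat": 139,
    "Non-coding crispr_spacer": 134,
    "Non-coding repeat": 105,
    "Non-coding rna": 119,
}

# Totals stated in the summary text.
reported_new_features = 7858
reported_non_coding = 502

non_coding_total = sum(
    count for name, count in feature_counts.items()
    if name.startswith("Non-coding")
)
all_features_total = sum(feature_counts.values())

print(non_coding_total == reported_non_coding)      # non-coding types add up
print(all_features_total == reported_new_features)  # coding + non-coding add up
```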
Run QUAST (QUality ASsessment Tool) on a set of Assemblies to assess their quality.
This app completed without errors in 55s.
Summary
All statistics are based on contigs of size >= 500 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).
Assembly: RC80_final_assembly_40X.fasta_assembly
  • # contigs (>= 0 bp): 1
  • # contigs (>= 1000 bp): 1
  • # contigs (>= 10000 bp): 1
  • # contigs (>= 100000 bp): 1
  • # contigs (>= 1000000 bp): 1
  • Total length (>= 0 bp): 5977337
  • Total length (>= 1000 bp): 5977337
  • Total length (>= 10000 bp): 5977337
  • Total length (>= 100000 bp): 5977337
  • Total length (>= 1000000 bp): 5977337
  • # contigs: 1
  • Largest contig: 5977337
  • Total length: 5977337
  • GC (%): 46.54
  • N50: 5977337
  • N75: 5977337
  • L50: 1
  • L75: 1
  • # N's per 100 kbp: 0.00
  • # predicted genes (unique): 4381
  • # predicted genes (>= 0 bp): 4418 + 0 part
  • # predicted genes (>= 300 bp): 4122 + 0 part
  • # predicted genes (>= 1500 bp): 782 + 0 part
  • # predicted genes (>= 3000 bp): 132 + 0 part
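Because this assembly is a single contig, N50 and N75 equal the contig length and L50 and L75 equal 1. The sketch below shows the usual definition of these statistics (the length of the contig at which the running total of sorted contig lengths first reaches the given fraction of the assembly). The function name `nx_lx` is hypothetical and this is not QUAST's own code.

```python
def nx_lx(contig_lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx is the length of the contig at which the cumulative length of
    contigs, sorted longest first, reaches `fraction` of the total.
    Lx is how many contigs were needed to get there.
    """
    ordered = sorted(contig_lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# The assembly reported above: one contig of 5977337 bp.
n50, l50 = nx_lx([5977337], 0.5)
n75, l75 = nx_lx([5977337], 0.75)

# A made-up multi-contig example for contrast.
toy_n50, toy_l50 = nx_lx([80, 70, 50, 40, 30, 20])
```

For the toy lengths the total is 290, so the running sum first reaches half of it (145) at the second contig, giving N50 = 70 and L50 = 2.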
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 4m 38s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/158078
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
v1 - KBaseGenomeAnnotations.Assembly-5.1
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/158078

Apps

  1. Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5
    • Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30: 2068-2069. doi:10.1093/bioinformatics/btu153
  2. Annotate Genome/Assembly with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT - The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656-664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955-964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793-803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-6654. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176-1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673-679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  3. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043-1055. doi:10.1101/gr.186072.114
    • CheckM source:
    • Additional info:
  4. Assess Quality of Assemblies with QUAST - v4.4
    • [1] Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29: 1072-1075. doi:10.1093/bioinformatics/btt086
    • [2] Mikheenko A, Valin G, Prjibelski A, Saveliev V, Gurevich A. Icarus: visualizer for de novo assembly evaluation. Bioinformatics. 2016;32: 3321-3323. doi:10.1093/bioinformatics/btw379
  5. Build Metabolic Model
    • [1] Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol. 2010;28: 977-982. doi:10.1038/nbt.1672
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Latendresse M. Efficiently gap-filling reaction networks. BMC Bioinformatics. 2014;15: 225. doi:10.1186/1471-2105-15-225
    • [4] Dreyfuss JM, Zucker JD, Hood HM, Ocasio LR, Sachs MS, Galagan JE. Reconstruction and Validation of a Genome-Scale Metabolic Model for the Filamentous Fungus Neurospora crassa Using FARM. PLOS Computational Biology. 2013;9: e1003126. doi:10.1371/journal.pcbi.1003126
    • [5] Mahadevan R, Schilling CH. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng. 2003;5: 264-276.
  6. Build Pangenome with OrthoMCL - v2.0
    • Li L, Stoeckert CJ, Roos DS. OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes. Genome Res. 2003;13: 2178-2189. doi:10.1101/gr.1224503
  7. Classify Microbes with GTDB-Tk - v2.3.2
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics, Volume 38, Issue 23, 1 December 2022, Pages 5315-5316. DOI: https://doi.org/10.1093/bioinformatics/btac672
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Donovan H Parks, Maria Chuvochina, Christian Rinke, Aaron J Mussig, Pierre-Alain Chaumeil, Philip Hugenholtz. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D785-D794. DOI: https://doi.org/10.1093/nar/gkab776
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
    • Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
    • Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016 Jun 20;17(1):132. DOI: 10.1186/s13059-016-0997-x
  8. Compare Genomes from Pangenome
    • Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R. The microbial pan-genome. Curr Opin Genet Dev. 2005;15: 589-594. doi:10.1016/j.gde.2005.09.006
    • Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial pan-genome. Proc Natl Acad Sci U S A. 2005;102: 13950-13955. doi:10.1073/pnas.0506758102
    • Rasko DA, Rosovitz MJ, Myers GSA, Mongodin EF, Fricke WF, Gajer P, et al. The Pangenome Structure of Escherichia coli: Comparative Genomic Analysis of E. coli Commensal and Pathogenic Isolates. J Bacteriol. 2008;190: 6881-6893. doi:10.1128/JB.00619-08
  9. Compute ANI with FastANI
    • [1] Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High-throughput ANI Analysis of 90K Prokaryotic Genomes Reveals Clear Species Boundaries. 2017; doi:10.1101/225342
    • [2] Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM. DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol. 2007;57: 81-91. doi:10.1099/ijs.0.64483-0
    • FastANI module and source code:
  10. Insert Genome Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5. doi:10.1371/journal.pone.0009490
  11. Search with dbCAN2 HMMs of CAZy families - v10
    • Zhang H, Yohe T, Huang L, Entwistle S, Wu P, Yang Z, Busk PK, Xu Y, Yin Y. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 2018 Jul 2;46(W1):W95-W101. doi: 10.1093/nar/gky418
    • Eddy SR. Accelerated Profile HMM Searches. PLOS Computational Biology. 2011;7: e1002195. doi:10.1371/journal.pcbi.1002195
    • HMMER v3.3.2 source:
    • Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x