Generated August 23, 2022
# Welcome to the Narrative
from IPython.display import IFrame
IFrame("https://www.kbase.us/narrative-welcome-cell/", width="100%", height="300px")
from biokbase.narrative.jobs.appmanager import AppManager
AppManager().run_app_bulk(
    [{
        "app_id": "kb_uploadmethods/import_fastq_noninterleaved_as_reads_from_staging",
        "tag": "release",
        "version": "31e93066beb421a51b9c8e44b1201aa93aea0b4e",
        "params": [{
            "fastq_fwd_staging_file_name": "117_S48_R1_001.fastq.gz",
            "fastq_rev_staging_file_name": "117_S48_R2_001.fastq.gz",
            "name": "Unknown_117_paired_reads",
            "sequencing_tech": "Illumina",
            "single_genome": 1,
            "read_orientation_outward": 0,
            "insert_size_std_dev": None,
            "insert_size_mean": None
        }]
    }],
    cell_id="b27b9803-b98c-4de6-9eb7-bafea4d313c2",
    run_id="49947758-6fa6-46af-aea4-dea736eb585b"
)
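The import cell above supplies a forward file, a reverse file, and an object name for one sample. As a minimal sketch, the same `params` entry could be assembled from the forward file name alone. It assumes the reverse file differs only by `_R1_` versus `_R2_`, as it does for this sample. The helper `build_import_params` is hypothetical and not part of KBase; it only builds a dictionary and does not call `AppManager`.

```python
def build_import_params(fwd_file: str, object_name: str) -> dict:
    """Build one params entry with the same keys as the import cell.

    Hypothetical helper: assumes the reverse file name is the forward
    file name with "_R1_" replaced by "_R2_".
    """
    if fwd_file.count("_R1_") != 1:
        raise ValueError(f"expected exactly one '_R1_' in {fwd_file!r}")
    return {
        "fastq_fwd_staging_file_name": fwd_file,
        "fastq_rev_staging_file_name": fwd_file.replace("_R1_", "_R2_"),
        "name": object_name,
        "sequencing_tech": "Illumina",
        "single_genome": 1,
        "read_orientation_outward": 0,
        "insert_size_std_dev": None,
        "insert_size_mean": None,
    }


params = build_import_params("117_S48_R1_001.fastq.gz", "Unknown_117_paired_reads")
print(params["fastq_rev_staging_file_name"])
```

A list of such dictionaries could then be placed under the `"params"` key of the app specification when several samples are imported in one bulk run.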
A quality control application for high throughput sequence data.
This app completed without errors in 3m 32s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/109725
  • Unknown_117_paired_reads_109725_2_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • Unknown_117_paired_reads_109725_2_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 8m 24s.
Objects
Created Object Name | Type | Description
Unknown_117_paired_trimmed_paired | PairedEndLibrary | Trimmed Reads
Unknown_117_paired_trimmed_unpaired_fwd | SingleEndLibrary | Trimmed Unpaired Forward Reads
Unknown_117_paired_trimmed_unpaired_rev | SingleEndLibrary | Trimmed Unpaired Reverse Reads
A quality control application for high throughput sequence data.
This app completed without errors in 2m 55s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/109725
  • Unknown_117_paired_trimmed_paired_109725_5_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • Unknown_117_paired_trimmed_paired_109725_5_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
Assemble reads using the SPAdes assembler.
This app completed without errors in 20m 55s.
Objects
Created Object Name | Type | Description
Unknown_117_trimmed_SPAdes.Assembly | Assembly | Assembled contigs
Summary
Assembly saved to: annamcloon:narrative_1645468966879/Unknown_117_trimmed_SPAdes.Assembly
Assembled into 30 contigs.
Avg Length: 160332.03333333333 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  • 23 -- 701.0 to 159401.6 bp
  • 4 -- 159401.6 to 318102.2 bp
  • 1 -- 318102.2 to 476802.80000000005 bp
  • 1 -- 476802.80000000005 to 635503.4 bp
  • 0 -- 635503.4 to 794204.0 bp
  • 0 -- 794204.0 to 952904.6000000001 bp
  • 0 -- 952904.6000000001 to 1111605.2 bp
  • 0 -- 1111605.2 to 1270305.8 bp
  • 0 -- 1270305.8 to 1429006.4000000001 bp
  • 1 -- 1429006.4000000001 to 1587707.0 bp
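The figures in the SPAdes summary can be cross-checked with a little arithmetic. The sketch below assumes the distribution uses ten equal-width bins between the shortest and longest contig. That is inferred from the reported bin edges, not stated by the app. All variable names are illustrative.

```python
# Values copied from the SPAdes summary above.
n_contigs = 30
avg_len = 160332.03333333333
min_len, max_len = 701.0, 1587707.0
counts = [23, 4, 1, 1, 0, 0, 0, 0, 0, 1]

# Total assembly size implied by contig count and average length.
total_bp = round(n_contigs * avg_len)

# Assumed binning: equal-width bins spanning min_len..max_len.
n_bins = len(counts)
width = (max_len - min_len) / n_bins
edges = [round(min_len + i * width, 1) for i in range(n_bins + 1)]

print("total bp:", total_bp)
print("bin width:", round(width, 1))
print("first edges:", edges[:3])
print("contigs counted:", sum(counts))
```

The implied total length agrees with the 4809961 nucleotides reported later by RASTtk for the same 30 contigs, and the bin counts sum to the contig count.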
Annotate Assembly and Re-annotate Genomes with Prokka annotation pipeline.
This app completed without errors in 4m 32s.
Objects
Created Object Name | Type | Description
Unknown_117_SPAdes_Prokka_annotation | Genome | Annotated Genome
Summary
Annotated Genome saved to: annamcloon:narrative_1645468966879/Unknown_117_SPAdes_Prokka_annotation
  • Number of genes predicted: 4670
  • Number of protein coding genes: 4627
  • Number of genes with non-hypothetical function: 2651
  • Number of genes with EC-number: 1033
  • Number of genes with Seed Subsystem Ontology: 0
  • Average protein length: 294 aa
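The Prokka counts are easier to interpret as proportions. The sketch below derives a few of them from the numbers in the summary. The helper `pct` and the variable names are illustrative only.

```python
# Counts copied from the Prokka summary above.
genes_predicted = 4670
protein_coding = 4627
non_hypothetical = 2651
with_ec_number = 1033


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Genes that are not protein coding (for example RNA genes).
non_coding = genes_predicted - protein_coding

print("non-coding genes:", non_coding)
print("protein coding, % of all genes:", pct(protein_coding, genes_predicted))
print("non-hypothetical, % of coding genes:", pct(non_hypothetical, protein_coding))
print("with EC number, % of coding genes:", pct(with_ec_number, protein_coding))
```

The difference of 43 genes matches the 43 existing non-coding features that RASTtk reports when it re-annotates this Prokka genome.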
Examine the general functional distribution or specific functional gene families for a GenomeSet.
This app produced errors in 7s.
No output found.
Allows users to create a GenomeSet object.
This app completed without errors in 18s.
Objects
Created Object Name | Type | Description
Unkn_117_RAST_genomeset | GenomeSet | KButil_Build_GenomeSet
Summary
genomes in output set Unkn_117_RAST_genomeset: 1
Compute gene ontology (GO) term enrichment for genomic features.
This app is new, and hasn't been started.
No output found.
Allows users to create a GenomeSet object.
This app completed without errors in 25s.
Objects
Created Object Name | Type | Description
Unknown_117_genomeset | GenomeSet | KButil_Build_GenomeSet
Summary
genomes in output set Unknown_117_genomeset: 1
Examine the general functional distribution or specific functional gene families for a given FeatureSet.
This app is new, and hasn't been started.
No output found.
Annotate your genome(s) with DRAM. Annotations will then be distilled to create an interactive functional summary per genome.
This app completed without errors in 38m 4s.
Summary
Here are the results from your DRAM run.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/109725
  • annotations.tsv - DRAM annotations in a tab-separated table format
  • genes.faa - Genes as amino acids predicted by DRAM with brief annotations
  • product.tsv - DRAM product in tabular format
  • metabolism_summary.xlsx - DRAM metabolism summary tables
  • genome_stats.tsv - DRAM genome statistics table
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 8m 10s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/109725
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Annotate your genome(s) with DRAM. Annotations will then be distilled to create an interactive functional summary per genome.
This app completed without errors in 38m 34s.
Summary
Here are the results from your DRAM run.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/109725
  • annotations.tsv - DRAM annotations in a tab-separated table format
  • genes.faa - Genes as amino acids predicted by DRAM with brief annotations
  • product.tsv - DRAM product in tabular format
  • metabolism_summary.xlsx - DRAM metabolism summary tables
  • genome_stats.tsv - DRAM genome statistics table
Examine the general functional distribution or specific functional gene families for a GenomeSet.
This app produced errors in 10s.
No output found.
Output from Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/109725
Conduct a side-by-side comparison of various metabolic annotations mapped into a genome
This app completed without errors in 1m 37s.
Annotate or re-annotate genome/assembly using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 11m 29s.
Objects
Created Object Name | Type | Description
Unknown_117_SPAdes_RAST_annotation | Genome | RAST re-annotated genome
Summary
The RAST algorithm was applied to annotating a genome sequence comprised of 30 contigs containing 4809961 nucleotides. No initial gene calls were provided.
Standard gene features were called using: prodigal; glimmer3.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 4889 new features were called, of which 80 are non-coding.
Output genome has the following feature types:
  • Coding gene 4809
  • Non-coding prophage 3
  • Non-coding repeat 37
  • Non-coding rna 40
Overall, the genes have 2813 distinct functions. The genes include 2201 genes with a SEED annotation ontology across 1324 distinct SEED functions. The number of distinct functions can exceed the number of genes because some genes have multiple functions.
v1 - KBaseGenomes.Genome-11.0
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/109725
Annotate or re-annotate genome/assembly using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 11m 46s.
Objects
Created Object Name | Type | Description
Unknown_117_RASTandProkka | Genome | RAST re-annotated genome
Summary
The RAST algorithm was applied to annotating an existing genome: Unknown. The sequence for this genome is comprised of 30 contigs containing 4809961 nucleotides. The input genome has 4627 existing coding features and 43 existing non-coding features.
Input genome has the following feature types:
  • Non-coding gene 43
  • gene 4627
The existing gene features were cleared due to selection of gene calling with Prodigal. The existing gene features were cleared due to selection of gene calling with Glimmer3.
Standard gene features were called using: prodigal; glimmer3.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 43 non-coding features, 4891 new features were called, of which 82 are non-coding.
Output genome has the following feature types:
  • Coding gene 4809
  • Non-coding gene 43
  • Non-coding prophage 3
  • Non-coding repeat 37
  • Non-coding rna 42
Overall, the genes have 2812 distinct functions. The genes include 2201 genes with a SEED annotation ontology across 1323 distinct SEED functions. The number of distinct functions can exceed the number of genes because some genes have multiple functions.
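The feature counts in the two RASTtk summaries can be checked for internal consistency. The sketch below takes the numbers reported for each run and confirms that the newly called feature types add up to the stated totals. It assumes the 43 "Non-coding gene" features in the second run are the retained originals and so are excluded from the newly called set, which is how the summary text reads. The dictionary layout is illustrative.

```python
# Newly called feature types and totals, copied from the two RASTtk summaries.
runs = {
    "Unknown_117_SPAdes_RAST_annotation": {
        "new_features": 4889,
        "new_non_coding": 80,
        "called": {
            "Coding gene": 4809,
            "Non-coding prophage": 3,
            "Non-coding repeat": 37,
            "Non-coding rna": 40,
        },
    },
    "Unknown_117_RASTandProkka": {
        "new_features": 4891,
        "new_non_coding": 82,
        # The 43 retained "Non-coding gene" features are left out here
        # (assumption: they are the originals, not newly called).
        "called": {
            "Coding gene": 4809,
            "Non-coding prophage": 3,
            "Non-coding repeat": 37,
            "Non-coding rna": 42,
        },
    },
}

results = {}
for name, run in runs.items():
    total = sum(run["called"].values())
    non_coding = sum(n for kind, n in run["called"].items() if kind.startswith("Non-coding"))
    results[name] = (total == run["new_features"], non_coding == run["new_non_coding"])
    print(name, "total ok:", results[name][0], "non-coding ok:", results[name][1])
```

Both runs call the same 4809 coding genes. The second run differs only by two additional RNA features and the 43 non-coding features carried over from the Prokka annotation.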
Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) ver R06-RS202
This app completed without errors in 1h 19m 25s.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app produced errors.
No output found.

Apps

  1. Align Reads using Bowtie2 - v2.3.2
    • Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9: 357-359. doi:10.1038/nmeth.1923
    • Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10: R25. doi:10.1186/gb-2009-10-3-r25
  2. Annotate and Distill Genomes with DRAM
    • DRAM source code
    • DRAM documentation
    • DRAM publication
  3. Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5
    • Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30: 2068-2069. doi:10.1093/bioinformatics/btu153
  4. Annotate Genome/Assembly with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT: The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656-664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955-964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793-803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-54. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijuter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176-1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673-679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  5. Assemble Reads with SPAdes - v3.15.3
    • Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology. 2012;19: 455-477. doi: 10.1089/cmb.2012.0021
    • Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics. 2020 Jun;70(1):e102. doi: 10.1002/cpbi.102.
  6. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043-1055. doi:10.1101/gr.186072.114
    • CheckM source:
    • Additional info:
  7. Assess Read Quality with FastQC - v0.11.9
    • FastQC source: Bioinformatics Group at the Babraham Institute, UK.
  8. Build GenomeSet - v1.7.6
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  9. Classify Microbes with GTDB-Tk - v1.7.0
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
  10. Compare Metabolic Annotations
    no citations
  11. Functional Enrichment for GO Terms - v1.0.8
    • Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25: 25-29. doi:10.1038/75556
    • The Gene Ontology Consortium. Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res. 2017;45: D331-D338. doi:10.1093/nar/gkw1108
  12. Trim Reads with Trimmomatic - v0.36
    • Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30: 2114-2120. doi:10.1093/bioinformatics/btu170
  13. View Function Profile for FeatureSet - v1.4.0
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  14. View Function Profile for Genomes - v1.4.0
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163