Generated December 1, 2023
# Welcome to the Narrative
from IPython.display import IFrame
IFrame("https://www.kbase.us/narrative-welcome-cell/", width="100%", height="300px")

Chromosomal DNA of a bacterium isolated from fermented cider (originally named m74) was sequenced on an Illumina NovaSeq platform (2 x 150 bp). Raw paired-end reads were trimmed and processed with BBDuk v39.01. Trimmed FASTQ files were uploaded to the KBase Narrative. Within KBase, the paired-end reads were assembled using SPAdes v3.15.3 (output data file m74_SPAdes.Assembly). Completeness was evaluated with CheckM v1.0.18. The assembled genome was annotated using RASTtk v1.073 (output data file m74_RAStkgenomeassembly). The isolate genome was compared to other available genomes in two ways. First, the InsertGenomeIntoSpeciesTree function placed the isolate in the same clade as Bacillus indicus (later renamed Metabacillus indicus). Second, FastANI was used to compute the average nucleotide identity between the isolate (m74) and three available strains of Metabacillus indicus (ASM70993v2, ASM70875v2, 4-1317).

from biokbase.narrative.jobs.appmanager import AppManager
AppManager().run_app_batch(
    [{
        "app_id": "kb_uploadmethods/import_fastq_noninterleaved_as_reads_from_staging",
        "tag": "release",
        "version": "5b9346463df88a422ff5d4f4cba421679f63c73f",
        "params": [{
            "fastq_fwd_staging_file_name": "m74_ab_CP05131_R1_001.fastq",
            "fastq_rev_staging_file_name": "m74_ab_CP05131_R2_001.fastq",
            "name": "m74_rawdata"
        }],
        "shared_params": {
            "sequencing_tech": "Illumina",
            "single_genome": 1,
            "read_orientation_outward": 0,
            "insert_size_std_dev": None,
            "insert_size_mean": None
        }
    }],
    cell_id="e4aede4c-9f87-4a51-a805-5808d0f60ccf",
    run_id="649549bb-1123-46e9-87fa-578c52991086"
)
Assemble reads using the SPAdes assembler.
This app completed without errors in 10m 50s.
Objects
Created Object Name Type Description
m74_SPAdes.Assembly Assembly Assembled contigs
Summary
Assembly saved to: calopez:narrative_1700510470994/m74_SPAdes.Assembly
Assembled into 34 contigs.
Avg Length: 120465.14705882352 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  • 12 -- 557.0 to 52006.3 bp
  • 8 -- 52006.3 to 103455.6 bp
  • 4 -- 103455.6 to 154904.90000000002 bp
  • 2 -- 154904.90000000002 to 206354.2 bp
  • 2 -- 206354.2 to 257803.5 bp
  • 3 -- 257803.5 to 309252.80000000005 bp
  • 1 -- 309252.80000000005 to 360702.10000000003 bp
  • 0 -- 360702.10000000003 to 412151.4 bp
  • 1 -- 412151.4 to 463600.7 bp
  • 1 -- 463600.7 to 515050.0 bp
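The bin boundaries in the contig length distribution are consistent with ten equal-width bins spanning the shortest (557 bp) and longest (515050 bp) contigs. The following is a minimal sketch of that arithmetic, not the SPAdes app's actual reporting code: the function name `equal_width_bin_edges` is hypothetical, and the bin count of ten is inferred from the summary.

```python
def equal_width_bin_edges(min_len, max_len, n_bins=10):
    """Return n_bins + 1 edges that split [min_len, max_len] evenly."""
    width = (max_len - min_len) / n_bins
    return [min_len + i * width for i in range(n_bins + 1)]


# Values copied from the assembly summary.
edges = equal_width_bin_edges(557, 515050)
contigs_per_bin = [12, 8, 4, 2, 2, 3, 1, 0, 1, 1]
n_contigs = sum(contigs_per_bin)

# Total assembly size = number of contigs x average contig length.
total_bp = round(n_contigs * 120465.14705882352)

print([round(e, 1) for e in edges])
print(n_contigs, total_bp)
```

The per-bin counts sum to the 34 contigs reported, and the contig count multiplied by the average length gives the total assembly size in base pairs.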
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 7m 16s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/163445
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
v1 - KBaseGenomes.Genome-11.1
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/163445
Annotate or re-annotate genome/assembly using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 4m 46s.
Objects
Created Object Name Type Description
m74_RAStkgenomeassembly Genome RAST re-annotated genome
Summary
The RAST algorithm was applied to annotating a genome sequence comprised of 34 contigs containing 4095815 nucleotides. No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 4536 new features were called, of which 147 are non-coding.
Output genome has the following feature types:
  • Coding gene: 4389
  • Non-coding repeat: 110
  • Non-coding rna: 37
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
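The feature counts in the RAST summary can be cross-checked with a few lines of arithmetic. This is a small sketch that uses only numbers reported in the summary; the variable names are illustrative, and the genes-per-kb figure is a derived value, not something the app reports.

```python
# Counts copied from the RAST summary.
feature_counts = {
    "Coding gene": 4389,
    "Non-coding repeat": 110,
    "Non-coding rna": 37,
}
genome_length_bp = 4095815

# All newly called features, and the non-coding subset.
total_features = sum(feature_counts.values())
non_coding = sum(
    count for name, count in feature_counts.items()
    if name.startswith("Non-coding")
)

# Derived value (not in the report): coding genes per kilobase.
genes_per_kb = feature_counts["Coding gene"] / (genome_length_bp / 1000)

print(total_features, non_coding, round(genes_per_kb, 2))
```

The totals agree with the 4536 new features and 147 non-coding features stated in the summary.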
Add one or more Genomes to a KBase SpeciesTree.
This app completed without errors in 4m 0s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/163445
  • m74_RAStk_tree.newick
  • m74_RAStk_tree-labels.newick
  • m74_RAStk_tree.png
  • m74_RAStk_tree.pdf
Allows the user to add a Genome to a GenomeSet.
This app completed without errors in 21s.
Objects
Created Object Name Type Description
Mindicus_m74_genomeset GenomeSet KButil_Add_Genomes_to_GenomeSet
Summary
genomes in output set Mindicus_m74_genomeset: 54
Add a user-provided GenomeSet to a KBase SpeciesTree.
This app completed without errors in 13m 11s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/163445
  • Mindicus_m74_genomeset_Tree.newick
  • Mindicus_m74_genomeset_Tree-labels.newick
  • Mindicus_m74_genomeset_Tree.png
  • Mindicus_m74_genomeset_Tree.pdf
Allows users to compute fast whole-genome Average Nucleotide Identity (ANI) estimation.
This app completed without errors in 1m 22s.
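This export does not include the ANI values themselves, so the sketch below only illustrates how FastANI results are commonly interpreted: an ANI of roughly 95% or higher is widely used as a working species boundary. The threshold constant, the function name `classify_ani`, the reference names, and the numbers are all assumptions for illustration; they are placeholders, not results from this Narrative.

```python
# Commonly used working species boundary, in percent (an assumption here).
SPECIES_ANI_THRESHOLD = 95.0


def classify_ani(ani_by_reference, threshold=SPECIES_ANI_THRESHOLD):
    """Map each reference name to True if its ANI meets the threshold."""
    return {ref: ani >= threshold for ref, ani in ani_by_reference.items()}


# Placeholder values only; replace with the ANI estimates from the live Narrative.
example_ani = {"reference_A": 98.2, "reference_B": 94.1}
calls = classify_ani(example_ani)

for ref, same_species in calls.items():
    print(ref, "same species" if same_species else "below threshold")
```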
Add one or more Genomes to a KBase SpeciesTree.
This app completed without errors in 4m 21s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/163445
  • Mindicus_genomeTree.newick
  • Mindicus_genomeTree-labels.newick
  • Mindicus_genomeTree.png
  • Mindicus_genomeTree.pdf

Apps

  1. Add Genomes to GenomeSet - v1.7.6
    • Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  2. Annotate Genome/Assembly with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT - The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656-664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955-964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793-803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-6654. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176-1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673-679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  3. Assemble Reads with SPAdes - v3.15.3
    • Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology. 2012;19: 455-477. doi: 10.1089/cmb.2012.0021
    • Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics. 2020 Jun;70(1):e102. doi: 10.1002/cpbi.102.
  4. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043-1055. doi:10.1101/gr.186072.114
    • CheckM source:
    • Additional info:
  5. Compute ANI with FastANI
    • [1] Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High-throughput ANI Analysis of 90K Prokaryotic Genomes Reveals Clear Species Boundaries. 2017; doi:10.1101/225342
    • [2] Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM. DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol. 2007;57: 81-91. doi:10.1099/ijs.0.64483-0
    • FastANI module and source code:
  6. Insert Genome Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5. doi:10.1371/journal.pone.0009490
  7. Insert Set of Genomes Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5. doi:10.1371/journal.pone.0009490