Ensifer's cool genome

Introduction

For our BIT 295 project, we are using Ensifer bacteria collected by Dr. Jason Whitham and Dr. Amy Grunden at NC State University from an area with pokeweed and acid mine drainage. These bacteria were analysed using Nanopore sequencing, and the resulting data are used throughout this work. The information gathered by Dr. Goller and Dr. Sjogren can be found here: https://docs.google.com/document/d/1xqKYLeBJnFfC5c0DeZtS8vdNo6Oc7u-NgI4FvGJkog4/edit#heading=h.uq3zl9n74m8v

Background and Experimental Methods
Import and annotation
QC, Assembly, and Annotation
Taxonomic Classification
Metabolic Modeling and Flux Balance Analysis
References

Narrative created by: [Lauren Turrentine and Ankitha Lavu]

Note to Authors

The publication may not be available at the time of the static Narrative creation. This can be added after the fact; please contact [email protected] to update the DOI landing page when this is done.

Background and Experimental Methods

Sample Collection

The sample was collected from an area with acid mine drainage and pokeweed. It was found with 14 other isolates.

Isolation

Each sample was isolated on Tryptic Soy Broth dishes.

Genome Sequencing

The genes were then sequenced using Nanopore sequencing and CLC Genomics Workbench.

Import

Note to authors

Reads or assemblies should be imported through the staging area. See the upload and download guide on the KBase documentation site for detailed instructions.

# This cell only runs inside a KBase Narrative, where the biokbase package is available.
from biokbase.narrative.jobs.appmanager import AppManager

# Import the staged FASTA file as an Assembly object.
AppManager().run_app_bulk(
    [{
        "app_id": "kb_uploadmethods/import_fasta_as_assembly_from_staging",
        "tag": "release",
        "version": "1dbd08a56befada8f204b4d1db5a872796cd45a5",
        "params": [{
            "staging_file_subdir_path": "Barcode05.fasta",  # file in the staging area
            "assembly_name": "Barcode05.fasta_assembly",    # name of the created object
            "type": "draft isolate",
            "min_contig_length": 10000                      # contig length cutoff, in bases
        }]
    }],
    cell_id="61e87ea3-aab8-4209-86fa-5356051cf780",
    run_id="d9da3558-3ae8-402c-970b-dca0dcfcebc2"
)
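
The min_contig_length parameter in the import cell sets a length cutoff for contigs. The sketch below illustrates one plausible reading of such a cutoff in plain Python, outside KBase. It assumes that contigs shorter than the threshold are dropped and that the threshold itself is inclusive; the helper names (read_fasta, filter_contigs) and the toy sequences are hypothetical and are not part of the KBase importer.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_contigs(handle, min_contig_length):
    """Keep contigs at least min_contig_length bases long (inclusive cutoff is an assumption)."""
    return [(h, s) for h, s in read_fasta(handle) if len(s) >= min_contig_length]

# Toy input for illustration only, not data from this Narrative.
toy_fasta = StringIO(">contig_1\nACGTACGTAC\nGGCC\n>contig_2\nACG\n")
kept = filter_contigs(toy_fasta, min_contig_length=10)
print([(header, len(seq)) for header, seq in kept])
```

With a real file, the StringIO object would be replaced by an open file handle, for example one opened on a local copy of Barcode05.fasta.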

Annotate and Distill Assemblies with DRAM

Annotate your assembly with DRAM. Annotations will then be distilled to create an interactive functional summary per assembly.

This app completed without errors in 51m 38s.

Input Objects

Assembly/assembly set to be annotated

Barcode05.fasta_assembly

Parameters

Is metagenome?

Description

Ensifer genome?

Minimum contig length

2500

Translation table

Bit score threshold

Reverse search bit score threshold

350

Output Objects

Output GenomeSet Name

Ensifer

Objects

Created Object Name	Type	Description
Barcode05.fasta_assembly_DRAM	Genome	Annotated Genome
Ensifer	GenomeSet	Ensifer genome?

Report

Summary

Here are the results from your DRAM run.

Links

product.html - DRAM product.

Files

These are only available in the live Narrative: https://narrative.kbase.us/narrative/113838

annotations.tsv - DRAM annotations in a tab separate table format
genes.fna - Genes as nucleotides predicted by DRAM with brief annotations
genes.faa - Genes as amino acids predicted by DRAM with brief annotations
genes.gff - GFF file of all DRAM annotations
rrnas.tsv - Tab separated table of rRNAs as detected by barrnap
trnas.tsv - Tab separated table of tRNAs as detected by tRNAscan-SE
genbank.tar.gz - Compressed folder of output genbank files
product.tsv - DRAM product in tabular format
metabolism_summary.xlsx - DRAM metabolism summary tables
genome_stats.tsv - DRAM genome statistics table
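
Once downloaded from the live Narrative, the files above can be inspected with a few lines of Python. As a minimal sketch, the function below counts feature types in a GFF file such as genes.gff. It relies only on the general GFF convention of nine tab-separated columns with the feature type in the third column; the toy records are invented for illustration and are not taken from the DRAM output.

```python
from collections import Counter
from io import StringIO

def count_gff_feature_types(handle):
    """Count how often each feature type (GFF column 3) occurs."""
    counts = Counter()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section, if present, ends the annotations
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed feature line
        counts[fields[2]] += 1
    return counts

# Toy GFF content for illustration only.
toy_gff = StringIO(
    "##gff-version 3\n"
    "contig_1\ttoy\tCDS\t1\t300\t.\t+\t0\tID=gene_1\n"
    "contig_1\ttoy\tCDS\t400\t900\t.\t-\t0\tID=gene_2\n"
    "contig_1\ttoy\ttRNA\t1000\t1075\t.\t+\t.\tID=trna_1\n"
)
feature_counts = count_gff_feature_types(toy_gff)
print(dict(feature_counts))
```

For the real file, the StringIO object would be replaced by open("genes.gff"), assuming the file has been saved to the working directory.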

Assess Read Quality with FastQC - v0.11.9

A quality control application for high throughput sequence data.

This app is new, and hasn't been started.

No output found.

Barcode05.fasta_assembly

v1 - KBaseGenomeAnnotations.Assembly-5.0

The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/113838

QC, Assembly, and Annotation

Author Checklist

Annotate Genome/Assembly with RASTtk - v1.073

Annotate or re-annotate genome/assembly using RASTtk (Rapid Annotations using Subsystems Technology toolkit).

This app completed without errors in 26m 39s.

Objects

Created Object Name	Type	Description
RAST-annotation	Genome	RAST re-annotated genome

Report

Summary

The RAST algorithm was applied to annotating a genome sequence comprised of 4 contigs containing 7215895 nucleotides. No initial gene calls were provided. Standard gene features were called using: prodigal; glimmer3. A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr. The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity. In addition to the remaining original 0 coding features and 0 non-coding features, 8537 new features were called, of which 129 are non-coding.

Output genome has the following feature types:

Feature Type	Count
Coding gene	8408
Non-coding prophage	5
Non-coding repeat	50
Non-coding rna	74

Overall, the genes have 3519 distinct functions. The genes include 3046 genes with a SEED annotation ontology across 1528 distinct SEED functions. The number of distinct functions can exceed the number of genes because some genes have multiple functions.
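
The counts in the RAST summary can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the summary; the derived ratios (coding density and the share of genes with a SEED annotation) are our own illustrative calculations and are not values reported by the app.

```python
# Feature counts copied from the RAST summary above.
rast_features = {
    "Coding gene": 8408,
    "Non-coding prophage": 5,
    "Non-coding repeat": 50,
    "Non-coding rna": 74,
}
genome_length_bp = 7215895   # total nucleotides across the 4 contigs
seed_annotated_genes = 3046  # genes with a SEED annotation ontology

total_features = sum(rast_features.values())
non_coding = sum(n for name, n in rast_features.items() if name.startswith("Non-coding"))

# Derived values (illustrative, not part of the app report).
genes_per_kb = rast_features["Coding gene"] / (genome_length_bp / 1000)
seed_fraction = seed_annotated_genes / rast_features["Coding gene"]

print(f"Total features: {total_features}")
print(f"Non-coding features: {non_coding}")
print(f"Coding genes per kb: {genes_per_kb:.2f}")
print(f"Genes with SEED annotation: {seed_fraction:.1%}")
```

The two totals reproduce the 8537 new features and 129 non-coding features stated in the summary, which confirms that the feature-type breakdown is internally consistent.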

Links

genome_report.html - Feature function report for genome RAST-annotation

Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5

Annotate Assembly and Re-annotate Genomes with Prokka annotation pipeline.

This app completed without errors in 7m 16s.

Input Objects

Assembly or Genome

Barcode05.fasta_assembly

Parameters

Scientific name

Kingdom

Bacteria

Genus

Genetic code

Raw product

Fast

Min.contig size

E-value

Rfam

No rRNA

No tRNA

Output Objects

Output genome

Prokka-ann

Objects

Created Object Name	Type	Description
Prokka-ann	Genome	Annotated Genome

Summary

Annotated Genome saved to: cgoller:narrative_1649170660833/Prokka-ann

- Number of genes predicted: 7978
- Number of protein coding genes: 7898
- Number of genes with non-hypothetical function: 3898
- Number of genes with EC-number: 1730
- Number of genes with Seed Subsystem Ontology: 0
- Average protein length: 255 aa.
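
The Prokka summary is easier to compare with other annotations once its counts are turned into proportions. The following sketch pulls the "Number of ..." values out of the summary text with a regular expression and derives a few ratios. The summary string is copied from the report above; the derived quantities are illustrative calculations, not output of the Prokka app.

```python
import re

# Summary text copied from the Prokka report above.
summary = (
    "Number of genes predicted: 7978 "
    "Number of protein coding genes: 7898 "
    "Number of genes with non-hypothetical function: 3898 "
    "Number of genes with EC-number: 1730 "
    "Number of genes with Seed Subsystem Ontology: 0 "
    "Average protein length: 255 aa."
)

# Each statistic has the form "Number of <label>: <integer>".
stats = {label: int(value) for label, value in re.findall(r"Number of ([^:]+): (\d+)", summary)}

protein_coding = stats["protein coding genes"]
non_hypothetical = stats["genes with non-hypothetical function"]

# Derived values (illustrative, not part of the app report).
hypothetical = protein_coding - non_hypothetical
non_hypothetical_fraction = non_hypothetical / protein_coding
ec_fraction = stats["genes with EC-number"] / protein_coding

print(f"Hypothetical proteins: {hypothetical}")
print(f"Non-hypothetical fraction: {non_hypothetical_fraction:.1%}")
print(f"Fraction with EC number: {ec_fraction:.1%}")
```

By this calculation roughly half of the protein-coding genes called by Prokka carry a non-hypothetical function.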

Output from Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5

The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/113838

Taxonomic Identification

Note to Authors

One common way to assign taxonomy is to use the GTDB-Tk classify app. We recommend adding a phylogenetic tree using the Insert Genome into SpeciesTree app.

Classify Microbes with GTDB-Tk - v1.7.0

Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) ver R06-RS202

This app completed without errors in 32m 43s.

Objects

Created Object Name	Type	Description
Barcode05.fasta_assembly_DRAM	Genome	Taxonomy and taxon_assignment updated with GTDB
Ensifer	GenomeSet	Taxonomy and taxon_assignment updated with GTDB

Report

Links

index.html - HTML report for GTDBTk Classify
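
GTDB taxonomy strings list the ranks from domain to species, separated by semicolons, with each name prefixed by a one-letter rank code (d__, p__, c__, o__, f__, g__, s__). As a minimal sketch, the function below splits such a string into named ranks. The example string is made up for illustration and is not the classification returned by this run; the actual assignment is in the HTML report.

```python
GTDB_RANKS = {
    "d": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}

def parse_gtdb_classification(classification):
    """Split a GTDB lineage string into a {rank: name} dictionary.

    Ranks with an empty name (for example a bare "s__") map to None.
    """
    ranks = {}
    for part in classification.split(";"):
        prefix, _, name = part.strip().partition("__")
        if prefix in GTDB_RANKS:
            ranks[GTDB_RANKS[prefix]] = name or None
    return ranks

# Hypothetical lineage string for illustration only, not a result from this Narrative.
example = (
    "d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;"
    "o__Rhizobiales;f__Rhizobiaceae;g__Ensifer;s__"
)
lineage = parse_gtdb_classification(example)
print(lineage["genus"], lineage["species"])
```

An empty species field, as in the example, would indicate that a genome was placed in a genus without being assigned to a named species.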

Insert Genome Into SpeciesTree - v2.2.0

Add one or more Genomes to a KBase SpeciesTree.

This app completed without errors in 6m 52s.

Report

Links

EnsiferTree.html

Files

These are only available in the live Narrative: https://narrative.kbase.us/narrative/113838

EnsiferTree.newick
EnsiferTree-labels.newick
EnsiferTree.png
EnsiferTree.pdf
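
The Newick files can be read outside KBase to see which genomes were placed in the tree. The sketch below extracts leaf names with a regular expression, using the fact that in Newick a leaf label directly follows "(" or ",", while internal-node labels follow ")". It assumes unquoted labels without parentheses, commas, colons, or semicolons; the toy tree and its names are hypothetical.

```python
import re

def newick_leaf_names(newick):
    """Return leaf labels from a Newick string with unquoted labels."""
    # A leaf label starts right after "(" or "," and runs up to ":", ",", ")" or ";".
    matches = re.findall(r"[(,]\s*([^():,;\s][^():,;]*)", newick)
    return [name.strip() for name in matches]

# Toy tree for illustration only; 0.95 is an internal-node support value.
toy_tree = "((Isolate_A:0.1,Isolate_B:0.2)0.95:0.3,Isolate_C:0.4);"
leaves = newick_leaf_names(toy_tree)
print(leaves)
```

For trees with quoted or otherwise complex labels, a dedicated phylogenetics library would be a safer choice than this regular expression.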

Upload File to Staging from Web - v1.0.12

Upload a data file (which may be compressed) from a web URL to your staging area.

This app completed without errors in 2m 27s.

Summary

Uploaded Files: 1 /barcode05.fastq
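
Because the FastQC app was not run in this Narrative, basic read statistics for the uploaded FASTQ file could be computed by hand. The sketch below reports read count, total bases, mean read length, and read-length N50, a statistic often quoted for Nanopore data. It assumes strict four-line FASTQ records (no wrapped sequences); the function names and the toy reads are hypothetical and do not come from barcode05.fastq.

```python
from io import StringIO

def fastq_read_lengths(handle):
    """Return the length of every read, assuming strict four-line FASTQ records."""
    lengths = []
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:  # the second line of each record is the sequence
            lengths.append(len(line.strip()))
    return lengths

def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    return 0  # no reads

# Toy reads for illustration only.
toy_fastq = StringIO(
    "@read_1\nACGTACGT\n+\nIIIIIIII\n"
    "@read_2\nACGT\n+\nIIII\n"
    "@read_3\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
)
lengths = fastq_read_lengths(toy_fastq)
mean_length = sum(lengths) / len(lengths)
print(f"Reads: {len(lengths)}, bases: {sum(lengths)}, mean: {mean_length:.1f}, N50: {n50(lengths)}")
```

These summary numbers are not a substitute for a full FastQC report, which also covers per-base quality and other checks.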

References

Note to authors

The KBase static Narrative service automatically generates a list of citations for apps used. If you provide any citations for literature or outside tools within the markdown cells, those should be included here.

Apps

Annotate and Distill Assemblies with DRAM
- DRAM source code
- DRAM documentation
- DRAM publication
Annotate Assembly and Re-annotate Genomes with Prokka - v1.14.5
- Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30: 2068-2069. doi:10.1093/bioinformatics/btu153
Annotate Genome/Assembly with RASTtk - v1.073
- [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
- [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
- [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
- [4] Kent WJ. BLAT - The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656-664. doi:10.1101/gr.229202
- [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
- [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955-964.
- [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793-803. doi:10.1007/s00792-012-0482-8
- [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-6654. doi:10.1093/nar/gkp698
- [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176-1179.
- [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
- [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
- [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673-679. doi:10.1093/bioinformatics/btm009
- [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
Assess Read Quality with FastQC - v0.11.9
- FastQC source: Bioinformatics Group at the Babraham Institute, UK.
Classify Microbes with GTDB-Tk - v1.7.0
- Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
- Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
- Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
- Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
- Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
- Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
- Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
- Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
- Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
Insert Genome Into SpeciesTree - v2.2.0
- Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5. doi:10.1371/journal.pone.0009490
Upload File to Staging from Web - v1.0.12
- Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
