Generated March 12, 2024
# Welcome to the Narrative
from IPython.display import IFrame
IFrame("https://www.kbase.us/narrative-welcome-cell/", width="100%", height="300px")

Narratives for The phenotype and genotype of fermentative prokaryotes

This is the Narrative for Porphyromonadaceae sp. W3.11. A complementary Narrative for Lachnospiraceae sp. C1.1 is available here.

This is the Narrative for Lachnospiraceae sp. C1.1. A complementary Narrative for Porphyromonadaceae sp. W3.11 is available here.

Background and Isolation

This Narrative and its complementary Narrative contain the assembly and annotation of two bacterial strains that our laboratory isolated from the rumen of a Holstein heifer. All procedures with animals have been approved by University of California Davis’s Institutional Animal Care and Use Committee. Rumen contents were collected through a rumen fistula and strained through two layers of cheesecloth into a bottle. The bottle was sealed to exclude air and maintained at 39°C. Contents were brought to the laboratory and bubbled under O2-free CO2 within 15 min. At the laboratory, serial dilutions were made with anaerobic dilution solution for Lachnospiraceae sp. C1.1 and propionibacterium diluent for Porphyromonadaceae sp. W3.11 (table S2). Aliquots (0.1 ml) of each dilution were injected into anaerobic bottle plates (1) containing 9 ml of LH medium (table S2). After incubation at 37°C for 7 days, isolated colonies were picked. Lachnospiraceae sp. C1.1 was picked from a bottle inoculated with a 10^4 dilution of rumen contents, and Porphyromonadaceae sp. W3.11 was picked from a bottle inoculated with a 10^3 dilution. After initial isolation, these organisms were purified by growing on anaerobic roll tubes (2) and picking isolated colonies.

We performed de novo sequencing of Lachnospiraceae sp. C1.1 and Porphyromonadaceae sp. W3.11. Aliquots of liquid culture (9 and 1.5 ml, respectively) were collected by syringe and centrifuged (21,000g for 10 min at 4°C). Cell pellets were submitted to Molecular Research LP for DNA extraction, library preparation, and sequencing. After resuspending pellets in 180 µl of ATL buffer (Qiagen), DNA was extracted using the MagAttract HMW DNA Kit (Qiagen). DNA was eluted in 100 µl of AE buffer (Qiagen) and then cleaned using the DNEasy PowerClean Pro Cleanup Kit (Qiagen). DNA was then sheared using the Covaris g-TUBE (Covaris). Sequencing libraries were prepared using the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences) and 1500 ng of the sheared and purified DNA. The SMRTbell libraries were size-selected (>6 Kb) using a BluePippin instrument (Sage Science) and 0.75% agarose gel. Libraries were then sequenced using the PacBio Sequel II (Pacific Biosciences) platform and a 30-hour movie time.

Narrative Summary

In these Narratives, we filtered low-quality reads using Trimmomatic (v0.36), assembled filtered reads with SPAdes (v3.15.3), and then checked completeness and contamination of the assembled genomes with CheckM (v1.0.18). Statistics for sequencing and assembly are in table S3.

Using the assembled contigs (genomes), we called genes and annotated them. Protein-coding genes were called using Prodigal (v2.6.3) (3) locally or using KBase via RASTtk (v1.073), with identical results. Genes were annotated with KO IDs using KAAS (4). They were further annotated with Pfam and TIGRFAM IDs using the Annotate Domains in a Genome app in KBase. We classified putative genes for hydrogenases using HydDB. Genes for 16S ribosomal RNA (rRNA) were called using RASTtk (v1.073) in KBase.

The contigs (genomes) were analyzed to determine whether they belonged to new species. Taxonomy was assigned using GTDB-Tk (v1.7.0) in KBase. The identity of 16S rRNA genes to other organisms was found using EzBioCloud (5). Values of digital DNA-DNA hybridization (dDDH) were found with Type (Strain) Genome Server (6). These analyses suggest that Lachnospiraceae sp. C1.1 and Porphyromonadaceae sp. W3.11 represent novel species or genera. GTDB-Tk assigned Lachnospiraceae sp. C1.1 to family Lachnospiraceae and genus NK4A144, which contains no type strains. It assigned Porphyromonadaceae sp. W3.11 to family Porphyromonadaceae and genus Porphyromonas_A. Values of 16S rRNA identity and dDDH with respect to type strains were low (table S4). Although more phenotypic data are needed, available evidence supports assignment of genomes to new species or genera.
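
The reasoning in the paragraph above (low dDDH and low 16S rRNA identity to every type strain point to a new species) can be sketched as a simple threshold check. The cutoffs below are commonly used conventions (about 70% dDDH and about 98.7% 16S rRNA identity for species) and are an assumption here; this Narrative does not state which cutoffs were applied. The function name and the input values are hypothetical and are not taken from table S4.

```python
# Commonly cited species-level cutoffs (assumptions, not stated in this Narrative).
DDDH_SPECIES_CUTOFF = 70.0   # percent digital DNA-DNA hybridization
SSU_SPECIES_CUTOFF = 98.7    # percent 16S rRNA gene identity


def below_species_cutoffs(dddh_percent: float, ssu_identity_percent: float) -> bool:
    """Return True when both values fall below the species-level cutoffs.

    A True result is consistent with a novel species; it is not proof,
    because phenotypic data are also needed.
    """
    return (dddh_percent < DDDH_SPECIES_CUTOFF
            and ssu_identity_percent < SSU_SPECIES_CUTOFF)


# Hypothetical comparisons against a closest type strain (illustrative values only).
print(below_species_cutoffs(25.0, 92.0))   # both values low
print(below_species_cutoffs(85.0, 99.5))   # both values high
```

Requiring both values to be low is a conservative choice for this sketch; other workflows weigh the two measures differently.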

Related publication

Hackmann TJ, Zhang B. The phenotype and genotype of fermentative prokaryotes. Sci Adv. 2023 Sep 29;9(39):eadg8687. doi: 10.1126/sciadv.adg8687. Epub 2023 Sep 27. PMID: 37756392; PMCID: PMC10530074.

Supplementary tables referenced above can be downloaded here.

from biokbase.narrative.jobs.appmanager import AppManager
AppManager().run_app_batch(
    [{
        "app_id": "kb_uploadmethods/import_fastq_noninterleaved_as_reads_from_staging",
        "tag": "release",
        "version": "5b9346463df88a422ff5d4f4cba421679f63c73f",
        "params": [{
            "fastq_fwd_staging_file_name": "C1.1.fastq",
            "fastq_rev_staging_file_name": None,
            "name": "C1.1"
        }],
        "shared_params": {
            "sequencing_tech": "PacBio CCS",
            "single_genome": 1,
            "read_orientation_outward": 0,
            "insert_size_std_dev": None,
            "insert_size_mean": None
        }
    }],
    cell_id="725c72bf-6bf6-4bbe-8ef3-2253181d27bd",
    run_id="d22d8b9f-1537-4536-b7bc-8572a289dc0d"
)
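
The batch specification passed to `run_app_batch` above is an ordinary list of dictionaries, so it can be assembled and inspected in plain Python before any job is submitted. The sketch below assumes only the fields visible in the cell above; the helper `build_import_spec` is hypothetical and does not call KBase.

```python
from typing import Any, Dict, List


def build_import_spec(read_files: Dict[str, str]) -> List[Dict[str, Any]]:
    """Build a one-element batch spec shaped like the cell above.

    read_files maps an output object name to a staging-area FASTQ file name.
    Hypothetical helper: it only builds data and submits nothing.
    """
    params = [
        {
            "fastq_fwd_staging_file_name": file_name,
            "fastq_rev_staging_file_name": None,  # single-end reads: no reverse file
            "name": object_name,
        }
        for object_name, file_name in read_files.items()
    ]
    return [{
        "app_id": "kb_uploadmethods/import_fastq_noninterleaved_as_reads_from_staging",
        "tag": "release",
        "version": "5b9346463df88a422ff5d4f4cba421679f63c73f",
        "params": params,
        # Settings shared by every file in the batch, copied from the cell above.
        "shared_params": {
            "sequencing_tech": "PacBio CCS",
            "single_genome": 1,
            "read_orientation_outward": 0,
            "insert_size_std_dev": None,
            "insert_size_mean": None,
        },
    }]


spec = build_import_spec({"C1.1": "C1.1.fastq"})
print(spec[0]["params"])
```

Keeping per-file values in `params` and common values in `shared_params` mirrors the structure of the original cell, so adding another reads file would only add one entry to `params`.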
A quality control application for high throughput sequence data.
This app completed without errors in 5m 27s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/171033
  • C1.1_151339_3_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 7m 52s.
Objects
Created Object Name | Type | Description
C1.1_trimmed | SingleEndLibrary | Trimmed Reads
A quality control application for high throughput sequence data.
This app completed without errors in 4m 60s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/171033
  • C1.1_trimmed_151339_6_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
Assemble reads using the SPAdes assembler.
This app completed without errors in 3h 8m 40s.
Objects
Created Object Name | Type | Description
SPAdes.Assembly_trimmed | Assembly | Assembled contigs
Summary
Assembly saved to: thackmann:narrative_1688137047868/SPAdes.Assembly_trimmed
Assembled into 3 contigs.
Avg Length: 1409791.0 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
	2 -- 43853.0 to 448720.5 bp
	0 -- 448720.5 to 853588.0 bp
	0 -- 853588.0 to 1258455.5 bp
	0 -- 1258455.5 to 1663323.0 bp
	0 -- 1663323.0 to 2068190.5 bp
	0 -- 2068190.5 to 2473058.0 bp
	0 -- 2473058.0 to 2877925.5 bp
	0 -- 2877925.5 to 3282793.0 bp
	0 -- 3282793.0 to 3687660.5 bp
	1 -- 3687660.5 to 4092528.0 bp
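
The SPAdes summary reports only the contig count, the average length, and a length histogram, but a little arithmetic recovers more. The sketch below assumes that the lowest and highest histogram edges (43853 and 4092528 bp) are the lengths of the shortest and longest contigs; that is an inference from the report layout, not something the report states.

```python
# Values copied from the SPAdes summary above.
n_contigs = 3
avg_length_bp = 1409791.0
shortest_bp = 43853    # lowest histogram edge, assumed to be the shortest contig
longest_bp = 4092528   # highest histogram edge, assumed to be the longest contig

# Total assembly size follows from count x average.
total_bp = round(n_contigs * avg_length_bp)

# With three contigs, the remaining one is whatever is left over.
middle_bp = total_bp - shortest_bp - longest_bp

print(total_bp)    # 4229373
print(middle_bp)   # 92992
print(round(100 * longest_bp / total_bp, 1))  # share of the assembly in the longest contig, in percent
```

The total of 4229373 bp matches the nucleotide count that RASTtk reports for the same assembly later in this Narrative, and the inferred 92992 bp contig falls in the first histogram bin, which is consistent with that bin holding two contigs.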
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 5m 24s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/171033
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Annotate a bacterial or archaeal assembly using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 2m 6s.
Objects
Created Object Name | Type | Description
C1.11_genes | Genome | RAST annotation
Summary
The RAST algorithm was applied to annotating a genome sequence comprised of 3 contigs containing 4229373 nucleotides. 
No initial gene calls were provided.
Standard gene features were called using: prodigal.
In addition to the remaining original 0 coding features and 0 non-coding features, 3749 new features were called, of which 0 are non-coding.
Output genome has the following feature types:
	Coding gene                     3749 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Output from Annotate Microbial Assembly with RASTtk - v1.073
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/171033
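
The counts in the RASTtk summary above can be pulled out of the report text and turned into a gene density. This is a minimal sketch that assumes the report wording shown above; the regular expressions are written for that wording only and are not part of any KBase API.

```python
import re

# Two sentences copied from the RASTtk summary above.
summary = (
    "The RAST algorithm was applied to annotating a genome sequence comprised of "
    "3 contigs containing 4229373 nucleotides. "
    "In addition to the remaining original 0 coding features and 0 non-coding "
    "features, 3749 new features were called, of which 0 are non-coding."
)

size = re.search(r"(\d+) contigs containing (\d+) nucleotides", summary)
feats = re.search(r"(\d+) new features were called, of which (\d+) are non-coding", summary)

n_contigs, n_nt = int(size.group(1)), int(size.group(2))
n_new, n_noncoding = int(feats.group(1)), int(feats.group(2))
n_coding = n_new - n_noncoding  # coding genes = all new features minus non-coding ones

genes_per_kb = n_coding / (n_nt / 1000)
print(n_contigs, n_nt, n_coding, round(genes_per_kb, 3))  # 3 4229373 3749 0.886
```

A density near one gene per kilobase is typical of prokaryotic genomes, so the Prodigal gene calls look plausible for a genome of this size.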
Annotate a Genome object with protein domains from widely used domain libraries.
This app completed without errors in 1h 34m 5s.
Objects
Created Object Name | Type | Description
C1.1_annotations | DomainAnnotation | Domain Annotations
Summary
Search Domains output:
Getting DomainModelSet from storage.
Getting Genome from storage.
Running domain search against library 2959/19/1
Running domain search against library 2959/18/1
Running domain search against library 2959/24/1
Running domain search against library 2959/25/1
Running domain search against library 2959/23/1
Running domain search against library 2959/7/7
Running domain search against library 2959/20/1
Running domain search against library 2959/17/1
Running domain search against library 2959/21/1
Running domain search against library 2959/22/1
Output from Annotate Domains in a Genome
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/171033
Annotate a bacterial or archaeal assembly using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 1m 8s.
Objects
Created Object Name | Type | Description
C1.1_rRNA | Genome | RAST annotation
Summary
The RAST algorithm was applied to annotating a genome sequence comprised of 3 contigs containing 4229373 nucleotides. 
No initial gene calls were provided.
A scan was conducted for the following additional feature types: rRNA.
In addition to the remaining original 0 coding features and 0 non-coding features, 12 new features were called, of which 12 are non-coding.
Output genome has the following feature types:
	Non-coding rna                    12 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Output from Annotate Microbial Assembly with RASTtk - v1.073
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/171033
Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) ver R06-RS202
This app completed without errors in 1h 39m 53s.
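
The run times reported by the apps above can be added up to estimate the total compute time for this Narrative. The sketch below parses the duration strings exactly as they appear in the reports; the helper names are hypothetical. It also normalizes the "4m 60s" reported for the second FastQC run, which is simply 5m 0s.

```python
import re


def to_seconds(text: str) -> int:
    """Convert a duration such as '3h 8m 40s' to seconds."""
    parts = {unit: int(value) for value, unit in re.findall(r"(\d+)\s*([hms])", text)}
    return parts.get("h", 0) * 3600 + parts.get("m", 0) * 60 + parts.get("s", 0)


def to_text(seconds: int) -> str:
    """Format seconds back into the 'Xh Ym Zs' style used in the reports."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s" if hours else f"{minutes}m {secs}s"


# Run times copied from the app reports above, in order of appearance.
runtimes = [
    "5m 27s", "7m 52s", "4m 60s", "3h 8m 40s", "5m 24s",
    "2m 6s", "1h 34m 5s", "1m 8s", "1h 39m 53s",
]

total = sum(to_seconds(t) for t in runtimes)
print(to_text(to_seconds("4m 60s")))  # 5m 0s
print(to_text(total))                 # 6h 49m 35s
```

This total is the sum of individual app run times, not elapsed wall-clock time, since queueing delays and gaps between runs are not recorded in the reports.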

External References

  1. K. D. Olson, Modified bottle plate for the cultivation of strict anaerobes. J. Microbiol. Methods 14, 267–269 (1992). doi: 10.1016/0167-7012(92)90059-D

  2. R. Hungate, Chapter IV A roll tube method for cultivation of strict anaerobes. Methods Microbiol. 3, 117–132 (1969). doi: 10.1016/S0580-9517(08)70503-8

  3. D. Hyatt, G. L. Chen, P. F. LoCascio, M. L. Land, F. W. Larimer, L. J. Hauser, Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010). doi: 10.1186/1471-2105-11-119

  4. Y. Moriya, M. Itoh, S. Okuda, A. C. Yoshizawa, M. Kanehisa, KAAS: An automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35, W182–W185 (2007). doi: 10.1093/nar/gkm321

  5. S. H. Yoon, S. M. Ha, S. Kwon, J. Lim, Y. Kim, H. Seo, J. Chun, Introducing EzBioCloud: A taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. Int. J. Syst. Evol. Microbiol. 67, 1613–1617 (2017). doi: 10.1099/ijsem.0.001755

  6. J. P. Meier-Kolthoff, J. S. Carbasse, R. L. Peinado-Olarte, M. Göker, TYGS and LPSN: A database tandem for fast and reliable genome-based classification and nomenclature of prokaryotes. Nucleic Acids Res. 50, D801–D807 (2022). doi: 10.1093/nar/gkab902

Apps

  1. Annotate Domains in a Genome - v1.0.10
    • Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389–3402. doi:10.1093/nar/25.17.3389
    • Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10: 421. doi:10.1186/1471-2105-10-421
    • Eddy SR. Accelerated Profile HMM Searches. PLOS Computational Biology. 2011;7: e1002195. doi:10.1371/journal.pcbi.1002195
    • El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A, Sonnhammer ELL, Hirsh L, Paladin L, Piovesan D, Tosatto SCE, Finn RD. The Pfam protein families database in 2019. Nucleic Acids Research. 2019;47: D427–D432. doi:10.1093/nar/gky995
    • Haft DH, Selengut JD, Richter RA, Harkins D, Basu MK, Beck E. TIGRFAMs and Genome Properties in 2013. Nucleic Acids Res. 2013;41: D387–D395. doi:10.1093/nar/gks1234
    • Letunic I, Bork P. 20 years of the SMART protein domain annotation resource. Nucleic Acids Res. 2018;46: D493–D496. doi:10.1093/nar/gkx922
    • Letunic I, Doerks T, Bork P. SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 2015;43: D257–260. doi:10.1093/nar/gku949
    • Marchler-Bauer A, Bo Y, Han L, He J, Lanczycki CJ, Lu S, et al. CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 2017;45: D200–D203. doi:10.1093/nar/gkw1129
    • Selengut JD, Haft DH, Davidsen T, Ganapathy A, Gwinn-Giglio M, Nelson WC, et al. TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes. Nucleic Acids Res. 2007;35: D260–264. doi:10.1093/nar/gkl1043
    • Tatusov RL, Koonin EV, Lipman DJ. A Genomic Perspective on Protein Families. Science. 1997;278: 631–637. doi:10.1126/science.278.5338.631
  2. Annotate Microbial Assembly with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206–D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT: The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656–664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389–3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955–964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793–803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643–54. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176–1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673–679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  3. Assemble Reads with SPAdes - v3.15.3
    • Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology. 2012;19: 455-477. doi: 10.1089/cmb.2012.0021
    • Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics. 2020 Jun;70(1):e102. doi: 10.1002/cpbi.102.
  4. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043–1055. doi:10.1101/gr.186072.114
  5. Assess Read Quality with FastQC - v0.12.1
    • FastQC source: Bioinformatics Group at the Babraham Institute, UK.
  6. Classify Microbes with GTDB-Tk - v2.3.2
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics, Volume 38, Issue 23, 1 December 2022, Pages 5315–5316. DOI: https://doi.org/10.1093/bioinformatics/btac672
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925–1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Donovan H Parks, Maria Chuvochina, Christian Rinke, Aaron J Mussig, Pierre-Alain Chaumeil, Philip Hugenholtz. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D785–D794. DOI: https://doi.org/10.1093/nar/gkab776
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996–1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
    • Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
    • Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016 Jun 20;17(1):132. DOI: 10.1186/s13059-016-0997-x
  7. Trim Reads with Trimmomatic - v0.36
    • Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30: 2114–2120. doi:10.1093/bioinformatics/btu170