Generated June 17, 2025

Fermented Foods Microbial Genomes Database

This database contains ~4,300 microbial genomes assembled from diverse fermented foods. We obtained these genomes from a larger set of 13,850 microbial genomes by clustering them at 99% average nucleotide identity (ANI) to create a "species"-representative database. The full database, along with instructions for subsetting it in different ways, is available on Zenodo.
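This README does not say which tool or procedure performed the 99% ANI clustering. As a rough illustration only, the sketch below assumes a precomputed table of pairwise ANI values (in percent). It greedily assigns each genome to the first representative it matches at or above the threshold. Dedicated dereplication tools such as dRep use more elaborate, quality-aware procedures. The genome names and ANI values in the example are made up.

```python
from typing import Dict, List, Tuple


def greedy_ani_clusters(
    genomes: List[str],
    ani: Dict[Tuple[str, str], float],
    threshold: float = 99.0,
) -> Dict[str, List[str]]:
    """Greedily cluster genomes: each joins the first representative within threshold ANI.

    `genomes` should be sorted by preference (e.g. best quality first), because
    a genome that matches no existing representative becomes a new representative.
    """
    clusters: Dict[str, List[str]] = {}
    for genome in genomes:
        for rep in clusters:
            # ANI is treated as symmetric; look up either orientation, default 0.
            score = ani.get((genome, rep), ani.get((rep, genome), 0.0))
            if score >= threshold:
                clusters[rep].append(genome)
                break
        else:
            # No representative is close enough: this genome starts a new cluster.
            clusters[genome] = [genome]
    return clusters


if __name__ == "__main__":
    genomes = ["g1", "g2", "g3", "g4"]  # hypothetical genomes, best quality first
    ani = {("g1", "g2"): 99.4, ("g1", "g3"): 95.0, ("g3", "g4"): 99.8}
    print(greedy_ani_clusters(genomes, ani))  # {'g1': ['g1', 'g2'], 'g3': ['g3', 'g4']}
```

Because the greedy pass depends on input order, sorting genomes by quality first determines which genome represents each cluster.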

This database draws largely on existing genome resources, which we curated specifically for fermented foods. If you use this database, please cite the following genome databases and resources:

  1. MiFoDB, a workflow for microbial food metagenomic characterization, enables high-resolution analysis of fermented food microbial dynamics. Elisa B. Caffrey, Matthew R. Olm, Caroline Isabel Kothe, Joshua Evans, Justin L. Sonnenburg. bioRxiv 2024.03.29.587370; doi: https://doi.org/10.1101/2024.03.29.587370
  2. Unexplored microbial diversity from 2,500 food metagenomes and links with the human microbiome. Carlino, Niccolo; Alvarez-Ordonez, Avelino; et al. Cell, Volume 187, Issue 20, 5775-5795.e15. AND the associated Zenodo release: Master Consortium. (2024). Unexplored microbial diversity from 2,500 food metagenomes and links with the human microbiome. Zenodo. https://doi.org/10.5281/zenodo.13285428

We are incredibly grateful to these groups, and to countless others, for taking the time to make their data publicly available. The metadata includes the original DOI and study link for each genome, as well as whether the genome was collated into one of the two larger databases above. If you use or analyze a specific subset of genomes, please also cite the original studies to credit those who generated the data and made it publicly available.
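As a sketch of gathering the studies to cite for a subset of genomes, the snippet below assumes the metadata is a tab-separated file with one row per genome. The column names genome_id, doi, and source_database, and the file name in the usage note, are hypothetical placeholders; replace them with the actual headers and file from the Zenodo release.

```python
import csv
from typing import Dict, Iterable, List, Set

# Hypothetical column names; replace with the real headers in the Zenodo metadata.
GENOME_COL = "genome_id"
DOI_COL = "doi"
SOURCE_DB_COL = "source_database"


def load_metadata(path: str) -> List[Dict[str, str]]:
    """Read a tab-separated metadata file into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def citations_for_subset(
    rows: Iterable[Dict[str, str]], subset: Set[str]
) -> Dict[str, List[str]]:
    """Collect the unique study DOIs and parent databases for the chosen genomes."""
    dois: Set[str] = set()
    databases: Set[str] = set()
    for row in rows:
        if row.get(GENOME_COL) not in subset:
            continue
        if row.get(DOI_COL):
            dois.add(row[DOI_COL].strip())
        # Assumed to be empty when the genome was not collated into a larger database.
        if row.get(SOURCE_DB_COL):
            databases.add(row[SOURCE_DB_COL].strip())
    return {"dois": sorted(dois), "databases": sorted(databases)}
```

For example, `citations_for_subset(load_metadata("metadata.tsv"), {"genome_A", "genome_B"})` would list the DOIs and parent databases behind two hypothetical genomes.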

Analyzing the Genomes in this Narrative

The main working unit on KBase is the narrative. A narrative is where you can create shareable, reproducible workflows containing tools chained together, analyses and results, and/or datasets. The quickest way to start running apps on the genomes in this database is to copy this narrative into your workspace. After running apps on this database in a separate narrative, you may wish to download the results; this guide walks you through how to download them to your local computer.

Incorporating Additional Genomes for Comparative Analyses with this Database

You may want to compare the genomes in this database with additional genomes that aren't part of it, whether they are publicly available or genomes you have generated yourself that aren't yet public. This guide walks through how to upload FASTA assembly files. We also documented how we uploaded the set of ~4,300 assemblies in batches, in case you want to upload a large number of genomes for comparison. Follow our documentation guide from "Creating a Bulk Import Template" through "Uploading Genomes to KBase".
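Before filling in a bulk import template, it can help to split a large local folder of assemblies into batches. The sketch below only does that splitting; it does not create the KBase template itself, which should come from KBase as described in the guide. The batch size of 500 and the folder name in the usage note are hypothetical, since the number of files per batch we used is not stated here.

```python
from pathlib import Path
from typing import List

# Hypothetical batch size: the number of files per upload batch is not stated here.
BATCH_SIZE = 500
# Uncompressed FASTA extensions matched below (compressed files such as .fa.gz are skipped).
FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def fasta_names(folder: str) -> List[str]:
    """Return the sorted names of FASTA assembly files in a local folder."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in FASTA_SUFFIXES
    )


def batch_names(names: List[str], batch_size: int = BATCH_SIZE) -> List[List[str]]:
    """Split file names into consecutive batches of at most batch_size names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [names[i:i + batch_size] for i in range(0, len(names), batch_size)]
```

For example, `for i, batch in enumerate(batch_names(fasta_names("assemblies")), start=1): print(i, len(batch))` would report the size of each batch for a hypothetical `assemblies` folder; with 500 files per batch, ~4,300 assemblies would need 9 batches.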

Downloading Specific Metagenomic Samples

This narrative contains only the FASTA files for the genome assemblies, not the raw FASTQ files for the metagenomic samples. Our complete, curated metadata is available on Zenodo. We documented SRA/ENA accessions for samples where we could confidently find that information. Some genomes have a range of SRA/ENA accessions listed because they were co-assembled from multiple metagenomic samples, and we couldn't confidently say which single SRA/ENA accession the genome originated from or is representative of.
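If you want to work with the individual runs behind a co-assembled genome, a listed range may need to be expanded into separate accessions. The sketch below assumes a range is written as two accessions with the same prefix joined by a hyphen (for example SRR0000098-SRR0000101). It also assumes that every accession number in between belongs to the range. Both are assumptions about the metadata format and should be checked against the Zenodo files.

```python
import re
from typing import List

# Assumed format: one accession ("SRR1234567") or an inclusive range of
# accessions sharing a prefix ("SRR1234567-SRR1234570"). Verify against the
# actual metadata on Zenodo before relying on this.
ACCESSION = re.compile(r"^([A-Z]+)(\d+)$")


def expand_accessions(field: str) -> List[str]:
    """Expand a single accession or an inclusive accession range into a list."""
    parts = [part.strip() for part in field.split("-")]
    if len(parts) == 1:
        return parts
    if len(parts) != 2:
        raise ValueError(f"unexpected accession field: {field!r}")
    start, end = ACCESSION.match(parts[0]), ACCESSION.match(parts[1])
    if start is None or end is None or start.group(1) != end.group(1):
        raise ValueError(f"cannot expand range: {field!r}")
    prefix = start.group(1)
    width = len(start.group(2))  # keep zero padding, e.g. SRR0000098
    first, last = int(start.group(2)), int(end.group(2))
    if first > last:
        raise ValueError(f"range runs backwards: {field!r}")
    # Assumes every accession number in the range belongs to the co-assembly.
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


if __name__ == "__main__":
    print(expand_accessions("ERR100-ERR102"))  # ['ERR100', 'ERR101', 'ERR102']
```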

Once you have identified which SRA/ENA accessions in the metadata you want to incorporate into the narrative for analysis alongside the genome database, you can use KBase apps to download those FASTQ files by following this guide.