Run Fama Genome Profiling - v1.1

Generate a functional profile of genomes with Fama

SUMMARY

This app is based on Fama, a computational tool for functional profiling of microbiomes and taxonomic profiling of functional genes. Although Fama was initially created for functional profiling of microbial communities, it can also be used to profile functional genes of interest in annotated genome assemblies.

Fama Protein Profiling runs a similarity search for all predicted proteins in a genome using the fast aligner DIAMOND and customized databases of reference proteins. After the similarity search, all hits found by DIAMOND are filtered by amino acid identity (AAI, %) using family-specific thresholds. Top hits that pass the filter are counted for functional and taxonomic assignment.
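
The filtering step described above can be illustrated with a minimal Python sketch. It is not Fama's actual code. It assumes DIAMOND was run with the default 12-column tabular output (query ID, subject ID, percent identity, ..., bit score), that the "top hit" for a protein is the passing hit with the highest bit score, and that the family names, thresholds, and reference IDs shown are hypothetical placeholders.

```python
# Hypothetical family-specific identity thresholds (percent); not Fama's real values.
THRESHOLDS = {"NirK": 50.0, "NarG": 60.0}
DEFAULT_THRESHOLD = 50.0

# Hypothetical mapping from reference protein ID to functional family.
REF_FAMILY = {"ref1": "NirK", "ref2": "NarG", "ref3": "NarG"}


def top_hits(tabular_text):
    """Return {query: (family, identity, bitscore)} for the best passing hit.

    tabular_text is DIAMOND tabular output with the default 12 columns:
    column 1 = query ID, column 2 = subject ID, column 3 = percent identity,
    column 12 = bit score.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, bitscore = float(fields[2]), float(fields[11])
        family = REF_FAMILY.get(subject)
        if family is None:
            continue  # subject is not part of any known family
        if identity < THRESHOLDS.get(family, DEFAULT_THRESHOLD):
            continue  # hit fails the family-specific identity filter
        # Assumption: keep the passing hit with the highest bit score.
        if query not in best or bitscore > best[query][2]:
            best[query] = (family, identity, bitscore)
    return best
```

Each protein that survives the filter ends up with one family assignment, which is the unit counted in the functional and taxonomic profiles.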

INPUT

Protein Profiling requires a genome or metagenome assembly with annotated coding genes as input. Only predicted protein sequences are analyzed by the app. Multiple genomes can be processed in a single run.

REFERENCE DATA

Datasets of reference proteins were prepared by searching the SEED database for functional roles of interest, followed by additional consistency checks. These checks include identification and removal of incomplete proteins and redundant sequences. As a result, reference datasets include proteins from SEED genomes, with the exception of the RP-L6 dataset, which contains proteins from metagenome-assembled genomes. A complete list of functional families can be found here.

Reference data v.1.4 includes three reference datasets:

Nitrogen cycle enzymes dataset for functional and taxonomic profiling of nitrate/nitrite/ammonia metabolic genes.
30 families of universal single-copy marker proteins from complete bacterial and archaeal genomes for taxonomic profiling and checking for contamination.
Ribosomal protein L6 sequences from genomes of cultivated bacteria and metagenome-assembled genomes for fast taxonomic profiling.

OUTPUT

The output of the Fama Genome Profiling app includes a report in HTML format, an interactive profile plot for each genome, a DomainAnnotation object for each genome, a FeatureSet object, and a link to a zip archive with Excel spreadsheets and interactive plots.

The HTML report contains a "Run Info" tab with a summary of results, a "Protein list" tab with a list of genes (proteins), and three tabs for each genome: "Functional profile", "Functional groups" and "Taxonomy profile". The "Functional profile" tab displays the protein count and average amino acid identity % for each function. The "Functional groups" tab displays protein counts and average amino acid identity % for functions combined into more general functional groups. The "Taxonomy profile" tab displays protein counts for each function and each taxon.
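
As an illustration of what the "Functional profile" tab reports, the following Python sketch computes the protein count and average amino acid identity % per function. The input format (a list of gene ID, function, identity tuples) and the function name are assumptions made for this example, not the app's internal data model.

```python
from collections import defaultdict


def functional_profile(assignments):
    """Summarize per-function protein count and mean identity.

    assignments: iterable of (gene_id, function, identity_percent) tuples,
    one tuple per protein that received a functional assignment.
    Returns {function: (protein_count, mean_identity_percent)}.
    """
    totals = defaultdict(lambda: [0, 0.0])  # function -> [count, identity sum]
    for _gene_id, function, identity in assignments:
        totals[function][0] += 1
        totals[function][1] += identity
    return {
        function: (count, identity_sum / count)
        for function, (count, identity_sum) in totals.items()
    }
```

The same aggregation, keyed by a functional group or by a (function, taxon) pair instead of a single function, would give tables analogous to the "Functional groups" and "Taxonomy profile" tabs.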

Interactive Krona plots are generated for each genome. A Krona file contains taxonomic profiles displayed as hierarchical circular plots, one plot for each function. The number of genes for each taxon is represented by the angle of the sector, and amino acid identity % is represented by color. Normally, all mapped genes are assigned to the same taxon, but contaminated and chimeric genome assemblies can display conflicting taxonomic assignments of genes.
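
The idea of spotting conflicting taxonomic assignments can also be checked programmatically. The sketch below is a hypothetical helper, not part of the app: it assumes a simple mapping of gene ID to assigned taxon (for example, taken from the exported gene list), finds the dominant taxon, and lists the genes assigned elsewhere.

```python
from collections import Counter


def taxon_summary(gene_taxa):
    """Find the dominant taxon and the genes that disagree with it.

    gene_taxa: dict mapping gene ID -> assigned taxon name.
    Returns (dominant_taxon, dominant_fraction, minority_genes), where
    minority_genes maps each disagreeing gene ID to its taxon.
    """
    if not gene_taxa:
        raise ValueError("no taxonomic assignments to summarize")
    counts = Counter(gene_taxa.values())
    dominant, dominant_count = counts.most_common(1)[0]
    minority = {g: t for g, t in gene_taxa.items() if t != dominant}
    return dominant, dominant_count / len(gene_taxa), minority
```

A non-empty set of minority genes is only a hint that the assembly may be contaminated or chimeric; the Krona plot for the affected function remains the place to inspect the conflict in detail.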

The DomainAnnotation objects are generated for each genome and contain a list of genes with predicted functions. The FeatureSet object contains a list of predicted functional genes from all genomes. These objects can be used for comparative genomics analyses (for example, GO term enrichment).

The output zip archive contains Excel spreadsheets with a combined functional profile for all genomes, combined function/taxonomy profiles for all genomes, and a detailed function/taxonomy profile for each genome (reporting protein count and average AAI % for each taxon). In addition, the archive contains a text file with a list of genes with predicted functions and interactive Krona plots for all samples.

Additional resources

Team members who implemented this App in KBase: Alexey Kazakov. For questions, please contact us.

Related Publications

Kazakov A, Novichkov P. Fama: a computational tool for comparative analysis of shotgun metagenomic data. Great Lakes Bioinformatics conference (poster presentation). 2019. https://iseq.lbl.gov/mydocs/fama_glbio2019_poster.pdf
Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nature Methods. 2015;12: 59-60. doi: 10.1038/nmeth.3176. Publication about third-party program used by Fama. https://pubmed.ncbi.nlm.nih.gov/25402007/
Nayfach S, Pollard KS. Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome. Genome Biology. 2015;16: 51. doi: 10.1186/s13059-015-0611-7. Publication about third-party program used by Fama. https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/25853934/
Ondov B, Bergman NH, et al. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011;12: 385. doi: 10.1186/1471-2105-12-385. Publication about third-party program used by Fama. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3190407/

App Specification:

https://github.com/aekazakov/FamaProfiling/tree/d9db15ea217e3be2aab65c356564a6d345b4f410/ui/narrative/methods/run_FamaGenomeProfiling

Module Commit: d9db15ea217e3be2aab65c356564a6d345b4f410