Search for matches to Bacterial and Archaeal Phylogenetic Marker families using HMMER 3
This method scans protein sequences found in Genomes and Annotated Metagenome Assemblies (AMAs) using a set of Hidden Markov Models (HMMs) from Pfam and TIGRFAMs that correspond to universal single-copy essential genes. These genes resist horizontal transfer and are therefore useful as phylogenetic markers. The search is run with the HMMER software.
The Search with HMMs of Phylogenetic Marker families App profiles collections of genes, genomes, and/or annotated metagenome assemblies for phylogenetic marker genes and optionally outputs a FeatureSet for each of the requested gene families. It uses gene-family-derived HMMs from Pfam and TIGRFAMs that the GTDB resource identified as universal and single-copy. The user can search with the entire collection, with only the models from a given category, or with individually specified gene families. In this last mode, FeatureSet objects are produced that can be used in additional KBase phylogenomic Apps, such as Build Gene Tree. Hits by each gene family to genes in the target set are also shown in the report.
Tool and Data Sources:
HMMER v3.3.2 is installed from http://hmmer.org
Phylogenetic Marker HMMs are defined from GTDB R05-RS95, with models from TIGRFAMs v15.0 and Pfam 33.1. The v1 set of markers used is available at https://github.com/kbaseapps/kb_hmmer/blob/master/data/PhyloMarkers/PhyloMarkers-v1/tables/all_marker_info_r95-adjusted.tsv.
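For orientation, a search of this kind run outside KBase amounts to one hmmsearch call per marker model against a protein FASTA file. The sketch below only illustrates that shape and is not the App's actual implementation: the file names, the default E-value, and the helper names build_hmmsearch_cmd and run_hmmsearch are assumptions made for the example. The options used (--tblout, -E, --cpu) are standard hmmsearch options.

```python
import subprocess
from pathlib import Path


def build_hmmsearch_cmd(hmm_path, fasta_path, tblout_path, evalue=1e-5, cpus=1):
    """Return the argument list for one hmmsearch run of a single marker model.

    All paths and the default E-value are illustrative assumptions.
    """
    return [
        "hmmsearch",
        "--tblout", str(tblout_path),  # write the per-sequence tabular hit table
        "-E", str(evalue),             # report sequences at or below this E-value
        "--cpu", str(cpus),            # number of worker threads
        str(hmm_path),                 # query: the profile HMM
        str(fasta_path),               # target: protein sequences to search
    ]


def run_hmmsearch(hmm_path, fasta_path, tblout_path, evalue=1e-5, cpus=1):
    """Run hmmsearch (requires HMMER 3 on PATH) and return the hit table path."""
    cmd = build_hmmsearch_cmd(hmm_path, fasta_path, tblout_path, evalue, cpus)
    # Discard the human-readable alignment report; the tabular file is kept.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return Path(tblout_path)
```

A call such as run_hmmsearch("marker.hmm", "proteins.faa", "marker.tblout") would then leave the hit table in marker.tblout, assuming those hypothetical input files exist.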
Configuration:
Targets Objects: The Targets Objects may be a FeatureSet of genes, a Genome, a GenomeSet, a SpeciesTree, or an Annotated Metagenome Assembly (AMA). A HMMER search database will be automatically generated from the Targets Object.
Output FeatureSet basename: This is the basename for the objects that will contain the set of genes that are both hit and pass confidence thresholds for each model.
Other Parameters: See "Parameters" section below.
Output:
Output Object: Gene hits are captured in a FeatureSet output object. If additional user-defined thresholds are set, hits that fail them are filtered out and do not appear in the object, even though they are still shown in the output table. The Output object name is used as a basename to which the HMM name is prepended.
Output HTML Profile: A raw count or heatmap of the number of genes hit from each gene family (column) for each genome or annotated metagenome assembly (row). Each cell in the profile offers a roll-over of the number of hits and the gene IDs of those hits. If the input target is a SpeciesTree, the rows are ordered by their order in a ladderized view of that tree (available from the "View Tree" App).
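The profile described above can be thought of as a genome-by-family table whose cells hold the list of hit gene IDs, with the count being the length of that list. The following is a minimal sketch of that data structure only. The function names, the tuple layout of the input, and the example IDs are assumptions for illustration and are not taken from the App's code.

```python
from collections import defaultdict


def build_profile(hits):
    """Group hit gene IDs by genome (row) and gene family (column).

    hits: iterable of (genome_id, family_id, gene_id) tuples (assumed layout).
    """
    profile = defaultdict(lambda: defaultdict(list))
    for genome_id, family_id, gene_id in hits:
        profile[genome_id][family_id].append(gene_id)
    return profile


def count_matrix(profile, genomes, families):
    """Return raw hit counts as rows (genomes) by columns (families).

    The caller supplies the row order, for example the leaf order of a
    ladderized species tree.
    """
    return [
        [len(profile.get(g, {}).get(f, [])) for f in families]
        for g in genomes
    ]


# Illustrative IDs only.
example_hits = [
    ("GenomeA", "PF00380", "GenomeA_0012"),
    ("GenomeA", "PF00380", "GenomeA_0877"),
    ("GenomeB", "TIGR00001", "GenomeB_0101"),
]
example_profile = build_profile(example_hits)
example_counts = count_matrix(
    example_profile, ["GenomeA", "GenomeB"], ["PF00380", "TIGR00001"]
)
# example_counts is [[2, 0], [0, 1]]
```

Keeping the gene IDs in each cell, rather than only the count, is what makes a roll-over listing of the hits possible.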
Output HTML Hit Table: The tab-delimited hit table is HTML formatted and additionally shows the region of the hit sequence (as there is no query sequence) covered by the HMMER alignment. Hits that pass the e-value threshold but fail other thresholds, and are therefore not included in the FeatureSet output object, are shown in gray, with the attributes that fell below threshold in red. A separate table is made for each HMM.
Downloadable files: The HMMER output hit tables are available for download. They are unaltered from the direct output of the HMMER run. A text output file is generated for each HMM.
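The gray and red display rule can be read as a per-hit classification. The sketch below is one way to express it. The threshold names (min_bitscore, min_coverage), the hit dictionary keys, and the status labels are all hypothetical and chosen for the example; the App's actual parameters are listed in its "Parameters" section.

```python
def classify_hit(hit, max_evalue, min_bitscore=0.0, min_coverage=0.0):
    """Classify one hit and list the attributes that failed a threshold.

    hit: dict with "evalue", "bitscore" and "coverage" keys (assumed layout).
    Returns (status, failed_attributes), where status is one of:
      "not_reported": fails the e-value threshold, so it is not in the table
      "gray":         passes e-value but fails at least one other threshold
      "included":     passes everything and would go into the FeatureSet
    """
    if hit["evalue"] > max_evalue:
        return "not_reported", []

    failed = []
    if hit["bitscore"] < min_bitscore:
        failed.append("bitscore")
    if hit["coverage"] < min_coverage:
        failed.append("coverage")

    # Any failed attribute would be the one highlighted in red.
    return ("gray" if failed else "included"), failed


# Illustrative values only.
weak_hit = {"evalue": 1e-8, "bitscore": 22.0, "coverage": 0.9}
status, failed = classify_hit(weak_hit, max_evalue=1e-5, min_bitscore=50.0)
# status is "gray" and failed is ["bitscore"]
```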
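A downloaded hit table can be read with a few lines of code. The sketch below assumes the file is in the per-sequence tabular format that hmmsearch writes with --tblout (whitespace-delimited, comment lines starting with "#", free-text description in the last column). If the downloaded file uses a different HMMER output format, the column positions would need adjusting. The function name and the sample line are illustrative.

```python
def parse_tblout(text):
    """Parse HMMER per-sequence tabular output into a list of dicts.

    Only a few of the columns are kept: the target (hit gene), the query
    (HMM) name and accession, and the full-sequence E-value and bit score.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        # 18 fixed columns, then a description that may contain spaces.
        fields = line.split(None, 18)
        hits.append({
            "target": fields[0],
            "query": fields[2],
            "query_accession": fields[3],
            "evalue": float(fields[4]),
            "bitscore": float(fields[5]),
        })
    return hits


# Made-up example line in the assumed format.
sample = (
    "# target name  accession  query name  accession  E-value  score  bias ...\n"
    "gene_0001 - Ribosomal_S9 PF00380.20 1.2e-40 135.7 0.1 "
    "1.5e-40 135.4 0.1 1.0 1 0 0 1 1 1 1 30S ribosomal protein S9\n"
)
sample_hits = parse_tblout(sample)
```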
Team members who implemented App in KBase: Dylan Chivian. For questions, please contact us.
Please cite:
- Eddy SR. Accelerated Profile HMM Searches. PLOS Computational Biology. 2011;7: e1002195. doi:10.1371/journal.pcbi.1002195
- Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x
Related Publications
- Eddy SR. Accelerated Profile HMM Searches. PLOS Computational Biology. 2011;7: e1002195. doi:10.1371/journal.pcbi.1002195. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002195
- HMMER v3.3.2 source: http://HMMER.ORG
- Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x. https://www.nature.com/articles/s41596-022-00747-x
App Specification:
https://github.com/kbaseapps/kb_hmmer/tree/6c338791492e2980534a08a1606d2d7137884759/ui/narrative/methods/HMMER_PhyloMarkers_Search
Module Commit: 6c338791492e2980534a08a1606d2d7137884759