Search for matches to a Multiple Sequence Alignment (MSA) using Hidden Markov Model Search (hmmsearch).
This App performs a Hidden Markov Model (HMM) search (hmmsearch) on protein sequences using HMMER models built from an input Multiple Sequence Alignment (MSA).
This App turns a protein multiple sequence alignment (MSA) into a Hidden Markov Model (HMM), which is then used to search a protein sequence database. The MSA must first be generated from a FeatureSet object using an alignment tool such as MUSCLE. The KBase implementation permits searching through the genes in a Genome object, the genes in the Genome members of a GenomeSet, or the genes in a FeatureSet. The output object of these searches is a FeatureSet containing the genes that pass the thresholds given by the user. The App also provides a table of hits (with hits that fall below the thresholds shown in gray) and links to download the table of hits and a Stockholm format MSA.
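Outside KBase, the same two-step workflow (build a profile from the MSA, then search a protein FASTA file) can be approximated with the HMMER command-line tools. The sketch below only assembles the hmmbuild and hmmsearch argument lists and does not run them. The file names, the function name, and the default E-value are hypothetical placeholders, and this is not necessarily how the App invokes HMMER internally.

```python
import shlex
from typing import List


def build_hmmer_commands(
    msa_path: str,
    targets_fasta: str,
    max_evalue: float = 0.001,      # hypothetical default, not the App's
    hmm_path: str = "query.hmm",    # hypothetical output file names
    tblout_path: str = "hits.tbl",
    aln_path: str = "hits.sto",
) -> List[List[str]]:
    """Return the hmmbuild and hmmsearch argument lists (nothing is executed)."""
    # hmmbuild usage: hmmbuild [options] <hmmfile_out> <msafile>
    build = ["hmmbuild", "--amino", hmm_path, msa_path]
    # hmmsearch usage: hmmsearch [options] <hmmfile> <seqdb>
    search = [
        "hmmsearch",
        "-E", str(max_evalue),    # report sequences with E-value <= this bound
        "--tblout", tblout_path,  # per-sequence hit table
        "-A", aln_path,           # Stockholm alignment of hits passing inclusion thresholds
        hmm_path,
        targets_fasta,
    ]
    return [build, search]


# Print the two commands as they would be typed in a shell.
for cmd in build_hmmer_commands("query_msa.sto", "targets.faa"):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where HMMER is installed.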
Tool and Data Sources:
HMMER v3.3.2 is installed from http://hmmer.org
Configuration:
Query MSA: The MSA to use as the query. It must be generated using an alignment App such as MUSCLE. An HMM will be automatically generated from the MSA by hmmbuild.
Targets Object: The Targets object may be a FeatureSet of genes, a Genome, or a GenomeSet. An HMMER search database will be automatically generated from the Targets object.
Output Object: This is the set of genes that hit within the user-defined thresholds.
E-value: This sets the upper bound on the e-value for the weakest hit to consider viable. Hits with e-values above (weaker than) this bound are not reported in the table or the HMMER output text downloads.
Bitscore: This bounds the bit score for the weakest hit to include in the FeatureSet output object. Hits below this threshold are still reported in the table and HMMER text downloads.
Max Accepts: Limit on the number of hits to report (default is 1000).
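The interplay of the three limits above can be illustrated with a small sketch. It assumes that the E-value bound decides which hits are reported at all, that Max Accepts caps the reported hits after sorting by e-value (the ordering is an assumption), and that the bit score threshold then decides which reported hits enter the FeatureSet. The `Hit` class, the function name, and the example values are hypothetical, and the App's actual implementation may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    gene_id: str
    evalue: float
    bitscore: float


def apply_thresholds(
    hits: List[Hit],
    max_evalue: float,
    min_bitscore: float,
    max_accepts: int = 1000,
) -> Tuple[List[Hit], List[Hit]]:
    """Split hits into (included in FeatureSet, shown in gray in the table)."""
    # Hits weaker than the E-value bound are not reported anywhere.
    reported = [h for h in hits if h.evalue <= max_evalue]
    # Assumption: strongest hits first, then cap at Max Accepts.
    reported.sort(key=lambda h: h.evalue)
    reported = reported[:max_accepts]
    # Reported hits below the bit score threshold stay in the table only.
    included = [h for h in reported if h.bitscore >= min_bitscore]
    grayed = [h for h in reported if h.bitscore < min_bitscore]
    return included, grayed


# Made-up example values, for illustration only.
example_hits = [
    Hit("geneA", 1e-50, 180.0),
    Hit("geneB", 1e-6, 35.0),
    Hit("geneC", 5.0, 12.0),
]
included, grayed = apply_thresholds(example_hits, max_evalue=1e-3, min_bitscore=50.0)
print([h.gene_id for h in included])  # hits that would enter the FeatureSet
print([h.gene_id for h in grayed])    # hits that would appear in gray
```

With these made-up values, geneA passes both limits, geneB is reported but falls below the bit score threshold, and geneC is dropped by the E-value bound.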
Output:
Output Object: Gene hits are captured in a FeatureSet output object. Hits that fail any additional user-defined thresholds are filtered out and do not appear in the object, even if they are shown in the output table.
Output HTML Table: The tab-delimited hit table is HTML-formatted and shows the region of the hit sequence (as there is no query sequence) covered by the HMMER alignment. Hits that pass the e-value threshold but fail other thresholds, and so are not included in the FeatureSet output object, are shown in gray, with the attributes that fell below the threshold in red.
Downloadable files: The HMMER table and the Stockholm format MSA are available for download. Both are the unaltered output of the HMMER run.
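Because the downloaded table is direct HMMER output, it can be post-processed with a few lines of code. The sketch below assumes the table uses HMMER's per-sequence `--tblout` layout (whitespace-separated columns, comment lines starting with `#`, free-text description last); if the download is in a different HMMER format, the column indices will need adjusting. The sample line is made up for illustration, and the function name is hypothetical.

```python
from typing import Dict, Iterator, Union

# Made-up sample in the per-sequence --tblout column order:
# target, accession, query, accession, E-value, score, bias,
# best-domain E-value, score, bias, exp, reg, clu, ov, env, dom, rep, inc, description
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias ...
geneA - query_msa - 1.2e-50 180.3 0.1 1.5e-50 180.0 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein
"""


def parse_tblout(text: str) -> Iterator[Dict[str, Union[str, float]]]:
    """Yield one record per hit line, skipping comments and blank lines."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # The first 18 columns are whitespace-separated; the rest is the description.
        fields = line.split(None, 18)
        yield {
            "target": fields[0],
            "query": fields[2],
            "evalue": float(fields[4]),    # full-sequence E-value
            "bitscore": float(fields[5]),  # full-sequence bit score
            "description": fields[18] if len(fields) > 18 else "",
        }


for row in parse_tblout(SAMPLE_TBLOUT):
    print(row["target"], row["evalue"], row["bitscore"])
```

Limiting the split to 18 keeps multi-word descriptions intact in the final field.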
Team members who implemented App in KBase: Dylan Chivian. For questions, please contact us.
Please cite:
- Eddy SR. Accelerated Profile HMM Searches. PLOS Computational Biology. 2011;7: e1002195. doi:10.1371/journal.pcbi.1002195
- Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x
Related Publications
- Eddy SR. Accelerated Profile HMM Searches. PLOS Computational Biology. 2011;7: e1002195. doi:10.1371/journal.pcbi.1002195. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002195
- HMMER v3.3.2 source: http://HMMER.ORG
- Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x. https://www.nature.com/articles/s41596-022-00747-x
App Specification:
https://github.com/kbaseapps/kb_hmmer/tree/6c338791492e2980534a08a1606d2d7137884759/ui/narrative/methods/HMMER_MSA_Search
Module Commit: 6c338791492e2980534a08a1606d2d7137884759