Search for matches to dbCAN HMMs of CAZy carbohydrate active enzyme families using HMMER 3
This method performs a set of Hidden Markov Model (HMM) searches on protein sequences using HMMER models built for a group of MSAs.
HMMER Search & Functional Profile with Custom Models performs multiple HMMER searches. It reads the objects in the Narrative to find the Multiple Sequence Alignments (MSAs) and builds a Hidden Markov Model (HMM) from each MSA; the HMMs are then used to search a protein sequence database. The MSAs must first be generated with a tool such as MUSCLE, which builds an MSA from a FeatureSet object. The KBase implementation can search the genes in a Genome object, the genes in the Genome members of a GenomeSet, or the genes in a FeatureSet. The output object of these searches is a FeatureSet containing the genes that pass the user-supplied thresholds. The App also provides a table of the hits (hits below the thresholds are shown in gray) and links to download the table of hits and a Stockholm format MSA. A separate table is provided for each MSA/HMM, and the user may choose whether to combine the hits into a single FeatureSet or to produce a separate FeatureSet object for each MSA/HMM.
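The build-then-search flow described above can be sketched as a plan of HMMER command lines, one hmmbuild and one hmmsearch per MSA. This is a minimal illustration, not the App's actual code: the function names, file names, and output layout are hypothetical, and the default e-value is only a placeholder. The only HMMER features assumed are the standard `hmmbuild <hmmfile_out> <msafile>` argument order and the `hmmsearch` options `--tblout` and `-E`.

```python
from pathlib import Path


def hmmbuild_cmd(msa_path, hmm_path):
    """Command line that builds one profile HMM from one MSA file."""
    # hmmbuild takes the output HMM file first, then the input MSA.
    return ["hmmbuild", str(hmm_path), str(msa_path)]


def hmmsearch_cmd(hmm_path, targets_fasta, tblout_path, evalue=1e-5):
    """Command line that searches a protein FASTA file with one HMM."""
    return [
        "hmmsearch",
        "--tblout", str(tblout_path),  # per-sequence hit table
        "-E", str(evalue),             # report hits with e-value <= this
        str(hmm_path),
        str(targets_fasta),
    ]


def plan_searches(msa_paths, targets_fasta, out_dir, evalue=1e-5):
    """Return the ordered commands: build and search once per MSA."""
    out_dir = Path(out_dir)
    plan = []
    for msa in map(Path, msa_paths):
        hmm = out_dir / (msa.stem + ".hmm")  # hypothetical naming scheme
        tbl = out_dir / (msa.stem + ".tbl")
        plan.append(hmmbuild_cmd(msa, hmm))
        plan.append(hmmsearch_cmd(hmm, targets_fasta, tbl, evalue))
    return plan
```

Each entry in the returned plan is an argument list that could be handed to `subprocess.run` on a machine where HMMER 3 is installed; keeping one hit table per MSA mirrors the App's one-table-per-MSA/HMM output.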
Targets Object: The Targets Object may be a FeatureSet of genes, a Genome, or a GenomeSet. A HMMER search database will be automatically generated from the Targets Object.
Output Object: This is the set of genes that are both hit and pass user-defined thresholds.
Coalesce Output: Select whether to combine all hits into a single FeatureSet (which may be used in subsequent Functional Profiling) or make a separate FeatureSet of hits for each MSA/HMM.
E-value: The maximum e-value a hit may have to be considered viable. Hits with e-values above this threshold (weaker hits) are not reported in the table or in the HMMER output text downloads.
Bitscore: The minimum bitscore a hit must have to be included in the FeatureSet output object. Hits below this threshold are still reported in the table and HMMER text downloads.
Max Accepts: Hard cap on how many hits to report (Default: 1000)
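The interaction of the E-value, Bitscore, and Max Accepts parameters can be sketched as follows. This is an illustrative reading of the descriptions above, not the App's implementation: the `Hit` class and `classify_hits` function are hypothetical, and the order in which the Max Accepts cap is applied (strongest e-values first) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene_id: str
    evalue: float
    bitscore: float


def classify_hits(hits, max_evalue, min_bitscore, max_accepts=1000):
    """Split hits into (in_featureset, gray) according to the thresholds.

    Hits weaker than max_evalue are dropped entirely. The remaining hits
    are capped at max_accepts (assumption: strongest e-values are kept).
    Reported hits below min_bitscore stay in the table (gray) but are
    left out of the FeatureSet.
    """
    reported = sorted(
        (h for h in hits if h.evalue <= max_evalue),
        key=lambda h: h.evalue,
    )[:max_accepts]
    in_featureset = [h for h in reported if h.bitscore >= min_bitscore]
    gray = [h for h in reported if h.bitscore < min_bitscore]
    return in_featureset, gray
```

For example, with a maximum e-value of 1e-5 and a minimum bitscore of 50, a hit with e-value 1e-6 and bitscore 20 would appear in the table in gray but not in the FeatureSet, while a hit with e-value 0.5 would not be reported at all.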
Output Object: Gene hits are captured in a FeatureSet output object. Hits that fail any additional user-defined threshold are filtered out and do not appear in the object, even if they are shown in the output table. The Output Object name is used as a base name, to which the MSA/HMM name is prepended if the user has chosen to produce separate FeatureSets.
Output HTML Table: The tab-delimited hit table is HTML formatted and additionally shows the region of the hit sequence (as there is no query sequence) covered by the HMMER alignment. Hits that pass the e-value threshold but fail other thresholds, and are therefore not included in the FeatureSet output object, are shown in gray, with the attributes that failed a threshold in red. A separate table is made for each MSA/HMM.
Downloadable files: HMMER table and Stockholm format MSA are available for download. These are not altered from the direct output from the HMMER run. The text output is generated for each MSA/HMM.
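Because the downloaded table is unaltered HMMER output, it can be post-processed with a few lines of code. The sketch below assumes the download uses the per-sequence layout written by the `hmmsearch --tblout` option (comment lines start with `#`; whitespace-separated columns begin with target name, target accession, query name, query accession, full-sequence E-value, and full-sequence score). If the download uses a different layout, the column indices would need adjusting. The function name `parse_tblout` is hypothetical.

```python
def parse_tblout(text):
    """Parse HMMER per-sequence tabular output into a list of dicts."""
    hits = []
    for line in text.splitlines():
        # Skip blank lines and the '#' header/footer comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # The first 18 columns are whitespace-separated; the last column
        # is a free-text description that may itself contain spaces.
        fields = line.split(None, 18)
        hits.append({
            "target": fields[0],
            "query": fields[2],
            "evalue": float(fields[4]),    # full-sequence E-value
            "bitscore": float(fields[5]),  # full-sequence bit score
        })
    return hits
```

The parsed records can then be filtered or sorted independently of the App, for example to apply a stricter bitscore cutoff than the one used in the original run.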
- Eddy SR. Accelerated Profile HMM Searches. PLOS Computational Biology. 2011;7: e1002195. doi:10.1371/journal.pcbi.1002195 , https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002195
- Huang L, Zhang H, Wu P, Entwistle S, Li X, Yohe T, Yi H, Yang Z, Yin Y. dbCAN-seq: a database of carbohydrate-active enzyme (CAZyme) sequence and annotation. Nucleic Acids Research. 2018;46: D516-D521. doi:10.1093/nar/gkx894 , https://academic.oup.com/nar/article/46/D1/D516/4372485
- HMMER v3.3 source: http://HMMER.ORG
Module Commit: 3527ec7cd6839ff217d8c96d2cc0b9bff4d62b08