Search for matches to a Multiple Sequence Alignment (MSA)
This method performs a protein-protein psiBLAST search with NCBI's BLAST+ (version 2.6.0), using the input MSA as the search profile.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, & Lipman DJ. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402. doi: 10.1093/nar/25.17.3389
psiBLAST MSA Start searches a protein sequence database using a protein multiple sequence alignment (MSA) as the query. Iteration is not currently implemented, so the MSA must first be generated with an App such as MUSCLE, which builds an MSA from a FeatureSet object. The KBase implementation permits searching through the genes in a Genome object, the genes in the Genome members of a GenomeSet, or the genes in a FeatureSet. The output object of these searches is a FeatureSet containing the genes that pass the thresholds given by the user. The App also provides a table of the hits (with hits that fall below the thresholds shown in gray) and links to download the BLAST output in other formats.
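For orientation, a comparable search can be assembled on the command line with BLAST+. The following is a minimal sketch, not the App's actual implementation: the function names, file paths, and default values are hypothetical, and mapping Max Accepts to `-max_target_seqs` is an assumption. The flags themselves (`-in`, `-dbtype`, `-out` for `makeblastdb`; `-in_msa`, `-db`, `-evalue`, `-max_target_seqs`, `-outfmt`, `-out` for `psiblast`) are standard BLAST+ options.

```python
def makeblastdb_cmd(targets_fasta, db_path):
    """Build the argument list that creates a protein BLAST database."""
    return ["makeblastdb", "-in", targets_fasta, "-dbtype", "prot", "-out", db_path]


def psiblast_msa_cmd(msa_fasta, db_path, out_path, evalue=0.001, max_hits=1000):
    """Build the argument list for a psiblast search seeded with an MSA.

    The evalue and max_hits defaults are illustrative assumptions only.
    No -num_iterations flag is passed, so psiblast runs a single pass.
    """
    return [
        "psiblast",
        "-in_msa", msa_fasta,               # MSA used as the search profile
        "-db", db_path,                     # database built from the targets
        "-evalue", str(evalue),             # weakest e-value to report
        "-max_target_seqs", str(max_hits),  # assumed equivalent of Max Accepts
        "-outfmt", "7",                     # tab-delimited table with comment lines
        "-out", out_path,
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed.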
Query MSA Object: The MSA to use as the query. It must be generated beforehand using an App such as MUSCLE.
Targets Object: The Targets Object may be a FeatureSet of genes, a Genome, or a GenomeSet. A BLAST search database will be automatically generated from the Targets Object.
Output Object: The FeatureSet of genes that are hit and that pass the user-defined thresholds.
E-value: This sets the e-value of the weakest hit to consider viable. Hits with e-values above this threshold (that is, weaker hits) are not reported in the table or the BLAST output text downloads.
Bitscore: This sets the minimum bitscore a hit must reach to be included in the FeatureSet output object. Hits below this threshold are still reported in the table and BLAST text downloads.
Alignment Overlap Threshold (%): This sets the minimum overlap percentage (the portion of the LONGEST sequence in the MSA covered by the hit alignment) a hit must reach to be included in the FeatureSet output object. Hits below this threshold are still reported in the table and BLAST text downloads.
Max Accepts: Hard cap on the number of hits to report (default: 1000).
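The way these parameters interact can be sketched in a few lines. This is an illustration under stated assumptions, not the App's code: the function names and the dictionary keys are hypothetical, the overlap formula (aligned query span divided by the length of the longest MSA sequence) is one plausible reading of the description, and both the inclusive comparisons and the point at which the Max Accepts cap is applied are assumptions.

```python
def overlap_percent(qstart, qend, longest_msa_len):
    """Percent of the longest MSA sequence covered by the hit alignment (assumed formula)."""
    return 100.0 * (abs(qend - qstart) + 1) / longest_msa_len


def select_hits(hits, longest_msa_len, max_evalue, min_bitscore,
                min_overlap_pct, max_accepts=1000):
    """Split hits into those reported in the table and those kept in the FeatureSet.

    hits is a list of dicts with the keys evalue, bitscore, qstart, and qend.
    """
    # The e-value threshold decides what is reported at all (cap applied here by assumption).
    reported = [h for h in hits if h["evalue"] <= max_evalue][:max_accepts]
    # Bitscore and overlap only decide membership in the output FeatureSet.
    accepted = [
        h for h in reported
        if h["bitscore"] >= min_bitscore
        and overlap_percent(h["qstart"], h["qend"], longest_msa_len) >= min_overlap_pct
    ]
    return reported, accepted
```

Under this reading, a hit that passes the e-value threshold but misses the bitscore or overlap threshold lands in `reported` but not in `accepted`.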
Extra Text Output format: The BLAST m=7 (tab-delimited table) text output format is always available for download. A user may request at most one additional format to be generated for download. The options are:
- 0 (pairwise)
- 1 (query-anchored showing identities)
- 2 (query-anchored no identities)
- 3 (flat query-anchored, show identities)
- 4 (flat query-anchored, no identities)
- 5 (XML Blast output)
- 8 (Text ASN.1)
- 9 (Binary ASN.1)
- 10 (Comma-separated values)
- 11 (BLAST archive format ASN.1)
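The always-available m=7 download is straightforward to read programmatically. The sketch below assumes the default BLAST+ tabular column layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); if the App writes a customized column set, the list would need adjusting. The function and constant names are hypothetical.

```python
DEFAULT_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_outfmt7(text):
    """Parse tab-delimited BLAST output with comment lines into a list of dicts."""
    hits = []
    for line in text.splitlines():
        # Comment lines in this format start with '#'; skip them and blank lines.
        if not line.strip() or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) < len(DEFAULT_COLUMNS):
            continue  # not a hit row under the assumed column layout
        hit = {}
        for name, value in zip(DEFAULT_COLUMNS, values):
            if name in INT_COLUMNS:
                hit[name] = int(value)
            elif name in FLOAT_COLUMNS:
                hit[name] = float(value)
            else:
                hit[name] = value
        hits.append(hit)
    return hits
```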
Output Object: Gene hits are captured in a FeatureSet output object. Hits that fail any of the additional user-defined thresholds are filtered out and do not appear in the object, even though they are shown in the output table.
Output HTML Table: The tab-delimited hit table is HTML formatted and additionally shows the region of the query covered by the BLAST alignment. Hits that pass the e-value threshold but fall below other thresholds, and are therefore not included in the FeatureSet output object, are shown in gray, with the attributes that fell below their thresholds in red.
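The display rule for a table row can be expressed as a small function. This is only a sketch of the rule as described: the function name, the attribute labels, and the returned dictionary keys are hypothetical and do not reflect the App's actual HTML generation.

```python
def row_style(bitscore, overlap_pct, min_bitscore, min_overlap_pct):
    """Decide how a reported hit would be displayed in the table.

    Returns the row color and the list of attributes to highlight in red.
    """
    red_attributes = []
    if bitscore < min_bitscore:
        red_attributes.append("bitscore")
    if overlap_pct < min_overlap_pct:
        red_attributes.append("overlap")
    # A row is grayed out when at least one attribute misses its threshold.
    row_color = "gray" if red_attributes else "default"
    return {"row_color": row_color, "red_attributes": red_attributes}
```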
Downloadable files: BLAST text outputs that are requested (as indicated in Configuration above) are available for download. These are not altered from the direct output from the BLAST run. The m=7 (tab-delimited) format is always provided.
Module Commit: d65ca570cd7e336f5d99329d4d9c032f63056f31