Search for untranslated feature matches to a nucleotide query sequence.
This App performs a nucleotide-nucleotide (nucleotide sequence alignment) BLASTn Search using NCBI's BLAST+ (version 2.11.0). At this time, the App does not search the non-coding regions of target Genomes and cannot be used to find RNAs, repeats, or other non-coding features in a Genome.
BLASTn is a nucleotide sequence search against a nucleotide sequence database. The KBase implementation is restricted to searches of the DNA sequences of features in a Genome object, the Genome members of a GenomeSet or SpeciesTree, the features in an Annotated Metagenome Assembly, or the features in a FeatureSet. The results of the search are displayed as a table, saved to a downloadable text file, and saved as a KBase FeatureSet object for later use.
All output formats respect the e-value cutoff threshold. The on-screen table and downloadable files give the user the opportunity to examine the consequences of the other three thresholds (percent identity, bit score, and alignment coverage). On-screen, the hits that fail to pass one or more of these three thresholds will appear in a gray line with the specific threshold that was not met highlighted in red. The downloadable files give users the most flexibility for exploring alternative thresholds. All hits that are below the e-value threshold are included in the downloadable text files, so users can examine all of the cutoffs without having to rerun the App. Several NCBI BLAST formats are supported in the downloadable files (discussed below under Extra Text Output format).
At this time, KBase does not have a database equivalent of NR. Large GenomeSets for searching can be created through the insertion of genomes into a species tree, annotation of an AssemblySet, or adding to and/or merging GenomeSets. Several Apps are available to support these set operations.
Your input must provide either a query DNA sequence or an input query object, and it must contain a single DNA nucleic acid sequence. At this time, KBase does not support multiple query DNA sequences. The query can be in the form of a SequenceSet object or a single nucleic acid sequence. An amino acid sequence will produce an error message.
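The requirement of a single nucleotide query can be checked before pasting a sequence into the App. The following is a minimal sketch, not the App's own validation: the function names and the accepted alphabet (A, C, G, T, U, N) are assumptions made for illustration, and the alphabet check is only a heuristic.

```python
# Hypothetical pre-check for a pasted query; not part of the KBase App.
# Assumption: only A, C, G, T, U, and N count as nucleotide characters.
NUCLEOTIDE_CHARS = set("ACGTUN")


def parse_single_query(text):
    """Return (header, sequence) for a query pasted with or without a FASTA header."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    headers = [ln for ln in lines if ln.startswith(">")]
    if len(headers) > 1:
        raise ValueError("only a single query sequence is supported")
    if headers and not lines[0].startswith(">"):
        raise ValueError("the FASTA header must be the first line")
    header = headers[0][1:] if headers else None
    # Join all sequence lines and normalize to upper case.
    sequence = "".join(ln for ln in lines if not ln.startswith(">")).upper()
    if not sequence:
        raise ValueError("no sequence found")
    return header, sequence


def looks_like_nucleotide(sequence):
    """Heuristic: True if every character is in the assumed nucleotide alphabet."""
    return set(sequence) <= NUCLEOTIDE_CHARS
```

A query that fails `looks_like_nucleotide` is probably an amino acid sequence and would be expected to produce an error in the App. Widen `NUCLEOTIDE_CHARS` if the query uses other IUPAC ambiguity codes.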
Input:
Input Query Object: You must provide either a query DNA sequence or an input query object and it must contain a single nucleic acid sequence. A valid query object is a SequenceSet with a single nucleotide sequence.
Input Query DNA Sequence: If you don't provide an input query object, you must copy-and-paste in a query DNA sequence. The format can be with or without a Fasta header line. If this query DNA sequence is used, you must also supply an output name for the single-element SequenceSet object that will be saved. The resulting SequenceSet can then be used in subsequent BLASTn runs.
Search Targets: The search database must be an object in your Narrative containing untranslated gene sequences. It may be a FeatureSet of genes, a Genome or a GenomeSet, the Genomes in a SpeciesTree, or an Annotated Metagenome Assembly. More than one object may be added to the Search Targets. The App will automatically generate a database from the Narrative object for BLASTn.
Parameters:
E-value: This sets the maximal e-value threshold for the reported search hits. Hits with e-values above this threshold do not get reported in any of the output formats, i.e., the on-screen table, the text downloads, or the saved FeatureSet.
The following three thresholds only affect the saved FeatureSet object:
- Bit Score: This sets the minimum bit score a hit must have to be included in the FeatureSet output object. Hits below this threshold are highlighted in red in the on-screen table. Typically, hits with bit scores below 50 are not to be trusted, although a bit score above 50 does not by itself guarantee a reliable hit.
- Sequence Identity Threshold (%): This sets the minimum percent sequence identity between the query and each hit for inclusion in the FeatureSet output object. Identity is calculated from the nucleotide alignment. The value should be between 1 and 100. Hits below this threshold are highlighted in red in the on-screen table.
- Alignment Coverage Threshold (%) (advanced): This sets the minimum percent alignment coverage (the portion of the query nucleotide sequence length covered by the hit nucleotide sequence in the alignment) for inclusion in the FeatureSet output object. The value should be between 1 and 100. Hits below this threshold are highlighted in red in the on-screen table.
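The way the three thresholds combine can be sketched as a small function that reports which thresholds a hit fails, mirroring the values highlighted in red in the on-screen table. This is an illustration under stated assumptions, not the App's code: the function name and dictionary keys are hypothetical, and coverage is assumed to be the aligned query span (1-based, inclusive coordinates) divided by the query length.

```python
# Hypothetical helper; the App's exact coverage formula is not given in this
# document, so the calculation below is an assumption.
def failed_thresholds(hit, query_length, min_bit_score, min_identity_pct, min_coverage_pct):
    """Return the names of the thresholds a hit does not meet (empty list = passes all)."""
    # Portion of the query covered by the alignment, as a percentage.
    covered = abs(hit["qend"] - hit["qstart"]) + 1
    coverage_pct = 100.0 * covered / query_length

    failed = []
    if hit["bitscore"] < min_bit_score:
        failed.append("bit score")
    if hit["pident"] < min_identity_pct:
        failed.append("identity")
    if coverage_pct < min_coverage_pct:
        failed.append("coverage")
    return failed


# Made-up example hit: 100 of 200 query bases aligned, so coverage is 50%.
example_hit = {"pident": 92.5, "bitscore": 180.0, "qstart": 11, "qend": 110}
print(failed_thresholds(example_hit, 200, 50, 90, 75))  # ['coverage']
```

A hit with an empty result would be saved to the FeatureSet. A hit with any entry would still appear in the table and downloads, provided it meets the e-value cutoff.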
Max Accepts (advanced): A hard limit on how many hits to report. The default is 1000.
Extra Text Output format (advanced): NCBI BLAST has several defined output formats (in the section called outfmt). Among them, the BLAST m=7 (tab-delimited table) text output format is automatically generated and is available for download, so it should not be redundantly included here. A user may request up to one extra format to be generated and made downloadable. These include:
- 0 Pairwise
- 1 Query-anchored showing identities
- 2 Query-anchored no identities
- 3 Flat query-anchored, show identities
- 4 Flat query-anchored, no identities
- 5 XML Blast output
- 8 Text ASN.1
- 9 Binary ASN.1
- 10 Comma-separated values
- 11 BLAST archive format ASN.1
Output:
BLAST Hits Object: BLAST hits (genes) that pass all the user-defined filters are saved in an output FeatureSet. This field is for the name of the new FeatureSet.
Output HTML Table: The on-screen table includes all the BLAST hits that meet the e-value cutoff threshold. It includes several columns commonly found in BLAST outputs and includes a graphic with the region of the query covered by the BLAST alignment. The table gives users the opportunity to explore the consequences of the other three thresholds (percent identity, bit score, and alignment coverage). On-screen, the hits that fail to meet one or more of these thresholds are included but appear in a gray line with the threshold that was not met highlighted in red. This gives users the opportunity to refine their thresholds, rerun the App, and recreate the output FeatureSet.
Downloadable files: The downloadable files include all the BLAST hits that meet the e-value cutoff threshold. This gives the user the opportunity to explore the consequences of the other three thresholds (percent identity, bit score, and alignment coverage). After download, the thresholds can be explored without rerunning the App. By default, the BLAST output is automatically available for download in a tab-delimited (m=7, formerly m=8) format. Up to one additional format can be selected. The additional formats are found in the advanced parameters as Extra Text Output format. These formats are not altered from the direct output from the BLAST run.
Output Query Object: If the query DNA sequence was used above, it will be saved as a SequenceSet object with a single nucleotide sequence. You must supply a name for this new object.
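Exploring alternative thresholds in the downloaded tab-delimited file can be done with a few lines of Python. The sketch below assumes the file uses the twelve default BLAST+ tabular columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) with comment lines starting with `#`. Whether the App's download uses exactly this column set is an assumption, the function names are hypothetical, and the sample rows are made up for illustration.

```python
# Assumed column layout: the default BLAST+ tabular fields.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_FIELDS = ("pident", "evalue", "bitscore")
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")


def read_tabular_hits(text):
    """Parse tab-delimited BLAST output, skipping blank and '#' comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(COLUMNS, line.split("\t")))
        for key in FLOAT_FIELDS:
            row[key] = float(row[key])
        for key in INT_FIELDS:
            row[key] = int(row[key])
        hits.append(row)
    return hits


def count_passing(hits, min_bit_score, min_identity_pct):
    """Count hits meeting both the bit score and percent identity cutoffs."""
    return sum(1 for h in hits
               if h["bitscore"] >= min_bit_score and h["pident"] >= min_identity_pct)


# Made-up sample data, not real App output.
sample = (
    "# BLASTN 2.11.0+\n"
    "# Query: q1\n"
    "q1\tgeneA\t98.000\t100\t2\t0\t1\t100\t5\t104\t1e-45\t174\n"
    "q1\tgeneB\t85.000\t60\t9\t0\t20\t79\t1\t60\t2e-08\t45.4\n"
)
hits = read_tabular_hits(sample)
print(count_passing(hits, 50, 90))  # 1
print(count_passing(hits, 40, 80))  # 2
```

Comparing the counts for different cutoffs shows how many hits a stricter or looser setting would keep, without rerunning the App.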
NOTE: The error message "No sequence found in fasta_str" or "local variable 'appropriate_sequence_found_in_one_input' referenced before assignment" is a sign that the query DNA sequence may not be nucleotides. It might be an amino acid sequence, which doesn't work with this App.
Team members who implemented App in KBase: Dylan Chivian. For questions, please contact us.
Related Publications
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389–3402. doi:10.1093/nar/25.17.3389, https://academic.oup.com/nar/article/25/17/3389/1061651
- Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10: 421. doi:10.1186/1471-2105-10-421, https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-421
App Specification:
https://github.com/kbaseapps/kb_blast/tree/791f72df62105af2c74f436e8f3452c932e8db68/ui/narrative/methods/BLASTn_Search
Module Commit: 791f72df62105af2c74f436e8f3452c932e8db68