BLASTx nuc-prot Search - v2.13.0+
kb_blast

v.1.7.0

By: dylan

Search for protein matches to an input nucleotide sequence.

This App performs a nucleotide-protein (translated protein sequence alignment) BLASTx Search using NCBI's BLAST+ (version 2.11.0).

BLASTx is a translated nucleotide sequence (the query) search against a protein sequence database (the subject, a.k.a. target, sequences). The KBase implementation permits a single nucleotide sequence as the query, searched against subjects that are protein-coding genes in a Genome object, in the Genome members of a GenomeSet or SpeciesTree, the genes in an Annotated Metagenome Assembly, or the features in a FeatureSet. The results of the search are displayed as a table, saved to a downloadable text file, and saved as a KBase FeatureSet object for later use.

All output formats respect the e-value cutoff threshold. The on-screen table and downloadable files give the user the opportunity to examine the consequences of the other three thresholds (percent identity, bit score, and alignment coverage). On-screen, the proteins that fail to pass one or more of these three thresholds will appear in a gray line with the specific failure highlighted in red. The downloadable files give users the most flexibility for exploring alternative thresholds. All hits that are below the e-value threshold are included in the downloadable text files and users can examine all of the cutoffs without having to rerun the App. Several NCBI BLAST formats are supported in the downloadable files (discussed below under extra text output).

At this time, KBase does not have a database equivalent of NR. Large GenomeSets for searching can be created through the insertion of genomes into a species tree, annotation of an AssemblySet, or adding to and/or merging GenomeSets. Several Apps are available to support these set operations.

Your input must provide either a query DNA sequence or an input query object, and it must contain a single DNA sequence. At this time, KBase does not support multiple query DNA sequences. The query can be a SequenceSet object or a single pasted nucleic acid sequence. An amino acid sequence will produce an error message.

Input:

Input Query Object: You must provide either a query DNA sequence or an input query object, and it must contain a single nucleic acid sequence. A valid query object is a SequenceSet object with a single nucleotide sequence.

Input Query DNA Sequence: If you don't provide an Input Query Object, you must copy and paste in a query DNA sequence. The sequence may be given with or without a FASTA header line. If this query DNA sequence is used, you must also supply an output name for the single-element SequenceSet object that will be saved. The resulting SequenceSet can then be used in subsequent BLAST runs.
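The query rules above (one sequence, optional FASTA header, nucleotides only) can be illustrated with a small pre-check to run before pasting a sequence. This is only a sketch: the function name `parse_single_dna_query` is hypothetical and is not part of kb_blast, and the App's own validation may differ.

```python
# Hypothetical pre-check for a pasted query; not the App's actual validation code.
IUPAC_NUCLEOTIDES = set("ACGTUNRYKMSWBDHV")

def parse_single_dna_query(text):
    """Return (header, sequence) for one DNA query, with or without a FASTA header."""
    header = None
    seq_parts = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header after any earlier header or sequence line means a second record.
            if header is not None or seq_parts:
                raise ValueError("only a single query sequence is supported")
            header = line[1:].strip()
        else:
            seq_parts.append(line.replace(" ", ""))
    sequence = "".join(seq_parts).upper()
    if not sequence:
        raise ValueError("no sequence found")
    unexpected = set(sequence) - IUPAC_NUCLEOTIDES
    if unexpected:
        # Letters such as L, P, or Q suggest an amino acid sequence.
        raise ValueError("non-nucleotide characters: " + "".join(sorted(unexpected)))
    return header, sequence
```

A peptide made up only of letters that are also IUPAC nucleotide codes would pass this check, so it catches typical protein input rather than every case.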

Search Targets: The search database must be an object in your Narrative containing protein sequences. It may be a FeatureSet of genes, a Genome or a GenomeSet, the Genomes in a SpeciesTree, or an Annotated Metagenome Assembly. More than one object may be added to the Search Targets. The App will automatically generate a database from the narrative object for BLASTx.

Parameters:

E-value: This sets the maximum e-value threshold for the reported search hits. Hits with e-values above this threshold are not reported in any of the output formats, i.e., the on-screen table, the text downloads, or the saved FeatureSet.

Three further thresholds (percent identity, bit score, and alignment coverage) affect only the saved FeatureSet object.

Max Accepts (advanced): Hard limit on how many hits to report. The default is 1000.

Allow Mistranslation (advanced): A eukaryotic contig is sometimes mixed in with bacterial or archaeal contigs, for example in a metagenome assembly. Some methods annotate the genes on such contigs correctly, but the correct genetic code (e.g., 1 or 4) is not passed through to the generation of the BLAST database of target genes, and the Bacterial and Archaeal code 11 is used instead. For example, if the correct code is genetic code 4, this can produce an internal STOP (a.k.a. TER) codon in place of the correct tryptophan (W). When mistranslation is not allowed, a gene whose translation contains an internal STOP is not translated and is excluded from the BLAST search database. The default is to write mistranslations. Note: genes whose length is not a multiple of 3 are never translated. Such frameshifts or intron splicing must be handled upstream of this App.
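The effect described for Allow Mistranslation can be reproduced with a short sketch. The amino acid strings below are the NCBI translation tables 11 and 4 in the standard TCAG codon order; the function names and the choice to strip a terminal stop are illustrative assumptions, not the App's actual code.

```python
# Illustrative sketch only; helper names are hypothetical, not taken from kb_blast.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# NCBI translation tables in TCAG order; table 4 reads TGA as W instead of STOP.
AA_TABLE_11 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
AA_TABLE_4 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def translate(dna, aa_table):
    """Translate frame 1 of dna; codons with ambiguous bases become X."""
    table = dict(zip(CODONS, aa_table))
    dna = dna.upper()
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def protein_for_database(gene_dna, allow_mistranslation=True):
    """Return the code 11 translation, or None if the gene would be left out."""
    if len(gene_dna) % 3 != 0:
        return None  # lengths that are not a multiple of 3 are never translated
    protein = translate(gene_dna, AA_TABLE_11)
    if protein.endswith("*"):
        protein = protein[:-1]  # assumption: a terminal stop is not a mistranslation
    if "*" in protein and not allow_mistranslation:
        return None  # internal STOP found and mistranslations are not allowed
    return protein
```

For the toy gene `ATGTGAGCTTAA`, code 11 reads the second codon (TGA) as an internal STOP, while code 4 reads it as tryptophan, which is the situation the parameter is meant to handle.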

Extra Text Output format (advanced): NCBI BLAST has several defined output formats (see the outfmt option). Among them, the BLAST m=7 (tab-delimited table) text output format is generated automatically and is available for download, so it should not be requested again here. A user may request up to one extra format to be generated and made downloadable. The available formats are listed under this advanced parameter.

Output:

BLAST Hits Object: BLAST hits (proteins) that pass all the user-defined filters are saved in an output FeatureSet. This field is for the name of the new FeatureSet.

Output HTML Table: The on-screen table includes all the BLAST hits that meet the e-value cutoff threshold. It includes several columns commonly found in BLAST output and a graphic showing the region of the query covered by the BLAST alignment. The table gives users the opportunity to explore the consequences of the other three thresholds (percent identity, bit score, and alignment coverage). On-screen, hits that fail one or more of these thresholds are still included but appear in a gray line, with the threshold that was not met highlighted in red. This gives users the opportunity to refine their thresholds, rerun the App, and recreate the output FeatureSet.
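The gray-line and red-highlight behavior amounts to checking each hit against the three thresholds independently. A minimal sketch of that logic follows; the dictionary keys and the function name `failed_thresholds` are hypothetical and do not reflect the App's internal data structures.

```python
# Hypothetical illustration of per-threshold flagging; not the App's rendering code.
def failed_thresholds(hit, min_identity, min_bitscore, min_coverage):
    """Return the names of the thresholds that a hit does not meet."""
    failures = []
    if hit["percent_identity"] < min_identity:
        failures.append("percent identity")
    if hit["bit_score"] < min_bitscore:
        failures.append("bit score")
    if hit["alignment_coverage"] < min_coverage:
        failures.append("alignment coverage")
    return failures

hit = {"percent_identity": 42.0, "bit_score": 180.0, "alignment_coverage": 35.0}
flags = failed_thresholds(hit, min_identity=40.0, min_bitscore=100.0, min_coverage=50.0)
# A non-empty list corresponds to a gray row; each listed name would be shown in red.
print(flags)
```

In this example only the coverage threshold is missed, so the hit would be shown in gray but left out of the saved FeatureSet.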

Downloadable files: The downloadable files include all the BLAST hits that meet the e-value cutoff threshold. This gives the user the opportunity to explore the consequences of the other three thresholds (percent identity, bit score, and alignment coverage). After download, the thresholds can be explored without rerunning the App. By default, the BLAST output is automatically available for download in a tab-delimited (m=7, formerly m=8) format. Up to one additional format can be selected. The additional formats are found in the advanced parameters as Extra Text Output format. These formats are not altered from the direct output of the BLAST run.
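Exploring thresholds offline can be as simple as reading the tab-delimited download and filtering it. The sketch below assumes the file follows the BLAST+ tabular layout with `#` comment lines and the twelve default columns; if the App writes a different column set, `DEFAULT_COLUMNS` would need to be adjusted. The function names are illustrative.

```python
# Sketch under the assumption of BLAST+ default tabular columns; adjust if they differ.
DEFAULT_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def read_tabular_hits(lines):
    """Parse tab-delimited BLAST hits, skipping blank lines and '#' comment lines."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        row = dict(zip(DEFAULT_COLUMNS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(row[key])
        hits.append(row)
    return hits

def filter_hits(hits, min_identity=0.0, min_bitscore=0.0):
    """Keep hits that meet both the identity and bit score thresholds."""
    return [h for h in hits
            if h["pident"] >= min_identity and h["bitscore"] >= min_bitscore]
```

A downloaded file can be passed in directly, for example `read_tabular_hits(open(path))` with `path` pointing at the saved download, and `filter_hits` can then be called repeatedly with different cutoffs.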

Output Query Object: If the Query DNA Sequence was used above, it will be saved as a SequenceSet object with a single nucleotide sequence. You must supply a name for this new object.

NOTE:

The error message "No sequence found in fasta_str" or "local variable 'appropriate_sequence_found_in_one_input' referenced before assignment" is a sign that the query may not be a nucleotide sequence. It might be an amino acid sequence, which doesn't work with this App.

Team members who implemented the algorithm in KBase: Dylan Chivian. For questions, please contact us.


App Specification:

https://github.com/kbaseapps/kb_blast/tree/791f72df62105af2c74f436e8f3452c932e8db68/ui/narrative/methods/BLASTx_Search

Module Commit: 791f72df62105af2c74f436e8f3452c932e8db68