Build a Multiple Sequence Alignment (MSA) for nucleotide sequences using MUSCLE.
This App builds a Multiple Sequence Alignment (MSA) of nucleotide sequences with MUSCLE, one of the most widely used MSA methods in biology. According to the MUSCLE website, its accuracy and speed are consistently better than those of CLUSTALW. The KBase implementation takes a FeatureSet object containing a list of DNA feature references, extracts the sequences, and performs the alignment. The MSA can then be downloaded in FASTA and Clustal formats. It is also stored as an MSA data object in KBase for use by downstream analysis Apps such as FASTTREE-2 Phylogenetic Tree Builder.
The MUSCLE nucleotide App can also accept a SingleEndLibrary (e.g., 16S amplicon reads) as input.
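How the App turns a SingleEndLibrary into alignment input is not described here. As a minimal sketch, assuming the reads are available as plain, non-wrapped 4-line FASTQ text, each read can be reduced to a FASTA record (the format MUSCLE reads) by keeping the read ID and sequence and dropping the quality line. The function name `fastq_to_fasta` is hypothetical.

```python
def fastq_to_fasta(fastq_text: str) -> str:
    """Convert 4-line FASTQ records to FASTA, discarding quality scores.

    Assumption: standard non-wrapped FASTQ (header, sequence, '+', quality).
    Multi-line FASTQ records are not handled by this sketch.
    """
    lines = [line for line in fastq_text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected exactly 4 lines per FASTQ record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, _quality = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        read_id = header[1:].split()[0]  # keep only the first token as the read ID
        records.append(f">{read_id}\n{seq}")
    return "\n".join(records) + "\n" if records else ""
```

For example, `fastq_to_fasta("@r1 desc\nACGT\n+\nIIII\n")` yields a single FASTA record named `r1`.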
Inputs and Parameters:
- Input Sequences: The KBase object containing the set of sequences to align. A KBase FeatureSet object is typically composed of protein-coding genes; their corresponding DNA gene sequences (untranslated sequences) are retrieved from their source Genome or Annotated Metagenome Assembly objects and used as input. A FeatureSet can be built with Build Feature Set from Genome, Merge FeatureSets, and/or Logical Slice Two FeatureSets, and FeatureSets are also created as output by the BLAST and HMMER Apps. Protein sequences are typically used to align protein-coding genes, but this App may be useful when the genes are very closely related and differ primarily at the nucleotide level. By contrast, RNA-encoding genes such as tRNA and rRNA genes, as well as promoters and TF-binding sites, must be aligned using their nucleotide sequences.
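Once the DNA gene sequences have been retrieved, they have to be written out as FASTA for MUSCLE. The sketch below assumes a hypothetical mapping of feature IDs to DNA sequence strings (the actual KBase retrieval code is not shown) and rejects characters outside the IUPAC DNA alphabet, which catches most protein sequences passed in by mistake. The name `features_to_fasta` is hypothetical.

```python
# IUPAC DNA codes, including ambiguity codes and N (no gap characters in input).
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def features_to_fasta(features: dict, width: int = 60) -> str:
    """Write {feature_id: dna_sequence} as FASTA, wrapping sequence lines.

    `features` is a hypothetical stand-in for sequences pulled from a FeatureSet.
    """
    records = []
    for feature_id, seq in features.items():
        seq = "".join(seq.split()).upper()  # drop whitespace/newlines, normalize case
        if not seq:
            raise ValueError(f"{feature_id}: empty sequence")
        bad = set(seq) - IUPAC_DNA
        if bad:
            raise ValueError(f"{feature_id}: non-nucleotide characters {sorted(bad)}")
        wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append(f">{feature_id}\n" + "\n".join(wrapped))
    return "\n".join(records) + "\n"
```

Wrapping at 60 columns is only a readability convention; MUSCLE accepts unwrapped FASTA as well.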
- MSA Description: A description for the output object is mandatory. It is stored in the object as a label to distinguish the output but serves no other purpose.
- Max Iterations: The upper bound for the number of iterations if MUSCLE fails to converge.
- Max Hours: The upper bound for the number of hours to run if MUSCLE fails to converge. Hours are in decimal format (e.g. enter "0.5" for 30 minutes).
- Output MSA: The name of the generated output MSA object to save in your Narrative.
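The Max Iterations and Max Hours parameters presumably map onto MUSCLE 3.8's `-maxiters` and `-maxhours` command-line options. The sketch below assembles such an argument list and also requests FASTA (`-fastaout`) and Clustal (`-clwout`) output from a single run; `build_muscle_cmd` and the file names are hypothetical, and the App's actual invocation may differ.

```python
from typing import List, Optional


def build_muscle_cmd(in_fasta: str, fasta_out: str, clustal_out: str,
                     max_iters: Optional[int] = None,
                     max_hours: Optional[float] = None,
                     muscle: str = "muscle") -> List[str]:
    """Build a MUSCLE 3.8 argument list (hypothetical helper)."""
    cmd = [muscle, "-in", in_fasta,
           "-fastaout", fasta_out,   # aligned FASTA output
           "-clwout", clustal_out,   # Clustal-format output
           "-quiet"]                 # suppress progress messages
    if max_iters is not None:
        if int(max_iters) < 1:
            raise ValueError("max_iters must be a positive integer")
        cmd += ["-maxiters", str(int(max_iters))]
    if max_hours is not None:
        if max_hours <= 0:
            raise ValueError("max_hours must be positive, e.g. 0.5 for 30 minutes")
        cmd += ["-maxhours", f"{max_hours:g}"]  # decimal hours, e.g. "0.5"
    return cmd
```

With MUSCLE installed, the list could be executed with `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with unusual file names.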
Output:
- Output MSA: An output MSA object is created for use in subsequent analysis. This object contains the aligned sequences, the labels for the rows, the row order, and the description.
- Output Visualization: The MSA is shown in Clustal format.
- Links to Downloadable files: Two MSA file formats are automatically created for download (Clustal and FASTA formats).
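Both download formats can be generated from the same aligned rows. This sketch assumes a hypothetical in-memory representation (a `row_order` list plus a dict mapping row IDs to gapped, aligned sequences); the real KBase MSA object fields may differ. The Clustal writer is simplified: it uses a generic header line, pads row names to a fixed column, writes 60-column blocks, and marks a column with `*` only when every row has the same non-gap base.

```python
def msa_to_fasta(row_order: list, alignment: dict) -> str:
    """Write aligned rows as FASTA in the given row order."""
    return "".join(f">{rid}\n{alignment[rid]}\n" for rid in row_order)


def msa_to_clustal(row_order: list, alignment: dict, block: int = 60) -> str:
    """Write a simplified Clustal-style alignment (hypothetical helper)."""
    if not row_order:
        raise ValueError("alignment has no rows")
    lengths = {len(alignment[r]) for r in row_order}
    if len(lengths) != 1:
        raise ValueError("aligned rows must all have the same length")
    aln_len = lengths.pop()
    pad = max(len(r) for r in row_order) + 4  # name column width
    lines = ["CLUSTAL multiple sequence alignment", "", ""]
    for start in range(0, aln_len, block):
        chunks = [alignment[r][start:start + block] for r in row_order]
        for rid, chunk in zip(row_order, chunks):
            lines.append(rid.ljust(pad) + chunk)
        # '*' where all rows share the same base and the column is not a gap
        cons = "".join(
            "*" if col[0] != "-" and len({c.upper() for c in col}) == 1 else " "
            for col in zip(*chunks)
        )
        lines.append(" " * pad + cons)
        lines.append("")
    return "\n".join(lines) + "\n"
```

For example, with rows `s1 = "AC-GT"` and `s2 = "ACTGA"`, the conservation line under the block reads `** * ` (columns 1, 2, and 4 are identical and ungapped).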
Team members who implemented the algorithm in KBase: Dylan Chivian. For questions, please contact us.
Related Publications
- Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792-1797. doi:10.1093/nar/gkh340, https://academic.oup.com/nar/article/32/5/1792/2380623
- MUSCLE 3.8.425 source: http://www.drive5.com/muscle/
App Specification:
https://github.com/kbaseapps/kb_muscle/tree/f6cbe3df490aa7edc4788000e6964f601cb9c83a/ui/narrative/methods/MUSCLE_nucModule Commit: f6cbe3df490aa7edc4788000e6964f601cb9c83a