Build a Multiple Sequence Alignment (MSA) for nucleotide sequences using MUSCLE.
This App builds a Multiple Sequence Alignment (MSA) of nucleotide sequences with MUSCLE, one of the most widely used multiple sequence alignment programs in biology. According to the MUSCLE website, its accuracy and speed are consistently better than those of CLUSTALW. The KBase implementation takes a FeatureSet object with a list of DNA references, extracts the sequences, and performs the alignment. The MSA can then be downloaded in FASTA and Clustal formats. The MSA is also stored as an MSA data object in KBase for use in downstream analysis Apps such as FASTTREE-2 Phylogenetic Tree Builder.
Inputs and Parameters:
- Input Sequences: The KBase object with the set of sequences to align. A KBase FeatureSet object is composed of protein-coding genes. Their corresponding mRNA gene sequences (that is, the nucleotide sequences, not translated into protein) are retrieved from their source genomes and used as input. The FeatureSet can be built with Build Feature Set from Genome, Merge FeatureSets, and/or Logical Slice Two FeatureSets. FeatureSets are also created as output from BLAST and HMMER Apps.
- MSA Description: A description for the output object is mandatory. It is part of the object download but serves no other purpose.
- Max Iterations: The upper bound for the number of iterations if MUSCLE fails to converge.
- Max Hours: The upper bound for the number of hours to run if MUSCLE fails to converge. Hours are in decimal format (e.g. enter "0.5" for 30 minutes).
- Output MSA: The name of the generated output MSA object to save in your Narrative.
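As a rough illustration of how the Max Iterations and Max Hours limits could map onto a standalone MUSCLE 3.8 run, the sketch below assembles a command line using MUSCLE's `-in`, `-out`, `-maxiters`, and `-maxhours` options. The function name `build_muscle_command` and the file names are hypothetical, and how the KBase App actually invokes MUSCLE is not documented here, so treat this as an assumption rather than a description of the implementation.

```python
from typing import List, Optional


def build_muscle_command(
    in_fasta: str,
    out_path: str,
    max_iters: Optional[int] = None,
    max_hours: Optional[float] = None,
) -> List[str]:
    """Assemble a MUSCLE 3.8 command line as an argument list.

    Hypothetical helper: it only builds the arguments and does not run MUSCLE.
    """
    cmd = ["muscle", "-in", in_fasta, "-out", out_path]
    if max_iters is not None:
        if max_iters < 1:
            raise ValueError("max_iters must be at least 1")
        # Upper bound on the number of iterations.
        cmd += ["-maxiters", str(max_iters)]
    if max_hours is not None:
        if max_hours <= 0:
            raise ValueError("max_hours must be positive")
        # Upper bound on run time, in decimal hours (0.5 means 30 minutes).
        cmd += ["-maxhours", str(max_hours)]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_muscle_command("genes.fasta", "genes.afa", max_iters=16, max_hours=0.5)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `muscle` executable is installed.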
Outputs:
- Output MSA: An output MSA object is created for use in subsequent analysis. This object contains the aligned sequences, the row labels, the row order, and the description.
- Output Visualization: The MSA is shown in Clustal format.
- Links to Downloadable files: Two MSA file formats are automatically created for download (Clustal and FASTA formats).
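Once the FASTA-format alignment has been downloaded, a quick sanity check is to confirm that every row has the same length, since aligned sequences are padded with gap characters. The following is a minimal sketch in plain Python. The helper names and the three example rows are hypothetical and are not output from this App.

```python
from typing import Dict


def read_aligned_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {label: aligned_sequence} mapping, in file order."""
    rows: Dict[str, str] = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            label = line[1:].strip()
            if label in rows:
                raise ValueError(f"duplicate row label: {label}")
            rows[label] = ""
        elif label is None:
            raise ValueError("sequence data found before the first header")
        else:
            # Sequences may be wrapped over several lines; join them.
            rows[label] += line
    return rows


def alignment_length(rows: Dict[str, str]) -> int:
    """Return the shared row length, or raise if the rows differ in length."""
    lengths = {len(seq) for seq in rows.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows are not the same length: {sorted(lengths)}")
    return lengths.pop()


# Hypothetical alignment, for illustration only.
example = """>geneA
ATG-CCGTA
>geneB
ATGACCG-A
>geneC
ATG-CCGTA
"""

rows = read_aligned_fasta(example)
print(alignment_length(rows))
for label, seq in rows.items():
    print(label, seq.count("-"))
```

To check a real download, read the file contents with `open(path).read()` and pass the text to `read_aligned_fasta`.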
Team members who implemented the algorithm in KBase: Dylan Chivian. For questions, please contact us.
- Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32: 1792–1797. doi:10.1093/nar/gkh340, https://academic.oup.com/nar/article/32/5/1792/2380623
- MUSCLE 3.8.425 Source: http://www.drive5.com/muscle/
Module Commit: 9102fa8ccccd09a46156adf20ff6e89a94e0d26d