Assemble reads using the SPAdes assembler.
The Assemble Reads with SPAdes App allows the user to assemble a genome from reads using the SPAdes 3.15.3 assembler, which is designed for small genomes and single cell sequencing.
SPAdes is a de Bruijn graph-based assembler. It is noteworthy for building multiple de Bruijn graphs, each with a different k-mer size, to better handle the highly uneven coverage across the genome that is characteristic of single cell sequencing, and for a novel method of using paired-end information [1]. The assembly proceeds in stages. First, SPAdes constructs the assembly graph from the multisized de Bruijn graphs while detecting and removing chimeric reads. Next, it uses read pairs together with paths in the assembly graph to estimate the distances between k-mers in the genome, which correspond to edges of the assembly graph. Finally, it constructs a paired assembly graph and outputs a set of contiguous DNA sequences (contigs).
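To illustrate why several k-mer sizes help, the toy sketch below builds a plain de Bruijn graph for each k and counts its connected pieces. It is not SPAdes' implementation (no error correction, no reverse complements, no graph simplification); the function names are hypothetical. Two reads that overlap by only four bases, as in a low-coverage region, stay connected at k = 5 but fall apart at k = 6. A smaller k bridges such gaps, while a larger k is better at resolving repeats.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Toy de Bruijn graph: each k-mer in a read becomes an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix; the count approximates coverage."""
    edges = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return dict(edges)


def count_components(graph):
    """Count weakly connected components, ignoring edge direction."""
    neighbours = defaultdict(set)
    for prefix, suffix in graph:
        neighbours[prefix].add(suffix)
        neighbours[suffix].add(prefix)
    seen, components = set(), 0
    for start in list(neighbours):
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
    return components


# Two reads that overlap by only four bases ("CCCG").
reads = ["AAACCCG", "CCCGTTT"]
for k in (5, 6):
    print(k, count_components(de_bruijn_graph(reads, k)))
# 5 1
# 6 2
```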
The App only accepts a paired-end reads library as input; supplying a single-end library results in an error. The user sets one basic parameter, the input DNA type: standard for isolate DNA, single cell for flow-sorted bacterial cells amplified by multiple displacement amplification (MDA), or plasmid for plasmid DNA. After setting the input and the DNA type, the user provides a name for the output assembly.
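The sketch below shows one plausible way these settings could translate into a `spades.py` command line. It is an assumption: the App's actual command construction is not shown here, and `build_spades_command` and the `DNA_TYPE_FLAGS` keys are hypothetical. `--sc` and `--plasmid` are SPAdes' single-cell and plasmid modes, `-1`/`-2` give the paired reads, and `--careful` is included because the operational notes state it is always used for non-metagenomic assemblies.

```python
# Hypothetical mapping from the App's DNA type setting to SPAdes mode flags.
DNA_TYPE_FLAGS = {
    "standard": [],            # isolate DNA: SPAdes default mode, no extra flag
    "single_cell": ["--sc"],   # MDA-amplified single-cell data
    "plasmid": ["--plasmid"],  # plasmidSPAdes pipeline
}


def build_spades_command(forward_reads, reverse_reads, dna_type, output_dir):
    """Sketch of a spades.py call for one paired-end library (hypothetical helper)."""
    if reverse_reads is None:
        # Mirrors the App's rule: single-end libraries are rejected.
        raise ValueError("a paired-end reads library is required")
    if dna_type not in DNA_TYPE_FLAGS:
        raise ValueError(f"unknown DNA type: {dna_type!r}")
    return (
        ["spades.py", "--careful"]  # always on for non-metagenomic runs, per the notes
        + DNA_TYPE_FLAGS[dna_type]
        + ["-1", forward_reads, "-2", reverse_reads, "-o", output_dir]
    )


print(" ".join(build_spades_command("reads_1.fq", "reads_2.fq", "single_cell", "spades_out")))
# spades.py --careful --sc -1 reads_1.fq -2 reads_2.fq -o spades_out
```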
Additionally, there are three advanced parameters: (i) the minimum contig length to report (default 500 bp), (ii) a list of k-mer sizes for the de Bruijn graphs, and (iii) assembly only, which skips read error correction. By default, SPAdes chooses the k-mer sizes automatically depending on the input data; the second advanced parameter overrides this automatic selection. See the SPAdes user manual for guidance on selecting k-mer values manually.
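A minimal sketch of how the three advanced parameters might be handled, assuming the k-mer list arrives as comma-separated text. The validation mirrors the constraint stated in the SPAdes manual for `-k` (odd values, less than 128, ascending order); `--only-assembler` is the SPAdes flag that skips read error correction. The length filter is assumed to run on the finished contigs with an inclusive cutoff. All function names are hypothetical.

```python
def parse_kmer_list(text):
    """Parse a comma-separated k-mer list such as "21,33,55"."""
    ks = [int(part) for part in text.split(",") if part.strip()]
    for k in ks:
        if k <= 0 or k % 2 == 0 or k >= 128:
            raise ValueError(f"k-mer size {k} must be a positive odd number below 128")
    if ks != sorted(set(ks)):
        raise ValueError("k-mer sizes must be listed in ascending order")
    return ks


def advanced_flags(kmer_text=None, assembly_only=False):
    """Translate the k-mer list and assembly-only settings into spades.py flags."""
    flags = []
    if kmer_text:
        flags += ["-k", ",".join(str(k) for k in parse_kmer_list(kmer_text))]
    if assembly_only:
        flags.append("--only-assembler")  # skip the read error-correction stage
    return flags


def filter_contigs(contigs, min_length=500):
    """Keep contigs at least min_length bases long (assumed inclusive cutoff)."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_length}


print(advanced_flags("21, 33, 55", assembly_only=True))
# ['-k', '21,33,55', '--only-assembler']
```

Leaving the k-mer list empty produces no `-k` flag, so SPAdes falls back to its automatic selection.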
Upon successful completion, the App creates a KBase Assembly object, which will appear in the data pane. A QUAST quality assessment report and summary are also generated.
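The QUAST report includes contiguity statistics such as N50: the contig length L such that contigs of length L or longer together cover at least half of the assembly. The sketch below computes a few such summary values from contig lengths. It illustrates the definitions only and is not QUAST's code; `n50` and `summarize` are hypothetical names.

```python
def n50(lengths):
    """N50: the contig length L such that contigs of length >= L cover
    at least half of the total assembly length."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0  # empty assembly


def summarize(contig_lengths):
    """A few QUAST-style summary values for a list of contig lengths."""
    return {
        "contigs": len(contig_lengths),
        "total_length": sum(contig_lengths),
        "largest_contig": max(contig_lengths, default=0),
        "N50": n50(contig_lengths),
    }


print(summarize([100, 200, 300, 400]))
# {'contigs': 4, 'total_length': 1000, 'largest_contig': 400, 'N50': 300}
```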
For metagenome assembly, please use the Assemble Reads with metaSPAdes App.
Operational notes:
- Currently the App only supports Illumina, IonTorrent, PacBio CLR, and PacBio CCS reads.
- The --careful flag is always used, except for metagenomic assemblies where it is not allowed.
- Illumina and IonTorrent reads cannot be mixed in the same assembly.
- PacBio CLR reads must be run with at least one accompanying Illumina or IonTorrent library.
- The k-mer parameter is autodetected by SPAdes if the values are not specified in the input.
- The PHRED quality-score offset is autodetected by EAUtils.
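The library rules in these notes can be expressed as a small validation step. The sketch below assumes each library is tagged with a hypothetical platform label (`"illumina"`, `"iontorrent"`, `"pacbio_clr"`, `"pacbio_ccs"`). It also includes a common quality-offset heuristic. EAUtils' actual detection method is not described here, so `guess_phred_offset` is only an illustration of the idea.

```python
SHORT_READ = {"illumina", "iontorrent"}  # hypothetical platform labels
LONG_READ = {"pacbio_clr", "pacbio_ccs"}


def validate_libraries(platforms):
    """Check a list of library platform labels against the operational notes."""
    kinds = set(platforms)
    unsupported = kinds - SHORT_READ - LONG_READ
    if unsupported:
        raise ValueError(f"unsupported read types: {sorted(unsupported)}")
    if SHORT_READ <= kinds:
        raise ValueError("Illumina and IonTorrent reads cannot be mixed")
    if "pacbio_clr" in kinds and not kinds & SHORT_READ:
        raise ValueError("PacBio CLR reads need an Illumina or IonTorrent library")


def guess_phred_offset(quality_lines):
    """Heuristic: a quality character below '@' (ASCII 64, quality 0 in
    Phred+64) suggests Phred+33; otherwise assume Phred+64."""
    lowest = min(ord(ch) for line in quality_lines for ch in line)
    return 33 if lowest < 64 else 64


validate_libraries(["illumina", "pacbio_clr"])  # passes silently
print(guess_phred_offset(["IIIIHHH#"]))  # 33, because '#' is ASCII 35
```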
SPAdes version: 3.15.3, released under GPLv2 on July 22, 2021.
Team members who developed & deployed the algorithm in KBase: Gavin Price. For questions, please contact us.
Related Publications
- Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology. 2012;19(5):455-477. doi:10.1089/cmb.2012.0021. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3342519/
- Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Current Protocols in Bioinformatics. 2020;70(1):e102. doi:10.1002/cpbi.102. https://currentprotocols.onlinelibrary.wiley.com/doi/10.1002/cpbi.102
App Specification:
https://github.com/kbaseapps/kb_SPAdes/tree/ee3f83f9d451a7d0ad86709e55e8653bee497e30/ui/narrative/methods/run_SPAdesModule Commit: ee3f83f9d451a7d0ad86709e55e8653bee497e30