Assemble DNA reads into a set of contigs (an Assembly object) using the ARAST Assembly Service.
This App is now inactive under the KBase Policy for App Deprecation because it is no longer supported by the developer.
This app performs automatic genome assembly using a range of computational tools. One or more assemblers can be run so that their results can be compared. The resulting assemblies are automatically processed by a collection of analysis tools developed by KBase and the research community. The app attempts to select the best assembly (the one with the fewest contigs and the longest average contig length) and suggests it to the user.
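The selection criteria above can be sketched as a lexicographic ranking. This is an illustration under assumptions, not ARAST's code: `AssemblyStats` and `pick_best` are hypothetical names, contig count is assumed to take priority, and average contig length is assumed to act only as a tie-breaker.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssemblyStats:
    """Hypothetical per-assembler summary: a name and its contig lengths."""
    name: str
    contig_lengths: List[int] = field(default_factory=list)

    @property
    def num_contigs(self) -> int:
        return len(self.contig_lengths)

    @property
    def mean_contig_length(self) -> float:
        # Empty assemblies have no meaningful mean; report 0.0 instead of dividing by zero.
        if not self.contig_lengths:
            return 0.0
        return sum(self.contig_lengths) / len(self.contig_lengths)


def pick_best(assemblies: List[AssemblyStats]) -> AssemblyStats:
    """Return the assembly with the fewest contigs, breaking ties by the
    longer mean contig length. Empty assemblies are ignored."""
    non_empty = [a for a in assemblies if a.num_contigs > 0]
    if not non_empty:
        raise ValueError("no non-empty assemblies to compare")
    # Negating the mean makes min() prefer the longer average contig length.
    return min(non_empty, key=lambda a: (a.num_contigs, -a.mean_contig_length))


candidates = [
    AssemblyStats("velvet", [5000, 3000, 2000, 1000]),
    AssemblyStats("spades", [8000, 2500, 500]),
    AssemblyStats("idba", [6000, 4000, 2000]),
]
print(pick_best(candidates).name)  # idba: ties spades on 3 contigs, longer mean
```

The individual recipes rank assemblies by ALE score or ARAST quality score, so this sketch only illustrates the contig-count and contig-length heuristic stated in the paragraph above.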
Several assembly workflows or "recipes" are available. These workflows have been tuned and tested to fit certain dataset types or desired analysis criteria such as throughput or rigor. The compute engine's flexible nature also enables the rapid design and emulation of other popular protocols.
Additionally, custom workflows can be designed and executed in "pipeline" mode without writing complicated scripts. A workflow can include any combination of quality filtering or trimming, error correction, adapter removal, assembly, scaffolding, and post-processing.
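As a rough illustration of composing a workflow from interchangeable stages, the sketch below chains read-processing steps into a single pipeline. The stage functions (`trim_trailing_n`, `drop_short_reads`) and `compose` are hypothetical stand-ins, not ARAST modules; in a real workflow each stage would wrap a tool such as those in the module descriptions.

```python
from functools import reduce
from typing import Callable, List

Reads = List[str]
Stage = Callable[[Reads], Reads]


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right: the output of one stage feeds the next."""
    def pipeline(reads: Reads) -> Reads:
        return reduce(lambda acc, stage: stage(acc), stages, reads)
    return pipeline


def trim_trailing_n(reads: Reads) -> Reads:
    """Hypothetical trimming stage: strip ambiguous 'N' bases from read ends."""
    return [r.rstrip("N") for r in reads]


def drop_short_reads(min_len: int) -> Stage:
    """Hypothetical filtering stage: keep reads of at least min_len bases."""
    def stage(reads: Reads) -> Reads:
        return [r for r in reads if len(r) >= min_len]
    return stage


# Trim first, then filter, so reads shortened by trimming are also checked.
preprocess = compose(trim_trailing_n, drop_short_reads(5))
print(preprocess(["ACGTACGTNN", "ACGNNN", "GGGGGG"]))  # ['ACGTACGT', 'GGGGGG']
```

Because every stage takes and returns the same type, stages can be added, removed, or swapped without touching the surrounding code. Order still matters: filtering before trimming would let "ACGNNN" through and leave a 3-base read.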
Assembly Recipe Descriptions:
Automatic Assembly:
- Provides a balance between the "fast pipeline" and the "smart pipeline"
- Runs BayesHammer[19] on reads
- Assembles with Velvet[25], IDBA[20] and SPAdes[2]
- Sorts assemblies by ALE score[7]
Fast Pipeline:
- Assembles with A6[1], Velvet[25] and SPAdes[2] (with BayesHammer[19] for error correction)
- Results are sorted by ARAST quality score
Smart Pipeline:
- Runs BayesHammer[19] on reads and KmerGenie[5] to choose the hash length (k-mer size) for Velvet[25]
- Assembles with Velvet[25], IDBA[20] and SPAdes[2]
- Sorts assemblies by ALE score[7]
- Merges the two best assemblies with GAM-NGS[24]
Kiki assembler[15]:
- Runs Kiki assembler
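The Smart Pipeline uses KmerGenie to choose Velvet's hash length. KmerGenie builds k-mer abundance histograms for several values of k, fits a model to estimate how many distinct genomic k-mers each k yields, and picks the k that maximizes that estimate. The sketch below swaps in a much cruder proxy, counting k-mers seen at least twice, to show the shape of the selection step. It is not KmerGenie's algorithm, and `solid_kmer_count` and `choose_k` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List


def solid_kmer_count(reads: Iterable[str], k: int, min_abundance: int = 2) -> int:
    """Count distinct k-mers that occur at least min_abundance times.

    k-mers seen fewer times are treated as likely sequencing errors. This is a
    crude proxy for KmerGenie's model-based estimate of genomic k-mers, and it
    ignores reverse complements for brevity.
    """
    counts = Counter(
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)  # empty range if read is shorter than k
    )
    return sum(1 for c in counts.values() if c >= min_abundance)


def choose_k(reads: List[str], candidate_ks: Iterable[int]) -> int:
    """Return the candidate k with the most solid k-mers (smallest k on ties)."""
    scores: Dict[int, int] = {k: solid_kmer_count(reads, k) for k in candidate_ks}
    # max() keeps the first maximum it sees, so iterating in sorted order
    # resolves ties toward the smaller k.
    return max(sorted(scores), key=lambda k: scores[k])


reads = ["ACGTACGT", "ACGTACGT"]
print(choose_k(reads, [7, 5, 3]))  # 3 (k=3 and k=5 tie with 4 solid k-mers)
```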
Pipeline Module Descriptions:
- sspace: SSPACE pre-assembled contig scaffolder[3]. Default values: extend: False, minimum_overlap: 15, a: 0.4, m: -1, n: -1, k: -1, x: 0
- trim_sort: DynamicTrim and LengthSort from SolexaQA[8]. Default values: probcutoff: 0.05, length: 25
- filter_by_length: Length-based filter and trimmer for sequencing reads, based on seqtk[11]. Default values: min: 250, end: 200, sync: True
- KmerGenie: Informed and automated k-mer size selection for genome assembly[5]
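The two steps of the trim_sort module can be sketched as follows. SolexaQA describes DynamicTrim as trimming each read to its longest contiguous segment of acceptable-quality bases, and LengthSort as discarding reads that end up too short. This is a re-implementation of that idea for illustration, not SolexaQA's code. The function names are hypothetical, and several details are assumptions: qualities are Phred+33 encoded, reads are single-end, and a base passes when its error probability is at most the cutoff. With the default probcutoff of 0.05, Phred 13 (P ≈ 0.050) fails and Phred 14 (P ≈ 0.040) passes.

```python
from typing import Optional, Tuple


def error_probability(qual_char: str, offset: int = 33) -> float:
    """Convert one Phred+33 quality character to a base-call error probability."""
    q = ord(qual_char) - offset
    return 10 ** (-q / 10)


def dynamic_trim(seq: str, qual: str, prob_cutoff: float = 0.05) -> Tuple[str, str]:
    """Keep the longest contiguous run of bases whose error probability is
    at most prob_cutoff (the earliest run wins on ties)."""
    best_start, best_len = 0, 0
    run_start = 0
    for i, ch in enumerate(qual):
        if error_probability(ch) > prob_cutoff:
            run_start = i + 1  # a low-quality base ends the current run
            continue
        run_len = i - run_start + 1
        if run_len > best_len:
            best_start, best_len = run_start, run_len
    end = best_start + best_len
    return seq[best_start:end], qual[best_start:end]


def trim_sort(seq: str, qual: str, prob_cutoff: float = 0.05,
              min_length: int = 25) -> Optional[Tuple[str, str]]:
    """Trim a read, then discard it (return None) if shorter than min_length."""
    trimmed_seq, trimmed_qual = dynamic_trim(seq, qual, prob_cutoff)
    if len(trimmed_seq) < min_length:
        return None
    return trimmed_seq, trimmed_qual


read_seq = "ACGT" * 10
read_qual = "I" * 30 + "#" + "I" * 9  # one low-quality base at position 30
result = trim_sort(read_seq, read_qual)
print(None if result is None else len(result[0]))  # 30
```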
Team members who developed and deployed the algorithm in KBase: Chris Bun, Fangfang Xia.
Related Publications
- [1] A6 GitHub source: https://github.com/levinas/a5
- [2] Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19: 455–477. doi:10.1089/cmb.2012.0021, https://www.liebertpub.com/doi/10.1089/cmb.2012.0021
- [3] Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 2011;27: 578–579. doi:10.1093/bioinformatics/btq683, https://academic.oup.com/bioinformatics/article/27/4/578/197626
- [4] Boisvert S, Raymond F, Godzaridis É, Laviolette F, Corbeil J. Ray Meta: scalable de novo metagenome assembly and profiling. Genome Biology. 2012;13: R122. doi:10.1186/gb-2012-13-12-r122, https://genomebiology.biomedcentral.com/articles/10.1186/gb-2012-13-12-r122
- [5] Chikhi R, Medvedev P. Informed and automated k-mer size selection for genome assembly. Bioinformatics. 2014;30: 31–37. doi:10.1093/bioinformatics/btt310, https://academic.oup.com/bioinformatics/article/30/1/31/235479
- [6] Chin C-S, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods. 2013;10: 563–569. doi:10.1038/nmeth.2474, https://www.nature.com/articles/nmeth.2474
- [7] Clark SC, Egan R, Frazier PI, Wang Z. ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies. Bioinformatics. 2013;29: 435–443. doi:10.1093/bioinformatics/bts723, https://academic.oup.com/bioinformatics/article/29/4/435/199222
- [8] Cox MP, Peterson DA, Biggs PJ. SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinformatics. 2010;11: 485. doi:10.1186/1471-2105-11-485, https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-485
- [9] Discovar source: https://software.broadinstitute.org/software/discovar/blog/
- [10] FastQC source: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- [11] Filter by Length GitHub source: https://github.com/levinas/seqtk
- [12] Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29: 1072–1075. doi:10.1093/bioinformatics/btt086, https://academic.oup.com/bioinformatics/article/29/8/1072/228832
- [13] Hunt M, Kikuchi T, Sanders M, Newbold C, Berriman M, Otto TD. REAPR: a universal tool for genome assembly evaluation. Genome Biology. 2013;14: R47. doi:10.1186/gb-2013-14-5-r47, https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-5-r47
- [14] Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119, https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-119
- [15] Kiki GitHub source: https://github.com/GeneAssembly/kiki
- [16] Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9: 357–359. doi:10.1038/nmeth.1923, https://www.nature.com/articles/nmeth.1923
- [17] Lassmann T, Hayashizaki Y, Daub CO. TagDust - a program to eliminate artifacts from next generation sequencing data. Bioinformatics. 2009;25: 2839–2840. doi:10.1093/bioinformatics/btp527, https://academic.oup.com/bioinformatics/article/25/21/2839/227883
- [18] Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25: 1754–1760. doi:10.1093/bioinformatics/btp324, https://academic.oup.com/bioinformatics/article/25/14/1754/225615
- [19] Nikolenko SI, Korobeynikov AI, Alekseyev MA. BayesHammer: Bayesian clustering for error correction in single-cell sequencing. BMC Genomics. 2013;14: S7. doi:10.1186/1471-2164-14-S1-S7, https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-14-S1-S7
- [20] Peng Y, Leung HCM, Yiu SM, Chin FYL. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012;28: 1420–1428. doi:10.1093/bioinformatics/bts174, https://academic.oup.com/bioinformatics/article/28/11/1420/266973
- [21] Simpson JT, Durbin R. Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 2012;22: 549–556. doi:10.1101/gr.126953.111, https://genome.cshlp.org/content/22/3/549.abstract
- [22] SWAP-Assembler source: https://sourceforge.net/projects/swapassembler/
- [23] Tritt A, Eisen JA, Facciotti MT, Darling AE. An Integrated Pipeline for de Novo Assembly of Microbial Genomes. PLOS ONE. 2012;7: e42304. doi:10.1371/journal.pone.0042304, https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0042304
- [24] Vicedomini R, Vezzi F, Scalabrin S, Arvestad L, Policriti A. GAM-NGS: genomic assemblies merger for next generation sequencing. BMC Bioinformatics. 2013;14: S6. doi:10.1186/1471-2105-14-S7-S6, https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-S7-S6
- [25] Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18: 821–829. doi:10.1101/gr.074492.107, https://genome.cshlp.org/content/18/5/821
- [26] Zimin AV, Marçais G, Puiu D, Roberts M, Salzberg SL, Yorke JA. The MaSuRCA genome assembler. Bioinformatics. 2013;29: 2669–2677. doi:10.1093/bioinformatics/btt476, https://academic.oup.com/bioinformatics/article/29/21/2669/195975
App Specification:
https://github.com/kbaseapps/ARAST_SDK/tree/056582c691c4df190110b059600d2dc2a3a8b80a/ui/narrative/methods/run_arastModule Commit: 056582c691c4df190110b059600d2dc2a3a8b80a