Assemble reads using the MaSuRCA assembler.
This is a KBase wrapper for MaSuRCA (Maryland Super Read Cabog Assembler). The MaSuRCA assembler combines the benefits of the de Bruijn graph and Overlap-Layout-Consensus (OLC) assembly approaches. Since version 3.2.1, it supports hybrid assembly that combines short Illumina reads with long, high-error-rate PacBio or MinION reads.
MaSuRCA combines long reads with higher error rates and shorter reads with lower error rates to create *mega-reads*, which are then assembled. It first extends the Illumina reads into *super-reads*, then aligns the super-reads to the PacBio reads to build the mega-reads. The assembly requires at least 10x coverage of PacBio reads and 100x coverage of Illumina reads.
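The coverage thresholds above can be checked before launching an assembly. The following is a minimal sketch, assuming coverage is estimated as total sequenced bases divided by the expected genome size; the function names are hypothetical and are not part of the KBase app or of MaSuRCA.

```python
# Minimum depths quoted in this app description.
MIN_PACBIO_X = 10
MIN_ILLUMINA_X = 100


def coverage(total_read_bases: int, genome_size: int) -> float:
    """Estimated sequencing depth: total sequenced bases / genome size (hypothetical helper)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


def meets_hybrid_requirements(illumina_bases: int, pacbio_bases: int, genome_size: int) -> dict:
    """Report Illumina and PacBio depth and whether both meet the stated minimums."""
    illumina_x = coverage(illumina_bases, genome_size)
    pacbio_x = coverage(pacbio_bases, genome_size)
    return {
        "illumina_x": illumina_x,
        "pacbio_x": pacbio_x,
        "ok": illumina_x >= MIN_ILLUMINA_X and pacbio_x >= MIN_PACBIO_X,
    }


# Example: a 5 Mb genome with 600 Mb of Illumina data (120x)
# but only 40 Mb of PacBio data (8x) falls short on long-read depth.
report = meets_hybrid_requirements(600_000_000, 40_000_000, 5_000_000)
print(report)
```

This estimate ignores read filtering and trimming, so the usable depth after preprocessing can be lower than the raw figure.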
Each DATA library is specified by its type, one of {PE,JUMP,OTHER,PACBIO}, and 5 fields:
- two_letter_prefix
- mean
- stdev
- fastq(.gz)_fwd_reads
- fastq(.gz)_rev_reads
PE reads are always assumed to be innies, i.e. --->.<---, and JUMP reads are assumed to be outties, <---.--->. If any jump libraries are innies, such as longjump, specify them as JUMP with a NEGATIVE mean. Reverse reads are optional for PE libraries and mandatory for JUMP libraries. Any OTHER sequence data (454, Sanger, Ion Torrent, etc.) must first be converted into Celera Assembler-compatible .frg files (see http://wgs-assembler.sourceforge.com).
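The field rules above can be captured in a small helper that formats PE and JUMP library entries. This is a minimal sketch, assuming the `TYPE= prefix mean stdev fwd [rev]` line layout and the `DATA` / `END_DATA` block used in MaSuRCA's sample configuration file; the `Library` class and `data_line` / `data_section` functions are hypothetical, and OTHER (.frg) and PACBIO entries, which take a single file path, are not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Library:
    kind: str                  # "PE" or "JUMP"
    prefix: str                # two-letter library prefix, e.g. "pe"
    mean: int                  # mean insert size; negative for innie JUMP libraries
    stdev: int                 # standard deviation of the insert size
    fwd: str                   # forward reads, fastq or fastq.gz
    rev: Optional[str] = None  # reverse reads: optional for PE, required for JUMP


def data_line(lib: Library) -> str:
    """Format one library as 'TYPE= prefix mean stdev fwd [rev]' (assumed layout)."""
    if lib.kind not in ("PE", "JUMP"):
        raise ValueError(f"unsupported library type for this sketch: {lib.kind}")
    if len(lib.prefix) != 2:
        raise ValueError("prefix must be exactly two characters")
    if lib.kind == "JUMP" and lib.rev is None:
        raise ValueError("JUMP libraries require reverse reads")
    fields = [lib.prefix, str(lib.mean), str(lib.stdev), lib.fwd]
    if lib.rev is not None:
        fields.append(lib.rev)
    return f"{lib.kind}= " + " ".join(fields)


def data_section(libs: List[Library]) -> str:
    """Wrap the formatted library lines in an assumed DATA ... END_DATA block."""
    return "\n".join(["DATA", *(data_line(lib) for lib in libs), "END_DATA"])


# Example: a PE library and an innie long-jump library (negative mean).
print(data_section([
    Library("PE", "pe", 180, 20, "/data/frag_1.fastq.gz", "/data/frag_2.fastq.gz"),
    Library("JUMP", "lj", -5000, 500, "/data/jump_1.fastq.gz", "/data/jump_2.fastq.gz"),
]))
```

Validating the JUMP reverse-read requirement up front surfaces a misconfigured library before the assembler is started.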
More information can be found at the MaSuRCA homepage.
Related Publications
- Zimin AV, Marçais G, Puiu D, Roberts M, Salzberg SL, Yorke JA. The MaSuRCA genome assembler. Bioinformatics. 2013;29: 2669-2677. doi:10.1093/bioinformatics/btt476, https://academic.oup.com/bioinformatics/article/29/21/2669/195975
App Specification:
https://github.com/kbaseapps/kb_MaSuRCA/tree/5d815d6f7019b3dc9685306677e475b051de107b/ui/narrative/methods/run_masurca_assembler
Module Commit: 5d815d6f7019b3dc9685306677e475b051de107b