Assembly and Annotation in KBase

KBase provides multiple Apps for de novo assembly of prokaryotic Next-Generation Sequencing (NGS) reads from various sequencing platforms. These assemblies can then be annotated with RAST or Prokka, enabling you to explore structural and functional features of a Genome or use it in other analyses. The interactive Assembly & Annotation Narrative tutorial is a good way to learn about this functionality.

KBase Assembly and Annotation Capabilities

Reads Quality Control and Assessment

  • Read trimming and adaptor removal with Trimmomatic
  • Filter low complexity reads with PRINSEQ
  • Quality assessment and reporting using FastQC
  • Custom adapter removal with cutadapt

De novo Assembly

  • De novo assembly of Illumina and Ion Torrent next-generation sequencing reads
  • Multiple popular assemblers: A5, A6, IDBA-UD, Kiki, MaSuRCA, MEGAHIT, MiniASM, Ray, SPAdes and Velvet
  • Support for single-end and paired-end read libraries
  • Compare assemblies with QUAST
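
When comparing assemblies, one of the contiguity statistics QUAST reports is N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The following is a minimal sketch of that definition, not QUAST's own implementation; the function name n50 and the example contig lengths are assumptions made for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if it is empty)."""
    # Consider contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to at least half
        # of the total assembly length defines N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths (in bp) from two assemblies of the same reads.
assembly_a = [100, 80, 60, 40, 20]
assembly_b = [150, 50, 40, 30, 30]
print(n50(assembly_a), n50(assembly_b))
```

A higher N50 suggests a more contiguous assembly, but it says nothing about correctness, so it is best read alongside the other statistics in the QUAST report.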

Genome Annotation

  • Annotate structural and functional features of prokaryotic genomes using RAST (Rapid Annotations using Subsystems Technology)
  • Annotate coding DNA and protein sequences in plants with RAST
  • Annotate features of prokaryotic genomes, plasmids and metagenomes using Prokka
  • Annotate a genome with protein domains from widely used domain libraries: CDD, TIGRFAM, Pfam

The output of the annotation apps is a Genome, which is displayed in a tabular genome viewer (see below) that shows information about the Genome as well as a list of contigs and the genes that were called on each contig.



Assembly Apps

  • Assemble Contigs from Reads – runs several different assembly programs and lets users compare the quality of outputs (see above for more information).
  • Assemble with A5 – A5-miseq is well suited to high-quality microbial genome assembly and requires no parameter tuning by the user. It is an integrated meta-assembly pipeline that cleans reads, performs error correction, assembles contigs, performs scaffolding and then corrects misassemblies before constructing the final scaffold.
  • Assemble with A6 – A6 is an Argonne-modified version of the original A5 microbial assembly. A6’s modifications over A5 include a bug fix in detecting Phred64 quality coding and replacing IDBA with IDBA-UD for improved assembly accuracy and stability.
  • Assemble with IDBA-UD – IDBA-UD is an iterative graph-based assembler for single-cell and standard short-read data and is well suited to data with highly uneven sequencing depth. It iterates over a range of k-mer sizes, which compensates for the information loss associated with de Bruijn graphs built from a single k-mer size, making IDBA-UD one of the more accurate microbial assemblers.
  • Assemble with Kiki – Kiki is a fast, parallel microbial and metagenomic assembler that uses a hybrid of the overlap-layout-consensus strategy and greedy contig extension. Compared to de Bruijn graph-based methods, this approach loses less information because reads are not chopped into shorter k-mers.
  • Assemble with MaSuRCA – MaSuRCA is a short-read assembler that combines the benefits of the de Bruijn graph and overlap-layout-consensus assembly approaches. The main concept is the creation of super-reads that contain the sequence information present in the original reads. These super-reads are extended in both directions using an efficient k-mer lookup table. MaSuRCA is one of a smaller set of assemblers that biologists use for eukaryotic assembly.
  • Assemble with MEGAHIT – MEGAHIT is a single-node assembler for large and complex metagenomic NGS reads. It uses a succinct de Bruijn graph (SdBG) to achieve low-memory assembly, making it fast and especially suitable for assembly of small metagenomes, metatranscriptomes or low-coverage data in general.
  • Assemble with MiniASM – MiniASM is an ultra-fast overlap-layout-consensus based de novo assembler for noisy long reads. It has been shown to assemble ~50X microbial PacBio reads into a draft assembly of a small number of contigs in a matter of minutes. MiniASM derives this performance from a locality-sensitive hashing based overlapper implemented in minimap.
  • Assemble with Ray – Ray is a parallel, graph-based microbial and metagenomic assembler. Ray improves on the standard de Bruijn graph based algorithm in three ways: it continues contig-building at the unitigs by employing greedy heuristics to extend paths, it keeps track of the reads from which the k-mers came and of the read pairs from paired-end reads, and it uses a repeat removal algorithm inspired by SPAdes.
  • Assemble with SPAdes – SPAdes is a single-cell and standard assembler based on paired de Bruijn graphs, considered to be one of the most accurate microbial assemblers. SPAdes employs a multisized de Bruijn graph, detects and removes bubbles and chimeric reads, estimates insert distance from paired k-mers, and computes contigs based on the paired assembly graph.
  • Assemble with Velvet – Velvet is a classic de Bruijn graph based assembler that works by efficiently manipulating de Bruijn graphs through simplification and compression. It eliminates errors by first using an error correction algorithm that merges sequences together. Repeats are then resolved by a repeat solver that separates paths that share local overlaps.
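
Several of the assemblers above are built on de Bruijn graphs, in which reads are chopped into overlapping k-mers and contigs are read off unambiguous paths. The following is a toy sketch of that idea only. It is not the algorithm of any specific assembler listed here, it ignores sequencing errors, reverse complements and repeats, and the function names and example reads are assumptions made for illustration.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every overlapping substring of length k from a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from start while the path has no branches."""
    in_degree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1

    contig, node, seen = start, start, {start}
    # Continue only while the current node has exactly one successor.
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        # Stop at a merge point or when the path loops back on itself.
        if in_degree[nxt] != 1 or nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Three hypothetical error-free reads that overlap along one short sequence.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = build_de_bruijn(reads, k=4)
print(walk_unambiguous(graph, "ATG"))  # ATGGCGTGCAA
```

The choice of k is the trade-off the descriptions above refer to: a small k connects more reads but collapses repeats, while a large k separates repeats but breaks the graph where coverage is low. Iterating over several k values, as IDBA-UD and SPAdes do, is one way to balance the two.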

Annotation Apps

Assembly and Annotation Tutorials