View distributions of contig characteristics for different assemblies.
Compare Assembled Contig Distributions allows the user to do a side-by-side comparison of assemblies in terms of their lengths and size distribution of the component contigs. Length and distribution are important because longer contigs are typically more desirable. The output contains several plots which were chosen because they emphasize the contribution of longer contigs. The plots and the colored table are essentially identical to the source of their inspiration: QUAST. Although QUAST is not actually run, instead the values are computed by this App. This App also has a vertical table layout of the assemblies, and additionally offers histograms of the contig lengths, broken up into length regimes to allow for more visible differences in the longer regimes with fewer counts.
Assembly Object(s): The Assembly object is a collection of assembled genome fragments, called "contigs". Most commonly, assembly objects can be created through the use of one of the many assembly Apps being run on reads, but they can also be imported as fasta, or as part of a GenBank genome. The contig length distributions usually differ depending on the input sequence data, the assembler, and the parameterization of the assembler. These comparisons are meant to aid the user in selecting an assembly that might be best for their downstream purposes, such as for extracting genomes from a metagenome. This App may be run on a single Assembly, but is really meant for comparing multiple assemblies that may be provided one-by-one or as a group in one or more AssemblySet objects. This tool can be an effective way to see what assembler works best for the user s data.
Output Object: This App does not create an output object.
- The top of the report has two plots comparing the assemblies by cumulative sum of contig lengths.
- The first plot reports the cumulative sum on the y-axis with the sorted contig lengths on the x-axis. The contigs are sorted from the longest to the shortest and the scale is the percent of contigs represented.
- The second plot again offers a cumulative sum. This time, the cumulative sum is on the x-axis while the y-axis reports the contig lengths (instead of percent of contigs). This plot is similar to the Nx plot available from QUAST but uses the summed length instead of percent of contigs as the x-coordinate.
- Below the two plots is a table of information on each Assembly. The categories in the table are colored from blue (BEST) to red (WORST) for each category across the participating Assembly objects. The columns in the table below are as follows:
- ASSEMBLY: There is one entry for this for each participating assembly object.
- Nx/Lx For further details, please see the following example.
- The next 3 columns are meant to be read as a group:
- LENGTH(bp): Contig length threshold
- NUM CONTIGS: Number of contigs that are >= to the threshold length
- SUM LENGTH (bp): Running sum of all bp length of all contigs that are >= to the threshold length
- Contig Length Histogram (1bp <= len < 10Kbp) : Displays contig length distributions of short contigs (under 10Kbp)
- Contig Length Histogram (10Kbp <= len < 100Kbp) : Displays contig length distributions of medium length contigs (10Kbp to 100Kbp)
- Contig Length Histogram (len >= 100Kbp) : Displays contig length distributions of long length contigs (greater than 100Kbp)
Downloadable files: All the plots from this App are available for download in both PNG image and PDF document formats. The HTML report may be saved using the browser.
Team members who developed & deployed App in KBase: Dylan Chivian. For questions, please contact us.
- Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163 , https://www.nature.com/articles/nbt.4163
Module Commit: acf7ecefee20159051a34bbd3a1d2efbebc143c0