Generated February 23, 2020

Genome Extraction from Shotgun Metagenome Sequence Data

This tutorial guides the user through obtaining high-quality genomes, and their phylogenetic placement, from a metagenome assembly.


Overview

KBase has powerful tools for metabolic modeling and comparative phylogenomics of microbial genomes that can be used for developing mechanistic understanding of functional interactions between species in microbial ecosystems. Essential to this process is obtaining high-quality genomes to annotate, either via cultivation or genome extraction from metagenome assembly. KBase has a suite of microbiome analysis Apps meant to be used in concert. After assembly and binning, high-quality bins are annotated and can then be used in Comparative Phylogenomics analyses (see Narrative here) and Metabolic Reconstruction and Community Interaction Modeling (see Narrative here).

Below we present the processing of two related Compost Enrichment Metagenomes (37A & 37B) [Ionic Liquids Impact the Bioenergy Feedstock-Degrading Microbiome and Transcription of Enzymes Relevant to Polysaccharide Hydrolysis] from the Joint BioEnergy Institute (JBEI).

Authors: Dylan Chivian (DCChivian@lbl.gov) & Mikayla Clark (clarkmm1@ornl.gov)

Main Lessons from this Narrative

  • Learn how to perform Quality Control of read libraries
  • Learn how to measure taxonomic population structure of environmental shotgun reads
  • Learn how to assemble metagenomes
  • Learn how to compare assembly quality
  • Learn how to bin metagenomic scaffolds into putative lineages (metagenome-assembled genomes, or MAGs)
  • Learn how to assess MAG quality and extract to individual genome assembly objects
  • Learn how to annotate genes to obtain KBase genomes which can be used by KBase analysis Apps
  • Learn how to place MAGs into reference species tree

A Word on Timing

While KBase boasts faster processing and run time on many apps over competitors, please keep in mind that large data sets do take time to analyze. Queue times for metagenomic data sets can appear lengthy during periods of high traffic on the servers. Once through the queue, we hope you enjoy our faster run times of hours and days for complex algorithms over the months it can take using other avenues.

As an example, the table below displays the queue time, run time, and average run time for a selection of apps used in this tutorial.

Note: Queue and run times are from within this Narrative while the average run time is calculated from all jobs run across KBase with that particular app.

App                         | Queue Time | Run Time | Average Run Time
Kaiju                       | 1s         | 1h 37m   | 1h 3m
Assemble with metaSPAdes    | 11h 8m     | 7h 13m   | 9h
Assemble Reads with MEGAHIT | 1s         | 6h 2m    | 2h 58m
Assemble with IDBA-UD       | 3h 19m     | 7h 41m   | 4h 41m
MaxBin2 Contig Binning      | 2s         | 4h 19m   | 1h 13m
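
To compare a run time in this Narrative against the KBase-wide average, the duration strings in the table can be converted to seconds. The following is a minimal sketch; the function name parse_duration and the dictionary layout are illustrative choices, not part of KBase.

```python
import re

# Seconds per unit, covering the spellings used in the timing table.
UNIT_SECONDS = {"h": 3600, "hr": 3600, "m": 60, "s": 1}

def parse_duration(text):
    """Convert a string such as '1h 37m' or '2s' to a number of seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(hr|h|m|s)", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total

# (run time in this Narrative, average run time across KBase), from the table.
run_times = {
    "Kaiju": ("1h 37m", "1h 3m"),
    "Assemble with metaSPAdes": ("7h 13m", "9h"),
    "Assemble Reads with MEGAHIT": ("6h 2m", "2h 58m"),
    "Assemble with IDBA-UD": ("7h 41m", "4h 41m"),
    "MaxBin2 Contig Binning": ("4h 19m", "1h 13m"),
}

for app, (run, average) in run_times.items():
    ratio = parse_duration(run) / parse_duration(average)
    print(f"{app}: {ratio:.2f}x the average run time")
```

A ratio above 1 means the job in this Narrative ran longer than the typical job for that App, which is expected for large metagenomic inputs.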

Description of Apps

  • FastQC allows users to check the quality of raw sequence data generated by high-throughput sequencing pipelines. The report flags each analysis module as normal (green tick), slightly abnormal (orange exclamation mark), or very unusual (red cross).
  • Trimmomatic [1] performs a variety of useful trimming tasks for paired- or single-end Illumina reads, improving the overall quality of the data.
  • Kaiju [3,5] generates fast and sensitive taxonomic classification for metagenomic reads by comparing sequences to databases of known microbial proteins. It also generates an interactive metagenomic visualization chart.
  • metaSPAdes [4] assembles metagenomic reads using the SPAdes assembler.
  • MEGAHIT [2] assembles metagenomic reads using the MEGAHIT assembler.
  • IDBA-UD [7] assembles paired-end reads from single-cell or metagenomic sequencing technologies using the IDBA-UD assembler.
  • Compare Assembled Contig Distributions allows the user to view distributions of contig characteristics for different assembly runs.
  • MaxBin2 Contig Binning [9,10] uses nucleotide composition information, source strain abundance, and phylogenetic marker genes to perform binning through an Expectation-Maximization algorithm.
  • Assess Genome Quality with CheckM [6] provides a set of tools for assessing the quality of genomes or metagenomes. It also generates robust estimates of genome completeness and contamination.
  • Extract Bins as Assemblies from BinnedContigs extracts bins from a BinnedContig dataset as Assembly objects.
  • Annotate Microbial Assembly annotates a bacterial or archaeal assembly using the RAST (Rapid Annotations using Subsystems Technology) pipeline.
  • Build GenomeSet allows the user to group Genomes into a GenomeSet.
  • Insert Set of Genomes into Species Tree [8] constructs a phylogenetic tree combining the GenomeSet provided by the user with a set of closely related genomes from the KBase list of species.
  • View Tree displays a SpeciesTree or GeneTree as an image and allows users to download images and NEWICK representations. (Note: This app is in Beta, so the Apps panel must be switched from Released to Beta before the app can be added to the Narrative.)

1. Read Hygiene

We begin by importing the sets of paired-end reads in FASTQ format for metagenomes 37A and 37B. The import App creates a PairedEndLibrary object that we can then run through FastQC and Trimmomatic to determine and improve the quality of the reads, respectively.

Note: Running FastQC a second time after the reads had been run through Trimmomatic showed marked improvement in their quality. For this reason, the trimmed reads will be used for further analysis.
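
The trimmed library names (headcrop5_crop140) suggest that Trimmomatic was run with a head crop of 5 bases and a crop to 140 bases. The sketch below illustrates what those two operations do to a single read. It is not Trimmomatic itself: the parameter values are inferred from the object names, and the order (head crop first, then crop) is an assumption.

```python
def head_crop(seq, qual, n):
    """Remove the first n bases (and their quality scores) from a read."""
    return seq[n:], qual[n:]

def crop(seq, qual, length):
    """Keep at most `length` bases, dropping any excess from the 3' end."""
    return seq[:length], qual[:length]

def trim_read(seq, qual, headcrop=5, crop_len=140):
    """Apply the head crop, then the crop (assumed order)."""
    seq, qual = head_crop(seq, qual, headcrop)
    return crop(seq, qual, crop_len)

# A made-up 152 bp read with uniform quality characters.
read_seq = "ACGT" * 38
read_qual = "I" * len(read_seq)

trimmed_seq, trimmed_qual = trim_read(read_seq, read_qual)
print(len(read_seq), "->", len(trimmed_seq))
```

Under these assumptions no trimmed read is longer than 140 bases, and reads shorter than 145 bases simply lose their first 5 bases.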
Import A FASTQ/SRA File As Reads Into Your Narrative
This app completed without errors in 2h 57m 3s.
Summary
Import Finished Imported Reads: 1 Reads Name: 37A_6437.3.44325.CTTGTA.adnq.fastq.gz_reads Imported Reads File: 6437.3.44325.CTTGTA.adnq.fastq.gz Number of Reads: 95,062,766
Output from Import FASTQ/SRA File As Reads From Staging Area
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/33233
Import A FASTQ/SRA File As Reads Into Your Narrative
This app completed without errors in 2h 46m 36s.
Summary
Import Finished Imported Reads: 1 Reads Name: 37B_6385.3.43508.GATCAG.adnq.fastq.gz_reads Imported Reads File: 6385.3.43508.GATCAG.adnq.fastq.gz Number of Reads: 76,779,376
Output from Import FASTQ/SRA File As Reads From Staging Area
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/33233
A quality control application for high throughput sequence data.
This app completed without errors in 38m 24s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • 37A_6437.3.44325.CTTGTA.adnq.fastq.gz_reads_24019_8_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • 37A_6437.3.44325.CTTGTA.adnq.fastq.gz_reads_24019_8_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
A quality control application for high throughput sequence data.
This app completed without errors in 30m 58s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • 37B_6385.3.43508.GATCAG.adnq.fastq.gz_reads_24019_4_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • 37B_6385.3.43508.GATCAG.adnq.fastq.gz_reads_24019_4_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 50m 52s.
Objects
Created Object Name Type Description
37A_Trimm_headcrop5_crop140.PElib_paired PairedEndLibrary Trimmed Reads
37A_Trimm_headcrop5_crop140.PElib_unpaired_fwd SingleEndLibrary Trimmed Unpaired Forward Reads
37A_Trimm_headcrop5_crop140.PElib_unpaired_rev SingleEndLibrary Trimmed Unpaired Reverse Reads
Report
Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 33m 3s.
Objects
Created Object Name Type Description
37B_Trimm_headcrop5_crop140.PELib_paired PairedEndLibrary Trimmed Reads
37B_Trimm_headcrop5_crop140.PELib_unpaired_fwd SingleEndLibrary Trimmed Unpaired Forward Reads
37B_Trimm_headcrop5_crop140.PELib_unpaired_rev SingleEndLibrary Trimmed Unpaired Reverse Reads
Report
A quality control application for high throughput sequence data.
This app completed without errors in 29m 20s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • 37A_Trimm_headcrop5_crop140.PElib_paired_24019_61_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • 37A_Trimm_headcrop5_crop140.PElib_paired_24019_61_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
A quality control application for high throughput sequence data.
This app completed without errors in 24m 41s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • 37B_Trimm_headcrop5_crop140.PELib_paired_24019_57_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • 37B_Trimm_headcrop5_crop140.PELib_paired_24019_57_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report

2. Classify Taxonomy

Our downstream analysis will generate Species Trees of the organisms present in the compost samples. Classifying the taxonomy with Kaiju is important because it predicts the microbial composition directly from the reads, based on protein similarity, rather than from genome assembly and annotation. This prediction provides an independent reference against which the Species Trees can be compared.
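
As a rough illustration of how per-read classifications become a population profile, the sketch below tallies reads per taxon. It assumes a three-column, tab-separated layout (classification flag C or U, read name, NCBI taxon ID), which is our understanding of Kaiju's default output; the example lines and taxon IDs are made up, and the function name summarize_classifications is hypothetical.

```python
from collections import Counter

def summarize_classifications(lines):
    """Count classified reads per taxon ID and return the classified fraction."""
    per_taxon = Counter()
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += 1
        flag, _read_name, taxon_id = fields[:3]
        if flag == "C":
            per_taxon[taxon_id] += 1
    classified = sum(per_taxon.values())
    fraction = classified / total if total else 0.0
    return per_taxon, fraction

# Made-up example records, not taken from this Narrative's results.
example_lines = [
    "C\tread1\t1234",
    "C\tread2\t1234",
    "U\tread3\t0",
    "C\tread4\t5678",
]

counts, classified_fraction = summarize_classifications(example_lines)
print(counts.most_common(), classified_fraction)
```

The App's own summaries and Krona charts report the same kind of tallies at each taxonomic rank, so a sketch like this is only needed for custom downstream comparisons.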

Taxonomic Classification of Shotgun Metagenomic Read data
This app completed without errors in 1h 44m 29s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • kaiju_classifications.zip
  • kaiju_summaries.zip
  • krona_data.zip
  • stacked_bar_abundance_plots_PNG+PDF.zip

3. Assemble

Now that we have cleaned the reads, we can move on to the next step: assembling the reads into contiguous fragments (contigs) thus creating the scaffolding of the whole genome. KBase offers several metagenomic assemblers and a tool for comparing their output (similar to QUAST). We will run three assembly Apps below.

Note: metaSPAdes only accepts a single library as input, so the App Merge Multiple ReadsLib to One ReadsLib was used to combine the reads from 37A and 37B. Because the combined 37AB reads produced the largest contig and the most contigs over 100,000 bp, it will be used as input for the MEGAHIT and IDBA-UD assemblers.
Note: We will run the MEGAHIT assembler twice using different parameters but the same input data. The first run will use "meta-large" as its preset. This is a setting catered towards large and complex assemblies. The second run will use the "meta-sensitive" preset. This parameter generates a more sensitive assembly but runs slower.
Assemble metagenomic reads using the SPAdes assembler.
This app completed without errors in 4h 5m 16s.
Objects
Created Object Name Type Description
37A_metaSPAdes.contigs Assembly Assembled contigs
Summary
Assembly saved to: dylan:narrative_1504634186710/37A_metaSPAdes.contigs Assembled into 16576 contigs. Avg Length: 8412.40968871 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 16400 -- 2000.0 to 95802.9 bp 118 -- 95802.9 to 189605.8 bp 32 -- 189605.8 to 283408.7 bp 12 -- 283408.7 to 377211.6 bp 4 -- 377211.6 to 471014.5 bp 2 -- 471014.5 to 564817.4 bp 2 -- 564817.4 to 658620.3 bp 2 -- 658620.3 to 752423.2 bp 1 -- 752423.2 to 846226.1 bp 3 -- 846226.1 to 940029.0 bp
Links
Assemble metagenomic reads using the SPAdes assembler.
This app completed without errors in 3h 39m 27s.
Objects
Created Object Name Type Description
37B_metaSPAdes.contigs Assembly Assembled contigs
Summary
Assembly saved to: dylan:narrative_1504634186710/37B_metaSPAdes.contigs Assembled into 15751 contigs. Avg Length: 9773.91854485 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 15615 -- 2000.0 to 125927.3 bp 90 -- 125927.3 to 249854.6 bp 28 -- 249854.6 to 373781.9 bp 10 -- 373781.9 to 497709.2 bp 3 -- 497709.2 to 621636.5 bp 2 -- 621636.5 to 745563.8 bp 0 -- 745563.8 to 869491.1 bp 1 -- 869491.1 to 993418.4 bp 0 -- 993418.4 to 1117345.7 bp 2 -- 1117345.7 to 1241273.0 bp
Links
Merge Multiple Reads Libraries into One Reads Library
This app completed without errors in 1h 12m 19s.
Objects
Created Object Name Type Description
37AB_trimm_headcrop5_crop140.PELib_paired PairedEndLibrary 37A and 37B trimmed and merged
Summary
NUM READS LIBRARIES COMBINED INTO ONE READS LIBRARY: 2
Output from Merge Multiple ReadsLibs to One ReadsLib - v1.0.1
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/33233
Assemble metagenomic reads using the SPAdes assembler.
This app completed without errors in 7h 13m 50s.
Objects
Created Object Name Type Description
37AB_metaSPAdes.contigs Assembly Assembled contigs
Summary
Assembly saved to: mm_clark:narrative_1528825054112/37AB_metaSPAdes.contigs Assembled into 27508 contigs. Avg Length: 9329.28079104 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 27351 -- 2000.0 to 141371.1 bp 119 -- 141371.1 to 280742.2 bp 24 -- 280742.2 to 420113.3 bp 4 -- 420113.3 to 559484.4 bp 4 -- 559484.4 to 698855.5 bp 2 -- 698855.5 to 838226.6 bp 2 -- 838226.6 to 977597.7 bp 0 -- 977597.7 to 1116968.8 bp 0 -- 1116968.8 to 1256339.9 bp 2 -- 1256339.9 to 1395711.0 bp
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 6h 2m 44s.
Objects
Created Object Name Type Description
37AB_MEGAHIT_metalarge.contigs Assembly Assembled contigs
Summary
ContigSet saved to: mm_clark:narrative_1528825054112/37AB_MEGAHIT_metalarge.contigs Assembled into 32090 contigs. Avg Length: 8049.4669679 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 32015 -- 2000.0 to 151907.3 bp 53 -- 151907.3 to 301814.6 bp 15 -- 301814.6 to 451721.9 bp 3 -- 451721.9 to 601629.2 bp 2 -- 601629.2 to 751536.5 bp 1 -- 751536.5 to 901443.8 bp 0 -- 901443.8 to 1051351.1 bp 0 -- 1051351.1 to 1201258.4 bp 0 -- 1201258.4 to 1351165.7 bp 1 -- 1351165.7 to 1501073.0 bp
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 6h 49m 57s.
Objects
Created Object Name Type Description
37AB_MEGAHIT_metasensitive.contigs Assembly Assembled contigs
Summary
ContigSet saved to: mm_clark:narrative_1528825054112/37AB_MEGAHIT_metasensitive.contigs Assembled into 31055 contigs. Avg Length: 8335.70017711 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 30968 -- 2000.0 to 151910.1 bp 60 -- 151910.1 to 301820.2 bp 20 -- 301820.2 to 451730.3 bp 3 -- 451730.3 to 601640.4 bp 2 -- 601640.4 to 751550.5 bp 0 -- 751550.5 to 901460.6 bp 1 -- 901460.6 to 1051370.7 bp 0 -- 1051370.7 to 1201280.8 bp 0 -- 1201280.8 to 1351190.9 bp 1 -- 1351190.9 to 1501101.0 bp
Links
Assemble paired-end reads from single-cell or metagenomic sequencing technologies using the IDBA-UD assembler.
This app completed without errors in 7h 41m 36s.
Objects
Created Object Name Type Description
37AB_IDBA-UD.contigs Assembly Assembled contigs
Summary
Assembly saved to: mm_clark:narrative_1528825054112/37AB_IDBA-UD.contigs Assembled into 23276 contigs. Avg Length: 8888.39615913 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 22716 -- 2000.0 to 43954.7 bp 371 -- 43954.7 to 85909.4 bp 99 -- 85909.4 to 127864.1 bp 38 -- 127864.1 to 169818.8 bp 26 -- 169818.8 to 211773.5 bp 11 -- 211773.5 to 253728.2 bp 4 -- 253728.2 to 295682.9 bp 4 -- 295682.9 to 337637.6 bp 3 -- 337637.6 to 379592.3 bp 4 -- 379592.3 to 421547.0 bp
Links

4. Compare Contigs

Now that we have six sets of contigs generated from our reads using various assemblers, we can examine them side-by-side to determine the one of highest quality. Using the best assembly will produce more accurate results in downstream analyses.

The table below summarizes the most important values (high N50, low L50, fewest contigs) generated by running Compare Assembled Contig Distributions. The best value in each category is marked with asterisks.

N50: the length of the shortest contig in the smallest set of longest contigs that together contain at least 50% of the total assembly length.

L50: the smallest number of contigs whose summed length is at least 50% of the total assembly length (that is, the number of contigs of length N50 or longer).
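
The two definitions above can be sketched in a few lines of Python. This is a minimal illustration using the standard definitions, not the code used by the KBase App, and the function name n50_l50 is our own.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")

# Toy example: total length 300, so half is 150.
# 100 + 80 = 180 >= 150, giving N50 = 80 and L50 = 2.
toy_n50, toy_l50 = n50_l50([100, 80, 60, 40, 20])
print(toy_n50, toy_l50)
```

A higher N50 together with a lower L50 means that more of the assembly is contained in fewer, longer contigs.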

Assembly                   | Number of Contigs | Longest Contig (bp) | N50     | L50    | Contigs ≥ 10^6 bp | Sum Length of Contigs ≥ 10^6 bp (bp)
37AB_metaSPAdes            | 27508             | 1395711             | *23793* | *1871* | *2*               | *2791420*
37AB_MEGAHIT_metalarge     | 32090             | 1501073             | 14663   | 3350   | 1                 | 1501073
37AB_MEGAHIT_metasensitive | 31055             | *1501101*           | 16426   | 3008   | 1                 | 1501101
37AB_IDBA-UD               | *23276*           | 421547              | 15910   | 2717   | 0                 | 0
Note: The Assembly from running metaSPAdes on the combined library generated the highest quality contigs (37AB_metaSPAdes.contigs). Therefore, it will be used for further analysis.
View distributions of contig characteristics for different assemblies
This app completed without errors in 8m 38s.
Summary
ASSEMBLY STATS for 37A_metaSPAdes.contigs
  Len longest contig: 940029 bp
  N50 (L50): 23179 (967)
  N75 (L75): 4913 (4776)
  N90 (L90): 2758 (10599)
  Num contigs >= 1000000 bp: 0
  Num contigs >= 100000 bp: 162
  Num contigs >= 10000 bp: 2129
  Num contigs >= 1000 bp: 16576
  Num contigs >= 500 bp: 16576
  Num contigs >= 1 bp: 16576
  Len contigs >= 1000000 bp: 0 bp
  Len contigs >= 100000 bp: 34022723 bp
  Len contigs >= 10000 bp: 86871366 bp
  Len contigs >= 1000 bp: 139444103 bp
  Len contigs >= 500 bp: 139444103 bp
  Len contigs >= 1 bp: 139444103 bp

ASSEMBLY STATS for 37B_metaSPAdes.contigs
  Len longest contig: 1241273 bp
  N50 (L50): 25790 (990)
  N75 (L75): 6686 (4063)
  N90 (L90): 3042 (9403)
  Num contigs >= 1000000 bp: 2
  Num contigs >= 100000 bp: 188
  Num contigs >= 10000 bp: 2733
  Num contigs >= 1000 bp: 15751
  Num contigs >= 500 bp: 15751
  Num contigs >= 1 bp: 15751
  Len contigs >= 1000000 bp: 2384295 bp
  Len contigs >= 100000 bp: 40194065 bp
  Len contigs >= 10000 bp: 104646915 bp
  Len contigs >= 1000 bp: 153948991 bp
  Len contigs >= 500 bp: 153948991 bp
  Len contigs >= 1 bp: 153948991 bp

ASSEMBLY STATS for 37AB_metaSPAdes.contigs
  Len longest contig: 1395711 bp
  N50 (L50): 23793 (1871)
  N75 (L75): 6077 (7642)
  N90 (L90): 2998 (16930)
  Num contigs >= 1000000 bp: 2
  Num contigs >= 100000 bp: 294
  Num contigs >= 10000 bp: 4567
  Num contigs >= 1000 bp: 27508
  Num contigs >= 500 bp: 27508
  Num contigs >= 1 bp: 27508
  Len contigs >= 1000000 bp: 2791420 bp
  Len contigs >= 100000 bp: 57207896 bp
  Len contigs >= 10000 bp: 168877596 bp
  Len contigs >= 1000 bp: 256629856 bp
  Len contigs >= 500 bp: 256629856 bp
  Len contigs >= 1 bp: 256629856 bp

ASSEMBLY STATS for 37AB_MEGAHIT_metalarge.contigs
  Len longest contig: 1501073 bp
  N50 (L50): 14663 (3350)
  N75 (L75): 5247 (11039)
  N90 (L90): 2868 (21229)
  Num contigs >= 1000000 bp: 1
  Num contigs >= 100000 bp: 194
  Num contigs >= 10000 bp: 5366
  Num contigs >= 1000 bp: 32090
  Num contigs >= 500 bp: 32090
  Num contigs >= 1 bp: 32090
  Len contigs >= 1000000 bp: 1501073 bp
  Len contigs >= 100000 bp: 35667574 bp
  Len contigs >= 10000 bp: 153312343 bp
  Len contigs >= 1000 bp: 258307395 bp
  Len contigs >= 500 bp: 258307395 bp
  Len contigs >= 1 bp: 258307395 bp

ASSEMBLY STATS for 37AB_MEGAHIT_metasensitive.contigs
  Len longest contig: 1501101 bp
  N50 (L50): 16426 (3008)
  N75 (L75): 5431 (10198)
  N90 (L90): 2893 (20226)
  Num contigs >= 1000000 bp: 1
  Num contigs >= 100000 bp: 183
  Num contigs >= 10000 bp: 5301
  Num contigs >= 1000 bp: 31055
  Num contigs >= 500 bp: 31055
  Num contigs >= 1 bp: 31055
  Len contigs >= 1000000 bp: 1501101 bp
  Len contigs >= 100000 bp: 36201788 bp
  Len contigs >= 10000 bp: 158541746 bp
  Len contigs >= 1000 bp: 258865169 bp
  Len contigs >= 500 bp: 258865169 bp
  Len contigs >= 1 bp: 258865169 bp

ASSEMBLY STATS for 37AB_IDBA-UD.contigs
  Len longest contig: 421547 bp
  N50 (L50): 15910 (2717)
  N75 (L75): 6126 (8080)
  N90 (L90): 3191 (15203)
  Num contigs >= 1000000 bp: 0
  Num contigs >= 100000 bp: 150
  Num contigs >= 10000 bp: 4835
  Num contigs >= 1000 bp: 23276
  Num contigs >= 500 bp: 23276
  Num contigs >= 1 bp: 23276
  Len contigs >= 1000000 bp: 0 bp
  Len contigs >= 100000 bp: 25040764 bp
  Len contigs >= 10000 bp: 130110156 bp
  Len contigs >= 1000 bp: 206886309 bp
  Len contigs >= 500 bp: 206886309 bp
  Len contigs >= 1 bp: 206886309 bp
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • key_plot.png
  • key_plot.pdf
  • cumulative_len_plot.png
  • cumulative_len_plot.pdf
  • sorted_contig_lengths.png
  • sorted_contig_lengths.pdf
  • histogram_figures.zip

5. Bin Contigs

Having assembled the contigs, the next step is to cluster them into bins, each of which corresponds to a putative population genome. To accomplish this, we will use MaxBin2 Contig Binning.

Bin assembled metagenomic contigs
This app completed without errors in 4h 19m 46s.
Objects
Created Object Name Type Description
37AB_metaSPAdes_107markers_0.8prob.BinnedContigs BinnedContigs BinnedContigs from MaxBin2
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • maxbin_result.zip - File(s) generated by MaxBin2 App
  • Bin.marker.pdf - Visualization of the marker by MaxBin2 App
Output from MaxBin2 Contig Binning - v2.2.4
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/33233

6. Bin Quality Assessment

Quality control is a necessary step at every level of analysis to ensure the highest quality outcome and to avoid error propagation.

Note: From the graphic output below, we see that of the 65 total bins, 28 are both ≥90% complete and ≤2.5% contaminated. These bins (1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 22, 23, 26, 30, 32, 33, 35, 56, 59, 62, and 64) will be used for further analysis.
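
The selection rule in the note above (completeness of at least 90% and contamination of at most 2.5%) can be expressed as a simple filter. This is a sketch only: the bin names and percentages below are invented placeholders rather than CheckM results from this Narrative, and select_bins is a hypothetical helper.

```python
def select_bins(stats, min_completeness=90.0, max_contamination=2.5):
    """Return the names of bins that pass both quality thresholds.

    `stats` maps a bin name to (completeness %, contamination %).
    """
    return sorted(
        name
        for name, (completeness, contamination) in stats.items()
        if completeness >= min_completeness and contamination <= max_contamination
    )

# Invented example values for illustration only.
example_stats = {
    "bin_A": (98.2, 0.5),   # passes both thresholds
    "bin_B": (95.0, 7.8),   # too contaminated
    "bin_C": (71.4, 1.1),   # too incomplete
    "bin_D": (90.0, 2.5),   # exactly on both thresholds, so it passes
}

passing = select_bins(example_stats)
print(passing)
```

Both thresholds are inclusive, matching the ≥ and ≤ signs used in the note.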
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies
This app completed without errors in 1h 47m 29s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM

7. Extract Individual Assemblies

To use the 28 selected high-quality bins in downstream Apps, we need them in the form of Assembly objects. This is achieved by running Extract Bins as Assemblies from BinnedContigs.

Note: The default is to extract ALL bins. We must specify the ones that we want.
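
Because 28 bins must be selected by hand, it can help to generate the expected object names from the bin numbers and check them against the App output. The sketch below builds names in the zero-padded pattern seen in the output table (for example Bin.001.fasta_assembly). How the App itself expects the bins to be specified is not shown here, so treat the naming helper as an assumption.

```python
# Bin numbers that passed the quality thresholds in the previous step.
selected_bins = [
    1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
    19, 22, 23, 26, 30, 32, 33, 35, 56, 59, 62, 64,
]

def assembly_name(bin_number):
    """Format a bin number as the Assembly object name seen in the output."""
    return f"Bin.{bin_number:03d}.fasta_assembly"

expected_names = [assembly_name(n) for n in selected_bins]
print(len(expected_names), expected_names[0], expected_names[-1])
```

Comparing this list with the Created Object Names is a quick way to confirm that no bin was missed or added by mistake.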
Extract a bin as an Assembly from a BinnedContig dataset
This app completed without errors in 11m 10s.
Objects
Created Object Name Type Description
extracted_bins.AssemblySet AssemblySet Assembly set of extracted assemblies
Bin.001.fasta_assembly Assembly Assembly object of extracted contigs
Bin.002.fasta_assembly Assembly Assembly object of extracted contigs
Bin.003.fasta_assembly Assembly Assembly object of extracted contigs
Bin.004.fasta_assembly Assembly Assembly object of extracted contigs
Bin.005.fasta_assembly Assembly Assembly object of extracted contigs
Bin.007.fasta_assembly Assembly Assembly object of extracted contigs
Bin.008.fasta_assembly Assembly Assembly object of extracted contigs
Bin.009.fasta_assembly Assembly Assembly object of extracted contigs
Bin.010.fasta_assembly Assembly Assembly object of extracted contigs
Bin.011.fasta_assembly Assembly Assembly object of extracted contigs
Bin.012.fasta_assembly Assembly Assembly object of extracted contigs
Bin.013.fasta_assembly Assembly Assembly object of extracted contigs
Bin.014.fasta_assembly Assembly Assembly object of extracted contigs
Bin.015.fasta_assembly Assembly Assembly object of extracted contigs
Bin.016.fasta_assembly Assembly Assembly object of extracted contigs
Bin.017.fasta_assembly Assembly Assembly object of extracted contigs
Bin.019.fasta_assembly Assembly Assembly object of extracted contigs
Bin.022.fasta_assembly Assembly Assembly object of extracted contigs
Bin.023.fasta_assembly Assembly Assembly object of extracted contigs
Bin.026.fasta_assembly Assembly Assembly object of extracted contigs
Bin.030.fasta_assembly Assembly Assembly object of extracted contigs
Bin.032.fasta_assembly Assembly Assembly object of extracted contigs
Bin.033.fasta_assembly Assembly Assembly object of extracted contigs
Bin.035.fasta_assembly Assembly Assembly object of extracted contigs
Bin.056.fasta_assembly Assembly Assembly object of extracted contigs
Bin.059.fasta_assembly Assembly Assembly object of extracted contigs
Bin.062.fasta_assembly Assembly Assembly object of extracted contigs
Bin.064.fasta_assembly Assembly Assembly object of extracted contigs
Summary
Job Finished Generated Assembly Reference: 33233/405/1, 33233/406/1, 33233/407/1, 33233/408/1, 33233/409/1, 33233/410/1, 33233/411/1, 33233/412/1, 33233/413/1, 33233/414/1, 33233/415/1, 33233/416/1, 33233/417/1, 33233/418/1, 33233/419/1, 33233/420/1, 33233/421/1, 33233/422/1, 33233/423/1, 33233/424/1, 33233/425/1, 33233/426/1, 33233/427/1, 33233/428/1, 33233/429/1, 33233/430/1, 33233/431/1, 33233/432/1 Generated Assembly Set: 33233/433/1

8. Annotate Genomes

Since we now have the high-quality bins in Assembly object form (and collected into an AssemblySet object), we will use Annotate Multiple Microbial Assemblies to turn them into annotated Genomes using the RAST (Rapid Annotations using Subsystems Technology) pipeline.

Note: If you wish to just do a limited number of annotations, you can run them separately with the Annotate Microbial Assembly App.

Once the high-quality bins have all been annotated, we can combine them into a single GenomeSet. The resulting GenomeSet object will be used as input for the next step.

Note: Even though the GenomeSet is labelled Bins001-065, it consists of only the 28 high-quality bins.
Annotate bacterial or archaeal assemblies and/or assembly sets using RASTtk.
This app completed without errors in 3h 4m 39s.
Objects
Created Object Name Type Description
Bin.017.fasta_assembly.RAST Genome Annotated genome
Bin.019.fasta_assembly.RAST Genome Annotated genome
Bin.022.fasta_assembly.RAST Genome Annotated genome
Bin.023.fasta_assembly.RAST Genome Annotated genome
Bin.026.fasta_assembly.RAST Genome Annotated genome
Bin.030.fasta_assembly.RAST Genome Annotated genome
Bin.032.fasta_assembly.RAST Genome Annotated genome
Bin.033.fasta_assembly.RAST Genome Annotated genome
Bin.035.fasta_assembly.RAST Genome Annotated genome
Bin.002.fasta_assembly.RAST Genome Annotated genome
Bin.015.fasta_assembly.RAST Genome Annotated genome
Bin.001.fasta_assembly.RAST Genome Annotated genome
Bin.016.fasta_assembly.RAST Genome Annotated genome
Bin.062.fasta_assembly.RAST Genome Annotated genome
Bin.059.fasta_assembly.RAST Genome Annotated genome
Bin.007.fasta_assembly.RAST Genome Annotated genome
Bin.008.fasta_assembly.RAST Genome Annotated genome
Bin.064.fasta_assembly.RAST Genome Annotated genome
Bin.009.fasta_assembly.RAST Genome Annotated genome
Bin.010.fasta_assembly.RAST Genome Annotated genome
Bin.011.fasta_assembly.RAST Genome Annotated genome
Bin.012.fasta_assembly.RAST Genome Annotated genome
Bin.005.fasta_assembly.RAST Genome Annotated genome
Bin.004.fasta_assembly.RAST Genome Annotated genome
Bin.013.fasta_assembly.RAST Genome Annotated genome
Bin.003.fasta_assembly.RAST Genome Annotated genome
Bin.014.fasta_assembly.RAST Genome Annotated genome
Bin.056.fasta_assembly.RAST Genome Annotated genome
Summary
The RAST algorithm was applied to annotating a genome sequence comprised of 155 contigs containing 4541543 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3884 new features were called, of which 50 are non-coding.
Output genome has the following feature types:
	Coding gene                     3834 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat           7 
	Non-coding crispr_spacer           6 
	Non-coding repeat                  2 
	Non-coding rna                    34 
Overall, the genes have 1738 distinct functions. 
The genes include 1834 genes with a SEED annotation ontology across 865 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.017.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 237 contigs containing 4391776 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 4914 new features were called, of which 354 are non-coding.
Output genome has the following feature types:
	Coding gene                     4560 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat         147 
	Non-coding crispr_spacer         145 
	Non-coding prophage                2 
	Non-coding repeat                 18 
	Non-coding rna                    40 
Overall, the genes have 2598 distinct functions. 
The genes include 1890 genes with a SEED annotation ontology across 1171 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.019.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 413 contigs containing 4740206 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 5069 new features were called, of which 64 are non-coding.
Output genome has the following feature types:
	Coding gene                     5005 
	Non-coding repeat                 24 
	Non-coding rna                    40 
Overall, the genes have 2743 distinct functions. 
The genes include 1953 genes with a SEED annotation ontology across 1090 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.022.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 118 contigs containing 3210281 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3225 new features were called, of which 125 are non-coding.
Output genome has the following feature types:
	Coding gene                     3100 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat          43 
	Non-coding crispr_spacer          42 
	Non-coding prophage                2 
	Non-coding repeat                  4 
	Non-coding rna                    32 
Overall, the genes have 1320 distinct functions. 
The genes include 1404 genes with a SEED annotation ontology across 740 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.023.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 365 contigs containing 3132682 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3178 new features were called, of which 33 are non-coding.
Output genome has the following feature types:
	Coding gene                     3145 
	Non-coding repeat                  2 
	Non-coding rna                    31 
Overall, the genes have 1868 distinct functions. 
The genes include 1585 genes with a SEED annotation ontology across 1031 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.026.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 61 contigs containing 2209601 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2343 new features were called, of which 139 are non-coding.
Output genome has the following feature types:
	Coding gene                     2204 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat          44 
	Non-coding crispr_spacer          42 
	Non-coding prophage                1 
	Non-coding repeat                  4 
	Non-coding rna                    46 
Overall, the genes have 1205 distinct functions. 
The genes include 1197 genes with a SEED annotation ontology across 769 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.030.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 153 contigs containing 3684022 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3535 new features were called, of which 219 are non-coding.
Output genome has the following feature types:
	Coding gene                     3316 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          84 
	Non-coding crispr_spacer          83 
	Non-coding prophage                1 
	Non-coding repeat                  6 
	Non-coding rna                    44 
Overall, the genes have 1305 distinct functions. 
The genes include 1693 genes with a SEED annotation ontology across 737 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.032.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 164 contigs containing 4614825 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3895 new features were called, of which 88 are non-coding.
Output genome has the following feature types:
	Coding gene                     3807 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat           9 
	Non-coding crispr_spacer           7 
	Non-coding prophage                1 
	Non-coding repeat                 35 
	Non-coding rna                    34 
Overall, the genes have 1718 distinct functions. 
The genes include 1854 genes with a SEED annotation ontology across 850 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.033.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 82 contigs containing 2867199 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2784 new features were called, of which 47 are non-coding.
Output genome has the following feature types:
	Coding gene                     2737 
	Non-coding repeat                  6 
	Non-coding rna                    41 
Overall, the genes have 1909 distinct functions. 
The genes include 1427 genes with a SEED annotation ontology across 1112 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.035.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 35 contigs containing 4294648 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3821 new features were called, of which 46 are non-coding.
Output genome has the following feature types:
	Coding gene                     3775 
	Non-coding prophage                2 
	Non-coding rna                    44 
Overall, the genes have 1697 distinct functions. 
The genes include 1781 genes with a SEED annotation ontology across 898 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.002.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 183 contigs containing 2716435 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2659 new features were called, of which 42 are non-coding.
Output genome has the following feature types:
	Coding gene                     2617 
	Non-coding repeat                  4 
	Non-coding rna                    38 
Overall, the genes have 1466 distinct functions. 
The genes include 1446 genes with a SEED annotation ontology across 798 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.015.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 36 contigs containing 4935777 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 4750 new features were called, of which 411 are non-coding.
Output genome has the following feature types:
	Coding gene                     4339 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat         158 
	Non-coding crispr_spacer         156 
	Non-coding prophage                4 
	Non-coding repeat                 43 
	Non-coding rna                    48 
Overall, the genes have 1394 distinct functions. 
The genes include 2366 genes with a SEED annotation ontology across 876 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.001.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 135 contigs containing 2810734 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3084 new features were called, of which 263 are non-coding.
Output genome has the following feature types:
	Coding gene                     2821 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          97 
	Non-coding crispr_spacer          96 
	Non-coding repeat                 31 
	Non-coding rna                    38 
Overall, the genes have 1306 distinct functions. 
The genes include 1336 genes with a SEED annotation ontology across 731 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.016.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 135 contigs containing 4948386 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 4728 new features were called, of which 300 are non-coding.
Output genome has the following feature types:
	Coding gene                     4428 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat         127 
	Non-coding crispr_spacer         125 
	Non-coding prophage                2 
	Non-coding repeat                  4 
	Non-coding rna                    40 
Overall, the genes have 1632 distinct functions. 
The genes include 2008 genes with a SEED annotation ontology across 906 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.062.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 174 contigs containing 3102702 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3324 new features were called, of which 238 are non-coding.
Output genome has the following feature types:
	Coding gene                     3086 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          93 
	Non-coding crispr_spacer          92 
	Non-coding repeat                 17 
	Non-coding rna                    35 
Overall, the genes have 2098 distinct functions. 
The genes include 1434 genes with a SEED annotation ontology across 1054 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.059.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 28 contigs containing 5243107 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 5408 new features were called, of which 390 are non-coding.
Output genome has the following feature types:
	Coding gene                     5018 
	Non-coding crispr_array            5 
	Non-coding crispr_repeat         162 
	Non-coding crispr_spacer         157 
	Non-coding prophage                1 
	Non-coding repeat                 20 
	Non-coding rna                    45 
Overall, the genes have 2230 distinct functions. 
The genes include 2488 genes with a SEED annotation ontology across 1157 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.007.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 131 contigs containing 2061093 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2139 new features were called, of which 37 are non-coding.
Output genome has the following feature types:
	Coding gene                     2102 
	Non-coding repeat                  2 
	Non-coding rna                    35 
Overall, the genes have 1181 distinct functions. 
The genes include 1175 genes with a SEED annotation ontology across 762 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.008.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 64 contigs containing 3525485 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3635 new features were called, of which 358 are non-coding.
Output genome has the following feature types:
	Coding gene                     3277 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat         157 
	Non-coding crispr_spacer         156 
	Non-coding rna                    44 
Overall, the genes have 1794 distinct functions. 
The genes include 1726 genes with a SEED annotation ontology across 900 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.064.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 138 contigs containing 2516031 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2692 new features were called, of which 123 are non-coding.
Output genome has the following feature types:
	Coding gene                     2569 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat          45 
	Non-coding crispr_spacer          43 
	Non-coding repeat                  2 
	Non-coding rna                    31 
Overall, the genes have 1289 distinct functions. 
The genes include 1392 genes with a SEED annotation ontology across 819 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.009.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 172 contigs containing 4133015 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 4238 new features were called, of which 61 are non-coding.
Output genome has the following feature types:
	Coding gene                     4177 
	Non-coding repeat                 22 
	Non-coding rna                    39 
Overall, the genes have 2018 distinct functions. 
The genes include 1997 genes with a SEED annotation ontology across 1100 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.010.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 110 contigs containing 3369119 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3247 new features were called, of which 50 are non-coding.
Output genome has the following feature types:
	Coding gene                     3197 
	Non-coding prophage                2 
	Non-coding repeat                 13 
	Non-coding rna                    35 
Overall, the genes have 1457 distinct functions. 
The genes include 1486 genes with a SEED annotation ontology across 815 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.011.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 49 contigs containing 3935801 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3515 new features were called, of which 61 are non-coding.
Output genome has the following feature types:
	Coding gene                     3454 
	Non-coding repeat                 24 
	Non-coding rna                    37 
Overall, the genes have 1605 distinct functions. 
The genes include 1614 genes with a SEED annotation ontology across 875 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.012.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 110 contigs containing 3107186 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2956 new features were called, of which 47 are non-coding.
Output genome has the following feature types:
	Coding gene                     2909 
	Non-coding repeat                  2 
	Non-coding rna                    45 
Overall, the genes have 1669 distinct functions. 
The genes include 1570 genes with a SEED annotation ontology across 992 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.005.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 85 contigs containing 2716559 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2478 new features were called, of which 42 are non-coding.
Output genome has the following feature types:
	Coding gene                     2436 
	Non-coding repeat                  5 
	Non-coding rna                    37 
Overall, the genes have 1166 distinct functions. 
The genes include 1188 genes with a SEED annotation ontology across 679 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.004.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 155 contigs containing 3282083 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3354 new features were called, of which 39 are non-coding.
Output genome has the following feature types:
	Coding gene                     3315 
	Non-coding repeat                 10 
	Non-coding rna                    29 
Overall, the genes have 2086 distinct functions. 
The genes include 1615 genes with a SEED annotation ontology across 1098 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.013.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 127 contigs containing 5057090 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 4715 new features were called, of which 326 are non-coding.
Output genome has the following feature types:
	Coding gene                     4389 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat         140 
	Non-coding crispr_spacer         139 
	Non-coding prophage                1 
	Non-coding repeat                  4 
	Non-coding rna                    41 
Overall, the genes have 1860 distinct functions. 
The genes include 2180 genes with a SEED annotation ontology across 938 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.003.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 87 contigs containing 3097432 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3179 new features were called, of which 95 are non-coding.
Output genome has the following feature types:
	Coding gene                     3084 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          25 
	Non-coding crispr_spacer          24 
	Non-coding repeat                  2 
	Non-coding rna                    43 
Overall, the genes have 1978 distinct functions. 
The genes include 1572 genes with a SEED annotation ontology across 1051 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.014.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 42 contigs containing 4266888 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 4344 new features were called, of which 359 are non-coding.
Output genome has the following feature types:
	Coding gene                     3985 
	Non-coding crispr_array            4 
	Non-coding crispr_repeat         151 
	Non-coding crispr_spacer         147 
	Non-coding prophage                1 
	Non-coding repeat                  9 
	Non-coding rna                    47 
Overall, the genes have 2037 distinct functions. 
The genes include 1868 genes with a SEED annotation ontology across 1127 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Bin.056.fasta_assembly succeeded!

Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • annotation_report.Bins001-065.GenomeSet - Microbial Annotation Report
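
The per-bin report above is regular enough to be tabulated with a short script. The following is a minimal sketch, assuming the report has been saved as plain text with the same wording as the log shown here; the function name `parse_rast_report` and the `SAMPLE` string are illustrative and not part of any KBase API. It also checks that the per-type feature counts add up to the reported number of new features.

```python
import re
from typing import Dict, List

# Patterns follow the wording of the report text shown above.
HEADER = re.compile(r"comprised of (\d+) contigs containing (\d+) nucleotides")
CALLED = re.compile(r"(\d+) new features were called, of which (\d+) are non-coding")
FEATURE = re.compile(r"^\s*((?:Coding|Non-coding) \S+)\s+(\d+)\s*$")
DONE = re.compile(r"^(\S+) succeeded!")


def parse_rast_report(text: str) -> List[Dict]:
    """Return one summary dict per annotated assembly found in the report text."""
    records: List[Dict] = []
    current = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            # A header line opens a new per-assembly record.
            current = {
                "contigs": int(header.group(1)),
                "nucleotides": int(header.group(2)),
                "features": {},
            }
            continue
        if current is None:
            continue
        called = CALLED.search(line)
        feature = FEATURE.match(line)
        done = DONE.match(line)
        if called:
            current["new_features"] = int(called.group(1))
            current["non_coding"] = int(called.group(2))
        elif feature:
            current["features"][feature.group(1)] = int(feature.group(2))
        elif done:
            # The "succeeded!" line names the assembly and closes the record.
            current["assembly"] = done.group(1)
            records.append(current)
            current = None
    return records


# Excerpt copied from the report above (Bin.022), used only as a demonstration.
SAMPLE = """The RAST algorithm was applied to annotating a genome sequence comprised of 413 contigs containing 4740206 nucleotides.
In addition to the remaining original 0 coding features and 0 non-coding features, 5069 new features were called, of which 64 are non-coding.
Output genome has the following feature types:
    Coding gene                     5005
    Non-coding repeat                 24
    Non-coding rna                    40
Bin.022.fasta_assembly succeeded!
"""

for record in parse_rast_report(SAMPLE):
    total = sum(record["features"].values())
    consistent = total == record["new_features"]
    print(record["assembly"], record["contigs"], record["nucleotides"], total, consistent)
```

Collecting the records into a table makes it easier to compare bins, for example by contig count or by the share of non-coding features.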

9. Find Relatives

Our new GenomeSet can be used as input for Insert Set of Genomes Into Species Tree, which will give us an initial phylogenetic placement of the bins.

Note: Uncheck 'Copy public genomes to your workspace' because we are not ready to determine which genomes from RefSeq we want to include in downstream comparisons yet.

The current implementation of Insert Set of Genomes Into Species Tree tends to overemphasize proximal genomes at the expense of phylogenetic diversity. Future versions will remedy this shortcoming. In the meantime, we work around it manually to reduce the effect of excessive genome attractors: based on the initial tree, we split the bins into 5 clades, which we will call A, B, C, D, and E. We will use Build GenomeSet to group the bins into clades.

  • Clade A: bins 26, 56, 5, and 35
  • Clade B: bins 22, 30, 8, 9, 15, and 64
  • Clade C: bins 7, 14, 59, 19, 10, and 13
  • Clade D: bins 16, 33, 17, 62, 23, 4, 12, 2, 11, and 32
  • Clade E: bins 1 and 3

To get more accurate phylogenetic trees, we will rerun Insert Set of Genomes Into Species Tree for each of the five clades.
Note: Again, it will be necessary to uncheck 'Copy public genomes into workspace'. ViewTree (beta) was run to enable users to download the image of the SpeciesTree and the NEWICK representations.
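
The clade assignments listed above can be written down as a small data structure and checked before building each GenomeSet. This is a minimal sketch rather than a KBase API call: the bin numbers are taken from the list above, while the helper names and the object naming pattern (modelled on assembly names such as Bin.022.fasta_assembly in the annotation log) are assumptions, and the Genome object names in your own Narrative may differ.

```python
from typing import Dict, List

# Bin numbers per clade, as listed in the text above.
CLADES: Dict[str, List[int]] = {
    "A": [26, 56, 5, 35],
    "B": [22, 30, 8, 9, 15, 64],
    "C": [7, 14, 59, 19, 10, 13],
    "D": [16, 33, 17, 62, 23, 4, 12, 2, 11, 32],
    "E": [1, 3],
}


def bin_object_name(bin_number: int) -> str:
    """Assumed naming pattern, inferred from names like Bin.022.fasta_assembly."""
    return f"Bin.{bin_number:03d}.fasta_assembly"


def assign_bins(clades: Dict[str, List[int]]) -> Dict[int, str]:
    """Map each bin to its clade, failing if a bin is listed in two clades."""
    assignment: Dict[int, str] = {}
    for clade, bins in clades.items():
        for bin_number in bins:
            if bin_number in assignment:
                raise ValueError(
                    f"bin {bin_number} is in clades {assignment[bin_number]} and {clade}"
                )
            assignment[bin_number] = clade
    return assignment


assignment = assign_bins(CLADES)
for clade, bins in CLADES.items():
    names = ", ".join(bin_object_name(b) for b in sorted(bins))
    print(f"Clade_{clade}.GenomeSet ({len(bins)} genomes): {names}")
```

The printed sizes can be compared with the "genomes in output set" line that each Build GenomeSet run reports.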
Add a user-provided GenomeSet to a KBase species tree.
This app completed without errors in 9m 27s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • Bins001-065_plus_28neighbors_v2.newick
  • Bins001-065_plus_28neighbors_v2-labels.newick
  • Bins001-065_plus_28neighbors_v2.png
  • Bins001-065_plus_28neighbors_v2.pdf
Allows user to create a GenomeSet
This app completed without errors in 13s.
Objects
Created Object Name    Type         Description
Clade_A.GenomeSet      GenomeSet    KButil_Build_GenomeSet
Summary
genomes in output set Clade_A.GenomeSet: 4
Output from Build GenomeSet - v1.0.1
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/33233
Allows user to create a GenomeSet
This app completed without errors in 18s.
Objects
Created Object Name    Type         Description
Clade_B.GenomeSet      GenomeSet    KButil_Build_GenomeSet
Summary
genomes in output set Clade_B.GenomeSet: 6
Output from Build GenomeSet - v1.0.1
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/33233
Allows user to create a GenomeSet
This app completed without errors in 19s.
Objects
Created Object Name    Type         Description
Clade_C.GenomeSet      GenomeSet    KButil_Build_GenomeSet
Summary
genomes in output set Clade_C.GenomeSet: 6
Output from Build GenomeSet - v1.0.1
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/33233
Allows user to create a GenomeSet
This app completed without errors in 26s.
Objects
Created Object Name    Type         Description
Clade_D.GenomeSet      GenomeSet    KButil_Build_GenomeSet
Summary
genomes in output set Clade_D.GenomeSet: 10
Output from Build GenomeSet - v1.0.1
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/33233
Allows user to create a GenomeSet
This app completed without errors in 12s.
Objects
Created Object Name    Type         Description
Clade_E.GenomeSet      GenomeSet    KButil_Build_GenomeSet
Summary
genomes in output set Clade_E.GenomeSet: 2
Output from Build GenomeSet - v1.0.1
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/33233
Add a user-provided GenomeSet to a KBase species tree.
This app completed without errors in 3m 22s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • Clade_A_v2.SpeciesTree.newick
  • Clade_A_v2.SpeciesTree-labels.newick
  • Clade_A_v2.SpeciesTree.png
  • Clade_A_v2.SpeciesTree.pdf
Add a user-provided GenomeSet to a KBase species tree.
This app completed without errors in 3m 41s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • Clade_B_v2.SpeciesTree.newick
  • Clade_B_v2.SpeciesTree-labels.newick
  • Clade_B_v2.SpeciesTree.png
  • Clade_B_v2.SpeciesTree.pdf
Add a user-provided GenomeSet to a KBase species tree.
This app completed without errors in 3m 49s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • Clade_C_v2.SpeciesTree.newick
  • Clade_C_v2.SpeciesTree-labels.newick
  • Clade_C_v2.SpeciesTree.png
  • Clade_C_v2.SpeciesTree.pdf
Add a user-provided GenomeSet to a KBase species tree.
This app completed without errors in 4m 58s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • Clade_D_v2.SpeciesTree.newick
  • Clade_D_v2.SpeciesTree-labels.newick
  • Clade_D_v2.SpeciesTree.png
  • Clade_D_v2.SpeciesTree.pdf
Add a user-provided GenomeSet to a KBase species tree.
This app completed without errors in 2m 38s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/33233
  • Clade_E_v2.SpeciesTree.newick
  • Clade_E_v2.SpeciesTree-labels.newick
  • Clade_E_v2.SpeciesTree.png
  • Clade_E_v2.SpeciesTree.pdf
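
The .newick files listed above store each species tree as plain text in Newick format, so the genomes placed in a tree can be listed outside of KBase once a file has been downloaded from the live Narrative. The following is a minimal sketch rather than part of the Narrative itself: the function name newick_leaf_labels and the example tree string (with its Bin_* and Ref_genome_* labels) are hypothetical, and the parser assumes unquoted labels that contain no parentheses, commas, colons, or semicolons.

```python
import re
from typing import List


def newick_leaf_labels(newick: str) -> List[str]:
    """Return leaf labels in the order they appear in a Newick string.

    Assumption: labels are unquoted and contain none of ( ) , : ;
    """
    # A leaf label starts directly after "(" or ",".
    # Labels that follow ")" belong to internal nodes (e.g. support values)
    # and are deliberately not matched.
    pattern = re.compile(r"[(,]\s*([^(),:;\s][^(),:;]*)")
    return [label.strip() for label in pattern.findall(newick)]


# Hypothetical tree, NOT taken from the Narrative's actual output files.
example = (
    "((Bin_001:0.12,Ref_genome_A:0.10)0.98:0.05,"
    "(Bin_002:0.20,Ref_genome_B:0.18)0.87:0.07);"
)

leaves = newick_leaf_labels(example)
print(leaves)
```

To apply this to a downloaded tree, read the file contents (for example, Clade_E_v2.SpeciesTree-labels.newick) into a string and pass it to the function. For anything beyond listing leaves, a dedicated phylogenetics library is likely the better choice.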

10. Place Genomes into Phylogenetic Context with Phylum Exemplars

Run the Build Microbial SpeciesTree App to include Phylum Exemplars in the Species Tree.

Note: Build Microbial SpeciesTree is currently a beta App. To access beta Apps, click the "R" in the upper right corner of the App pane to switch it to "B".
Build Species Tree for your Microbial Genomes, including Reference Genomes and Tree Skeleton with Phylum Exemplars
This app is still in progress.
No output found.

Summary and Future Directions

This Narrative Tutorial covers how to generate annotated genomes and species predictions from raw metagenomic reads. Taxonomic composition can be estimated directly from the raw reads by protein similarity using Kaiju, or inferred for the annotated genomes by placing them among reference genomes in species trees.

Genome extraction and species prediction are just the beginning of how metagenomic samples can be analyzed within KBase. Annotated genomes can be used for metabolic modeling, comparative phylogenomics, functional profiling, and more.

Reference Literature

  1. Wu YW, Higgins B, Yu C, Reddy AP, Ceballos S, Joh LD, Simmons BA, Singer SW, VanderGheynst JS. Ionic Liquids Impact the Bioenergy Feedstock-Degrading Microbiome and Transcription of Enzymes Relevant to Polysaccharide Hydrolysis. mSystems. 2016 Dec 13;1(6). pii: e00120-16. eCollection 2016 Nov-Dec. doi:10.1128/mSystems.00120-16 https://www.ncbi.nlm.nih.gov/pubmed/27981239
  2. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30: 2114–2120. doi:10.1093/bioinformatics/btu170 http://www.ncbi.nlm.nih.gov/pubmed/24695404
  3. Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications. 2016;7: 11257. doi:10.1038/ncomms11257 http://www.ncbi.nlm.nih.gov/pubmed/27071849
  4. Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011;12: 385. doi:10.1186/1471-2105-12-385 http://www.ncbi.nlm.nih.gov/pubmed/21961884
  5. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017 May;27(5):824-834. doi: 10.1101/gr.213959.116. https://www.ncbi.nlm.nih.gov/pubmed/28298430
  6. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31: 1674–1676. doi:10.1093/bioinformatics/btv033 http://www.ncbi.nlm.nih.gov/pubmed/25609793
  7. Peng Y, Leung HCM, Yiu SM, Chin FYL. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012;28: 1420–1428. doi:10.1093/bioinformatics/bts174 https://www.ncbi.nlm.nih.gov/pubmed/22495754
  8. Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32: 605–607. doi:10.1093/bioinformatics/btv638 https://www.ncbi.nlm.nih.gov/pubmed/26515820
  9. Wu Y-W, Tang Y-H, Tringe SG, Simmons BA, Singer SW. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome. 2014;2: 26. doi:10.1186/2049-2618-2-26 https://microbiomejournal.biomedcentral.com/articles/10.1186/2049-2618-2-26
  10. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research. 2015;25: 1043–1055. doi:10.1101/gr.186072.114 http://genome.cshlp.org/content/25/7/1043.long
  11. Price MN, Dehal PS, Arkin AP. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. Poon AFY, editor. PLoS ONE. 2010;5: e9490. doi:10.1371/journal.pone.0009490 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/

Released Apps

  1. Annotate Multiple Microbial Assemblies
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206–D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656–664. doi:10.1101/gr.229202
    • [5] Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10: 421. doi:10.1186/1471-2105-10-421
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955–964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793–803. doi:10.1007/s00792-012-0482-8
    • [8] Siguier P, Perochon J, Lestrade L, Mahillon J, Chandler M. ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res. 2006;34: D32–D36. doi:10.1093/nar/gkj014
    • [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176–1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673–679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  2. Assemble Reads with IDBA-UD - v1.1.3
    • Peng Y, Leung HCM, Yiu SM, Chin FYL. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012;28: 1420–1428. doi:10.1093/bioinformatics/bts174
  3. Assemble Reads with MEGAHIT v1.2.9
    • Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31: 1674–1676. doi:10.1093/bioinformatics/btv033
  4. Assemble Reads with metaSPAdes - v3.13.0
    • Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017;27: 824–834. doi:10.1101/gr.213959.116
  5. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043–1055. doi:10.1101/gr.186072.114
    • CheckM source:
    • Additional info:
  6. Assess Read Quality with FastQC - v0.11.5
    • FastQC source: Bioinformatics Group at the Babraham Institute, UK.
  7. Bin Contigs using MaxBin2 - v2.2.4
    • Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32: 605–607. doi:10.1093/bioinformatics/btv638
    • Wu Y-W, Tang Y-H, Tringe SG, Simmons BA, Singer SW. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome. 2014;2: 26. doi:10.1186/2049-2618-2-26
    • Maxbin2 source:
    • Maxbin source:
  8. Build GenomeSet - v1.0.1
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  9. Classify Taxonomy of Metagenomic Reads with Kaiju - v1.5.0
    • Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun. 2016;7: 11257. doi:10.1038/ncomms11257
    • Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011;12: 385. doi:10.1186/1471-2105-12-385
    • Kaiju Homepage:
    • Kaiju DBs from:
    • Github for Kaiju:
    • Krona homepage:
    • Github for Krona:
  10. Compare Assembled Contig Distributions - v1.1.2
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  11. Extract Bins as Assemblies from BinnedContigs - v1.0.2
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  12. Import FASTQ/SRA File as Reads from Staging Area
    no citations
  13. Insert Set of Genomes Into Species Tree 2.1.10
    • Price MN, Dehal PS, Arkin AP (2010) FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010 Mar 10;5(3):e9490
  14. Merge Reads Libraries - v1.0.1
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  15. Trim Reads with Trimmomatic - v0.36
    • Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30: 2114–2120. doi:10.1093/bioinformatics/btu170

Apps in Beta

  1. Build Microbial SpeciesTree - v1.5.1
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163