Generated February 15, 2021

Overview

KBase has powerful tools for extracting microbial genomes from metagenomes and for performing phylogenomic analysis and metabolic modeling. These tools can be used to predict key media ingredients for isolating uncultured members of microbiomes. Essential to this process are high-quality genomes extracted from metagenomic assemblies, and KBase also has a tool for assessing genome quality.

Below we identify growth factors for a myxobacterium ("slime bacterium") that has yet to be isolated from the rhizosphere of Miscanthus × giganteus (a hybrid "silvergrass") cultivated at the Kellogg Biological Station in Michigan. Data were transferred with Globus from JGI-IMG.


Silver and Gold Narrative Set

This narrative and the "KBase Gold Case Study: Can you find Delftia?" make up the Silver and Gold Narrative Set for teaching metagenomics concepts to students in the BIT 477/577 course at North Carolina State University.


Determining Media Formulation Requirements for Isolation of Microbiome Constituents

This tutorial guides the user through extracting and annotating high-quality genomes from metagenomic data, performing phylogenomic analysis, building a metabolic model, and using these results to predict nutrient requirements for the growth and isolation of the corresponding microbes.

Author: Jason M. Whitham (jmwhitha@ncsu.edu)


Main Lessons from this Narrative

  • Learn how to perform quality control of read libraries
  • Learn how to predict taxonomic population structure of environmental shotgun reads
  • Learn how to assemble metagenomes
  • Learn how to compare quality of assemblies
  • Learn how to bin metagenomic contigs into putative lineages (metagenome-assembled genomes, or MAGs)
  • Learn how to assess MAG quality and extract them
  • Learn how to annotate genes of extracted MAGs
  • Learn how to place MAGs into a species tree
  • Learn how to build metabolic models with extracted annotated MAGs and investigate metabolic pathways

A Word on Timing

KBase has a finite number of servers shared by many users, which sometimes results in lengthy queue times. Furthermore, some computations take a long time even when applications are allocated generous amounts of RAM and processors. These limitations prevent us from starting and finishing several steps in this narrative within a single class period. Lessons will therefore be much like a cooking show: the audience learns how to prepare the dish, sees the food go into the oven, and a moment later a fully cooked product is displayed. As on a cooking show, we won't make you "wait for the bake" during class, but we will tell you the expected wait times for when you "try the recipe".

A precise time for each step cannot be provided since queue and processing times will vary. The table below is meant to give you a sense of whether to check your narrative after sending a couple of emails, cooking a meal, going on a day hike, or after returning from a long weekend of visiting with friends or family. Overall, it will probably take a couple of weeks to complete the whole narrative from beginning to end.

Application | Typical run time (order of magnitude)
Read Import, Trimming, Quality Check, and Subsampling | Hours
Taxonomy Classification | Hours
Contig Assembly | Days
Assembly Comparison | Minutes
Binning Contigs | Hours
Quality Assessment of Bins | Minutes
Bin Extraction | Minutes
Microbial Assembly Annotation | Minutes
Genome Insertion into Species Tree | Minutes
Metabolic Model Build | Minutes

1. Read Hygiene

Read hygiene means checking the quality of your data and removing errors if possible. This is important because the colloquialism "junk in, junk out" is true. Before we can check data quality, we need to get data.

I have already imported paired-end reads in FASTQ format. An import application is chosen automatically when you select the file format in the KBase data import staging area. To get to the staging area, click the right-pointing arrow in the DATA panel and then click the IMPORT tab. Here, I selected the format "fastq reads" from the drop-down menu and clicked the upload arrow directly beside it. You can use the same steps to import your own data into the narrative.

If you are interested in using your own data for this narrative, you will first need to load your data into the staging area. If your dataset is large, follow the guide to transferring large datasets with Globus. You can also obtain datasets from KBase or from the Joint Genome Institute (JGI).

The remaining applications in this tutorial are preconfigured, and each application has its own View Configure tab. As we go through this narrative, practice inserting the same applications below the ones already in the narrative: click the right-pointing arrow in the APPS panel and search for each application by name. Then practice configuring them by copying the settings from the prepopulated applications. This experience will help you become familiar with the KBase platform.

Once the reads are uploaded, you can check the quality of the paired-end reads with the FastQC application and improve their quality with the Trimmomatic application. Run FastQC a second time after the reads are processed with Trimmomatic to verify the improvement. You may be surprised by what you find!
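Trimmomatic offers several trimming steps. One of them, SLIDINGWINDOW, scans from the 5' end and cuts a read once the average quality inside a window drops below a threshold. The Python sketch below illustrates that idea in simplified form. It assumes Phred+33 quality encoding, and the function names, the 4-base window, and the threshold of 20 are illustrative choices. It is not Trimmomatic's actual code, and the KBase app's default settings may differ.

```python
# Illustrative sketch only: simplified sliding-window trimming in the spirit of
# Trimmomatic's SLIDINGWINDOW step. Function names and defaults are assumptions.


def phred33_scores(quality: str) -> list[int]:
    """Decode a Phred+33 FASTQ quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        min_avg: float = 20.0) -> tuple[str, str]:
    """Scan windows from the 5' end and cut the read at the start of the first
    window whose average quality is below min_avg. Reads shorter than the
    window are returned unchanged."""
    scores = phred33_scores(quality)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_avg:
            return seq[:start], quality[:start]
    return seq, quality


# Example: six Q40 bases ('I') followed by four Q2 bases ('#').
print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))  # ('ACGTA', 'IIIII')
```

In the example, the first window averaging below 20 starts at the sixth base, so the read keeps its first five bases.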


Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2h 41m 45s.
Objects
  • SilvergrassMB_reads (PairedEndLibrary) - Imported Reads
A quality control application for high throughput sequence data.
This app completed without errors in 2h 18m 26s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/68579
  • SilvergrassMBReads_68579_2_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • SilvergrassMBReads_68579_2_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 1d 10h 0m 4s.
Objects
  • trimmed_SilvergrassMBReads_paired (PairedEndLibrary) - Trimmed Reads
  • trimmed_SilvergrassMBReads_unpaired_fwd (SingleEndLibrary) - Trimmed Unpaired Forward Reads
  • trimmed_SilvergrassMBReads_unpaired_rev (SingleEndLibrary) - Trimmed Unpaired Reverse Reads
A quality control application for high throughput sequence data.
This app completed without errors in 2h 52m 47s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/68579
  • trimmed_SilvergrassMBReads_paired_68579_5_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • trimmed_SilvergrassMBReads_paired_68579_5_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report

2. Classify Taxonomy

The Kaiju application predicts the microbial composition of a read library by translating the reads into protein sequences and comparing them with a database of protein sequences. That is what you will do in this section of the narrative. Later in the narrative, you will generate a species tree, which predicts microbial phylogeny from your assembled, extracted, and annotated genome sequences. You can then compare the phylogenetic and taxonomic predictions.
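Kaiju works at the protein level. It translates each read in all six reading frames, splits the translations at stop codons, and searches the resulting fragments against its protein database. The Python sketch below shows only the translation and fragmentation step, using the standard genetic code. The function names and the min_len cutoff are illustrative assumptions, and Kaiju's actual database search is not shown.

```python
# Illustrative sketch only: six-frame translation and stop-codon splitting, the
# first step of Kaiju-style protein-level classification. Names are assumptions.

# Standard genetic code (NCBI table 1); codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate codon by codon; codons containing N or other symbols become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(read: str) -> list[str]:
    """Three forward frames followed by three reverse-complement frames."""
    read = read.upper()
    rev_comp = read.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (read, rev_comp) for offset in range(3)]


def protein_fragments(read: str, min_len: int = 5) -> list[str]:
    """Split every frame at stop codons ('*') and keep fragments of at least min_len residues."""
    fragments = []
    for frame in six_frame_translations(read):
        fragments.extend(f for f in frame.split("*") if len(f) >= min_len)
    return fragments


print(six_frame_translations("ATGAAATAG"))  # ['MK*', '*N', 'EI', 'LFH', 'YF', 'IS']
```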

You will assemble the reads in the following step. Unfortunately, KBase assembly applications currently have an upper limit of between 180,263,840 and 240,351,788 paired reads, depending on complexity. KBase developers are working on this problem but haven't yet implemented a solution. For now, the Randomly Subsample Reads application lets us subsample our reads so that they keep a similar composition but are not too deep for the assemblers. Split Reads into Subsets is another option. You will use Kaiju to verify that the composition of the reads is similar before and after subsampling.
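This tutorial does not describe exactly how the Randomly Subsample Reads app draws its sample. As an illustration only, the Python sketch below uses reservoir sampling (Algorithm R) to draw a fixed number of read pairs uniformly at random from a stream of unknown length. Treating each forward/reverse pair as a single record keeps mates together, and a fixed seed makes the subsample reproducible. The function name and the tuple representation of a read pair are assumptions.

```python
# Illustrative sketch only: uniform subsampling of read pairs with reservoir
# sampling (Algorithm R). Not the KBase app's implementation; names are assumptions.
import random
from typing import Iterable, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int = 0) -> list[T]:
    """Return k records chosen uniformly at random from a stream of unknown length.
    Each record is a whole (forward, reverse) read pair, so mates stay together."""
    rng = random.Random(seed)
    reservoir: list[T] = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir
```

Because only k records are held in memory, this approach also works when the reads are streamed from a file rather than loaded as a list.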


Split a reads library into a set of randomly subsampled reads libraries.
This app was canceled before completion.
No output found.
Allows users to perform taxonomic classification of shotgun metagenomic read data with Kaiju.
This app completed without errors in 6h 14m 58s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/68579
  • kaiju_classifications.zip
  • kaiju_summaries.zip
  • krona_data.zip
  • stacked_bar_abundance_plots_PNG+PDF.zip

3. Assemble Contigs

KBase offers several commonly used metagenomic assemblers. You will assemble reads with metaSPAdes, MEGAHIT, and IDBA_UD. It's good to try multiple assemblers since each uses a different algorithm, and one does not consistently perform better than the others. Application settings can also be tweaked to improve one output metric at the expense of another. For comparison of assemblies, you will configure all assemblers to have a minimum contig length of 1000 bp.
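The assembler summaries in this section each report a "Contig Length Distribution" in ten bins. The bin edges match ten equal-width bins running from 1000 bp to the longest contig. For metaSPAdes, for example, (61958 - 1000) / 10 = 6095.8 bp, which is exactly where the first bin ends (7095.8 bp). The Python sketch below reproduces this kind of summary after applying a minimum contig length. The function name is hypothetical. The apps do not state how they assign a contig that falls exactly on a bin edge, so the boundary rule in the code is an assumption.

```python
# Illustrative sketch only: minimum-length filtering plus an equal-width length
# histogram like the assembler summaries. Names and boundary handling are assumptions.


def contig_length_distribution(lengths: list[int], min_len: int = 1000,
                               n_bins: int = 10) -> list[tuple[float, float, int]]:
    """Drop contigs shorter than min_len, then count contigs in n_bins equal-width
    bins spanning the shortest to the longest remaining contig.
    Returns (bin_start, bin_end, count) tuples."""
    kept = [n for n in lengths if n >= min_len]
    if not kept:
        return []
    lo, hi = min(kept), max(kept)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for n in kept:
        # Assumption: a contig on a bin edge goes to the upper bin, and the
        # longest contig is placed in the last bin.
        idx = 0 if width == 0 else min(int((n - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(n_bins)]
```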


Assemble metagenomic reads using the SPAdes assembler.
This app completed without errors in 3d 4h 31m 11s.
Objects
  • SilvergrassMB_SPAdes.contigs (Assembly) - Assembled contigs
Summary
Assembly saved to: jmwhitham:narrative_1596024691020/SilvergrassMB_SPAdes.contigs
Assembled into 106727 contigs. Avg Length: 1602.2291266502384 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  105970 -- 1000.0 to 7095.8 bp
  585 -- 7095.8 to 13191.6 bp
  93 -- 13191.6 to 19287.4 bp
  44 -- 19287.4 to 25383.2 bp
  21 -- 25383.2 to 31479.0 bp
  7 -- 31479.0 to 37574.8 bp
  3 -- 37574.8 to 43670.6 bp
  1 -- 43670.6 to 49766.4 bp
  1 -- 49766.4 to 55862.2 bp
  2 -- 55862.2 to 61958.0 bp
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 22h 15m 46s.
Objects
  • SilvergrassMB_MEGAHIT.contigs (Assembly) - Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1596024691020/SilvergrassMB_MEGAHIT.contigs
Assembled into 140704 contigs. Avg Length: 1563.9727939504207 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  139936 -- 1000.0 to 7094.1 bp
  583 -- 7094.1 to 13188.2 bp
  113 -- 13188.2 to 19282.3 bp
  38 -- 19282.3 to 25376.4 bp
  16 -- 25376.4 to 31470.5 bp
  9 -- 31470.5 to 37564.6 bp
  0 -- 37564.6 to 43658.7 bp
  3 -- 43658.7 to 49752.8 bp
  3 -- 49752.8 to 55846.9 bp
  3 -- 55846.9 to 61941.0 bp
Assemble paired-end reads from single-cell or metagenomic sequencing technologies using the IDBA-UD assembler.
This app completed without errors in 3d 10h 4m 40s.
Objects
  • SilvergrassMB_IDBA.contigs (Assembly) - Assembled contigs
Summary
Assembly saved to: jmwhitham:narrative_1596024691020/SilvergrassMB_IDBA.contigs
Assembled into 25312 contigs. Avg Length: 1884.28231669 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  25080 -- 1000.0 to 7100.3 bp
  197 -- 7100.3 to 13200.6 bp
  29 -- 13200.6 to 19300.9 bp
  1 -- 19300.9 to 25401.2 bp
  0 -- 25401.2 to 31501.5 bp
  1 -- 31501.5 to 37601.8 bp
  1 -- 37601.8 to 43702.1 bp
  1 -- 43702.1 to 49802.4 bp
  1 -- 49802.4 to 55902.7 bp
  1 -- 55902.7 to 62003.0 bp

4. Compare Contigs

KBase has a convenient application for comparing assemblies called Compare Assembled Contig Distribution. Use this application to see which assembly is best for downstream analysis. You are looking for assemblies with more assembled bases and longer contigs, since these are the key factors that will affect the quality of the genome(s) you extract from the metagenome (more about this in the next step). The table below summarizes important metrics that quantify these key factors.

Value | Definition | Further Explanation
N50 | The length of the shortest contig at 50% of the total assembly length, with contigs ordered from longest to shortest. | About half of all assembled bases are in contigs at least as long as the N50 contig; the rest are in shorter contigs.
L50 | The smallest number of contigs whose combined length makes up half of the total assembly length. | A count of contigs, not the length of a contig or set of contigs.
Nx | The length of the shortest contig at x% of the total assembly length. | Common Nx values are N50, N75, and N90.
Lx | The smallest number of contigs whose combined length makes up x% of the total assembly length. | Common Lx values are L50, L75, and L90.
NG50 | The length of the shortest contig at 50% of the known genome length. | An estimated genome length is sometimes used.
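The definitions above translate directly into code. The Python sketch below computes Nx and Lx from a list of contig lengths. Passing a known or estimated genome length as reference_length gives NGx (and the corresponding LGx) instead. The function name and signature are illustrative and are not part of any KBase app.

```python
# Illustrative sketch only: Nx/Lx (or NGx/LGx) from contig lengths.
# The function name and signature are assumptions.
from typing import Optional


def nx_lx(lengths: list[int], x: float = 50.0,
          reference_length: Optional[int] = None) -> tuple[int, int]:
    """Sort contigs longest to shortest and sum their lengths until the running
    total reaches x% of the total assembly length (or of reference_length, which
    gives NGx/LGx). Return (length of the contig that crosses the threshold,
    number of contigs summed up to and including it)."""
    total = reference_length if reference_length is not None else sum(lengths)
    target = total * x / 100.0
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("contigs never reach the target length (possible for NGx)")


print(nx_lx([10, 8, 6, 4, 2]))  # (8, 2)
```

For contig lengths of 10, 8, 6, 4, and 2 bp (30 bp in total), the running total first reaches 15 bp at the 8 bp contig, so N50 = 8 and L50 = 2.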
View distributions of contig characteristics for different assemblies.
This app completed without errors in 33m 14s.
Summary
ASSEMBLY STATS
Statistic | SilvergrassMB_SPAdes.contigs | SilvergrassMB_MEGAHIT.contigs | SilvergrassMB_IDBA.contigs
Len longest contig | 61958 bp | 61941 bp | 62003 bp
N50 (L50) | 1499 (34626) | 1463 (47186) | 1830 (7801)
N75 (L75) | 1179 (67144) | 1171 (89598) | 1354 (15465)
N90 (L90) | 1062 (90120) | 1059 (119296) | 1186 (21120)
Num contigs >= 1000000 bp | 0 | 0 | 0
Num contigs >= 100000 bp | 0 | 0 | 0
Num contigs >= 10000 bp | 315 | 341 | 84
Num contigs >= 1000 bp | 106727 | 140704 | 25312
Num contigs >= 500 bp | 106727 | 140704 | 25312
Num contigs >= 1 bp | 106727 | 140704 | 25312
Len contigs >= 1000000 bp | 0 bp | 0 bp | 0 bp
Len contigs >= 100000 bp | 0 bp | 0 bp | 0 bp
Len contigs >= 10000 bp | 5213419 bp | 5619317 bp | 1255578 bp
Len contigs >= 1000 bp | 171001108 bp | 220057228 bp | 47694954 bp
Len contigs >= 500 bp | 171001108 bp | 220057228 bp | 47694954 bp
Len contigs >= 1 bp | 171001108 bp | 220057228 bp | 47694954 bp
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/68579
  • key_plot.png
  • key_plot.pdf
  • cumulative_len_plot.png
  • cumulative_len_plot.pdf
  • sorted_contig_lengths.png
  • sorted_contig_lengths.pdf
  • histogram_figures.zip

5. Bin Contigs

Having assembled the contigs, the next step is to group them into bins based on patterns including contig abundance (read coverage) and tetramer frequency. Contigs with similar abundances and tetramer frequencies are likely to come from the same microbial genome. That is why these bins of contigs are also known as metagenome-assembled genomes (MAGs).
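Tetramer frequency is the sequence-composition part of the binning signal. The Python sketch below computes a contig's tetranucleotide frequency profile. It merges each 4-mer with its reverse complement, a common choice because a contig may be assembled from either strand, which leaves 136 canonical tetramers. The sketch illustrates the feature only. It is not how MaxBin2 or MetaBAT2 implement it, and both tools also combine composition with coverage. The names are assumptions.

```python
# Illustrative sketch only: canonical tetranucleotide frequencies for one contig.
# Names are assumptions; binning tools compute and use this feature in their own ways.
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


# 256 tetramers collapse to 136 once reverse-complement pairs are merged.
CANONICAL_TETRAMERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetramer_frequencies(contig: str) -> dict[str, float]:
    """Relative frequency of each canonical tetramer, skipping windows with non-ACGT bases."""
    counts = dict.fromkeys(CANONICAL_TETRAMERS, 0)
    seq = contig.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}
```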

MaxBin2 and MetaBAT2 are two commonly used binning tools with different algorithms. Rather than using just one, test both to see which produces better bins, since neither consistently outperforms the other in all metrics. Use a minimum contig length of 1500 bp for a fair comparison, since that is the lowest MetaBAT2 allows. Optional: try a minimum contig length of 1000 bp with MaxBin2 to see whether bin statistics improve.


Group assembled metagenomic contigs into lineages (Bins) using depth-of-coverage, nucleotide composition, and marker genes.
This app completed without errors in 2h 22m 58s.
Objects
  • MaxBin2Binned_SilvergrassMB_MEGAHIT_contigs (BinnedContigs) - BinnedContigs from MaxBin2
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/68579
  • maxbin_result.zip - File(s) generated by MaxBin2 App
Output from Bin Contigs using MaxBin2 - v2.2.4
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/68579
Bin metagenomic contigs
This app completed without errors in 2h 9m 20s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/68579
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/68579

6. Bin Quality Assessment

Summary statistics such as the number of bins and the number of binned contigs, which the MaxBin2 and MetaBAT2 applications report, do not tell you the quality of the generated bins. CheckM is a widely used application for this purpose. It estimates each bin's completeness and contamination from lineage-specific single-copy marker genes, and it will help you find a high-quality bin for downstream analyses.
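Once you have CheckM's completeness and contamination estimates, choosing a bin is a simple filtering and ranking problem. The Python sketch below uses thresholds loosely modeled on the MIMAG standard: high quality is >90% complete and <5% contaminated, and medium quality is ≥50% complete and <10% contaminated. It ranks bins by completeness - 5 × contamination, a score used in some MAG studies. Both the thresholds and the score are assumptions to adjust for your own project. Full MIMAG high-quality status also requires rRNA and tRNA genes, which this sketch does not check. Reading values from CheckM's summary table is omitted because its column names may differ between versions.

```python
# Illustrative sketch only: tiering and ranking bins from CheckM-style estimates.
# Thresholds (MIMAG-like) and the ranking score are assumptions, not CheckM output.
from dataclasses import dataclass


@dataclass
class BinQuality:
    bin_id: str
    completeness: float   # percent
    contamination: float  # percent


def quality_tier(b: BinQuality) -> str:
    """Assumed tiers based only on completeness and contamination."""
    if b.completeness > 90 and b.contamination < 5:
        return "high"
    if b.completeness >= 50 and b.contamination < 10:
        return "medium"
    return "low"


def rank_bins(bins: list[BinQuality]) -> list[tuple[str, str, float]]:
    """Sort bins by completeness - 5 * contamination, best first."""
    scored = [(b.bin_id, quality_tier(b), b.completeness - 5 * b.contamination) for b in bins]
    return sorted(scored, key=lambda row: row[2], reverse=True)
```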


Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 32m 42s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/68579
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 19m 44s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/68579
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM

7. Extract Individual Assemblies

Extract Bins as Assemblies from BinnedContigs performs the simple task of creating a KBase Assembly object from a specified bin or bins for input into downstream applications.


Extract a bin as an Assembly from a BinnedContig dataset
This app completed without errors in 3m 6s.
Objects
  • bin.009.fasta_assembly (Assembly) - Assembly object of extracted contigs
Summary
Job Finished
Generated Assembly Reference: 68579/76/1

8. Annotate Genomes

Whether a genome is fragmented into many contigs or is a contiguous circular chromosome, its genes must be annotated with the Rapid Annotation using Subsystems Technology (RAST) pipeline before a metabolic model can be built in KBase. To do this, submit the extracted high-quality MAG to the Annotate Multiple Microbial Assemblies application.


Annotate a bacterial or archaeal assembly using components from the RAST (Rapid Annotations using Subsystems Technology) toolkit (RASTtk).
This app completed without errors in 9m 54s.
Objects
  • SilvergrassMB_Annotated_MAG (Genome) - Annotated genome
Summary
The RAST algorithm was applied to annotating a genome sequence comprised of 554 contigs containing 5233087 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 5474 new features were called, of which 167 are non-coding.
Output genome has the following feature types:
	Coding gene                     5307 
	Non-coding repeat                123 
	Non-coding rna                    44 
Overall, the genes have 1841 distinct functions. 
The genes include 2799 genes with a SEED annotation ontology across 996 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
Output from Annotate Microbial Assembly
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/68579

9. Find Relatives


Add one or more Genomes to a KBase SpeciesTree.
This app completed without errors in 3m 27s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/68579
  • SilvergrassMB_Annotated_MAG_inMLTree.newick
  • SilvergrassMB_Annotated_MAG_inMLTree-labels.newick
  • SilvergrassMB_Annotated_MAG_inMLTree.png
  • SilvergrassMB_Annotated_MAG_inMLTree.pdf

Our mystery microbe is a myxobacterium

Phylogenomic analysis places the MAG between two clusters. One contains Stigmatella aurantiaca, Hyalangium minutum, Cystobacter fuscus, Archangium gephyra, Corallococcus coralloides, Myxococcus stipitatus, Myxococcus xanthus, and Myxococcus fulvus. The other contains Vulgatibacter incomptus, Anaeromyxobacter sp. Fw109-5, and Anaeromyxobacter dehalogenans. This placement suggests that the corresponding microbe may share phenotypes with members of these clusters, or have phenotypes intermediate between them.

Looking back at our taxonomic classification of reads, we find that a large portion of the unassembled reads are classified as the myxobacterium Sorangium cellulosum. Yet Sorangium cellulosum does not even appear as a close relative of our myxobacterium in the phylogenomic analysis. Furthermore, Sorangium cellulosum has one of the largest bacterial genomes sequenced, at 13,033,779 base pairs. In contrast, bin analysis with CheckM and annotation with RAST suggest that the genome of our myxobacterium is nearly complete at approximately 5.2 million base pairs (5,233,087 bp in 554 contigs). Taxonomic classification of shotgun reads was useful for verifying that the subsampled reads had a taxonomic distribution similar to the original set, and could therefore be used for downstream analysis. However, it was not an accurate way to characterize the species in this microbiome.
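One way to measure how similar the composition is before and after subsampling is the Bray-Curtis dissimilarity between the two taxonomic profiles. The Python sketch below computes it from read counts per taxon, for example counts at one taxonomic rank taken from the Kaiju summaries. The input format and function name are assumptions. A value near 0 means nearly identical relative abundances, and 1 means the profiles share no taxa.

```python
# Illustrative sketch only: Bray-Curtis dissimilarity between two taxon-count
# profiles. The dict-of-counts input format is an assumption.


def bray_curtis(counts_a: dict[str, int], counts_b: dict[str, int]) -> float:
    """Convert counts to relative abundances (so libraries of different depth are
    comparable), then compute BC = 1 - 2 * sum(min) / (1 + 1) = 1 - sum(min)."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both profiles need at least one classified read")
    taxa = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(t, 0) / total_a, counts_b.get(t, 0) / total_b) for t in taxa)
    return 1.0 - shared
```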

10. Build a Metabolic Model without Gapfilling

The Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures GmbH (Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH) uses a couple of media formulations for growth of myxobacteria. Either vitamin B12 (cobalamin) or its derivative cyanocobalamin must be included, because myxobacteria cannot biosynthesize vitamin B12. Verify this by building a genome-scale metabolic model with the Build Metabolic Model application, with the gapfilling option deselected. Then navigate to the KEGG porphyrin and chlorophyll metabolism pathway map to see whether the vitamin B12 biosynthesis pathway is missing.
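Suppose you have exported the model's reaction IDs and listed the reactions of the pathway you are checking. The comparison then amounts to a set difference. The Python sketch below reports what fraction of a pathway's reactions the model contains and which ones are missing. The function name is hypothetical. The sketch also assumes that pathway reactions have already been mapped to the model's reaction identifiers, a step it does not show.

```python
# Illustrative sketch only: pathway coverage as a set comparison. The function
# name is hypothetical, and reaction IDs are assumed to share one namespace.


def pathway_coverage(pathway_reactions: set[str],
                     model_reactions: set[str]) -> tuple[float, set[str]]:
    """Return the fraction of pathway reactions present in the model and the missing ones."""
    if not pathway_reactions:
        raise ValueError("pathway_reactions is empty")
    missing = pathway_reactions - model_reactions
    fraction = 1 - len(missing) / len(pathway_reactions)
    return fraction, missing
```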


Generate a draft metabolic model based on an annotated genome.
This app completed without errors in 49s.
Objects
  • SilvergrassMB_Annotated_MAG_MetabolicModel (FBAModel) - FBAModel-12 SilvergrassMB_Annotated_MAG_MetabolicModel
Report
Output from Build Metabolic Model
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/68579

11. Build a Metabolic Model with Gapfilling

Draft MAGs are often made up of tens or hundreds of contigs. While the contigs of high-quality MAGs will contain most of the core, universal genes, some genes will be missing. The absence of metabolic genes in contigs will show up as gaps in metabolic pathways in a metabolic model if gapfilling is not used.

Gapfilling adds reactions that are missing from a pathway (typically because their genes were not recovered in the MAG) back to the metabolic model. In general, an optimization algorithm identifies the minimal set of reactions that must be added to the model so that it can produce biomass components it otherwise could not. Details can be found [here]. Use gapfilling to see what else is predicted to be necessary for growth of the myxobacterium.
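The idea of a minimal gapfilling solution can be illustrated with a toy model. The Python sketch below makes two substitutions. It uses network expansion (a reaction fires once all its substrates are available) in place of flux balance, and brute-force search over small sets of candidate reactions in place of the linear-programming optimization used in practice. It returns the smallest set of database reactions that lets the model produce every target compound from the media. All names are hypothetical, and the exhaustive search only scales to tiny examples.

```python
# Illustrative toy only: minimal gapfilling via network expansion plus brute-force
# search. Real gapfilling uses LP/MILP over stoichiometry; all names are hypothetical.
from itertools import combinations
from typing import Optional

# A reaction is (substrates, products); compounds are plain strings.
Reaction = tuple[frozenset[str], frozenset[str]]


def rxn(substrates: set[str], products: set[str]) -> Reaction:
    return frozenset(substrates), frozenset(products)


def reachable(media: set[str], reactions: dict[str, Reaction]) -> set[str]:
    """Fire every reaction whose substrates are all available until nothing new appears."""
    available = set(media)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def minimal_gapfill(media: set[str], model: dict[str, Reaction],
                    database: dict[str, Reaction], targets: set[str],
                    max_size: int = 3) -> Optional[tuple[str, ...]]:
    """Smallest set of database reaction IDs making all targets reachable, or None."""
    for size in range(max_size + 1):
        for combo in combinations(sorted(database), size):
            trial = {**model, **{rid: database[rid] for rid in combo}}
            if targets <= reachable(media, trial):
                return combo
    return None


model = {"R1": rxn({"A"}, {"B"})}
database = {"R2": rxn({"B"}, {"C"}), "R3": rxn({"C"}, {"D"}), "R4": rxn({"X"}, {"D"})}
print(minimal_gapfill({"A"}, model, database, targets={"D"}))  # ('R2', 'R3')
```

R4 also produces D, but its substrate X is never available, so the smallest working addition is R2 plus R3.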


Generate a draft metabolic model based on an annotated genome.
This app completed without errors in 2m 7s.
Objects
  • SilvergrassMB_Annotated_MAG_Gapfilled_MetabolicModel (FBAModel) - FBAModel-12 SilvergrassMB_Annotated_MAG_Gapfilled_MetabolicModel
  • SilvergrassMB_Annotated_MAG_Gapfilled_MetabolicModel.gf.0 (FBA) - FBA-13 SilvergrassMB_Annotated_MAG_Gapfilled_MetabolicModel.gf.0
Report
Output from Build Metabolic Model
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/68579

Gapfilling Reactions Revealed Factors for Growth

Succinate dehydrogenase is missing from the TCA cycle. TCA cycle compounds (citrate, malate, succinate, and others) were found to stimulate growth of Myxococcus xanthus [19].

A biosynthesis step for the polyamine spermidine was gapfilled in this model. Spermidine at 125 µg/ml was found to stimulate growth of Myxococcus xanthus [19].

Gapfilling is not always helpful though. Valine, leucine and isoleucine are building blocks of proteins and therefore critical to biological processes. Biosynthesis pathways for these amino acids are missing in the model, and were not gapfilled. These amino acids must be added to media in a purified form or in a complex form like yeast extract.

Several reactions for biosynthesis of the coenzyme ubiquinone, which is involved in respiration in many organisms, were gapfilled. No media supplementation is required, however, since myxobacteria generally use other quinones, such as menaquinone-8 (MK-8) [20].

Summary and Future Directions

This narrative tutorial covers how to utilize shotgun metagenomic data to predict necessary media ingredients for isolation and growth of microbiome members whose MAGs are high-quality.

Once a microbe is isolated, KBase applications, including flux balance analysis, can be used together with growth experiments to refine media formulations for various purposes, such as optimizing growth and fermentation product yields.

References

  1. Andrews, S. FastQC: A Quality Control Tool for High Throughput Sequence Data. 2010. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
  2. Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163 https://www.nature.com/articles/nbt.4163
  3. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30: 2114–2120. doi:10.1093/bioinformatics/btu170 http://www.ncbi.nlm.nih.gov/pubmed/24695404
  4. Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications. 2016;7: 11257. doi:10.1038/ncomms11257 http://www.ncbi.nlm.nih.gov/pubmed/27071849
  5. Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011;12: 385. doi:10.1186/1471-2105-12-385 http://www.ncbi.nlm.nih.gov/pubmed/21961884
  6. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017 May;27(5):824-834. doi: 10.1101/gr.213959.116. https://www.ncbi.nlm.nih.gov/pubmed/28298430
  7. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31: 1674–1676. doi:10.1093/bioinformatics/btv033 http://www.ncbi.nlm.nih.gov/pubmed/25609793
  8. Peng Y, Leung HCM, Yiu SM, Chin FYL. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012;28: 1420–1428. doi:10.1093/bioinformatics/bts174 https://www.ncbi.nlm.nih.gov/pubmed/22495754
  9. Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32: 605–607. doi:10.1093/bioinformatics/btv638 https://www.ncbi.nlm.nih.gov/pubmed/26515820
  10. Wu Y-W, Tang Y-H, Tringe SG, Simmons BA, Singer SW. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome. 2014;2: 26. doi:10.1186/2049-2618-2-26 https://microbiomejournal.biomedcentral.com/articles/10.1186/2049-2618-2-26
  11. Kang DD, Froula J, Egan R, Wang Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ. 2015;3: e1165. doi:10.7717/peerj.1165 https://doi.org/10.7717/peerj.1165
  12. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research. 2015;25: 1043–1055. doi:10.1101/gr.186072.114 http://genome.cshlp.org/content/25/7/1043.long
  13. Price MN, Dehal PS, Arkin AP. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. Poon AFY, editor. PLoS ONE. 2010;5: e9490. doi:10.1371/journal.pone.0009490 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
  14. Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol. 2010;28: 977–982. doi:10.1038/nbt.1672 https://www.ncbi.nlm.nih.gov/pubmed/20802497
  15. Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206–D214. doi:10.1093/nar/gkt1226 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965101/
  16. Latendresse M. Efficiently gap-filling reaction networks. BMC Bioinformatics. 2014;15: 225. doi:10.1186/1471-2105-15-225 https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-225
  17. Dreyfuss JM, Zucker JD, Hood HM, Ocasio LR, Sachs MS, Galagan JE. Reconstruction and Validation of a Genome-Scale Metabolic Model for the Filamentous Fungus Neurospora crassa Using FARM. PLOS Computational Biology. 2013;9: e1003126. doi:10.1371/journal.pcbi.1003126 https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003126
  18. Mahadevan R, Schilling CH. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng. 2003;5: 264–276. https://www.ncbi.nlm.nih.gov/pubmed/14642354
  19. Bretscher AP, Kaiser D. Nutrition of Myxococcus xanthus, a fruiting myxobacterium. J Bacteriol. 1978;133(2): 763–768. https://jb.asm.org/content/133/2/763.short
  20. Yamamoto E, Muramatsu H, Nagai K. Vulgatibacter incomptus gen. nov., sp. nov. and Labilithrix luteola gen. nov., sp. nov., two myxobacteria isolated from soil in Yakushima Island, and the description of Vulgatibacteraceae fam. nov., Labilitrichaceae fam. nov. and Anaeromyxobacteraceae fam. nov. Int J Syst Evol Microbiol. 2014;64(Pt 10):3360-3368. doi:10.1099/ijs.0.063198-0 https://pubmed.ncbi.nlm.nih.gov/25048208/

Apps

  1. Annotate Microbial Assembly with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206–D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT - The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656–664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955-964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793-803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-6654. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijuter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176-1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673-679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  2. Assemble Reads with IDBA-UD - v1.1.3
    • Peng Y, Leung HCM, Yiu SM, Chin FYL. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012;28: 1420-1428. doi:10.1093/bioinformatics/bts174
  3. Assemble Reads with MEGAHIT v1.2.9
    • Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31: 1674-1676. doi:10.1093/bioinformatics/btv033
  4. Assemble Reads with metaSPAdes - v3.13.0
    • Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017;27: 824-834. doi:10.1101/gr.213959.116
  5. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043-1055. doi:10.1101/gr.186072.114
    • CheckM source:
    • Additional info:
  6. Assess Read Quality with FastQC - v0.11.5
    • FastQC source: Bioinformatics Group at the Babraham Institute, UK.
  7. Bin Contigs using MaxBin2 - v2.2.4
    • Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32: 605-607. doi:10.1093/bioinformatics/btv638
    • Wu Y-W, Tang Y-H, Tringe SG, Simmons BA, Singer SW. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome. 2014;2: 26. doi:10.1186/2049-2618-2-26
    • Maxbin2 source:
    • Maxbin source:
  8. Build Metabolic Model
    • [1] Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol. 2010;28: 977-982. doi:10.1038/nbt.1672
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Latendresse M. Efficiently gap-filling reaction networks. BMC Bioinformatics. 2014;15: 225. doi:10.1186/1471-2105-15-225
    • [4] Dreyfuss JM, Zucker JD, Hood HM, Ocasio LR, Sachs MS, Galagan JE. Reconstruction and Validation of a Genome-Scale Metabolic Model for the Filamentous Fungus Neurospora crassa Using FARM. PLOS Computational Biology. 2013;9: e1003126. doi:10.1371/journal.pcbi.1003126
    • [5] Mahadevan R, Schilling CH. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng. 2003;5: 264-276.
  9. Classify Taxonomy of Metagenomic Reads with Kaiju - v1.7.3
    • Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun. 2016;7: 11257. doi:10.1038/ncomms11257
    • Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011;12: 385. doi:10.1186/1471-2105-12-385
    • Kaiju Homepage:
    • Kaiju DBs from:
    • Github for Kaiju:
    • Krona homepage:
    • Github for Krona:
  10. Compare Assembled Contig Distributions - v1.1.2
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi:10.1038/nbt.4163
  11. Extract Bins as Assemblies from BinnedContigs - v1.0.2
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi:10.1038/nbt.4163
  12. Import FASTQ/SRA File as Reads from Staging Area
    no citations
  13. Insert Genome Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5. doi:10.1371/journal.pone.0009490
  14. MetaBAT2 Contig Binning - v1.7
    • Kang DD, Froula J, Egan R, Wang Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ. 2015;3: e1165. doi:10.7717/peerj.1165
    • MetaBAT2 source:
  15. Randomly Subsample Reads - v1.0.2
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi:10.1038/nbt.4163
  16. Trim Reads with Trimmomatic - v0.36
    • Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30: 2114-2120. doi:10.1093/bioinformatics/btu170