Generated July 29, 2022

Metagenome-assembled genomes from Amazonian soil microbial consortia

Narrative created by: Jéssica Adriele Mandro

We report 17 metagenome-assembled genomes (MAGs) recovered from microbial consortia of forest and pasture soils in the Brazilian Eastern Amazon. The recovered MAGs represent novel genomes with the potential to contribute to important ecological processes, including carbohydrate degradation and sulfur and nitrogen cycling.

Brief sample description

Soil sampling was conducted in the Brazilian Eastern Amazon, state of Pará, in two areas with different land uses: a primary forest located in the Tapajós National Forest (2°51’19.6” S 54°57’30.1” W) and a pasture in the adjacent region (3°07’44.9” S 54°57’15.5” W). In each area, soil samples from 0 to 10 cm depth were collected in quintuplicate after removing the litter layer. Microbial consortia were obtained by cultivation with CH4 using the roll-tube technique (Hungate, 1969), established in triplicate for forest and pasture soils that had previously been enriched for 15 days with CH4 (12% v/v) and serially diluted. Total DNA from the consortia was extracted using the PowerLyzer PowerSoil DNA Isolation Kit. Metagenomic libraries were constructed using the Nextera DNA Flex Library Prep Kit and sequenced on the Illumina HiSeq 2500 platform (2 x 100 bp).

Sample IDs:

  • F3_S20_L005_reads.fastq: Forest metagenome
  • P1_S19_L005_reads.fastq: Pasture metagenome

1. Quality evaluation, filtering, and trimming of the reads.

A quality control application for high throughput sequence data.
This app completed without errors in 31m 40s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116951
  • F3_S20_L005_reads.fastq_116951_9_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • F3_S20_L005_reads.fastq_116951_9_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
A quality control application for high throughput sequence data.
This app completed without errors in 43m 6s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116951
  • P1_S19_L005_reads.fastq_116951_8_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • P1_S19_L005_reads.fastq_116951_8_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 1h 4m 47s.
Objects
Created Object Name Type Description
F3_S20_L005_trimmomatic_reads.fastq_paired PairedEndLibrary Trimmed Reads
F3_S20_L005_trimmomatic_reads.fastq_unpaired_fwd SingleEndLibrary Trimmed Unpaired Forward Reads
F3_S20_L005_trimmomatic_reads.fastq_unpaired_rev SingleEndLibrary Trimmed Unpaired Reverse Reads
Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 1h 42m 0s.
Objects
Created Object Name Type Description
P1_S19_L005_trimmomatic_reads.fastq_paired PairedEndLibrary Trimmed Reads
P1_S19_L005_trimmomatic_reads.fastq_unpaired_fwd SingleEndLibrary Trimmed Unpaired Forward Reads
P1_S19_L005_trimmomatic_reads.fastq_unpaired_rev SingleEndLibrary Trimmed Unpaired Reverse Reads
A quality control application for high throughput sequence data.
This app completed without errors in 23m 57s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116951
  • F3_S20_L005_trimmomatic_reads.fastq_paired_116951_12_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • F3_S20_L005_trimmomatic_reads.fastq_paired_116951_12_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
A quality control application for high throughput sequence data.
This app completed without errors in 30m 8s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116951
  • P1_S19_L005_trimmomatic_reads.fastq_paired_116951_16_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • P1_S19_L005_trimmomatic_reads.fastq_paired_116951_16_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
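
Trimmomatic offers several trimming steps, one of which is a sliding-window quality trim. The Narrative does not list the steps or settings that were used, so the sketch below is only a simplified illustration of the idea: scan a read from its 5' end and cut it where the mean quality inside a window first drops below a threshold. The function names, the window size of 4, and the threshold of 15 are assumptions chosen for the example, not values taken from this analysis.

```python
def phred33_to_scores(quality_string):
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_keep_length(scores, window=4, min_mean=15):
    """Return how many leading bases to keep.

    The read is cut at the start of the first window whose mean
    quality is below min_mean. Reads shorter than the window, or
    with no failing window, are kept whole.
    """
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean:
            return start
    return len(scores)


def trim_read(sequence, quality_string, window=4, min_mean=15):
    """Trim a read and its quality string with the rule above."""
    scores = phred33_to_scores(quality_string)
    keep = sliding_window_keep_length(scores, window, min_mean)
    return sequence[:keep], quality_string[:keep]


# Illustrative read: five high-quality bases followed by five poor ones.
trimmed_seq, trimmed_qual = trim_read("ACGTACGTAC", "IIIII#####")
print(trimmed_seq, trimmed_qual)
```

The real tool applies further rules that this sketch leaves out, so it should be read as an aid to understanding rather than a reimplementation.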

2. Assemble reads.

Assemble metagenomic reads using the SPAdes assembler.
This app completed without errors in 1h 22m 45s.
Objects
Created Object Name Type Description
F3_S20_L005_metaSPAdes.contigs Assembly Assembled contigs
Summary
Assembly saved to: jmand:narrative_1652271257887/F3_S20_L005_metaSPAdes.contigs
Assembled into 2883 contigs. Avg Length: 15996.7 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  • 2813 -- 1503.0 to 94907.3 bp
  • 36 -- 94907.3 to 188311.6 bp
  • 12 -- 188311.6 to 281715.9 bp
  • 11 -- 281715.9 to 375120.2 bp
  • 3 -- 375120.2 to 468524.5 bp
  • 2 -- 468524.5 to 561928.8 bp
  • 2 -- 561928.8 to 655333.1 bp
  • 1 -- 655333.1 to 748737.4 bp
  • 0 -- 748737.4 to 842141.7 bp
  • 3 -- 842141.7 to 935546.0 bp
Assemble metagenomic reads using the SPAdes assembler.
This app completed without errors in 1h 25m 53s.
Objects
Created Object Name Type Description
P1_S19_L005_metaSPAdes.contigs Assembly Assembled contigs
Summary
Assembly saved to: jmand:narrative_1652271257887/P1_S19_L005_metaSPAdes.contigs
Assembled into 7503 contigs. Avg Length: 11872.0 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  • 7449 -- 1500.0 to 168667.2 bp
  • 22 -- 168667.2 to 335834.4 bp
  • 12 -- 335834.4 to 503001.6 bp
  • 8 -- 503001.6 to 670168.8 bp
  • 2 -- 670168.8 to 837336.0 bp
  • 4 -- 837336.0 to 1004503.2 bp
  • 1 -- 1004503.2 to 1171670.4 bp
  • 1 -- 1171670.4 to 1338837.6 bp
  • 2 -- 1338837.6 to 1506004.8 bp
  • 2 -- 1506004.8 to 1673172.0 bp
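
The assembly summaries report a contig count, a mean length, and a length distribution in ten equal-width bins between the shortest and longest contig. The sketch below shows one way such figures could be computed from a list of contig lengths. The function names are hypothetical, the N50 value is an extra statistic not shown in the summaries, and the example lengths are invented for illustration.

```python
def contig_stats(lengths):
    """Return count, total, mean and N50 for a list of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    # N50: length of the contig at which the running sum of the
    # longest contigs first reaches half of the assembly size.
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_bp": total,
        "mean_bp": total / len(ordered),
        "n50_bp": n50,
    }


def length_histogram(lengths, bins=10):
    """Count contigs in equal-width bins spanning min to max length.

    Returns a list of (bin_start, bin_end, count) tuples.
    """
    lo, hi = min(lengths), max(lengths)
    width = (hi - lo) / bins
    counts = [0] * bins
    for length in lengths:
        if width == 0:
            idx = 0
        else:
            # The longest contig falls exactly on the upper edge,
            # so it is clamped into the last bin.
            idx = min(int((length - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i])
            for i in range(bins)]


# Invented lengths, not taken from either assembly.
example_lengths = [100, 200, 300, 400, 1000]
print(contig_stats(example_lengths))
for start, end, count in length_histogram(example_lengths, bins=3):
    print(f"{count} -- {start:.1f} to {end:.1f} bp")
```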

3. Binning.

Group assembled metagenomic contigs into lineages (Bins) using depth-of-coverage, nucleotide composition, and marker genes.
This app completed without errors in 48m 7s.
Objects
Created Object Name Type Description
F3_S20_L005_BinnedContigs BinnedContigs BinnedContigs from MaxBin2
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116951
  • maxbin_result.zip - File(s) generated by MaxBin2 App
Group assembled metagenomic contigs into lineages (Bins) using depth-of-coverage, nucleotide composition, and marker genes.
This app completed without errors in 1h 7m 16s.
Objects
Created Object Name Type Description
P1_S19_L005.BinnedContigs BinnedContigs BinnedContigs from MaxBin2
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116951
  • maxbin_result.zip - File(s) generated by MaxBin2 App
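
The binning step groups contigs partly by nucleotide composition. As a generic illustration of what a composition signal can look like, the sketch below computes tetranucleotide (4-mer) frequencies for a single sequence. It is not MaxBin2's implementation; the function name and the choice to skip k-mers containing ambiguous bases are assumptions made for the example.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(sequence, k=4):
    """Return the relative frequency of every A/C/G/T k-mer.

    Overlapping k-mers are counted; k-mers containing any other
    character (for example N) are ignored.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k]
                     for i in range(len(sequence) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: counts[kmer] / total for kmer in kmers}


# Toy sequence, far shorter than a real contig.
freqs = tetranucleotide_frequencies("ACGTACGT")
print(len(freqs), freqs["ACGT"])
```

Contigs from the same genome tend to have similar frequency vectors, which is why composition can be combined with depth of coverage when grouping contigs.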

4. Bin quality assessment.

Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 26m 39s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116951
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 35m 40s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116951
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
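
Later steps refer to high and medium quality bins, but the Narrative does not state the cutoffs that were applied. The sketch below assumes the commonly used completeness and contamination thresholds (more than 90% and less than 5% for high quality; at least 50% and less than 10% for medium quality) and shows how a tab-separated summary could be sorted into tiers. The column names, the function names, and the example values are hypothetical and should be checked against the actual CheckM summary table.

```python
import csv
import io


def quality_tier(completeness, contamination):
    """Assign a quality tier from completeness and contamination (%)."""
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


def tier_bins(tsv_text, name_col="Bin Name",
              comp_col="Completeness", cont_col="Contamination"):
    """Map each bin in a TSV summary to its quality tier.

    The column names are assumptions; pass the real ones if they differ.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row[name_col]: quality_tier(float(row[comp_col]),
                                    float(row[cont_col]))
        for row in reader
    }


# Invented values for illustration only.
EXAMPLE_TSV = (
    "Bin Name\tCompleteness\tContamination\n"
    "Bin.A\t95.2\t1.1\n"
    "Bin.B\t62.0\t7.5\n"
    "Bin.C\t40.0\t2.0\n"
)
print(tier_bins(EXAMPLE_TSV))
```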

5. Bin extraction.

Extract a bin as an Assembly from a BinnedContig dataset
This app completed without errors in 4m 34s.
Objects
Created Object Name Type Description
MAGs_Forest.AssemblySet AssemblySet Assembly set of extracted assemblies
Bin.001.fasta_Forest Assembly Assembly object of extracted contigs
Bin.002.fasta_Forest Assembly Assembly object of extracted contigs
Bin.003.fasta_Forest Assembly Assembly object of extracted contigs
Bin.005.fasta_Forest Assembly Assembly object of extracted contigs
Bin.006.fasta_Forest Assembly Assembly object of extracted contigs
Summary
Job Finished
Generated Assembly Reference: 116951/202/1, 116951/204/1, 116951/205/1, 116951/206/1, 116951/208/1
Generated Assembly Set: 116951/210/1
Extract a bin as an Assembly from a BinnedContig dataset
This app completed without errors in 8m 47s.
Objects
Created Object Name Type Description
MAGs_Pasture.AssemblySet AssemblySet Assembly set of extracted assemblies
Bin.001.fasta_Pasture Assembly Assembly object of extracted contigs
Bin.002.fasta_Pasture Assembly Assembly object of extracted contigs
Bin.003.fasta_Pasture Assembly Assembly object of extracted contigs
Bin.004.fasta_Pasture Assembly Assembly object of extracted contigs
Bin.005.fasta_Pasture Assembly Assembly object of extracted contigs
Bin.006.fasta_Pasture Assembly Assembly object of extracted contigs
Bin.007.fasta_Pasture Assembly Assembly object of extracted contigs
Bin.008.fasta_Pasture Assembly Assembly object of extracted contigs
Bin.009.fasta_Pasture Assembly Assembly object of extracted contigs
Bin.011.fasta_Pasture Assembly Assembly object of extracted contigs
Bin.012.fasta_Pasture Assembly Assembly object of extracted contigs
Bin.015.fasta_Pasture Assembly Assembly object of extracted contigs
Summary
Job Finished
Generated Assembly Reference: 116951/207/1, 116951/209/1, 116951/212/1, 116951/213/1, 116951/214/1, 116951/215/1, 116951/216/1, 116951/217/1, 116951/218/1, 116951/219/1, 116951/220/1, 116951/221/1
Generated Assembly Set: 116951/222/1

6. Bin taxonomic classification.

Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) ver R06-RS202
This app completed without errors in 53m 20s.
Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) ver R06-RS202
This app completed without errors in 41m 13s.
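
GTDB taxonomic assignments are written as a semicolon-separated lineage in which each rank carries a prefix such as d__ or p__, and a rank with no assignment is left empty after the prefix. The sketch below parses such a string and reports the lowest rank that received a name, which is one way to see how far a MAG could be placed. The function names are hypothetical, and the example lineage is illustrative rather than a result from this Narrative.

```python
RANKS = {
    "d": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_gtdb_lineage(lineage):
    """Split a GTDB-style lineage string into a rank -> name mapping.

    Ranks with an empty name (for example 's__') map to None.
    """
    parsed = {}
    for field in lineage.split(";"):
        prefix, _, name = field.strip().partition("__")
        if prefix in RANKS:
            parsed[RANKS[prefix]] = name or None
    return parsed


def lowest_assigned_rank(parsed):
    """Return (rank, name) for the most specific rank with a name."""
    for rank in reversed(list(RANKS.values())):
        if parsed.get(rank):
            return rank, parsed[rank]
    return None


# Illustrative lineage with no genus or species assignment.
example = ("d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;"
           "o__Burkholderiales;f__Burkholderiaceae;g__;s__")
print(lowest_assigned_rank(parse_gtdb_lineage(example)))
```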

7. Bin relative abundance.

Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 2h 20m 22s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 57m 58s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 2h 6m 39s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 39m 16s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 20m 56s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 2h 48m 3s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 49m 56s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 49m 42s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 42m 40s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 36m 40s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 55m 14s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 46m 7s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 44m 51s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 36m 39s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 45m 9s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 41m 51s.
No output found.
Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
This app completed without errors in 1h 43m 16s.
No output found.
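
The Narrative shows seventeen alignment runs, which matches the number of MAGs, but it does not state how relative abundance was derived from them. One simple possibility, sketched below under that assumption, is to express the reads mapped to each bin as a percentage of all reads in the sample. The function name, the bin labels, and the read counts are invented for illustration.

```python
def relative_abundance(mapped_reads, total_reads):
    """Return the percentage of sample reads mapped to each bin.

    mapped_reads: dict of bin name -> number of reads aligned to it
    total_reads:  number of reads in the sample used for mapping
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {
        bin_name: 100.0 * count / total_reads
        for bin_name, count in mapped_reads.items()
    }


# Invented counts, not results from this study.
example_counts = {"bin_a": 250, "bin_b": 50}
print(relative_abundance(example_counts, total_reads=1000))
```

This measure does not correct for genome size, so a larger genome recruits more reads at the same cell abundance; normalizing by bin length would be a reasonable refinement.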

8. Genome functional annotation.

Annotate your assembly with DRAM. Annotations will then be distilled to create an interactive functional summary per assembly.
This app completed without errors in 3h 42m 23s.
Objects
Created Object Name Type Description
Bin.001.fasta_Forest_DRAM Genome Annotated Genome
Bin.002.fasta_Forest_DRAM Genome Annotated Genome
Bin.003.fasta_Forest_DRAM Genome Annotated Genome
Bin.005.fasta_Forest_DRAM Genome Annotated Genome
Bin.006.fasta_Forest_DRAM Genome Annotated Genome
F3_S20_L005.DRAM GenomeSet Functional annotation of high and medium quality forest bins
Summary
Here are the results from your DRAM run.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116951
  • annotations.tsv - DRAM annotations in a tab-separated table format
  • genes.fna - Genes as nucleotides predicted by DRAM with brief annotations
  • genes.faa - Genes as amino acids predicted by DRAM with brief annotations
  • genes.gff - GFF file of all DRAM annotations
  • rrnas.tsv - Tab-separated table of rRNAs as detected by barrnap
  • trnas.tsv - Tab-separated table of tRNAs as detected by tRNAscan-SE
  • genbank.tar.gz - Compressed folder of output genbank files
  • product.tsv - DRAM product in tabular format
  • metabolism_summary.xlsx - DRAM metabolism summary tables
  • genome_stats.tsv - DRAM genome statistics table
Annotate your assembly with DRAM. Annotations will then be distilled to create an interactive functional summary per assembly.
This app completed without errors in 7h 31m 31s.
Objects
Created Object Name Type Description
Bin.001.fasta_Pasture_DRAM Genome Annotated Genome
Bin.002.fasta_Pasture_DRAM Genome Annotated Genome
Bin.003.fasta_Pasture_DRAM Genome Annotated Genome
Bin.004.fasta_Pasture_DRAM Genome Annotated Genome
Bin.005.fasta_Pasture_DRAM Genome Annotated Genome
Bin.006.fasta_Pasture_DRAM Genome Annotated Genome
Bin.007.fasta_Pasture_DRAM Genome Annotated Genome
Bin.008.fasta_Pasture_DRAM Genome Annotated Genome
Bin.009.fasta_Pasture_DRAM Genome Annotated Genome
Bin.011.fasta_Pasture_DRAM Genome Annotated Genome
Bin.012.fasta_Pasture_DRAM Genome Annotated Genome
Bin.015.fasta_Pasture_DRAM Genome Annotated Genome
P3_S19_L005.DRAM GenomeSet Functional annotation of high and medium quality pasture bins
Summary
Here are the results from your DRAM run.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116951
  • annotations.tsv - DRAM annotations in a tab-separated table format
  • genes.fna - Genes as nucleotides predicted by DRAM with brief annotations
  • genes.faa - Genes as amino acids predicted by DRAM with brief annotations
  • genes.gff - GFF file of all DRAM annotations
  • rrnas.tsv - Tab-separated table of rRNAs as detected by barrnap
  • trnas.tsv - Tab-separated table of tRNAs as detected by tRNAscan-SE
  • genbank.tar.gz - Compressed folder of output genbank files
  • product.tsv - DRAM product in tabular format
  • metabolism_summary.xlsx - DRAM metabolism summary tables
  • genome_stats.tsv - DRAM genome statistics table

Apps

  1. Align Reads using Bowtie2 - v2.3.2
    • Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9: 357-359. doi:10.1038/nmeth.1923
    • Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10: R25. doi:10.1186/gb-2009-10-3-r25
  2. Annotate and Distill Assemblies with DRAM
    • DRAM source code
    • DRAM documentation
    • DRAM publication
  3. Assemble Reads with metaSPAdes - v3.15.3
    • Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017;27: 824-834. doi:10.1101/gr.213959.116
    • Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics. 2020 Jun;70(1):e102. doi: 10.1002/cpbi.102.
  4. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043-1055. doi:10.1101/gr.186072.114
    • CheckM source:
    • Additional info:
  5. Assess Read Quality with FastQC - v0.11.9
    • FastQC source: Bioinformatics Group at the Babraham Institute, UK.
  6. Bin Contigs using MaxBin2 - v2.2.4
    • Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32: 605-607. doi:10.1093/bioinformatics/btv638
    • Wu Y-W, Tang Y-H, Tringe SG, Simmons BA, Singer SW. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome. 2014;2: 26. doi:10.1186/2049-2618-2-26
    • Maxbin2 source:
    • Maxbin source:
  7. Classify Microbes with GTDB-Tk - v1.7.0
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
  8. Extract Bins as Assemblies from BinnedContigs - v1.0.2
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  9. Trim Reads with Trimmomatic - v0.36
    • Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30: 2114-2120. doi:10.1093/bioinformatics/btu170