Generated July 2, 2025

MAGs from alcoholic fermentation in sugarcane biorefineries

Andressa M. Venturini, Carolina T. Martins, and Andreas K. Gombert

Genome-resolved metagenomics was applied to recover microbial metagenome-assembled genomes (MAGs) from alcoholic fermentation samples collected at two sugarcane biorefineries in São Paulo state, Brazil, during the 2024 harvest season.

Table of Contents

  • 1. Data importing
  • 2. Read trimming and filtering
  • 3. Taxonomic classification of reads
  • 4. Read assembly
  • 5. Contig binning
  • 6. MAG quality filtering
  • 7. MAG extraction
  • 8. Taxonomic classification of MAGs
  • 9. ANI comparisons
  • 10. Functional annotation
  • 11. Calculate MAG relative abundance

    1. Data importing

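    # Import the paired-end FASTQ files for sample S222 from the KBase staging
    # area as a paired-end reads object named NGS910_UIRA_S222.
    from biokbase.narrative.jobs.appmanager import AppManager
    AppManager().run_app_batch(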
        [{
            "app_id": "kb_uploadmethods/import_fastq_noninterleaved_as_reads_from_staging",
            "tag": "release",
            "version": "5b9346463df88a422ff5d4f4cba421679f63c73f",
            "params": [{
                "fastq_fwd_staging_file_name": "NGS910_UIRA_S222_L001_R1_001.fastq.gz",
                "fastq_rev_staging_file_name": "NGS910_UIRA_S222_L001_R2_001.fastq.gz",
                "name": "NGS910_UIRA_S222"
            }],
            "shared_params": {
                "sequencing_tech": "Illumina",
                "single_genome": 0,
                "read_orientation_outward": 0,
                "insert_size_std_dev": None,
                "insert_size_mean": None
            }
        }],
        cell_id="64c07f2f-d0ae-4e11-a160-83f68f98e79c",
        run_id="6d749821-edd1-4f64-a2c4-0e081af2943f"
    )
    
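    # Batch-import the paired-end FASTQ files for samples S223 and S224.
    # Note: both objects keep the "NGS910_UIRA_" prefix even though the file
    # names carry the USM and USC labels; later steps refer to these names.
    from biokbase.narrative.jobs.appmanager import AppManager
    AppManager().run_app_batch(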
        [{
            "app_id": "kb_uploadmethods/import_fastq_noninterleaved_as_reads_from_staging",
            "tag": "release",
            "version": "5b9346463df88a422ff5d4f4cba421679f63c73f",
            "params": [{
                "fastq_fwd_staging_file_name": "NGS910_USM_S223_L001_R1_001.fastq.gz",
                "fastq_rev_staging_file_name": "NGS910_USM_S223_L001_R2_001.fastq.gz",
                "name": "NGS910_UIRA_S223"
            }, {
                "fastq_fwd_staging_file_name": "NGS910_USC_S224_L001_R1_001.fastq.gz",
                "fastq_rev_staging_file_name": "NGS910_USC_S224_L001_R2_001.fastq.gz",
                "name": "NGS910_UIRA_S224"
            }],
            "shared_params": {
                "sequencing_tech": "Illumina",
                "single_genome": 0,
                "read_orientation_outward": 0,
                "insert_size_std_dev": None,
                "insert_size_mean": None
            }
        }],
        cell_id="89f15099-cd1f-4688-af7a-932eb7e5f785",
        run_id="6c3bf216-86fa-48f9-a299-72f58feac0e5"
    )
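The two import cells above list each forward/reverse pair by hand. As a minimal sketch, the per-sample dictionaries passed in `"params"` could instead be built from the staging file names, assuming the Illumina naming pattern `<sample>_L<lane>_R<1|2>_001.fastq.gz` used by these files and one lane per sample. `build_import_params` is a hypothetical helper, not part of the KBase API; it only produces the list that `run_app_batch` receives. Because it derives `name` from the file name, it would name the USM and USC samples `NGS910_USM_S223` and `NGS910_USC_S224` rather than the `NGS910_UIRA_` names used in this Narrative.

```python
import re
from collections import defaultdict

# Assumed Illumina-style naming: <sample>_L<lane>_R<1|2>_001.fastq.gz
ILLUMINA_RE = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def build_import_params(filenames):
    """Group R1/R2 staging files by sample into per-sample param dicts.

    Hypothetical helper: assumes one lane per sample, so a second file for
    the same sample and read direction is rejected rather than merged.
    """
    pairs = defaultdict(dict)
    for name in filenames:
        match = ILLUMINA_RE.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        mates = pairs[match["sample"]]
        if match["read"] in mates:
            raise ValueError(f"duplicate R{match['read']} file for {match['sample']}")
        mates[match["read"]] = name

    params = []
    for sample, mates in sorted(pairs.items()):
        if set(mates) != {"1", "2"}:
            raise ValueError(f"{sample} is missing a mate file")
        params.append({
            "fastq_fwd_staging_file_name": mates["1"],
            "fastq_rev_staging_file_name": mates["2"],
            "name": sample,
        })
    return params
```

The resulting list could then be passed as the `"params"` value of a single batch run, with the `"shared_params"` block unchanged.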
    

    2. Read trimming and filtering

    A quality control application for high-throughput sequence data.
    This app completed without errors in 9m 32s.
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • NGS910_UIRA_S222_206696_2_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
    • NGS910_UIRA_S222_206696_2_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
    A quality control application for high-throughput sequence data.
    This app completed without errors in 9m 19s.
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • NGS910_UIRA_S223_206696_4_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
    • NGS910_UIRA_S223_206696_4_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
    A quality control application for high-throughput sequence data.
    This app completed without errors in 9m 47s.
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • NGS910_UIRA_S224_206696_5_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
    • NGS910_UIRA_S224_206696_5_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
    Trim paired- or single-end Illumina reads with Trimmomatic.
    This app completed without errors in 34m 21s.
    Objects
    Created Object Name | Type | Description
    S222_clean_paired | PairedEndLibrary | Trimmed Reads
    S222_clean_unpaired_fwd | SingleEndLibrary | Trimmed Unpaired Forward Reads
    S222_clean_unpaired_rev | SingleEndLibrary | Trimmed Unpaired Reverse Reads
    Trim paired- or single-end Illumina reads with Trimmomatic.
    This app completed without errors in 34m 22s.
    Objects
    Created Object Name | Type | Description
    S223_clean_paired | PairedEndLibrary | Trimmed Reads
    S223_clean_unpaired_fwd | SingleEndLibrary | Trimmed Unpaired Forward Reads
    S223_clean_unpaired_rev | SingleEndLibrary | Trimmed Unpaired Reverse Reads
    Trim paired- or single-end Illumina reads with Trimmomatic.
    This app completed without errors in 34m 48s.
    Objects
    Created Object Name | Type | Description
    S224_clean_paired | PairedEndLibrary | Trimmed Reads
    S224_clean_unpaired_fwd | SingleEndLibrary | Trimmed Unpaired Forward Reads
    S224_clean_unpaired_rev | SingleEndLibrary | Trimmed Unpaired Reverse Reads
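The Trimmomatic runs above report only their output objects, not the trimming parameters. To illustrate what a sliding-window quality trim does, here is a simplified sketch: scan each read from the 5' end, cut it at the first 4-base window whose mean Phred+33 quality drops below 15, then discard reads shorter than a minimum length. The window size, quality threshold, and 36-base minimum are illustrative assumptions, not the settings used in this Narrative, and Trimmomatic's own SLIDINGWINDOW and MINLEN steps may place the exact cut differently.

```python
def phred33_scores(qual):
    """Decode a Phred+33 (Sanger / Illumina 1.8+) quality string."""
    return [ord(ch) - 33 for ch in qual]


def sliding_window_trim(seq, qual, window=4, min_mean_quality=15.0):
    """Simplified sliding-window trim, scanning from the 5' end.

    The read is cut at the start of the first window whose mean Phred score
    falls below `min_mean_quality`. Defaults are illustrative only.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred33_scores(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean_quality:
            return seq[:start], qual[:start]
    return seq, qual


def passes_min_length(seq, min_length=36):
    """Keep only reads that are still long enough after trimming (cf. MINLEN)."""
    return len(seq) >= min_length
```

With quality string `IIIIIIII####` (eight Q40 bases followed by four Q2 bases), the first failing window starts at position 7, so the read keeps its first seven bases.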

    3. Taxonomic classification of reads

    Allows users to perform taxonomic classification of shotgun metagenomic read data with Kaiju.
    This app completed without errors in 2h 24m 1s.
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • kaiju_classifications.zip
    • kaiju_summaries.zip
    • krona_data.zip
    • stacked_bar_abundance_plots_PNG+PDF.zip
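The `kaiju_classifications.zip` archive above holds per-read results. As a sketch, assuming Kaiju's standard tab-separated output in which the first three columns are the status (`C` classified, `U` unclassified), the read name, and the NCBI taxon ID, the fraction of classified reads and the most frequent taxon IDs could be tallied as follows. The function name and the example lines in the usage are hypothetical.

```python
from collections import Counter


def summarize_kaiju(lines, top_n=5):
    """Return (fraction of classified reads, top taxon IDs with read counts).

    Assumes Kaiju's per-read output: status (C/U), read name, taxon ID,
    optionally followed by extra columns, which are ignored here.
    """
    total = classified = 0
    taxa = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        status, _read_name, taxon_id = line.split("\t")[:3]
        total += 1
        if status == "C":
            classified += 1
            taxa[taxon_id] += 1
    fraction = classified / total if total else 0.0
    return fraction, taxa.most_common(top_n)
```

For example, three classified lines and one unclassified line give a classified fraction of 0.75.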
    Calculate pangenome for microbial genomes, including MAGs of varying quality.
    This app has not been run, so no output was produced.

    4. Read assembly

    Assemble metagenomic reads using the MEGAHIT assembler.
    This app completed without errors in 36m 24s.
    Objects
    Created Object Name | Type | Description
    S222_MEGAHIT.assembly | Assembly | Assembled contigs
    Summary
    ContigSet saved to: andressamv:narrative_1739477632159/S222_MEGAHIT.assembly
    Assembled into 1207 contigs. Average length: 13789.4 bp.
    Contig length distribution (number of contigs: length range in bp):
    • 1035: 2000.0 to 27311.1
    • 105: 27311.1 to 52622.2
    • 39: 52622.2 to 77933.3
    • 18: 77933.3 to 103244.4
    • 7: 103244.4 to 128555.5
    • 1: 128555.5 to 153866.6
    • 0: 153866.6 to 179177.7
    • 1: 179177.7 to 204488.8
    • 0: 204488.8 to 229799.9
    • 1: 229799.9 to 255111.0
    Assemble metagenomic reads using the MEGAHIT assembler.
    This app completed without errors in 33m 20s.
    Objects
    Created Object Name | Type | Description
    S223_MEGAHIT.assembly | Assembly | Assembled contigs
    Summary
    ContigSet saved to: andressamv:narrative_1739477632159/S223_MEGAHIT.assembly
    Assembled into 703 contigs. Average length: 23341.5 bp.
    Contig length distribution (number of contigs: length range in bp):
    • 461: 2005.0 to 20622.4
    • 107: 20622.4 to 39239.8
    • 49: 39239.8 to 57857.2
    • 38: 57857.2 to 76474.6
    • 22: 76474.6 to 95092.0
    • 11: 95092.0 to 113709.4
    • 4: 113709.4 to 132326.8
    • 3: 132326.8 to 150944.2
    • 5: 150944.2 to 169561.6
    • 3: 169561.6 to 188179.0
    Assemble metagenomic reads using the MEGAHIT assembler.
    This app completed without errors in 35m 12s.
    Objects
    Created Object Name | Type | Description
    S224_MEGAHIT.assembly | Assembly | Assembled contigs
    Summary
    ContigSet saved to: andressamv:narrative_1739477632159/S224_MEGAHIT.assembly
    Assembled into 540 contigs. Average length: 22248.5 bp.
    Contig length distribution (number of contigs: length range in bp):
    • 399: 2001.0 to 27700.4
    • 80: 27700.4 to 53399.8
    • 33: 53399.8 to 79099.2
    • 17: 79099.2 to 104798.6
    • 4: 104798.6 to 130498.0
    • 5: 130498.0 to 156197.4
    • 1: 156197.4 to 181896.8
    • 0: 181896.8 to 207596.2
    • 0: 207596.2 to 233295.6
    • 1: 233295.6 to 258995.0
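The MEGAHIT summaries report the number of contigs, their mean length, and a ten-bin length histogram whose edges are consistent with equal-width bins running from the shortest to the longest contig (for S222, (255111.0 - 2000.0) / 10 = 25311.1 bp per bin). The sketch below computes the same kind of summary from a list of contig lengths and adds N50, which the summaries do not report. It is an illustration with hypothetical helper names, not the code KBase uses to build these reports.

```python
def mean_length(lengths):
    """Average contig length, as in the average-length field of the summaries."""
    return sum(lengths) / len(lengths)


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no contigs")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def length_histogram(lengths, n_bins=10):
    """Counts per equal-width bin from min to max length, plus the bin edges.

    The longest contig is placed in the last bin, so counts sum to len(lengths).
    """
    lo, hi = min(lengths), max(lengths)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for length in lengths:
        index = int((length - lo) / width) if width else 0
        counts[min(index, n_bins - 1)] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return counts, edges
```

Because the bins are equal-width on a linear scale, a long-tailed length distribution puts most contigs in the first bin, as seen in all three assemblies; N50 complements that view by weighting contigs by the bases they contain.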

    5. Contig binning

    Group assembled metagenomic contigs into lineages (bins) using depth of coverage and nucleotide composition.
    This app completed without errors in 43m 28s.
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • concoct_result.zip - Files generated by CONCOCT App
    Group assembled metagenomic contigs into lineages (bins) using depth of coverage and nucleotide composition.
    This app completed without errors in 38m 27s.
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • concoct_result.zip - Files generated by CONCOCT App
    Group assembled metagenomic contigs into lineages (bins) using depth of coverage and nucleotide composition.
    This app completed without errors in 44m 59s.
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • concoct_result.zip - Files generated by CONCOCT App
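CONCOCT combines per-sample coverage with nucleotide composition. The composition part can be sketched as a canonical tetranucleotide profile, in which each 4-mer is merged with its reverse complement so that a contig and its reverse complement yield the same vector (136 features for k = 4). This is a simplified illustration assuming k = 4 and a pseudocount of 1; the helper names are hypothetical. CONCOCT itself then reduces the dimensionality of the combined coverage and composition features (PCA) and clusters them with a Gaussian mixture model, which this sketch does not attempt.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def canonical_kmers(k=4):
    """Sorted list of all canonical k-mers over ACGT."""
    return sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})


def tetranucleotide_profile(seq, k=4, pseudocount=1.0):
    """Normalized canonical k-mer frequencies; k-mers containing non-ACGT are skipped."""
    index = {kmer: i for i, kmer in enumerate(canonical_kmers(k))}
    counts = [pseudocount] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[index[canonical(kmer)]] += 1
    total = sum(counts)
    return [c / total for c in counts]
```

Merging reverse complements matters because assemblers output contigs in arbitrary orientation; without it, two copies of the same genome region could look compositionally different.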

    6. MAG quality filtering

    Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
    This app completed without errors in 20m 39s.
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
    • full_output.zip - Full output of CheckM
    • plots.zip - Output plots from CheckM
    Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
    This app completed without errors in 28m 36s.
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
    • full_output.zip - Full output of CheckM
    • plots.zip - Output plots from CheckM
    Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
    This app completed without errors in 21m 29s.
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
    • full_output.zip - Full output of CheckM
    • plots.zip - Output plots from CheckM
    Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes. Creates a new BinnedContigs object with the high-quality bins that pass user-defined thresholds for completeness and contamination.
    This app completed without errors in 21m 18s.
    Objects
    Created Object Name | Type | Description
    S222_CheckM_HQ_bins.BinnedContigs | BinnedContigs | HQ BinnedContigs S222_CheckM_HQ_bins.BinnedContigs
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
    • full_output.zip - Full output of CheckM
    • plots.zip - Output plots from CheckM
    Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes. Creates a new BinnedContigs object with the high-quality bins that pass user-defined thresholds for completeness and contamination.
    This app completed without errors in 29m 46s.
    Objects
    Created Object Name | Type | Description
    S223_CheckM_HQ_bins.BinnedContigs | BinnedContigs | HQ BinnedContigs S223_CheckM_HQ_bins.BinnedContigs
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
    • full_output.zip - Full output of CheckM
    • plots.zip - Output plots from CheckM
    Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes. Creates a new BinnedContigs object with the high-quality bins that pass user-defined thresholds for completeness and contamination.
    This app completed without errors in 22m 32s.
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
    • full_output.zip - Full output of CheckM
    • plots.zip - Output plots from CheckM
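The filtering runs above keep bins that pass user-defined completeness and contamination thresholds, but the thresholds themselves are not recorded here. As a hedged sketch, the same rule applied to a tab-separated CheckM summary table could look like this. The defaults of 90% completeness and 5% contamination are assumptions, similar to the completeness and contamination cut-offs of the MIMAG high-quality draft standard (which additionally requires rRNA and tRNA genes), and the column names `Bin Id`, `Completeness`, and `Contamination` are assumptions about the table layout, exposed as parameters.

```python
import csv
import io


def filter_bins(tsv_text, min_completeness=90.0, max_contamination=5.0,
                id_col="Bin Id", completeness_col="Completeness",
                contamination_col="Contamination"):
    """Return IDs of bins meeting both thresholds (values in percent).

    Column names are assumptions about the CheckM summary table layout.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        completeness = float(row[completeness_col])
        contamination = float(row[contamination_col])
        if completeness >= min_completeness and contamination <= max_contamination:
            kept.append(row[id_col])
    return kept
```

Both conditions must hold: a nearly complete bin with high contamination is rejected, because it likely mixes contigs from more than one population.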

    7. MAG extraction

    Extract a bin as an Assembly from a BinnedContigs dataset.
    This app completed without errors in 1m 10s.
    Objects
    Created Object Name | Type | Description
    S222_HQ_extracted_bins.AssemblySet | AssemblySet | Assembly set of extracted assemblies
    Bin.049.fastaS222_HQ_assembly | Assembly | Assembly object of extracted contigs
    Bin.013.fastaS222_HQ_assembly | Assembly | Assembly object of extracted contigs
    Summary
    Job finished. Generated Assembly references: 206696/209/1, 206696/211/1. Generated Assembly Set: 206696/212/1.
    Extract a bin as an Assembly from a BinnedContigs dataset.
    This app completed without errors in 1m 7s.
    Objects
    Created Object Name | Type | Description
    S223_HQ_extracted_bins.AssemblySet | AssemblySet | Assembly set of extracted assemblies
    Bin.097.fastaS223_HQ_assembly | Assembly | Assembly object of extracted contigs
    Bin.074.fastaS223_HQ_assembly | Assembly | Assembly object of extracted contigs
    Summary
    Job finished. Generated Assembly references: 206696/210/1, 206696/213/1. Generated Assembly Set: 206696/215/1.
    Allows users to create an AssemblySet object.
    This app completed without errors in 19s.
    Objects
    Created Object Name | Type | Description
    HQs_bins | AssemblySet | KButil_Build_AssemblySet
    Summary
    Assembly objects in output set HQs_bins: 4
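The extraction summaries identify each new object by a KBase reference of the form `workspace_id/object_id/version` (for example `206696/209/1`). As a small sketch, the references in those summary lines can be parsed to confirm that the two extractions together yield the four assemblies gathered into `HQs_bins`. `Upa` and `parse_extraction_summary` are hypothetical names, not KBase library functions; the parser only assumes the `Generated Assembly Set:` label that separates the assembly references from the set reference in the summaries.

```python
import re
from typing import NamedTuple


class Upa(NamedTuple):
    """A KBase object reference: workspace ID / object ID / version."""
    workspace_id: int
    object_id: int
    version: int

    def __str__(self):
        return f"{self.workspace_id}/{self.object_id}/{self.version}"


UPA_RE = re.compile(r"\b(\d+)/(\d+)/(\d+)\b")


def parse_extraction_summary(summary):
    """Split a summary line into (assembly references, assembly-set references)."""
    assemblies_text, _, set_text = summary.partition("Generated Assembly Set:")

    def refs(text):
        return [Upa(*map(int, m.groups())) for m in UPA_RE.finditer(text)]

    return refs(assemblies_text), refs(set_text)
```

Applied to the S222 and S223 summaries, this yields two assembly references each (four distinct in total) plus one AssemblySet reference per run.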

    8. Taxonomic classification of MAGs

    Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB)
    This app completed without errors in 19m 2s.
    Objects
    Created Object Name Type Description
    Bin.013.fastaS222_HQ_assembly Assembly Added GTDB lineage
    Bin.049.fastaS222_HQ_assembly Assembly Added GTDB lineage
    Bin.074.fastaS223_HQ_assembly Assembly Added GTDB lineage
    Bin.097.fastaS223_HQ_assembly Assembly Added GTDB lineage
    HQs_bins AssemblySet Added GTDB lineage
    Links
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • gtdbtk.backbone.bac120.classify.tree - Whole tree, GTDB-formatted Newick
    • gtdbtk.backbone.bac120.classify-ITOL.tree - Whole tree, iTOL-formatted Newick
    • gtdbtk.bac120.classify.tree.6.tree - Whole tree, GTDB-formatted Newick
    • gtdbtk.bac120.classify.tree.6-ITOL.tree - Whole tree, iTOL-formatted Newick
    • gtdbtk.backbone.bac120.classify-proximals.tree - Newick
    • gtdbtk.backbone.bac120.classify-trimmed.tree - Newick
    • gtdbtk.backbone.bac120.classify-lineages.map - GTDB lineage
    • gtdbtk.backbone.bac120.classify-trimmed.tree-rectangle.PNG - Image of gtdbtk.backbone.bac120.classify-trimmed.tree
    • gtdbtk.backbone.bac120.classify-trimmed.tree-rectangle.PDF - Image of gtdbtk.backbone.bac120.classify-trimmed.tree
    • gtdbtk.backbone.bac120.classify-trimmed.tree-circle.PNG - Image of gtdbtk.backbone.bac120.classify-trimmed.tree
    • gtdbtk.backbone.bac120.classify-trimmed.tree-circle.PDF - Image of gtdbtk.backbone.bac120.classify-trimmed.tree
    • gtdbtk.backbone.bac120.classify-trimmed.tree-circle-ultrametric.PNG - Image of gtdbtk.backbone.bac120.classify-trimmed.tree
    • gtdbtk.backbone.bac120.classify-trimmed.tree-circle-ultrametric.PDF - Image of gtdbtk.backbone.bac120.classify-trimmed.tree
    • gtdbtk.bac120.classify.tree.6-proximals.tree - Newick
    • gtdbtk.bac120.classify.tree.6-trimmed.tree - Newick
    • gtdbtk.bac120.classify.tree.6-lineages.map - GTDB lineage
    • gtdbtk.bac120.classify.tree.6-trimmed.tree-rectangle.PNG - Image of gtdbtk.bac120.classify.tree.6-trimmed.tree
    • gtdbtk.bac120.classify.tree.6-trimmed.tree-rectangle.PDF - Image of gtdbtk.bac120.classify.tree.6-trimmed.tree
    • gtdbtk.bac120.classify.tree.6-trimmed.tree-circle.PNG - Image of gtdbtk.bac120.classify.tree.6-trimmed.tree
    • gtdbtk.bac120.classify.tree.6-trimmed.tree-circle.PDF - Image of gtdbtk.bac120.classify.tree.6-trimmed.tree
    • gtdbtk.bac120.classify.tree.6-trimmed.tree-circle-ultrametric.PNG - Image of gtdbtk.bac120.classify.tree.6-trimmed.tree
    • gtdbtk.bac120.classify.tree.6-trimmed.tree-circle-ultrametric.PDF - Image of gtdbtk.bac120.classify.tree.6-trimmed.tree
    • GTDB-Tk_classify_wf.zip - GTDB-Tk Classify WF output
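GTDB-Tk reports each lineage as a semicolon-separated string of rank-prefixed names (`d__`, `p__`, `c__`, `o__`, `f__`, `g__`, `s__`), leaving the name empty where a rank could not be assigned. A minimal parser, with hypothetical helper names, turns such a string into a rank dictionary and finds the lowest assigned rank; strings that do not follow this pattern raise `ValueError`. The lineage in the usage example is illustrative and is not a result from this study.

```python
RANKS = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_gtdb_lineage(lineage):
    """Map a GTDB lineage string to {rank: name or None}."""
    parsed = {}
    for token in lineage.split(";"):
        prefix, sep, name = token.strip().partition("__")
        if not sep or prefix not in RANKS:
            raise ValueError(f"unexpected rank token: {token!r}")
        parsed[RANKS[prefix]] = name or None  # empty name: rank not assigned
    return parsed


def lowest_assigned_rank(parsed):
    """Return (rank, name) for the most specific assigned rank, or None."""
    for rank in reversed(list(RANKS.values())):
        if parsed.get(rank):
            return rank, parsed[rank]
    return None
```

For a lineage ending in `g__Escherichia;s__`, the lowest assigned rank is the genus, which flags a MAG that GTDB-Tk placed in a genus without assigning a species.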

    9. ANI comparisons

    Allows users to compute fast whole-genome Average Nucleotide Identity (ANI) estimates.
    This app completed without errors in 42s.
    Allows users to compute fast whole-genome Average Nucleotide Identity (ANI) estimates.
    This app completed without errors in 43s.
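The ANI runs above have no stored output in this export. The FastANI reference in the app list (Jain et al.) proposes roughly 95% ANI as a species boundary, so one hedged way to use pairwise results is to group genomes joined by any pair at or above 95%. The sketch assumes FastANI's tab-separated output columns (query, reference, ANI, matched fragments, total query fragments); the single-linkage grouping, the function names, and the genome names in the usage example are assumptions for illustration, not the analysis performed here.

```python
def parse_fastani(lines):
    """Yield (query, reference, ANI) from FastANI tab-separated output lines.

    Only the first three columns are used.
    """
    for line in lines:
        if line.strip():
            query, reference, ani = line.rstrip("\n").split("\t")[:3]
            yield query, reference, float(ani)


def species_clusters(genomes, ani_pairs, threshold=95.0):
    """Single-linkage clusters: genomes joined by any pair with ANI >= threshold."""
    parent = {g: g for g in genomes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for query, reference, ani in ani_pairs:
        if ani >= threshold and query in parent and reference in parent:
            parent[find(query)] = find(reference)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())
```

Single linkage is the simplest choice but can chain genomes together (A matches B, B matches C, so A and C share a cluster even if they are below 95% to each other); a stricter scheme would require every pair in a cluster to pass the threshold.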

    10. Functional annotation

    Annotate your assemblies, isolate genomes, or MAGs with DRAM and distill resulting annotations to create an interactive functional summary per genome or assembly. Use for KBase assembly objects.
    This app completed without errors in 3h 11m 49s.
    Objects
    Created Object Name | Type | Description
    Bin.013.fastaS222_HQ_assembly_DRAM | Genome | Annotated Genome
    Bin.049.fastaS222_HQ_assembly_DRAM | Genome | Annotated Genome
    Bin.074.fastaS223_HQ_assembly_DRAM | Genome | Annotated Genome
    Bin.097.fastaS223_HQ_assembly_DRAM | Genome | Annotated Genome
    HQ_bins | GenomeSet | HQ bins
    Summary
    Here are the results from your DRAM run.
    Files
    These are only available in the live Narrative: https://narrative.kbase.us/narrative/206696
    • annotations.tsv - DRAM annotations in tab-separated table format
    • genes.fna - Genes as nucleotides predicted by DRAM with brief annotations
    • genes.faa - Genes as amino acids predicted by DRAM with brief annotations
    • genes.gff - GFF file of all DRAM annotations
    • rrnas.tsv - Tab-separated table of rRNAs as detected by barrnap
    • trnas.tsv - Tab-separated table of tRNAs as detected by tRNAscan-SE
    • genbank.tar.gz - Compressed folder of output genbank files
    • product.tsv - DRAM product in tabular format
    • metabolism_summary.xlsx - DRAM metabolism summary tables
    • genome_stats.tsv - DRAM genome statistics table
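DRAM's `annotations.tsv` lists one row per predicted gene. As a sketch, the number of distinct KEGG orthologs per genome could be tallied from it; the column names `fasta` (genome) and `ko_id` (KEGG ortholog) are assumptions about the table layout and are passed as parameters so they can be adjusted to the actual header. Each annotation cell is treated as a single ID. The function name and the example rows in the usage are hypothetical.

```python
import csv
import io
from collections import defaultdict


def distinct_annotations_per_genome(tsv_text, genome_col="fasta", annotation_col="ko_id"):
    """Count distinct non-empty annotation IDs per genome in a DRAM-style TSV.

    `genome_col` and `annotation_col` are assumed column names.
    """
    per_genome = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        ids = per_genome[row[genome_col]]  # register the genome even with no hits
        value = (row.get(annotation_col) or "").strip()
        if value:
            ids.add(value)
    return {genome: len(ids) for genome, ids in sorted(per_genome.items())}
```

Counting distinct IDs rather than rows avoids inflating the total when several genes in one genome receive the same ortholog.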

    11. Calculate MAG relative abundance

    Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
    This app completed without errors in 1h 19m 34s.
    No output found.
    Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
    This app completed without errors in 1h 32m 34s.
    No output found.
    Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
    This app completed without errors in 1h 55m 23s.
    No output found.
    Align sequencing reads to long reference prokaryotic genome sequences using Bowtie2.
    This app completed without errors in 1h 23m 24s.
    No output found.
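The Bowtie2 cells above produced no stored output, and the Narrative does not state which abundance measure was used. Two common choices can be computed from per-MAG mapped read counts r_i, MAG lengths L_i, and the total number of reads R: the fraction of reads recruited, r_i / R, and the length-normalized relative abundance, (r_i / L_i) / Σ_j (r_j / L_j), which keeps larger genomes from appearing more abundant simply because they recruit more reads. The sketch below, with a hypothetical function name and example counts, computes both.

```python
def relative_abundance(mapped_reads, genome_lengths, total_reads):
    """Return (fraction_of_reads, length_normalized) dicts keyed by MAG.

    fraction_of_reads[m] = mapped_reads[m] / total_reads
    length_normalized[m] = (mapped_reads[m] / genome_lengths[m]), rescaled
                           so the values sum to 1 across the MAGs given.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    fraction = {m: n / total_reads for m, n in mapped_reads.items()}
    density = {m: mapped_reads[m] / genome_lengths[m] for m in mapped_reads}
    total_density = sum(density.values())
    normalized = {
        m: (d / total_density if total_density else 0.0)
        for m, d in density.items()
    }
    return fraction, normalized
```

With 1000 reads mapped to each of a 2 Mb and a 4 Mb MAG, both recruit the same fraction of reads, but the length-normalized abundance is 2/3 for the smaller genome and 1/3 for the larger one. Note that the length-normalized values are relative to the MAGs included, not to the whole community.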

    Apps

    1. Align Reads using Bowtie2 - v2.3.2
      • Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9: 357-359. doi:10.1038/nmeth.1923
      • Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10: R25. doi:10.1186/gb-2009-10-3-r25
    2. Annotate and Distill Assemblies with DRAM
      • DRAM source code
      • DRAM documentation
      • DRAM Tutorial
      • DRAM publication
    3. Assemble Reads with MEGAHIT v1.2.9
      • Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31: 1674-1676. doi:10.1093/bioinformatics/btv033
    4. Assess Genome Quality with CheckM - v1.0.18
      • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043-1055. doi:10.1101/gr.186072.114
      • CheckM source:
      • Additional info:
    5. Assess Read Quality with FastQC - v0.12.1
      • FastQC source: Bioinformatics Group at the Babraham Institute, UK.
    6. Bin Contigs using CONCOCT - v1.1
      • Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, Lahti L, Loman NJ, Andersson AF, Quince C. Binning metagenomic contigs by coverage and composition. Nature Methods. 2014;11: 1144-1146. doi:10.1038/nmeth.3103
      • CONCOCT source:
    7. Build AssemblySet - v1.0.1
      • Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x
      • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
    8. Calculate Pangenome with mOTUpan - v0.3.2
      • Moritz Buck, Maliheh Mehrshad, Stefan Bertilsson. mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation. NAR Genom Bioinform. 2022 Aug 15;4(3):lqac060. doi: 10.1093/nargab/lqac060.
      • Martin Steinegger, Johannes Söding. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017 Nov;35(11):1026-1028. doi: 10.1038/nbt.3988
    9. Classify Microbes with GTDB-Tk - v2.3.2
      • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics, Volume 38, Issue 23, 1 December 2022, Pages 5315-5316. DOI: https://doi.org/10.1093/bioinformatics/btac672
      • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
      • Donovan H Parks, Maria Chuvochina, Christian Rinke, Aaron J Mussig, Pierre-Alain Chaumeil, Philip Hugenholtz. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D785-D794. DOI: https://doi.org/10.1093/nar/gkab776
      • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
      • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
      • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
      • Chivian D, Jungbluth SP, Dehal PS, Wood-Charlson EM, Canon RS, Allen BH, Clark MM, Gu T, Land ML, Price GA, Riehl WJ, Sneddon MW, Sutormin R, Zhang Q, Cottingham RW, Henry CS, Arkin AP. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x
      • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
      • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
      • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
      • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
      • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
      • Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016 Jun 20;17(1):132. DOI: 10.1186/s13059-016-0997-x
    10. Classify Taxonomy of Metagenomic Reads with Kaiju - v1.9.0
      • Chivian D, et al. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc. 2023 Jan;18(1):208-238. doi: 10.1038/s41596-022-00747-x
      • Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun. 2016;7: 11257. doi:10.1038/ncomms11257
      • Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011;12: 385. doi:10.1186/1471-2105-12-385
      • Kaiju Homepage:
      • Kaiju DBs from:
      • Github for Kaiju:
      • Krona homepage:
      • Github for Krona:
    11. Compute ANI with FastANI
      • [1] Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High-throughput ANI Analysis of 90K Prokaryotic Genomes Reveals Clear Species Boundaries. 2017; doi:10.1101/225342
      • [2] Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM. DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol. 2007;57: 81-91. doi:10.1099/ijs.0.64483-0
      • FastANI module and source code:
    12. Extract Bins as Assemblies from BinnedContigs - v1.0.2
      • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
    13. Filter Bins by Quality with CheckM - v1.0.18
      • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043-1055. doi:10.1101/gr.186072.114
      • CheckM source:
      • Additional info:
    14. Trim Reads with Trimmomatic - v0.39
      • Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30: 2114-2120. doi:10.1093/bioinformatics/btu170