Abstract / Description¶

While investigating the gut microbiome of mice for microbes related to Polycystic Ovary Syndrome (PCOS), our team found two metagenomes that could not be classified using alignment-based methods. To further investigate the microbial species in these samples, we used a common metagenome-assembled genome (MAG) workflow to create draft genomes, which were then taxonomically classified using the GTDB-Tk app.

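# Batch-import the assembled scaffolds (as an Assembly object) and the
# paired-end reads for both mouse samples from the KBase staging area.
from biokbase.narrative.jobs.appmanager import AppManager

AppManager().run_app_batch(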
    [{
        "app_id": "kb_uploadmethods/import_fasta_as_assembly_from_staging",
        "tag": "release",
        "version": "d67ff71a675aed5566d257c267689ea0d2a4a8b0",
        "params": [{
            "staging_file_subdir_path": "scaffolds.fasta",
            "assembly_name": "scaffolds.fasta_assembly"
        }],
        "shared_params": {
            "type": "draft isolate",
            "min_contig_length": 500
        }
    }, {
        "app_id": "kb_uploadmethods/import_fastq_noninterleaved_as_reads_from_staging",
        "tag": "release",
        "version": "d67ff71a675aed5566d257c267689ea0d2a4a8b0",
        "params": [{
            "fastq_fwd_staging_file_name": "mouse_132_T4_S87_L001_R1_001.fastq.gz",
            "fastq_rev_staging_file_name": "mouse_132_T4_S87_L001_R2_001.fastq.gz",
            "name": "mouse_132"
        }, {
            "fastq_fwd_staging_file_name": "mouse_112_T4_S86_L001_R1_001.fastq.gz",
            "fastq_rev_staging_file_name": "mouse_112_T4_S86_L001_R2_001.fastq.gz",
            "name": "mouse_112"
        }],
        "shared_params": {
            "sequencing_tech": "Illumina",
            "single_genome": 1,
            "read_orientation_outward": 0,
            "insert_size_std_dev": None,
            "insert_size_mean": None
        }
    }],
    cell_id="e7f10ab6-212f-4b49-b542-fed7f2700d3f",
    run_id="7aef9407-2a90-4ba9-8a1e-200eff598ba2"
)

Binning Contigs¶

The metagenomes were reassembled outside of KBase using SPAdes. To bin the contigs we used MaxBin2, and to refine the bins produced by MaxBin2 we used DAS Tool. We refined the bins to obtain higher-quality draft genomes and to remove any duplicate contigs from each bin.
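
To make "duplicate contigs" concrete, the following is a minimal sketch that scans contig-to-bin assignments and reports any contig placed in more than one bin. It is not DAS Tool's scoring algorithm; the tab-separated `contig<TAB>bin` layout, the function names, and the contig and bin names are all assumptions made for illustration.

```python
from collections import defaultdict


def parse_contig_table(lines):
    """Yield (contig_id, bin_name) pairs from tab-separated text lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        contig_id, bin_name = line.split("\t")[:2]
        yield contig_id, bin_name


def find_duplicate_contigs(assignments):
    """Return {contig_id: sorted bin names} for contigs found in more than one bin."""
    bins_by_contig = defaultdict(set)
    for contig_id, bin_name in assignments:
        bins_by_contig[contig_id].add(bin_name)
    return {
        contig: sorted(bins)
        for contig, bins in bins_by_contig.items()
        if len(bins) > 1
    }


# Hypothetical contig and bin names, for illustration only.
example = [
    "NODE_1\tbin.001",
    "NODE_2\tbin.001",
    "NODE_2\tbin.002",
    "NODE_3\tbin.002",
]
duplicates = find_duplicate_contigs(parse_contig_table(example))
print(duplicates)  # {'NODE_2': ['bin.001', 'bin.002']}
```

A check like this can be run on the binner output before and after refinement to confirm that each contig ends up in at most one bin.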

Annotating MAGs¶

After binning, we annotated the bins using RASTtk. The annotations were later used to determine whether each bin contained the essential components of a living organism.

Phylogenetic Analysis¶

To determine which bins are related to each other, we inserted the annotated bins into a phylogenetic tree. KBase also compares the annotated bins against its own database; if a bin matches a genome that is already classified, that organism is inserted into the tree to show how closely the bins relate to it.

Creating Assemblies¶

To use the GTDB-Tk app, we first needed to extract the bins as assemblies.
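
Conceptually, extracting bins as assemblies means writing the contigs of each bin to their own FASTA. The sketch below shows that idea in plain Python, working on in-memory strings; it is not the KBase app's implementation, and the function names, contig names, and bin assignments are hypothetical.

```python
from collections import defaultdict


def read_fasta(text):
    """Parse FASTA text into a {contig_id: sequence} dict."""
    records = {}
    contig_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the identifier, dropping any description after whitespace.
            contig_id = line[1:].split()[0]
            records[contig_id] = []
        elif contig_id is not None:
            records[contig_id].append(line)
    return {cid: "".join(parts) for cid, parts in records.items()}


def split_by_bin(contigs, contig_to_bin):
    """Group contigs into one FASTA string per bin; unbinned contigs are skipped."""
    per_bin = defaultdict(list)
    for contig_id, sequence in contigs.items():
        bin_name = contig_to_bin.get(contig_id)
        if bin_name is not None:
            per_bin[bin_name].append(f">{contig_id}\n{sequence}\n")
    return {bin_name: "".join(entries) for bin_name, entries in per_bin.items()}


# Hypothetical contigs and bin assignments, for illustration only.
fasta_text = ">NODE_1 length=8\nACGTACGT\n>NODE_2\nGGCC\nTTAA\n>NODE_3\nAAAA\n"
contig_to_bin = {"NODE_1": "bin.001", "NODE_2": "bin.002"}

assemblies = split_by_bin(read_fasta(fasta_text), contig_to_bin)
for bin_name, fasta in sorted(assemblies.items()):
    print(f"--- {bin_name}.fasta ---")
    print(fasta, end="")
```

Each value in `assemblies` could then be written to a file and imported as a separate assembly.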

Taxonomic Classification¶

Taxonomic classification was done using the GTDB-Tk app. As the results show, many bins could not be completely classified. There are also bins that do not appear to have any close relevant taxa (i.e., GTDB does not know what these bins are); for these, only the lowest taxonomic level that could be assigned is shown, based on protein markers on the contigs in the annotated bins.
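
GTDB-style classifications are written as semicolon-separated, rank-prefixed names (`d__`, `p__`, ..., `s__`), and a rank that could not be assigned is left empty. As a minimal sketch of how "incompletely classified" can be read off such a string, the helper below returns the most specific rank that has a name. The function name and the example strings are illustrative and are not taken from this narrative's results.

```python
RANK_NAMES = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def lowest_resolved_rank(classification):
    """Return (rank_name, taxon) for the most specific rank that has a name.

    Assumes ranks are listed from domain down to species. An empty name
    (for example "s__") means the rank was not assigned. Returns None if
    no rank is resolved.
    """
    resolved = None
    for field in classification.split(";"):
        prefix, _, taxon = field.strip().partition("__")
        if prefix in RANK_NAMES and taxon:
            resolved = (RANK_NAMES[prefix], taxon)
    return resolved


# Illustrative strings only; not results from this analysis.
to_genus = "d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes;s__"
to_phylum = "d__Bacteria;p__Bacteroidota;c__;o__;f__;g__;s__"

print(lowest_resolved_rank(to_genus))   # ('genus', 'Alistipes')
print(lowest_resolved_rank(to_phylum))  # ('phylum', 'Bacteroidota')
```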

Assessing Genome Quality¶

To determine the quality of our draft genomes we used CheckM. The results from CheckM and the GTDB-Tk app were then used to select a few genomes (n=5) for further analysis. The selection was discussed among the researchers; the criteria were CheckM completeness, FastANI results, and the frequency of each bin's contigs in the metagenome (estimated using Salmon).
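
The selection here was made by discussion rather than by a fixed rule, but the three criteria can be collected in one record per bin to support that discussion. The sketch below filters on a completeness cutoff and then ranks by abundance. The cutoff, the ranking rule, the field names, and every value shown are hypothetical placeholders; the narrative does not state thresholds or how FastANI was weighed, so ANI is only carried along for reference.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BinMetrics:
    name: str
    completeness: float    # CheckM completeness, percent
    ani: Optional[float]   # FastANI to the closest reference, percent; None if no hit
    abundance: float       # summed contig abundance from Salmon


def shortlist(bins: List[BinMetrics], min_completeness: float, top_n: int) -> List[BinMetrics]:
    """Keep bins at or above `min_completeness`, then take the `top_n` most abundant."""
    passing = [b for b in bins if b.completeness >= min_completeness]
    passing.sort(key=lambda b: b.abundance, reverse=True)
    return passing[:top_n]


# Placeholder values for illustration only; not results from this analysis.
candidates = [
    BinMetrics("bin_A", completeness=98.0, ani=95.2, abundance=120.0),
    BinMetrics("bin_B", completeness=72.5, ani=None, abundance=300.0),
    BinMetrics("bin_C", completeness=91.0, ani=80.1, abundance=45.0),
    BinMetrics("bin_D", completeness=99.1, ani=None, abundance=210.0),
]

picked = shortlist(candidates, min_completeness=90.0, top_n=2)
print([b.name for b in picked])  # ['bin_D', 'bin_A']
```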

Visualizing Genomes and Pathways¶

After selecting five binned assemblies, we checked whether each bin contained the metabolic components necessary to sustain a living organism. To do this we used DRAM, which gives a visual representation of which metabolic pathways, and which enzymes in those pathways, are present in our bins. To visualize the annotated bins we used a circular genome visualization tool (CGView), which shows features such as CDS regions on the contigs and GC skew.
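
For readers unfamiliar with GC skew, it is commonly computed per window as (G - C) / (G + C). The sketch below applies that formula over fixed windows of a sequence; the window size, the function name, and the toy sequence are assumptions, and the visualization tool's own windowing may differ.

```python
def gc_skew(sequence, window=1000, step=None):
    """Return a list of (window_start, skew), with skew = (G - C) / (G + C).

    Windows with no G or C get a skew of 0.0. A sequence shorter than
    `window` is treated as a single window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    step = step or window
    sequence = sequence.upper()
    results = []
    for start in range(0, max(len(sequence) - window + 1, 1), step):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, skew))
    return results


# Toy sequence for illustration only.
print(gc_skew("GGGCAAAACCCG", window=4))  # [(0, 0.5), (4, 0.0), (8, -0.5)]
```

Positive values mark windows richer in G than C on the given strand, and negative values the reverse.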

Acknowledgements¶

Special thanks to Alex Handzel for choosing the samples for this analysis and to Laura Sisk-Hackworth for DNA preparation of the metagenomes. Funding for the project was provided through an NIH grant.

References¶

Yu-Wei Wu, Blake A. Simmons, Steven W. Singer, MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets, Bioinformatics, Volume 32, Issue 4, 15 February 2016, Pages 605–607, https://doi.org/10.1093/bioinformatics/btv638
Sieber, C.M.K., Probst, A.J., Sharrar, A. et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol 3, 836–843 (2018). https://doi.org/10.1038/s41564-018-0171-1
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A., & Kingsford, C. (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods.
Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925–1927.
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25: 1043-1055.
Michael Shaffer, Mikayla A Borton, Bridget B McGivern, Ahmed A Zayed, Sabina Leanti La Rosa, Lindsey M Solden, Pengfei Liu, Adrienne B Narrowe, Josué Rodríguez-Ramos, Benjamin Bolduc, M Consuelo Gazitúa, Rebecca A Daly, Garrett J Smith, Dean R Vik, Phil B Pope, Matthew B Sullivan, Simon Roux, Kelly C Wrighton, DRAM for distilling microbial metabolism to automate the curation of microbiome function, Nucleic Acids Research, Volume 48, Issue 16, 18 September 2020, Pages 8883–8900, https://doi.org/10.1093/nar/gkaa621
Stothard P, Wishart DS. Circular genome visualization and exploration using CGView. Bioinformatics. 2005 Feb 15;21(4):537-9. doi: 10.1093/bioinformatics/bti054. Epub 2004 Oct 12. PMID: 15479716.
Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010 Mar 10;5(3):e9490. doi: 10.1371/journal.pone.0009490. PMID: 20224823; PMCID: PMC2835736.
Aziz, R.K., Bartels, D., Best, A.A. et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics 9, 75 (2008). https://doi.org/10.1186/1471-2164-9-75
Arkin, A., Cottingham, R., Henry, C. et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol 36, 566–569 (2018). https://doi.org/10.1038/nbt.4163

Created Object Name	Type	Description
bin.016.fasta_assembly.RAST	Genome	RAST annotation
bin.051.fasta_assembly.RAST	Genome	RAST annotation
bin.007.fasta_assembly.RAST	Genome	RAST annotation
bin.046.fasta_assembly.RAST	Genome	RAST annotation
bin.037.fasta_assembly.RAST	Genome	RAST annotation
bin.015.fasta_assembly.RAST	Genome	RAST annotation
bin.019.fasta_assembly.RAST	Genome	RAST annotation
bin.022.fasta_assembly.RAST	Genome	RAST annotation
bin.047.fasta_assembly.RAST	Genome	RAST annotation
bin.042.fasta_assembly.RAST	Genome	RAST annotation
bin.024.fasta_assembly.RAST	Genome	RAST annotation
bin.005.fasta_assembly.RAST	Genome	RAST annotation
bin.025.fasta_assembly.RAST	Genome	RAST annotation
bin.006.fasta_assembly.RAST	Genome	RAST annotation
bin.028.fasta_assembly.RAST	Genome	RAST annotation
bin.017.fasta_assembly.RAST	Genome	RAST annotation
bin.032.fasta_assembly.RAST	Genome	RAST annotation
bin.044.fasta_assembly.RAST	Genome	RAST annotation
bin.030.fasta_assembly.RAST	Genome	RAST annotation
bin.010.fasta_assembly.RAST	Genome	RAST annotation
bin.012.fasta_assembly.RAST	Genome	RAST annotation
bin.021.fasta_assembly.RAST	Genome	RAST annotation
bin.023.fasta_assembly.RAST	Genome	RAST annotation
bin.008.fasta_assembly.RAST	Genome	RAST annotation
bin.001.fasta_assembly.RAST	Genome	RAST annotation
bin.050.fasta_assembly.RAST	Genome	RAST annotation
bin.035.fasta_assembly.RAST	Genome	RAST annotation
bin.038.fasta_assembly.RAST	Genome	RAST annotation
bin.011.fasta_assembly.RAST	Genome	RAST annotation
bin.002.fasta_assembly.RAST	Genome	RAST annotation
bin.039.fasta_assembly.RAST	Genome	RAST annotation
bin.034.fasta_assembly.RAST	Genome	RAST annotation
bin.018.fasta_assembly.RAST	Genome	RAST annotation
bin.004.fasta_assembly.RAST	Genome	RAST annotation
bin.031.fasta_assembly.RAST	Genome	RAST annotation
bin.040.fasta_assembly.RAST	Genome	RAST annotation
bin.048.fasta_assembly.RAST	Genome	RAST annotation
bin.009.fasta_assembly.RAST	Genome	RAST annotation
bin.029.fasta_assembly.RAST	Genome	RAST annotation
bin.033.fasta_assembly.RAST	Genome	RAST annotation
bin.043.fasta_assembly.RAST	Genome	RAST annotation
bin.003.fasta_assembly.RAST	Genome	RAST annotation
bin.026.fasta_assembly.RAST	Genome	RAST annotation
bin.014.fasta_assembly.RAST	Genome	RAST annotation
bin.013.fasta_assembly.RAST	Genome	RAST annotation
bin.020.fasta_assembly.RAST	Genome	RAST annotation
bin.049.fasta_assembly.RAST	Genome	RAST annotation
bin.041.fasta_assembly.RAST	Genome	RAST annotation
bin.045.fasta_assembly.RAST	Genome	RAST annotation
bin.027.fasta_assembly.RAST	Genome	RAST annotation
bin.036.fasta_assembly.RAST	Genome	RAST annotation
Anotated_assembly	GenomeSet	Genome Set

Created Object Name	Type	Description
extracted_bins.AssemblySet	AssemblySet	Assembly set of extracted assemblies
bin.038.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.007.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.035.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.051.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.050.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.016.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.029.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.044.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.014.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.032.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.026.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.017.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.003.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.028.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.043.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.006.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.033.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.002.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.019.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.034.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.047.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.004.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.024.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.040.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.005.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.048.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.025.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.020.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.012.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.041.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.023.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.027.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.008.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.036.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.001.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.046.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.011.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.037.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.039.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.015.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.018.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.022.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.031.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.042.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.009.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.030.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.013.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.010.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.049.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.021.fasta_assembly	Assembly	Assembly object of extracted contigs
bin.045.fasta_assembly	Assembly	Assembly object of extracted contigs

Created Object Name	Type	Description
bin.049.fasta_assembly_DRAM	Genome	Annotated Genome
Alistipes_Draft_Genome	GenomeSet	bin_049_Alistipes