Generated May 14, 2021

Continued from the KBase narrative "JGI QC impact on assembly, binning, phylogenomics, and functional analysis" [34]:

Please also reference the journal article:

Trimming and decontamination of metagenomic data can significantly impact assembly and binning metrics, phylogenomic and functional analysis

Jason M. Whitham and Amy M. Grunden, 2021

[email protected]✉ and [email protected]

North Carolina State University, 4550A Thomas Hall, Box 7615, Raleigh NC, 27695, United States of America

Modules used in 10158.6*fastq processing

Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 15h 53m 41s.
Objects
Created Object Name Type Description
10158.6.trim150.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 7h 18m 0s.
Objects
Created Object Name Type Description
10158.6.trim150.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.trim150.MEGAHIT.assembly
Assembled into 1006971 contigs. Avg Length: 1821.589106339706 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1006825 -- 1000.0 to 62452.4 bp
  130 -- 62452.4 to 123904.8 bp
  11 -- 123904.8 to 185357.2 bp
  2 -- 185357.2 to 246809.6 bp
  1 -- 246809.6 to 308262.0 bp
  0 -- 308262.0 to 369714.4 bp
  0 -- 369714.4 to 431166.8 bp
  1 -- 431166.8 to 492619.2 bp
  0 -- 492619.2 to 554071.6 bp
  1 -- 554071.6 to 615524.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 37m 51s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 33m 10s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
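Each MEGAHIT Summary entry in this report is plain text, so comparing assemblies across trimming treatments means pulling the numbers out by hand. The following is a minimal sketch of one way to do that in Python. The function name `parse_summary` and the regular expressions are illustrative assumptions based only on the text layout shown in this report, not part of KBase or MEGAHIT, and the sample string is an excerpt (first three bins only) of the 10158.6.trim150 summary.

```python
import re

# Excerpt of the Summary text for 10158.6.trim150 (first three bins only).
SUMMARY = (
    "Assembled into 1006971 contigs. Avg Length: 1821.589106339706 bp. "
    "Contig Length Distribution (# of contigs -- min to max basepairs): "
    "1006825 -- 1000.0 to 62452.4 bp 130 -- 62452.4 to 123904.8 bp "
    "11 -- 123904.8 to 185357.2 bp"
)

# One histogram bin: "<count> -- <min> to <max> bp" (assumed layout).
BIN_RE = re.compile(r"(\d+)\s+--\s+([\d.]+)\s+to\s+([\d.]+)\s+bp")


def parse_summary(text):
    """Return (total_contigs, avg_length_bp, bins) from a Summary string.

    bins is a list of (count, min_bp, max_bp) tuples in report order.
    """
    total = int(re.search(r"Assembled into (\d+) contigs", text).group(1))
    avg = float(re.search(r"Avg Length: ([\d.]+) bp", text).group(1))
    bins = [(int(n), float(lo), float(hi)) for n, lo, hi in BIN_RE.findall(text)]
    return total, avg, bins


if __name__ == "__main__":
    total, avg, bins = parse_summary(SUMMARY)
    print(total, avg)
    for count, lo, hi in bins:
        print(count, lo, hi)
```

Because the patterns match on whitespace rather than on fixed line breaks, the same sketch should work whether a summary is stored on one line or split across several.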

Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 5h 27m 13s.
Objects
Created Object Name Type Description
10158.6.ftrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 21h 37m 57s.
Objects
Created Object Name Type Description
10158.6.ftrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.ftrimmed.MEGAHIT.assembly
Assembled into 855055 contigs. Avg Length: 1842.7823286221353 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  854876 -- 1000.0 to 55726.7 bp
  159 -- 55726.7 to 110453.4 bp
  15 -- 110453.4 to 165180.1 bp
  2 -- 165180.1 to 219906.8 bp
  1 -- 219906.8 to 274633.5 bp
  0 -- 274633.5 to 329360.2 bp
  0 -- 329360.2 to 384086.9 bp
  1 -- 384086.9 to 438813.6 bp
  0 -- 438813.6 to 493540.3 bp
  1 -- 493540.3 to 548267.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 37m 39s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 26m 13s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 5h 22m 10s.
Objects
Created Object Name Type Description
10158.6.ktrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 21h 40m 59s.
Objects
Created Object Name Type Description
10158.6.ktrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.ktrimmed.MEGAHIT.assembly
Assembled into 844082 contigs. Avg Length: 1845.9901301058428 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  843892 -- 1000.0 to 55726.7 bp
  167 -- 55726.7 to 110453.4 bp
  19 -- 110453.4 to 165180.1 bp
  1 -- 165180.1 to 219906.8 bp
  1 -- 219906.8 to 274633.5 bp
  0 -- 274633.5 to 329360.2 bp
  0 -- 329360.2 to 384086.9 bp
  1 -- 384086.9 to 438813.6 bp
  0 -- 438813.6 to 493540.3 bp
  1 -- 493540.3 to 548267.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 19m 10s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 33m 34s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 8h 57m 15s.
Objects
Created Object Name Type Description
10158.6.atrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 7h 39m 34s.
Objects
Created Object Name Type Description
10158.6.atrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.atrimmed.MEGAHIT.assembly
Assembled into 1021175 contigs. Avg Length: 1819.6556045731634 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1020963 -- 1000.0 to 54222.6 bp
  181 -- 54222.6 to 107445.2 bp
  28 -- 107445.2 to 160667.8 bp
  1 -- 160667.8 to 213890.4 bp
  0 -- 213890.4 to 267113.0 bp
  0 -- 267113.0 to 320335.6 bp
  0 -- 320335.6 to 373558.2 bp
  0 -- 373558.2 to 426780.8 bp
  1 -- 426780.8 to 480003.4 bp
  1 -- 480003.4 to 533226.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 34m 55s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 31m 0s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2d 1h 11m 7s.
Objects
Created Object Name Type Description
10158.6.aqbtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 0h 14m 9s.
Objects
Created Object Name Type Description
10158.6.aqbtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.aqbtrimmed.MEGAHIT.assembly
Assembled into 984656 contigs. Avg Length: 1825.1797927397995 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  984354 -- 1000.0 to 47344.8 bp
  262 -- 47344.8 to 93689.6 bp
  30 -- 93689.6 to 140034.4 bp
  5 -- 140034.4 to 186379.2 bp
  2 -- 186379.2 to 232724.0 bp
  1 -- 232724.0 to 279068.8 bp
  1 -- 279068.8 to 325413.6 bp
  0 -- 325413.6 to 371758.4 bp
  0 -- 371758.4 to 418103.2 bp
  1 -- 418103.2 to 464448.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 57m 25s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 35m 14s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 4h 37m 7s.
Objects
Created Object Name Type Description
10158.6.aqtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 19h 15m 18s.
Objects
Created Object Name Type Description
10158.6.aqtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.aqtrimmed.MEGAHIT.assembly
Assembled into 847260 contigs. Avg Length: 1826.5963564903336 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  847078 -- 1000.0 to 54222.6 bp
  160 -- 54222.6 to 107445.2 bp
  19 -- 107445.2 to 160667.8 bp
  1 -- 160667.8 to 213890.4 bp
  0 -- 213890.4 to 267113.0 bp
  0 -- 267113.0 to 320335.6 bp
  0 -- 320335.6 to 373558.2 bp
  0 -- 373558.2 to 426780.8 bp
  1 -- 426780.8 to 480003.4 bp
  1 -- 480003.4 to 533226.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 31m 13s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 26m 51s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 15h 34m 31s.
Objects
Created Object Name Type Description
10158.6.qbtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 22h 34m 41s.
Objects
Created Object Name Type Description
10158.6.qbtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.qbtrimmed.MEGAHIT.assembly
Assembled into 813751 contigs. Avg Length: 1849.031118855768 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  813605 -- 1000.0 to 61787.2 bp
  131 -- 61787.2 to 122574.4 bp
  10 -- 122574.4 to 183361.6 bp
  2 -- 183361.6 to 244148.8 bp
  1 -- 244148.8 to 304936.0 bp
  0 -- 304936.0 to 365723.2 bp
  0 -- 365723.2 to 426510.4 bp
  0 -- 426510.4 to 487297.6 bp
  1 -- 487297.6 to 548084.8 bp
  1 -- 548084.8 to 608872.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 49m 32s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 21m 28s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 18h 18m 13s.
Objects
Created Object Name Type Description
10158.6.qtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 13h 13m 45s.
Objects
Created Object Name Type Description
10158.6.qtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.qtrimmed.MEGAHIT.assembly
Assembled into 696646 contigs. Avg Length: 1847.8684525569658 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  696518 -- 1000.0 to 61871.8 bp
  113 -- 61871.8 to 122743.6 bp
  11 -- 122743.6 to 183615.4 bp
  2 -- 183615.4 to 244487.2 bp
  0 -- 244487.2 to 305359.0 bp
  1 -- 305359.0 to 366230.8 bp
  0 -- 366230.8 to 427102.6 bp
  0 -- 427102.6 to 487974.4 bp
  0 -- 487974.4 to 548846.2 bp
  1 -- 548846.2 to 609718.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 7h 4m 36s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 12m 25s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 17h 3m 45s.
Objects
Created Object Name Type Description
10158.6.bb1.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 5h 47m 42s.
Objects
Created Object Name Type Description
10158.6.bb1.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.bb1.MEGAHIT.assembly
Assembled into 1021122 contigs. Avg Length: 1819.693961152536 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1020910 -- 1000.0 to 54222.6 bp
  180 -- 54222.6 to 107445.2 bp
  29 -- 107445.2 to 160667.8 bp
  1 -- 160667.8 to 213890.4 bp
  0 -- 213890.4 to 267113.0 bp
  0 -- 267113.0 to 320335.6 bp
  0 -- 320335.6 to 373558.2 bp
  0 -- 373558.2 to 426780.8 bp
  1 -- 426780.8 to 480003.4 bp
  1 -- 480003.4 to 533226.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 50m 53s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 39m 28s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 18h 17m 30s.
Objects
Created Object Name Type Description
10158.6.bb2.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 22h 17m 6s.
Objects
Created Object Name Type Description
10158.6.bb2.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.bb2.MEGAHIT.assembly
Assembled into 847247 contigs. Avg Length: 1826.5755547083672 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  847065 -- 1000.0 to 54222.6 bp
  160 -- 54222.6 to 107445.2 bp
  19 -- 107445.2 to 160667.8 bp
  1 -- 160667.8 to 213890.4 bp
  0 -- 213890.4 to 267113.0 bp
  0 -- 267113.0 to 320335.6 bp
  0 -- 320335.6 to 373558.2 bp
  0 -- 373558.2 to 426780.8 bp
  1 -- 426780.8 to 480003.4 bp
  1 -- 480003.4 to 533226.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 7h 36m 19s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 19m 26s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 12h 42m 44s.
Objects
Created Object Name Type Description
10158.6.bb3.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 15h 50m 0s.
Objects
Created Object Name Type Description
10158.6.bb3.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.bb3.MEGAHIT.assembly
Assembled into 696638 contigs. Avg Length: 1847.8770408734522 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  696510 -- 1000.0 to 61871.8 bp
  113 -- 61871.8 to 122743.6 bp
  11 -- 122743.6 to 183615.4 bp
  2 -- 183615.4 to 244487.2 bp
  0 -- 244487.2 to 305359.0 bp
  1 -- 305359.0 to 366230.8 bp
  0 -- 366230.8 to 427102.6 bp
  0 -- 427102.6 to 487974.4 bp
  0 -- 487974.4 to 548846.2 bp
  1 -- 548846.2 to 609718.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 6h 29m 42s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 15m 7s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 15h 33m 25s.
Objects
Created Object Name Type Description
10158.6.bb4.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 1h 34m 59s.
Objects
Created Object Name Type Description
10158.6.bb4.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.bb4.MEGAHIT.assembly
Assembled into 1020095 contigs. Avg Length: 1819.4080404276071 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1019882 -- 1000.0 to 54122.7 bp
  183 -- 54122.7 to 107245.4 bp
  25 -- 107245.4 to 160368.1 bp
  3 -- 160368.1 to 213490.8 bp
  0 -- 213490.8 to 266613.5 bp
  0 -- 266613.5 to 319736.2 bp
  0 -- 319736.2 to 372858.9 bp
  0 -- 372858.9 to 425981.6 bp
  1 -- 425981.6 to 479104.3 bp
  1 -- 479104.3 to 532227.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 51m 11s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 26m 47s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 11h 13m 6s.
Objects
Created Object Name Type Description
10158.6.bb5.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 4h 22m 49s.
Objects
Created Object Name Type Description
10158.6.bb5.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.bb5.MEGAHIT.assembly
Assembled into 984678 contigs. Avg Length: 1825.1454465317597 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  984377 -- 1000.0 to 47344.8 bp
  261 -- 47344.8 to 93689.6 bp
  30 -- 93689.6 to 140034.4 bp
  5 -- 140034.4 to 186379.2 bp
  2 -- 186379.2 to 232724.0 bp
  1 -- 232724.0 to 279068.8 bp
  1 -- 279068.8 to 325413.6 bp
  0 -- 325413.6 to 371758.4 bp
  0 -- 371758.4 to 418103.2 bp
  1 -- 418103.2 to 464448.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 59m 36s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 40m 42s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2d 0h 43m 45s.
Objects
Created Object Name Type Description
10158.6.bb6.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 21h 44m 55s.
Objects
Created Object Name Type Description
10158.6.bb6.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/10158.6.bb6.MEGAHIT.assembly
Assembled into 813739 contigs. Avg Length: 1849.0249711516838 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  813593 -- 1000.0 to 61787.2 bp
  131 -- 61787.2 to 122574.4 bp
  10 -- 122574.4 to 183361.6 bp
  2 -- 183361.6 to 244148.8 bp
  1 -- 244148.8 to 304936.0 bp
  0 -- 304936.0 to 365723.2 bp
  0 -- 365723.2 to 426510.4 bp
  0 -- 426510.4 to 487297.6 bp
  1 -- 487297.6 to 548084.8 bp
  1 -- 548084.8 to 608872.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 57m 18s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 24m 44s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
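The app runtimes in this report are written as day, hour, minute, and second fields (for example "15h 53m 41s"), which makes them awkward to compare directly. The following is a minimal sketch, assuming only that layout, of converting a runtime to total seconds. The helper name `runtime_seconds` is hypothetical and not part of KBase.

```python
import re

# Seconds per unit for the d/h/m/s suffixes used in the runtime strings.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def runtime_seconds(text):
    """Sum every '<number><unit>' field found in text, in seconds.

    Accepts a bare runtime ("15h 53m 41s") or the full sentence
    ("This app completed without errors in 15h 53m 41s.").
    """
    fields = re.findall(r"(\d+)([dhms])\b", text)
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in fields)


if __name__ == "__main__":
    print(runtime_seconds("This app completed without errors in 15h 53m 41s."))
    print(runtime_seconds("1h 33m 10s"))
```

Summing the fields rather than validating each one keeps the sketch tolerant of a field that was rounded up to its unit boundary.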

Modules used in 9117.8*fastq processing

Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 4h 7m 13s.
Objects
Created Object Name Type Description
9117.8.trim150.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 18h 51m 44s.
Objects
Created Object Name Type Description
9117.8.trim150.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.trim150.MEGAHIT.assembly
Assembled into 784734 contigs. Avg Length: 1767.098230738059 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  784490 -- 1000.0 to 39338.3 bp
  200 -- 39338.3 to 77676.6 bp
  30 -- 77676.6 to 116014.9 bp
  5 -- 116014.9 to 154353.2 bp
  5 -- 154353.2 to 192691.5 bp
  2 -- 192691.5 to 231029.8 bp
  0 -- 231029.8 to 269368.1 bp
  0 -- 269368.1 to 307706.4 bp
  0 -- 307706.4 to 346044.7 bp
  2 -- 346044.7 to 384383.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 7h 49m 36s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 17m 26s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 6h 2m 33s.
Objects
Created Object Name Type Description
9117.8.ftrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 16h 21m 7s.
Objects
Created Object Name Type Description
9117.8.ftrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.ftrimmed.MEGAHIT.assembly
Assembled into 668409 contigs. Avg Length: 1775.8356230990307 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  668380 -- 1000.0 to 86412.3 bp
  22 -- 86412.3 to 171824.6 bp
  5 -- 171824.6 to 257236.9 bp
  1 -- 257236.9 to 342649.2 bp
  0 -- 342649.2 to 428061.5 bp
  0 -- 428061.5 to 513473.8 bp
  0 -- 513473.8 to 598886.1 bp
  0 -- 598886.1 to 684298.4 bp
  0 -- 684298.4 to 769710.7 bp
  1 -- 769710.7 to 855123.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 46m 0s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 15m 45s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 4h 2m 11s.
Objects
Created Object Name Type Description
9117.8.ktrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 12h 40m 3s.
Objects
Created Object Name Type Description
9117.8.ktrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.ktrimmed.MEGAHIT.assembly
Assembled into 668080 contigs. Avg Length: 1776.2506526164532 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  668046 -- 1000.0 to 86412.3 bp
  28 -- 86412.3 to 171824.6 bp
  5 -- 171824.6 to 257236.9 bp
  0 -- 257236.9 to 342649.2 bp
  0 -- 342649.2 to 428061.5 bp
  0 -- 428061.5 to 513473.8 bp
  0 -- 513473.8 to 598886.1 bp
  0 -- 598886.1 to 684298.4 bp
  0 -- 684298.4 to 769710.7 bp
  1 -- 769710.7 to 855123.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 6h 12m 24s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 9m 57s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 10h 59m 36s.
Objects
Created Object Name Type Description
9117.8.atrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 18h 49m 41s.
Objects
Created Object Name Type Description
9117.8.atrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.atrimmed.MEGAHIT.assembly
Assembled into 795189 contigs. Avg Length: 1764.1010225242048 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
794948 -- 1000.0 to 39338.3 bp
200 -- 39338.3 to 77676.6 bp
26 -- 77676.6 to 116014.9 bp
5 -- 116014.9 to 154353.2 bp
6 -- 154353.2 to 192691.5 bp
2 -- 192691.5 to 231029.8 bp
0 -- 231029.8 to 269368.1 bp
0 -- 269368.1 to 307706.4 bp
0 -- 307706.4 to 346044.7 bp
2 -- 346044.7 to 384383.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 7h 26m 25s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 17m 44s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 17h 58m 44s.
Objects
Created Object Name Type Description
9117.8.aqbtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 17h 51m 51s.
Objects
Created Object Name Type Description
9117.8.aqbtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.aqbtrimmed.MEGAHIT.assembly
Assembled into 776326 contigs. Avg Length: 1768.3400607476756 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
776082 -- 1000.0 to 39338.3 bp
196 -- 39338.3 to 77676.6 bp
34 -- 77676.6 to 116014.9 bp
5 -- 116014.9 to 154353.2 bp
5 -- 154353.2 to 192691.5 bp
2 -- 192691.5 to 231029.8 bp
0 -- 231029.8 to 269368.1 bp
0 -- 269368.1 to 307706.4 bp
0 -- 307706.4 to 346044.7 bp
2 -- 346044.7 to 384383.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 7h 44m 53s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 7m 56s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 22h 46m 7s.
Objects
Created Object Name Type Description
9117.8.aqtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 14h 34m 10s.
Objects
Created Object Name Type Description
9117.8.aqtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.aqtrimmed.MEGAHIT.assembly
Assembled into 693999 contigs. Avg Length: 1764.6772372870855 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
693919 -- 1000.0 to 58806.6 bp
65 -- 58806.6 to 116613.2 bp
8 -- 116613.2 to 174419.8 bp
5 -- 174419.8 to 232226.4 bp
0 -- 232226.4 to 290033.0 bp
0 -- 290033.0 to 347839.6 bp
1 -- 347839.6 to 405646.2 bp
0 -- 405646.2 to 463452.8 bp
0 -- 463452.8 to 521259.4 bp
1 -- 521259.4 to 579066.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 6h 46m 35s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 3m 31s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 12h 5m 55s.
Objects
Created Object Name Type Description
9117.8.qbtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 15h 7m 6s.
Objects
Created Object Name Type Description
9117.8.qbtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.qbtrimmed.MEGAHIT.assembly
Assembled into 652614 contigs. Avg Length: 1778.4234876971686 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
652584 -- 1000.0 to 86412.3 bp
23 -- 86412.3 to 171824.6 bp
6 -- 171824.6 to 257236.9 bp
0 -- 257236.9 to 342649.2 bp
0 -- 342649.2 to 428061.5 bp
0 -- 428061.5 to 513473.8 bp
0 -- 513473.8 to 598886.1 bp
0 -- 598886.1 to 684298.4 bp
0 -- 684298.4 to 769710.7 bp
1 -- 769710.7 to 855123.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 6h 22m 45s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 6m 17s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 8h 18m 29s.
Objects
Created Object Name Type Description
9117.8.qtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 9h 15m 18s.
Objects
Created Object Name Type Description
9117.8.qtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.qtrimmed.MEGAHIT.assembly
Assembled into 575297 contigs. Avg Length: 1775.4043754791003 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
575226 -- 1000.0 to 58808.9 bp
60 -- 58808.9 to 116617.8 bp
4 -- 116617.8 to 174426.7 bp
4 -- 174426.7 to 232235.6 bp
0 -- 232235.6 to 290044.5 bp
1 -- 290044.5 to 347853.4 bp
1 -- 347853.4 to 405662.3 bp
0 -- 405662.3 to 463471.2 bp
0 -- 463471.2 to 521280.1 bp
1 -- 521280.1 to 579089.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 6h 32m 59s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 6m 47s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 17h 17m 32s.
Objects
Created Object Name Type Description
9117.8.bb1.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 16h 28m 23s.
Objects
Created Object Name Type Description
9117.8.bb1.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.bb1.MEGAHIT.assembly
Assembled into 794371 contigs. Avg Length: 1764.5469094415582 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
794220 -- 1000.0 to 48015.3 bp
128 -- 48015.3 to 95030.6 bp
12 -- 95030.6 to 142045.9 bp
6 -- 142045.9 to 189061.2 bp
3 -- 189061.2 to 236076.5 bp
0 -- 236076.5 to 283091.8 bp
0 -- 283091.8 to 330107.1 bp
0 -- 330107.1 to 377122.4 bp
1 -- 377122.4 to 424137.7 bp
1 -- 424137.7 to 471153.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 7h 44m 47s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 14m 16s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 5h 32m 16s.
Objects
Created Object Name Type Description
9117.8.bb2.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 9h 43m 23s.
Objects
Created Object Name Type Description
9117.8.bb2.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.bb2.MEGAHIT.assembly
Assembled into 693233 contigs. Avg Length: 1764.4650644155718 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
693150 -- 1000.0 to 58806.6 bp
70 -- 58806.6 to 116613.2 bp
8 -- 116613.2 to 174419.8 bp
3 -- 174419.8 to 232226.4 bp
0 -- 232226.4 to 290033.0 bp
0 -- 290033.0 to 347839.6 bp
1 -- 347839.6 to 405646.2 bp
0 -- 405646.2 to 463452.8 bp
0 -- 463452.8 to 521259.4 bp
1 -- 521259.4 to 579066.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 6h 31m 17s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 5m 3s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 3h 16m 56s.
Objects
Created Object Name Type Description
9117.8.bb3.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 8h 17m 21s.
Objects
Created Object Name Type Description
9117.8.bb3.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.bb3.MEGAHIT.assembly
Assembled into 574834 contigs. Avg Length: 1775.113498853582 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
574764 -- 1000.0 to 58808.9 bp
58 -- 58808.9 to 116617.8 bp
5 -- 116617.8 to 174426.7 bp
4 -- 174426.7 to 232235.6 bp
0 -- 232235.6 to 290044.5 bp
1 -- 290044.5 to 347853.4 bp
1 -- 347853.4 to 405662.3 bp
0 -- 405662.3 to 463471.2 bp
0 -- 463471.2 to 521280.1 bp
1 -- 521280.1 to 579089.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 5h 49m 40s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 59m 26s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 4h 37m 6s.
Objects
Created Object Name Type Description
9117.8.bb4.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 19h 10m 54s.
Objects
Created Object Name Type Description
9117.8.bb4.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.bb4.MEGAHIT.assembly
Assembled into 793939 contigs. Avg Length: 1764.1270299103583 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
793698 -- 1000.0 to 39338.3 bp
195 -- 39338.3 to 77676.6 bp
32 -- 77676.6 to 116014.9 bp
4 -- 116014.9 to 154353.2 bp
7 -- 154353.2 to 192691.5 bp
1 -- 192691.5 to 231029.8 bp
0 -- 231029.8 to 269368.1 bp
0 -- 269368.1 to 307706.4 bp
0 -- 307706.4 to 346044.7 bp
2 -- 346044.7 to 384383.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 7h 33m 46s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 11m 30s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1h 32m 8s.
Objects
Created Object Name Type Description
9117.8.bb5.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 16h 54m 40s.
Objects
Created Object Name Type Description
9117.8.bb5.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.bb5.MEGAHIT.assembly
Assembled into 775740 contigs. Avg Length: 1768.3041573207518 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
775498 -- 1000.0 to 39338.3 bp
194 -- 39338.3 to 77676.6 bp
35 -- 77676.6 to 116014.9 bp
5 -- 116014.9 to 154353.2 bp
4 -- 154353.2 to 192691.5 bp
2 -- 192691.5 to 231029.8 bp
0 -- 231029.8 to 269368.1 bp
0 -- 269368.1 to 307706.4 bp
0 -- 307706.4 to 346044.7 bp
2 -- 346044.7 to 384383.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 7h 25m 4s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 18m 37s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 4h 2m 6s.
Objects
Created Object Name Type Description
9117.8.bb6.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 12h 56m 48s.
Objects
Created Object Name Type Description
9117.8.bb6.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.8.bb6.MEGAHIT.assembly
Assembled into 652079 contigs. Avg Length: 1778.0340863607016 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
652048 -- 1000.0 to 86412.3 bp
24 -- 86412.3 to 171824.6 bp
6 -- 171824.6 to 257236.9 bp
0 -- 257236.9 to 342649.2 bp
0 -- 342649.2 to 428061.5 bp
0 -- 428061.5 to 513473.8 bp
0 -- 513473.8 to 598886.1 bp
0 -- 598886.1 to 684298.4 bp
0 -- 684298.4 to 769710.7 bp
1 -- 769710.7 to 855123.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 6h 30m 37s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 5m 47s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
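The run times reported for each app use a compact "d h m s" notation, which makes it awkward to compare or total the cost of each processing variant. The following is a minimal Python sketch, not part of the original Narrative, that converts such strings to `timedelta` values; the function name `parse_runtime` is an illustrative assumption, and the four sample strings are the run times reported for the 9117.8.bb3 reads.

```python
import re
from datetime import timedelta

# Seconds per unit letter used in the report's run-time strings.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_runtime(text):
    """Convert a string such as '1d 12h 40m 3s' into a timedelta."""
    parts = re.findall(r"(\d+)([dhms])\b", text)
    if not parts:
        raise ValueError(f"no duration fields found in {text!r}")
    # Summing raw seconds also folds an overflowing field (e.g. '60s') correctly.
    return timedelta(seconds=sum(int(n) * UNIT_SECONDS[u] for n, u in parts))


# Run times reported for the 9117.8.bb3 reads (import, assembly, binning, CheckM).
bb3_steps = {
    "import": "3h 16m 56s",
    "MEGAHIT": "1d 8h 17m 21s",
    "MetaBAT2": "5h 49m 40s",
    "CheckM": "59m 26s",
}

total = sum((parse_runtime(t) for t in bb3_steps.values()), timedelta())
print(total)  # 1 day, 18:23:23
```

Applied to the other read sets, the same conversion allows a direct comparison of how each trimming or decontamination variant affects total processing time.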

Modules used in 9108.2*fastq processing

Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2h 36m 51s.
Objects
Created Object Name Type Description
9108.2.trim150.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 13h 43m 54s.
Objects
Created Object Name Type Description
9108.2.trim150.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.trim150.MEGAHIT.assembly
Assembled into 1195540 contigs. Avg Length: 1781.0664912926377 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
1193975 -- 1000.0 to 17968.9 bp
1211 -- 17968.9 to 34937.8 bp
235 -- 34937.8 to 51906.7 bp
70 -- 51906.7 to 68875.6 bp
27 -- 68875.6 to 85844.5 bp
8 -- 85844.5 to 102813.4 bp
8 -- 102813.4 to 119782.3 bp
3 -- 119782.3 to 136751.2 bp
1 -- 136751.2 to 153720.1 bp
2 -- 153720.1 to 170689.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 13h 59m 48s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 33m 11s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 15h 23m 51s.
Objects
Created Object Name Type Description
9108.2.ftrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 7h 10m 21s.
Objects
Created Object Name Type Description
9108.2.ftrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.ftrimmed.MEGAHIT.assembly
Assembled into 1034211 contigs. Avg Length: 1783.6644649882858 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
1032847 -- 1000.0 to 17972.6 bp
1016 -- 17972.6 to 34945.2 bp
220 -- 34945.2 to 51917.8 bp
76 -- 51917.8 to 68890.4 bp
29 -- 68890.4 to 85863.0 bp
9 -- 85863.0 to 102835.6 bp
6 -- 102835.6 to 119808.2 bp
3 -- 119808.2 to 136780.8 bp
3 -- 136780.8 to 153753.4 bp
2 -- 153753.4 to 170726.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 11h 39m 53s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 26m 56s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 15h 28m 43s.
Objects
Created Object Name Type Description
9108.2.ktrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 6h 22m 0s.
Objects
Created Object Name Type Description
9108.2.ktrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.ktrimmed.MEGAHIT.assembly
Assembled into 1033535 contigs. Avg Length: 1784.2603975675715 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
1032154 -- 1000.0 to 17972.6 bp
1038 -- 17972.6 to 34945.2 bp
219 -- 34945.2 to 51917.8 bp
74 -- 51917.8 to 68890.4 bp
26 -- 68890.4 to 85863.0 bp
11 -- 85863.0 to 102835.6 bp
8 -- 102835.6 to 119808.2 bp
0 -- 119808.2 to 136780.8 bp
2 -- 136780.8 to 153753.4 bp
3 -- 153753.4 to 170726.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 36m 40s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 27m 45s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 8h 12m 14s.
Objects
Created Object Name Type Description
9108.2.atrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 14h 51m 28s.
Objects
Created Object Name Type Description
9108.2.atrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.atrimmed.MEGAHIT.assembly
Assembled into 1209219 contigs. Avg Length: 1778.4371598527646 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
1206870 -- 1000.0 to 14803.1 bp
1764 -- 14803.1 to 28606.2 bp
362 -- 28606.2 to 42409.3 bp
118 -- 42409.3 to 56212.4 bp
46 -- 56212.4 to 70015.5 bp
32 -- 70015.5 to 83818.6 bp
13 -- 83818.6 to 97621.7 bp
8 -- 97621.7 to 111424.8 bp
4 -- 111424.8 to 125227.9 bp
2 -- 125227.9 to 139031.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 13h 19m 44s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 26m 27s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 8h 17m 27s.
Objects
Created Object Name Type Description
9108.2.aqbtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 7h 28m 37s.
Objects
Created Object Name Type Description
9108.2.aqbtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.aqbtrimmed.MEGAHIT.assembly
Assembled into 1172362 contigs. Avg Length: 1781.2875929107222 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
1170631 -- 1000.0 to 16860.1 bp
1341 -- 16860.1 to 32720.2 bp
240 -- 32720.2 to 48580.3 bp
77 -- 48580.3 to 64440.4 bp
41 -- 64440.4 to 80300.5 bp
16 -- 80300.5 to 96160.6 bp
4 -- 96160.6 to 112020.7 bp
9 -- 112020.7 to 127880.8 bp
2 -- 127880.8 to 143740.9 bp
1 -- 143740.9 to 159601.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 13h 40m 42s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 36m 51s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 11h 16m 45s.
Objects
Created Object Name Type Description
9108.2.aqtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 3h 46m 28s.
Objects
Created Object Name Type Description
9108.2.aqtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.aqtrimmed.MEGAHIT.assembly
Assembled into 1024332 contigs. Avg Length: 1769.7907143387106 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
1022482 -- 1000.0 to 15200.1 bp
1368 -- 15200.1 to 29400.2 bp
299 -- 29400.2 to 43600.3 bp
90 -- 43600.3 to 57800.4 bp
52 -- 57800.4 to 72000.5 bp
25 -- 72000.5 to 86200.6 bp
5 -- 86200.6 to 100400.7 bp
8 -- 100400.7 to 114600.8 bp
2 -- 114600.8 to 128800.9 bp
1 -- 128800.9 to 143001.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 45m 5s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 27m 49s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 12h 42m 18s.
Objects
Created Object Name Type Description
9108.2.qbtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 3h 2m 11s.
Objects
Created Object Name Type Description
9108.2.qbtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.qbtrimmed.MEGAHIT.assembly Assembled into 999706 contigs. Avg Length: 1785.2481049428532 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 998820 -- 1000.0 to 22256.2 bp 693 -- 22256.2 to 43512.4 bp 133 -- 43512.4 to 64768.600000000006 bp 34 -- 64768.600000000006 to 86024.8 bp 11 -- 86024.8 to 107281.0 bp 9 -- 107281.0 to 128537.20000000001 bp 3 -- 128537.20000000001 to 149793.4 bp 1 -- 149793.4 to 171049.6 bp 1 -- 171049.6 to 192305.80000000002 bp 1 -- 192305.80000000002 to 213562.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 11h 57m 55s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 16m 54s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2h 2m 3s.
Objects
Created Object Name Type Description
9108.2.qtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 22h 15m 12s.
Objects
Created Object Name Type Description
9108.2.qtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.qtrimmed.MEGAHIT.assembly Assembled into 860875 contigs. Avg Length: 1768.3140943807173 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 859590 -- 1000.0 to 16961.7 bp 928 -- 16961.7 to 32923.4 bp 231 -- 32923.4 to 48885.100000000006 bp 78 -- 48885.100000000006 to 64846.8 bp 29 -- 64846.8 to 80808.5 bp 9 -- 80808.5 to 96770.20000000001 bp 4 -- 96770.20000000001 to 112731.90000000001 bp 3 -- 112731.90000000001 to 128693.6 bp 1 -- 128693.6 to 144655.30000000002 bp 2 -- 144655.30000000002 to 160617.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 6h 41m 54s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 17m 54s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2h 21m 54s.
Objects
Created Object Name Type Description
9108.2.bb1.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 6h 45m 43s.
Objects
Created Object Name Type Description
9108.2.bb1.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.bb1.MEGAHIT.assembly Assembled into 1208811 contigs. Avg Length: 1778.5609462521436 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 1206463 -- 1000.0 to 14850.9 bp 1772 -- 14850.9 to 28701.8 bp 351 -- 28701.8 to 42552.7 bp 122 -- 42552.7 to 56403.6 bp 45 -- 56403.6 to 70254.5 bp 31 -- 70254.5 to 84105.4 bp 11 -- 84105.4 to 97956.3 bp 9 -- 97956.3 to 111807.2 bp 5 -- 111807.2 to 125658.09999999999 bp 2 -- 125658.09999999999 to 139509.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 32m 40s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 48m 6s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 20h 5m 40s.
Objects
Created Object Name Type Description
9108.2.bb2.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 3h 58m 1s.
Objects
Created Object Name Type Description
9108.2.bb2.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.bb2.MEGAHIT.assembly Assembled into 1023779 contigs. Avg Length: 1769.7766041303837 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 1021949 -- 1000.0 to 15249.3 bp 1354 -- 15249.3 to 29498.6 bp 298 -- 29498.6 to 43747.899999999994 bp 86 -- 43747.899999999994 to 57997.2 bp 49 -- 57997.2 to 72246.5 bp 25 -- 72246.5 to 86495.79999999999 bp 5 -- 86495.79999999999 to 100745.09999999999 bp 8 -- 100745.09999999999 to 114994.4 bp 3 -- 114994.4 to 129243.7 bp 2 -- 129243.7 to 143493.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 30m 33s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 21m 25s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 18h 57m 51s.
Objects
Created Object Name Type Description
9108.2.bb3.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 21h 54m 41s.
Objects
Created Object Name Type Description
9108.2.bb3.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.bb3.MEGAHIT.assembly Assembled into 860302 contigs. Avg Length: 1768.471336809632 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 858995 -- 1000.0 to 16961.7 bp 959 -- 16961.7 to 32923.4 bp 225 -- 32923.4 to 48885.100000000006 bp 79 -- 48885.100000000006 to 64846.8 bp 25 -- 64846.8 to 80808.5 bp 9 -- 80808.5 to 96770.20000000001 bp 3 -- 96770.20000000001 to 112731.90000000001 bp 3 -- 112731.90000000001 to 128693.6 bp 1 -- 128693.6 to 144655.30000000002 bp 3 -- 144655.30000000002 to 160617.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 18m 42s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 15m 20s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 7h 12m 15s.
Objects
Created Object Name Type Description
9108.2.bb4.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 11h 23m 21s.
Objects
Created Object Name Type Description
9108.2.bb4.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.bb4.MEGAHIT.assembly Assembled into 1208662 contigs. Avg Length: 1778.2948086396361 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 1206954 -- 1000.0 to 17136.9 bp 1302 -- 17136.9 to 33273.8 bp 255 -- 33273.8 to 49410.7 bp 77 -- 49410.7 to 65547.6 bp 44 -- 65547.6 to 81684.5 bp 16 -- 81684.5 to 97821.4 bp 6 -- 97821.4 to 113958.3 bp 5 -- 113958.3 to 130095.2 bp 2 -- 130095.2 to 146232.1 bp 1 -- 146232.1 to 162369.0 bp
Links
Bin metagenomic contigs
This app is new, and hasn't been started.
No output found.
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 34m 58s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2h 12m 6s.
Objects
Created Object Name Type Description
9108.2.bb5.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 5h 58m 56s.
Objects
Created Object Name Type Description
9108.2.bb5.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.bb5.MEGAHIT.assembly Assembled into 1171776 contigs. Avg Length: 1781.3440171158993 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 1170037 -- 1000.0 to 16860.1 bp 1343 -- 16860.1 to 32720.2 bp 248 -- 32720.2 to 48580.3 bp 79 -- 48580.3 to 64440.4 bp 38 -- 64440.4 to 80300.5 bp 14 -- 80300.5 to 96160.6 bp 5 -- 96160.6 to 112020.7 bp 9 -- 112020.7 to 127880.8 bp 2 -- 127880.8 to 143740.9 bp 1 -- 143740.9 to 159601.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 12h 47m 54s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 28m 21s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 5h 32m 15s.
Objects
Created Object Name Type Description
9108.2.bb6.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 23h 11m 34s.
Objects
Created Object Name Type Description
9108.2.bb6.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9108.2.bb6.MEGAHIT.assembly Assembled into 999512 contigs. Avg Length: 1784.8426732245337 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 998634 -- 1000.0 to 22256.2 bp 692 -- 22256.2 to 43512.4 bp 125 -- 43512.4 to 64768.600000000006 bp 35 -- 64768.600000000006 to 86024.8 bp 11 -- 86024.8 to 107281.0 bp 7 -- 107281.0 to 128537.20000000001 bp 3 -- 128537.20000000001 to 149793.4 bp 3 -- 149793.4 to 171049.6 bp 1 -- 171049.6 to 192305.80000000002 bp 1 -- 192305.80000000002 to 213562.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 45m 36s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 28m 46s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Run QUAST (QUality ASsessment Tool) on a set of Assemblies to assess their quality.
This app completed without errors in 1h 56m 56s.
Summary
All statistics are based on contigs of size >= 500 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).
Assembly 10158.6.MEGAHIT.assembly 10158.6.QC.MEGAHIT.assembly 10158.6.trim150.MEGAHIT.assembly 10158.6.ftrimmed.MEGAHIT.assembly 10158.6.ktrimmed.MEGAHIT.assembly 10158.6.atrimmed.MEGAHIT.assembly 10158.6.aqbtrimmed.MEGAHIT.assembly 10158.6.aqtrimmed.MEGAHIT.assembly 10158.6.qbtrimmed.MEGAHIT.assembly 10158.6.qtrimmed.MEGAHIT.assembly 10158.6.bb1.MEGAHIT.assembly 10158.6.bb2.MEGAHIT.assembly 10158.6.bb3.MEGAHIT.assembly 10158.6.bb4.MEGAHIT.assembly 10158.6.bb5.MEGAHIT.assembly 10158.6.bb6.MEGAHIT.assembly
# contigs (>= 0 bp) 1021022 1000991 1006971 855055 844082 1021175 984656 847260 813751 696646 1021122 847247 696638 1020095 984678 813739
# contigs (>= 1000 bp) 1021022 1000991 1006971 855055 844082 1021175 984656 847260 813751 696646 1021122 847247 696638 1020095 984678 813739
# contigs (>= 10000 bp) 6864 6851 6841 6411 6388 6897 6839 6104 6146 5236 6901 6103 5237 6891 6840 6149
# contigs (>= 100000 bp) 36 33 34 32 33 37 36 28 35 31 38 28 31 32 36 35
# contigs (>= 1000000 bp) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Total length (>= 0 bp) 1856329992 1824542955 1834287404 1575680244 1558167041 1858186812 1797174234 1547602029 1504650922 1287310166 1858129537 1547560659 1287301366 1855969045 1797180568 1504623731
Total length (>= 1000 bp) 1856329992 1824542955 1834287404 1575680244 1558167041 1858186812 1797174234 1547602029 1504650922 1287310166 1858129537 1547560659 1287301366 1855969045 1797180568 1504623731
Total length (>= 10000 bp) 132937132 133115463 132427114 122730690 122394179 133337225 132122926 116816919 119641924 100850310 133365650 116787787 100861454 133419467 132127122 119663200
Total length (>= 100000 bp) 5354160 5264724 5311801 4769344 4821231 5398502 5173203 4218920 5348260 4708075 5537117 4218920 4708075 4872164 5173203 5348260
Total length (>= 1000000 bp) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# contigs 1021022 1000991 1006971 855055 844082 1021175 984656 847260 813751 696646 1021122 847247 696638 1020095 984678 813739
Largest contig 533581 628222 615524 548267 548267 533226 464448 533226 608872 609718 533226 533226 609718 532227 464448 608872
Total length 1856329992 1824542955 1834287404 1575680244 1558167041 1858186812 1797174234 1547602029 1504650922 1287310166 1858129537 1547560659 1287301366 1855969045 1797180568 1504623731
GC (%) 61.74 61.76 61.75 61.75 61.75 61.76 61.79 62.01 61.78 62.04 61.76 62.01 62.04 61.75 61.79 61.78
N50 1748 1753 1752 1776 1781 1750 1756 1758 1783 1785 1750 1758 1785 1749 1756 1783
N75 1256 1258 1257 1263 1265 1256 1258 1257 1265 1264 1256 1257 1264 1256 1258 1265
L50 284898 278536 280361 233937 230518 284769 273362 234656 221482 189812 284744 234656 189808 284407 273368 221480
L75 603510 590947 594626 501552 494696 603367 580757 499401 476388 407899 603327 499395 407893 602736 580772 476381
# N's per 100 kbp 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
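The QUAST summary above reports N50, N75, L50 and L75 for each assembly. As a reminder of what those columns mean: with contigs sorted from longest to shortest, N50 is the length of the contig at which the running total first reaches 50% of the assembly length, and L50 is the number of contigs needed to get there (N75 and L75 use 75%). The Python sketch below illustrates that definition on toy lengths; the function name nx_lx is hypothetical and this is not QUAST's own code.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx) for the given contig lengths.

    Nx: length of the contig at which the cumulative sum of lengths,
        taken longest first, reaches `fraction` of the total length.
    Lx: number of contigs needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("no contig lengths given")


# Toy contig lengths (not taken from the report); total length is 290.
toy = [80, 70, 50, 40, 30, 20]
n50, l50 = nx_lx(toy, 0.50)
n75, l75 = nx_lx(toy, 0.75)
print("N50", n50, "L50", l50)
print("N75", n75, "L75", l75)
```

Read this way, a lower L50 at a similar N50 (as for the qtrimmed and bb3 columns) mainly reflects a smaller total assembly rather than longer contigs.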
Links
Run QUAST (QUality ASsessment Tool) on a set of Assemblies to assess their quality.
This app completed without errors in 1h 31m 56s.
Summary
All statistics are based on contigs of size >= 500 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).
Assembly 9117.8.MEGAHIT.assembly 9117.8.QC.MEGAHIT.assembly 9117.8.trim150.MEGAHIT.assembly 9117.8.ftrimmed.MEGAHIT.assembly 9117.8.ktrimmed.MEGAHIT.assembly 9117.8.atrimmed.MEGAHIT.assembly 9117.8.aqbtrimmed.MEGAHIT.assembly 9117.8.aqtrimmed.MEGAHIT.assembly 9117.8.qbtrimmed.MEGAHIT.assembly 9117.8.qtrimmed.MEGAHIT.assembly 9117.8.bb1.MEGAHIT.assembly 9117.8.bb2.MEGAHIT.assembly 9117.8.bb3.MEGAHIT.assembly 9117.8.bb4.MEGAHIT.assembly 9117.8.bb5.MEGAHIT.assembly 9117.8.bb6.MEGAHIT.assembly
# contigs (>= 0 bp) 795124 788426 784734 668409 668080 795189 776326 693999 652614 575297 794371 693233 574834 793939 775740 652079
# contigs (>= 1000 bp) 795124 788426 784734 668409 668080 795189 776326 693999 652614 575297 794371 693233 574834 793939 775740 652079
# contigs (>= 10000 bp) 4479 4428 4405 4016 4031 4467 4414 3978 3929 3564 4505 3961 3541 4468 4398 3937
# contigs (>= 100000 bp) 21 23 19 20 23 21 20 18 24 20 20 17 20 23 19 25
# contigs (>= 1000000 bp) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Total length (>= 0 bp) 1402423204 1390291730 1386702063 1186984513 1186677536 1402793728 1372808366 1224684238 1160624066 1021384811 1401704893 1223185410 1020395593 1400609250 1371744267 1159418689
Total length (>= 1000 bp) 1402423204 1390291730 1386702063 1186984513 1186677536 1402793728 1372808366 1224684238 1160624066 1021384811 1401704893 1223185410 1020395593 1400609250 1371744267 1159418689
Total length (>= 10000 bp) 82686516 81611021 82088957 74532129 74385740 82595657 81700782 73305064 73233944 65053259 83006724 73028010 64771748 82617140 81347067 73345555
Total length (>= 100000 bp) 3389401 3833627 3257486 3757178 3996597 3495021 3242484 3399992 4225978 3623679 3474252 3100525 3646129 3663613 3086483 4334464
Total length (>= 1000000 bp) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# contigs 795124 788426 784734 668409 668080 795189 776326 693999 652614 575297 794371 693233 574834 793939 775740 652079
Largest contig 384383 384383 384383 855123 855123 384383 384383 579066 855123 579089 471153 579066 579089 384383 384383 855123
Total length 1402423204 1390291730 1386702063 1186984513 1186677536 1402793728 1372808366 1224684238 1160624066 1021384811 1401704893 1223185410 1020395593 1400609250 1371744267 1159418689
GC (%) 63.13 63.12 63.13 63.09 63.09 63.13 63.11 63.04 63.07 62.98 63.13 63.04 62.98 63.13 63.11 63.07
N50 1692 1691 1695 1702 1703 1691 1697 1690 1704 1699 1692 1689 1699 1692 1697 1703
N75 1242 1242 1244 1245 1245 1242 1244 1242 1246 1244 1243 1242 1244 1243 1244 1246
L50 232650 230849 229082 193472 193360 232608 226457 202713 188543 166283 232356 202525 166214 232328 226291 188383
L75 478091 474170 471418 400396 400163 478102 466195 417155 390667 344573 477566 416741 344347 477390 465846 390377
# N's per 100 kbp 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Links
Run QUAST (QUality ASsessment Tool) on a set of Assemblies to assess their quality.
This app completed without errors in 2h 29m 2s.
Summary
All statistics are based on contigs of size >= 500 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).
Assembly 9108.2.MEGAHIT.assembly 9108.2.QC.MEGAHIT.assembly 9108.2.trim150.MEGAHIT.assembly 9108.2.ftrimmed.MEGAHIT.assembly 9108.2.ktrimmed.MEGAHIT.assembly 9108.2.atrimmed.MEGAHIT.assembly 9108.2.aqbtrimmed.MEGAHIT.assembly 9108.2.aqtrimmed.MEGAHIT.assembly 9108.2.qbtrimmed.MEGAHIT.assembly 9108.2.qtrimmed.MEGAHIT.assembly 9108.2.bb1.MEGAHIT.assembly 9108.2.bb2.MEGAHIT.assembly 9108.2.bb3.MEGAHIT.assembly 9108.2.bb4.MEGAHIT.assembly 9108.2.bb5.MEGAHIT.assembly 9108.2.bb6.MEGAHIT.assembly
# contigs (>= 0 bp) 1209055 1199339 1195540 1034211 1033535 1209219 1172362 1024332 999706 860875 1208811 1023779 860302 1208662 1171776 999512
# contigs (>= 1000 bp) 1209055 1199339 1195540 1034211 1033535 1209219 1172362 1024332 999706 860875 1208811 1023779 860302 1208662 1171776 999512
# contigs (>= 10000 bp) 5807 5691 5755 4989 5058 5833 5729 4644 4930 3934 5797 4635 3923 5738 5751 4926
# contigs (>= 100000 bp) 13 15 14 15 17 14 15 11 18 9 16 13 9 14 17 18
# contigs (>= 1000000 bp) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Total length (>= 0 bp) 2150531073 2132518680 2129336233 1844685410 1844095570 2150520004 2088313885 1812853262 1784723242 1522297396 2149944036 1811860122 1521419428 2149357360 2087336167 1783971670
Total length (>= 1000 bp) 2150531073 2132518680 2129336233 1844685410 1844095570 2150520004 2088313885 1812853262 1784723242 1522297396 2149944036 1811860122 1521419428 2149357360 2087336167 1783971670
Total length (>= 10000 bp) 101047179 99447875 100242313 88924777 89636361 101530395 99296661 82780195 87966049 71476974 101231294 82720028 71325100 100472896 99626118 87766607
Total length (>= 100000 bp) 1535786 1802980 1763621 1929049 2101327 1578267 1816415 1229882 2327663 1110789 1808691 1495653 1157115 1680449 2033363 2430139
Total length (>= 1000000 bp) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# contigs 1209055 1199339 1195540 1034211 1033535 1209219 1172362 1024332 999706 860875 1208811 1023779 860302 1208662 1171776 999512
Largest contig 162369 162369 170689 170726 170726 139031 159601 143001 213562 160617 139509 143493 160617 162369 159601 213562
Total length 2150531073 2132518680 2129336233 1844685410 1844095570 2150520004 2088313885 1812853262 1784723242 1522297396 2149944036 1811860122 1521419428 2149357360 2087336167 1783971670
GC (%) 64.27 64.27 64.27 64.30 64.30 64.27 64.26 64.25 64.29 64.26 64.27 64.25 64.26 64.27 64.26 64.29
N50 1723 1722 1727 1731 1732 1723 1727 1715 1733 1713 1723 1715 1713 1723 1726 1732
N75 1255 1255 1256 1257 1257 1255 1256 1253 1257 1253 1255 1253 1253 1255 1256 1257
L50 354458 351707 350014 302448 302250 354427 343199 302735 291889 254816 354299 302596 254542 354351 343024 291895
L75 725400 719687 716789 619655 619174 725473 702946 616322 598577 518237 725201 615998 517839 725199 702593 598530
# N's per 100 kbp 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
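The QUAST figures for 9108.2 can be cross-checked against the per-assembly MEGAHIT summaries earlier in this report: the number of contigs multiplied by the reported Avg Length should reproduce the QUAST Total length. The Python sketch below does this for two assemblies using values copied from this report; the helper name implied_total_length is hypothetical.

```python
def implied_total_length(n_contigs: int, avg_length: float) -> int:
    """Total assembly length implied by a contig count and mean contig length."""
    return round(n_contigs * avg_length)


# (contigs, Avg Length from the MEGAHIT summary, Total length from QUAST)
reported = {
    "9108.2.aqtrimmed": (1024332, 1769.7907143387106, 1812853262),
    "9108.2.qtrimmed": (860875, 1768.3140943807173, 1522297396),
}

for name, (n_contigs, avg_length, quast_total) in reported.items():
    implied = implied_total_length(n_contigs, avg_length)
    # Print the implied total and whether it agrees with the QUAST column.
    print(name, implied, implied == quast_total)
```

Agreement here also confirms that both tools are counting the same set of contigs (all >= 1000 bp), which is why the ">= 0 bp" and ">= 1000 bp" rows of the QUAST summary are identical.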
Links

Modules used in 9117.7*fastq processing

Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 18h 43m 12s.
Objects
Created Object Name Type Description
9117.7.trim150.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 2h 27m 29s.
Objects
Created Object Name Type Description
9117.7.trim150.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.trim150.MEGAHIT.assembly Assembled into 1122596 contigs. Avg Length: 1917.0348094951346 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 1121969 -- 1000.0 to 37302.4 bp 479 -- 37302.4 to 73604.8 bp 88 -- 73604.8 to 109907.20000000001 bp 26 -- 109907.20000000001 to 146209.6 bp 17 -- 146209.6 to 182512.0 bp 6 -- 182512.0 to 218814.40000000002 bp 5 -- 218814.40000000002 to 255116.80000000002 bp 2 -- 255116.80000000002 to 291419.2 bp 3 -- 291419.2 to 327721.60000000003 bp 1 -- 327721.60000000003 to 364024.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 31m 15s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 2h 12m 53s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1d 3h 59m 59s.
Objects
Created Object Name Type Description
9117.7.ftrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 22h 41m 34s.
Objects
Created Object Name Type Description
9117.7.ftrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.ftrimmed.MEGAHIT.assembly Assembled into 971118 contigs. Avg Length: 1941.6426932669356 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 970743 -- 1000.0 to 47668.2 bp 285 -- 47668.2 to 94336.4 bp 59 -- 94336.4 to 141004.59999999998 bp 19 -- 141004.59999999998 to 187672.8 bp 4 -- 187672.8 to 234341.0 bp 6 -- 234341.0 to 281009.19999999995 bp 0 -- 281009.19999999995 to 327677.39999999997 bp 0 -- 327677.39999999997 to 374345.6 bp 1 -- 374345.6 to 421013.8 bp 1 -- 421013.8 to 467682.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 38m 17s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 59m 56s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1d 2h 44m 28s.
Objects
Created Object Name Type Description
9117.7.ktrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 20h 55m 57s.
Objects
Created Object Name Type Description
9117.7.ktrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.ktrimmed.MEGAHIT.assembly Assembled into 970655 contigs. Avg Length: 1941.9942822114963 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 970412 -- 1000.0 to 58275.3 bp 185 -- 58275.3 to 115550.6 bp 40 -- 115550.6 to 172825.90000000002 bp 12 -- 172825.90000000002 to 230101.2 bp 5 -- 230101.2 to 287376.5 bp 0 -- 287376.5 to 344651.80000000005 bp 0 -- 344651.80000000005 to 401927.10000000003 bp 0 -- 401927.10000000003 to 459202.4 bp 0 -- 459202.4 to 516477.7 bp 1 -- 516477.7 to 573753.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 53m 13s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 55m 9s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 3h 2m 29s.
Objects
Created Object Name Type Description
9117.7.atrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 3h 2m 58s.
Objects
Created Object Name Type Description
9117.7.atrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.atrimmed.MEGAHIT.assembly Assembled into 1134976 contigs. Avg Length: 1913.7658699391 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 1134609 -- 1000.0 to 48696.4 bp 293 -- 48696.4 to 96392.8 bp 47 -- 96392.8 to 144089.2 bp 15 -- 144089.2 to 191785.6 bp 2 -- 191785.6 to 239482.0 bp 5 -- 239482.0 to 287178.4 bp 4 -- 287178.4 to 334874.8 bp 0 -- 334874.8 to 382571.2 bp 0 -- 382571.2 to 430267.60000000003 bp 1 -- 430267.60000000003 to 477964.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 11h 9m 14s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 59m 20s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
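The bin boundaries in each Contig Length Distribution appear to be ten equal-width intervals spanning 1000 bp to the longest contig in the assembly. The sketch below reproduces the edges for the 9117.7.atrimmed assembly under that assumption; the function name `equal_width_edges` is hypothetical, and this is an illustration of the arithmetic rather than the code KBase actually runs. Floating-point arithmetic may add long decimal tails to some edges, so they are rounded for display.

```python
def equal_width_edges(min_len, max_len, n_bins=10):
    """Return n_bins + 1 evenly spaced bin edges (hypothetical helper)."""
    width = (max_len - min_len) / n_bins
    return [min_len + i * width for i in range(n_bins + 1)]


# Shortest and longest lengths reported for 9117.7.atrimmed.
edges = equal_width_edges(1000.0, 477964.0)

for low, high in zip(edges, edges[1:]):
    # Round to one decimal place to hide floating-point noise.
    print(f"{round(low, 1)} to {round(high, 1)} bp")
```

Because the bin width depends on the longest contig, bin counts from two assemblies are only directly comparable when their longest contigs match; otherwise the lengths need to be re-binned on common edges.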
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1d 9h 12m 46s.
Objects
Created Object Name Type Description
9117.7.aqbtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 3h 3m 24s.
Objects
Created Object Name Type Description
9117.7.aqbtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.aqbtrimmed.MEGAHIT.assembly
Assembled into 1111545 contigs. Avg Length: 1917.1631171027714 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1111163 -- 1000.0 to 48696.4 bp
  302 -- 48696.4 to 96392.8 bp
  50 -- 96392.8 to 144089.2 bp
  19 -- 144089.2 to 191785.6 bp
  7 -- 191785.6 to 239482.0 bp
  2 -- 239482.0 to 287178.4 bp
  1 -- 287178.4 to 334874.8 bp
  0 -- 334874.8 to 382571.2 bp
  0 -- 382571.2 to 430267.6 bp
  1 -- 430267.6 to 477964.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 50m 39s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 58m 42s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2h 2m 13s.
Objects
Created Object Name Type Description
9117.7.aqtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 20h 55m 27s.
Objects
Created Object Name Type Description
9117.7.aqtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.aqtrimmed.MEGAHIT.assembly
Assembled into 1010558 contigs. Avg Length: 1913.1056594475526 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1009768 -- 1000.0 to 32410.4 bp
  600 -- 32410.4 to 63820.8 bp
  105 -- 63820.8 to 95231.2 bp
  43 -- 95231.2 to 126641.6 bp
  21 -- 126641.6 to 158052.0 bp
  4 -- 158052.0 to 189462.4 bp
  11 -- 189462.4 to 220872.8 bp
  2 -- 220872.8 to 252283.2 bp
  3 -- 252283.2 to 283693.6 bp
  1 -- 283693.6 to 315104.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 34m 55s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 50m 8s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 17h 32m 30s.
Objects
Created Object Name Type Description
9117.7.qbtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 19h 51m 1s.
Objects
Created Object Name Type Description
9117.7.qbtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.qbtrimmed.MEGAHIT.assembly
Assembled into 950703 contigs. Avg Length: 1943.5816895497333 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  950327 -- 1000.0 to 47435.6 bp
  283 -- 47435.6 to 93871.2 bp
  57 -- 93871.2 to 140306.8 bp
  22 -- 140306.8 to 186742.4 bp
  10 -- 186742.4 to 233178.0 bp
  3 -- 233178.0 to 279613.6 bp
  0 -- 279613.6 to 326049.2 bp
  0 -- 326049.2 to 372484.8 bp
  0 -- 372484.8 to 418920.4 bp
  1 -- 418920.4 to 465356.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 45m 25s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 47m 19s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 16h 59m 48s.
Objects
Created Object Name Type Description
9117.7.qtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 12h 2m 11s.
Objects
Created Object Name Type Description
9117.7.qtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.qtrimmed.MEGAHIT.assembly
Assembled into 858090 contigs. Avg Length: 1937.6153492057942 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  857504 -- 1000.0 to 36146.8 bp
  461 -- 36146.8 to 71293.6 bp
  80 -- 71293.6 to 106440.4 bp
  20 -- 106440.4 to 141587.2 bp
  8 -- 141587.2 to 176734.0 bp
  8 -- 176734.0 to 211880.8 bp
  3 -- 211880.8 to 247027.6 bp
  3 -- 247027.6 to 282174.4 bp
  2 -- 282174.4 to 317321.2 bp
  1 -- 317321.2 to 352468.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 40m 47s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 36m 26s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 16h 23m 11s.
Objects
Created Object Name Type Description
9117.7.bb1.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 2h 49m 31s.
Objects
Created Object Name Type Description
9117.7.bb1.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.bb1.MEGAHIT.assembly
Assembled into 1134995 contigs. Avg Length: 1913.7212445869804 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1134629 -- 1000.0 to 48696.4 bp
  292 -- 48696.4 to 96392.8 bp
  47 -- 96392.8 to 144089.2 bp
  15 -- 144089.2 to 191785.6 bp
  2 -- 191785.6 to 239482.0 bp
  5 -- 239482.0 to 287178.4 bp
  4 -- 287178.4 to 334874.8 bp
  0 -- 334874.8 to 382571.2 bp
  0 -- 382571.2 to 430267.6 bp
  1 -- 430267.6 to 477964.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 17m 39s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 2h 1m 44s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 18h 5m 26s.
Objects
Created Object Name Type Description
9117.7.bb2.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 20h 49m 41s.
Objects
Created Object Name Type Description
9117.7.bb2.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.bb2.MEGAHIT.assembly
Assembled into 1010490 contigs. Avg Length: 1913.0915852705123 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1009700 -- 1000.0 to 32410.4 bp
  602 -- 32410.4 to 63820.8 bp
  102 -- 63820.8 to 95231.2 bp
  44 -- 95231.2 to 126641.6 bp
  21 -- 126641.6 to 158052.0 bp
  4 -- 158052.0 to 189462.4 bp
  11 -- 189462.4 to 220872.8 bp
  2 -- 220872.8 to 252283.2 bp
  3 -- 252283.2 to 283693.6 bp
  1 -- 283693.6 to 315104.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 39m 34s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 49m 44s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2h 7m 14s.
Objects
Created Object Name Type Description
9117.7.bb3.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 14h 43m 43s.
Objects
Created Object Name Type Description
9117.7.bb3.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.bb3.MEGAHIT.assembly
Assembled into 858086 contigs. Avg Length: 1937.6138673745988 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  857500 -- 1000.0 to 36146.8 bp
  461 -- 36146.8 to 71293.6 bp
  80 -- 71293.6 to 106440.4 bp
  20 -- 106440.4 to 141587.2 bp
  8 -- 141587.2 to 176734.0 bp
  8 -- 176734.0 to 211880.8 bp
  3 -- 211880.8 to 247027.6 bp
  3 -- 247027.6 to 282174.4 bp
  2 -- 282174.4 to 317321.2 bp
  1 -- 317321.2 to 352468.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 19m 55s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 32m 35s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 4h 57m 5s.
Objects
Created Object Name Type Description
9117.7.bb4.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 3h 5m 24s.
Objects
Created Object Name Type Description
9117.7.bb4.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.bb4.MEGAHIT.assembly
Assembled into 1134808 contigs. Avg Length: 1913.7711701010214 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1134449 -- 1000.0 to 48696.4 bp
  282 -- 48696.4 to 96392.8 bp
  49 -- 96392.8 to 144089.2 bp
  14 -- 144089.2 to 191785.6 bp
  5 -- 191785.6 to 239482.0 bp
  5 -- 239482.0 to 287178.4 bp
  3 -- 287178.4 to 334874.8 bp
  0 -- 334874.8 to 382571.2 bp
  0 -- 382571.2 to 430267.6 bp
  1 -- 430267.6 to 477964.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 7h 57m 39s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 2h 3m 51s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 7h 52m 0s.
Objects
Created Object Name Type Description
9117.7.bb5.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 2h 35m 16s.
Objects
Created Object Name Type Description
9117.7.bb5.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.bb5.MEGAHIT.assembly
Assembled into 1111536 contigs. Avg Length: 1917.1589287256554 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1111154 -- 1000.0 to 48696.4 bp
  303 -- 48696.4 to 96392.8 bp
  49 -- 96392.8 to 144089.2 bp
  19 -- 144089.2 to 191785.6 bp
  7 -- 191785.6 to 239482.0 bp
  2 -- 239482.0 to 287178.4 bp
  1 -- 287178.4 to 334874.8 bp
  0 -- 334874.8 to 382571.2 bp
  0 -- 382571.2 to 430267.6 bp
  1 -- 430267.6 to 477964.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 48m 2s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 55m 23s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2h 33m 24s.
Objects
Created Object Name Type Description
9117.7.bb6.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 16h 48m 11s.
Objects
Created Object Name Type Description
9117.7.bb6.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.7.bb6.MEGAHIT.assembly
Assembled into 950709 contigs. Avg Length: 1943.5650435622256 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  950333 -- 1000.0 to 47435.6 bp
  283 -- 47435.6 to 93871.2 bp
  57 -- 93871.2 to 140306.8 bp
  22 -- 140306.8 to 186742.4 bp
  10 -- 186742.4 to 233178.0 bp
  3 -- 233178.0 to 279613.6 bp
  0 -- 279613.6 to 326049.2 bp
  0 -- 326049.2 to 372484.8 bp
  0 -- 372484.8 to 418920.4 bp
  1 -- 418920.4 to 465356.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 47m 29s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 43m 52s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
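Each app reports its wall-clock runtime in a compact "1d 9h 12m 46s" form. To compare the cost of the different trimming modes, these strings can be converted to `datetime.timedelta` values and summed. The sketch below is an illustration only: `parse_runtime` is a hypothetical helper, and the four runtimes are those reported above for the 9117.7.aqbtrimmed import, assembly, binning, and CheckM steps.

```python
import re
from datetime import timedelta

# Map the unit letters used in the report to timedelta keyword arguments.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_runtime(text):
    """Convert a runtime such as '1d 9h 12m 46s' to a timedelta (hypothetical helper)."""
    parts = re.findall(r"(\d+)([dhms])\b", text)
    if not parts:
        raise ValueError(f"no runtime found in {text!r}")
    return timedelta(**{_UNITS[unit]: int(value) for value, unit in parts})


# Runtimes reported for the 9117.7.aqbtrimmed reads.
steps = {
    "import": "1d 9h 12m 46s",
    "MEGAHIT": "2d 3h 3m 24s",
    "MetaBAT2": "9h 50m 39s",
    "CheckM": "1h 58m 42s",
}

total = sum((parse_runtime(t) for t in steps.values()), timedelta())
print(total)
```

Import times vary widely between runs of similarly sized inputs, so the summed total is likely to reflect queueing or transfer conditions as much as the trimming mode itself.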

Modules used in 11306.3*fastq processing

Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2h 58m 36s.
Objects
Created Object Name Type Description
11306.3.trim150.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 0h 25m 12s.
Objects
Created Object Name Type Description
11306.3.trim150.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.trim150.MEGAHIT.assembly
Assembled into 1022460 contigs. Avg Length: 1868.601351642118 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1022408 -- 1000.0 to 122117.8 bp
  35 -- 122117.8 to 243235.6 bp
  9 -- 243235.6 to 364353.4 bp
  5 -- 364353.4 to 485471.2 bp
  1 -- 485471.2 to 606589.0 bp
  0 -- 606589.0 to 727706.8 bp
  0 -- 727706.8 to 848824.6 bp
  1 -- 848824.6 to 969942.4 bp
  0 -- 969942.4 to 1091060.2 bp
  1 -- 1091060.2 to 1212178.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 53m 9s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 2h 3m 10s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
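The Summary reports the contig count and average contig length but not the total assembly size. Since the average is the total length divided by the number of contigs, the total can be estimated by multiplying the two. The sketch below does this for the 11306.3.trim150 assembly; the variable names are illustrative, and the result is an estimate derived from the reported average, not a value taken from the Narrative.

```python
# Values reported in the MEGAHIT Summary for 11306.3.trim150.
n_contigs = 1022460
avg_length_bp = 1868.601351642118

# Total assembled bases = number of contigs x mean contig length.
total_bp = n_contigs * avg_length_bp

# Report in gigabases; rounding hides floating-point noise in the product.
print(f"Estimated assembly size: {total_bp / 1e9:.3f} Gbp")
```

Applying the same calculation to each trimming mode gives a single size figure per assembly, which is easier to compare than contig counts alone because contig number and average length can shift in opposite directions.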
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 18h 8m 14s.
Objects
Created Object Name Type Description
11306.3.ftrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 18h 37m 14s.
Objects
Created Object Name Type Description
11306.3.ftrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.ftrimmed.MEGAHIT.assembly
Assembled into 889031 contigs. Avg Length: 1877.808786195307 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  888978 -- 1000.0 to 122052.0 bp
  32 -- 122052.0 to 243104.0 bp
  15 -- 243104.0 to 364156.0 bp
  2 -- 364156.0 to 485208.0 bp
  1 -- 485208.0 to 606260.0 bp
  1 -- 606260.0 to 727312.0 bp
  1 -- 727312.0 to 848364.0 bp
  0 -- 848364.0 to 969416.0 bp
  0 -- 969416.0 to 1090468.0 bp
  1 -- 1090468.0 to 1211520.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 5h 59m 2s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 41m 50s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 21h 34m 37s.
Objects
Created Object Name Type Description
11306.3.ktrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 19h 11m 7s.
Objects
Created Object Name Type Description
11306.3.ktrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.ktrimmed.MEGAHIT.assembly
Assembled into 882439 contigs. Avg Length: 1879.090657824507 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  882390 -- 1000.0 to 122052.0 bp
  29 -- 122052.0 to 243104.0 bp
  15 -- 243104.0 to 364156.0 bp
  3 -- 364156.0 to 485208.0 bp
  0 -- 485208.0 to 606260.0 bp
  1 -- 606260.0 to 727312.0 bp
  0 -- 727312.0 to 848364.0 bp
  0 -- 848364.0 to 969416.0 bp
  0 -- 969416.0 to 1090468.0 bp
  1 -- 1090468.0 to 1211520.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 5h 45m 52s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 43m 54s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1h 57m 17s.
Objects
Created Object Name Type Description
11306.3.atrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 0h 25m 53s.
Objects
Created Object Name Type Description
11306.3.atrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.atrimmed.MEGAHIT.assembly
Assembled into 1042370 contigs. Avg Length: 1865.6722574517685 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1042318 -- 1000.0 to 121979.6 bp
  35 -- 121979.6 to 242959.2 bp
  10 -- 242959.2 to 363938.8 bp
  4 -- 363938.8 to 484918.4 bp
  1 -- 484918.4 to 605898.0 bp
  0 -- 605898.0 to 726877.6 bp
  1 -- 726877.6 to 847857.2 bp
  0 -- 847857.2 to 968836.8 bp
  0 -- 968836.8 to 1089816.4 bp
  1 -- 1089816.4 to 1210796.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 6h 41m 11s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 59m 17s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1d 14h 17m 17s.
Objects
Created Object Name Type Description
11306.3.aqbtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 21h 51m 54s.
Objects
Created Object Name Type Description
11306.3.aqbtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.aqbtrimmed.MEGAHIT.assembly
Assembled into 1022688 contigs. Avg Length: 1865.955419443662 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1022634 -- 1000.0 to 121979.6 bp
  35 -- 121979.6 to 242959.2 bp
  12 -- 242959.2 to 363938.8 bp
  5 -- 363938.8 to 484918.4 bp
  1 -- 484918.4 to 605898.0 bp
  0 -- 605898.0 to 726877.6 bp
  0 -- 726877.6 to 847857.2 bp
  0 -- 847857.2 to 968836.8 bp
  0 -- 968836.8 to 1089816.4 bp
  1 -- 1089816.4 to 1210796.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 57m 9s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 49m 55s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2h 17m 41s.
Objects
Created Object Name Type Description
11306.3.aqtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 20h 28m 33s.
Objects
Created Object Name Type Description
11306.3.aqtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.aqtrimmed.MEGAHIT.assembly
Assembled into 940420 contigs. Avg Length: 1858.334849322643 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  940339 -- 1000.0 to 91683.2 bp
  52 -- 91683.2 to 182366.4 bp
  13 -- 182366.4 to 273049.6 bp
  10 -- 273049.6 to 363732.8 bp
  3 -- 363732.8 to 454416.0 bp
  1 -- 454416.0 to 545099.2 bp
  0 -- 545099.2 to 635782.4 bp
  0 -- 635782.4 to 726465.6 bp
  0 -- 726465.6 to 817148.8 bp
  2 -- 817148.8 to 907832.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 5h 50m 7s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 52m 15s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 7h 27m 20s.
Objects
Created Object Name Type Description
11306.3.qbtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 18h 18m 50s.
Objects
Created Object Name Type Description
11306.3.qbtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.qbtrimmed.MEGAHIT.assembly
Assembled into 872042 contigs. Avg Length: 1878.095779790423 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  871993 -- 1000.0 to 122052.0 bp
  30 -- 122052.0 to 243104.0 bp
  13 -- 243104.0 to 364156.0 bp
  2 -- 364156.0 to 485208.0 bp
  1 -- 485208.0 to 606260.0 bp
  2 -- 606260.0 to 727312.0 bp
  0 -- 727312.0 to 848364.0 bp
  0 -- 848364.0 to 969416.0 bp
  0 -- 969416.0 to 1090468.0 bp
  1 -- 1090468.0 to 1211520.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 11m 24s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 45m 37s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2h 17m 50s.
Objects
Created Object Name Type Description
11306.3.qtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 15h 32m 21s.
Objects
Created Object Name Type Description
11306.3.qtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.qtrimmed.MEGAHIT.assembly
Assembled into 794360 contigs.
Avg Length: 1869.8241414472027 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  794303 -- 1000.0 to 122052.0 bp
  41 -- 122052.0 to 243104.0 bp
  12 -- 243104.0 to 364156.0 bp
  2 -- 364156.0 to 485208.0 bp
  1 -- 485208.0 to 606260.0 bp
  0 -- 606260.0 to 727312.0 bp
  0 -- 727312.0 to 848364.0 bp
  0 -- 848364.0 to 969416.0 bp
  0 -- 969416.0 to 1090468.0 bp
  1 -- 1090468.0 to 1211520.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 7h 6m 5s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 32m 3s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 22h 23m 51s.
Objects
Created Object Name Type Description
11306.3.bb1.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 0h 55m 50s.
Objects
Created Object Name Type Description
11306.3.bb1.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.bb1.MEGAHIT.assembly
Assembled into 1042400 contigs.
Avg Length: 1865.658668457406 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1042348 -- 1000.0 to 121979.6 bp
  35 -- 121979.6 to 242959.2 bp
  10 -- 242959.2 to 363938.8 bp
  4 -- 363938.8 to 484918.4 bp
  1 -- 484918.4 to 605898.0 bp
  0 -- 605898.0 to 726877.6 bp
  1 -- 726877.6 to 847857.2 bp
  0 -- 847857.2 to 968836.8 bp
  0 -- 968836.8 to 1089816.4 bp
  1 -- 1089816.4 to 1210796.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 34m 32s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 58m 6s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 18h 17m 50s.
Objects
Created Object Name Type Description
11306.3.bb2.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 19h 47m 38s.
Objects
Created Object Name Type Description
11306.3.bb2.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.bb2.MEGAHIT.assembly
Assembled into 940415 contigs.
Avg Length: 1858.337166038398 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  940334 -- 1000.0 to 91683.2 bp
  52 -- 91683.2 to 182366.4 bp
  13 -- 182366.4 to 273049.6 bp
  10 -- 273049.6 to 363732.8 bp
  3 -- 363732.8 to 454416.0 bp
  1 -- 454416.0 to 545099.2 bp
  0 -- 545099.2 to 635782.4 bp
  0 -- 635782.4 to 726465.6 bp
  0 -- 726465.6 to 817148.8 bp
  2 -- 817148.8 to 907832.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 27m 32s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 46m 20s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 6h 42m 20s.
Objects
Created Object Name Type Description
11306.3.bb3.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 15h 28m 36s.
Objects
Created Object Name Type Description
11306.3.bb3.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.bb3.MEGAHIT.assembly
Assembled into 794335 contigs.
Avg Length: 1869.8413553475548 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  794278 -- 1000.0 to 122052.0 bp
  41 -- 122052.0 to 243104.0 bp
  12 -- 243104.0 to 364156.0 bp
  2 -- 364156.0 to 485208.0 bp
  1 -- 485208.0 to 606260.0 bp
  0 -- 606260.0 to 727312.0 bp
  0 -- 727312.0 to 848364.0 bp
  0 -- 848364.0 to 969416.0 bp
  0 -- 969416.0 to 1090468.0 bp
  1 -- 1090468.0 to 1211520.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 4h 55m 5s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 26m 35s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 2h 2m 51s.
Objects
Created Object Name Type Description
11306.3.bb4.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 0h 52m 14s.
Objects
Created Object Name Type Description
11306.3.bb4.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.bb4.MEGAHIT.assembly
Assembled into 1042030 contigs.
Avg Length: 1865.603870330029 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1041982 -- 1000.0 to 124432.9 bp
  33 -- 124432.9 to 247865.8 bp
  8 -- 247865.8 to 371298.7 bp
  4 -- 371298.7 to 494731.6 bp
  1 -- 494731.6 to 618164.5 bp
  0 -- 618164.5 to 741597.4 bp
  0 -- 741597.4 to 865030.3 bp
  0 -- 865030.3 to 988463.2 bp
  0 -- 988463.2 to 1111896.1 bp
  2 -- 1111896.1 to 1235329.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 6h 36m 13s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 59m 30s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1d 3h 19m 6s.
Objects
Created Object Name Type Description
11306.3.bb5.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 23h 29m 11s.
Objects
Created Object Name Type Description
11306.3.bb5.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.bb5.MEGAHIT.assembly
Assembled into 1022595 contigs.
Avg Length: 1865.9571042299249 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1022541 -- 1000.0 to 121979.6 bp
  35 -- 121979.6 to 242959.2 bp
  12 -- 242959.2 to 363938.8 bp
  5 -- 363938.8 to 484918.4 bp
  1 -- 484918.4 to 605898.0 bp
  0 -- 605898.0 to 726877.6 bp
  0 -- 726877.6 to 847857.2 bp
  0 -- 847857.2 to 968836.8 bp
  0 -- 968836.8 to 1089816.4 bp
  1 -- 1089816.4 to 1210796.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 54m 32s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 50m 29s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1h 47m 9s.
Objects
Created Object Name Type Description
11306.3.bb6.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 18h 20m 48s.
Objects
Created Object Name Type Description
11306.3.bb6.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/11306.3.bb6.MEGAHIT.assembly
Assembled into 872045 contigs.
Avg Length: 1878.0923472985912 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  871996 -- 1000.0 to 122052.0 bp
  30 -- 122052.0 to 243104.0 bp
  13 -- 243104.0 to 364156.0 bp
  2 -- 364156.0 to 485208.0 bp
  1 -- 485208.0 to 606260.0 bp
  2 -- 606260.0 to 727312.0 bp
  0 -- 727312.0 to 848364.0 bp
  0 -- 848364.0 to 969416.0 bp
  0 -- 969416.0 to 1090468.0 bp
  1 -- 1090468.0 to 1211520.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 31m 2s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 38m 46s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
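Each app reports its runtime in a compact form such as "1d 18h 18m 50s", which is awkward to compare across trimming treatments. The following is a minimal sketch of a converter to seconds; the function name `runtime_seconds` is hypothetical, and the unit letters (d, h, m, s) are assumed to be the only ones that occur, as in the runtimes listed here.

```python
import re

# Seconds per unit letter used in the runtime strings
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def runtime_seconds(text):
    """Convert a runtime such as '1d 18h 18m 50s' into seconds."""
    parts = re.findall(r"(\d+)\s*([dhms])\b", text)
    if not parts:
        raise ValueError(f"no runtime found in {text!r}")
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in parts)


# Runtimes reported for the 11306.3.qbtrimmed import and assembly steps
for label, runtime in [("import", "7h 27m 20s"), ("assembly", "1d 18h 18m 50s")]:
    print(label, runtime_seconds(runtime))
```

The function also accepts a full status line ("This app completed without errors in ..."), because the pattern only matches digits followed by a unit letter.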

Modules used in 9117.4*fastq processing

Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 3h 2m 58s.
Objects
Created Object Name Type Description
9117.4.trim150.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 4h 26m 1s.
Objects
Created Object Name Type Description
9117.4.trim150.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.trim150.MEGAHIT.assembly
Assembled into 1055541 contigs.
Avg Length: 1872.29976002827 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1055042 -- 1000.0 to 34960.4 bp
  392 -- 34960.4 to 68920.8 bp
  69 -- 68920.8 to 102881.2 bp
  24 -- 102881.2 to 136841.6 bp
  7 -- 136841.6 to 170802.0 bp
  3 -- 170802.0 to 204762.4 bp
  2 -- 204762.4 to 238722.8 bp
  0 -- 238722.8 to 272683.2 bp
  0 -- 272683.2 to 306643.6 bp
  2 -- 306643.6 to 340604.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 12h 27m 53s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 50m 25s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1d 5h 48m 58s.
Objects
Created Object Name Type Description
9117.4.ftrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 23h 28m 37s.
Objects
Created Object Name Type Description
9117.4.ftrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.ftrimmed.MEGAHIT.assembly
Assembled into 914807 contigs.
Avg Length: 1885.4208865913795 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  914025 -- 1000.0 to 26898.6 bp
  618 -- 26898.6 to 52797.2 bp
  100 -- 52797.2 to 78695.8 bp
  38 -- 78695.8 to 104594.4 bp
  13 -- 104594.4 to 130493.0 bp
  3 -- 130493.0 to 156391.6 bp
  6 -- 156391.6 to 182290.2 bp
  2 -- 182290.2 to 208188.8 bp
  1 -- 208188.8 to 234087.4 bp
  1 -- 234087.4 to 259986.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 11h 2m 43s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 41m 14s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 12h 53m 1s.
Objects
Created Object Name Type Description
9117.4.ktrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 21h 33m 37s.
Objects
Created Object Name Type Description
9117.4.ktrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.ktrimmed.MEGAHIT.assembly
Assembled into 913797 contigs.
Avg Length: 1886.3445984173727 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  912660 -- 1000.0 to 22662.1 bp
  878 -- 22662.1 to 44324.2 bp
  159 -- 44324.2 to 65986.3 bp
  57 -- 65986.3 to 87648.4 bp
  20 -- 87648.4 to 109310.5 bp
  9 -- 109310.5 to 130972.6 bp
  4 -- 130972.6 to 152634.7 bp
  3 -- 152634.7 to 174296.8 bp
  6 -- 174296.8 to 195958.9 bp
  1 -- 195958.9 to 217621.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 6h 39m 18s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 22m 31s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 3h 2m 51s.
Objects
Created Object Name Type Description
9117.4.atrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 3h 17m 26s.
Objects
Created Object Name Type Description
9117.4.atrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.atrimmed.MEGAHIT.assembly
Assembled into 1066718 contigs.
Avg Length: 1868.8495609898773 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1066034 -- 1000.0 to 29637.5 bp
  530 -- 29637.5 to 58275.0 bp
  93 -- 58275.0 to 86912.5 bp
  31 -- 86912.5 to 115550.0 bp
  16 -- 115550.0 to 144187.5 bp
  6 -- 144187.5 to 172825.0 bp
  4 -- 172825.0 to 201462.5 bp
  3 -- 201462.5 to 230100.0 bp
  0 -- 230100.0 to 258737.5 bp
  1 -- 258737.5 to 287375.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 33m 9s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 38m 28s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
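The assembly summaries pack the contig count, average length, and length histogram into free text. The following is a minimal sketch of how such a summary could be parsed and sanity-checked (the histogram counts should add up to the reported number of contigs). The function name `parse_summary` and the regular expressions are assumptions based only on the summary text shown in this narrative, not on any KBase API.

```python
import re


def parse_summary(text):
    """Extract contig count, average length, and histogram bins."""
    head = re.search(
        r"Assembled into (\d+) contigs\.\s+Avg Length: ([\d.]+) bp", text
    )
    if head is None:
        raise ValueError("text does not look like an assembly summary")
    # Each bin is reported as "<count> -- <min> to <max> bp"
    bins = [
        (int(count), float(low), float(high))
        for count, low, high in re.findall(
            r"(\d+)\s+--\s+([\d.]+)\s+to\s+([\d.]+)\s+bp", text
        )
    ]
    return {
        "contigs": int(head.group(1)),
        "avg_length": float(head.group(2)),
        "bins": bins,
    }


# Summary text reported for 9117.4.atrimmed.MEGAHIT.assembly
SAMPLE = (
    "Assembled into 1066718 contigs. Avg Length: 1868.8495609898773 bp. "
    "Contig Length Distribution (# of contigs -- min to max basepairs): "
    "1066034 -- 1000.0 to 29637.5 bp 530 -- 29637.5 to 58275.0 bp "
    "93 -- 58275.0 to 86912.5 bp 31 -- 86912.5 to 115550.0 bp "
    "16 -- 115550.0 to 144187.5 bp 6 -- 144187.5 to 172825.0 bp "
    "4 -- 172825.0 to 201462.5 bp 3 -- 201462.5 to 230100.0 bp "
    "0 -- 230100.0 to 258737.5 bp 1 -- 258737.5 to 287375.0 bp"
)

parsed = parse_summary(SAMPLE)
binned_total = sum(count for count, _, _ in parsed["bins"])
print(parsed["contigs"], binned_total, parsed["contigs"] == binned_total)
```

Because the patterns tolerate arbitrary whitespace, the same function should work whether the summary is on one line or split across several.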
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 20h 48m 41s.
Objects
Created Object Name Type Description
9117.4.aqbtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 2h 10m 50s.
Objects
Created Object Name Type Description
9117.4.aqbtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.aqbtrimmed.MEGAHIT.assembly
Assembled into 1044900 contigs.
Avg Length: 1873.6855450282324 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1044366 -- 1000.0 to 33414.0 bp
  402 -- 33414.0 to 65828.0 bp
  90 -- 65828.0 to 98242.0 bp
  24 -- 98242.0 to 130656.0 bp
  10 -- 130656.0 to 163070.0 bp
  4 -- 163070.0 to 195484.0 bp
  3 -- 195484.0 to 227898.0 bp
  0 -- 227898.0 to 260312.0 bp
  0 -- 260312.0 to 292726.0 bp
  1 -- 292726.0 to 325140.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 17m 6s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 41m 26s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1d 6h 18m 46s.
Objects
Created Object Name Type Description
9117.4.aqtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 21h 40m 4s.
Objects
Created Object Name Type Description
9117.4.aqtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.aqtrimmed.MEGAHIT.assembly
Assembled into 948039 contigs.
Avg Length: 1869.1050188863537 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  947070 -- 1000.0 to 24518.0 bp
  758 -- 24518.0 to 48036.0 bp
  128 -- 48036.0 to 71554.0 bp
  47 -- 71554.0 to 95072.0 bp
  15 -- 95072.0 to 118590.0 bp
  12 -- 118590.0 to 142108.0 bp
  3 -- 142108.0 to 165626.0 bp
  2 -- 165626.0 to 189144.0 bp
  3 -- 189144.0 to 212662.0 bp
  1 -- 212662.0 to 236180.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 22m 9s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 30m 18s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 20h 14m 13s.
Objects
Created Object Name Type Description
9117.4.qbtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 21h 6m 58s.
Objects
Created Object Name Type Description
9117.4.qbtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.qbtrimmed.MEGAHIT.assembly
Assembled into 895058 contigs.
Avg Length: 1889.272976723296 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  893798 -- 1000.0 to 21634.4 bp
  987 -- 21634.4 to 42268.8 bp
  169 -- 42268.8 to 62903.2 bp
  54 -- 62903.2 to 83537.6 bp
  22 -- 83537.6 to 104172.0 bp
  10 -- 104172.0 to 124806.4 bp
  7 -- 124806.4 to 145440.8 bp
  4 -- 145440.8 to 166075.2 bp
  5 -- 166075.2 to 186709.6 bp
  2 -- 186709.6 to 207344.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 43m 59s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 27m 59s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1d 4h 50m 15s.
Objects
Created Object Name Type Description
9117.4.qtrimmed.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 17h 27m 24s.
Objects
Created Object Name Type Description
9117.4.qtrimmed.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.qtrimmed.MEGAHIT.assembly
Assembled into 803334 contigs.
Avg Length: 1881.7324873589316 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  801901 -- 1000.0 to 19023.6 bp
  1092 -- 19023.6 to 37047.2 bp
  200 -- 37047.2 to 55070.8 bp
  71 -- 55070.8 to 73094.4 bp
  36 -- 73094.4 to 91118.0 bp
  14 -- 91118.0 to 109141.6 bp
  9 -- 109141.6 to 127165.2 bp
  7 -- 127165.2 to 145188.8 bp
  1 -- 145188.8 to 163212.4 bp
  3 -- 163212.4 to 181236.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 8m 32s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 24m 51s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1d 6h 26m 11s.
Objects
Created Object Name Type Description
9117.4.bb1.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 3h 9m 20s.
Objects
Created Object Name Type Description
9117.4.bb1.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.bb1.MEGAHIT.assembly
Assembled into 1066737 contigs.
Avg Length: 1868.7236197863203 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  1066049 -- 1000.0 to 29637.5 bp
  534 -- 29637.5 to 58275.0 bp
  95 -- 58275.0 to 86912.5 bp
  29 -- 86912.5 to 115550.0 bp
  17 -- 115550.0 to 144187.5 bp
  6 -- 144187.5 to 172825.0 bp
  4 -- 172825.0 to 201462.5 bp
  2 -- 201462.5 to 230100.0 bp
  0 -- 230100.0 to 258737.5 bp
  1 -- 258737.5 to 287375.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 7h 36m 12s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 33m 33s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 9h 47m 25s.
Objects
Created Object Name Type Description
9117.4.bb2.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 21h 59m 11s.
Objects
Created Object Name Type Description
9117.4.bb2.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.bb2.MEGAHIT.assembly
Assembled into 947729 contigs.
Avg Length: 1869.1429870775296 bp.
Contig Length Distribution (# of contigs -- min to max basepairs):
  946533 -- 1000.0 to 22031.4 bp
  915 -- 22031.4 to 43062.8 bp
  169 -- 43062.8 to 64094.2 bp
  63 -- 64094.2 to 85125.6 bp
  22 -- 85125.6 to 106157.0 bp
  13 -- 106157.0 to 127188.4 bp
  6 -- 127188.4 to 148219.8 bp
  5 -- 148219.8 to 169251.2 bp
  1 -- 169251.2 to 190282.6 bp
  2 -- 190282.6 to 211314.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 9h 7m 42s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 24m 14s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 14h 32m 40s.
Objects
Created Object Name Type Description
9117.4.bb3.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 16h 13m 16s.
Objects
Created Object Name Type Description
9117.4.bb3.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.bb3.MEGAHIT.assembly Assembled into 803225 contigs. Avg Length: 1881.6928370008404 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 801793 -- 1000.0 to 19023.6 bp 1093 -- 19023.6 to 37047.2 bp 199 -- 37047.2 to 55070.799999999996 bp 70 -- 55070.799999999996 to 73094.4 bp 36 -- 73094.4 to 91118.0 bp 14 -- 91118.0 to 109141.59999999999 bp 9 -- 109141.59999999999 to 127165.19999999998 bp 7 -- 127165.19999999998 to 145188.8 bp 1 -- 145188.8 to 163212.4 bp 3 -- 163212.4 to 181236.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 8h 0m 19s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 15m 11s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 1d 3h 46m 32s.
Objects
Created Object Name Type Description
9117.4.bb4.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 2h 53m 53s.
Objects
Created Object Name Type Description
9117.4.bb4.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.bb4.MEGAHIT.assembly Assembled into 1066633 contigs. Avg Length: 1868.4382810207446 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 1066124 -- 1000.0 to 34322.5 bp 388 -- 34322.5 to 67645.0 bp 83 -- 67645.0 to 100967.5 bp 24 -- 100967.5 to 134290.0 bp 8 -- 134290.0 to 167612.5 bp 5 -- 167612.5 to 200935.0 bp 0 -- 200935.0 to 234257.5 bp 0 -- 234257.5 to 267580.0 bp 0 -- 267580.0 to 300902.5 bp 1 -- 300902.5 to 334225.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 2m 37s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 41m 13s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 20h 48m 29s.
Objects
Created Object Name Type Description
9117.4.bb5.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 2d 2h 3m 35s.
Objects
Created Object Name Type Description
9117.4.bb5.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.bb5.MEGAHIT.assembly Assembled into 1044800 contigs. Avg Length: 1873.7243424578867 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 1044269 -- 1000.0 to 33414.0 bp 397 -- 33414.0 to 65828.0 bp 91 -- 65828.0 to 98242.0 bp 24 -- 98242.0 to 130656.0 bp 11 -- 130656.0 to 163070.0 bp 5 -- 163070.0 to 195484.0 bp 2 -- 195484.0 to 227898.0 bp 0 -- 227898.0 to 260312.0 bp 0 -- 260312.0 to 292726.0 bp 1 -- 292726.0 to 325140.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 10h 6m 15s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 40m 26s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM
Import a FASTQ/SRA file into your Narrative as a Reads data object
This app completed without errors in 20h 42m 52s.
Objects
Created Object Name Type Description
9117.4.bb6.fq_reads PairedEndLibrary Imported Reads
Links
Assemble metagenomic reads using the MEGAHIT assembler.
This app completed without errors in 1d 20h 25m 30s.
Objects
Created Object Name Type Description
9117.4.bb6.MEGAHIT.assembly Assembly Assembled contigs
Summary
ContigSet saved to: jmwhitham:narrative_1605805694221/9117.4.bb6.MEGAHIT.assembly Assembled into 894910 contigs. Avg Length: 1889.3738487669152 bp. Contig Length Distribution (# of contigs -- min to max basepairs): 893650 -- 1000.0 to 21634.4 bp 988 -- 21634.4 to 42268.8 bp 169 -- 42268.8 to 62903.200000000004 bp 53 -- 62903.200000000004 to 83537.6 bp 22 -- 83537.6 to 104172.0 bp 10 -- 104172.0 to 124806.40000000001 bp 7 -- 124806.40000000001 to 145440.80000000002 bp 4 -- 145440.80000000002 to 166075.2 bp 5 -- 166075.2 to 186709.6 bp 2 -- 186709.6 to 207344.0 bp
Links
Bin metagenomic contigs
This app completed without errors in 6h 23m 40s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • metabat_result.zip - Files generated by MetaBAT2 App
Output from MetaBAT2 Contig Binning - v1.7
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/77705
Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 29m 3s.
Links
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/77705
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM

Trimming and decontamination removed as much as tens of millions of reads and tens of billions of bases from read files

Eighty-four trimmed and/or decontaminated fastq files were generated from raw fastq files to evaluate the effects of these methods on assembly and binning metrics. These read files were created by sequentially force trimming, kmer trimming, quality trimming, decontaminating, or some combination of these steps, following recommendations posted in online bioinformatics forums. The number and percentage of reads and bases removed are provided in the Supplemental Files. When used, force trimming removed no reads but removed about 5% of bases. Kmer trimming removed about 4% of reads or fewer and as much as 5% of bases. Decontamination of raw fastq files removed between 0 and 7% of reads and bases, and less than 3% for files that were force, quality, and/or kmer trimmed prior to decontamination. Quality trimming to Q10 removed about 2 to 4% of reads and about 2 to 4% of bases, while quality trimming to Q20 removed about 8 to 15% of reads and about 9 to 16% of bases. No generalizations could be made about which steps had the greatest impact on reads or bases, except that quality trimming to Q20 consistently removed the most of each. Total reads removed by all combinations of steps tested ranged from about 0 to 16%. Similarly, total bases removed ranged from 1 to 22%. The greatest change in reads and bases was from about 399M to 334M and from about 60.3B to 46.7B, respectively.
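The percentages in this paragraph come from before/after arithmetic on read and base counts. As a minimal sketch, the snippet below reproduces the largest reduction quoted above. The helper name `percent_removed` is hypothetical, and the inputs are the rounded values from the text rather than exact counts.

```python
def percent_removed(before, after):
    """Return the percentage of `before` that is no longer present in `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before

# Rounded values quoted in the text (reads in millions, bases in billions).
reads_removed = percent_removed(399.0, 334.0)
bases_removed = percent_removed(60.3, 46.7)

print(f"reads removed: {reads_removed:.1f}%")
print(f"bases removed: {bases_removed:.1f}%")
```

Allowing for rounding of the inputs, both results are consistent with the upper ends of the ranges reported above (about 16% of reads and about 22% of bases).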

In addition to the read files generated, the raw and JGI processed reads were included in the subsequent analyses, making a total of 96 read files. These ranged from 245M to 399M reads, a span of 154M reads, and from 34.4 to 60.3B bases, a span of 25.9B.
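The file counts and spans above can be checked with a few lines of arithmetic. This sketch assumes six samples with fourteen trimmed and/or decontaminated files each, which is inferred from the file labels in the analysis code (sixteen labels per sample, including the raw and JGI processed files). The variable names are illustrative only.

```python
# Assumed layout, inferred from the file labels used in the analysis code:
# six samples, each with one raw file, one JGI processed file,
# and fourteen trimmed and/or decontaminated files.
n_samples = 6
generated_per_sample = 14

generated_files = n_samples * generated_per_sample   # trimmed/decontaminated files
total_files = generated_files + 2 * n_samples        # plus raw and JGI processed files

# Spans from the rounded extremes quoted in the text.
read_span_millions = 399 - 245
base_span_billions = round(60.3 - 34.4, 1)

print(generated_files, total_files, read_span_millions, base_span_billions)
```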

Total MAG counts correlated with bases and reads

MAGs are binned contigs assembled from metagenomic reads. It is therefore no surprise that MAG counts were correlated with the number of raw input reads (p_raw = 0.040, 1.664 MAGs/tMreads, 95% CI [0.080, 3.249], adjusted Pearson's r = 0.382) and with their base counts (p_raw = 0.041, 1.099 MAGs/Bbases, 95% CI [0.338, 1.392], adjusted Pearson's r = 0.382). Read and base counts of raw reads were also correlated with counts of medium MAGs (0.648 medium MAGs/tMreads, 95% CI [0.129, 1.166], adjusted Pearson's r = 0.455; 0.428 medium MAGs/Bbases, 95% CI [0.085, 0.772], adjusted Pearson's r = 0.455) and good MAGs (p_raw = 0.004, 0.529 good MAGs/tMreads, 95% CI [0.187, 0.872], adjusted Pearson's r = 0.545; p_raw = 0.004, 0.350 good MAGs/Bbases, 95% CI [0.123, 0.577], adjusted Pearson's r = 0.544). We next asked whether the reduction of reads and bases due to trimming and decontamination also reduced MAG counts.
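The "adjusted Pearson's r" values are the square root of the adjusted R-squared of each ordinary least squares fit. As a minimal sketch, the snippet below recomputes the value for MAG counts against raw read counts from the rounded R-squared (0.185) and number of observations (23) printed in the OLS summary of the accompanying analysis. The function name is hypothetical, and the standard adjusted R-squared formula is assumed.

```python
import math

def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjustment of R-squared for the number of predictors."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Rounded values from the OLS summary for r_bins ~ tMreads.
r_squared = 0.185
n_obs = 23
n_predictors = 1

adj_r2 = adjusted_r_squared(r_squared, n_obs, n_predictors)
adj_r = math.sqrt(adj_r2)

print(f"adjusted R-squared: {adj_r2:.3f}")
print(f"adjusted Pearson's r: {adj_r:.3f}")
```

Taking the square root discards the sign of the relationship, so the direction has to be read from the slope, which is positive here.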

Because the trimmed and decontaminated read files were derived from the same original raw files, they are not independent observations. We therefore used linear mixed-effects models to avoid violating the ordinary least squares assumption that observations are independent [36]. We found that MAG counts were correlated with read and base counts of trimmed and decontaminated reads (p_trim_decon < 0.001, 2.095 MAGs/tMreads, 95% CI [1.435, 2.755]; p_trim_decon < 0.001, 1.320 MAGs/Bbases, 95% CI [1.018, 1.622]). Read and base counts of trimmed and decontaminated reads were also correlated with medium MAGs (p_trim_decon < 0.001, 0.883 medium MAGs/tMreads, 95% CI [0.492, 1.275]; p_trim_decon < 0.001, 0.610 medium MAGs/Bbases, 95% CI [0.423, 0.797]) and good MAGs (p_trim_decon = 0.003, 0.399 good MAGs/tMreads, 95% CI [0.160, 0.638]; p_trim_decon < 0.001, 0.309 good MAGs/Bbases, 95% CI [0.196, 0.421]). No significant correlations were found between average MAG completeness or contamination and read or base counts, for either raw or trimmed and decontaminated reads.
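Two quantities from the mixed-model summary for MAG counts against read counts help explain this choice. The sketch below computes the intraclass correlation (the share of variance attributable to the sample of origin) and the Wald 95% confidence interval for the slope. It assumes that "Group Var" in the statsmodels summary is the random-intercept variance and that "Scale" is the residual variance. All inputs are the rounded values printed in the REML summary, so the results match only to rounding.

```python
# Rounded values copied from the REML mixed-model summary for td_bins ~ tMreads.
group_var = 231.903      # assumed: variance of the random intercept (between samples)
residual_var = 27.4833   # assumed: "Scale", the residual (within-sample) variance
coef = 2.095             # slope, MAGs per ten million reads
std_err = 0.337          # standard error of the slope

# Intraclass correlation for a random-intercept model.
icc = group_var / (group_var + residual_var)

# Wald interval: coefficient plus or minus the normal quantile times the standard error.
z_crit = 1.959964        # two-sided 95% quantile of the standard normal
ci_low = coef - z_crit * std_err
ci_high = coef + z_crit * std_err

print(f"intraclass correlation: {icc:.3f}")
print(f"95% CI for slope: [{ci_low:.3f}, {ci_high:.3f}]")
```

Under these assumptions, an intraclass correlation near 0.9 means that most of the variation in MAG counts lies between samples rather than within them, which is why treating the 96 files as independent would be inappropriate.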

#MAG counts were correlated with read counts of trimmed and decontaminated reads
#MAG counts were correlated with read counts of raw reads at alpha = 0.05

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.formula.api import ols

#data
Mix_Group = ['10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4']
td_read_files = ['10158.6_raw', '10158.6_qc', '10158.6_trim150', '10158.6_ftrim', '10158.6_ktrim', '10158.6_atrim', '10158.6_aqbtrim', '10158.6_aqtrim', '10158.6_qbtrim', '10158.6_qtrim', '10158.6_bb1', '10158.6_bb2', '10158.6_bb3', '10158.6_bb4', '10158.6_bb5', '10158.6_bb6', '9117.8_raw', '9117.8_qc', '9117.8_trim150', '9117.8_ftrim', '9117.8_ktrim', '9117.8_atrim', '9117.8_aqbtrim', '9117.8_aqtrim', '9117.8_qbtrim', '9117.8_qtrim', '9117.8_bb1', '9117.8_bb2', '9117.8_bb3', '9117.8_bb4', '9117.8_bb5', '9117.8_bb6', '9108.2_raw', '9108.2_qc', '9108.2_trim150', '9108.2_ftrim', '9108.2_ktrim', '9108.2_atrim', '9108.2_aqbtrim', '9108.2_aqtrim', '9108.2_qbtrim', '9108.2_qtrim', '9108.2_bb1', '9108.2_bb2', '9108.2_bb3', '9108.2_bb4', '9108.2_bb5', '9108.2_bb6', '9117.7_raw', '9117.7_qc', '9117.7_trim150', '9117.7_ftrimmed', '9117.7_ktrimmed', '9117.7_atrimmed', '9117.7_aqbtrimmed', '9117.7_aqtrimmed', '9117.7_qbtrimmed', '9117.7_qtrimmed', '9117.7_bb1', '9117.7_bb2', '9117.7_bb3', '9117.7_bb4', '9117.7_bb5', '9117.7_bb6', '11306.3_raw', '11306.3_qc', '11306.3_trim150', '11306.3_ftrimmed', '11306.3_ktrimmed', '11306.3_atrimmed', '11306.3_aqbtrimmed', '11306.3_aqtrimmed', '11306.3_qbtrimmed', '11306.3_qtrimmed', '11306.3_bb1', '11306.3_bb2', '11306.3_bb3', '11306.3_bb4', '11306.3_bb5', '11306.3_bb6', '9117.4_raw', '9117.4_qc', '9117.4_trim150', '9117.4_ftrimmed', '9117.4_ktrimmed', '9117.4_atrimmed', '9117.4_aqbtrimmed', '9117.4_aqtrimmed', '9117.4_qbtrimmed', '9117.4_qtrimmed', '9117.4_bb1', '9117.4_bb2', '9117.4_bb3', '9117.4_bb4', '9117.4_bb5', '9117.4_bb6']
td_tMreads = [36.0129894, 35.2337896, 36.0129894, 36.0129894, 35.8983284, 35.933862, 34.552143, 31.2682706, 34.3282984, 30.6449696, 35.8442254, 31.2345964, 30.615651, 35.3058416, 34.4731868, 34.2527702, 28.2058246, 26.5787996, 28.2058246, 28.2058246, 27.66738, 27.6666568, 27.2469874, 25.5238858, 27.2469874, 25.2340502, 26.886397, 24.8035162, 24.519662, 26.7855648, 26.4733418, 26.3827484, 39.9148934, 37.3791976, 39.9148934, 39.9148934, 38.7122998, 38.708094, 37.7905962, 34.6650054, 37.6107554, 34.1456456, 37.8858836, 33.921016, 33.4111558, 37.7734844, 36.9821596, 36.8056446, 33.4711394, 32.6004118, 33.4711394, 33.4711394, 33.04149, 33.0383456, 30.3924568, 32.5038064, 30.0542808, 32.3956334, 32.9696938, 30.3601654, 30.025172, 32.8646394, 32.4398904, 32.3335082, 31.8428354, 31.3942576, 31.8428354, 31.8428354, 31.637986, 31.6358996, 29.2361886, 31.0176556, 28.9929926, 30.9364282, 31.6358006, 29.2361776, 28.9929822, 31.5146638, 31.0175998, 30.9363772, 34.9441596, 32.3312534, 34.9441596, 34.9441596, 33.4723724, 33.4683444, 30.8153706, 32.9387816, 30.4748512, 32.8289102, 32.6738296, 30.0963168, 29.7659662, 32.561628, 32.1535464, 32.0471204]
td_Bbases = [54.379613994, 52.641056249, 54.0194841, 51.498574842, 50.780358538, 53.669962132, 51.549471661, 45.964065176, 48.624647571, 42.845175292, 53.535374086, 45.919226343, 42.80751312, 53.311820816, 51.43795566, 48.522733084, 42.590795146, 40.133987396, 42.3087369, 40.334329178, 39.44260618, 41.667343504, 40.776782835, 37.725547193, 40.776782835, 35.427126679, 40.490156646, 36.660246851, 34.422663647, 40.446202848, 39.621053097, 37.463971015, 60.271489034, 56.442588376, 59.8723401, 57.078297562, 55.202040634, 58.310667004, 56.435790641, 51.019814812, 53.310062558, 47.762707137, 57.069909524, 49.92030233, 46.728770205, 57.037961444, 55.229109813, 52.166971365, 50.541420494, 49.226621818, 50.2067091, 47.863729342, 47.116772654, 49.7656626, 44.974095956, 48.667868199, 42.241274113, 46.022382453, 49.6630349, 44.930324539, 42.203344352, 49.625605494, 48.576089344, 45.937352987, 48.082681454, 47.024019995, 47.7642531, 45.535254622, 45.103175218, 47.640851402, 43.590661125, 46.584185484, 41.025529046, 44.063678425, 47.640703902, 43.590646205, 41.025515524, 47.587142338, 46.584108342, 44.063610873, 52.765680996, 48.820192634, 52.4162394, 49.970148228, 47.71891569, 50.406042552, 45.62052235, 49.326011714, 42.846059312, 46.640424087, 49.207539094, 44.559568777, 41.852323333, 49.16805828, 48.15227162, 45.531321684]
td_bins = [78, 85, 82, 83, 85, 83, 82, 78, 79, 63, 90, 72, 67, 78, 85, 83, 65, 62, 64, 52, 55, 59, 59, 56, 54, 50, 62, 55, 50, 60, 61, 53, 69, 76, 74, 75, 75, 71, 73, 72, 68, 65, 74, 71, 66, 74, 74, 65, 95, 103, 107, 96, 97, 103, 88, 98, 79, 81, 104, 90, 76, 101, 101, 90, 99, 100, 96, 97, 94, 97, 96, 97, 95, 82, 101, 96, 80, 97, 96, 91, 70, 68, 70, 65, 68, 69, 64, 61, 65, 64, 77, 62, 64, 78, 65, 62]
td_Mean_Completeness = [53.86, 53.28, 56.31, 57.37, 54.68, 50.1, 58.05, 54.04, 51.16, 54.76, 51.12, 52.23, 57.52, 52.15, 58.83, 54.29, 55.06, 54.27, 53.37, 53.16, 53.77, 57.81, 54.55, 54.88, 54.17, 52.29, 54.94, 52.04, 51.41, 58.36, 54.69, 56.81, 56.66, 53.72, 57.38, 56.22, 52.7, 52.65, 55.11, 54.88, 57.48, 54.64, 53.6, 54.34, 58.18, 52.11, 56.26, 58.28, 55.64, 57.45, 58.55, 54.54, 57.45, 58.65, 55.18, 55.5, 58.25, 60.02, 55.82, 58.17, 60.34, 56.8, 57.0, 59.14, 54.4, 51.04, 58.28, 56.73, 48.21, 51.68, 52.66, 56.77, 55.24, 57.44, 58.56, 57.33, 54.41, 56.39, 53.78, 58.04, 57.17, 59.53, 56.07, 51.82, 58.95, 60.91, 56.23, 56.59, 60.48, 59.56, 56.87, 61.29, 54.39, 60.71, 53.56, 53.89]
td_Mean_Contamination = [64.8, 56.57, 63.75, 56.14, 63.41, 65.77, 68.71, 59.0, 51.31, 53.39, 53.35, 56.24, 59.82, 71.72, 69.63, 66.78, 53.25, 46.8, 58.27, 54.69, 48.23, 50.47, 57.38, 53.78, 50.02, 45.22, 47.86, 47.19, 54.22, 48.53, 54.71, 53.82, 83.63, 71.19, 88.26, 89.13, 73.4, 65.59, 87.02, 85.4, 79.15, 67.55, 71.37, 72.84, 82.49, 64.29, 92.78, 85.56, 107.95, 99.99, 99.71, 85.91, 99.47, 78.31, 82.71, 107.35, 97.95, 92.98, 87.49, 93.41, 102.6, 77.96, 97.57, 99.91, 83.7, 63.09, 68.23, 69.77, 73.61, 78.21, 70.56, 69.82, 69.56, 64.5, 63.16, 82.06, 72.7, 69.26, 75.71, 62.79, 62.01, 54.02, 69.1, 58.79, 55.27, 51.99, 60.61, 57.98, 59.85, 65.24, 56.41, 58.49, 58.44, 53.79, 57.06, 54.33]
td_good_bins = [21, 19, 19, 14, 15, 20, 20, 17, 17, 12, 19, 16, 12, 22, 20, 17, 16, 16, 17, 17, 15, 17, 19, 15, 15, 14, 16, 16, 15, 18, 19, 16, 18, 17, 17, 16, 16, 16, 17, 17, 14, 15, 17, 15, 15, 19, 15, 17, 22, 21, 18, 19, 21, 22, 17, 21, 21, 16, 23, 16, 17, 24, 20, 21, 19, 21, 21, 15, 18, 19, 20, 18, 15, 17, 19, 18, 18, 18, 19, 16, 17, 17, 19, 18, 18, 17, 19, 17, 19, 18, 18, 16, 18, 21, 20, 17]
td_good_Mean_Completeness = [85.98, 87.17, 86.9, 87.66, 86.64, 86.04, 85.18, 86.79, 86.35, 86.03, 88.47, 85.58, 89.46, 86.17, 83.85, 86.86, 87.87, 87.38, 87.61, 86.94, 87.35, 86.96, 88.23, 88.04, 88.62, 90.16, 88.26, 89.11, 86.1, 88.26, 87.55, 87.21, 87.51, 86.6, 87.87, 87.83, 86.62, 87.87, 87.12, 87.7, 87.06, 87.94, 87.37, 85.69, 87.28, 85.92, 88.11, 86.67, 87.68, 87.33, 87.89, 88.48, 89.21, 88.4, 86.1, 86.69, 87.99, 88.53, 89.18, 87.33, 86.83, 88.58, 87.12, 87.34, 88.54, 86.69, 87.03, 86.06, 88.99, 86.81, 86.17, 86.12, 87.78, 85.64, 86.41, 87.08, 85.98, 88.56, 87.42, 87.18, 85.99, 87.07, 86.97, 86.86, 86.75, 89.43, 86.51, 86.19, 86.17, 85.87, 86.68, 87.63, 86.26, 89.38, 87.22, 87.03]
td_good_Mean_Contamination = [5.06, 4.78, 4.28, 4.58, 3.83, 4.25, 4.66, 4.68, 4.57, 4.01, 4.01, 4.34, 4.43, 4.96, 4.23, 5.17, 3.79, 3.63, 3.76, 3.61, 3.99, 3.66, 4.04, 3.17, 3.67, 3.37, 4.3, 3.39, 3.89, 3.63, 4.05, 4.51, 3.87, 3.93, 3.27, 3.48, 4.16, 4.88, 3.89, 3.16, 4.65, 4.07, 4.41, 4.22, 4.07, 3.78, 3.85, 4.16, 3.92, 3.06, 4.2, 4.41, 3.21, 3.81, 4.31, 3.64, 3.84, 4.09, 3.84, 3.64, 3.48, 3.71, 3.4, 4.06, 3.76, 4.32, 4.61, 4.74, 3.56, 4.54, 4.1, 4.05, 4.26, 4.31, 4.84, 4.06, 3.88, 4.59, 4.63, 3.76, 3.57, 3.88, 3.56, 3.5, 3.68, 4.71, 3.46, 3.84, 3.94, 3.61, 3.5, 3.36, 3.4, 3.57, 3.91, 4.21]
r_read_files = ['9117.5_raw', '10158.8_raw', '11263.1_raw', '11306.3_raw', '11306.1_raw', '11260.6_raw', '11260.5_raw', '9108.1_raw', '9053.2_raw', '9672.8_raw', '9108.2_raw', '9053.4_raw', '9053.3_raw', '9117.4_raw', '9117.6_raw', '9117.7_raw', '9117.8_raw', '10158.6_raw', '10186.3_raw', '10186.4_raw', '7331.1_raw', '9053.5_raw', '9041.8_raw']
r_tMreads = [36.0129894, 17.6218972, 38.2800142, 34.9076424, 35.3037194, 37.1504476, 40.3613864, 20.7773948, 31.8428354, 30.1166938, 27.718318, 40.8492618, 39.7169858, 34.4581152, 26.9696492, 21.3309852, 39.9148934, 34.9441596, 35.690255, 35.5019026, 33.4711394, 28.2058246, 36.96984]
r_Bbases = [54.379613994, 26.609064772, 57.802821442, 52.710540024, 53.308616294, 56.097175876, 60.945693464, 31.373866148, 48.082681454, 45.1750407, 41.85466018, 61.682385318, 59.972648558, 52.031753952, 40.724170292, 32.209787652, 60.271489034, 52.765680996, 53.89228505, 53.607872926, 50.541420494, 42.590795146, 55.8244584]
r_bins = [65, 47, 139, 99, 55, 90, 115, 38, 86, 87, 69, 71, 95, 70, 62, 95, 65, 78, 85, 109, 45, 49, 52]
r_Mean_Completeness = [45.96, 58.83, 51.28, 56.54, 56.55, 63.26, 52.23, 58.47, 54.69, 53.7, 58.18, 60.32, 62.41, 65.52, 52.14, 50.97, 56.26, 57.0, 65.39, 54.89, 53.78, 53.56, 52.81]
r_Mean_Contamination = [24.03, 69.63, 67.13, 60.4, 53.21, 81.43, 55.06, 35.14, 54.71, 74.62, 54.7, 76.81, 66.86, 78.94, 67.85, 46.5, 92.78, 97.57, 121.14, 103.47, 75.71, 57.06, 65.71]
r_good_bins = [15, 4, 23, 19, 14, 25, 29, 9, 20, 20, 18, 18, 23, 17, 10, 22, 16, 21, 18, 20, 9, 9, 14]
r_good_Mean_Completeness = [84.79, 83.85, 89.19, 87.14, 90.06, 87.41, 90.7, 86.6, 87.55, 85.4, 87.31, 86.68, 86.18, 84.94, 87.02, 88.92, 88.11, 87.12, 87.24, 89.3, 87.42, 87.22, 86.64]
r_good_Mean_Contamination = [3.37, 4.23, 3.57, 4.4, 3.62, 4.4, 4.56, 3.0, 4.05, 4.32, 4.53, 3.9, 4.48, 4.17, 2.86, 2.87, 3.85, 3.4, 3.63, 4.37, 4.63, 3.91, 4.25]

#create dataset
df = pd.DataFrame({'td_tMreads': td_tMreads,
                   'td_Bbases': td_Bbases,
                   'td_bins': td_bins,
                   'td_Mean_Completeness': td_Mean_Completeness,
                   'td_Mean_Contamination': td_Mean_Contamination,
                   'td_good_bins': td_good_bins,
                   'td_good_Mean_Completeness': td_good_Mean_Completeness,
                   'td_good_Mean_Contamination': td_good_Mean_Contamination,
                   'Mix_Group': Mix_Group})
df.rename(columns={'td_tMreads': 'tMreads', 'td_Bbases': 'Bbases'}, inplace=True)

df2 = pd.DataFrame({'r_tMreads': r_tMreads,
                   'r_Bbases': r_Bbases,
                   'r_bins': r_bins,
                   'r_Mean_Completeness': r_Mean_Completeness,
                   'r_Mean_Contamination': r_Mean_Contamination,
                   'r_good_bins': r_good_bins,
                   'r_good_Mean_Completeness': r_good_Mean_Completeness,
                   'r_good_Mean_Contamination': r_good_Mean_Contamination})
df2.rename(columns={'r_tMreads': 'tMreads', 'r_Bbases': 'Bbases'}, inplace=True)

#view dataset
#print(df)

#fit regression model
model = smf.mixedlm("td_bins ~ tMreads", data=df, groups=df["Mix_Group"])
modelf = model.fit()
model1 = ols('td_bins ~ tMreads', data=df).fit()
model2 = ols('r_bins ~ tMreads', data=df2).fit()

#adjusted Pearson's r = sqrt(adjusted R^2), i.e. r adjusted for the number of predictors
#for the raw-read OLS model (model2): r = sqrt(0.146)
#adjusted Pearson's r = 0.382

#view model summary
print(modelf.summary())
print(model1.summary())
print(model2.summary())

#define figure size
fig = plt.figure(figsize=(12,8))
fig2 = plt.figure(figsize=(12,8))

#produce regression plots
fig = sm.graphics.plot_regress_exog(model1, 'tMreads', fig=fig)
fig2 = sm.graphics.plot_regress_exog(model2, 'tMreads', fig=fig2)

#post estimation: refit the mixed model by maximum likelihood (ML) instead of REML
model3 = smf.mixedlm("td_bins ~ tMreads", data=df, groups=df["Mix_Group"])
modelf3 = model3.fit(reml=False)
print(modelf3.summary())
         Mixed Linear Model Regression Results
=======================================================
Model:            MixedLM Dependent Variable: td_bins  
No. Observations: 96      Method:             REML     
No. Groups:       6       Scale:              27.4833  
Min. group size:  16      Likelihood:         -306.4298
Max. group size:  16      Converged:          Yes      
Mean group size:  16.0                                 
-------------------------------------------------------
             Coef.  Std.Err.   z   P>|z|  [0.025 0.975]
-------------------------------------------------------
Intercept     9.712   12.537 0.775 0.439 -14.860 34.284
tMreads       2.095    0.337 6.222 0.000   1.435  2.755
Group Var   231.903   29.013                           
=======================================================

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                td_bins   R-squared:                       0.077
Model:                            OLS   Adj. R-squared:                  0.068
Method:                 Least Squares   F-statistic:                     7.888
Date:                Sun, 28 Mar 2021   Prob (F-statistic):            0.00605
Time:                        22:25:31   Log-Likelihood:                -392.51
No. Observations:                  96   AIC:                             789.0
Df Residuals:                      94   BIC:                             794.1
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     39.3912     13.602      2.896      0.005      12.384      66.399
tMreads        1.1758      0.419      2.809      0.006       0.345       2.007
==============================================================================
Omnibus:                       35.897   Durbin-Watson:                   0.385
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                9.857
Skew:                           0.506   Prob(JB):                      0.00724
Kurtosis:                       1.800   Cond. No.                         297.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 r_bins   R-squared:                       0.185
Model:                            OLS   Adj. R-squared:                  0.146
Method:                 Least Squares   F-statistic:                     4.771
Date:                Sun, 28 Mar 2021   Prob (F-statistic):             0.0404
Time:                        22:25:31   Log-Likelihood:                -103.89
No. Observations:                  23   AIC:                             211.8
Df Residuals:                      21   BIC:                             214.1
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     21.9212     25.577      0.857      0.401     -31.269      75.111
tMreads        1.6644      0.762      2.184      0.040       0.080       3.249
==============================================================================
Omnibus:                        1.524   Durbin-Watson:                   2.039
Prob(Omnibus):                  0.467   Jarque-Bera (JB):                1.220
Skew:                           0.536   Prob(JB):                        0.543
Kurtosis:                       2.652   Cond. No.                         178.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
         Mixed Linear Model Regression Results
=======================================================
Model:            MixedLM Dependent Variable: td_bins  
No. Observations: 96      Method:             ML       
No. Groups:       6       Scale:              27.1879  
Min. group size:  16      Likelihood:         -308.9553
Max. group size:  16      Converged:          Yes      
Mean group size:  16.0                                 
-------------------------------------------------------
             Coef.  Std.Err.   z   P>|z|  [0.025 0.975]
-------------------------------------------------------
Intercept     9.925   12.192 0.814 0.416 -13.970 33.820
tMreads       2.088    0.334 6.251 0.000   1.433  2.743
Group Var   191.725   22.176                           
=======================================================

#MAG counts were correlated with base counts of trimmed and decontaminated reads
#MAG counts were correlated with base counts of raw reads at alpha = 0.05

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.formula.api import ols

#data
Mix_Group = ['10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4']
td_read_files = ['10158.6_raw', '10158.6_qc', '10158.6_trim150', '10158.6_ftrim', '10158.6_ktrim', '10158.6_atrim', '10158.6_aqbtrim', '10158.6_aqtrim', '10158.6_qbtrim', '10158.6_qtrim', '10158.6_bb1', '10158.6_bb2', '10158.6_bb3', '10158.6_bb4', '10158.6_bb5', '10158.6_bb6', '9117.8_raw', '9117.8_qc', '9117.8_trim150', '9117.8_ftrim', '9117.8_ktrim', '9117.8_atrim', '9117.8_aqbtrim', '9117.8_aqtrim', '9117.8_qbtrim', '9117.8_qtrim', '9117.8_bb1', '9117.8_bb2', '9117.8_bb3', '9117.8_bb4', '9117.8_bb5', '9117.8_bb6', '9108.2_raw', '9108.2_qc', '9108.2_trim150', '9108.2_ftrim', '9108.2_ktrim', '9108.2_atrim', '9108.2_aqbtrim', '9108.2_aqtrim', '9108.2_qbtrim', '9108.2_qtrim', '9108.2_bb1', '9108.2_bb2', '9108.2_bb3', '9108.2_bb4', '9108.2_bb5', '9108.2_bb6', '9117.7_raw', '9117.7_qc', '9117.7_trim150', '9117.7_ftrimmed', '9117.7_ktrimmed', '9117.7_atrimmed', '9117.7_aqbtrimmed', '9117.7_aqtrimmed', '9117.7_qbtrimmed', '9117.7_qtrimmed', '9117.7_bb1', '9117.7_bb2', '9117.7_bb3', '9117.7_bb4', '9117.7_bb5', '9117.7_bb6', '11306.3_raw', '11306.3_qc', '11306.3_trim150', '11306.3_ftrimmed', '11306.3_ktrimmed', '11306.3_atrimmed', '11306.3_aqbtrimmed', '11306.3_aqtrimmed', '11306.3_qbtrimmed', '11306.3_qtrimmed', '11306.3_bb1', '11306.3_bb2', '11306.3_bb3', '11306.3_bb4', '11306.3_bb5', '11306.3_bb6', '9117.4_raw', '9117.4_qc', '9117.4_trim150', '9117.4_ftrimmed', '9117.4_ktrimmed', '9117.4_atrimmed', '9117.4_aqbtrimmed', '9117.4_aqtrimmed', '9117.4_qbtrimmed', '9117.4_qtrimmed', '9117.4_bb1', '9117.4_bb2', '9117.4_bb3', '9117.4_bb4', '9117.4_bb5', '9117.4_bb6']
td_tMreads = [36.0129894, 35.2337896, 36.0129894, 36.0129894, 35.8983284, 35.933862, 34.552143, 31.2682706, 34.3282984, 30.6449696, 35.8442254, 31.2345964, 30.615651, 35.3058416, 34.4731868, 34.2527702, 28.2058246, 26.5787996, 28.2058246, 28.2058246, 27.66738, 27.6666568, 27.2469874, 25.5238858, 27.2469874, 25.2340502, 26.886397, 24.8035162, 24.519662, 26.7855648, 26.4733418, 26.3827484, 39.9148934, 37.3791976, 39.9148934, 39.9148934, 38.7122998, 38.708094, 37.7905962, 34.6650054, 37.6107554, 34.1456456, 37.8858836, 33.921016, 33.4111558, 37.7734844, 36.9821596, 36.8056446, 33.4711394, 32.6004118, 33.4711394, 33.4711394, 33.04149, 33.0383456, 30.3924568, 32.5038064, 30.0542808, 32.3956334, 32.9696938, 30.3601654, 30.025172, 32.8646394, 32.4398904, 32.3335082, 31.8428354, 31.3942576, 31.8428354, 31.8428354, 31.637986, 31.6358996, 29.2361886, 31.0176556, 28.9929926, 30.9364282, 31.6358006, 29.2361776, 28.9929822, 31.5146638, 31.0175998, 30.9363772, 34.9441596, 32.3312534, 34.9441596, 34.9441596, 33.4723724, 33.4683444, 30.8153706, 32.9387816, 30.4748512, 32.8289102, 32.6738296, 30.0963168, 29.7659662, 32.561628, 32.1535464, 32.0471204]
td_Bbases = [54.379613994, 52.641056249, 54.0194841, 51.498574842, 50.780358538, 53.669962132, 51.549471661, 45.964065176, 48.624647571, 42.845175292, 53.535374086, 45.919226343, 42.80751312, 53.311820816, 51.43795566, 48.522733084, 42.590795146, 40.133987396, 42.3087369, 40.334329178, 39.44260618, 41.667343504, 40.776782835, 37.725547193, 40.776782835, 35.427126679, 40.490156646, 36.660246851, 34.422663647, 40.446202848, 39.621053097, 37.463971015, 60.271489034, 56.442588376, 59.8723401, 57.078297562, 55.202040634, 58.310667004, 56.435790641, 51.019814812, 53.310062558, 47.762707137, 57.069909524, 49.92030233, 46.728770205, 57.037961444, 55.229109813, 52.166971365, 50.541420494, 49.226621818, 50.2067091, 47.863729342, 47.116772654, 49.7656626, 44.974095956, 48.667868199, 42.241274113, 46.022382453, 49.6630349, 44.930324539, 42.203344352, 49.625605494, 48.576089344, 45.937352987, 48.082681454, 47.024019995, 47.7642531, 45.535254622, 45.103175218, 47.640851402, 43.590661125, 46.584185484, 41.025529046, 44.063678425, 47.640703902, 43.590646205, 41.025515524, 47.587142338, 46.584108342, 44.063610873, 52.765680996, 48.820192634, 52.4162394, 49.970148228, 47.71891569, 50.406042552, 45.62052235, 49.326011714, 42.846059312, 46.640424087, 49.207539094, 44.559568777, 41.852323333, 49.16805828, 48.15227162, 45.531321684]
td_bins = [78, 85, 82, 83, 85, 83, 82, 78, 79, 63, 90, 72, 67, 78, 85, 83, 65, 62, 64, 52, 55, 59, 59, 56, 54, 50, 62, 55, 50, 60, 61, 53, 69, 76, 74, 75, 75, 71, 73, 72, 68, 65, 74, 71, 66, 74, 74, 65, 95, 103, 107, 96, 97, 103, 88, 98, 79, 81, 104, 90, 76, 101, 101, 90, 99, 100, 96, 97, 94, 97, 96, 97, 95, 82, 101, 96, 80, 97, 96, 91, 70, 68, 70, 65, 68, 69, 64, 61, 65, 64, 77, 62, 64, 78, 65, 62]
td_Mean_Completeness = [53.86, 53.28, 56.31, 57.37, 54.68, 50.1, 58.05, 54.04, 51.16, 54.76, 51.12, 52.23, 57.52, 52.15, 58.83, 54.29, 55.06, 54.27, 53.37, 53.16, 53.77, 57.81, 54.55, 54.88, 54.17, 52.29, 54.94, 52.04, 51.41, 58.36, 54.69, 56.81, 56.66, 53.72, 57.38, 56.22, 52.7, 52.65, 55.11, 54.88, 57.48, 54.64, 53.6, 54.34, 58.18, 52.11, 56.26, 58.28, 55.64, 57.45, 58.55, 54.54, 57.45, 58.65, 55.18, 55.5, 58.25, 60.02, 55.82, 58.17, 60.34, 56.8, 57.0, 59.14, 54.4, 51.04, 58.28, 56.73, 48.21, 51.68, 52.66, 56.77, 55.24, 57.44, 58.56, 57.33, 54.41, 56.39, 53.78, 58.04, 57.17, 59.53, 56.07, 51.82, 58.95, 60.91, 56.23, 56.59, 60.48, 59.56, 56.87, 61.29, 54.39, 60.71, 53.56, 53.89]
td_Mean_Contamination = [64.8, 56.57, 63.75, 56.14, 63.41, 65.77, 68.71, 59.0, 51.31, 53.39, 53.35, 56.24, 59.82, 71.72, 69.63, 66.78, 53.25, 46.8, 58.27, 54.69, 48.23, 50.47, 57.38, 53.78, 50.02, 45.22, 47.86, 47.19, 54.22, 48.53, 54.71, 53.82, 83.63, 71.19, 88.26, 89.13, 73.4, 65.59, 87.02, 85.4, 79.15, 67.55, 71.37, 72.84, 82.49, 64.29, 92.78, 85.56, 107.95, 99.99, 99.71, 85.91, 99.47, 78.31, 82.71, 107.35, 97.95, 92.98, 87.49, 93.41, 102.6, 77.96, 97.57, 99.91, 83.7, 63.09, 68.23, 69.77, 73.61, 78.21, 70.56, 69.82, 69.56, 64.5, 63.16, 82.06, 72.7, 69.26, 75.71, 62.79, 62.01, 54.02, 69.1, 58.79, 55.27, 51.99, 60.61, 57.98, 59.85, 65.24, 56.41, 58.49, 58.44, 53.79, 57.06, 54.33]
td_good_bins = [21, 19, 19, 14, 15, 20, 20, 17, 17, 12, 19, 16, 12, 22, 20, 17, 16, 16, 17, 17, 15, 17, 19, 15, 15, 14, 16, 16, 15, 18, 19, 16, 18, 17, 17, 16, 16, 16, 17, 17, 14, 15, 17, 15, 15, 19, 15, 17, 22, 21, 18, 19, 21, 22, 17, 21, 21, 16, 23, 16, 17, 24, 20, 21, 19, 21, 21, 15, 18, 19, 20, 18, 15, 17, 19, 18, 18, 18, 19, 16, 17, 17, 19, 18, 18, 17, 19, 17, 19, 18, 18, 16, 18, 21, 20, 17]
td_good_Mean_Completeness = [85.98, 87.17, 86.9, 87.66, 86.64, 86.04, 85.18, 86.79, 86.35, 86.03, 88.47, 85.58, 89.46, 86.17, 83.85, 86.86, 87.87, 87.38, 87.61, 86.94, 87.35, 86.96, 88.23, 88.04, 88.62, 90.16, 88.26, 89.11, 86.1, 88.26, 87.55, 87.21, 87.51, 86.6, 87.87, 87.83, 86.62, 87.87, 87.12, 87.7, 87.06, 87.94, 87.37, 85.69, 87.28, 85.92, 88.11, 86.67, 87.68, 87.33, 87.89, 88.48, 89.21, 88.4, 86.1, 86.69, 87.99, 88.53, 89.18, 87.33, 86.83, 88.58, 87.12, 87.34, 88.54, 86.69, 87.03, 86.06, 88.99, 86.81, 86.17, 86.12, 87.78, 85.64, 86.41, 87.08, 85.98, 88.56, 87.42, 87.18, 85.99, 87.07, 86.97, 86.86, 86.75, 89.43, 86.51, 86.19, 86.17, 85.87, 86.68, 87.63, 86.26, 89.38, 87.22, 87.03]
td_good_Mean_Contamination = [5.06, 4.78, 4.28, 4.58, 3.83, 4.25, 4.66, 4.68, 4.57, 4.01, 4.01, 4.34, 4.43, 4.96, 4.23, 5.17, 3.79, 3.63, 3.76, 3.61, 3.99, 3.66, 4.04, 3.17, 3.67, 3.37, 4.3, 3.39, 3.89, 3.63, 4.05, 4.51, 3.87, 3.93, 3.27, 3.48, 4.16, 4.88, 3.89, 3.16, 4.65, 4.07, 4.41, 4.22, 4.07, 3.78, 3.85, 4.16, 3.92, 3.06, 4.2, 4.41, 3.21, 3.81, 4.31, 3.64, 3.84, 4.09, 3.84, 3.64, 3.48, 3.71, 3.4, 4.06, 3.76, 4.32, 4.61, 4.74, 3.56, 4.54, 4.1, 4.05, 4.26, 4.31, 4.84, 4.06, 3.88, 4.59, 4.63, 3.76, 3.57, 3.88, 3.56, 3.5, 3.68, 4.71, 3.46, 3.84, 3.94, 3.61, 3.5, 3.36, 3.4, 3.57, 3.91, 4.21]
r_read_files = ['9117.5_raw', '10158.8_raw', '11263.1_raw', '11306.3_raw', '11306.1_raw', '11260.6_raw', '11260.5_raw', '9108.1_raw', '9053.2_raw', '9672.8_raw', '9108.2_raw', '9053.4_raw', '9053.3_raw', '9117.4_raw', '9117.6_raw', '9117.7_raw', '9117.8_raw', '10158.6_raw', '10186.3_raw', '10186.4_raw', '7331.1_raw', '9053.5_raw', '9041.8_raw']
r_tMreads = [36.0129894, 17.6218972, 38.2800142, 34.9076424, 35.3037194, 37.1504476, 40.3613864, 20.7773948, 31.8428354, 30.1166938, 27.718318, 40.8492618, 39.7169858, 34.4581152, 26.9696492, 21.3309852, 39.9148934, 34.9441596, 35.690255, 35.5019026, 33.4711394, 28.2058246, 36.96984]
r_Bbases = [54.379613994, 26.609064772, 57.802821442, 52.710540024, 53.308616294, 56.097175876, 60.945693464, 31.373866148, 48.082681454, 45.1750407, 41.85466018, 61.682385318, 59.972648558, 52.031753952, 40.724170292, 32.209787652, 60.271489034, 52.765680996, 53.89228505, 53.607872926, 50.541420494, 42.590795146, 55.8244584]
r_bins = [65, 47, 139, 99, 55, 90, 115, 38, 86, 87, 69, 71, 95, 70, 62, 95, 65, 78, 85, 109, 45, 49, 52]
r_Mean_Completeness = [45.96, 58.83, 51.28, 56.54, 56.55, 63.26, 52.23, 58.47, 54.69, 53.7, 58.18, 60.32, 62.41, 65.52, 52.14, 50.97, 56.26, 57.0, 65.39, 54.89, 53.78, 53.56, 52.81]
r_Mean_Contamination = [24.03, 69.63, 67.13, 60.4, 53.21, 81.43, 55.06, 35.14, 54.71, 74.62, 54.7, 76.81, 66.86, 78.94, 67.85, 46.5, 92.78, 97.57, 121.14, 103.47, 75.71, 57.06, 65.71]
r_good_bins = [15, 4, 23, 19, 14, 25, 29, 9, 20, 20, 18, 18, 23, 17, 10, 22, 16, 21, 18, 20, 9, 9, 14]
r_good_Mean_Completeness = [84.79, 83.85, 89.19, 87.14, 90.06, 87.41, 90.7, 86.6, 87.55, 85.4, 87.31, 86.68, 86.18, 84.94, 87.02, 88.92, 88.11, 87.12, 87.24, 89.3, 87.42, 87.22, 86.64]
r_good_Mean_Contamination = [3.37, 4.23, 3.57, 4.4, 3.62, 4.4, 4.56, 3.0, 4.05, 4.32, 4.53, 3.9, 4.48, 4.17, 2.86, 2.87, 3.85, 3.4, 3.63, 4.37, 4.63, 3.91, 4.25]

#create dataset
df = pd.DataFrame({'td_tMreads': td_tMreads,
                   'td_Bbases': td_Bbases,
                   'td_bins': td_bins,
                   'td_Mean_Completeness': td_Mean_Completeness,
                   'td_Mean_Contamination': td_Mean_Contamination,
                   'td_good_bins': td_good_bins,
                   'td_good_Mean_Completeness': td_good_Mean_Completeness,
                   'td_good_Mean_Contamination': td_good_Mean_Contamination,
                   'Mix_Group': Mix_Group})
df.rename(columns={'td_tMreads': 'tMreads', 'td_Bbases': 'Bbases'}, inplace=True)

df2 = pd.DataFrame({'r_tMreads': r_tMreads,
                   'r_Bbases': r_Bbases,
                   'r_bins': r_bins,
                   'r_Mean_Completeness': r_Mean_Completeness,
                   'r_Mean_Contamination': r_Mean_Contamination,
                   'r_good_bins': r_good_bins,
                   'r_good_Mean_Completeness': r_good_Mean_Completeness,
                   'r_good_Mean_Contamination': r_good_Mean_Contamination})
df2.rename(columns={'r_tMreads': 'tMreads', 'r_Bbases': 'Bbases'}, inplace=True)

#view dataset
#print(df)

#fit regression model
model = smf.mixedlm("td_bins ~ Bbases", data=df, groups=df["Mix_Group"])
modelf = model.fit()
model1 = ols('td_bins ~ Bbases', data=df).fit()
model2 = ols('r_bins ~ Bbases', data=df2).fit()

#With a single predictor, R^2 is the square of the Pearson product-moment correlation coefficient (r);
#adjusted R^2 corrects R^2 for the number of predictors.
#Raw reads (model2): sqrt(adj. R^2) = sqrt(0.146) = 0.382

#mdf = md.fit()
#print(mdf.summary())

#view model summary
print(modelf.summary())
print(model1.summary())
print(model2.summary())

#define figure size
fig = plt.figure(figsize=(12,8))
fig2 = plt.figure(figsize=(12,8))

#produce regression plots
fig = sm.graphics.plot_regress_exog(model1, 'Bbases', fig=fig)
fig2 = sm.graphics.plot_regress_exog(model2, 'Bbases', fig=fig2)
         Mixed Linear Model Regression Results
=======================================================
Model:            MixedLM Dependent Variable: td_bins  
No. Observations: 96      Method:             REML     
No. Groups:       6       Scale:              21.6539  
Min. group size:  16      Likelihood:         -296.4177
Max. group size:  16      Converged:          Yes      
Mean group size:  16.0                                 
-------------------------------------------------------
              Coef.  Std.Err.   z   P>|z| [0.025 0.975]
-------------------------------------------------------
Intercept     14.715    9.563 1.539 0.124 -4.028 33.458
Bbases         1.320    0.154 8.565 0.000  1.018  1.622
Group Var    226.343   31.822                          
=======================================================

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                td_bins   R-squared:                       0.101
Model:                            OLS   Adj. R-squared:                  0.092
Method:                 Least Squares   F-statistic:                     10.61
Date:                Sun, 28 Mar 2021   Prob (F-statistic):            0.00157
Time:                        22:25:41   Log-Likelihood:                -391.24
No. Observations:                  96   AIC:                             786.5
Df Residuals:                      94   BIC:                             791.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     36.3272     12.685      2.864      0.005      11.142      61.513
Bbases         0.8649      0.266      3.257      0.002       0.338       1.392
==============================================================================
Omnibus:                       46.989   Durbin-Watson:                   0.336
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               10.364
Skew:                           0.498   Prob(JB):                      0.00562
Kurtosis:                       1.736   Cond. No.                         413.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 r_bins   R-squared:                       0.184
Model:                            OLS   Adj. R-squared:                  0.146
Method:                 Least Squares   F-statistic:                     4.749
Date:                Sun, 28 Mar 2021   Prob (F-statistic):             0.0409
Time:                        22:25:41   Log-Likelihood:                -103.90
No. Observations:                  23   AIC:                             211.8
Df Residuals:                      21   BIC:                             214.1
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     22.0775     25.566      0.864      0.398     -31.090      75.245
Bbases         1.0994      0.505      2.179      0.041       0.050       2.149
==============================================================================
Omnibus:                        1.517   Durbin-Watson:                   2.039
Prob(Omnibus):                  0.468   Jarque-Bera (JB):                1.218
Skew:                           0.535   Prob(JB):                        0.544
Kurtosis:                       2.647   Cond. No.                         268.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
<Figure size 864x576 with 0 Axes>
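The comment in the cell above takes the square root of the adjusted R-squared of the raw-reads model. As a minimal sketch, and not part of the original narrative, the relation between R-squared, adjusted R-squared, and Pearson's r for a one-predictor regression can be checked by hand. The helper names `adjusted_r2` and `r_from_r2` are hypothetical. The inputs are the rounded values printed in the raw-reads OLS summary, so the results only approximate the printed figures.

```python
import math

def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def r_from_r2(r2, slope):
    """With one predictor, |r| = sqrt(R^2) and r takes the sign of the slope."""
    return math.copysign(math.sqrt(r2), slope)

# Rounded values copied from the r_bins ~ Bbases OLS summary
r2_raw = 0.184      # R-squared
n_raw = 23          # No. Observations
slope_raw = 1.0994  # Bbases coefficient

adj = adjusted_r2(r2_raw, n_raw, 1)   # close to the printed Adj. R-squared
r = r_from_r2(r2_raw, slope_raw)      # Pearson's r implied by R-squared
r_adj = math.sqrt(0.146)              # square root of the printed Adj. R-squared

print(round(adj, 3), round(r, 3), round(r_adj, 3))
```

The square root of the unadjusted R-squared is the sample Pearson correlation between bin count and base count. The square root of the adjusted value is a slightly more conservative figure.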
#Average MAG completeness was not correlated with read counts of trimmed and decontaminated reads
#Average MAG completeness was not correlated with read counts of raw reads at alpha = 0.05

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.formula.api import ols

#data
Mix_Group = ['10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4']
td_read_files = ['10158.6_raw', '10158.6_qc', '10158.6_trim150', '10158.6_ftrim', '10158.6_ktrim', '10158.6_atrim', '10158.6_aqbtrim', '10158.6_aqtrim', '10158.6_qbtrim', '10158.6_qtrim', '10158.6_bb1', '10158.6_bb2', '10158.6_bb3', '10158.6_bb4', '10158.6_bb5', '10158.6_bb6', '9117.8_raw', '9117.8_qc', '9117.8_trim150', '9117.8_ftrim', '9117.8_ktrim', '9117.8_atrim', '9117.8_aqbtrim', '9117.8_aqtrim', '9117.8_qbtrim', '9117.8_qtrim', '9117.8_bb1', '9117.8_bb2', '9117.8_bb3', '9117.8_bb4', '9117.8_bb5', '9117.8_bb6', '9108.2_raw', '9108.2_qc', '9108.2_trim150', '9108.2_ftrim', '9108.2_ktrim', '9108.2_atrim', '9108.2_aqbtrim', '9108.2_aqtrim', '9108.2_qbtrim', '9108.2_qtrim', '9108.2_bb1', '9108.2_bb2', '9108.2_bb3', '9108.2_bb4', '9108.2_bb5', '9108.2_bb6', '9117.7_raw', '9117.7_qc', '9117.7_trim150', '9117.7_ftrimmed', '9117.7_ktrimmed', '9117.7_atrimmed', '9117.7_aqbtrimmed', '9117.7_aqtrimmed', '9117.7_qbtrimmed', '9117.7_qtrimmed', '9117.7_bb1', '9117.7_bb2', '9117.7_bb3', '9117.7_bb4', '9117.7_bb5', '9117.7_bb6', '11306.3_raw', '11306.3_qc', '11306.3_trim150', '11306.3_ftrimmed', '11306.3_ktrimmed', '11306.3_atrimmed', '11306.3_aqbtrimmed', '11306.3_aqtrimmed', '11306.3_qbtrimmed', '11306.3_qtrimmed', '11306.3_bb1', '11306.3_bb2', '11306.3_bb3', '11306.3_bb4', '11306.3_bb5', '11306.3_bb6', '9117.4_raw', '9117.4_qc', '9117.4_trim150', '9117.4_ftrimmed', '9117.4_ktrimmed', '9117.4_atrimmed', '9117.4_aqbtrimmed', '9117.4_aqtrimmed', '9117.4_qbtrimmed', '9117.4_qtrimmed', '9117.4_bb1', '9117.4_bb2', '9117.4_bb3', '9117.4_bb4', '9117.4_bb5', '9117.4_bb6']
td_tMreads = [36.0129894, 35.2337896, 36.0129894, 36.0129894, 35.8983284, 35.933862, 34.552143, 31.2682706, 34.3282984, 30.6449696, 35.8442254, 31.2345964, 30.615651, 35.3058416, 34.4731868, 34.2527702, 28.2058246, 26.5787996, 28.2058246, 28.2058246, 27.66738, 27.6666568, 27.2469874, 25.5238858, 27.2469874, 25.2340502, 26.886397, 24.8035162, 24.519662, 26.7855648, 26.4733418, 26.3827484, 39.9148934, 37.3791976, 39.9148934, 39.9148934, 38.7122998, 38.708094, 37.7905962, 34.6650054, 37.6107554, 34.1456456, 37.8858836, 33.921016, 33.4111558, 37.7734844, 36.9821596, 36.8056446, 33.4711394, 32.6004118, 33.4711394, 33.4711394, 33.04149, 33.0383456, 30.3924568, 32.5038064, 30.0542808, 32.3956334, 32.9696938, 30.3601654, 30.025172, 32.8646394, 32.4398904, 32.3335082, 31.8428354, 31.3942576, 31.8428354, 31.8428354, 31.637986, 31.6358996, 29.2361886, 31.0176556, 28.9929926, 30.9364282, 31.6358006, 29.2361776, 28.9929822, 31.5146638, 31.0175998, 30.9363772, 34.9441596, 32.3312534, 34.9441596, 34.9441596, 33.4723724, 33.4683444, 30.8153706, 32.9387816, 30.4748512, 32.8289102, 32.6738296, 30.0963168, 29.7659662, 32.561628, 32.1535464, 32.0471204]
td_Bbases = [54.379613994, 52.641056249, 54.0194841, 51.498574842, 50.780358538, 53.669962132, 51.549471661, 45.964065176, 48.624647571, 42.845175292, 53.535374086, 45.919226343, 42.80751312, 53.311820816, 51.43795566, 48.522733084, 42.590795146, 40.133987396, 42.3087369, 40.334329178, 39.44260618, 41.667343504, 40.776782835, 37.725547193, 40.776782835, 35.427126679, 40.490156646, 36.660246851, 34.422663647, 40.446202848, 39.621053097, 37.463971015, 60.271489034, 56.442588376, 59.8723401, 57.078297562, 55.202040634, 58.310667004, 56.435790641, 51.019814812, 53.310062558, 47.762707137, 57.069909524, 49.92030233, 46.728770205, 57.037961444, 55.229109813, 52.166971365, 50.541420494, 49.226621818, 50.2067091, 47.863729342, 47.116772654, 49.7656626, 44.974095956, 48.667868199, 42.241274113, 46.022382453, 49.6630349, 44.930324539, 42.203344352, 49.625605494, 48.576089344, 45.937352987, 48.082681454, 47.024019995, 47.7642531, 45.535254622, 45.103175218, 47.640851402, 43.590661125, 46.584185484, 41.025529046, 44.063678425, 47.640703902, 43.590646205, 41.025515524, 47.587142338, 46.584108342, 44.063610873, 52.765680996, 48.820192634, 52.4162394, 49.970148228, 47.71891569, 50.406042552, 45.62052235, 49.326011714, 42.846059312, 46.640424087, 49.207539094, 44.559568777, 41.852323333, 49.16805828, 48.15227162, 45.531321684]
td_bins = [78, 85, 82, 83, 85, 83, 82, 78, 79, 63, 90, 72, 67, 78, 85, 83, 65, 62, 64, 52, 55, 59, 59, 56, 54, 50, 62, 55, 50, 60, 61, 53, 69, 76, 74, 75, 75, 71, 73, 72, 68, 65, 74, 71, 66, 74, 74, 65, 95, 103, 107, 96, 97, 103, 88, 98, 79, 81, 104, 90, 76, 101, 101, 90, 99, 100, 96, 97, 94, 97, 96, 97, 95, 82, 101, 96, 80, 97, 96, 91, 70, 68, 70, 65, 68, 69, 64, 61, 65, 64, 77, 62, 64, 78, 65, 62]
td_Mean_Completeness = [53.86, 53.28, 56.31, 57.37, 54.68, 50.1, 58.05, 54.04, 51.16, 54.76, 51.12, 52.23, 57.52, 52.15, 58.83, 54.29, 55.06, 54.27, 53.37, 53.16, 53.77, 57.81, 54.55, 54.88, 54.17, 52.29, 54.94, 52.04, 51.41, 58.36, 54.69, 56.81, 56.66, 53.72, 57.38, 56.22, 52.7, 52.65, 55.11, 54.88, 57.48, 54.64, 53.6, 54.34, 58.18, 52.11, 56.26, 58.28, 55.64, 57.45, 58.55, 54.54, 57.45, 58.65, 55.18, 55.5, 58.25, 60.02, 55.82, 58.17, 60.34, 56.8, 57.0, 59.14, 54.4, 51.04, 58.28, 56.73, 48.21, 51.68, 52.66, 56.77, 55.24, 57.44, 58.56, 57.33, 54.41, 56.39, 53.78, 58.04, 57.17, 59.53, 56.07, 51.82, 58.95, 60.91, 56.23, 56.59, 60.48, 59.56, 56.87, 61.29, 54.39, 60.71, 53.56, 53.89]
td_Mean_Contamination = [64.8, 56.57, 63.75, 56.14, 63.41, 65.77, 68.71, 59.0, 51.31, 53.39, 53.35, 56.24, 59.82, 71.72, 69.63, 66.78, 53.25, 46.8, 58.27, 54.69, 48.23, 50.47, 57.38, 53.78, 50.02, 45.22, 47.86, 47.19, 54.22, 48.53, 54.71, 53.82, 83.63, 71.19, 88.26, 89.13, 73.4, 65.59, 87.02, 85.4, 79.15, 67.55, 71.37, 72.84, 82.49, 64.29, 92.78, 85.56, 107.95, 99.99, 99.71, 85.91, 99.47, 78.31, 82.71, 107.35, 97.95, 92.98, 87.49, 93.41, 102.6, 77.96, 97.57, 99.91, 83.7, 63.09, 68.23, 69.77, 73.61, 78.21, 70.56, 69.82, 69.56, 64.5, 63.16, 82.06, 72.7, 69.26, 75.71, 62.79, 62.01, 54.02, 69.1, 58.79, 55.27, 51.99, 60.61, 57.98, 59.85, 65.24, 56.41, 58.49, 58.44, 53.79, 57.06, 54.33]
td_good_bins = [21, 19, 19, 14, 15, 20, 20, 17, 17, 12, 19, 16, 12, 22, 20, 17, 16, 16, 17, 17, 15, 17, 19, 15, 15, 14, 16, 16, 15, 18, 19, 16, 18, 17, 17, 16, 16, 16, 17, 17, 14, 15, 17, 15, 15, 19, 15, 17, 22, 21, 18, 19, 21, 22, 17, 21, 21, 16, 23, 16, 17, 24, 20, 21, 19, 21, 21, 15, 18, 19, 20, 18, 15, 17, 19, 18, 18, 18, 19, 16, 17, 17, 19, 18, 18, 17, 19, 17, 19, 18, 18, 16, 18, 21, 20, 17]
td_good_Mean_Completeness = [85.98, 87.17, 86.9, 87.66, 86.64, 86.04, 85.18, 86.79, 86.35, 86.03, 88.47, 85.58, 89.46, 86.17, 83.85, 86.86, 87.87, 87.38, 87.61, 86.94, 87.35, 86.96, 88.23, 88.04, 88.62, 90.16, 88.26, 89.11, 86.1, 88.26, 87.55, 87.21, 87.51, 86.6, 87.87, 87.83, 86.62, 87.87, 87.12, 87.7, 87.06, 87.94, 87.37, 85.69, 87.28, 85.92, 88.11, 86.67, 87.68, 87.33, 87.89, 88.48, 89.21, 88.4, 86.1, 86.69, 87.99, 88.53, 89.18, 87.33, 86.83, 88.58, 87.12, 87.34, 88.54, 86.69, 87.03, 86.06, 88.99, 86.81, 86.17, 86.12, 87.78, 85.64, 86.41, 87.08, 85.98, 88.56, 87.42, 87.18, 85.99, 87.07, 86.97, 86.86, 86.75, 89.43, 86.51, 86.19, 86.17, 85.87, 86.68, 87.63, 86.26, 89.38, 87.22, 87.03]
td_good_Mean_Contamination = [5.06, 4.78, 4.28, 4.58, 3.83, 4.25, 4.66, 4.68, 4.57, 4.01, 4.01, 4.34, 4.43, 4.96, 4.23, 5.17, 3.79, 3.63, 3.76, 3.61, 3.99, 3.66, 4.04, 3.17, 3.67, 3.37, 4.3, 3.39, 3.89, 3.63, 4.05, 4.51, 3.87, 3.93, 3.27, 3.48, 4.16, 4.88, 3.89, 3.16, 4.65, 4.07, 4.41, 4.22, 4.07, 3.78, 3.85, 4.16, 3.92, 3.06, 4.2, 4.41, 3.21, 3.81, 4.31, 3.64, 3.84, 4.09, 3.84, 3.64, 3.48, 3.71, 3.4, 4.06, 3.76, 4.32, 4.61, 4.74, 3.56, 4.54, 4.1, 4.05, 4.26, 4.31, 4.84, 4.06, 3.88, 4.59, 4.63, 3.76, 3.57, 3.88, 3.56, 3.5, 3.68, 4.71, 3.46, 3.84, 3.94, 3.61, 3.5, 3.36, 3.4, 3.57, 3.91, 4.21]
r_read_files = ['9117.5_raw', '10158.8_raw', '11263.1_raw', '11306.3_raw', '11306.1_raw', '11260.6_raw', '11260.5_raw', '9108.1_raw', '9053.2_raw', '9672.8_raw', '9108.2_raw', '9053.4_raw', '9053.3_raw', '9117.4_raw', '9117.6_raw', '9117.7_raw', '9117.8_raw', '10158.6_raw', '10186.3_raw', '10186.4_raw', '7331.1_raw', '9053.5_raw', '9041.8_raw']
r_tMreads = [36.0129894, 17.6218972, 38.2800142, 34.9076424, 35.3037194, 37.1504476, 40.3613864, 20.7773948, 31.8428354, 30.1166938, 27.718318, 40.8492618, 39.7169858, 34.4581152, 26.9696492, 21.3309852, 39.9148934, 34.9441596, 35.690255, 35.5019026, 33.4711394, 28.2058246, 36.96984]
r_Bbases = [54.379613994, 26.609064772, 57.802821442, 52.710540024, 53.308616294, 56.097175876, 60.945693464, 31.373866148, 48.082681454, 45.1750407, 41.85466018, 61.682385318, 59.972648558, 52.031753952, 40.724170292, 32.209787652, 60.271489034, 52.765680996, 53.89228505, 53.607872926, 50.541420494, 42.590795146, 55.8244584]
r_bins = [65, 47, 139, 99, 55, 90, 115, 38, 86, 87, 69, 71, 95, 70, 62, 95, 65, 78, 85, 109, 45, 49, 52]
r_Mean_Completeness = [45.96, 58.83, 51.28, 56.54, 56.55, 63.26, 52.23, 58.47, 54.69, 53.7, 58.18, 60.32, 62.41, 65.52, 52.14, 50.97, 56.26, 57.0, 65.39, 54.89, 53.78, 53.56, 52.81]
r_Mean_Contamination = [24.03, 69.63, 67.13, 60.4, 53.21, 81.43, 55.06, 35.14, 54.71, 74.62, 54.7, 76.81, 66.86, 78.94, 67.85, 46.5, 92.78, 97.57, 121.14, 103.47, 75.71, 57.06, 65.71]
r_good_bins = [15, 4, 23, 19, 14, 25, 29, 9, 20, 20, 18, 18, 23, 17, 10, 22, 16, 21, 18, 20, 9, 9, 14]
r_good_Mean_Completeness = [84.79, 83.85, 89.19, 87.14, 90.06, 87.41, 90.7, 86.6, 87.55, 85.4, 87.31, 86.68, 86.18, 84.94, 87.02, 88.92, 88.11, 87.12, 87.24, 89.3, 87.42, 87.22, 86.64]
r_good_Mean_Contamination = [3.37, 4.23, 3.57, 4.4, 3.62, 4.4, 4.56, 3.0, 4.05, 4.32, 4.53, 3.9, 4.48, 4.17, 2.86, 2.87, 3.85, 3.4, 3.63, 4.37, 4.63, 3.91, 4.25]

#create dataset
df = pd.DataFrame({'td_tMreads': td_tMreads,
                   'td_Bbases': td_Bbases,
                   'td_bins': td_bins,
                   'td_Mean_Completeness': td_Mean_Completeness,
                   'td_Mean_Contamination': td_Mean_Contamination,
                   'td_good_bins': td_good_bins,
                   'td_good_Mean_Completeness': td_good_Mean_Completeness,
                   'td_good_Mean_Contamination': td_good_Mean_Contamination,
                   'Mix_Group': Mix_Group})
df.rename(columns={'td_tMreads': 'tMreads', 'td_Bbases': 'Bbases'}, inplace=True)

df2 = pd.DataFrame({'r_tMreads': r_tMreads,
                   'r_Bbases': r_Bbases,
                   'r_bins': r_bins,
                   'r_Mean_Completeness': r_Mean_Completeness,
                   'r_Mean_Contamination': r_Mean_Contamination,
                   'r_good_bins': r_good_bins,
                   'r_good_Mean_Completeness': r_good_Mean_Completeness,
                   'r_good_Mean_Contamination': r_good_Mean_Contamination})
df2.rename(columns={'r_tMreads': 'tMreads', 'r_Bbases': 'Bbases'}, inplace=True)

#view dataset
#print(df)

#fit regression model
model = smf.mixedlm("td_Mean_Completeness ~ tMreads", data=df, groups=df["Mix_Group"])
modelf = model.fit()
model1 = ols('td_Mean_Completeness ~ tMreads', data=df).fit()
model2 = ols('r_Mean_Completeness ~ tMreads', data=df2).fit()

#mdf = md.fit()
#print(mdf.summary())

#view model summary
print(modelf.summary())
print(model1.summary())
print(model2.summary())

#define figure size
fig = plt.figure(figsize=(12,8))
fig2 = plt.figure(figsize=(12,8))

#produce regression plots
fig = sm.graphics.plot_regress_exog(model1, 'tMreads', fig=fig)
fig2 = sm.graphics.plot_regress_exog(model2, 'tMreads', fig=fig2)
              Mixed Linear Model Regression Results
==================================================================
Model:            MixedLM Dependent Variable: td_Mean_Completeness
No. Observations: 96      Method:             REML                
No. Groups:       6       Scale:              5.9304              
Min. group size:  16      Likelihood:         -226.7723           
Max. group size:  16      Converged:          Yes                 
Mean group size:  16.0                                            
--------------------------------------------------------------------
                Coef.    Std.Err.     z      P>|z|   [0.025   0.975]
--------------------------------------------------------------------
Intercept       57.240      4.077   14.039   0.000   49.249   65.232
tMreads         -0.049      0.125   -0.393   0.694   -0.294    0.196
Group Var        1.826      0.610                                   
==================================================================

                             OLS Regression Results                             
================================================================================
Dep. Variable:     td_Mean_Completeness   R-squared:                       0.001
Model:                              OLS   Adj. R-squared:                 -0.010
Method:                   Least Squares   F-statistic:                   0.06334
Date:                  Sun, 28 Mar 2021   Prob (F-statistic):              0.802
Time:                          22:28:24   Log-Likelihood:                -230.60
No. Observations:                    96   AIC:                             465.2
Df Residuals:                        94   BIC:                             470.3
Df Model:                             1                                         
Covariance Type:              nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     55.0264      2.518     21.850      0.000      50.026      60.027
tMreads        0.0195      0.078      0.252      0.802      -0.134       0.173
==============================================================================
Omnibus:                        0.800   Durbin-Watson:                   1.696
Prob(Omnibus):                  0.670   Jarque-Bera (JB):                0.887
Skew:                          -0.119   Prob(JB):                        0.642
Kurtosis:                       2.593   Cond. No.                         297.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     r_Mean_Completeness   R-squared:                       0.010
Model:                             OLS   Adj. R-squared:                 -0.037
Method:                  Least Squares   F-statistic:                    0.2155
Date:                 Sun, 28 Mar 2021   Prob (F-statistic):              0.647
Time:                         22:28:24   Log-Likelihood:                -68.304
No. Observations:                   23   AIC:                             140.6
Df Residuals:                       21   BIC:                             142.9
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     53.8117      5.443      9.887      0.000      42.493      65.131
tMreads        0.0753      0.162      0.464      0.647      -0.262       0.412
==============================================================================
Omnibus:                        0.192   Durbin-Watson:                   1.901
Prob(Omnibus):                  0.908   Jarque-Bera (JB):                0.159
Skew:                           0.159   Prob(JB):                        0.923
Kurtosis:                       2.745   Cond. No.                         178.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
<Figure size 864x576 with 0 Axes>
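The two mixed-model summaries printed so far can be compared with a short calculation. This is a minimal sketch and not part of the original narrative. It assumes that "Group Var" in the statsmodels summary is the random-intercept variance between Mix_Group libraries and that "Scale" is the residual variance. The helper names `icc` and `wald_p_two_sided` are hypothetical. The inputs are the rounded values from the printed tables, so the results are approximate.

```python
import math

def icc(group_var, residual_var):
    """Intraclass correlation: share of variance attributable to the grouping factor."""
    return group_var / (group_var + residual_var)

def wald_p_two_sided(coef, std_err):
    """Wald z statistic and two-sided p-value under a standard normal reference."""
    z = coef / std_err
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# td_bins ~ Bbases mixed model: Group Var 226.343, Scale 21.6539
icc_bins = icc(226.343, 21.6539)

# td_Mean_Completeness ~ tMreads mixed model: Group Var 1.826, Scale 5.9304
icc_completeness = icc(1.826, 5.9304)

# tMreads coefficient and standard error from the completeness mixed model
z, p = wald_p_two_sided(-0.049, 0.125)

print(round(icc_bins, 3), round(icc_completeness, 3), round(z, 3), round(p, 3))
```

Under these assumptions, most of the variation in bin counts lies between source libraries. That is consistent with the mixed model and the pooled OLS fit giving different Bbases slopes (1.320 versus 0.8649). Mean completeness varies mostly within libraries, and the slope on read count is indistinguishable from zero.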
#Average MAG completeness was not correlated with base counts of trimmed and decontaminated reads
#Average MAG completeness was not correlated with base counts of raw reads at alpha = 0.05

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.formula.api import ols

#data
Mix_Group = ['10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4']
td_read_files = ['10158.6_raw', '10158.6_qc', '10158.6_trim150', '10158.6_ftrim', '10158.6_ktrim', '10158.6_atrim', '10158.6_aqbtrim', '10158.6_aqtrim', '10158.6_qbtrim', '10158.6_qtrim', '10158.6_bb1', '10158.6_bb2', '10158.6_bb3', '10158.6_bb4', '10158.6_bb5', '10158.6_bb6', '9117.8_raw', '9117.8_qc', '9117.8_trim150', '9117.8_ftrim', '9117.8_ktrim', '9117.8_atrim', '9117.8_aqbtrim', '9117.8_aqtrim', '9117.8_qbtrim', '9117.8_qtrim', '9117.8_bb1', '9117.8_bb2', '9117.8_bb3', '9117.8_bb4', '9117.8_bb5', '9117.8_bb6', '9108.2_raw', '9108.2_qc', '9108.2_trim150', '9108.2_ftrim', '9108.2_ktrim', '9108.2_atrim', '9108.2_aqbtrim', '9108.2_aqtrim', '9108.2_qbtrim', '9108.2_qtrim', '9108.2_bb1', '9108.2_bb2', '9108.2_bb3', '9108.2_bb4', '9108.2_bb5', '9108.2_bb6', '9117.7_raw', '9117.7_qc', '9117.7_trim150', '9117.7_ftrimmed', '9117.7_ktrimmed', '9117.7_atrimmed', '9117.7_aqbtrimmed', '9117.7_aqtrimmed', '9117.7_qbtrimmed', '9117.7_qtrimmed', '9117.7_bb1', '9117.7_bb2', '9117.7_bb3', '9117.7_bb4', '9117.7_bb5', '9117.7_bb6', '11306.3_raw', '11306.3_qc', '11306.3_trim150', '11306.3_ftrimmed', '11306.3_ktrimmed', '11306.3_atrimmed', '11306.3_aqbtrimmed', '11306.3_aqtrimmed', '11306.3_qbtrimmed', '11306.3_qtrimmed', '11306.3_bb1', '11306.3_bb2', '11306.3_bb3', '11306.3_bb4', '11306.3_bb5', '11306.3_bb6', '9117.4_raw', '9117.4_qc', '9117.4_trim150', '9117.4_ftrimmed', '9117.4_ktrimmed', '9117.4_atrimmed', '9117.4_aqbtrimmed', '9117.4_aqtrimmed', '9117.4_qbtrimmed', '9117.4_qtrimmed', '9117.4_bb1', '9117.4_bb2', '9117.4_bb3', '9117.4_bb4', '9117.4_bb5', '9117.4_bb6']
td_tMreads = [36.0129894, 35.2337896, 36.0129894, 36.0129894, 35.8983284, 35.933862, 34.552143, 31.2682706, 34.3282984, 30.6449696, 35.8442254, 31.2345964, 30.615651, 35.3058416, 34.4731868, 34.2527702, 28.2058246, 26.5787996, 28.2058246, 28.2058246, 27.66738, 27.6666568, 27.2469874, 25.5238858, 27.2469874, 25.2340502, 26.886397, 24.8035162, 24.519662, 26.7855648, 26.4733418, 26.3827484, 39.9148934, 37.3791976, 39.9148934, 39.9148934, 38.7122998, 38.708094, 37.7905962, 34.6650054, 37.6107554, 34.1456456, 37.8858836, 33.921016, 33.4111558, 37.7734844, 36.9821596, 36.8056446, 33.4711394, 32.6004118, 33.4711394, 33.4711394, 33.04149, 33.0383456, 30.3924568, 32.5038064, 30.0542808, 32.3956334, 32.9696938, 30.3601654, 30.025172, 32.8646394, 32.4398904, 32.3335082, 31.8428354, 31.3942576, 31.8428354, 31.8428354, 31.637986, 31.6358996, 29.2361886, 31.0176556, 28.9929926, 30.9364282, 31.6358006, 29.2361776, 28.9929822, 31.5146638, 31.0175998, 30.9363772, 34.9441596, 32.3312534, 34.9441596, 34.9441596, 33.4723724, 33.4683444, 30.8153706, 32.9387816, 30.4748512, 32.8289102, 32.6738296, 30.0963168, 29.7659662, 32.561628, 32.1535464, 32.0471204]
td_Bbases = [54.379613994, 52.641056249, 54.0194841, 51.498574842, 50.780358538, 53.669962132, 51.549471661, 45.964065176, 48.624647571, 42.845175292, 53.535374086, 45.919226343, 42.80751312, 53.311820816, 51.43795566, 48.522733084, 42.590795146, 40.133987396, 42.3087369, 40.334329178, 39.44260618, 41.667343504, 40.776782835, 37.725547193, 40.776782835, 35.427126679, 40.490156646, 36.660246851, 34.422663647, 40.446202848, 39.621053097, 37.463971015, 60.271489034, 56.442588376, 59.8723401, 57.078297562, 55.202040634, 58.310667004, 56.435790641, 51.019814812, 53.310062558, 47.762707137, 57.069909524, 49.92030233, 46.728770205, 57.037961444, 55.229109813, 52.166971365, 50.541420494, 49.226621818, 50.2067091, 47.863729342, 47.116772654, 49.7656626, 44.974095956, 48.667868199, 42.241274113, 46.022382453, 49.6630349, 44.930324539, 42.203344352, 49.625605494, 48.576089344, 45.937352987, 48.082681454, 47.024019995, 47.7642531, 45.535254622, 45.103175218, 47.640851402, 43.590661125, 46.584185484, 41.025529046, 44.063678425, 47.640703902, 43.590646205, 41.025515524, 47.587142338, 46.584108342, 44.063610873, 52.765680996, 48.820192634, 52.4162394, 49.970148228, 47.71891569, 50.406042552, 45.62052235, 49.326011714, 42.846059312, 46.640424087, 49.207539094, 44.559568777, 41.852323333, 49.16805828, 48.15227162, 45.531321684]
td_bins = [78, 85, 82, 83, 85, 83, 82, 78, 79, 63, 90, 72, 67, 78, 85, 83, 65, 62, 64, 52, 55, 59, 59, 56, 54, 50, 62, 55, 50, 60, 61, 53, 69, 76, 74, 75, 75, 71, 73, 72, 68, 65, 74, 71, 66, 74, 74, 65, 95, 103, 107, 96, 97, 103, 88, 98, 79, 81, 104, 90, 76, 101, 101, 90, 99, 100, 96, 97, 94, 97, 96, 97, 95, 82, 101, 96, 80, 97, 96, 91, 70, 68, 70, 65, 68, 69, 64, 61, 65, 64, 77, 62, 64, 78, 65, 62]
td_Mean_Completeness = [53.86, 53.28, 56.31, 57.37, 54.68, 50.1, 58.05, 54.04, 51.16, 54.76, 51.12, 52.23, 57.52, 52.15, 58.83, 54.29, 55.06, 54.27, 53.37, 53.16, 53.77, 57.81, 54.55, 54.88, 54.17, 52.29, 54.94, 52.04, 51.41, 58.36, 54.69, 56.81, 56.66, 53.72, 57.38, 56.22, 52.7, 52.65, 55.11, 54.88, 57.48, 54.64, 53.6, 54.34, 58.18, 52.11, 56.26, 58.28, 55.64, 57.45, 58.55, 54.54, 57.45, 58.65, 55.18, 55.5, 58.25, 60.02, 55.82, 58.17, 60.34, 56.8, 57.0, 59.14, 54.4, 51.04, 58.28, 56.73, 48.21, 51.68, 52.66, 56.77, 55.24, 57.44, 58.56, 57.33, 54.41, 56.39, 53.78, 58.04, 57.17, 59.53, 56.07, 51.82, 58.95, 60.91, 56.23, 56.59, 60.48, 59.56, 56.87, 61.29, 54.39, 60.71, 53.56, 53.89]
td_Mean_Contamination = [64.8, 56.57, 63.75, 56.14, 63.41, 65.77, 68.71, 59.0, 51.31, 53.39, 53.35, 56.24, 59.82, 71.72, 69.63, 66.78, 53.25, 46.8, 58.27, 54.69, 48.23, 50.47, 57.38, 53.78, 50.02, 45.22, 47.86, 47.19, 54.22, 48.53, 54.71, 53.82, 83.63, 71.19, 88.26, 89.13, 73.4, 65.59, 87.02, 85.4, 79.15, 67.55, 71.37, 72.84, 82.49, 64.29, 92.78, 85.56, 107.95, 99.99, 99.71, 85.91, 99.47, 78.31, 82.71, 107.35, 97.95, 92.98, 87.49, 93.41, 102.6, 77.96, 97.57, 99.91, 83.7, 63.09, 68.23, 69.77, 73.61, 78.21, 70.56, 69.82, 69.56, 64.5, 63.16, 82.06, 72.7, 69.26, 75.71, 62.79, 62.01, 54.02, 69.1, 58.79, 55.27, 51.99, 60.61, 57.98, 59.85, 65.24, 56.41, 58.49, 58.44, 53.79, 57.06, 54.33]
td_good_bins = [21, 19, 19, 14, 15, 20, 20, 17, 17, 12, 19, 16, 12, 22, 20, 17, 16, 16, 17, 17, 15, 17, 19, 15, 15, 14, 16, 16, 15, 18, 19, 16, 18, 17, 17, 16, 16, 16, 17, 17, 14, 15, 17, 15, 15, 19, 15, 17, 22, 21, 18, 19, 21, 22, 17, 21, 21, 16, 23, 16, 17, 24, 20, 21, 19, 21, 21, 15, 18, 19, 20, 18, 15, 17, 19, 18, 18, 18, 19, 16, 17, 17, 19, 18, 18, 17, 19, 17, 19, 18, 18, 16, 18, 21, 20, 17]
td_good_Mean_Completeness = [85.98, 87.17, 86.9, 87.66, 86.64, 86.04, 85.18, 86.79, 86.35, 86.03, 88.47, 85.58, 89.46, 86.17, 83.85, 86.86, 87.87, 87.38, 87.61, 86.94, 87.35, 86.96, 88.23, 88.04, 88.62, 90.16, 88.26, 89.11, 86.1, 88.26, 87.55, 87.21, 87.51, 86.6, 87.87, 87.83, 86.62, 87.87, 87.12, 87.7, 87.06, 87.94, 87.37, 85.69, 87.28, 85.92, 88.11, 86.67, 87.68, 87.33, 87.89, 88.48, 89.21, 88.4, 86.1, 86.69, 87.99, 88.53, 89.18, 87.33, 86.83, 88.58, 87.12, 87.34, 88.54, 86.69, 87.03, 86.06, 88.99, 86.81, 86.17, 86.12, 87.78, 85.64, 86.41, 87.08, 85.98, 88.56, 87.42, 87.18, 85.99, 87.07, 86.97, 86.86, 86.75, 89.43, 86.51, 86.19, 86.17, 85.87, 86.68, 87.63, 86.26, 89.38, 87.22, 87.03]
td_good_Mean_Contamination = [5.06, 4.78, 4.28, 4.58, 3.83, 4.25, 4.66, 4.68, 4.57, 4.01, 4.01, 4.34, 4.43, 4.96, 4.23, 5.17, 3.79, 3.63, 3.76, 3.61, 3.99, 3.66, 4.04, 3.17, 3.67, 3.37, 4.3, 3.39, 3.89, 3.63, 4.05, 4.51, 3.87, 3.93, 3.27, 3.48, 4.16, 4.88, 3.89, 3.16, 4.65, 4.07, 4.41, 4.22, 4.07, 3.78, 3.85, 4.16, 3.92, 3.06, 4.2, 4.41, 3.21, 3.81, 4.31, 3.64, 3.84, 4.09, 3.84, 3.64, 3.48, 3.71, 3.4, 4.06, 3.76, 4.32, 4.61, 4.74, 3.56, 4.54, 4.1, 4.05, 4.26, 4.31, 4.84, 4.06, 3.88, 4.59, 4.63, 3.76, 3.57, 3.88, 3.56, 3.5, 3.68, 4.71, 3.46, 3.84, 3.94, 3.61, 3.5, 3.36, 3.4, 3.57, 3.91, 4.21]
r_read_files = ['9117.5_raw', '10158.8_raw', '11263.1_raw', '11306.3_raw', '11306.1_raw', '11260.6_raw', '11260.5_raw', '9108.1_raw', '9053.2_raw', '9672.8_raw', '9108.2_raw', '9053.4_raw', '9053.3_raw', '9117.4_raw', '9117.6_raw', '9117.7_raw', '9117.8_raw', '10158.6_raw', '10186.3_raw', '10186.4_raw', '7331.1_raw', '9053.5_raw', '9041.8_raw']
r_tMreads = [36.0129894, 17.6218972, 38.2800142, 34.9076424, 35.3037194, 37.1504476, 40.3613864, 20.7773948, 31.8428354, 30.1166938, 27.718318, 40.8492618, 39.7169858, 34.4581152, 26.9696492, 21.3309852, 39.9148934, 34.9441596, 35.690255, 35.5019026, 33.4711394, 28.2058246, 36.96984]
r_Bbases = [54.379613994, 26.609064772, 57.802821442, 52.710540024, 53.308616294, 56.097175876, 60.945693464, 31.373866148, 48.082681454, 45.1750407, 41.85466018, 61.682385318, 59.972648558, 52.031753952, 40.724170292, 32.209787652, 60.271489034, 52.765680996, 53.89228505, 53.607872926, 50.541420494, 42.590795146, 55.8244584]
r_bins = [65, 47, 139, 99, 55, 90, 115, 38, 86, 87, 69, 71, 95, 70, 62, 95, 65, 78, 85, 109, 45, 49, 52]
r_Mean_Completeness = [45.96, 58.83, 51.28, 56.54, 56.55, 63.26, 52.23, 58.47, 54.69, 53.7, 58.18, 60.32, 62.41, 65.52, 52.14, 50.97, 56.26, 57.0, 65.39, 54.89, 53.78, 53.56, 52.81]
r_Mean_Contamination = [24.03, 69.63, 67.13, 60.4, 53.21, 81.43, 55.06, 35.14, 54.71, 74.62, 54.7, 76.81, 66.86, 78.94, 67.85, 46.5, 92.78, 97.57, 121.14, 103.47, 75.71, 57.06, 65.71]
r_good_bins = [15, 4, 23, 19, 14, 25, 29, 9, 20, 20, 18, 18, 23, 17, 10, 22, 16, 21, 18, 20, 9, 9, 14]
r_good_Mean_Completeness = [84.79, 83.85, 89.19, 87.14, 90.06, 87.41, 90.7, 86.6, 87.55, 85.4, 87.31, 86.68, 86.18, 84.94, 87.02, 88.92, 88.11, 87.12, 87.24, 89.3, 87.42, 87.22, 86.64]
r_good_Mean_Contamination = [3.37, 4.23, 3.57, 4.4, 3.62, 4.4, 4.56, 3.0, 4.05, 4.32, 4.53, 3.9, 4.48, 4.17, 2.86, 2.87, 3.85, 3.4, 3.63, 4.37, 4.63, 3.91, 4.25]

#create dataset
df = pd.DataFrame({'td_tMreads': td_tMreads,
                   'td_Bbases': td_Bbases,
                   'td_bins': td_bins,
                   'td_Mean_Completeness': td_Mean_Completeness,
                   'td_Mean_Contamination': td_Mean_Contamination,
                   'td_good_bins': td_good_bins,
                   'td_good_Mean_Completeness': td_good_Mean_Completeness,
                   'td_good_Mean_Contamination': td_good_Mean_Contamination,
                   'Mix_Group': Mix_Group})
df.rename(columns={'td_tMreads': 'tMreads', 'td_Bbases': 'Bbases'}, inplace=True)

df2 = pd.DataFrame({'r_tMreads': r_tMreads,
                   'r_Bbases': r_Bbases,
                   'r_bins': r_bins,
                   'r_Mean_Completeness': r_Mean_Completeness,
                   'r_Mean_Contamination': r_Mean_Contamination,
                   'r_good_bins': r_good_bins,
                   'r_good_Mean_Completeness': r_good_Mean_Completeness,
                   'r_good_Mean_Contamination': r_good_Mean_Contamination})
df2.rename(columns={'r_tMreads': 'tMreads', 'r_Bbases': 'Bbases'}, inplace=True)

#view dataset
#print(df)

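#fit regression models
#mixed model: random intercept for each Mix_Group (the 16 read files that share a library ID)
model = smf.mixedlm("td_Mean_Completeness ~ Bbases", data=df, groups=df["Mix_Group"])
modelf = model.fit()
#ordinary least squares without grouping, for the trimmed/decontaminated (df) and raw (df2) read sets
model1 = ols('td_Mean_Completeness ~ Bbases', data=df).fit()
model2 = ols('r_Mean_Completeness ~ Bbases', data=df2).fit()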

#view model summary
print(modelf.summary())
print(model1.summary())
print(model2.summary())

#define figure size
fig = plt.figure(figsize=(12,8))
fig2 = plt.figure(figsize=(12,8))

#produce regression plots (each model is drawn on its own figure)
fig = sm.graphics.plot_regress_exog(model1, 'Bbases', fig=fig)
fig2 = sm.graphics.plot_regress_exog(model2, 'Bbases', fig=fig2)
              Mixed Linear Model Regression Results
==================================================================
Model:            MixedLM Dependent Variable: td_Mean_Completeness
No. Observations: 96      Method:             REML                
No. Groups:       6       Scale:              5.9308              
Min. group size:  16      Likelihood:         -227.2845           
Max. group size:  16      Converged:          Yes                 
Mean group size:  16.0                                            
--------------------------------------------------------------------
                Coef.    Std.Err.     z      P>|z|   [0.025   0.975]
--------------------------------------------------------------------
Intercept       57.331      3.383   16.945   0.000   50.700   63.962
Bbases          -0.035      0.070   -0.503   0.615   -0.173    0.102
Group Var        1.774      0.587                                   
==================================================================

                             OLS Regression Results                             
================================================================================
Dep. Variable:     td_Mean_Completeness   R-squared:                       0.000
Model:                              OLS   Adj. R-squared:                 -0.010
Method:                   Least Squares   F-statistic:                   0.02058
Date:                  Sun, 28 Mar 2021   Prob (F-statistic):              0.886
Time:                          22:28:41   Log-Likelihood:                -230.62
No. Observations:                    96   AIC:                             465.2
Df Residuals:                        94   BIC:                             470.4
Df Model:                             1                                         
Covariance Type:              nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     55.3173      2.380     23.240      0.000      50.591      60.043
Bbases         0.0071      0.050      0.143      0.886      -0.092       0.106
==============================================================================
Omnibus:                        0.847   Durbin-Watson:                   1.695
Prob(Omnibus):                  0.655   Jarque-Bera (JB):                0.917
Skew:                          -0.117   Prob(JB):                        0.632
Kurtosis:                       2.583   Cond. No.                         413.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     r_Mean_Completeness   R-squared:                       0.010
Model:                             OLS   Adj. R-squared:                 -0.037
Method:                  Least Squares   F-statistic:                    0.2185
Date:                 Sun, 28 Mar 2021   Prob (F-statistic):              0.645
Time:                         22:28:41   Log-Likelihood:                -68.302
No. Observations:                   23   AIC:                             140.6
Df Residuals:                       21   BIC:                             142.9
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     53.7971      5.438      9.893      0.000      42.489      65.105
Bbases         0.0502      0.107      0.467      0.645      -0.173       0.273
==============================================================================
Omnibus:                        0.191   Durbin-Watson:                   1.902
Prob(Omnibus):                  0.909   Jarque-Bera (JB):                0.158
Skew:                           0.158   Prob(JB):                        0.924
Kurtosis:                       2.745   Cond. No.                         268.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
<Figure size 864x576 with 0 Axes>
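The raw-read OLS fit above can be checked by hand with the closed-form formulas for a one-predictor regression. The sketch below is not part of the original narrative: it uses only the standard library, copies the r_Bbases and r_Mean_Completeness values from the cell above, and the helper name simple_ols is hypothetical. The slope, its t statistic, and R-squared should agree with the statsmodels summary to rounding.

```python
import math

# Values copied from r_Bbases and r_Mean_Completeness in the cell above.
bbases = [54.379613994, 26.609064772, 57.802821442, 52.710540024, 53.308616294,
          56.097175876, 60.945693464, 31.373866148, 48.082681454, 45.1750407,
          41.85466018, 61.682385318, 59.972648558, 52.031753952, 40.724170292,
          32.209787652, 60.271489034, 52.765680996, 53.89228505, 53.607872926,
          50.541420494, 42.590795146, 55.8244584]
completeness = [45.96, 58.83, 51.28, 56.54, 56.55, 63.26, 52.23, 58.47, 54.69,
                53.7, 58.18, 60.32, 62.41, 65.52, 52.14, 50.97, 56.26, 57.0,
                65.39, 54.89, 53.78, 53.56, 52.81]


def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    # Standard error of the slope uses n - 2 residual degrees of freedom
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return {"slope": slope, "intercept": intercept,
            "t": slope / se_slope, "r_squared": 1 - sse / sst}


fit = simple_ols(bbases, completeness)
print(fit)
```

A t statistic this close to zero on 21 residual degrees of freedom is what gives the large p-value reported for Bbases in the summary.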
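#Average MAG contamination was not correlated with read counts of trimmed and decontaminated reads in the mixed model grouped by Mix_Group (p = 0.136); the ungrouped OLS fit was significant (p < 0.001)
#Average MAG contamination was not correlated with read counts of raw reads at alpha = 0.05 (OLS, p = 0.113)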

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.formula.api import ols

#data
Mix_Group = ['10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4']
td_read_files = ['10158.6_raw', '10158.6_qc', '10158.6_trim150', '10158.6_ftrim', '10158.6_ktrim', '10158.6_atrim', '10158.6_aqbtrim', '10158.6_aqtrim', '10158.6_qbtrim', '10158.6_qtrim', '10158.6_bb1', '10158.6_bb2', '10158.6_bb3', '10158.6_bb4', '10158.6_bb5', '10158.6_bb6', '9117.8_raw', '9117.8_qc', '9117.8_trim150', '9117.8_ftrim', '9117.8_ktrim', '9117.8_atrim', '9117.8_aqbtrim', '9117.8_aqtrim', '9117.8_qbtrim', '9117.8_qtrim', '9117.8_bb1', '9117.8_bb2', '9117.8_bb3', '9117.8_bb4', '9117.8_bb5', '9117.8_bb6', '9108.2_raw', '9108.2_qc', '9108.2_trim150', '9108.2_ftrim', '9108.2_ktrim', '9108.2_atrim', '9108.2_aqbtrim', '9108.2_aqtrim', '9108.2_qbtrim', '9108.2_qtrim', '9108.2_bb1', '9108.2_bb2', '9108.2_bb3', '9108.2_bb4', '9108.2_bb5', '9108.2_bb6', '9117.7_raw', '9117.7_qc', '9117.7_trim150', '9117.7_ftrimmed', '9117.7_ktrimmed', '9117.7_atrimmed', '9117.7_aqbtrimmed', '9117.7_aqtrimmed', '9117.7_qbtrimmed', '9117.7_qtrimmed', '9117.7_bb1', '9117.7_bb2', '9117.7_bb3', '9117.7_bb4', '9117.7_bb5', '9117.7_bb6', '11306.3_raw', '11306.3_qc', '11306.3_trim150', '11306.3_ftrimmed', '11306.3_ktrimmed', '11306.3_atrimmed', '11306.3_aqbtrimmed', '11306.3_aqtrimmed', '11306.3_qbtrimmed', '11306.3_qtrimmed', '11306.3_bb1', '11306.3_bb2', '11306.3_bb3', '11306.3_bb4', '11306.3_bb5', '11306.3_bb6', '9117.4_raw', '9117.4_qc', '9117.4_trim150', '9117.4_ftrimmed', '9117.4_ktrimmed', '9117.4_atrimmed', '9117.4_aqbtrimmed', '9117.4_aqtrimmed', '9117.4_qbtrimmed', '9117.4_qtrimmed', '9117.4_bb1', '9117.4_bb2', '9117.4_bb3', '9117.4_bb4', '9117.4_bb5', '9117.4_bb6']
td_tMreads = [36.0129894, 35.2337896, 36.0129894, 36.0129894, 35.8983284, 35.933862, 34.552143, 31.2682706, 34.3282984, 30.6449696, 35.8442254, 31.2345964, 30.615651, 35.3058416, 34.4731868, 34.2527702, 28.2058246, 26.5787996, 28.2058246, 28.2058246, 27.66738, 27.6666568, 27.2469874, 25.5238858, 27.2469874, 25.2340502, 26.886397, 24.8035162, 24.519662, 26.7855648, 26.4733418, 26.3827484, 39.9148934, 37.3791976, 39.9148934, 39.9148934, 38.7122998, 38.708094, 37.7905962, 34.6650054, 37.6107554, 34.1456456, 37.8858836, 33.921016, 33.4111558, 37.7734844, 36.9821596, 36.8056446, 33.4711394, 32.6004118, 33.4711394, 33.4711394, 33.04149, 33.0383456, 30.3924568, 32.5038064, 30.0542808, 32.3956334, 32.9696938, 30.3601654, 30.025172, 32.8646394, 32.4398904, 32.3335082, 31.8428354, 31.3942576, 31.8428354, 31.8428354, 31.637986, 31.6358996, 29.2361886, 31.0176556, 28.9929926, 30.9364282, 31.6358006, 29.2361776, 28.9929822, 31.5146638, 31.0175998, 30.9363772, 34.9441596, 32.3312534, 34.9441596, 34.9441596, 33.4723724, 33.4683444, 30.8153706, 32.9387816, 30.4748512, 32.8289102, 32.6738296, 30.0963168, 29.7659662, 32.561628, 32.1535464, 32.0471204]
td_Bbases = [54.379613994, 52.641056249, 54.0194841, 51.498574842, 50.780358538, 53.669962132, 51.549471661, 45.964065176, 48.624647571, 42.845175292, 53.535374086, 45.919226343, 42.80751312, 53.311820816, 51.43795566, 48.522733084, 42.590795146, 40.133987396, 42.3087369, 40.334329178, 39.44260618, 41.667343504, 40.776782835, 37.725547193, 40.776782835, 35.427126679, 40.490156646, 36.660246851, 34.422663647, 40.446202848, 39.621053097, 37.463971015, 60.271489034, 56.442588376, 59.8723401, 57.078297562, 55.202040634, 58.310667004, 56.435790641, 51.019814812, 53.310062558, 47.762707137, 57.069909524, 49.92030233, 46.728770205, 57.037961444, 55.229109813, 52.166971365, 50.541420494, 49.226621818, 50.2067091, 47.863729342, 47.116772654, 49.7656626, 44.974095956, 48.667868199, 42.241274113, 46.022382453, 49.6630349, 44.930324539, 42.203344352, 49.625605494, 48.576089344, 45.937352987, 48.082681454, 47.024019995, 47.7642531, 45.535254622, 45.103175218, 47.640851402, 43.590661125, 46.584185484, 41.025529046, 44.063678425, 47.640703902, 43.590646205, 41.025515524, 47.587142338, 46.584108342, 44.063610873, 52.765680996, 48.820192634, 52.4162394, 49.970148228, 47.71891569, 50.406042552, 45.62052235, 49.326011714, 42.846059312, 46.640424087, 49.207539094, 44.559568777, 41.852323333, 49.16805828, 48.15227162, 45.531321684]
td_bins = [78, 85, 82, 83, 85, 83, 82, 78, 79, 63, 90, 72, 67, 78, 85, 83, 65, 62, 64, 52, 55, 59, 59, 56, 54, 50, 62, 55, 50, 60, 61, 53, 69, 76, 74, 75, 75, 71, 73, 72, 68, 65, 74, 71, 66, 74, 74, 65, 95, 103, 107, 96, 97, 103, 88, 98, 79, 81, 104, 90, 76, 101, 101, 90, 99, 100, 96, 97, 94, 97, 96, 97, 95, 82, 101, 96, 80, 97, 96, 91, 70, 68, 70, 65, 68, 69, 64, 61, 65, 64, 77, 62, 64, 78, 65, 62]
td_Mean_Completeness = [53.86, 53.28, 56.31, 57.37, 54.68, 50.1, 58.05, 54.04, 51.16, 54.76, 51.12, 52.23, 57.52, 52.15, 58.83, 54.29, 55.06, 54.27, 53.37, 53.16, 53.77, 57.81, 54.55, 54.88, 54.17, 52.29, 54.94, 52.04, 51.41, 58.36, 54.69, 56.81, 56.66, 53.72, 57.38, 56.22, 52.7, 52.65, 55.11, 54.88, 57.48, 54.64, 53.6, 54.34, 58.18, 52.11, 56.26, 58.28, 55.64, 57.45, 58.55, 54.54, 57.45, 58.65, 55.18, 55.5, 58.25, 60.02, 55.82, 58.17, 60.34, 56.8, 57.0, 59.14, 54.4, 51.04, 58.28, 56.73, 48.21, 51.68, 52.66, 56.77, 55.24, 57.44, 58.56, 57.33, 54.41, 56.39, 53.78, 58.04, 57.17, 59.53, 56.07, 51.82, 58.95, 60.91, 56.23, 56.59, 60.48, 59.56, 56.87, 61.29, 54.39, 60.71, 53.56, 53.89]
td_Mean_Contamination = [64.8, 56.57, 63.75, 56.14, 63.41, 65.77, 68.71, 59.0, 51.31, 53.39, 53.35, 56.24, 59.82, 71.72, 69.63, 66.78, 53.25, 46.8, 58.27, 54.69, 48.23, 50.47, 57.38, 53.78, 50.02, 45.22, 47.86, 47.19, 54.22, 48.53, 54.71, 53.82, 83.63, 71.19, 88.26, 89.13, 73.4, 65.59, 87.02, 85.4, 79.15, 67.55, 71.37, 72.84, 82.49, 64.29, 92.78, 85.56, 107.95, 99.99, 99.71, 85.91, 99.47, 78.31, 82.71, 107.35, 97.95, 92.98, 87.49, 93.41, 102.6, 77.96, 97.57, 99.91, 83.7, 63.09, 68.23, 69.77, 73.61, 78.21, 70.56, 69.82, 69.56, 64.5, 63.16, 82.06, 72.7, 69.26, 75.71, 62.79, 62.01, 54.02, 69.1, 58.79, 55.27, 51.99, 60.61, 57.98, 59.85, 65.24, 56.41, 58.49, 58.44, 53.79, 57.06, 54.33]
td_good_bins = [21, 19, 19, 14, 15, 20, 20, 17, 17, 12, 19, 16, 12, 22, 20, 17, 16, 16, 17, 17, 15, 17, 19, 15, 15, 14, 16, 16, 15, 18, 19, 16, 18, 17, 17, 16, 16, 16, 17, 17, 14, 15, 17, 15, 15, 19, 15, 17, 22, 21, 18, 19, 21, 22, 17, 21, 21, 16, 23, 16, 17, 24, 20, 21, 19, 21, 21, 15, 18, 19, 20, 18, 15, 17, 19, 18, 18, 18, 19, 16, 17, 17, 19, 18, 18, 17, 19, 17, 19, 18, 18, 16, 18, 21, 20, 17]
td_good_Mean_Completeness = [85.98, 87.17, 86.9, 87.66, 86.64, 86.04, 85.18, 86.79, 86.35, 86.03, 88.47, 85.58, 89.46, 86.17, 83.85, 86.86, 87.87, 87.38, 87.61, 86.94, 87.35, 86.96, 88.23, 88.04, 88.62, 90.16, 88.26, 89.11, 86.1, 88.26, 87.55, 87.21, 87.51, 86.6, 87.87, 87.83, 86.62, 87.87, 87.12, 87.7, 87.06, 87.94, 87.37, 85.69, 87.28, 85.92, 88.11, 86.67, 87.68, 87.33, 87.89, 88.48, 89.21, 88.4, 86.1, 86.69, 87.99, 88.53, 89.18, 87.33, 86.83, 88.58, 87.12, 87.34, 88.54, 86.69, 87.03, 86.06, 88.99, 86.81, 86.17, 86.12, 87.78, 85.64, 86.41, 87.08, 85.98, 88.56, 87.42, 87.18, 85.99, 87.07, 86.97, 86.86, 86.75, 89.43, 86.51, 86.19, 86.17, 85.87, 86.68, 87.63, 86.26, 89.38, 87.22, 87.03]
td_good_Mean_Contamination = [5.06, 4.78, 4.28, 4.58, 3.83, 4.25, 4.66, 4.68, 4.57, 4.01, 4.01, 4.34, 4.43, 4.96, 4.23, 5.17, 3.79, 3.63, 3.76, 3.61, 3.99, 3.66, 4.04, 3.17, 3.67, 3.37, 4.3, 3.39, 3.89, 3.63, 4.05, 4.51, 3.87, 3.93, 3.27, 3.48, 4.16, 4.88, 3.89, 3.16, 4.65, 4.07, 4.41, 4.22, 4.07, 3.78, 3.85, 4.16, 3.92, 3.06, 4.2, 4.41, 3.21, 3.81, 4.31, 3.64, 3.84, 4.09, 3.84, 3.64, 3.48, 3.71, 3.4, 4.06, 3.76, 4.32, 4.61, 4.74, 3.56, 4.54, 4.1, 4.05, 4.26, 4.31, 4.84, 4.06, 3.88, 4.59, 4.63, 3.76, 3.57, 3.88, 3.56, 3.5, 3.68, 4.71, 3.46, 3.84, 3.94, 3.61, 3.5, 3.36, 3.4, 3.57, 3.91, 4.21]
r_read_files = ['9117.5_raw', '10158.8_raw', '11263.1_raw', '11306.3_raw', '11306.1_raw', '11260.6_raw', '11260.5_raw', '9108.1_raw', '9053.2_raw', '9672.8_raw', '9108.2_raw', '9053.4_raw', '9053.3_raw', '9117.4_raw', '9117.6_raw', '9117.7_raw', '9117.8_raw', '10158.6_raw', '10186.3_raw', '10186.4_raw', '7331.1_raw', '9053.5_raw', '9041.8_raw']
r_tMreads = [36.0129894, 17.6218972, 38.2800142, 34.9076424, 35.3037194, 37.1504476, 40.3613864, 20.7773948, 31.8428354, 30.1166938, 27.718318, 40.8492618, 39.7169858, 34.4581152, 26.9696492, 21.3309852, 39.9148934, 34.9441596, 35.690255, 35.5019026, 33.4711394, 28.2058246, 36.96984]
r_Bbases = [54.379613994, 26.609064772, 57.802821442, 52.710540024, 53.308616294, 56.097175876, 60.945693464, 31.373866148, 48.082681454, 45.1750407, 41.85466018, 61.682385318, 59.972648558, 52.031753952, 40.724170292, 32.209787652, 60.271489034, 52.765680996, 53.89228505, 53.607872926, 50.541420494, 42.590795146, 55.8244584]
r_bins = [65, 47, 139, 99, 55, 90, 115, 38, 86, 87, 69, 71, 95, 70, 62, 95, 65, 78, 85, 109, 45, 49, 52]
r_Mean_Completeness = [45.96, 58.83, 51.28, 56.54, 56.55, 63.26, 52.23, 58.47, 54.69, 53.7, 58.18, 60.32, 62.41, 65.52, 52.14, 50.97, 56.26, 57.0, 65.39, 54.89, 53.78, 53.56, 52.81]
r_Mean_Contamination = [24.03, 69.63, 67.13, 60.4, 53.21, 81.43, 55.06, 35.14, 54.71, 74.62, 54.7, 76.81, 66.86, 78.94, 67.85, 46.5, 92.78, 97.57, 121.14, 103.47, 75.71, 57.06, 65.71]
r_good_bins = [15, 4, 23, 19, 14, 25, 29, 9, 20, 20, 18, 18, 23, 17, 10, 22, 16, 21, 18, 20, 9, 9, 14]
r_good_Mean_Completeness = [84.79, 83.85, 89.19, 87.14, 90.06, 87.41, 90.7, 86.6, 87.55, 85.4, 87.31, 86.68, 86.18, 84.94, 87.02, 88.92, 88.11, 87.12, 87.24, 89.3, 87.42, 87.22, 86.64]
r_good_Mean_Contamination = [3.37, 4.23, 3.57, 4.4, 3.62, 4.4, 4.56, 3.0, 4.05, 4.32, 4.53, 3.9, 4.48, 4.17, 2.86, 2.87, 3.85, 3.4, 3.63, 4.37, 4.63, 3.91, 4.25]

#create dataset
df = pd.DataFrame({'td_tMreads': td_tMreads,
                   'td_Bbases': td_Bbases,
                   'td_bins': td_bins,
                   'td_Mean_Completeness': td_Mean_Completeness,
                   'td_Mean_Contamination': td_Mean_Contamination,
                   'td_good_bins': td_good_bins,
                   'td_good_Mean_Completeness': td_good_Mean_Completeness,
                   'td_good_Mean_Contamination': td_good_Mean_Contamination,
                   'Mix_Group': Mix_Group})
df.rename(columns={'td_tMreads': 'tMreads', 'td_Bbases': 'Bbases'}, inplace=True)

df2 = pd.DataFrame({'r_tMreads': r_tMreads,
                   'r_Bbases': r_Bbases,
                   'r_bins': r_bins,
                   'r_Mean_Completeness': r_Mean_Completeness,
                   'r_Mean_Contamination': r_Mean_Contamination,
                   'r_good_bins': r_good_bins,
                   'r_good_Mean_Completeness': r_good_Mean_Completeness,
                   'r_good_Mean_Contamination': r_good_Mean_Contamination})
df2.rename(columns={'r_tMreads': 'tMreads', 'r_Bbases': 'Bbases'}, inplace=True)

#view dataset
#print(df)

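#fit regression models
#mixed model: random intercept for each Mix_Group (the 16 read files that share a library ID)
model = smf.mixedlm("td_Mean_Contamination ~ tMreads", data=df, groups=df["Mix_Group"])
modelf = model.fit()
#ordinary least squares without grouping, for the trimmed/decontaminated (df) and raw (df2) read sets
model1 = ols('td_Mean_Contamination ~ tMreads', data=df).fit()
model2 = ols('r_Mean_Contamination ~ tMreads', data=df2).fit()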

#view model summary
print(modelf.summary())
print(model1.summary())
print(model2.summary())

#define figure size
fig = plt.figure(figsize=(12,8))
fig2 = plt.figure(figsize=(12,8))

#produce regression plots (each model is drawn on its own figure)
fig = sm.graphics.plot_regress_exog(model1, 'tMreads', fig=fig)
fig2 = sm.graphics.plot_regress_exog(model2, 'tMreads', fig=fig2)
               Mixed Linear Model Regression Results
===================================================================
Model:            MixedLM Dependent Variable: td_Mean_Contamination
No. Observations: 96      Method:             REML                 
No. Groups:       6       Scale:              48.3367              
Min. group size:  16      Likelihood:         -331.4206            
Max. group size:  16      Converged:          Yes                  
Mean group size:  16.0                                             
----------------------------------------------------------------------
              Coef.     Std.Err.      z      P>|z|    [0.025    0.975]
----------------------------------------------------------------------
Intercept     47.905      15.522    3.086    0.002    17.482    78.329
tMreads        0.660       0.443    1.492    0.136    -0.207     1.528
Group Var    217.245      20.703                                      
===================================================================

                              OLS Regression Results                             
=================================================================================
Dep. Variable:     td_Mean_Contamination   R-squared:                       0.152
Model:                               OLS   Adj. R-squared:                  0.143
Method:                    Least Squares   F-statistic:                     16.82
Date:                   Sun, 28 Mar 2021   Prob (F-statistic):           8.72e-05
Time:                           22:28:56   Log-Likelihood:                -393.32
No. Observations:                     96   AIC:                             790.6
Df Residuals:                         94   BIC:                             795.8
Df Model:                              1                                         
Covariance Type:               nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     13.3043     13.717      0.970      0.335     -13.931      40.540
tMreads        1.7316      0.422      4.102      0.000       0.893       2.570
==============================================================================
Omnibus:                       13.222   Durbin-Watson:                   0.517
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               14.931
Skew:                           0.963   Prob(JB):                     0.000573
Kurtosis:                       3.151   Cond. No.                         297.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                             OLS Regression Results                             
================================================================================
Dep. Variable:     r_Mean_Contamination   R-squared:                       0.115
Model:                              OLS   Adj. R-squared:                  0.073
Method:                   Least Squares   F-statistic:                     2.733
Date:                  Sun, 28 Mar 2021   Prob (F-statistic):              0.113
Time:                          22:28:56   Log-Likelihood:                -101.59
No. Observations:                    23   AIC:                             207.2
Df Residuals:                        21   BIC:                             209.4
Df Model:                             1                                         
Covariance Type:              nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     31.1563     23.136      1.347      0.192     -16.957      79.270
tMreads        1.1395      0.689      1.653      0.113      -0.294       2.573
==============================================================================
Omnibus:                        1.877   Durbin-Watson:                   1.151
Prob(Omnibus):                  0.391   Jarque-Bera (JB):                0.610
Skew:                           0.223   Prob(JB):                        0.737
Kurtosis:                       3.662   Cond. No.                         178.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
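
The two OLS tables above can be cross-checked with plain arithmetic. With a single predictor, the F-statistic should equal the square of the slope's t-statistic, and R-squared and adjusted R-squared follow from F and the residual degrees of freedom. The sketch below uses only numbers copied from the printed summaries; the helper name `check_simple_ols` is illustrative and not part of statsmodels.

```python
# Consistency check for the two single-predictor OLS tables above.
# All inputs are copied from the printed summaries; the helper name is illustrative.

def check_simple_ols(t_slope, f_stat, n_obs):
    """Return (t squared, R-squared, adjusted R-squared) implied by the F-statistic."""
    df_resid = n_obs - 2          # one slope plus the intercept
    t_squared = t_slope ** 2      # equals F when there is a single predictor
    r2 = f_stat / (f_stat + df_resid)
    adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_resid
    return t_squared, r2, adj_r2

# Trimmed and decontaminated reads (96 observations)
td = check_simple_ols(t_slope=4.102, f_stat=16.82, n_obs=96)
# Raw reads (23 observations)
raw = check_simple_ols(t_slope=1.653, f_stat=2.733, n_obs=23)

for label, (t2, r2, adj) in (("td", td), ("raw", raw)):
    print(f"{label}: t^2={t2:.2f}  R2={r2:.3f}  adj R2={adj:.3f}")
```

Small differences from the printed values are expected because the inputs are rounded to the precision shown in the tables.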
#Average MAG contamination was not correlated with base counts of trimmed and decontaminated reads (mixed linear model grouped by Mix_Group, P = 0.211)
#Average MAG contamination was not correlated with base counts of raw reads at alpha = 0.05 (OLS, P = 0.114)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.formula.api import ols

#data
Mix_Group = ['10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4']
td_read_files = ['10158.6_raw', '10158.6_qc', '10158.6_trim150', '10158.6_ftrim', '10158.6_ktrim', '10158.6_atrim', '10158.6_aqbtrim', '10158.6_aqtrim', '10158.6_qbtrim', '10158.6_qtrim', '10158.6_bb1', '10158.6_bb2', '10158.6_bb3', '10158.6_bb4', '10158.6_bb5', '10158.6_bb6', '9117.8_raw', '9117.8_qc', '9117.8_trim150', '9117.8_ftrim', '9117.8_ktrim', '9117.8_atrim', '9117.8_aqbtrim', '9117.8_aqtrim', '9117.8_qbtrim', '9117.8_qtrim', '9117.8_bb1', '9117.8_bb2', '9117.8_bb3', '9117.8_bb4', '9117.8_bb5', '9117.8_bb6', '9108.2_raw', '9108.2_qc', '9108.2_trim150', '9108.2_ftrim', '9108.2_ktrim', '9108.2_atrim', '9108.2_aqbtrim', '9108.2_aqtrim', '9108.2_qbtrim', '9108.2_qtrim', '9108.2_bb1', '9108.2_bb2', '9108.2_bb3', '9108.2_bb4', '9108.2_bb5', '9108.2_bb6', '9117.7_raw', '9117.7_qc', '9117.7_trim150', '9117.7_ftrimmed', '9117.7_ktrimmed', '9117.7_atrimmed', '9117.7_aqbtrimmed', '9117.7_aqtrimmed', '9117.7_qbtrimmed', '9117.7_qtrimmed', '9117.7_bb1', '9117.7_bb2', '9117.7_bb3', '9117.7_bb4', '9117.7_bb5', '9117.7_bb6', '11306.3_raw', '11306.3_qc', '11306.3_trim150', '11306.3_ftrimmed', '11306.3_ktrimmed', '11306.3_atrimmed', '11306.3_aqbtrimmed', '11306.3_aqtrimmed', '11306.3_qbtrimmed', '11306.3_qtrimmed', '11306.3_bb1', '11306.3_bb2', '11306.3_bb3', '11306.3_bb4', '11306.3_bb5', '11306.3_bb6', '9117.4_raw', '9117.4_qc', '9117.4_trim150', '9117.4_ftrimmed', '9117.4_ktrimmed', '9117.4_atrimmed', '9117.4_aqbtrimmed', '9117.4_aqtrimmed', '9117.4_qbtrimmed', '9117.4_qtrimmed', '9117.4_bb1', '9117.4_bb2', '9117.4_bb3', '9117.4_bb4', '9117.4_bb5', '9117.4_bb6']
td_tMreads = [36.0129894, 35.2337896, 36.0129894, 36.0129894, 35.8983284, 35.933862, 34.552143, 31.2682706, 34.3282984, 30.6449696, 35.8442254, 31.2345964, 30.615651, 35.3058416, 34.4731868, 34.2527702, 28.2058246, 26.5787996, 28.2058246, 28.2058246, 27.66738, 27.6666568, 27.2469874, 25.5238858, 27.2469874, 25.2340502, 26.886397, 24.8035162, 24.519662, 26.7855648, 26.4733418, 26.3827484, 39.9148934, 37.3791976, 39.9148934, 39.9148934, 38.7122998, 38.708094, 37.7905962, 34.6650054, 37.6107554, 34.1456456, 37.8858836, 33.921016, 33.4111558, 37.7734844, 36.9821596, 36.8056446, 33.4711394, 32.6004118, 33.4711394, 33.4711394, 33.04149, 33.0383456, 30.3924568, 32.5038064, 30.0542808, 32.3956334, 32.9696938, 30.3601654, 30.025172, 32.8646394, 32.4398904, 32.3335082, 31.8428354, 31.3942576, 31.8428354, 31.8428354, 31.637986, 31.6358996, 29.2361886, 31.0176556, 28.9929926, 30.9364282, 31.6358006, 29.2361776, 28.9929822, 31.5146638, 31.0175998, 30.9363772, 34.9441596, 32.3312534, 34.9441596, 34.9441596, 33.4723724, 33.4683444, 30.8153706, 32.9387816, 30.4748512, 32.8289102, 32.6738296, 30.0963168, 29.7659662, 32.561628, 32.1535464, 32.0471204]
td_Bbases = [54.379613994, 52.641056249, 54.0194841, 51.498574842, 50.780358538, 53.669962132, 51.549471661, 45.964065176, 48.624647571, 42.845175292, 53.535374086, 45.919226343, 42.80751312, 53.311820816, 51.43795566, 48.522733084, 42.590795146, 40.133987396, 42.3087369, 40.334329178, 39.44260618, 41.667343504, 40.776782835, 37.725547193, 40.776782835, 35.427126679, 40.490156646, 36.660246851, 34.422663647, 40.446202848, 39.621053097, 37.463971015, 60.271489034, 56.442588376, 59.8723401, 57.078297562, 55.202040634, 58.310667004, 56.435790641, 51.019814812, 53.310062558, 47.762707137, 57.069909524, 49.92030233, 46.728770205, 57.037961444, 55.229109813, 52.166971365, 50.541420494, 49.226621818, 50.2067091, 47.863729342, 47.116772654, 49.7656626, 44.974095956, 48.667868199, 42.241274113, 46.022382453, 49.6630349, 44.930324539, 42.203344352, 49.625605494, 48.576089344, 45.937352987, 48.082681454, 47.024019995, 47.7642531, 45.535254622, 45.103175218, 47.640851402, 43.590661125, 46.584185484, 41.025529046, 44.063678425, 47.640703902, 43.590646205, 41.025515524, 47.587142338, 46.584108342, 44.063610873, 52.765680996, 48.820192634, 52.4162394, 49.970148228, 47.71891569, 50.406042552, 45.62052235, 49.326011714, 42.846059312, 46.640424087, 49.207539094, 44.559568777, 41.852323333, 49.16805828, 48.15227162, 45.531321684]
td_bins = [78, 85, 82, 83, 85, 83, 82, 78, 79, 63, 90, 72, 67, 78, 85, 83, 65, 62, 64, 52, 55, 59, 59, 56, 54, 50, 62, 55, 50, 60, 61, 53, 69, 76, 74, 75, 75, 71, 73, 72, 68, 65, 74, 71, 66, 74, 74, 65, 95, 103, 107, 96, 97, 103, 88, 98, 79, 81, 104, 90, 76, 101, 101, 90, 99, 100, 96, 97, 94, 97, 96, 97, 95, 82, 101, 96, 80, 97, 96, 91, 70, 68, 70, 65, 68, 69, 64, 61, 65, 64, 77, 62, 64, 78, 65, 62]
td_Mean_Completeness = [53.86, 53.28, 56.31, 57.37, 54.68, 50.1, 58.05, 54.04, 51.16, 54.76, 51.12, 52.23, 57.52, 52.15, 58.83, 54.29, 55.06, 54.27, 53.37, 53.16, 53.77, 57.81, 54.55, 54.88, 54.17, 52.29, 54.94, 52.04, 51.41, 58.36, 54.69, 56.81, 56.66, 53.72, 57.38, 56.22, 52.7, 52.65, 55.11, 54.88, 57.48, 54.64, 53.6, 54.34, 58.18, 52.11, 56.26, 58.28, 55.64, 57.45, 58.55, 54.54, 57.45, 58.65, 55.18, 55.5, 58.25, 60.02, 55.82, 58.17, 60.34, 56.8, 57.0, 59.14, 54.4, 51.04, 58.28, 56.73, 48.21, 51.68, 52.66, 56.77, 55.24, 57.44, 58.56, 57.33, 54.41, 56.39, 53.78, 58.04, 57.17, 59.53, 56.07, 51.82, 58.95, 60.91, 56.23, 56.59, 60.48, 59.56, 56.87, 61.29, 54.39, 60.71, 53.56, 53.89]
td_Mean_Contamination = [64.8, 56.57, 63.75, 56.14, 63.41, 65.77, 68.71, 59.0, 51.31, 53.39, 53.35, 56.24, 59.82, 71.72, 69.63, 66.78, 53.25, 46.8, 58.27, 54.69, 48.23, 50.47, 57.38, 53.78, 50.02, 45.22, 47.86, 47.19, 54.22, 48.53, 54.71, 53.82, 83.63, 71.19, 88.26, 89.13, 73.4, 65.59, 87.02, 85.4, 79.15, 67.55, 71.37, 72.84, 82.49, 64.29, 92.78, 85.56, 107.95, 99.99, 99.71, 85.91, 99.47, 78.31, 82.71, 107.35, 97.95, 92.98, 87.49, 93.41, 102.6, 77.96, 97.57, 99.91, 83.7, 63.09, 68.23, 69.77, 73.61, 78.21, 70.56, 69.82, 69.56, 64.5, 63.16, 82.06, 72.7, 69.26, 75.71, 62.79, 62.01, 54.02, 69.1, 58.79, 55.27, 51.99, 60.61, 57.98, 59.85, 65.24, 56.41, 58.49, 58.44, 53.79, 57.06, 54.33]
td_good_bins = [21, 19, 19, 14, 15, 20, 20, 17, 17, 12, 19, 16, 12, 22, 20, 17, 16, 16, 17, 17, 15, 17, 19, 15, 15, 14, 16, 16, 15, 18, 19, 16, 18, 17, 17, 16, 16, 16, 17, 17, 14, 15, 17, 15, 15, 19, 15, 17, 22, 21, 18, 19, 21, 22, 17, 21, 21, 16, 23, 16, 17, 24, 20, 21, 19, 21, 21, 15, 18, 19, 20, 18, 15, 17, 19, 18, 18, 18, 19, 16, 17, 17, 19, 18, 18, 17, 19, 17, 19, 18, 18, 16, 18, 21, 20, 17]
td_good_Mean_Completeness = [85.98, 87.17, 86.9, 87.66, 86.64, 86.04, 85.18, 86.79, 86.35, 86.03, 88.47, 85.58, 89.46, 86.17, 83.85, 86.86, 87.87, 87.38, 87.61, 86.94, 87.35, 86.96, 88.23, 88.04, 88.62, 90.16, 88.26, 89.11, 86.1, 88.26, 87.55, 87.21, 87.51, 86.6, 87.87, 87.83, 86.62, 87.87, 87.12, 87.7, 87.06, 87.94, 87.37, 85.69, 87.28, 85.92, 88.11, 86.67, 87.68, 87.33, 87.89, 88.48, 89.21, 88.4, 86.1, 86.69, 87.99, 88.53, 89.18, 87.33, 86.83, 88.58, 87.12, 87.34, 88.54, 86.69, 87.03, 86.06, 88.99, 86.81, 86.17, 86.12, 87.78, 85.64, 86.41, 87.08, 85.98, 88.56, 87.42, 87.18, 85.99, 87.07, 86.97, 86.86, 86.75, 89.43, 86.51, 86.19, 86.17, 85.87, 86.68, 87.63, 86.26, 89.38, 87.22, 87.03]
td_good_Mean_Contamination = [5.06, 4.78, 4.28, 4.58, 3.83, 4.25, 4.66, 4.68, 4.57, 4.01, 4.01, 4.34, 4.43, 4.96, 4.23, 5.17, 3.79, 3.63, 3.76, 3.61, 3.99, 3.66, 4.04, 3.17, 3.67, 3.37, 4.3, 3.39, 3.89, 3.63, 4.05, 4.51, 3.87, 3.93, 3.27, 3.48, 4.16, 4.88, 3.89, 3.16, 4.65, 4.07, 4.41, 4.22, 4.07, 3.78, 3.85, 4.16, 3.92, 3.06, 4.2, 4.41, 3.21, 3.81, 4.31, 3.64, 3.84, 4.09, 3.84, 3.64, 3.48, 3.71, 3.4, 4.06, 3.76, 4.32, 4.61, 4.74, 3.56, 4.54, 4.1, 4.05, 4.26, 4.31, 4.84, 4.06, 3.88, 4.59, 4.63, 3.76, 3.57, 3.88, 3.56, 3.5, 3.68, 4.71, 3.46, 3.84, 3.94, 3.61, 3.5, 3.36, 3.4, 3.57, 3.91, 4.21]
r_read_files = ['9117.5_raw', '10158.8_raw', '11263.1_raw', '11306.3_raw', '11306.1_raw', '11260.6_raw', '11260.5_raw', '9108.1_raw', '9053.2_raw', '9672.8_raw', '9108.2_raw', '9053.4_raw', '9053.3_raw', '9117.4_raw', '9117.6_raw', '9117.7_raw', '9117.8_raw', '10158.6_raw', '10186.3_raw', '10186.4_raw', '7331.1_raw', '9053.5_raw', '9041.8_raw']
r_tMreads = [36.0129894, 17.6218972, 38.2800142, 34.9076424, 35.3037194, 37.1504476, 40.3613864, 20.7773948, 31.8428354, 30.1166938, 27.718318, 40.8492618, 39.7169858, 34.4581152, 26.9696492, 21.3309852, 39.9148934, 34.9441596, 35.690255, 35.5019026, 33.4711394, 28.2058246, 36.96984]
r_Bbases = [54.379613994, 26.609064772, 57.802821442, 52.710540024, 53.308616294, 56.097175876, 60.945693464, 31.373866148, 48.082681454, 45.1750407, 41.85466018, 61.682385318, 59.972648558, 52.031753952, 40.724170292, 32.209787652, 60.271489034, 52.765680996, 53.89228505, 53.607872926, 50.541420494, 42.590795146, 55.8244584]
r_bins = [65, 47, 139, 99, 55, 90, 115, 38, 86, 87, 69, 71, 95, 70, 62, 95, 65, 78, 85, 109, 45, 49, 52]
r_Mean_Completeness = [45.96, 58.83, 51.28, 56.54, 56.55, 63.26, 52.23, 58.47, 54.69, 53.7, 58.18, 60.32, 62.41, 65.52, 52.14, 50.97, 56.26, 57.0, 65.39, 54.89, 53.78, 53.56, 52.81]
r_Mean_Contamination = [24.03, 69.63, 67.13, 60.4, 53.21, 81.43, 55.06, 35.14, 54.71, 74.62, 54.7, 76.81, 66.86, 78.94, 67.85, 46.5, 92.78, 97.57, 121.14, 103.47, 75.71, 57.06, 65.71]
r_good_bins = [15, 4, 23, 19, 14, 25, 29, 9, 20, 20, 18, 18, 23, 17, 10, 22, 16, 21, 18, 20, 9, 9, 14]
r_good_Mean_Completeness = [84.79, 83.85, 89.19, 87.14, 90.06, 87.41, 90.7, 86.6, 87.55, 85.4, 87.31, 86.68, 86.18, 84.94, 87.02, 88.92, 88.11, 87.12, 87.24, 89.3, 87.42, 87.22, 86.64]
r_good_Mean_Contamination = [3.37, 4.23, 3.57, 4.4, 3.62, 4.4, 4.56, 3.0, 4.05, 4.32, 4.53, 3.9, 4.48, 4.17, 2.86, 2.87, 3.85, 3.4, 3.63, 4.37, 4.63, 3.91, 4.25]

#create dataset
df = pd.DataFrame({'td_tMreads': td_tMreads,
                   'td_Bbases': td_Bbases,
                   'td_bins': td_bins,
                   'td_Mean_Completeness': td_Mean_Completeness,
                   'td_Mean_Contamination': td_Mean_Contamination,
                   'td_good_bins': td_good_bins,
                   'td_good_Mean_Completeness': td_good_Mean_Completeness,
                   'td_good_Mean_Contamination': td_good_Mean_Contamination,
                   'Mix_Group': Mix_Group})
df.rename(columns={'td_tMreads': 'tMreads', 'td_Bbases': 'Bbases'}, inplace=True)

df2 = pd.DataFrame({'r_tMreads': r_tMreads,
                   'r_Bbases': r_Bbases,
                   'r_bins': r_bins,
                   'r_Mean_Completeness': r_Mean_Completeness,
                   'r_Mean_Contamination': r_Mean_Contamination,
                   'r_good_bins': r_good_bins,
                   'r_good_Mean_Completeness': r_good_Mean_Completeness,
                   'r_good_Mean_Contamination': r_good_Mean_Contamination})
df2.rename(columns={'r_tMreads': 'tMreads', 'r_Bbases': 'Bbases'}, inplace=True)

#view dataset
#print(df)

#fit regression model
model = smf.mixedlm("td_Mean_Contamination ~ Bbases", data=df, groups=df["Mix_Group"])
modelf = model.fit()
model1 = ols('td_Mean_Contamination ~ Bbases', data=df).fit()
model2 = ols('r_Mean_Contamination ~ Bbases', data=df2).fit()

#mdf = md.fit()
#print(mdf.summary())

#view model summary
print(modelf.summary())
print(model1.summary())
print(model2.summary())

#define figure size
fig = plt.figure(figsize=(12,8))
fig2 = plt.figure(figsize=(12,8))

#produce regression plots
fig = sm.graphics.plot_regress_exog(model1, 'Bbases', fig=fig)
fig2 = sm.graphics.plot_regress_exog(model2, 'Bbases', fig=fig2)
               Mixed Linear Model Regression Results
===================================================================
Model:            MixedLM Dependent Variable: td_Mean_Contamination
No. Observations: 96      Method:             REML                 
No. Groups:       6       Scale:              48.5993              
Min. group size:  16      Likelihood:         -332.4019            
Max. group size:  16      Converged:          Yes                  
Mean group size:  16.0                                             
----------------------------------------------------------------------
              Coef.     Std.Err.      z      P>|z|    [0.025    0.975]
----------------------------------------------------------------------
Intercept     55.584      12.528    4.437    0.000    31.029    80.139
Bbases         0.288       0.230    1.251    0.211    -0.163     0.738
Group Var    224.759      21.324                                      
===================================================================

                              OLS Regression Results                             
=================================================================================
Dep. Variable:     td_Mean_Contamination   R-squared:                       0.136
Model:                               OLS   Adj. R-squared:                  0.127
Method:                    Least Squares   F-statistic:                     14.83
Date:                   Sun, 28 Mar 2021   Prob (F-statistic):           0.000215
Time:                           22:29:16   Log-Likelihood:                -394.19
No. Observations:                     96   AIC:                             792.4
Df Residuals:                         94   BIC:                             797.5
Df Model:                              1                                         
Covariance Type:               nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     19.1996     13.079      1.468      0.145      -6.770      45.169
Bbases         1.0544      0.274      3.851      0.000       0.511       1.598
==============================================================================
Omnibus:                       12.843   Durbin-Watson:                   0.551
Prob(Omnibus):                  0.002   Jarque-Bera (JB):               14.479
Skew:                           0.950   Prob(JB):                     0.000718
Kurtosis:                       3.108   Cond. No.                         413.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                             OLS Regression Results                             
================================================================================
Dep. Variable:     r_Mean_Contamination   R-squared:                       0.115
Model:                              OLS   Adj. R-squared:                  0.073
Method:                   Least Squares   F-statistic:                     2.722
Date:                  Sun, 28 Mar 2021   Prob (F-statistic):              0.114
Time:                          22:29:16   Log-Likelihood:                -101.59
No. Observations:                    23   AIC:                             207.2
Df Residuals:                        21   BIC:                             209.5
Df Model:                             1                                         
Covariance Type:              nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     31.2555     23.121      1.352      0.191     -16.827      79.338
Bbases         0.7528      0.456      1.650      0.114      -0.196       1.702
==============================================================================
Omnibus:                        1.868   Durbin-Watson:                   1.152
Prob(Omnibus):                  0.393   Jarque-Bera (JB):                0.604
Skew:                           0.222   Prob(JB):                        0.739
Kurtosis:                       3.658   Cond. No.                         268.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
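
The pooled OLS fit and the mixed model above disagree about the Bbases slope for the trimmed and decontaminated reads. One way to see why is the intraclass correlation, the share of total variance that lies between Mix_Group levels. The sketch below assumes that "Group Var" in the MixedLM summary is the random-intercept variance and "Scale" is the residual variance, both in the units of the response; the function name is illustrative.

```python
# Intraclass correlation from the MixedLM summary above.
# Assumes "Group Var" is the random-intercept variance and "Scale" the residual
# variance, both in the units of the response; the function name is illustrative.

def intraclass_correlation(group_var, residual_var):
    """Share of total variance attributable to differences between groups."""
    return group_var / (group_var + residual_var)

# Values copied from the td_Mean_Contamination ~ Bbases mixed model table
icc = intraclass_correlation(group_var=224.759, residual_var=48.5993)
print(f"ICC for td_Mean_Contamination ~ Bbases: {icc:.3f}")
```

Under these assumptions, a value near 0.8 would mean that most of the variation in mean contamination lies between libraries rather than between trimming treatments. That would be consistent with the pooled OLS slope being significant while the mixed model slope is not.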
#Good MAG counts were correlated with read counts of trimmed and decontaminated reads (mixed linear model grouped by Mix_Group, P = 0.001)
#Good MAG counts were correlated with read counts of raw reads at alpha = 0.05

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.formula.api import ols

#data
Mix_Group = ['10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4']
td_read_files = ['10158.6_raw', '10158.6_qc', '10158.6_trim150', '10158.6_ftrim', '10158.6_ktrim', '10158.6_atrim', '10158.6_aqbtrim', '10158.6_aqtrim', '10158.6_qbtrim', '10158.6_qtrim', '10158.6_bb1', '10158.6_bb2', '10158.6_bb3', '10158.6_bb4', '10158.6_bb5', '10158.6_bb6', '9117.8_raw', '9117.8_qc', '9117.8_trim150', '9117.8_ftrim', '9117.8_ktrim', '9117.8_atrim', '9117.8_aqbtrim', '9117.8_aqtrim', '9117.8_qbtrim', '9117.8_qtrim', '9117.8_bb1', '9117.8_bb2', '9117.8_bb3', '9117.8_bb4', '9117.8_bb5', '9117.8_bb6', '9108.2_raw', '9108.2_qc', '9108.2_trim150', '9108.2_ftrim', '9108.2_ktrim', '9108.2_atrim', '9108.2_aqbtrim', '9108.2_aqtrim', '9108.2_qbtrim', '9108.2_qtrim', '9108.2_bb1', '9108.2_bb2', '9108.2_bb3', '9108.2_bb4', '9108.2_bb5', '9108.2_bb6', '9117.7_raw', '9117.7_qc', '9117.7_trim150', '9117.7_ftrimmed', '9117.7_ktrimmed', '9117.7_atrimmed', '9117.7_aqbtrimmed', '9117.7_aqtrimmed', '9117.7_qbtrimmed', '9117.7_qtrimmed', '9117.7_bb1', '9117.7_bb2', '9117.7_bb3', '9117.7_bb4', '9117.7_bb5', '9117.7_bb6', '11306.3_raw', '11306.3_qc', '11306.3_trim150', '11306.3_ftrimmed', '11306.3_ktrimmed', '11306.3_atrimmed', '11306.3_aqbtrimmed', '11306.3_aqtrimmed', '11306.3_qbtrimmed', '11306.3_qtrimmed', '11306.3_bb1', '11306.3_bb2', '11306.3_bb3', '11306.3_bb4', '11306.3_bb5', '11306.3_bb6', '9117.4_raw', '9117.4_qc', '9117.4_trim150', '9117.4_ftrimmed', '9117.4_ktrimmed', '9117.4_atrimmed', '9117.4_aqbtrimmed', '9117.4_aqtrimmed', '9117.4_qbtrimmed', '9117.4_qtrimmed', '9117.4_bb1', '9117.4_bb2', '9117.4_bb3', '9117.4_bb4', '9117.4_bb5', '9117.4_bb6']
td_tMreads = [36.0129894, 35.2337896, 36.0129894, 36.0129894, 35.8983284, 35.933862, 34.552143, 31.2682706, 34.3282984, 30.6449696, 35.8442254, 31.2345964, 30.615651, 35.3058416, 34.4731868, 34.2527702, 28.2058246, 26.5787996, 28.2058246, 28.2058246, 27.66738, 27.6666568, 27.2469874, 25.5238858, 27.2469874, 25.2340502, 26.886397, 24.8035162, 24.519662, 26.7855648, 26.4733418, 26.3827484, 39.9148934, 37.3791976, 39.9148934, 39.9148934, 38.7122998, 38.708094, 37.7905962, 34.6650054, 37.6107554, 34.1456456, 37.8858836, 33.921016, 33.4111558, 37.7734844, 36.9821596, 36.8056446, 33.4711394, 32.6004118, 33.4711394, 33.4711394, 33.04149, 33.0383456, 30.3924568, 32.5038064, 30.0542808, 32.3956334, 32.9696938, 30.3601654, 30.025172, 32.8646394, 32.4398904, 32.3335082, 31.8428354, 31.3942576, 31.8428354, 31.8428354, 31.637986, 31.6358996, 29.2361886, 31.0176556, 28.9929926, 30.9364282, 31.6358006, 29.2361776, 28.9929822, 31.5146638, 31.0175998, 30.9363772, 34.9441596, 32.3312534, 34.9441596, 34.9441596, 33.4723724, 33.4683444, 30.8153706, 32.9387816, 30.4748512, 32.8289102, 32.6738296, 30.0963168, 29.7659662, 32.561628, 32.1535464, 32.0471204]
td_Bbases = [54.379613994, 52.641056249, 54.0194841, 51.498574842, 50.780358538, 53.669962132, 51.549471661, 45.964065176, 48.624647571, 42.845175292, 53.535374086, 45.919226343, 42.80751312, 53.311820816, 51.43795566, 48.522733084, 42.590795146, 40.133987396, 42.3087369, 40.334329178, 39.44260618, 41.667343504, 40.776782835, 37.725547193, 40.776782835, 35.427126679, 40.490156646, 36.660246851, 34.422663647, 40.446202848, 39.621053097, 37.463971015, 60.271489034, 56.442588376, 59.8723401, 57.078297562, 55.202040634, 58.310667004, 56.435790641, 51.019814812, 53.310062558, 47.762707137, 57.069909524, 49.92030233, 46.728770205, 57.037961444, 55.229109813, 52.166971365, 50.541420494, 49.226621818, 50.2067091, 47.863729342, 47.116772654, 49.7656626, 44.974095956, 48.667868199, 42.241274113, 46.022382453, 49.6630349, 44.930324539, 42.203344352, 49.625605494, 48.576089344, 45.937352987, 48.082681454, 47.024019995, 47.7642531, 45.535254622, 45.103175218, 47.640851402, 43.590661125, 46.584185484, 41.025529046, 44.063678425, 47.640703902, 43.590646205, 41.025515524, 47.587142338, 46.584108342, 44.063610873, 52.765680996, 48.820192634, 52.4162394, 49.970148228, 47.71891569, 50.406042552, 45.62052235, 49.326011714, 42.846059312, 46.640424087, 49.207539094, 44.559568777, 41.852323333, 49.16805828, 48.15227162, 45.531321684]
td_bins = [78, 85, 82, 83, 85, 83, 82, 78, 79, 63, 90, 72, 67, 78, 85, 83, 65, 62, 64, 52, 55, 59, 59, 56, 54, 50, 62, 55, 50, 60, 61, 53, 69, 76, 74, 75, 75, 71, 73, 72, 68, 65, 74, 71, 66, 74, 74, 65, 95, 103, 107, 96, 97, 103, 88, 98, 79, 81, 104, 90, 76, 101, 101, 90, 99, 100, 96, 97, 94, 97, 96, 97, 95, 82, 101, 96, 80, 97, 96, 91, 70, 68, 70, 65, 68, 69, 64, 61, 65, 64, 77, 62, 64, 78, 65, 62]
td_Mean_Completeness = [53.86, 53.28, 56.31, 57.37, 54.68, 50.1, 58.05, 54.04, 51.16, 54.76, 51.12, 52.23, 57.52, 52.15, 58.83, 54.29, 55.06, 54.27, 53.37, 53.16, 53.77, 57.81, 54.55, 54.88, 54.17, 52.29, 54.94, 52.04, 51.41, 58.36, 54.69, 56.81, 56.66, 53.72, 57.38, 56.22, 52.7, 52.65, 55.11, 54.88, 57.48, 54.64, 53.6, 54.34, 58.18, 52.11, 56.26, 58.28, 55.64, 57.45, 58.55, 54.54, 57.45, 58.65, 55.18, 55.5, 58.25, 60.02, 55.82, 58.17, 60.34, 56.8, 57.0, 59.14, 54.4, 51.04, 58.28, 56.73, 48.21, 51.68, 52.66, 56.77, 55.24, 57.44, 58.56, 57.33, 54.41, 56.39, 53.78, 58.04, 57.17, 59.53, 56.07, 51.82, 58.95, 60.91, 56.23, 56.59, 60.48, 59.56, 56.87, 61.29, 54.39, 60.71, 53.56, 53.89]
td_Mean_Contamination = [64.8, 56.57, 63.75, 56.14, 63.41, 65.77, 68.71, 59.0, 51.31, 53.39, 53.35, 56.24, 59.82, 71.72, 69.63, 66.78, 53.25, 46.8, 58.27, 54.69, 48.23, 50.47, 57.38, 53.78, 50.02, 45.22, 47.86, 47.19, 54.22, 48.53, 54.71, 53.82, 83.63, 71.19, 88.26, 89.13, 73.4, 65.59, 87.02, 85.4, 79.15, 67.55, 71.37, 72.84, 82.49, 64.29, 92.78, 85.56, 107.95, 99.99, 99.71, 85.91, 99.47, 78.31, 82.71, 107.35, 97.95, 92.98, 87.49, 93.41, 102.6, 77.96, 97.57, 99.91, 83.7, 63.09, 68.23, 69.77, 73.61, 78.21, 70.56, 69.82, 69.56, 64.5, 63.16, 82.06, 72.7, 69.26, 75.71, 62.79, 62.01, 54.02, 69.1, 58.79, 55.27, 51.99, 60.61, 57.98, 59.85, 65.24, 56.41, 58.49, 58.44, 53.79, 57.06, 54.33]
td_good_bins = [21, 19, 19, 14, 15, 20, 20, 17, 17, 12, 19, 16, 12, 22, 20, 17, 16, 16, 17, 17, 15, 17, 19, 15, 15, 14, 16, 16, 15, 18, 19, 16, 18, 17, 17, 16, 16, 16, 17, 17, 14, 15, 17, 15, 15, 19, 15, 17, 22, 21, 18, 19, 21, 22, 17, 21, 21, 16, 23, 16, 17, 24, 20, 21, 19, 21, 21, 15, 18, 19, 20, 18, 15, 17, 19, 18, 18, 18, 19, 16, 17, 17, 19, 18, 18, 17, 19, 17, 19, 18, 18, 16, 18, 21, 20, 17]
td_good_Mean_Completeness = [85.98, 87.17, 86.9, 87.66, 86.64, 86.04, 85.18, 86.79, 86.35, 86.03, 88.47, 85.58, 89.46, 86.17, 83.85, 86.86, 87.87, 87.38, 87.61, 86.94, 87.35, 86.96, 88.23, 88.04, 88.62, 90.16, 88.26, 89.11, 86.1, 88.26, 87.55, 87.21, 87.51, 86.6, 87.87, 87.83, 86.62, 87.87, 87.12, 87.7, 87.06, 87.94, 87.37, 85.69, 87.28, 85.92, 88.11, 86.67, 87.68, 87.33, 87.89, 88.48, 89.21, 88.4, 86.1, 86.69, 87.99, 88.53, 89.18, 87.33, 86.83, 88.58, 87.12, 87.34, 88.54, 86.69, 87.03, 86.06, 88.99, 86.81, 86.17, 86.12, 87.78, 85.64, 86.41, 87.08, 85.98, 88.56, 87.42, 87.18, 85.99, 87.07, 86.97, 86.86, 86.75, 89.43, 86.51, 86.19, 86.17, 85.87, 86.68, 87.63, 86.26, 89.38, 87.22, 87.03]
td_good_Mean_Contamination = [5.06, 4.78, 4.28, 4.58, 3.83, 4.25, 4.66, 4.68, 4.57, 4.01, 4.01, 4.34, 4.43, 4.96, 4.23, 5.17, 3.79, 3.63, 3.76, 3.61, 3.99, 3.66, 4.04, 3.17, 3.67, 3.37, 4.3, 3.39, 3.89, 3.63, 4.05, 4.51, 3.87, 3.93, 3.27, 3.48, 4.16, 4.88, 3.89, 3.16, 4.65, 4.07, 4.41, 4.22, 4.07, 3.78, 3.85, 4.16, 3.92, 3.06, 4.2, 4.41, 3.21, 3.81, 4.31, 3.64, 3.84, 4.09, 3.84, 3.64, 3.48, 3.71, 3.4, 4.06, 3.76, 4.32, 4.61, 4.74, 3.56, 4.54, 4.1, 4.05, 4.26, 4.31, 4.84, 4.06, 3.88, 4.59, 4.63, 3.76, 3.57, 3.88, 3.56, 3.5, 3.68, 4.71, 3.46, 3.84, 3.94, 3.61, 3.5, 3.36, 3.4, 3.57, 3.91, 4.21]
r_read_files = ['9117.5_raw', '10158.8_raw', '11263.1_raw', '11306.3_raw', '11306.1_raw', '11260.6_raw', '11260.5_raw', '9108.1_raw', '9053.2_raw', '9672.8_raw', '9108.2_raw', '9053.4_raw', '9053.3_raw', '9117.4_raw', '9117.6_raw', '9117.7_raw', '9117.8_raw', '10158.6_raw', '10186.3_raw', '10186.4_raw', '7331.1_raw', '9053.5_raw', '9041.8_raw']
r_tMreads = [36.0129894, 17.6218972, 38.2800142, 34.9076424, 35.3037194, 37.1504476, 40.3613864, 20.7773948, 31.8428354, 30.1166938, 27.718318, 40.8492618, 39.7169858, 34.4581152, 26.9696492, 21.3309852, 39.9148934, 34.9441596, 35.690255, 35.5019026, 33.4711394, 28.2058246, 36.96984]
r_Bbases = [54.379613994, 26.609064772, 57.802821442, 52.710540024, 53.308616294, 56.097175876, 60.945693464, 31.373866148, 48.082681454, 45.1750407, 41.85466018, 61.682385318, 59.972648558, 52.031753952, 40.724170292, 32.209787652, 60.271489034, 52.765680996, 53.89228505, 53.607872926, 50.541420494, 42.590795146, 55.8244584]
r_bins = [65, 47, 139, 99, 55, 90, 115, 38, 86, 87, 69, 71, 95, 70, 62, 95, 65, 78, 85, 109, 45, 49, 52]
r_Mean_Completeness = [45.96, 58.83, 51.28, 56.54, 56.55, 63.26, 52.23, 58.47, 54.69, 53.7, 58.18, 60.32, 62.41, 65.52, 52.14, 50.97, 56.26, 57.0, 65.39, 54.89, 53.78, 53.56, 52.81]
r_Mean_Contamination = [24.03, 69.63, 67.13, 60.4, 53.21, 81.43, 55.06, 35.14, 54.71, 74.62, 54.7, 76.81, 66.86, 78.94, 67.85, 46.5, 92.78, 97.57, 121.14, 103.47, 75.71, 57.06, 65.71]
r_good_bins = [15, 4, 23, 19, 14, 25, 29, 9, 20, 20, 18, 18, 23, 17, 10, 22, 16, 21, 18, 20, 9, 9, 14]
r_good_Mean_Completeness = [84.79, 83.85, 89.19, 87.14, 90.06, 87.41, 90.7, 86.6, 87.55, 85.4, 87.31, 86.68, 86.18, 84.94, 87.02, 88.92, 88.11, 87.12, 87.24, 89.3, 87.42, 87.22, 86.64]
r_good_Mean_Contamination = [3.37, 4.23, 3.57, 4.4, 3.62, 4.4, 4.56, 3.0, 4.05, 4.32, 4.53, 3.9, 4.48, 4.17, 2.86, 2.87, 3.85, 3.4, 3.63, 4.37, 4.63, 3.91, 4.25]

#create dataset
df = pd.DataFrame({'td_tMreads': td_tMreads,
                   'td_Bbases': td_Bbases,
                   'td_bins': td_bins,
                   'td_Mean_Completeness': td_Mean_Completeness,
                   'td_Mean_Contamination': td_Mean_Contamination,
                   'td_good_bins': td_good_bins,
                   'td_good_Mean_Completeness': td_good_Mean_Completeness,
                   'td_good_Mean_Contamination': td_good_Mean_Contamination,
                   'Mix_Group': Mix_Group})
df.rename(columns={'td_tMreads': 'tMreads', 'td_Bbases': 'Bbases'}, inplace=True)

df2 = pd.DataFrame({'r_tMreads': r_tMreads,
                   'r_Bbases': r_Bbases,
                   'r_bins': r_bins,
                   'r_Mean_Completeness': r_Mean_Completeness,
                   'r_Mean_Contamination': r_Mean_Contamination,
                   'r_good_bins': r_good_bins,
                   'r_good_Mean_Completeness': r_good_Mean_Completeness,
                   'r_good_Mean_Contamination': r_good_Mean_Contamination})
df2.rename(columns={'r_tMreads': 'tMreads', 'r_Bbases': 'Bbases'}, inplace=True)

#view dataset
#print(df)

#fit regression model
model = smf.mixedlm("td_good_bins ~ tMreads", data=df, groups=df["Mix_Group"])
modelf = model.fit()
model1 = ols('td_good_bins ~ tMreads', data=df).fit()
model2 = ols('r_good_bins ~ tMreads', data=df2).fit()

#adj R^2 = squared Pearson product-moment correlation coefficient (r^2) adjusted for number of predictors
#... r = sqrt(0.297) 
#adjusted Pearson's r = 0.545

#mdf = md.fit()
#print(mdf.summary())

#view model summary
print(modelf.summary())
print(model1.summary())
print(model2.summary())

#define figure size
fig = plt.figure(figsize=(12,8))
fig2 = plt.figure(figsize=(12,8))

#produce regression plots
fig = sm.graphics.plot_regress_exog(model1, 'tMreads', fig=fig)
fig2 = sm.graphics.plot_regress_exog(model2, 'tMreads', fig=fig2)
          Mixed Linear Model Regression Results
==========================================================
Model:            MixedLM Dependent Variable: td_good_bins
No. Observations: 96      Method:             REML        
No. Groups:       6       Scale:              3.4594      
Min. group size:  16      Likelihood:         -204.1157   
Max. group size:  16      Converged:          Yes         
Mean group size:  16.0                                    
-----------------------------------------------------------
              Coef.  Std.Err.    z    P>|z|  [0.025  0.975]
-----------------------------------------------------------
Intercept     4.835     4.017  1.204  0.229  -3.037  12.708
tMreads       0.399     0.122  3.276  0.001   0.160   0.638
Group Var     3.774     1.555                              
==========================================================

                            OLS Regression Results                            
==============================================================================
Dep. Variable:           td_good_bins   R-squared:                       0.020
Model:                            OLS   Adj. R-squared:                  0.009
Method:                 Least Squares   F-statistic:                     1.902
Date:                Sun, 28 Mar 2021   Prob (F-statistic):              0.171
Time:                        22:29:30   Log-Likelihood:                -215.55
No. Observations:                  96   AIC:                             435.1
Df Residuals:                      94   BIC:                             440.2
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     14.7670      2.153      6.858      0.000      10.492      19.042
tMreads        0.0914      0.066      1.379      0.171      -0.040       0.223
==============================================================================
Omnibus:                        0.619   Durbin-Watson:                   1.508
Prob(Omnibus):                  0.734   Jarque-Bera (JB):                0.550
Skew:                           0.182   Prob(JB):                        0.760
Kurtosis:                       2.926   Cond. No.                         297.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:            r_good_bins   R-squared:                       0.329
Model:                            OLS   Adj. R-squared:                  0.297
Method:                 Least Squares   F-statistic:                     10.31
Date:                Sun, 28 Mar 2021   Prob (F-statistic):            0.00419
Time:                        22:29:30   Log-Likelihood:                -68.680
No. Observations:                  23   AIC:                             141.4
Df Residuals:                      21   BIC:                             143.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.3598      5.532     -0.065      0.949     -11.865      11.146
tMreads        0.5293      0.165      3.211      0.004       0.187       0.872
==============================================================================
Omnibus:                        0.823   Durbin-Watson:                   2.044
Prob(Omnibus):                  0.663   Jarque-Bera (JB):                0.827
Skew:                           0.372   Prob(JB):                        0.661
Kurtosis:                       2.444   Cond. No.                         178.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
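The adjusted r quoted in the code comments can be checked by hand from the raw-reads OLS summary above. The following is a minimal sketch, not part of the original narrative: the helper name `adjusted_r_squared` is hypothetical, and the inputs are the rounded values printed in the summary (R-squared = 0.329, 23 observations, 1 predictor).

```python
import math

def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Values copied from the r_good_bins ~ tMreads summary (R^2 is already rounded).
adj_r2 = adjusted_r_squared(r_squared=0.329, n_obs=23, n_predictors=1)

# Square root of adjusted R^2, i.e. the "adjusted Pearson's r" in the comments.
adj_r = math.sqrt(adj_r2)

print(round(adj_r2, 3), round(adj_r, 3))
```

With these inputs the sketch reproduces the reported Adj. R-squared of 0.297 and the adjusted r of 0.545.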
#Good MAG counts were correlated with base counts of trimmed and decontaminated reads
#Good MAG counts were correlated with base counts of raw reads at alpha = 0.05

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.formula.api import ols

#data
Mix_Group = ['10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4']
td_read_files = ['10158.6_raw', '10158.6_qc', '10158.6_trim150', '10158.6_ftrim', '10158.6_ktrim', '10158.6_atrim', '10158.6_aqbtrim', '10158.6_aqtrim', '10158.6_qbtrim', '10158.6_qtrim', '10158.6_bb1', '10158.6_bb2', '10158.6_bb3', '10158.6_bb4', '10158.6_bb5', '10158.6_bb6', '9117.8_raw', '9117.8_qc', '9117.8_trim150', '9117.8_ftrim', '9117.8_ktrim', '9117.8_atrim', '9117.8_aqbtrim', '9117.8_aqtrim', '9117.8_qbtrim', '9117.8_qtrim', '9117.8_bb1', '9117.8_bb2', '9117.8_bb3', '9117.8_bb4', '9117.8_bb5', '9117.8_bb6', '9108.2_raw', '9108.2_qc', '9108.2_trim150', '9108.2_ftrim', '9108.2_ktrim', '9108.2_atrim', '9108.2_aqbtrim', '9108.2_aqtrim', '9108.2_qbtrim', '9108.2_qtrim', '9108.2_bb1', '9108.2_bb2', '9108.2_bb3', '9108.2_bb4', '9108.2_bb5', '9108.2_bb6', '9117.7_raw', '9117.7_qc', '9117.7_trim150', '9117.7_ftrimmed', '9117.7_ktrimmed', '9117.7_atrimmed', '9117.7_aqbtrimmed', '9117.7_aqtrimmed', '9117.7_qbtrimmed', '9117.7_qtrimmed', '9117.7_bb1', '9117.7_bb2', '9117.7_bb3', '9117.7_bb4', '9117.7_bb5', '9117.7_bb6', '11306.3_raw', '11306.3_qc', '11306.3_trim150', '11306.3_ftrimmed', '11306.3_ktrimmed', '11306.3_atrimmed', '11306.3_aqbtrimmed', '11306.3_aqtrimmed', '11306.3_qbtrimmed', '11306.3_qtrimmed', '11306.3_bb1', '11306.3_bb2', '11306.3_bb3', '11306.3_bb4', '11306.3_bb5', '11306.3_bb6', '9117.4_raw', '9117.4_qc', '9117.4_trim150', '9117.4_ftrimmed', '9117.4_ktrimmed', '9117.4_atrimmed', '9117.4_aqbtrimmed', '9117.4_aqtrimmed', '9117.4_qbtrimmed', '9117.4_qtrimmed', '9117.4_bb1', '9117.4_bb2', '9117.4_bb3', '9117.4_bb4', '9117.4_bb5', '9117.4_bb6']
td_tMreads = [36.0129894, 35.2337896, 36.0129894, 36.0129894, 35.8983284, 35.933862, 34.552143, 31.2682706, 34.3282984, 30.6449696, 35.8442254, 31.2345964, 30.615651, 35.3058416, 34.4731868, 34.2527702, 28.2058246, 26.5787996, 28.2058246, 28.2058246, 27.66738, 27.6666568, 27.2469874, 25.5238858, 27.2469874, 25.2340502, 26.886397, 24.8035162, 24.519662, 26.7855648, 26.4733418, 26.3827484, 39.9148934, 37.3791976, 39.9148934, 39.9148934, 38.7122998, 38.708094, 37.7905962, 34.6650054, 37.6107554, 34.1456456, 37.8858836, 33.921016, 33.4111558, 37.7734844, 36.9821596, 36.8056446, 33.4711394, 32.6004118, 33.4711394, 33.4711394, 33.04149, 33.0383456, 30.3924568, 32.5038064, 30.0542808, 32.3956334, 32.9696938, 30.3601654, 30.025172, 32.8646394, 32.4398904, 32.3335082, 31.8428354, 31.3942576, 31.8428354, 31.8428354, 31.637986, 31.6358996, 29.2361886, 31.0176556, 28.9929926, 30.9364282, 31.6358006, 29.2361776, 28.9929822, 31.5146638, 31.0175998, 30.9363772, 34.9441596, 32.3312534, 34.9441596, 34.9441596, 33.4723724, 33.4683444, 30.8153706, 32.9387816, 30.4748512, 32.8289102, 32.6738296, 30.0963168, 29.7659662, 32.561628, 32.1535464, 32.0471204]
td_Bbases = [54.379613994, 52.641056249, 54.0194841, 51.498574842, 50.780358538, 53.669962132, 51.549471661, 45.964065176, 48.624647571, 42.845175292, 53.535374086, 45.919226343, 42.80751312, 53.311820816, 51.43795566, 48.522733084, 42.590795146, 40.133987396, 42.3087369, 40.334329178, 39.44260618, 41.667343504, 40.776782835, 37.725547193, 40.776782835, 35.427126679, 40.490156646, 36.660246851, 34.422663647, 40.446202848, 39.621053097, 37.463971015, 60.271489034, 56.442588376, 59.8723401, 57.078297562, 55.202040634, 58.310667004, 56.435790641, 51.019814812, 53.310062558, 47.762707137, 57.069909524, 49.92030233, 46.728770205, 57.037961444, 55.229109813, 52.166971365, 50.541420494, 49.226621818, 50.2067091, 47.863729342, 47.116772654, 49.7656626, 44.974095956, 48.667868199, 42.241274113, 46.022382453, 49.6630349, 44.930324539, 42.203344352, 49.625605494, 48.576089344, 45.937352987, 48.082681454, 47.024019995, 47.7642531, 45.535254622, 45.103175218, 47.640851402, 43.590661125, 46.584185484, 41.025529046, 44.063678425, 47.640703902, 43.590646205, 41.025515524, 47.587142338, 46.584108342, 44.063610873, 52.765680996, 48.820192634, 52.4162394, 49.970148228, 47.71891569, 50.406042552, 45.62052235, 49.326011714, 42.846059312, 46.640424087, 49.207539094, 44.559568777, 41.852323333, 49.16805828, 48.15227162, 45.531321684]
td_bins = [78, 85, 82, 83, 85, 83, 82, 78, 79, 63, 90, 72, 67, 78, 85, 83, 65, 62, 64, 52, 55, 59, 59, 56, 54, 50, 62, 55, 50, 60, 61, 53, 69, 76, 74, 75, 75, 71, 73, 72, 68, 65, 74, 71, 66, 74, 74, 65, 95, 103, 107, 96, 97, 103, 88, 98, 79, 81, 104, 90, 76, 101, 101, 90, 99, 100, 96, 97, 94, 97, 96, 97, 95, 82, 101, 96, 80, 97, 96, 91, 70, 68, 70, 65, 68, 69, 64, 61, 65, 64, 77, 62, 64, 78, 65, 62]
td_Mean_Completeness = [53.86, 53.28, 56.31, 57.37, 54.68, 50.1, 58.05, 54.04, 51.16, 54.76, 51.12, 52.23, 57.52, 52.15, 58.83, 54.29, 55.06, 54.27, 53.37, 53.16, 53.77, 57.81, 54.55, 54.88, 54.17, 52.29, 54.94, 52.04, 51.41, 58.36, 54.69, 56.81, 56.66, 53.72, 57.38, 56.22, 52.7, 52.65, 55.11, 54.88, 57.48, 54.64, 53.6, 54.34, 58.18, 52.11, 56.26, 58.28, 55.64, 57.45, 58.55, 54.54, 57.45, 58.65, 55.18, 55.5, 58.25, 60.02, 55.82, 58.17, 60.34, 56.8, 57.0, 59.14, 54.4, 51.04, 58.28, 56.73, 48.21, 51.68, 52.66, 56.77, 55.24, 57.44, 58.56, 57.33, 54.41, 56.39, 53.78, 58.04, 57.17, 59.53, 56.07, 51.82, 58.95, 60.91, 56.23, 56.59, 60.48, 59.56, 56.87, 61.29, 54.39, 60.71, 53.56, 53.89]
td_Mean_Contamination = [64.8, 56.57, 63.75, 56.14, 63.41, 65.77, 68.71, 59.0, 51.31, 53.39, 53.35, 56.24, 59.82, 71.72, 69.63, 66.78, 53.25, 46.8, 58.27, 54.69, 48.23, 50.47, 57.38, 53.78, 50.02, 45.22, 47.86, 47.19, 54.22, 48.53, 54.71, 53.82, 83.63, 71.19, 88.26, 89.13, 73.4, 65.59, 87.02, 85.4, 79.15, 67.55, 71.37, 72.84, 82.49, 64.29, 92.78, 85.56, 107.95, 99.99, 99.71, 85.91, 99.47, 78.31, 82.71, 107.35, 97.95, 92.98, 87.49, 93.41, 102.6, 77.96, 97.57, 99.91, 83.7, 63.09, 68.23, 69.77, 73.61, 78.21, 70.56, 69.82, 69.56, 64.5, 63.16, 82.06, 72.7, 69.26, 75.71, 62.79, 62.01, 54.02, 69.1, 58.79, 55.27, 51.99, 60.61, 57.98, 59.85, 65.24, 56.41, 58.49, 58.44, 53.79, 57.06, 54.33]
td_good_bins = [21, 19, 19, 14, 15, 20, 20, 17, 17, 12, 19, 16, 12, 22, 20, 17, 16, 16, 17, 17, 15, 17, 19, 15, 15, 14, 16, 16, 15, 18, 19, 16, 18, 17, 17, 16, 16, 16, 17, 17, 14, 15, 17, 15, 15, 19, 15, 17, 22, 21, 18, 19, 21, 22, 17, 21, 21, 16, 23, 16, 17, 24, 20, 21, 19, 21, 21, 15, 18, 19, 20, 18, 15, 17, 19, 18, 18, 18, 19, 16, 17, 17, 19, 18, 18, 17, 19, 17, 19, 18, 18, 16, 18, 21, 20, 17]
td_good_Mean_Completeness = [85.98, 87.17, 86.9, 87.66, 86.64, 86.04, 85.18, 86.79, 86.35, 86.03, 88.47, 85.58, 89.46, 86.17, 83.85, 86.86, 87.87, 87.38, 87.61, 86.94, 87.35, 86.96, 88.23, 88.04, 88.62, 90.16, 88.26, 89.11, 86.1, 88.26, 87.55, 87.21, 87.51, 86.6, 87.87, 87.83, 86.62, 87.87, 87.12, 87.7, 87.06, 87.94, 87.37, 85.69, 87.28, 85.92, 88.11, 86.67, 87.68, 87.33, 87.89, 88.48, 89.21, 88.4, 86.1, 86.69, 87.99, 88.53, 89.18, 87.33, 86.83, 88.58, 87.12, 87.34, 88.54, 86.69, 87.03, 86.06, 88.99, 86.81, 86.17, 86.12, 87.78, 85.64, 86.41, 87.08, 85.98, 88.56, 87.42, 87.18, 85.99, 87.07, 86.97, 86.86, 86.75, 89.43, 86.51, 86.19, 86.17, 85.87, 86.68, 87.63, 86.26, 89.38, 87.22, 87.03]
td_good_Mean_Contamination = [5.06, 4.78, 4.28, 4.58, 3.83, 4.25, 4.66, 4.68, 4.57, 4.01, 4.01, 4.34, 4.43, 4.96, 4.23, 5.17, 3.79, 3.63, 3.76, 3.61, 3.99, 3.66, 4.04, 3.17, 3.67, 3.37, 4.3, 3.39, 3.89, 3.63, 4.05, 4.51, 3.87, 3.93, 3.27, 3.48, 4.16, 4.88, 3.89, 3.16, 4.65, 4.07, 4.41, 4.22, 4.07, 3.78, 3.85, 4.16, 3.92, 3.06, 4.2, 4.41, 3.21, 3.81, 4.31, 3.64, 3.84, 4.09, 3.84, 3.64, 3.48, 3.71, 3.4, 4.06, 3.76, 4.32, 4.61, 4.74, 3.56, 4.54, 4.1, 4.05, 4.26, 4.31, 4.84, 4.06, 3.88, 4.59, 4.63, 3.76, 3.57, 3.88, 3.56, 3.5, 3.68, 4.71, 3.46, 3.84, 3.94, 3.61, 3.5, 3.36, 3.4, 3.57, 3.91, 4.21]
r_read_files = ['9117.5_raw', '10158.8_raw', '11263.1_raw', '11306.3_raw', '11306.1_raw', '11260.6_raw', '11260.5_raw', '9108.1_raw', '9053.2_raw', '9672.8_raw', '9108.2_raw', '9053.4_raw', '9053.3_raw', '9117.4_raw', '9117.6_raw', '9117.7_raw', '9117.8_raw', '10158.6_raw', '10186.3_raw', '10186.4_raw', '7331.1_raw', '9053.5_raw', '9041.8_raw']
r_tMreads = [36.0129894, 17.6218972, 38.2800142, 34.9076424, 35.3037194, 37.1504476, 40.3613864, 20.7773948, 31.8428354, 30.1166938, 27.718318, 40.8492618, 39.7169858, 34.4581152, 26.9696492, 21.3309852, 39.9148934, 34.9441596, 35.690255, 35.5019026, 33.4711394, 28.2058246, 36.96984]
r_Bbases = [54.379613994, 26.609064772, 57.802821442, 52.710540024, 53.308616294, 56.097175876, 60.945693464, 31.373866148, 48.082681454, 45.1750407, 41.85466018, 61.682385318, 59.972648558, 52.031753952, 40.724170292, 32.209787652, 60.271489034, 52.765680996, 53.89228505, 53.607872926, 50.541420494, 42.590795146, 55.8244584]
r_bins = [65, 47, 139, 99, 55, 90, 115, 38, 86, 87, 69, 71, 95, 70, 62, 95, 65, 78, 85, 109, 45, 49, 52]
r_Mean_Completeness = [45.96, 58.83, 51.28, 56.54, 56.55, 63.26, 52.23, 58.47, 54.69, 53.7, 58.18, 60.32, 62.41, 65.52, 52.14, 50.97, 56.26, 57.0, 65.39, 54.89, 53.78, 53.56, 52.81]
r_Mean_Contamination = [24.03, 69.63, 67.13, 60.4, 53.21, 81.43, 55.06, 35.14, 54.71, 74.62, 54.7, 76.81, 66.86, 78.94, 67.85, 46.5, 92.78, 97.57, 121.14, 103.47, 75.71, 57.06, 65.71]
r_good_bins = [15, 4, 23, 19, 14, 25, 29, 9, 20, 20, 18, 18, 23, 17, 10, 22, 16, 21, 18, 20, 9, 9, 14]
r_good_Mean_Completeness = [84.79, 83.85, 89.19, 87.14, 90.06, 87.41, 90.7, 86.6, 87.55, 85.4, 87.31, 86.68, 86.18, 84.94, 87.02, 88.92, 88.11, 87.12, 87.24, 89.3, 87.42, 87.22, 86.64]
r_good_Mean_Contamination = [3.37, 4.23, 3.57, 4.4, 3.62, 4.4, 4.56, 3.0, 4.05, 4.32, 4.53, 3.9, 4.48, 4.17, 2.86, 2.87, 3.85, 3.4, 3.63, 4.37, 4.63, 3.91, 4.25]

#create dataset
df = pd.DataFrame({'td_tMreads': td_tMreads,
                   'td_Bbases': td_Bbases,
                   'td_bins': td_bins,
                   'td_Mean_Completeness': td_Mean_Completeness,
                   'td_Mean_Contamination': td_Mean_Contamination,
                   'td_good_bins': td_good_bins,
                   'td_good_Mean_Completeness': td_good_Mean_Completeness,
                   'td_good_Mean_Contamination': td_good_Mean_Contamination,
                   'Mix_Group': Mix_Group})
df.rename(columns={'td_tMreads': 'tMreads', 'td_Bbases': 'Bbases'}, inplace=True)

df2 = pd.DataFrame({'r_tMreads': r_tMreads,
                   'r_Bbases': r_Bbases,
                   'r_bins': r_bins,
                   'r_Mean_Completeness': r_Mean_Completeness,
                   'r_Mean_Contamination': r_Mean_Contamination,
                   'r_good_bins': r_good_bins,
                   'r_good_Mean_Completeness': r_good_Mean_Completeness,
                   'r_good_Mean_Contamination': r_good_Mean_Contamination})
df2.rename(columns={'r_tMreads': 'tMreads', 'r_Bbases': 'Bbases'}, inplace=True)

#view dataset
#print(df)

#fit regression model
model = smf.mixedlm("td_good_bins ~ Bbases", data=df, groups=df["Mix_Group"])
modelf = model.fit()
model1 = ols('td_good_bins ~ Bbases', data=df).fit()
model2 = ols('r_good_bins ~ Bbases', data=df2).fit()

#adjusted R^2 = R^2 penalized for the number of predictors ("Adj. R-squared" in the OLS summary)
#for model2 (raw reads): adjusted R^2 = 0.296, so r = sqrt(0.296)
#adjusted Pearson's r = 0.544

#mdf = md.fit()
#print(mdf.summary())

#view model summary
print(modelf.summary())
print(model1.summary())
print(model2.summary())

#define figure size
fig = plt.figure(figsize=(12,8))
fig2 = plt.figure(figsize=(12,8))

#produce regression plots
fig = sm.graphics.plot_regress_exog(model1, 'Bbases', fig=fig)
fig2 = sm.graphics.plot_regress_exog(model2, 'Bbases', fig=fig2)
          Mixed Linear Model Regression Results
==========================================================
Model:            MixedLM Dependent Variable: td_good_bins
No. Observations: 96      Method:             REML        
No. Groups:       6       Scale:              2.9316      
Min. group size:  16      Likelihood:         -197.6067   
Max. group size:  16      Converged:          Yes         
Mean group size:  16.0                                    
-----------------------------------------------------------
              Coef.  Std.Err.    z    P>|z|  [0.025  0.975]
-----------------------------------------------------------
Intercept     3.079     2.863  1.076  0.282  -2.531   8.690
Bbases        0.309     0.058  5.359  0.000   0.196   0.421
Group Var     4.218     1.763                              
==========================================================

                            OLS Regression Results                            
==============================================================================
Dep. Variable:           td_good_bins   R-squared:                       0.059
Model:                            OLS   Adj. R-squared:                  0.049
Method:                 Least Squares   F-statistic:                     5.911
Date:                Sun, 28 Mar 2021   Prob (F-statistic):             0.0169
Time:                        22:29:45   Log-Likelihood:                -213.59
No. Observations:                  96   AIC:                             431.2
Df Residuals:                      94   BIC:                             436.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     12.9052      1.993      6.474      0.000       8.948      16.863
Bbases         0.1014      0.042      2.431      0.017       0.019       0.184
==============================================================================
Omnibus:                        0.552   Durbin-Watson:                   1.429
Prob(Omnibus):                  0.759   Jarque-Bera (JB):                0.576
Skew:                           0.175   Prob(JB):                        0.750
Kurtosis:                       2.854   Cond. No.                         413.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:            r_good_bins   R-squared:                       0.328
Model:                            OLS   Adj. R-squared:                  0.296
Method:                 Least Squares   F-statistic:                     10.26
Date:                Sun, 28 Mar 2021   Prob (F-statistic):            0.00428
Time:                        22:29:45   Log-Likelihood:                -68.700
No. Observations:                  23   AIC:                             141.4
Df Residuals:                      21   BIC:                             143.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.3125      5.533     -0.056      0.955     -11.818      11.193
Bbases         0.3497      0.109      3.203      0.004       0.123       0.577
==============================================================================
Omnibus:                        0.824   Durbin-Watson:                   2.042
Prob(Omnibus):                  0.662   Jarque-Bera (JB):                0.831
Skew:                           0.369   Prob(JB):                        0.660
Kurtosis:                       2.432   Cond. No.                         268.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
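One way to read the two MixedLM summaries is through the intraclass correlation (ICC): the share of residual variance in good-bin counts that lies between source libraries (`Mix_Group`). The sketch below is an illustration under stated assumptions rather than part of the original analysis. The helper `intraclass_correlation` is hypothetical, and the inputs are the "Group Var" and "Scale" values copied from the printed summaries.

```python
def intraclass_correlation(group_var, residual_var):
    """Share of total variance attributable to the grouping factor."""
    return group_var / (group_var + residual_var)

# "Group Var" and "Scale" copied from the two MixedLM summaries.
# On a fitted result these would presumably be read from
# modelf.cov_re.iloc[0, 0] and modelf.scale (an assumption, not shown in the source).
icc_tMreads = intraclass_correlation(group_var=3.774, residual_var=3.4594)
icc_Bbases = intraclass_correlation(group_var=4.218, residual_var=2.9316)

print(f"ICC (tMreads model): {icc_tMreads:.3f}")
print(f"ICC (Bbases model):  {icc_Bbases:.3f}")
```

Under these assumptions, roughly half or more of the variance sits between libraries. That would be consistent with the pooled OLS fits showing a weaker relationship than the mixed models, which estimate the slope within each library.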
#Medium MAG counts were correlated with read counts of trimmed and decontaminated reads
#Medium MAG counts were correlated with read counts of raw reads at alpha = 0.05

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.formula.api import ols

#data
Mix_Group = ['10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4']
td_read_files = ['10158.6_raw', '10158.6_qc', '10158.6_trim150', '10158.6_ftrim', '10158.6_ktrim', '10158.6_atrim', '10158.6_aqbtrim', '10158.6_aqtrim', '10158.6_qbtrim', '10158.6_qtrim', '10158.6_bb1', '10158.6_bb2', '10158.6_bb3', '10158.6_bb4', '10158.6_bb5', '10158.6_bb6', '9117.8_raw', '9117.8_qc', '9117.8_trim150', '9117.8_ftrim', '9117.8_ktrim', '9117.8_atrim', '9117.8_aqbtrim', '9117.8_aqtrim', '9117.8_qbtrim', '9117.8_qtrim', '9117.8_bb1', '9117.8_bb2', '9117.8_bb3', '9117.8_bb4', '9117.8_bb5', '9117.8_bb6', '9108.2_raw', '9108.2_qc', '9108.2_trim150', '9108.2_ftrim', '9108.2_ktrim', '9108.2_atrim', '9108.2_aqbtrim', '9108.2_aqtrim', '9108.2_qbtrim', '9108.2_qtrim', '9108.2_bb1', '9108.2_bb2', '9108.2_bb3', '9108.2_bb4', '9108.2_bb5', '9108.2_bb6', '9117.7_raw', '9117.7_qc', '9117.7_trim150', '9117.7_ftrimmed', '9117.7_ktrimmed', '9117.7_atrimmed', '9117.7_aqbtrimmed', '9117.7_aqtrimmed', '9117.7_qbtrimmed', '9117.7_qtrimmed', '9117.7_bb1', '9117.7_bb2', '9117.7_bb3', '9117.7_bb4', '9117.7_bb5', '9117.7_bb6', '11306.3_raw', '11306.3_qc', '11306.3_trim150', '11306.3_ftrimmed', '11306.3_ktrimmed', '11306.3_atrimmed', '11306.3_aqbtrimmed', '11306.3_aqtrimmed', '11306.3_qbtrimmed', '11306.3_qtrimmed', '11306.3_bb1', '11306.3_bb2', '11306.3_bb3', '11306.3_bb4', '11306.3_bb5', '11306.3_bb6', '9117.4_raw', '9117.4_qc', '9117.4_trim150', '9117.4_ftrimmed', '9117.4_ktrimmed', '9117.4_atrimmed', '9117.4_aqbtrimmed', '9117.4_aqtrimmed', '9117.4_qbtrimmed', '9117.4_qtrimmed', '9117.4_bb1', '9117.4_bb2', '9117.4_bb3', '9117.4_bb4', '9117.4_bb5', '9117.4_bb6']
td_tMreads = [36.0129894, 35.2337896, 36.0129894, 36.0129894, 35.8983284, 35.933862, 34.552143, 31.2682706, 34.3282984, 30.6449696, 35.8442254, 31.2345964, 30.615651, 35.3058416, 34.4731868, 34.2527702, 28.2058246, 26.5787996, 28.2058246, 28.2058246, 27.66738, 27.6666568, 27.2469874, 25.5238858, 27.2469874, 25.2340502, 26.886397, 24.8035162, 24.519662, 26.7855648, 26.4733418, 26.3827484, 39.9148934, 37.3791976, 39.9148934, 39.9148934, 38.7122998, 38.708094, 37.7905962, 34.6650054, 37.6107554, 34.1456456, 37.8858836, 33.921016, 33.4111558, 37.7734844, 36.9821596, 36.8056446, 33.4711394, 32.6004118, 33.4711394, 33.4711394, 33.04149, 33.0383456, 30.3924568, 32.5038064, 30.0542808, 32.3956334, 32.9696938, 30.3601654, 30.025172, 32.8646394, 32.4398904, 32.3335082, 31.8428354, 31.3942576, 31.8428354, 31.8428354, 31.637986, 31.6358996, 29.2361886, 31.0176556, 28.9929926, 30.9364282, 31.6358006, 29.2361776, 28.9929822, 31.5146638, 31.0175998, 30.9363772, 34.9441596, 32.3312534, 34.9441596, 34.9441596, 33.4723724, 33.4683444, 30.8153706, 32.9387816, 30.4748512, 32.8289102, 32.6738296, 30.0963168, 29.7659662, 32.561628, 32.1535464, 32.0471204]
td_Bbases = [54.379613994, 52.641056249, 54.0194841, 51.498574842, 50.780358538, 53.669962132, 51.549471661, 45.964065176, 48.624647571, 42.845175292, 53.535374086, 45.919226343, 42.80751312, 53.311820816, 51.43795566, 48.522733084, 42.590795146, 40.133987396, 42.3087369, 40.334329178, 39.44260618, 41.667343504, 40.776782835, 37.725547193, 40.776782835, 35.427126679, 40.490156646, 36.660246851, 34.422663647, 40.446202848, 39.621053097, 37.463971015, 60.271489034, 56.442588376, 59.8723401, 57.078297562, 55.202040634, 58.310667004, 56.435790641, 51.019814812, 53.310062558, 47.762707137, 57.069909524, 49.92030233, 46.728770205, 57.037961444, 55.229109813, 52.166971365, 50.541420494, 49.226621818, 50.2067091, 47.863729342, 47.116772654, 49.7656626, 44.974095956, 48.667868199, 42.241274113, 46.022382453, 49.6630349, 44.930324539, 42.203344352, 49.625605494, 48.576089344, 45.937352987, 48.082681454, 47.024019995, 47.7642531, 45.535254622, 45.103175218, 47.640851402, 43.590661125, 46.584185484, 41.025529046, 44.063678425, 47.640703902, 43.590646205, 41.025515524, 47.587142338, 46.584108342, 44.063610873, 52.765680996, 48.820192634, 52.4162394, 49.970148228, 47.71891569, 50.406042552, 45.62052235, 49.326011714, 42.846059312, 46.640424087, 49.207539094, 44.559568777, 41.852323333, 49.16805828, 48.15227162, 45.531321684]
td_bins = [78, 85, 82, 83, 85, 83, 82, 78, 79, 63, 90, 72, 67, 78, 85, 83, 65, 62, 64, 52, 55, 59, 59, 56, 54, 50, 62, 55, 50, 60, 61, 53, 69, 76, 74, 75, 75, 71, 73, 72, 68, 65, 74, 71, 66, 74, 74, 65, 95, 103, 107, 96, 97, 103, 88, 98, 79, 81, 104, 90, 76, 101, 101, 90, 99, 100, 96, 97, 94, 97, 96, 97, 95, 82, 101, 96, 80, 97, 96, 91, 70, 68, 70, 65, 68, 69, 64, 61, 65, 64, 77, 62, 64, 78, 65, 62]
td_Mean_Completeness = [53.86, 53.28, 56.31, 57.37, 54.68, 50.1, 58.05, 54.04, 51.16, 54.76, 51.12, 52.23, 57.52, 52.15, 58.83, 54.29, 55.06, 54.27, 53.37, 53.16, 53.77, 57.81, 54.55, 54.88, 54.17, 52.29, 54.94, 52.04, 51.41, 58.36, 54.69, 56.81, 56.66, 53.72, 57.38, 56.22, 52.7, 52.65, 55.11, 54.88, 57.48, 54.64, 53.6, 54.34, 58.18, 52.11, 56.26, 58.28, 55.64, 57.45, 58.55, 54.54, 57.45, 58.65, 55.18, 55.5, 58.25, 60.02, 55.82, 58.17, 60.34, 56.8, 57.0, 59.14, 54.4, 51.04, 58.28, 56.73, 48.21, 51.68, 52.66, 56.77, 55.24, 57.44, 58.56, 57.33, 54.41, 56.39, 53.78, 58.04, 57.17, 59.53, 56.07, 51.82, 58.95, 60.91, 56.23, 56.59, 60.48, 59.56, 56.87, 61.29, 54.39, 60.71, 53.56, 53.89]
td_Mean_Contamination = [64.8, 56.57, 63.75, 56.14, 63.41, 65.77, 68.71, 59.0, 51.31, 53.39, 53.35, 56.24, 59.82, 71.72, 69.63, 66.78, 53.25, 46.8, 58.27, 54.69, 48.23, 50.47, 57.38, 53.78, 50.02, 45.22, 47.86, 47.19, 54.22, 48.53, 54.71, 53.82, 83.63, 71.19, 88.26, 89.13, 73.4, 65.59, 87.02, 85.4, 79.15, 67.55, 71.37, 72.84, 82.49, 64.29, 92.78, 85.56, 107.95, 99.99, 99.71, 85.91, 99.47, 78.31, 82.71, 107.35, 97.95, 92.98, 87.49, 93.41, 102.6, 77.96, 97.57, 99.91, 83.7, 63.09, 68.23, 69.77, 73.61, 78.21, 70.56, 69.82, 69.56, 64.5, 63.16, 82.06, 72.7, 69.26, 75.71, 62.79, 62.01, 54.02, 69.1, 58.79, 55.27, 51.99, 60.61, 57.98, 59.85, 65.24, 56.41, 58.49, 58.44, 53.79, 57.06, 54.33]
td_good_bins = [21, 19, 19, 14, 15, 20, 20, 17, 17, 12, 19, 16, 12, 22, 20, 17, 16, 16, 17, 17, 15, 17, 19, 15, 15, 14, 16, 16, 15, 18, 19, 16, 18, 17, 17, 16, 16, 16, 17, 17, 14, 15, 17, 15, 15, 19, 15, 17, 22, 21, 18, 19, 21, 22, 17, 21, 21, 16, 23, 16, 17, 24, 20, 21, 19, 21, 21, 15, 18, 19, 20, 18, 15, 17, 19, 18, 18, 18, 19, 16, 17, 17, 19, 18, 18, 17, 19, 17, 19, 18, 18, 16, 18, 21, 20, 17]
td_good_Mean_Completeness = [85.98, 87.17, 86.9, 87.66, 86.64, 86.04, 85.18, 86.79, 86.35, 86.03, 88.47, 85.58, 89.46, 86.17, 83.85, 86.86, 87.87, 87.38, 87.61, 86.94, 87.35, 86.96, 88.23, 88.04, 88.62, 90.16, 88.26, 89.11, 86.1, 88.26, 87.55, 87.21, 87.51, 86.6, 87.87, 87.83, 86.62, 87.87, 87.12, 87.7, 87.06, 87.94, 87.37, 85.69, 87.28, 85.92, 88.11, 86.67, 87.68, 87.33, 87.89, 88.48, 89.21, 88.4, 86.1, 86.69, 87.99, 88.53, 89.18, 87.33, 86.83, 88.58, 87.12, 87.34, 88.54, 86.69, 87.03, 86.06, 88.99, 86.81, 86.17, 86.12, 87.78, 85.64, 86.41, 87.08, 85.98, 88.56, 87.42, 87.18, 85.99, 87.07, 86.97, 86.86, 86.75, 89.43, 86.51, 86.19, 86.17, 85.87, 86.68, 87.63, 86.26, 89.38, 87.22, 87.03]
td_good_Mean_Contamination = [5.06, 4.78, 4.28, 4.58, 3.83, 4.25, 4.66, 4.68, 4.57, 4.01, 4.01, 4.34, 4.43, 4.96, 4.23, 5.17, 3.79, 3.63, 3.76, 3.61, 3.99, 3.66, 4.04, 3.17, 3.67, 3.37, 4.3, 3.39, 3.89, 3.63, 4.05, 4.51, 3.87, 3.93, 3.27, 3.48, 4.16, 4.88, 3.89, 3.16, 4.65, 4.07, 4.41, 4.22, 4.07, 3.78, 3.85, 4.16, 3.92, 3.06, 4.2, 4.41, 3.21, 3.81, 4.31, 3.64, 3.84, 4.09, 3.84, 3.64, 3.48, 3.71, 3.4, 4.06, 3.76, 4.32, 4.61, 4.74, 3.56, 4.54, 4.1, 4.05, 4.26, 4.31, 4.84, 4.06, 3.88, 4.59, 4.63, 3.76, 3.57, 3.88, 3.56, 3.5, 3.68, 4.71, 3.46, 3.84, 3.94, 3.61, 3.5, 3.36, 3.4, 3.57, 3.91, 4.21]
td_medium_bins = [32, 33, 32, 24, 26, 35, 33, 26, 29, 18, 35, 27, 18, 33, 32, 28, 22, 20, 20, 23, 23, 21, 25, 22, 21, 22, 22, 23, 22, 22, 26, 21, 28, 27, 30, 26, 25, 27, 29, 23, 24, 21, 28, 23, 20, 26, 26, 24, 36, 34, 31, 26, 29, 36, 33, 29, 27, 21, 39, 24, 22, 37, 32, 28, 28, 28, 31, 26, 32, 30, 32, 29, 28, 29, 31, 28, 30, 32, 30, 27, 22, 23, 28, 26, 24, 23, 24, 25, 27, 23, 25, 27, 25, 25, 25, 25]
r_read_files = ['9117.5_raw', '10158.8_raw', '11263.1_raw', '11306.3_raw', '11306.1_raw', '11260.6_raw', '11260.5_raw', '9108.1_raw', '9053.2_raw', '9672.8_raw', '9108.2_raw', '9053.4_raw', '9053.3_raw', '9117.4_raw', '9117.6_raw', '9117.7_raw', '9117.8_raw', '10158.6_raw', '10186.3_raw', '10186.4_raw', '7331.1_raw', '9053.5_raw', '9041.8_raw']
r_tMreads = [36.0129894, 17.6218972, 38.2800142, 34.9076424, 35.3037194, 37.1504476, 40.3613864, 20.7773948, 31.8428354, 30.1166938, 27.718318, 40.8492618, 39.7169858, 34.4581152, 26.9696492, 21.3309852, 39.9148934, 34.9441596, 35.690255, 35.5019026, 33.4711394, 28.2058246, 36.96984]
r_Bbases = [54.379613994, 26.609064772, 57.802821442, 52.710540024, 53.308616294, 56.097175876, 60.945693464, 31.373866148, 48.082681454, 45.1750407, 41.85466018, 61.682385318, 59.972648558, 52.031753952, 40.724170292, 32.209787652, 60.271489034, 52.765680996, 53.89228505, 53.607872926, 50.541420494, 42.590795146, 55.8244584]
r_bins = [65, 47, 139, 99, 55, 90, 115, 38, 86, 87, 69, 71, 95, 70, 62, 95, 65, 78, 85, 109, 45, 49, 52]
r_Mean_Completeness = [45.96, 58.83, 51.28, 56.54, 56.55, 63.26, 52.23, 58.47, 54.69, 53.7, 58.18, 60.32, 62.41, 65.52, 52.14, 50.97, 56.26, 57.0, 65.39, 54.89, 53.78, 53.56, 52.81]
r_Mean_Contamination = [24.03, 69.63, 67.13, 60.4, 53.21, 81.43, 55.06, 35.14, 54.71, 74.62, 54.7, 76.81, 66.86, 78.94, 67.85, 46.5, 92.78, 97.57, 121.14, 103.47, 75.71, 57.06, 65.71]
r_good_bins = [15, 4, 23, 19, 14, 25, 29, 9, 20, 20, 18, 18, 23, 17, 10, 22, 16, 21, 18, 20, 9, 9, 14]
r_medium_bins = [24, 9, 37, 28, 22, 36, 41, 14, 34, 26, 28, 27, 33, 22, 16, 36, 22, 32, 28, 31, 14, 16, 20]
r_good_Mean_Completeness = [84.79, 83.85, 89.19, 87.14, 90.06, 87.41, 90.7, 86.6, 87.55, 85.4, 87.31, 86.68, 86.18, 84.94, 87.02, 88.92, 88.11, 87.12, 87.24, 89.3, 87.42, 87.22, 86.64]
r_good_Mean_Contamination = [3.37, 4.23, 3.57, 4.4, 3.62, 4.4, 4.56, 3.0, 4.05, 4.32, 4.53, 3.9, 4.48, 4.17, 2.86, 2.87, 3.85, 3.4, 3.63, 4.37, 4.63, 3.91, 4.25]


#create dataset
df = pd.DataFrame({'td_tMreads': td_tMreads,
                   'td_Bbases': td_Bbases,
                   'td_bins': td_bins,
                   'td_Mean_Completeness': td_Mean_Completeness,
                   'td_Mean_Contamination': td_Mean_Contamination,
                   'td_good_bins': td_good_bins,
                   'td_medium_bins': td_medium_bins,
                   'td_good_Mean_Completeness': td_good_Mean_Completeness,
                   'td_good_Mean_Contamination': td_good_Mean_Contamination,
                   'Mix_Group': Mix_Group})
df.rename(columns={'td_tMreads': 'tMreads', 'td_Bbases': 'Bbases'}, inplace=True)

df2 = pd.DataFrame({'r_tMreads': r_tMreads,
                   'r_Bbases': r_Bbases,
                   'r_bins': r_bins,
                   'r_Mean_Completeness': r_Mean_Completeness,
                   'r_Mean_Contamination': r_Mean_Contamination,
                   'r_good_bins': r_good_bins,
                   'r_medium_bins': r_medium_bins,
                   'r_good_Mean_Completeness': r_good_Mean_Completeness,
                   'r_good_Mean_Contamination': r_good_Mean_Contamination})
df2.rename(columns={'r_tMreads': 'tMreads', 'r_Bbases': 'Bbases'}, inplace=True)

#view dataset
#print(df)

#fit regression model
model = smf.mixedlm("td_medium_bins ~ tMreads", data=df, groups=df["Mix_Group"])
modelf = model.fit()
model1 = ols('td_medium_bins ~ tMreads', data=df).fit()
model2 = ols('r_medium_bins ~ tMreads', data=df2).fit()

#adj r^2 = Pearson product-moment correlation coefficient (r) adjusted for number of predictors 
#... r = sqrt(0.297) 
#adjusted Pearson's r = 0.545

#mdf = md.fit()
#print(mdf.summary())

#view model summary
print(modelf.summary())
print(model1.summary())
print(model2.summary())

#define figure size
fig = plt.figure(figsize=(12,8))
fig2 = plt.figure(figsize=(12,8))

#produce regression plots
fig = sm.graphics.plot_regress_exog(model1, 'tMreads', fig=fig)
fig2 = sm.graphics.plot_regress_exog(model2, 'tMreads', fig=fig2)
           Mixed Linear Model Regression Results
============================================================
Model:            MixedLM Dependent Variable: td_medium_bins
No. Observations: 96      Method:             REML          
No. Groups:       6       Scale:              10.0592       
Min. group size:  16      Likelihood:         -254.6484     
Max. group size:  16      Converged:          Yes           
Mean group size:  16.0                                      
-------------------------------------------------------------
             Coef.   Std.Err.    z     P>|z|   [0.025  0.975]
-------------------------------------------------------------
Intercept    -1.702     6.623  -0.257  0.797  -14.682  11.279
tMreads       0.883     0.200   4.422  0.000    0.492   1.275
Group Var    12.867     2.927                                
============================================================

                            OLS Regression Results                            
==============================================================================
Dep. Variable:         td_medium_bins   R-squared:                       0.119
Model:                            OLS   Adj. R-squared:                  0.110
Method:                 Least Squares   F-statistic:                     12.73
Date:                Sun, 11 Apr 2021   Prob (F-statistic):           0.000568
Time:                        12:21:36   Log-Likelihood:                -274.12
No. Observations:                  96   AIC:                             552.2
Df Residuals:                      94   BIC:                             557.4
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     12.7682      3.963      3.222      0.002       4.900      20.636
tMreads        0.4352      0.122      3.568      0.001       0.193       0.677
==============================================================================
Omnibus:                        3.754   Durbin-Watson:                   1.392
Prob(Omnibus):                  0.153   Jarque-Bera (JB):                3.763
Skew:                           0.463   Prob(JB):                        0.152
Kurtosis:                       2.711   Cond. No.                         297.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:          r_medium_bins   R-squared:                       0.243
Model:                            OLS   Adj. R-squared:                  0.207
Method:                 Least Squares   F-statistic:                     6.745
Date:                Sun, 11 Apr 2021   Prob (F-statistic):             0.0168
Time:                        12:21:36   Log-Likelihood:                -78.203
No. Observations:                  23   AIC:                             160.4
Df Residuals:                      21   BIC:                             162.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      4.5665      8.370      0.546      0.591     -12.840      21.974
tMreads        0.6476      0.249      2.597      0.017       0.129       1.166
==============================================================================
Omnibus:                        1.144   Durbin-Watson:                   2.336
Prob(Omnibus):                  0.564   Jarque-Bera (JB):                1.006
Skew:                           0.461   Prob(JB):                        0.605
Kurtosis:                       2.554   Cond. No.                         178.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
<Figure size 864x576 with 0 Axes>
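For a regression with a single predictor, R-squared equals the square of Pearson's r, so the R-squared and adjusted R-squared values printed in the OLS summaries can be related by hand. The following is a minimal sketch that uses only the rounded values from the two OLS summaries for medium bins versus tMreads (R-squared of 0.119 with 96 observations, and 0.243 with 23 observations). The helper names are illustrative and are not part of the original analysis.

```python
import math

def adjusted_r_squared(r_squared, n_obs, n_predictors=1):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

def pearson_r_from_r_squared(r_squared, slope):
    """With one predictor, |r| = sqrt(R^2) and r takes the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

# Rounded R^2 values and slopes copied from the OLS summaries above
td_adj = adjusted_r_squared(0.119, 96)    # trimmed and decontaminated reads
raw_adj = adjusted_r_squared(0.243, 23)   # raw reads
td_r = pearson_r_from_r_squared(0.119, 0.4352)
raw_r = pearson_r_from_r_squared(0.243, 0.6476)

print(round(td_adj, 3), round(raw_adj, 3))
print(round(td_r, 3), round(raw_r, 3))
```

The adjusted values recomputed this way agree with the "Adj. R-squared" entries in the summaries (0.110 and 0.207), which is a quick check that the summaries were read correctly.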
#Medium MAG counts were correlated with base counts of trimmed and decontaminated reads
#Medium MAG counts were correlated with base counts of raw reads at alpha = 0.05

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.formula.api import ols

#data
Mix_Group = ['10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '10158.6', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9117.8', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9108.2', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '9117.7', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '11306.3', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4', '9117.4']
td_read_files = ['10158.6_raw', '10158.6_qc', '10158.6_trim150', '10158.6_ftrim', '10158.6_ktrim', '10158.6_atrim', '10158.6_aqbtrim', '10158.6_aqtrim', '10158.6_qbtrim', '10158.6_qtrim', '10158.6_bb1', '10158.6_bb2', '10158.6_bb3', '10158.6_bb4', '10158.6_bb5', '10158.6_bb6', '9117.8_raw', '9117.8_qc', '9117.8_trim150', '9117.8_ftrim', '9117.8_ktrim', '9117.8_atrim', '9117.8_aqbtrim', '9117.8_aqtrim', '9117.8_qbtrim', '9117.8_qtrim', '9117.8_bb1', '9117.8_bb2', '9117.8_bb3', '9117.8_bb4', '9117.8_bb5', '9117.8_bb6', '9108.2_raw', '9108.2_qc', '9108.2_trim150', '9108.2_ftrim', '9108.2_ktrim', '9108.2_atrim', '9108.2_aqbtrim', '9108.2_aqtrim', '9108.2_qbtrim', '9108.2_qtrim', '9108.2_bb1', '9108.2_bb2', '9108.2_bb3', '9108.2_bb4', '9108.2_bb5', '9108.2_bb6', '9117.7_raw', '9117.7_qc', '9117.7_trim150', '9117.7_ftrimmed', '9117.7_ktrimmed', '9117.7_atrimmed', '9117.7_aqbtrimmed', '9117.7_aqtrimmed', '9117.7_qbtrimmed', '9117.7_qtrimmed', '9117.7_bb1', '9117.7_bb2', '9117.7_bb3', '9117.7_bb4', '9117.7_bb5', '9117.7_bb6', '11306.3_raw', '11306.3_qc', '11306.3_trim150', '11306.3_ftrimmed', '11306.3_ktrimmed', '11306.3_atrimmed', '11306.3_aqbtrimmed', '11306.3_aqtrimmed', '11306.3_qbtrimmed', '11306.3_qtrimmed', '11306.3_bb1', '11306.3_bb2', '11306.3_bb3', '11306.3_bb4', '11306.3_bb5', '11306.3_bb6', '9117.4_raw', '9117.4_qc', '9117.4_trim150', '9117.4_ftrimmed', '9117.4_ktrimmed', '9117.4_atrimmed', '9117.4_aqbtrimmed', '9117.4_aqtrimmed', '9117.4_qbtrimmed', '9117.4_qtrimmed', '9117.4_bb1', '9117.4_bb2', '9117.4_bb3', '9117.4_bb4', '9117.4_bb5', '9117.4_bb6']
td_tMreads = [36.0129894, 35.2337896, 36.0129894, 36.0129894, 35.8983284, 35.933862, 34.552143, 31.2682706, 34.3282984, 30.6449696, 35.8442254, 31.2345964, 30.615651, 35.3058416, 34.4731868, 34.2527702, 28.2058246, 26.5787996, 28.2058246, 28.2058246, 27.66738, 27.6666568, 27.2469874, 25.5238858, 27.2469874, 25.2340502, 26.886397, 24.8035162, 24.519662, 26.7855648, 26.4733418, 26.3827484, 39.9148934, 37.3791976, 39.9148934, 39.9148934, 38.7122998, 38.708094, 37.7905962, 34.6650054, 37.6107554, 34.1456456, 37.8858836, 33.921016, 33.4111558, 37.7734844, 36.9821596, 36.8056446, 33.4711394, 32.6004118, 33.4711394, 33.4711394, 33.04149, 33.0383456, 30.3924568, 32.5038064, 30.0542808, 32.3956334, 32.9696938, 30.3601654, 30.025172, 32.8646394, 32.4398904, 32.3335082, 31.8428354, 31.3942576, 31.8428354, 31.8428354, 31.637986, 31.6358996, 29.2361886, 31.0176556, 28.9929926, 30.9364282, 31.6358006, 29.2361776, 28.9929822, 31.5146638, 31.0175998, 30.9363772, 34.9441596, 32.3312534, 34.9441596, 34.9441596, 33.4723724, 33.4683444, 30.8153706, 32.9387816, 30.4748512, 32.8289102, 32.6738296, 30.0963168, 29.7659662, 32.561628, 32.1535464, 32.0471204]
td_Bbases = [54.379613994, 52.641056249, 54.0194841, 51.498574842, 50.780358538, 53.669962132, 51.549471661, 45.964065176, 48.624647571, 42.845175292, 53.535374086, 45.919226343, 42.80751312, 53.311820816, 51.43795566, 48.522733084, 42.590795146, 40.133987396, 42.3087369, 40.334329178, 39.44260618, 41.667343504, 40.776782835, 37.725547193, 40.776782835, 35.427126679, 40.490156646, 36.660246851, 34.422663647, 40.446202848, 39.621053097, 37.463971015, 60.271489034, 56.442588376, 59.8723401, 57.078297562, 55.202040634, 58.310667004, 56.435790641, 51.019814812, 53.310062558, 47.762707137, 57.069909524, 49.92030233, 46.728770205, 57.037961444, 55.229109813, 52.166971365, 50.541420494, 49.226621818, 50.2067091, 47.863729342, 47.116772654, 49.7656626, 44.974095956, 48.667868199, 42.241274113, 46.022382453, 49.6630349, 44.930324539, 42.203344352, 49.625605494, 48.576089344, 45.937352987, 48.082681454, 47.024019995, 47.7642531, 45.535254622, 45.103175218, 47.640851402, 43.590661125, 46.584185484, 41.025529046, 44.063678425, 47.640703902, 43.590646205, 41.025515524, 47.587142338, 46.584108342, 44.063610873, 52.765680996, 48.820192634, 52.4162394, 49.970148228, 47.71891569, 50.406042552, 45.62052235, 49.326011714, 42.846059312, 46.640424087, 49.207539094, 44.559568777, 41.852323333, 49.16805828, 48.15227162, 45.531321684]
td_bins = [78, 85, 82, 83, 85, 83, 82, 78, 79, 63, 90, 72, 67, 78, 85, 83, 65, 62, 64, 52, 55, 59, 59, 56, 54, 50, 62, 55, 50, 60, 61, 53, 69, 76, 74, 75, 75, 71, 73, 72, 68, 65, 74, 71, 66, 74, 74, 65, 95, 103, 107, 96, 97, 103, 88, 98, 79, 81, 104, 90, 76, 101, 101, 90, 99, 100, 96, 97, 94, 97, 96, 97, 95, 82, 101, 96, 80, 97, 96, 91, 70, 68, 70, 65, 68, 69, 64, 61, 65, 64, 77, 62, 64, 78, 65, 62]
td_Mean_Completeness = [53.86, 53.28, 56.31, 57.37, 54.68, 50.1, 58.05, 54.04, 51.16, 54.76, 51.12, 52.23, 57.52, 52.15, 58.83, 54.29, 55.06, 54.27, 53.37, 53.16, 53.77, 57.81, 54.55, 54.88, 54.17, 52.29, 54.94, 52.04, 51.41, 58.36, 54.69, 56.81, 56.66, 53.72, 57.38, 56.22, 52.7, 52.65, 55.11, 54.88, 57.48, 54.64, 53.6, 54.34, 58.18, 52.11, 56.26, 58.28, 55.64, 57.45, 58.55, 54.54, 57.45, 58.65, 55.18, 55.5, 58.25, 60.02, 55.82, 58.17, 60.34, 56.8, 57.0, 59.14, 54.4, 51.04, 58.28, 56.73, 48.21, 51.68, 52.66, 56.77, 55.24, 57.44, 58.56, 57.33, 54.41, 56.39, 53.78, 58.04, 57.17, 59.53, 56.07, 51.82, 58.95, 60.91, 56.23, 56.59, 60.48, 59.56, 56.87, 61.29, 54.39, 60.71, 53.56, 53.89]
td_Mean_Contamination = [64.8, 56.57, 63.75, 56.14, 63.41, 65.77, 68.71, 59.0, 51.31, 53.39, 53.35, 56.24, 59.82, 71.72, 69.63, 66.78, 53.25, 46.8, 58.27, 54.69, 48.23, 50.47, 57.38, 53.78, 50.02, 45.22, 47.86, 47.19, 54.22, 48.53, 54.71, 53.82, 83.63, 71.19, 88.26, 89.13, 73.4, 65.59, 87.02, 85.4, 79.15, 67.55, 71.37, 72.84, 82.49, 64.29, 92.78, 85.56, 107.95, 99.99, 99.71, 85.91, 99.47, 78.31, 82.71, 107.35, 97.95, 92.98, 87.49, 93.41, 102.6, 77.96, 97.57, 99.91, 83.7, 63.09, 68.23, 69.77, 73.61, 78.21, 70.56, 69.82, 69.56, 64.5, 63.16, 82.06, 72.7, 69.26, 75.71, 62.79, 62.01, 54.02, 69.1, 58.79, 55.27, 51.99, 60.61, 57.98, 59.85, 65.24, 56.41, 58.49, 58.44, 53.79, 57.06, 54.33]
td_good_bins = [21, 19, 19, 14, 15, 20, 20, 17, 17, 12, 19, 16, 12, 22, 20, 17, 16, 16, 17, 17, 15, 17, 19, 15, 15, 14, 16, 16, 15, 18, 19, 16, 18, 17, 17, 16, 16, 16, 17, 17, 14, 15, 17, 15, 15, 19, 15, 17, 22, 21, 18, 19, 21, 22, 17, 21, 21, 16, 23, 16, 17, 24, 20, 21, 19, 21, 21, 15, 18, 19, 20, 18, 15, 17, 19, 18, 18, 18, 19, 16, 17, 17, 19, 18, 18, 17, 19, 17, 19, 18, 18, 16, 18, 21, 20, 17]
td_good_Mean_Completeness = [85.98, 87.17, 86.9, 87.66, 86.64, 86.04, 85.18, 86.79, 86.35, 86.03, 88.47, 85.58, 89.46, 86.17, 83.85, 86.86, 87.87, 87.38, 87.61, 86.94, 87.35, 86.96, 88.23, 88.04, 88.62, 90.16, 88.26, 89.11, 86.1, 88.26, 87.55, 87.21, 87.51, 86.6, 87.87, 87.83, 86.62, 87.87, 87.12, 87.7, 87.06, 87.94, 87.37, 85.69, 87.28, 85.92, 88.11, 86.67, 87.68, 87.33, 87.89, 88.48, 89.21, 88.4, 86.1, 86.69, 87.99, 88.53, 89.18, 87.33, 86.83, 88.58, 87.12, 87.34, 88.54, 86.69, 87.03, 86.06, 88.99, 86.81, 86.17, 86.12, 87.78, 85.64, 86.41, 87.08, 85.98, 88.56, 87.42, 87.18, 85.99, 87.07, 86.97, 86.86, 86.75, 89.43, 86.51, 86.19, 86.17, 85.87, 86.68, 87.63, 86.26, 89.38, 87.22, 87.03]
td_good_Mean_Contamination = [5.06, 4.78, 4.28, 4.58, 3.83, 4.25, 4.66, 4.68, 4.57, 4.01, 4.01, 4.34, 4.43, 4.96, 4.23, 5.17, 3.79, 3.63, 3.76, 3.61, 3.99, 3.66, 4.04, 3.17, 3.67, 3.37, 4.3, 3.39, 3.89, 3.63, 4.05, 4.51, 3.87, 3.93, 3.27, 3.48, 4.16, 4.88, 3.89, 3.16, 4.65, 4.07, 4.41, 4.22, 4.07, 3.78, 3.85, 4.16, 3.92, 3.06, 4.2, 4.41, 3.21, 3.81, 4.31, 3.64, 3.84, 4.09, 3.84, 3.64, 3.48, 3.71, 3.4, 4.06, 3.76, 4.32, 4.61, 4.74, 3.56, 4.54, 4.1, 4.05, 4.26, 4.31, 4.84, 4.06, 3.88, 4.59, 4.63, 3.76, 3.57, 3.88, 3.56, 3.5, 3.68, 4.71, 3.46, 3.84, 3.94, 3.61, 3.5, 3.36, 3.4, 3.57, 3.91, 4.21]
td_medium_bins = [32, 33, 32, 24, 26, 35, 33, 26, 29, 18, 35, 27, 18, 33, 32, 28, 22, 20, 20, 23, 23, 21, 25, 22, 21, 22, 22, 23, 22, 22, 26, 21, 28, 27, 30, 26, 25, 27, 29, 23, 24, 21, 28, 23, 20, 26, 26, 24, 36, 34, 31, 26, 29, 36, 33, 29, 27, 21, 39, 24, 22, 37, 32, 28, 28, 28, 31, 26, 32, 30, 32, 29, 28, 29, 31, 28, 30, 32, 30, 27, 22, 23, 28, 26, 24, 23, 24, 25, 27, 23, 25, 27, 25, 25, 25, 25]
r_read_files = ['9117.5_raw', '10158.8_raw', '11263.1_raw', '11306.3_raw', '11306.1_raw', '11260.6_raw', '11260.5_raw', '9108.1_raw', '9053.2_raw', '9672.8_raw', '9108.2_raw', '9053.4_raw', '9053.3_raw', '9117.4_raw', '9117.6_raw', '9117.7_raw', '9117.8_raw', '10158.6_raw', '10186.3_raw', '10186.4_raw', '7331.1_raw', '9053.5_raw', '9041.8_raw']
r_tMreads = [36.0129894, 17.6218972, 38.2800142, 34.9076424, 35.3037194, 37.1504476, 40.3613864, 20.7773948, 31.8428354, 30.1166938, 27.718318, 40.8492618, 39.7169858, 34.4581152, 26.9696492, 21.3309852, 39.9148934, 34.9441596, 35.690255, 35.5019026, 33.4711394, 28.2058246, 36.96984]
r_Bbases = [54.379613994, 26.609064772, 57.802821442, 52.710540024, 53.308616294, 56.097175876, 60.945693464, 31.373866148, 48.082681454, 45.1750407, 41.85466018, 61.682385318, 59.972648558, 52.031753952, 40.724170292, 32.209787652, 60.271489034, 52.765680996, 53.89228505, 53.607872926, 50.541420494, 42.590795146, 55.8244584]
r_bins = [65, 47, 139, 99, 55, 90, 115, 38, 86, 87, 69, 71, 95, 70, 62, 95, 65, 78, 85, 109, 45, 49, 52]
r_Mean_Completeness = [45.96, 58.83, 51.28, 56.54, 56.55, 63.26, 52.23, 58.47, 54.69, 53.7, 58.18, 60.32, 62.41, 65.52, 52.14, 50.97, 56.26, 57.0, 65.39, 54.89, 53.78, 53.56, 52.81]
r_Mean_Contamination = [24.03, 69.63, 67.13, 60.4, 53.21, 81.43, 55.06, 35.14, 54.71, 74.62, 54.7, 76.81, 66.86, 78.94, 67.85, 46.5, 92.78, 97.57, 121.14, 103.47, 75.71, 57.06, 65.71]
r_good_bins = [15, 4, 23, 19, 14, 25, 29, 9, 20, 20, 18, 18, 23, 17, 10, 22, 16, 21, 18, 20, 9, 9, 14]
r_medium_bins = [24, 9, 37, 28, 22, 36, 41, 14, 34, 26, 28, 27, 33, 22, 16, 36, 22, 32, 28, 31, 14, 16, 20]
r_good_Mean_Completeness = [84.79, 83.85, 89.19, 87.14, 90.06, 87.41, 90.7, 86.6, 87.55, 85.4, 87.31, 86.68, 86.18, 84.94, 87.02, 88.92, 88.11, 87.12, 87.24, 89.3, 87.42, 87.22, 86.64]
r_good_Mean_Contamination = [3.37, 4.23, 3.57, 4.4, 3.62, 4.4, 4.56, 3.0, 4.05, 4.32, 4.53, 3.9, 4.48, 4.17, 2.86, 2.87, 3.85, 3.4, 3.63, 4.37, 4.63, 3.91, 4.25]

#create dataset
df = pd.DataFrame({'td_tMreads': td_tMreads,
                   'td_Bbases': td_Bbases,
                   'td_bins': td_bins,
                   'td_Mean_Completeness': td_Mean_Completeness,
                   'td_Mean_Contamination': td_Mean_Contamination,
                   'td_good_bins': td_good_bins,
                   'td_medium_bins': td_medium_bins,
                   'td_good_Mean_Completeness': td_good_Mean_Completeness,
                   'td_good_Mean_Contamination': td_good_Mean_Contamination,
                   'Mix_Group': Mix_Group})
df.rename(columns={'td_tMreads': 'tMreads', 'td_Bbases': 'Bbases'}, inplace=True)

df2 = pd.DataFrame({'r_tMreads': r_tMreads,
                   'r_Bbases': r_Bbases,
                   'r_bins': r_bins,
                   'r_Mean_Completeness': r_Mean_Completeness,
                   'r_Mean_Contamination': r_Mean_Contamination,
                   'r_good_bins': r_good_bins,
                   'r_medium_bins': r_medium_bins,
                   'r_good_Mean_Completeness': r_good_Mean_Completeness,
                   'r_good_Mean_Contamination': r_good_Mean_Contamination})
df2.rename(columns={'r_tMreads': 'tMreads', 'r_Bbases': 'Bbases'}, inplace=True)

#view dataset
#print(df)

#fit regression model
model = smf.mixedlm("td_medium_bins ~ Bbases", data=df, groups=df["Mix_Group"])
modelf = model.fit()
model1 = ols('td_medium_bins ~ Bbases', data=df).fit()
model2 = ols('r_medium_bins ~ Bbases', data=df2).fit()

#adj r^2 = Pearson product-moment correlation coefficient (r) adjusted for number of predictors 
#... r = sqrt(0.296) 
#adjusted Pearson's r = 0.544

#mdf = md.fit()
#print(mdf.summary())

#view model summary
print(modelf.summary())
print(model1.summary())
print(model2.summary())

#define figure size
fig = plt.figure(figsize=(12,8))
fig2 = plt.figure(figsize=(12,8))

#produce regression plots
fig = sm.graphics.plot_regress_exog(model1, 'Bbases', fig=fig)
fig2 = sm.graphics.plot_regress_exog(model2, 'Bbases', fig=fig2)
           Mixed Linear Model Regression Results
============================================================
Model:            MixedLM Dependent Variable: td_medium_bins
No. Observations: 96      Method:             REML          
No. Groups:       6       Scale:              8.4560        
Min. group size:  16      Likelihood:         -247.5058     
Max. group size:  16      Converged:          Yes           
Mean group size:  16.0                                      
-------------------------------------------------------------
             Coef.   Std.Err.    z     P>|z|   [0.025  0.975]
-------------------------------------------------------------
Intercept    -2.118     4.767  -0.444  0.657  -11.462   7.226
Bbases        0.610     0.095   6.390  0.000    0.423   0.797
Group Var    12.750     3.052                                
============================================================

                            OLS Regression Results                            
==============================================================================
Dep. Variable:         td_medium_bins   R-squared:                       0.178
Model:                            OLS   Adj. R-squared:                  0.169
Method:                 Least Squares   F-statistic:                     20.29
Date:                Sun, 11 Apr 2021   Prob (F-statistic):           1.91e-05
Time:                        18:21:14   Log-Likelihood:                -270.83
No. Observations:                  96   AIC:                             545.7
Df Residuals:                      94   BIC:                             550.8
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     10.6339      3.619      2.939      0.004       3.449      17.819
Bbases         0.3412      0.076      4.504      0.000       0.191       0.492
==============================================================================
Omnibus:                        3.934   Durbin-Watson:                   1.321
Prob(Omnibus):                  0.140   Jarque-Bera (JB):                3.838
Skew:                           0.441   Prob(JB):                        0.147
Kurtosis:                       2.575   Cond. No.                         413.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:          r_medium_bins   R-squared:                       0.243
Model:                            OLS   Adj. R-squared:                  0.207
Method:                 Least Squares   F-statistic:                     6.733
Date:                Sun, 11 Apr 2021   Prob (F-statistic):             0.0169
Time:                        18:21:14   Log-Likelihood:                -78.208
No. Observations:                  23   AIC:                             160.4
Df Residuals:                      21   BIC:                             162.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      4.5998      8.365      0.550      0.588     -12.796      21.995
Bbases         0.4283      0.165      2.595      0.017       0.085       0.772
==============================================================================
Omnibus:                        1.130   Durbin-Watson:                   2.333
Prob(Omnibus):                  0.568   Jarque-Bera (JB):                1.000
Skew:                           0.458   Prob(JB):                        0.607
Kurtosis:                       2.548   Cond. No.                         268.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
<Figure size 864x576 with 0 Axes>

DISCUSSION

In this study, we demonstrated that JGI trimming and decontamination procedures had little impact on the quantity or quality of MAGs from complex rhizosphere metagenomes, or on the functional profiling of raw and qc MAGs that were phylogenomically paired (Table 2). However, we did observe that the number of raw and qc MAGs discretely placed in species trees rose from 0 to 4 MAGs at high quality, to 5 to 7 MAGs at good quality, to 7 to 10 MAGs at medium quality as the thresholds for completeness and contamination were relaxed (Fig. 3, Supplemental_Fig.1). Phylogenomic differences between MAGs may be explained by differences in binning and assembly metrics, including the 2.0% lower average contamination of qc MAGs compared to raw MAGs, and the significantly higher total contig counts, more contigs longer than 10 kbp, and larger total lengths of raw assemblies compared to qc assemblies. Since choosing either JGI trimmed and decontaminated reads or raw reads means reporting and depositing a similar quantity and quality of MAGs, some with phylogenomic differences, researchers may choose to assemble both and retain the union of discrete and paired MAGs to increase the total number in their analysis and avoid missing functionally important community members.

We believe our methods were appropriate for the questions we were asking, but there are other ways of analyzing the data. To illustrate this point, consider that binning a single assembly generates multiple MAGs, for example 77 MAGs. Binning multiple assemblies generates multiple MAG counts (e.g. assembly01 = 77 MAGs, assembly02 = 66 MAGs, assembly03 = 91 MAGs, ... assembly24 = 73 MAGs). So, a distribution of MAG counts can be generated from a set of assemblies and then compared to a distribution of MAG counts from an alternative assembly set (e.g. raw vs qc assembly MAG counts). However, each MAG has its own completeness percentage (e.g. bin001 = 14.9%, bin002 = 8.7%, bin003 = 93.4%, ... bin077 = 23.1%), contamination percentage, and counts of single-copy and multi-copy markers, which are used to calculate the completeness and contamination percentages. Since each assembly has multiple MAGs, each assembly set contains multiple distributions for these other metrics. To evaluate differences in binning metrics other than MAG counts, we elected to average the MAGs' single-copy marker counts, multi-copy marker counts, completeness scores, and contamination scores for each assembly. The distributions used for statistical testing were therefore distributions of per-assembly averages, and the consequence is that we tested differences in the averages of averages. A possible alternative would be to build distributions by pooling all values of each MAG metric across all assemblies generated with the same trimming and decontamination procedure, disregarding the intuitive assembly-level groupings. We believe our method is more relevant to the researcher who wants to know whether their assembly, when binned, is going to have better or worse binning metrics than an assembly prepared a different way (raw vs qc).
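The two ways of building distributions described above can be contrasted with a small sketch. The completeness values for the first assembly reuse the example percentages quoted in the text. The second assembly, its values, and the function names are hypothetical and serve only to show that the two approaches weight MAGs differently when assemblies yield different numbers of bins.

```python
from statistics import mean

# Hypothetical per-MAG completeness percentages, grouped by assembly
completeness = {
    "assembly01": [14.9, 8.7, 93.4, 23.1],
    "assembly02": [50.0, 70.0],
}

def per_assembly_means(metric_by_assembly):
    """One average per assembly: the approach described in the text."""
    return [mean(values) for values in metric_by_assembly.values()]

def pooled_values(metric_by_assembly):
    """All MAG values in one list, ignoring assembly groupings."""
    return [v for values in metric_by_assembly.values() for v in values]

assembly_means = per_assembly_means(completeness)
mean_of_means = mean(assembly_means)          # each assembly weighted equally
pooled_mean = mean(pooled_values(completeness))  # each MAG weighted equally

print(assembly_means)
print(mean_of_means, pooled_mean)
```

In the per-assembly approach each assembly contributes one observation to the tested distribution, so the sample size is the number of assemblies. In the pooled approach every MAG is an observation, which enlarges the sample but treats MAGs from the same assembly as if they were independent.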

We failed to reject the null hypotheses that there were no significant differences in several key binning metrics between assemblies that were JGI trimmed and decontaminated and raw assemblies. These metrics include total counts of MAGs, and the completeness averages, single-copy marker count averages, and multi-copy marker count averages of assembly MAGs. However, our study was underpowered, comparing only 23 assembly pairs. Differences in these metrics could be found significant given a much larger sample size. Based on the small effect sizes of less than 0.1 found for the significant difference in average contamination, however, we expect that any such significant differences would have little practical effect. We calculate that an adequately powered study (power = 0.8) would need a sample size of greater than 824 assembly pairs (801 more pairs than we used) for an effect size less than or equal to 0.1 and α = 0.05. Then again, the effect size may be greater for low-quality data, and some JGI datasets are of lower quality than the ones used in this study. Therefore, in addition to sample size, future studies should consider using average Q scores as a factor or filter in experimental designs.
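The sample-size estimate above can be approximated by hand. This section does not state which test or software produced the figure of 824 pairs, so the following is only a sketch under stated assumptions: a two-sided paired comparison, a normal approximation with a small-sample correction term for the paired t-test, and an asymptotic relative efficiency of 3/π for a signed-rank test relative to the t-test under normality. The function names are illustrative.

```python
import math
from statistics import NormalDist

def pairs_for_paired_t(effect_size, alpha=0.05, power=0.8):
    """Approximate number of pairs for a two-sided paired t-test.

    Normal approximation plus a z_alpha^2 / 2 small-sample correction
    (an assumption; the source does not state the method used).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ((z_alpha + z_power) / effect_size) ** 2 + z_alpha ** 2 / 2

def pairs_for_signed_rank(effect_size, alpha=0.05, power=0.8):
    """Inflate the t-test estimate by the asymptotic relative efficiency
    of the signed-rank test under normality (3 / pi, an assumption)."""
    are = 3 / math.pi
    return pairs_for_paired_t(effect_size, alpha, power) / are

n_t = math.ceil(pairs_for_paired_t(0.1))
n_w = math.ceil(pairs_for_signed_rank(0.1))
pairs_used = 23  # assembly pairs compared in this study

print(n_t, n_w, n_w - pairs_used)
```

Under these assumptions the sketch gives 787 pairs for the t-test and 824 pairs after the rank-test correction, which is 801 more than the 23 pairs used. That the result is close to the figure reported above does not confirm that this was the calculation actually performed.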

CONCLUSIONS

Mild trimming and decontamination of metagenomic reads can change the way an investigator answers the questions "Who is there and what are they doing?" This is because some MAGs assembled with JGI trimming and decontamination are phylogenomically distinct from those assembled with raw reads. Phylogenomics informs investigators of MAG identities and functions through relatedness to other organisms, and phylogenomically distinct microbes also have differing COG, PFAM, and TIGRFAM functional profiles. Since the number of MAGs discretely placed in species trees increases when lower-quality MAGs are included, the discrepancy will be more substantial with medium-quality MAGs than with high-quality MAGs. While mild JGI trimming and decontamination can impact MAG identities and functions, it does not appear to impact how many MAGs are assembled. However, aggressive trimming should be avoided for this reason.

List of abbreviations

IMG/M = Integrated Microbial Genomes and Microbiomes

DOE = United States Department of Energy

JGI = Joint Genome Institute

KBS = Kellogg Biological Station

MAGs = metagenome-assembled genomes

PCA = principal component analysis

qc = JGI trimmed and decontaminated fastq files or reads

raw = raw fastq files or reads

DECLARATIONS

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Availability of data and material

All data and code generated and analyzed during this study are included in this published article, JGI IMG/M (Proposal ID: 1296, [5]), in the KBase narratives [34-35], and in the GitHub repository [33].

Competing interests

The authors declare that they have no competing interests.

Funding

Funding was provided by the United States Department of Energy, Award No. DE-EE0008523.

Authors' contributions

JMW was responsible for experimental design, data acquisition, wrangling, statistical analyses, creating figures and tables, depositing code and generated data into repositories, and drafting the manuscript.

AMG contributed to manuscript edits.

Acknowledgements

We acknowledge the computing resources provided on Henry2, a high-performance computing cluster operated by North Carolina State University, and acknowledge Lisa L. Lowe for her assistance with adding software packages to Henry2, which was provided through the Office of Information Technology High Performance Computing services at NC State University.

REFERENCES

1. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072-1075. doi:10.1093/bioinformatics/btt086.

2. Mikheenko A, Saveliev V, Gurevich A. MetaQUAST: evaluation of metagenome assemblies. Bioinformatics. 2016;32:1088-90. doi:10.1093/bioinformatics/btv697.

3. Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TB, et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature biotechnology. 2017;35:725-31. doi:10.1038/nbt.3893.

4. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25:1043-1055. doi:10.1101/gr.186072.114.

5. Tiedje JM. Metagenomic analysis of the rhizosphere of three biofuel crops at the KBS intensive site. United States: N. p. 2013. doi:10.25585/1488010.

6. Guo J, Cole JR, Zhang Q, Brown CT, Tiedje JM. Microbial community analysis with ribosomal gene fragments from shotgun metagenomes. Appl. Environ. Microbiol. 2016;82:157-166.

7. Bay SK, Dong X, Bradley JA, Leung PM, Grinter R, Jirapanjawat T, et al. Trace gas oxidizers are widespread and active members of soil microbial communities. Nat. Microbiology. 2021:1-11.

8. Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat. Biotechnology. 2018;36:566. doi:10.1038/nbt.4163.

9. Kluyver T, Ragan-Kelley B, Pérez F, Granger BE, Bussonnier M, Frederic J, et al. Jupyter Notebooks - a publishing format for reproducible computational workflows. ELPUB. 2016.

10. Chen IM, Chu K, Palaniappan K, Ratner A, Huang J, Huntemann M, et al. The IMG/M data management and analysis system v. 6.0: new tools and advanced capabilities. Nucleic Acids Res. 2021;49:D751-63. doi:10.1093/nar/gkaa939.

11. Mukherjee S, Stamatis D, Bertsch J, Ovchinnikova G, Sundaramurthi JC, Lee J, et al. Genomes OnLine Database (GOLD) v. 8: overview and updates. Nucleic Acids Res. 2021;49:D723-33. doi:10.1093/nar/gkaa983.

12. Bushnell B. BBTools Software Package. 2017. http://sourceforge.net/projects/bbmap. Accessed 15 Oct 2020.

13. BBDuk Guide. https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/. Accessed 15 Oct 2020.

14. SeqAnswers BBDuk. http://seqanswers.com/forums/showthread.php?t=96593&goto=nextnewest. Accessed 15 Oct 2020.

15. BioStars BBDuk 1. https://www.biostars.org/p/237714/#237745. Accessed 15 Oct 2020.

16. BioStars BBDuk 2. https://www.biostars.org/p/237931/. Accessed 15 Oct 2020.

17. Gelman A, Hill J. Data analysis using regression and multilevel/hierarchical models. Cambridge University Press. 2006.

18. Li D, Luo R, Liu CM, Leung CM, Ting HF, Sadakane K, et al. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods. 2016;102:3-11.

19. Azad A, Pavlopoulos GA, Ouzounis CA, Kyrpides NC, Buluç A. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks. Nucleic Acids Res. 2018;46:e33.

20. Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr. Protoc. Bioinform. 2020;70:e102.

21. Peng Y, Leung HC, Yiu SM, Chin FY. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012;28:1420-8. doi:10.1093/bioinformatics/bts174.

22. Whitham JM. KBase Silver Case Study: Determining Media Formulation Requirements for Isolation of Microbiome Constituents. United States: N. p. 2021. doi:10.25982/68579.143/1766297.

23. Kang DD, Li F, Kirton E, Thomas A, Egan R, An H, Wang Z. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 2019;7:e7359. doi:10.7717/peerj.7359.

24. Wu YW, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32:605-607.

25. Yue Y, Huang H, Qi Z, Dou HM, Liu XY, Han TF, et al. Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets. BMC Bioinformatics. 2020;21:334. doi:10.1186/s12859-020-03667-3.

26. Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci. Rep. 2015;5:1-6.

27. Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5. doi:10.1371/journal.pone.0009490.

28. Huerta-Cepas J, Serra F, Bork P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol Biol Evol. 2016;33:1635-8.

29. Galperin MY, Wolf YI, Makarova KS, Vera Alvarez R, Landsman D, Koonin EV. COG database update: focus on microbial diversity, model organisms, and widespread pathogens. Nucleic Acids Res. 2021;49:D274-81.

30. Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer EL, et al. Pfam: The protein families database in 2021. Nucleic Acids Research. 2021;49:D412-9.

31. Haft DH, Loftus BJ, Richardson DL, Yang F, Eisen JA, Paulsen IT, White O. TIGRFAMs: a protein family resource for the functional identification of proteins. Nucleic Acids Res. 2001;29:41-3.

32. Torchiano M. effsize: Efficient Effect Size Computation. 2020. doi:10.5281/zenodo.1480624.

33. GitHub. https://github.com/jmwhitha/Trimming_and_decon. Accessed 22 April 2021.

34. Whitham JM. JGI QC impact on assembly, binning, phylogenomics, and functional analysis. United States: N. p. 2021. doi:10.25982/62657.1515/1779219.

35. Whitham JM. Impact of BBDuk metagenomic read trimming and decontamination. United States: N. p. 2021. doi:10.25982/77705.1341/1779218.

36. Sainani K. The importance of accounting for correlated observations. PM&R. 2010;2:858-861.

Apps

  1. Assemble Reads with MEGAHIT v1.2.9
    • Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31:1674-1676. doi:10.1093/bioinformatics/btv033
  2. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25:1043-1055. doi:10.1101/gr.186072.114
    • CheckM source:
    • Additional info:
  3. Assess Quality of Assemblies with QUAST - v4.4
    • [1] Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072-1075. doi:10.1093/bioinformatics/btt086
    • [2] Mikheenko A, Valin G, Prjibelski A, Saveliev V, Gurevich A. Icarus: visualizer for de novo assembly evaluation. Bioinformatics. 2016;32:3321-3323. doi:10.1093/bioinformatics/btw379
  4. Import FASTQ/SRA File as Reads from Staging Area
    no citations
  5. MetaBAT2 Contig Binning - v1.7
    • Kang DD, Froula J, Egan R, Wang Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ. 2015;3:e1165. doi:10.7717/peerj.1165
    • MetaBAT2 source: