Generated September 14, 2022

Abstract / Description

Upon investigating the gut microbiome of mice for microbes related to Polycystic Ovary Syndrome (PCOS), our team found two metagenomes that could not be classified using alignment-based methods. To further investigate the microbial species in these samples, we used a common metagenome-assembled genome (MAG) workflow to create draft genomes, which were later taxonomically classified using the GTDB-Tk app.

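# Batch-import the SPAdes scaffolds (as an Assembly) and the paired-end reads
# for mouse_132 and mouse_112 from the KBase staging area.
from biokbase.narrative.jobs.appmanager import AppManager
AppManager().run_app_batch(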
    [{
        "app_id": "kb_uploadmethods/import_fasta_as_assembly_from_staging",
        "tag": "release",
        "version": "d67ff71a675aed5566d257c267689ea0d2a4a8b0",
        "params": [{
            "staging_file_subdir_path": "scaffolds.fasta",
            "assembly_name": "scaffolds.fasta_assembly"
        }],
        "shared_params": {
            "type": "draft isolate",
            "min_contig_length": 500
        }
    }, {
        "app_id": "kb_uploadmethods/import_fastq_noninterleaved_as_reads_from_staging",
        "tag": "release",
        "version": "d67ff71a675aed5566d257c267689ea0d2a4a8b0",
        "params": [{
            "fastq_fwd_staging_file_name": "mouse_132_T4_S87_L001_R1_001.fastq.gz",
            "fastq_rev_staging_file_name": "mouse_132_T4_S87_L001_R2_001.fastq.gz",
            "name": "mouse_132"
        }, {
            "fastq_fwd_staging_file_name": "mouse_112_T4_S86_L001_R1_001.fastq.gz",
            "fastq_rev_staging_file_name": "mouse_112_T4_S86_L001_R2_001.fastq.gz",
            "name": "mouse_112"
        }],
        "shared_params": {
            "sequencing_tech": "Illumina",
            "single_genome": 1,
            "read_orientation_outward": 0,
            "insert_size_std_dev": None,
            "insert_size_mean": None
        }
    }],
    cell_id="e7f10ab6-212f-4b49-b542-fed7f2700d3f",
    run_id="7aef9407-2a90-4ba9-8a1e-200eff598ba2"
)
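The assembly import above sets `min_contig_length` to 500, which presumably discards contigs shorter than 500 bp before the Assembly object is created. The following is a minimal local sketch of that filter using only the Python standard library; the helper names `read_fasta` and `filter_contigs` are hypothetical and are not part of the KBase API, and whether the boundary is inclusive is an assumption.

```python
from typing import Dict, Iterable, List, Optional

def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {contig_id: sequence}; the ID is the first word of each header."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            # An empty header (">" alone) yields an empty ID instead of raising.
            current, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if current is not None:
        records[current] = "".join(chunks)
    return records

def filter_contigs(records: Dict[str, str], min_len: int = 500) -> Dict[str, str]:
    """Keep contigs of at least min_len bases (the >= boundary is an assumption)."""
    return {cid: seq for cid, seq in records.items() if len(seq) >= min_len}

# Toy input (hypothetical contig names and sequences).
fasta = [">NODE_1 length=600", "A" * 600, ">NODE_2 length=120", "C" * 120]
kept = filter_contigs(read_fasta(fasta))
print(list(kept))  # ['NODE_1']
```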

Binning Contigs

The metagenomes were reassembled outside of KBase using SPAdes. We binned the contigs with MaxBin2 and refined the resulting bins with DAS Tool. We refined the bins to obtain higher-quality draft genomes and to remove any duplicate contigs from the bins.
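One stated reason for refining the bins is to remove duplicate contigs. As a rough illustration, the sketch below reports contig IDs that appear in more than one bin, assuming each bin is available as a list of contig IDs (for example, taken from the FASTA headers of the bin files). The function name and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

def shared_contigs(bins: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return {contig_id: sorted bin names} for every contig assigned to more than one bin."""
    membership: Dict[str, Set[str]] = defaultdict(set)
    for bin_name, contig_ids in bins.items():
        for contig_id in contig_ids:
            membership[contig_id].add(bin_name)
    return {cid: sorted(names) for cid, names in membership.items() if len(names) > 1}

# Hypothetical toy input: NODE_3 was placed in two bins.
bins = {"bin_A": ["NODE_1", "NODE_3"], "bin_B": ["NODE_2", "NODE_3"]}
result = shared_contigs(bins)
print(result)  # {'NODE_3': ['bin_A', 'bin_B']}
```

Sorting the bin names makes the output deterministic, since Python sets have no stable iteration order for strings.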

Group assembled metagenomic contigs into lineages (Bins) using depth-of-coverage, nucleotide composition, and marker genes.
This app completed without errors in 4h 32m 11s.
Objects
Created Object Name               Type            Description
mouse_scaffolds_binned_MaxBin2    BinnedContigs   BinnedContigs from MaxBin2
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • maxbin_result.zip - File(s) generated by MaxBin2 App
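The MaxBin2 description mentions nucleotide composition as one of the binning signals; MaxBin uses tetranucleotide frequencies for this, together with contig coverage. Below is a minimal, illustrative sketch of a tetranucleotide frequency profile for a single contig. It is not MaxBin2's code: it counts all 256 4-mers without merging reverse complements (which some binners do), and it skips windows containing ambiguous bases such as N.

```python
from itertools import product
from typing import List

# All 256 possible 4-mers over A, C, G, T in a fixed (lexicographic) order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}

def tetranucleotide_frequencies(seq: str) -> List[float]:
    """Relative frequency of each 4-mer in seq; windows with non-ACGT characters are skipped."""
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [c / total for c in counts]

freqs = tetranucleotide_frequencies("ACGTACGT")
print(freqs[INDEX["ACGT"]])  # 0.4  (ACGT occurs in 2 of the 5 windows)
```

Contigs from the same genome tend to have similar profiles, which is why such vectors, combined with coverage, can be clustered into bins.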
Optimize bacterial or archaeal genome bins using a dereplication, aggregation and scoring strategy
This app completed without errors in 46m 58s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • das_tool_result.zip - Files generated by kb_das_tool App
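The DAS Tool description refers to a dereplication, aggregation and scoring strategy. The sketch below is a heavily simplified, hypothetical illustration of the greedy selection idea only: each candidate bin is a set of contigs, each contig carries a made-up list of single-copy marker genes, and a bin's score is the fraction of markers present minus a penalty for extra marker copies. The best-scoring bin is kept, its contigs are removed from the remaining candidates, the rest are rescored, and the loop stops when no bin reaches the threshold. The scoring formula, the 0.6 penalty and the 0.5 threshold are assumptions for illustration; DAS Tool's actual scoring differs in detail.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

def score_bin(contigs: Set[str], markers: Dict[str, List[str]], n_markers: int,
              dup_penalty: float = 0.6) -> float:
    """Simplified score: fraction of markers present minus a penalty for extra marker copies."""
    counts = Counter(m for c in contigs for m in markers.get(c, []))
    present = len(counts)                             # distinct markers found in the bin
    duplicated = sum(n - 1 for n in counts.values())  # copies beyond the first
    return present / n_markers - dup_penalty * duplicated / n_markers

def greedy_dereplicate(candidates: Dict[str, Set[str]], markers: Dict[str, List[str]],
                       n_markers: int, threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Keep the best-scoring bin, strip its contigs from the others, rescore, repeat."""
    remaining = {name: set(contigs) for name, contigs in candidates.items()}
    selected: List[Tuple[str, float]] = []
    while remaining:
        scores = {name: score_bin(c, markers, n_markers) for name, c in remaining.items()}
        best = max(scores, key=lambda name: scores[name])
        if scores[best] < threshold:
            break
        selected.append((best, round(scores[best], 3)))
        taken = remaining.pop(best)
        for contigs in remaining.values():
            contigs -= taken  # each contig ends up in at most one selected bin
    return selected

# Hypothetical toy data: four single-copy markers M1-M4; contig c4 sits in both bins.
markers = {"c1": ["M1", "M2"], "c2": ["M3"], "c3": ["M4"], "c4": ["M1"], "c5": ["M2", "M3", "M4"]}
candidates = {"bin_A": {"c1", "c2", "c3", "c4"}, "bin_B": {"c4", "c5"}}
result = greedy_dereplicate(candidates, markers, n_markers=4)
print(result)  # [('bin_B', 1.0), ('bin_A', 1.0)]
```

In the toy run, bin_A initially scores 0.85 because M1 appears twice; once bin_B is selected and the shared contig c4 is removed from bin_A, bin_A is complete without duplication and scores 1.0.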

Annotating MAGs

After binning, we annotated the bins using RASTtk. The annotations were later used to determine whether each bin contained the essential components of a living organism.
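The criterion for "essential components" is not spelled out here. One simple way to operationalize it, shown as an assumption rather than the method used in this Narrative, is to check each annotated bin against a checklist of required functions. The checklist keywords and the input format (a list of function strings per bin) are hypothetical, and the case-insensitive substring match is deliberately crude.

```python
from typing import Dict, List

# Hypothetical checklist keywords; illustrative only, not exact RAST/SEED role names.
REQUIRED_FUNCTIONS = ["16S rRNA", "23S rRNA", "ribosomal protein S2", "RNA polymerase beta subunit"]

def missing_functions(functions: List[str], required: List[str] = REQUIRED_FUNCTIONS) -> List[str]:
    """Return checklist items with no case-insensitive substring match among a bin's annotations."""
    lowered = [f.lower() for f in functions]
    return [req for req in required if not any(req.lower() in f for f in lowered)]

# Toy annotations (made up for illustration, not taken from this Narrative).
annotations: Dict[str, List[str]] = {
    "toy_bin_a": ["16S rRNA", "23S rRNA", "SSU ribosomal protein S2p",
                  "DNA-directed RNA polymerase beta subunit"],
    "toy_bin_b": ["hypothetical protein", "16S rRNA"],
}
for name, funcs in annotations.items():
    print(name, missing_functions(funcs))
```

A bin with an empty result passes this toy check; a real analysis would use a curated gene list and exact role names rather than keyword matching.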

Annotate bacterial or archaeal assemblies and/or assembly sets using RASTtk (Rapid Annotations using Subsystems Technology toolkit).
This app completed without errors in 2h 41m 27s.
Objects
Created Object Name            Type        Description
bin.016.fasta_assembly.RAST    Genome      RAST annotation
bin.051.fasta_assembly.RAST    Genome      RAST annotation
bin.007.fasta_assembly.RAST    Genome      RAST annotation
bin.046.fasta_assembly.RAST    Genome      RAST annotation
bin.037.fasta_assembly.RAST    Genome      RAST annotation
bin.015.fasta_assembly.RAST    Genome      RAST annotation
bin.019.fasta_assembly.RAST    Genome      RAST annotation
bin.022.fasta_assembly.RAST    Genome      RAST annotation
bin.047.fasta_assembly.RAST    Genome      RAST annotation
bin.042.fasta_assembly.RAST    Genome      RAST annotation
bin.024.fasta_assembly.RAST    Genome      RAST annotation
bin.005.fasta_assembly.RAST    Genome      RAST annotation
bin.025.fasta_assembly.RAST    Genome      RAST annotation
bin.006.fasta_assembly.RAST    Genome      RAST annotation
bin.028.fasta_assembly.RAST    Genome      RAST annotation
bin.017.fasta_assembly.RAST    Genome      RAST annotation
bin.032.fasta_assembly.RAST    Genome      RAST annotation
bin.044.fasta_assembly.RAST    Genome      RAST annotation
bin.030.fasta_assembly.RAST    Genome      RAST annotation
bin.010.fasta_assembly.RAST    Genome      RAST annotation
bin.012.fasta_assembly.RAST    Genome      RAST annotation
bin.021.fasta_assembly.RAST    Genome      RAST annotation
bin.023.fasta_assembly.RAST    Genome      RAST annotation
bin.008.fasta_assembly.RAST    Genome      RAST annotation
bin.001.fasta_assembly.RAST    Genome      RAST annotation
bin.050.fasta_assembly.RAST    Genome      RAST annotation
bin.035.fasta_assembly.RAST    Genome      RAST annotation
bin.038.fasta_assembly.RAST    Genome      RAST annotation
bin.011.fasta_assembly.RAST    Genome      RAST annotation
bin.002.fasta_assembly.RAST    Genome      RAST annotation
bin.039.fasta_assembly.RAST    Genome      RAST annotation
bin.034.fasta_assembly.RAST    Genome      RAST annotation
bin.018.fasta_assembly.RAST    Genome      RAST annotation
bin.004.fasta_assembly.RAST    Genome      RAST annotation
bin.031.fasta_assembly.RAST    Genome      RAST annotation
bin.040.fasta_assembly.RAST    Genome      RAST annotation
bin.048.fasta_assembly.RAST    Genome      RAST annotation
bin.009.fasta_assembly.RAST    Genome      RAST annotation
bin.029.fasta_assembly.RAST    Genome      RAST annotation
bin.033.fasta_assembly.RAST    Genome      RAST annotation
bin.043.fasta_assembly.RAST    Genome      RAST annotation
bin.003.fasta_assembly.RAST    Genome      RAST annotation
bin.026.fasta_assembly.RAST    Genome      RAST annotation
bin.014.fasta_assembly.RAST    Genome      RAST annotation
bin.013.fasta_assembly.RAST    Genome      RAST annotation
bin.020.fasta_assembly.RAST    Genome      RAST annotation
bin.049.fasta_assembly.RAST    Genome      RAST annotation
bin.041.fasta_assembly.RAST    Genome      RAST annotation
bin.045.fasta_assembly.RAST    Genome      RAST annotation
bin.027.fasta_assembly.RAST    Genome      RAST annotation
bin.036.fasta_assembly.RAST    Genome      RAST annotation
Anotated_assembly              GenomeSet   Genome Set
Summary
The RAST algorithm was applied to annotating a genome sequence comprised of 131 contigs containing 3071452 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3022 new features were called, of which 136 are non-coding.
Output genome has the following feature types:
	Coding gene                     2886 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          17 
	Non-coding crispr_spacer          16 
	Non-coding repeat                 50 
	Non-coding rna                    52 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.016.fasta_assembly succeeded!
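Each summary block follows the same template, so its counts can be extracted and cross-checked: the coding genes plus the non-coding features should equal the number of new features called. For bin.016, 2886 + 136 = 3022, and the non-coding types add up to 1 + 17 + 16 + 50 + 52 = 136. A small parser for that check follows, assuming the report text is available as a string; the regular expressions are tied to the wording shown in these reports and are not a documented RAST output format.

```python
import re
from typing import Dict

TYPE_LINE = re.compile(r"^[ \t]*((?:Non-coding|Coding) [\w ]+?)[ \t]+(\d+)[ \t]*$", re.MULTILINE)

def parse_rast_summary(text: str) -> Dict[str, object]:
    """Extract contig/nucleotide totals and per-type feature counts from one summary block."""
    header = re.search(r"comprised of (\d+) contigs containing (\d+) nucleotides", text)
    called = re.search(r"(\d+) new features were called, of which (\d+) are non-coding", text)
    if header is None or called is None:
        raise ValueError("text does not look like a RAST summary block")
    return {
        "contigs": int(header.group(1)),
        "nucleotides": int(header.group(2)),
        "new_features": int(called.group(1)),
        "non_coding": int(called.group(2)),
        # e.g. {"Coding gene": 2886, "Non-coding crispr_array": 1, ...}
        "types": {m.group(1): int(m.group(2)) for m in TYPE_LINE.finditer(text)},
    }

def is_consistent(summary: Dict[str, object]) -> bool:
    """Check coding + non-coding == new features, and that the non-coding types sum correctly."""
    types = summary["types"]
    coding = types.get("Coding gene", 0)
    non_coding = sum(v for k, v in types.items() if k.startswith("Non-coding"))
    return (coding + summary["non_coding"] == summary["new_features"]
            and non_coding == summary["non_coding"])

# Excerpt of the bin.016 report above.
sample = """The RAST algorithm was applied to annotating a genome sequence comprised of 131 contigs containing 3071452 nucleotides.
In addition to the remaining original 0 coding features and 0 non-coding features, 3022 new features were called, of which 136 are non-coding.
Output genome has the following feature types:
\tCoding gene                     2886
\tNon-coding crispr_array            1
\tNon-coding crispr_repeat          17
\tNon-coding crispr_spacer          16
\tNon-coding repeat                 50
\tNon-coding rna                    52
"""
summary = parse_rast_summary(sample)
print(summary["contigs"], summary["new_features"], is_consistent(summary))  # 131 3022 True
```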

The RAST algorithm was applied to annotating a genome sequence comprised of 103 contigs containing 2829289 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3188 new features were called, of which 52 are non-coding.
Output genome has the following feature types:
	Coding gene                     3136 
	Non-coding repeat                 24 
	Non-coding rna                    28 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.051.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 122 contigs containing 2333177 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2514 new features were called, of which 259 are non-coding.
Output genome has the following feature types:
	Coding gene                     2255 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          68 
	Non-coding crispr_spacer          67 
	Non-coding repeat                 88 
	Non-coding rna                    35 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.007.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 142 contigs containing 1554166 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1764 new features were called, of which 53 are non-coding.
Output genome has the following feature types:
	Coding gene                     1711 
	Non-coding repeat                 23 
	Non-coding rna                    30 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.046.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 132 contigs containing 3179658 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2839 new features were called, of which 57 are non-coding.
Output genome has the following feature types:
	Coding gene                     2782 
	Non-coding repeat                 19 
	Non-coding rna                    38 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.037.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 432 contigs containing 4637941 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 5183 new features were called, of which 131 are non-coding.
Output genome has the following feature types:
	Coding gene                     5052 
	Non-coding repeat                 99 
	Non-coding rna                    32 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.015.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 272 contigs containing 4495861 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 4958 new features were called, of which 277 are non-coding.
Output genome has the following feature types:
	Coding gene                     4681 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat          83 
	Non-coding crispr_spacer          81 
	Non-coding repeat                 73 
	Non-coding rna                    38 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.019.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 256 contigs containing 3073907 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3518 new features were called, of which 79 are non-coding.
Output genome has the following feature types:
	Coding gene                     3439 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat          25 
	Non-coding crispr_spacer          23 
	Non-coding repeat                 11 
	Non-coding rna                    18 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.022.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 572 contigs containing 1151330 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1958 new features were called, of which 10 are non-coding.
Output genome has the following feature types:
	Coding gene                     1948 
	Non-coding repeat                  2 
	Non-coding rna                     8 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.047.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 47 contigs containing 1845997 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1651 new features were called, of which 19 are non-coding.
Output genome has the following feature types:
	Coding gene                     1632 
	Non-coding repeat                  2 
	Non-coding rna                    17 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.042.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 254 contigs containing 3482378 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3810 new features were called, of which 204 are non-coding.
Output genome has the following feature types:
	Coding gene                     3606 
	Non-coding crispr_array            5 
	Non-coding crispr_repeat          54 
	Non-coding crispr_spacer          49 
	Non-coding repeat                 66 
	Non-coding rna                    30 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.024.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 33 contigs containing 1786607 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1830 new features were called, of which 128 are non-coding.
Output genome has the following feature types:
	Coding gene                     1702 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat          23 
	Non-coding crispr_spacer          21 
	Non-coding repeat                 43 
	Non-coding rna                    39 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.005.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 52 contigs containing 1861431 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1946 new features were called, of which 103 are non-coding.
Output genome has the following feature types:
	Coding gene                     1843 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          13 
	Non-coding crispr_spacer          12 
	Non-coding repeat                 44 
	Non-coding rna                    33 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.025.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 113 contigs containing 2213058 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2430 new features were called, of which 130 are non-coding.
Output genome has the following feature types:
	Coding gene                     2300 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat          24 
	Non-coding crispr_spacer          22 
	Non-coding repeat                 45 
	Non-coding rna                    37 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.006.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 41 contigs containing 2060480 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1980 new features were called, of which 59 are non-coding.
Output genome has the following feature types:
	Coding gene                     1921 
	Non-coding repeat                 12 
	Non-coding rna                    47 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.028.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 81 contigs containing 2706857 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2367 new features were called, of which 65 are non-coding.
Output genome has the following feature types:
	Coding gene                     2302 
	Non-coding repeat                 19 
	Non-coding rna                    46 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.017.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 481 contigs containing 1747970 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2528 new features were called, of which 58 are non-coding.
Output genome has the following feature types:
	Coding gene                     2470 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          16 
	Non-coding crispr_spacer          15 
	Non-coding repeat                  2 
	Non-coding rna                    24 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.032.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 64 contigs containing 1976154 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1744 new features were called, of which 40 are non-coding.
Output genome has the following feature types:
	Coding gene                     1704 
	Non-coding repeat                 11 
	Non-coding rna                    29 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.044.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 163 contigs containing 5219781 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 5035 new features were called, of which 154 are non-coding.
Output genome has the following feature types:
	Coding gene                     4881 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          10 
	Non-coding crispr_spacer           9 
	Non-coding repeat                 67 
	Non-coding rna                    67 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.030.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 74 contigs containing 3247437 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3820 new features were called, of which 66 are non-coding.
Output genome has the following feature types:
	Coding gene                     3754 
	Non-coding repeat                 29 
	Non-coding rna                    37 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.010.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 91 contigs containing 1828498 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2121 new features were called, of which 168 are non-coding.
Output genome has the following feature types:
	Coding gene                     1953 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat          32 
	Non-coding crispr_spacer          30 
	Non-coding repeat                 60 
	Non-coding rna                    44 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.012.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 59 contigs containing 1955743 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1746 new features were called, of which 80 are non-coding.
Output genome has the following feature types:
	Coding gene                     1666 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat           9 
	Non-coding crispr_spacer           8 
	Non-coding repeat                 40 
	Non-coding rna                    22 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.021.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 72 contigs containing 1524270 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1398 new features were called, of which 43 are non-coding.
Output genome has the following feature types:
	Coding gene                     1355 
	Non-coding repeat                 21 
	Non-coding rna                    22 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.023.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 751 contigs containing 5956186 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 7107 new features were called, of which 464 are non-coding.
Output genome has the following feature types:
	Coding gene                     6643 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat          25 
	Non-coding crispr_spacer          23 
	Non-coding repeat                379 
	Non-coding rna                    35 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.008.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 245 contigs containing 1103659 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1273 new features were called, of which 28 are non-coding.
Output genome has the following feature types:
	Coding gene                     1245 
	Non-coding repeat                 17 
	Non-coding rna                    11 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.001.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 21 contigs containing 1606372 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1734 new features were called, of which 53 are non-coding.
Output genome has the following feature types:
	Coding gene                     1681 
	Non-coding repeat                 23 
	Non-coding rna                    30 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.050.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 56 contigs containing 1915398 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1850 new features were called, of which 61 are non-coding.
Output genome has the following feature types:
	Coding gene                     1789 
	Non-coding repeat                 23 
	Non-coding rna                    38 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.035.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 62 contigs containing 2144203 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2286 new features were called, of which 69 are non-coding.
Output genome has the following feature types:
	Coding gene                     2217 
	Non-coding repeat                 22 
	Non-coding rna                    47 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.038.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 390 contigs containing 2309802 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2678 new features were called, of which 80 are non-coding.
Output genome has the following feature types:
	Coding gene                     2598 
	Non-coding repeat                 44 
	Non-coding rna                    36 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.011.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 251 contigs containing 4383635 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 4258 new features were called, of which 49 are non-coding.
Output genome has the following feature types:
	Coding gene                     4209 
	Non-coding repeat                 14 
	Non-coding rna                    35 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.002.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 219 contigs containing 3771889 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3730 new features were called, of which 76 are non-coding.
Output genome has the following feature types:
	Coding gene                     3654 
	Non-coding repeat                 29 
	Non-coding rna                    47 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.039.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 96 contigs containing 2680722 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2607 new features were called, of which 141 are non-coding.
Output genome has the following feature types:
	Coding gene                     2466 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          28 
	Non-coding crispr_spacer          27 
	Non-coding repeat                 36 
	Non-coding rna                    49 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.034.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 60 contigs containing 1836713 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1549 new features were called, of which 58 are non-coding.
Output genome has the following feature types:
	Coding gene                     1491 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          14 
	Non-coding crispr_spacer          13 
	Non-coding repeat                 11 
	Non-coding rna                    19 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.018.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 630 contigs containing 4109606 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 5389 new features were called, of which 175 are non-coding.
Output genome has the following feature types:
	Coding gene                     5214 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat          24 
	Non-coding crispr_spacer          22 
	Non-coding repeat                 79 
	Non-coding rna                    48 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.004.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 339 contigs containing 2990041 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3355 new features were called, of which 48 are non-coding.
Output genome has the following feature types:
	Coding gene                     3307 
	Non-coding repeat                 32 
	Non-coding rna                    16 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.031.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 102 contigs containing 2078219 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1935 new features were called, of which 63 are non-coding.
Output genome has the following feature types:
	Coding gene                     1872 
	Non-coding repeat                 41 
	Non-coding rna                    22 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.040.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 214 contigs containing 4222813 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 4868 new features were called, of which 633 are non-coding.
Output genome has the following feature types:
	Coding gene                     4235 
	Non-coding crispr_array            6 
	Non-coding crispr_repeat          85 
	Non-coding crispr_spacer          79 
	Non-coding repeat                433 
	Non-coding rna                    30 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.048.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 158 contigs containing 2068718 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2030 new features were called, of which 171 are non-coding.
Output genome has the following feature types:
	Coding gene                     1859 
	Non-coding repeat                132 
	Non-coding rna                    39 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.009.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 492 contigs containing 2731838 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3044 new features were called, of which 57 are non-coding.
Output genome has the following feature types:
	Coding gene                     2987 
	Non-coding repeat                 14 
	Non-coding rna                    43 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.029.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 224 contigs containing 3420041 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3630 new features were called, of which 91 are non-coding.
Output genome has the following feature types:
	Coding gene                     3539 
	Non-coding repeat                 45 
	Non-coding rna                    46 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.033.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 98 contigs containing 2588322 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2285 new features were called, of which 49 are non-coding.
Output genome has the following feature types:
	Coding gene                     2236 
	Non-coding repeat                 12 
	Non-coding rna                    37 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.043.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 120 contigs containing 3023207 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2494 new features were called, of which 70 are non-coding.
Output genome has the following feature types:
	Coding gene                     2424 
	Non-coding repeat                 50 
	Non-coding rna                    20 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.003.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 610 contigs containing 4402949 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 4845 new features were called, of which 85 are non-coding.
Output genome has the following feature types:
	Coding gene                     4760 
	Non-coding repeat                 59 
	Non-coding rna                    26 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.026.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 194 contigs containing 3585436 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3806 new features were called, of which 107 are non-coding.
Output genome has the following feature types:
	Coding gene                     3699 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          10 
	Non-coding crispr_spacer           9 
	Non-coding repeat                 56 
	Non-coding rna                    31 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.014.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 96 contigs containing 2902469 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 3005 new features were called, of which 52 are non-coding.
Output genome has the following feature types:
	Coding gene                     2953 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat           6 
	Non-coding crispr_spacer           5 
	Non-coding repeat                 24 
	Non-coding rna                    16 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.013.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 29 contigs containing 1581972 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 1313 new features were called, of which 29 are non-coding.
Output genome has the following feature types:
	Coding gene                     1284 
	Non-coding repeat                  6 
	Non-coding rna                    23 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.020.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 20 contigs containing 2146755 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2076 new features were called, of which 106 are non-coding.
Output genome has the following feature types:
	Coding gene                     1970 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          24 
	Non-coding crispr_spacer          23 
	Non-coding repeat                 15 
	Non-coding rna                    43 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.049.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 580 contigs containing 5555699 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 6363 new features were called, of which 377 are non-coding.
Output genome has the following feature types:
	Coding gene                     5986 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat          24 
	Non-coding crispr_spacer          23 
	Non-coding repeat                295 
	Non-coding rna                    34 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.041.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 126 contigs containing 3972461 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 4056 new features were called, of which 145 are non-coding.
Output genome has the following feature types:
	Coding gene                     3911 
	Non-coding crispr_array            2 
	Non-coding crispr_repeat          33 
	Non-coding crispr_spacer          31 
	Non-coding repeat                 41 
	Non-coding rna                    38 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.045.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 135 contigs containing 2385651 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2709 new features were called, of which 67 are non-coding.
Output genome has the following feature types:
	Coding gene                     2642 
	Non-coding crispr_array            1 
	Non-coding crispr_repeat           3 
	Non-coding crispr_spacer           2 
	Non-coding repeat                 22 
	Non-coding rna                    39 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.027.fasta_assembly succeeded!

The RAST algorithm was applied to annotating a genome sequence comprised of 108 contigs containing 2917673 nucleotides. 
No initial gene calls were provided.
Standard features were called using: glimmer3; prodigal.
A scan was conducted for the following additional feature types: rRNA; tRNA; selenoproteins; pyrrolysoproteins; repeat regions; crispr.
The genome features were functionally annotated using the following algorithm(s): Kmers V2; Kmers V1; protein similarity.
In addition to the remaining original 0 coding features and 0 non-coding features, 2617 new features were called, of which 133 are non-coding.
Output genome has the following feature types:
	Coding gene                     2484 
	Non-coding repeat                101 
	Non-coding rna                    32 
Overall, the genes have 0 distinct functions. 
The genes include 0 genes with a SEED annotation ontology across 0 distinct SEED functions.
The number of distinct functions can exceed the number of genes because some genes have multiple functions.
bin.036.fasta_assembly succeeded!
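Each RAST summary above reports a total number of new features and then breaks it down by type. The coding-gene row plus the non-coding rows should therefore equal that total. For example, in the block with 72 contigs and 1,524,270 nucleotides: 1355 + 21 + 22 = 1398.

Below is a minimal sketch that parses one summary block and checks this. It assumes the report text has been copied out of the Narrative as a plain string. The function name `parse_rast_summary` and the genes-per-kilobase figure are illustrative additions; they are not part of the RAST app or its output.

```python
import re


def parse_rast_summary(block: str) -> dict:
    """Parse one RAST summary block (as printed in the Narrative) into numbers."""
    # Genome size line: "... comprised of 72 contigs containing 1524270 nucleotides."
    contigs, nucleotides = map(int, re.search(
        r"comprised of (\d+) contigs containing (\d+) nucleotides", block).groups())
    # Totals line: "... 1398 new features were called, of which 43 are non-coding."
    new_total, non_coding = map(int, re.search(
        r"(\d+) new features were called, of which (\d+) are non-coding", block).groups())

    # Feature-type rows, e.g. "Coding gene   1355" or "Non-coding rna   22"
    counts = {}
    for line in block.splitlines():
        m = re.fullmatch(r"\s*((?:Coding|Non-coding) .+?)\s+(\d+)\s*", line)
        if m:
            counts[m.group(1)] = int(m.group(2))

    coding = counts.get("Coding gene", 0)
    non_coding_rows = sum(v for k, v in counts.items() if k.startswith("Non-coding"))
    return {
        "contigs": contigs,
        "nucleotides": nucleotides,
        "coding_genes": coding,
        # True when the per-type rows add up to both reported totals
        "consistent": non_coding_rows == non_coding and coding + non_coding_rows == new_total,
        # Coding genes per kilobase of assembled sequence (a rough density check)
        "genes_per_kb": round(coding / (nucleotides / 1000), 2),
    }


EXAMPLE = """The RAST algorithm was applied to annotating a genome sequence comprised of 72 contigs containing 1524270 nucleotides.
In addition to the remaining original 0 coding features and 0 non-coding features, 1398 new features were called, of which 43 are non-coding.
Output genome has the following feature types:
\tCoding gene                     1355
\tNon-coding repeat                 21
\tNon-coding rna                    22
"""

summary = parse_rast_summary(EXAMPLE)
print(summary["consistent"], summary["genes_per_kb"])  # True 0.89
```

Running the same check over every block is a quick way to spot a truncated or mis-copied report before comparing bins.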

Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • annotation_report.Anotated_assembly - Microbial Annotation Report

Phylogenetic Analysis

To determine which bins are related to one another, we inserted the annotated bins into a phylogenetic tree using KBase's SpeciesTree app. The app also uses the annotations to compare each bin against KBase's own reference database. When a bin closely matches a genome that is already classified, that reference organism is added to the tree, showing how closely the bin relates to a taxonomically classified organism.

Add a user-provided GenomeSet to a KBase SpeciesTree.
This app completed without errors in 14m 40s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • Tree.newick
  • Tree-labels.newick
  • Tree.png
  • Tree.pdf
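Tree-labels.newick is a plain-text Newick file. The sketch below is minimal and dependency-free. It lists which leaves sit directly next to each other in the tree, meaning they share an immediate parent node, which is a quick way to see which bins group together.

The sketch makes a few assumptions:
- The Newick is compact, with no whitespace between tokens and no quoted labels.
- The example tree string and the label `Reference_X` are made up for illustration.
- It only reads a tree. It does not reproduce how KBase builds the SpeciesTree.

For real analyses, a dedicated parser such as Biopython's `Bio.Phylo` is the sturdier choice.

```python
def parse_newick(text):
    """Parse compact Newick into nested lists: leaves are label strings,
    internal nodes are lists of children. Assumes unquoted labels."""
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",();":
            pos += 1
        # Drop the branch length after ":" and any surrounding whitespace
        return text[start:pos].split(":")[0].strip()

    def read_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [read_node()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(read_node())
            pos += 1  # consume ")"
            read_label()  # internal-node label (e.g. a support value) is discarded
            return children
        return read_label()

    return read_node()


def leaves(node):
    """All leaf labels in left-to-right order."""
    return [node] if isinstance(node, str) else [x for c in node for x in leaves(c)]


def sister_leaf_groups(tree):
    """Groups of two or more leaves that share the same immediate parent."""
    groups = []

    def walk(node):
        if isinstance(node, list):
            direct = [c for c in node if isinstance(c, str)]
            if len(direct) >= 2:
                groups.append(direct)
            for child in node:
                walk(child)

    walk(tree)
    return groups


EXAMPLE_TREE = "((bin.021:0.1,bin.023:0.2)0.95:0.05,(bin.008:0.3,Reference_X:0.1):0.2,bin.001:0.4);"
tree = parse_newick(EXAMPLE_TREE)
print(sister_leaf_groups(tree))
# [['bin.021', 'bin.023'], ['bin.008', 'Reference_X']]
```

To use it on the downloaded file, read the file and pass the result through `.strip()` into `parse_newick`. Check first that the labels in that file are unquoted.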

Creating Assemblies

To use the GTDB-Tk app, we first needed to extract each bin from the binned contigs as its own Assembly object.
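Conceptually, extracting bins turns a single contig-to-bin assignment into a separate set of sequences for each bin. The command-line GTDB-Tk likewise takes one FASTA file per genome.

Below is a minimal local sketch of that grouping. It assumes contig sequences and a contig-to-bin mapping are already in memory. `split_bins` and the `NODE_*` contig IDs are hypothetical. The sketch does not reproduce the KBase extraction app, which works on BinnedContig objects in the workspace.

```python
from collections import defaultdict
from typing import Dict


def split_bins(contigs: Dict[str, str], contig_to_bin: Dict[str, str]) -> Dict[str, str]:
    """Group binned contigs into one FASTA-formatted string per bin.

    Contigs that are absent from contig_to_bin (unbinned) are skipped.
    """
    records = defaultdict(list)
    for contig_id, bin_id in contig_to_bin.items():
        records[bin_id].append(f">{contig_id}\n{contigs[contig_id]}\n")
    # One entry per bin, analogous to one Assembly object per bin
    return {bin_id: "".join(recs) for bin_id, recs in sorted(records.items())}


contigs = {"NODE_1": "ACGTACGT", "NODE_2": "GGCCTTAA", "NODE_3": "TTGACCAA"}
contig_to_bin = {"NODE_1": "bin.001", "NODE_3": "bin.001", "NODE_2": "bin.002"}
for bin_id, fasta in split_bins(contigs, contig_to_bin).items():
    print(bin_id, fasta.count(">"))
# bin.001 2
# bin.002 1
```

Writing each value to a file named after its bin would give one FASTA per genome, the same per-bin shape as the extracted Assembly objects listed below.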

Extract a bin as an Assembly from a BinnedContig dataset
This app completed without errors in 35m 55s.
Objects
Created Object Name Type Description
extracted_bins.AssemblySet AssemblySet Assembly set of extracted assemblies
bin.038.fasta_assembly Assembly Assembly object of extracted contigs
bin.007.fasta_assembly Assembly Assembly object of extracted contigs
bin.035.fasta_assembly Assembly Assembly object of extracted contigs
bin.051.fasta_assembly Assembly Assembly object of extracted contigs
bin.050.fasta_assembly Assembly Assembly object of extracted contigs
bin.016.fasta_assembly Assembly Assembly object of extracted contigs
bin.029.fasta_assembly Assembly Assembly object of extracted contigs
bin.044.fasta_assembly Assembly Assembly object of extracted contigs
bin.014.fasta_assembly Assembly Assembly object of extracted contigs
bin.032.fasta_assembly Assembly Assembly object of extracted contigs
bin.026.fasta_assembly Assembly Assembly object of extracted contigs
bin.017.fasta_assembly Assembly Assembly object of extracted contigs
bin.003.fasta_assembly Assembly Assembly object of extracted contigs
bin.028.fasta_assembly Assembly Assembly object of extracted contigs
bin.043.fasta_assembly Assembly Assembly object of extracted contigs
bin.006.fasta_assembly Assembly Assembly object of extracted contigs
bin.033.fasta_assembly Assembly Assembly object of extracted contigs
bin.002.fasta_assembly Assembly Assembly object of extracted contigs
bin.019.fasta_assembly Assembly Assembly object of extracted contigs
bin.034.fasta_assembly Assembly Assembly object of extracted contigs
bin.047.fasta_assembly Assembly Assembly object of extracted contigs
bin.004.fasta_assembly Assembly Assembly object of extracted contigs
bin.024.fasta_assembly Assembly Assembly object of extracted contigs
bin.040.fasta_assembly Assembly Assembly object of extracted contigs
bin.005.fasta_assembly Assembly Assembly object of extracted contigs
bin.048.fasta_assembly Assembly Assembly object of extracted contigs
bin.025.fasta_assembly Assembly Assembly object of extracted contigs
bin.020.fasta_assembly Assembly Assembly object of extracted contigs
bin.012.fasta_assembly Assembly Assembly object of extracted contigs
bin.041.fasta_assembly Assembly Assembly object of extracted contigs
bin.023.fasta_assembly Assembly Assembly object of extracted contigs
bin.027.fasta_assembly Assembly Assembly object of extracted contigs
bin.008.fasta_assembly Assembly Assembly object of extracted contigs
bin.036.fasta_assembly Assembly Assembly object of extracted contigs
bin.001.fasta_assembly Assembly Assembly object of extracted contigs
bin.046.fasta_assembly Assembly Assembly object of extracted contigs
bin.011.fasta_assembly Assembly Assembly object of extracted contigs
bin.037.fasta_assembly Assembly Assembly object of extracted contigs
bin.039.fasta_assembly Assembly Assembly object of extracted contigs
bin.015.fasta_assembly Assembly Assembly object of extracted contigs
bin.018.fasta_assembly Assembly Assembly object of extracted contigs
bin.022.fasta_assembly Assembly Assembly object of extracted contigs
bin.031.fasta_assembly Assembly Assembly object of extracted contigs
bin.042.fasta_assembly Assembly Assembly object of extracted contigs
bin.009.fasta_assembly Assembly Assembly object of extracted contigs
bin.030.fasta_assembly Assembly Assembly object of extracted contigs
bin.013.fasta_assembly Assembly Assembly object of extracted contigs
bin.010.fasta_assembly Assembly Assembly object of extracted contigs
bin.049.fasta_assembly Assembly Assembly object of extracted contigs
bin.021.fasta_assembly Assembly Assembly object of extracted contigs
bin.045.fasta_assembly Assembly Assembly object of extracted contigs
Summary
Job Finished Generated Assembly Reference: 116829/13/1, 116829/14/1, 116829/15/1, 116829/16/1, 116829/17/1, 116829/18/1, 116829/19/1, 116829/20/1, 116829/21/1, 116829/22/1, 116829/23/1, 116829/24/1, 116829/25/1, 116829/26/1, 116829/27/1, 116829/28/1, 116829/29/1, 116829/30/1, 116829/31/1, 116829/32/1, 116829/33/1, 116829/34/1, 116829/35/1, 116829/36/1, 116829/37/1, 116829/38/1, 116829/39/1, 116829/40/1, 116829/41/1, 116829/42/1, 116829/43/1, 116829/44/1, 116829/47/1, 116829/48/1, 116829/49/1, 116829/50/1, 116829/51/1, 116829/52/1, 116829/53/1, 116829/54/1, 116829/55/1, 116829/56/1, 116829/57/1, 116829/58/1, 116829/59/1, 116829/60/1, 116829/61/1, 116829/62/1, 116829/63/1, 116829/64/1, 116829/65/1 Generated Assembly Set: 116829/66/1
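Each reference in the summary has the KBase form workspace ID/object ID/version (e.g. 116829/13/1). A small sketch, assuming the summary is available as a plain string, that separates the assembly references from the assembly set reference and lists object IDs absent from the spanned range. The function names are hypothetical, and a gap is only a prompt to check: it may simply correspond to other objects saved in the workspace.

```python
import re

# A KBase object reference: workspace ID / object ID / version, e.g. "116829/13/1".
REF_PATTERN = re.compile(r"\b(\d+)/(\d+)/(\d+)\b")


def parse_refs(text):
    """Return (workspace_id, object_id, version) tuples in order of appearance."""
    return [tuple(int(part) for part in m.groups()) for m in REF_PATTERN.finditer(text)]


def split_summary(summary):
    """Separate assembly references from the assembly set reference."""
    assemblies, _sep, set_part = summary.partition("Generated Assembly Set:")
    return parse_refs(assemblies), parse_refs(set_part)


def missing_object_ids(refs):
    """Object IDs inside the spanned range that do not appear in refs."""
    if not refs:
        return []
    ids = sorted(obj_id for _ws, obj_id, _ver in refs)
    present = set(ids)
    return [i for i in range(ids[0], ids[-1] + 1) if i not in present]


# Shortened excerpt of the summary above.
summary = ("Job Finished Generated Assembly Reference: 116829/43/1, 116829/44/1, "
           "116829/47/1, 116829/48/1 Generated Assembly Set: 116829/66/1")
assemblies, assembly_set = split_summary(summary)
print(len(assemblies), assembly_set)   # 4 [(116829, 66, 1)]
print(missing_object_ids(assemblies))  # [45, 46]
```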

Taxonomic Classification

The taxonomic classification was done using the GTDB-Tk app. Many of the bins could not be classified all the way down to species level. Some bins also appear to have no close relative in GTDB (i.e., GTDB does not know what these bins are); for these, the app reports only the most specific taxonomic rank it could resolve from the marker proteins found in the bin.
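GTDB-Tk reports each classification as a semicolon-separated lineage with rank prefixes (d__, p__, c__, o__, f__, g__, s__), leaving the name empty for ranks it could not assign. A minimal sketch for pulling out the most specific named rank from such a string; the function name is hypothetical and the lineages in the usage are illustrative, not results from this Narrative.

```python
RANK_PREFIXES = [("d__", "domain"), ("p__", "phylum"), ("c__", "class"),
                 ("o__", "order"), ("f__", "family"), ("g__", "genus"),
                 ("s__", "species")]


def lowest_named_rank(classification):
    """Return (rank, name) for the most specific rank with a name, or None."""
    prefix_to_rank = dict(RANK_PREFIXES)
    result = None
    for field in classification.split(";"):
        field = field.strip()
        prefix, name = field[:3], field[3:]
        if prefix in prefix_to_rank and name:  # empty names mean "not assigned"
            result = (prefix_to_rank[prefix], name)
    return result


print(lowest_named_rank(
    "d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;"
    "f__Rikenellaceae;g__Alistipes;s__"))                         # ('genus', 'Alistipes')
print(lowest_named_rank(
    "d__Bacteria;p__Firmicutes_A;c__Clostridia;o__;f__;g__;s__"))  # ('class', 'Clostridia')
```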

Obtain objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) ver R06-RS202
This app completed without errors in 57m 4s.

Assessing Genome Quality

To assess the quality of our draft genomes, we used CheckM. The results from CheckM and the GTDB-Tk app were then used to select a few genomes (n = 5) for further analysis. The selection was discussed among the researchers, and the criteria we used were CheckM completeness, average nucleotide identity (ANI) from FastANI, and the abundance of each bin's contigs in the metagenome (estimated with Salmon).
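Two of these criteria can be combined in a short sketch: sum Salmon read counts per bin from a quant.sf table (tab-separated, with Name, Length, EffectiveLength, TPM and NumReads columns), then keep bins that pass CheckM cutoffs and rank them by abundance. The 90% completeness / 5% contamination cutoffs follow the MIMAG high-quality draft definition and are an assumption, since the selection above was made by discussion; FastANI is not modeled, and all bin names and values are hypothetical.

```python
import csv
import io


def reads_per_bin(quant_sf_text, contig_to_bin):
    """Sum Salmon NumReads per bin from quant.sf text (tab-separated, with header)."""
    totals = {}
    for row in csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t"):
        bin_id = contig_to_bin.get(row["Name"])
        if bin_id is not None:  # skip contigs that were not binned
            totals[bin_id] = totals.get(bin_id, 0.0) + float(row["NumReads"])
    return totals


def select_bins(checkm, abundance, min_completeness=90.0,
                max_contamination=5.0, top_n=5):
    """Keep bins passing the (assumed) CheckM cutoffs, ranked by read abundance.

    checkm maps bin ID -> (completeness %, contamination %).
    """
    passing = [bin_id for bin_id, (completeness, contamination) in checkm.items()
               if completeness >= min_completeness and contamination < max_contamination]
    passing.sort(key=lambda bin_id: abundance.get(bin_id, 0.0), reverse=True)
    return passing[:top_n]


# Hypothetical values for illustration only.
quant_sf = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "NODE_1\t5000\t4800\t120.0\t900\n"
    "NODE_2\t3000\t2800\t80.0\t300\n"
    "NODE_3\t4000\t3800\t50.0\t500\n"
    "NODE_4\t2000\t1800\t10.0\t40\n"
)
contig_to_bin = {"NODE_1": "bin_A", "NODE_2": "bin_A",
                 "NODE_3": "bin_B", "NODE_4": "bin_C"}
checkm = {"bin_A": (95.1, 1.2), "bin_B": (91.0, 3.4), "bin_C": (62.0, 2.0)}

abundance = reads_per_bin(quant_sf, contig_to_bin)
print(abundance)                       # {'bin_A': 1200.0, 'bin_B': 500.0, 'bin_C': 40.0}
print(select_bins(checkm, abundance))  # ['bin_A', 'bin_B']
```

Raw read counts favor larger bins; dividing by total bin length, or summing TPM instead, is a reasonable alternative design choice.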

Runs the CheckM lineage workflow to assess the genome quality of isolates, single cells, or genome bins from metagenome assemblies through comparison to an existing database of genomes.
This app completed without errors in 1h 20m 14s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • CheckM_summary_table.tsv.zip - TSV Summary Table from CheckM
  • full_output.zip - Full output of CheckM
  • plots.zip - Output plots from CheckM

Visualizing Genomes and Pathways

After selecting five binned assemblies, we checked whether each bin contained the metabolic components necessary to sustain a living organism. To do this, we used DRAM, which gives a visual representation of which metabolic pathways, and which enzymes within those pathways, are present in each bin. To visualize the annotated bins, we used the Circular Genome Visualization Tool (CGView), which shows features along the contigs such as coding sequences (CDS) and GC skew.
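GC skew, one of the tracks in the circular maps, is (G - C) / (G + C) computed over windows of the sequence. A minimal sketch of that quantity, not CGView's own code; the window and step sizes are arbitrary assumptions. The cumulative sum is often used to suggest the replication origin and terminus of a circular chromosome.

```python
def gc_skew(seq, window=1000, step=None):
    """GC skew (G - C) / (G + C) for each window, as (start, skew) pairs.

    Windows with no G or C get a skew of 0.0. A sequence shorter than the
    window yields one window covering the whole sequence; a trailing partial
    window is otherwise dropped.
    """
    step = step or window
    seq = seq.upper()
    skews = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((start, (g - c) / (g + c) if g + c else 0.0))
    return skews


def cumulative(skews):
    """Running sum of window skews, as (start, cumulative_skew) pairs."""
    total, out = 0.0, []
    for start, value in skews:
        total += value
        out.append((start, total))
    return out


print(gc_skew("GGGCCCCG", window=4))  # [(0, 0.5), (4, -0.5)]
```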

Annotate your assembly with DRAM. Annotations will then be distilled to create an interactive functional summary per assembly.
This app completed without errors in 31m 46s.
Objects
Created Object Name Type Description
bin.049.fasta_assembly_DRAM Genome Annotated Genome
Alistipes_Draft_Genome GenomeSet bin_049_Alistipes
Summary
Here are the results from your DRAM run.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • annotations.tsv - DRAM annotations in a tab-separated table format
  • genes.fna - Genes as nucleotides predicted by DRAM with brief annotations
  • genes.faa - Genes as amino acids predicted by DRAM with brief annotations
  • genes.gff - GFF file of all DRAM annotations
  • trnas.tsv - Tab-separated table of tRNAs as detected by tRNAscan-SE
  • genbank.tar.gz - Compressed folder of output genbank files
  • product.tsv - DRAM product in tabular format
  • metabolism_summary.xlsx - DRAM metabolism summary tables
  • genome_stats.tsv - DRAM genome statistics table
v1 - KBaseGenomes.Genome-11.0
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/116829
Generate a map and annotations of circular genomes using CGView.
This app completed without errors in 3m 47s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • KBase_derived_bin.049.fasta_assembly.RAST.png
  • KBase_derived_bin.049.fasta_assembly.RAST.jpg
  • KBase_derived_bin.049.fasta_assembly.RAST.svg
v1 - KBaseGenomes.Genome-11.0
The viewer for the data in this Cell is available at the original Narrative here: https://narrative.kbase.us/narrative/116829
Generate a map and annotations of circular genomes using CGView.
This app completed without errors in 3m 44s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • KBase_derived_bin.048.fasta_assembly.RAST.png
  • KBase_derived_bin.048.fasta_assembly.RAST.jpg
  • KBase_derived_bin.048.fasta_assembly.RAST.svg
Generate a map and annotations of circular genomes using CGView.
This app completed without errors in 1m 17s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • KBase_derived_bin.051.fasta_assembly.RAST.png
  • KBase_derived_bin.051.fasta_assembly.RAST.jpg
  • KBase_derived_bin.051.fasta_assembly.RAST.svg
Generate a map and annotations of circular genomes using CGView.
This app completed without errors in 1m 30s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • KBase_derived_bin.019.fasta_assembly.RAST.png
  • KBase_derived_bin.019.fasta_assembly.RAST.jpg
  • KBase_derived_bin.019.fasta_assembly.RAST.svg
Generate a map and annotations of circular genomes using CGView.
This app completed without errors in 1m 16s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • KBase_derived_bin.009.fasta_assembly.RAST.png
  • KBase_derived_bin.009.fasta_assembly.RAST.jpg
  • KBase_derived_bin.009.fasta_assembly.RAST.svg
Annotate your assembly with DRAM. Annotations will then be distilled to create an interactive functional summary per assembly.
This app completed without errors in 40m 33s.
Summary
Here are the results from your DRAM run.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • annotations.tsv - DRAM annotations in a tab-separated table format
  • genes.fna - Genes as nucleotides predicted by DRAM with brief annotations
  • genes.faa - Genes as amino acids predicted by DRAM with brief annotations
  • genes.gff - GFF file of all DRAM annotations
  • trnas.tsv - Tab-separated table of tRNAs as detected by tRNAscan-SE
  • genbank.tar.gz - Compressed folder of output genbank files
  • product.tsv - DRAM product in tabular format
  • metabolism_summary.xlsx - DRAM metabolism summary tables
  • genome_stats.tsv - DRAM genome statistics table
Annotate your assembly with DRAM. Annotations will then be distilled to create an interactive functional summary per assembly.
This app completed without errors in 28m 13s.
Summary
Here are the results from your DRAM run.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • annotations.tsv - DRAM annotations in a tab-separated table format
  • genes.fna - Genes as nucleotides predicted by DRAM with brief annotations
  • genes.faa - Genes as amino acids predicted by DRAM with brief annotations
  • genes.gff - GFF file of all DRAM annotations
  • trnas.tsv - Tab-separated table of tRNAs as detected by tRNAscan-SE
  • genbank.tar.gz - Compressed folder of output genbank files
  • product.tsv - DRAM product in tabular format
  • metabolism_summary.xlsx - DRAM metabolism summary tables
  • genome_stats.tsv - DRAM genome statistics table
Annotate your assembly with DRAM. Annotations will then be distilled to create an interactive functional summary per assembly.
This app completed without errors in 36m 18s.
Summary
Here are the results from your DRAM run.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • annotations.tsv - DRAM annotations in a tab-separated table format
  • genes.fna - Genes as nucleotides predicted by DRAM with brief annotations
  • genes.faa - Genes as amino acids predicted by DRAM with brief annotations
  • genes.gff - GFF file of all DRAM annotations
  • trnas.tsv - Tab-separated table of tRNAs as detected by tRNAscan-SE
  • genbank.tar.gz - Compressed folder of output genbank files
  • product.tsv - DRAM product in tabular format
  • metabolism_summary.xlsx - DRAM metabolism summary tables
  • genome_stats.tsv - DRAM genome statistics table
Annotate your assembly with DRAM. Annotations will then be distilled to create an interactive functional summary per assembly.
This app completed without errors in 23m 47s.
Summary
Here are the results from your DRAM run.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/116829
  • annotations.tsv - DRAM annotations in a tab-separated table format
  • genes.fna - Genes as nucleotides predicted by DRAM with brief annotations
  • genes.faa - Genes as amino acids predicted by DRAM with brief annotations
  • genes.gff - GFF file of all DRAM annotations
  • trnas.tsv - Tab-separated table of tRNAs as detected by tRNAscan-SE
  • genbank.tar.gz - Compressed folder of output genbank files
  • product.tsv - DRAM product in tabular format
  • metabolism_summary.xlsx - DRAM metabolism summary tables
  • genome_stats.tsv - DRAM genome statistics table

Acknowledgements

Special thanks to Alex Handzel for selecting the samples for this analysis and to Laura Sisk-Hackworth for preparing the metagenomic DNA. This project was funded by an NIH grant.

References

  1. Yu-Wei Wu, Blake A. Simmons, Steven W. Singer, MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets, Bioinformatics, Volume 32, Issue 4, 15 February 2016, Pages 605–607, https://doi.org/10.1093/bioinformatics/btv638
  2. Sieber, C.M.K., Probst, A.J., Sharrar, A. et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol 3, 836–843 (2018). https://doi.org/10.1038/s41564-018-0171-1
  3. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A., & Kingsford, C. (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods, 14(4), 417–419. https://doi.org/10.1038/nmeth.4197
  4. Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925–1927, https://doi.org/10.1093/bioinformatics/btz848
  5. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25: 1043-1055. https://doi.org/10.1101/gr.186072.114
  6. Michael Shaffer, Mikayla A Borton, Bridget B McGivern, Ahmed A Zayed, Sabina Leanti La Rosa, Lindsey M Solden, Pengfei Liu, Adrienne B Narrowe, Josué Rodríguez-Ramos, Benjamin Bolduc, M Consuelo Gazitúa, Rebecca A Daly, Garrett J Smith, Dean R Vik, Phil B Pope, Matthew B Sullivan, Simon Roux, Kelly C Wrighton, DRAM for distilling microbial metabolism to automate the curation of microbiome function, Nucleic Acids Research, Volume 48, Issue 16, 18 September 2020, Pages 8883–8900, https://doi.org/10.1093/nar/gkaa621
  7. Stothard P, Wishart DS. Circular genome visualization and exploration using CGView. Bioinformatics. 2005 Feb 15;21(4):537-9. doi: 10.1093/bioinformatics/bti054. Epub 2004 Oct 12. PMID: 15479716.
  8. Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010 Mar 10;5(3):e9490. doi: 10.1371/journal.pone.0009490. PMID: 20224823; PMCID: PMC2835736.
  9. Aziz, R.K., Bartels, D., Best, A.A. et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics 9, 75 (2008). https://doi.org/10.1186/1471-2164-9-75
  10. Arkin, A., Cottingham, R., Henry, C. et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol 36, 566–569 (2018). https://doi.org/10.1038/nbt.4163

Apps

  1. Annotate and Distill Assemblies with DRAM
    • DRAM source code
    • DRAM documentation
    • DRAM publication
  2. Annotate Multiple Microbial Assemblies with RASTtk - v1.073
    • [1] Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;9: 75. doi:10.1186/1471-2164-9-75
    • [2] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206-D214. doi:10.1093/nar/gkt1226
    • [3] Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 2015;5. doi:10.1038/srep08365
    • [4] Kent WJ. BLAT: The BLAST-Like Alignment Tool. Genome Res. 2002;12: 656-664. doi:10.1101/gr.229202
    • [5] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25: 3389-3402. doi:10.1093/nar/25.17.3389
    • [6] Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25: 955-964.
    • [7] Cobucci-Ponzano B, Rossi M, Moracci M. Translational recoding in archaea. Extremophiles. 2012;16: 793-803. doi:10.1007/s00792-012-0482-8
    • [8] Meyer F, Overbeek R, Rodriguez A. FIGfams: yet another set of protein families. Nucleic Acids Res. 2009;37: 6643-6654. doi:10.1093/nar/gkp698
    • [9] van Belkum A, Sluijter M, de Groot R, Verbrugh H, Hermans PW. Novel BOX repeat PCR assay for high-resolution typing of Streptococcus pneumoniae strains. J Clin Microbiol. 1996;34: 1176-1179.
    • [10] Croucher NJ, Vernikos GS, Parkhill J, Bentley SD. Identification, variation and transcription of pneumococcal repeat sequences. BMC Genomics. 2011;12: 120. doi:10.1186/1471-2164-12-120
    • [11] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • [12] Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23: 673-679. doi:10.1093/bioinformatics/btm009
    • [13] Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012;40: e126. doi:10.1093/nar/gks406
  3. Assess Genome Quality with CheckM - v1.0.18
    • Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25: 1043-1055. doi:10.1101/gr.186072.114
    • CheckM source:
    • Additional info:
  4. Bin Contigs using MaxBin2 - v2.2.4
    • Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32: 605-607. doi:10.1093/bioinformatics/btv638
    • Wu Y-W, Tang Y-H, Tringe SG, Simmons BA, Singer SW. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome. 2014;2: 26. doi:10.1186/2049-2618-2-26
    • Maxbin2 source:
    • Maxbin source:
  5. Circular Genome Visualization Tool
    no citations
  6. Classify Microbes with GTDB-Tk - v1.7.0
    • Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925-1927. DOI: https://doi.org/10.1093/bioinformatics/btz848
    • Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996-1004 (2018). DOI: https://doi.org/10.1038/nbt.4229
    • Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020;10.1038/s41587-020-0501-8. DOI:10.1038/s41587-020-0501-8
    • Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davín AA, Waite DW, Whitman WB, Parks DH, and Hugenholtz P. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol. 2021 Jul;6(7):946-959. DOI:10.1038/s41564-021-00918-8
    • Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11:538. Published 2010 Oct 30. doi:10.1186/1471-2105-11-538
    • Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9(1):5114. Published 2018 Nov 30. DOI:10.1038/s41467-018-07641-9
    • Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. Published 2010 Mar 8. DOI:10.1186/1471-2105-11-119
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490. Published 2010 Mar 10. DOI:10.1371/journal.pone.0009490 link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
    • Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7(10):e1002195. DOI:10.1371/journal.pcbi.1002195
  7. Extract Bins as Assemblies from BinnedContigs - v1.0.2
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  8. Insert Set of Genomes Into SpeciesTree - v2.2.0
    • Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3): e9490. doi:10.1371/journal.pone.0009490
  9. Optimize Bacterial or Archaeal Binned Contigs using DAS Tool - v1.1.2
    • Sieber CMK, Probst AJ, Sharrar A, Thomas BC, Hess M, Tringe SG, Banfield JF. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol. 2018;3(7): 836-843. doi:10.1038/s41564-018-0171-1
    • DAS_Tool source:
    • Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11: 119. doi:10.1186/1471-2105-11-119
    • Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10: 421. doi:10.1186/1471-2105-10-421
    • Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nature Methods. 2015;12: 59-60. doi:10.1038/nmeth.3176
    • Pullseq:
    • R: A Language and Environment for Statistical Computing:
    • Ruby: A Programmer's Best Friend: