Add a user-provided GenomeSet to a KBase SpeciesTree.
This App allows a user to construct a species tree using a set of 49 core, universal genes defined by COG (Clusters of Orthologous Groups) gene families. It combines the genome(s) provided by the user with a set of closely related genomes selected from the public KBase genomes import of RefSeq.
Because the number of genomes available in KBase is very large, the procedure starts by selecting a subset of public KBase genomes closely related to the user-provided genomes. Relatedness is determined by alignment similarity to a select subset of 49 COG domains. Next, the user genome(s) are inserted into our curated multiple sequence alignment (MSA) for each COG family. The curated alignments have been trimmed using GBLOCKS to remove poorly aligned sections of the MSA. The MSAs are then concatenated. Finally, a phylogenetic tree is reconstructed using FastTree2 (version 2.1.10, a method to quickly estimate approximate maximum likelihood phylogeny; see Publications) from the genome(s) provided by the user and the set of genomes identified as similar in the previous step. FastTree2 is used with the -fastest setting.
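The concatenation step described above can be illustrated with a minimal Python sketch. It is not the App's actual implementation: the function names, the genome IDs, the toy sequences, and the choice to pad a genome that is missing from a family with gap characters are all assumptions made for illustration.

```python
from typing import Dict, List


def concatenate_msas(msas: List[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate per-family alignments into one combined alignment.

    Each element of `msas` maps a genome ID to its aligned sequence for one
    gene family. A genome absent from a family is padded with gaps
    (an assumption of this sketch, not a documented behavior of the App).
    """
    genomes = sorted({genome for msa in msas for genome in msa})
    parts: Dict[str, List[str]] = {genome: [] for genome in genomes}
    for msa in msas:
        lengths = {len(seq) for seq in msa.values()}
        if len(lengths) != 1:
            raise ValueError("all rows of an alignment must have equal length")
        width = lengths.pop()
        for genome in genomes:
            parts[genome].append(msa.get(genome, "-" * width))
    return {genome: "".join(chunks) for genome, chunks in parts.items()}


def to_fasta(alignment: Dict[str, str]) -> str:
    """Serialize an alignment as FASTA text, one record per genome."""
    return "".join(f">{genome}\n{seq}\n" for genome, seq in alignment.items())


# Toy input: two gene families; ref_B lacks the second family.
example = [
    {"user_genome": "MK-L", "ref_A": "MKAL", "ref_B": "MRAL"},
    {"user_genome": "GTP", "ref_A": "GSP"},
]
supermatrix = concatenate_msas(example)
fasta_text = to_fasta(supermatrix)
```

A concatenated protein alignment written this way could then be passed to FastTree on the command line, for example `FastTree -fastest concatenated.faa > tree.nwk`, where the file names are placeholders.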
Note: when inserting one or more genomes into a species tree, the inserted genomes that are also contained within the KBase list of species will be duplicated in the tree. One copy will have the KBase list ID and the other will have the ID of the inserted genome.
The Neighbor public genome count parameter will control how many nearby genomes are included in the species tree for each input genome. The maximum number of nearby genomes is 200. This represents the total number of close genomes that will be added to the tree. If you have several diverse input genomes, you should increase this number.
The primary output of this App is a species tree data object. The App report shows the resulting tree and contains links for downloading the tree in Newick format and as a PDF or PNG. Tree leaves are labeled by the NCBI RefSeq species name plus their GCF identifiers. User genomes are labeled with the data object name. Branch support values are shown for each node, as described in the FastTree2 documentation.
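As a rough illustration of working with the downloaded Newick file, the following Python sketch extracts leaf labels and separates user genomes from public ones. The tree string, the exact label layout, and the function names are hypothetical; the regular expression assumes unquoted labels that contain no parentheses, commas, colons, or semicolons.

```python
import re
from typing import List, Set, Tuple


def leaf_labels(newick: str) -> List[str]:
    """Return leaf labels, i.e. names that directly follow '(' or ','.

    Internal-node labels such as support values follow ')' and are skipped.
    """
    return [m.strip() for m in re.findall(r"[(,]\s*([^(),:;]+)", newick)]


def split_leaves(newick: str, user_names: Set[str]) -> Tuple[List[str], List[str]]:
    """Split leaf labels into (user genomes, public genomes)."""
    labels = leaf_labels(newick)
    user = [label for label in labels if label in user_names]
    public = [label for label in labels if label not in user_names]
    return user, public


# Hypothetical tree: two public leaves, one user genome, one support value.
tree = (
    "((Species_one_GCF_000000001.1:0.10,my_genome:0.12)0.95:0.05,"
    "Species_two_GCF_000000002.1:0.30);"
)
user, public = split_leaves(tree, {"my_genome"})
```

For trees with quoted labels or comments, a dedicated Newick parser would be a safer choice than a regular expression.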
This App also creates a Genome Set object containing all the genomes in the species tree that was generated. You must provide a name for this output object. By default, the genomes in this Genome Set are not copied into your Narrative. To make further analyses (e.g., viewing, finding domains) more convenient, you may select the Copy public genomes to your workspace option. However, this is not recommended if you set a large value for the Neighbor public genome count.
By clicking on the Genome Set object that is generated, you can edit the set of genomes, e.g., to exclude some of them from downstream analyses.
Team members who developed and deployed the algorithm in KBase: Roman Sutormin. For questions, please contact us.
The COG domains used in the estimate of relatedness include:
COG ID | Name | Description [functional category]
COG0012 | COG0012 | Predicted GTPase, probable translation factor [Translation, ribosomal structure and biogenesis].
COG0013 | AlaS | Alanyl-tRNA synthetase [Translation, ribosomal structure and biogenesis].
COG0016 | PheS | Phenylalanyl-tRNA synthetase alpha subunit [Translation, ribosomal structure and biogenesis].
COG0018 | ArgS | Arginyl-tRNA synthetase [Translation, ribosomal structure and biogenesis].
COG0030 | KsgA | Dimethyladenosine transferase (rRNA methylation) [Translation, ribosomal structure and biogenesis].
COG0041 | PurE | Phosphoribosylcarboxyaminoimidazole (NCAIR) mutase [Nucleotide transport and metabolism].
COG0046 | PurL | Phosphoribosylformylglycinamidine (FGAM) synthase, synthetase domain [Nucleotide transport and metabolism].
COG0048 | RpsL | Ribosomal protein S12 [Translation, ribosomal structure and biogenesis].
COG0049 | RpsG | Ribosomal protein S7 [Translation, ribosomal structure and biogenesis].
COG0051 | RpsJ | Ribosomal protein S10 [Translation, ribosomal structure and biogenesis].
COG0052 | RpsB | Ribosomal protein S2 [Translation, ribosomal structure and biogenesis].
COG0072 | PheT | Phenylalanyl-tRNA synthetase beta subunit [Translation, ribosomal structure and biogenesis].
COG0080 | RplK | Ribosomal protein L11 [Translation, ribosomal structure and biogenesis].
COG0081 | RplA | Ribosomal protein L1 [Translation, ribosomal structure and biogenesis].
COG0082 | AroC | Chorismate synthase [Amino acid transport and metabolism].
COG0086 | RpoC | DNA-directed RNA polymerase, beta' subunit/160 kD subunit [Transcription].
COG0087 | RplC | Ribosomal protein L3 [Translation, ribosomal structure and biogenesis].
COG0088 | RplD | Ribosomal protein L4 [Translation, ribosomal structure and biogenesis].
COG0089 | RplW | Ribosomal protein L23 [Translation, ribosomal structure and biogenesis].
COG0090 | RplB | Ribosomal protein L2 [Translation, ribosomal structure and biogenesis].
COG0091 | RplV | Ribosomal protein L22 [Translation, ribosomal structure and biogenesis].
COG0092 | RpsC | Ribosomal protein S3 [Translation, ribosomal structure and biogenesis].
COG0093 | RplN | Ribosomal protein L14 [Translation, ribosomal structure and biogenesis].
COG0094 | RplE | Ribosomal protein L5 [Translation, ribosomal structure and biogenesis].
COG0096 | RpsH | Ribosomal protein S8 [Translation, ribosomal structure and biogenesis].
COG0097 | RplF | Ribosomal protein L6P/L9E [Translation, ribosomal structure and biogenesis].
COG0098 | RpsE | Ribosomal protein S5 [Translation, ribosomal structure and biogenesis].
COG0099 | RpsM | Ribosomal protein S13 [Translation, ribosomal structure and biogenesis].
COG0100 | RpsK | Ribosomal protein S11 [Translation, ribosomal structure and biogenesis].
COG0102 | RplM | Ribosomal protein L13 [Translation, ribosomal structure and biogenesis].
COG0103 | RpsI | Ribosomal protein S9 [Translation, ribosomal structure and biogenesis].
COG0105 | Ndk | Nucleoside diphosphate kinase [Nucleotide transport and metabolism].
COG0126 | Pgk | 3-phosphoglycerate kinase [Carbohydrate transport and metabolism].
COG0127 | COG0127 | Xanthosine triphosphate pyrophosphatase [Nucleotide transport and metabolism].
COG0130 | TruB | Pseudouridine synthase [Translation, ribosomal structure and biogenesis].
COG0150 | PurM | Phosphoribosylaminoimidazole (AIR) synthetase [Nucleotide transport and metabolism].
COG0151 | PurD | Phosphoribosylamine-glycine ligase [Nucleotide transport and metabolism].
COG0164 | RnhB | Ribonuclease HII [DNA replication, recombination, and repair].
COG0172 | SerS | Seryl-tRNA synthetase [Translation, ribosomal structure and biogenesis].
COG0185 | RpsS | Ribosomal protein S19 [Translation, ribosomal structure and biogenesis].
COG0186 | RpsQ | Ribosomal protein S17 [Translation, ribosomal structure and biogenesis].
COG0215 | CysS | Cysteinyl-tRNA synthetase [Translation, ribosomal structure and biogenesis].
COG0244 | RplJ | Ribosomal protein L10 [Translation, ribosomal structure and biogenesis].
COG0256 | RplR | Ribosomal protein L18 [Translation, ribosomal structure and biogenesis].
COG0343 | Tgt | Queuine/archaeosine tRNA-ribosyltransferase [Translation, ribosomal structure and biogenesis].
COG0504 | PyrG | CTP synthase (UTP-ammonia lyase) [Nucleotide transport and metabolism].
COG0519 | GuaA | GMP synthase, PP-ATPase domain/subunit [Nucleotide transport and metabolism].
COG0532 | InfB | Translation initiation factor 2 (IF-2; GTPase) [Translation, ribosomal structure and biogenesis].
COG0533 | QRI7 | Metal-dependent proteases with possible chaperone activity [Posttranslational modification, protein turnover, chaperones].
Related Publications
- Price MN, Dehal PS, Arkin AP. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5(3):e9490. doi:10.1371/journal.pone.0009490, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2835736/
App Specification:
https://github.com/kbaseapps/SpeciesTreeBuilder/tree/dce166f6d1673018a001b750c191b9a2deda0c71/ui/narrative/methods/insert_genomeset_into_species_tree
Module Commit: dce166f6d1673018a001b750c191b9a2deda0c71