Insert Set of Genomes Into SpeciesTree - v2.2.0
By: rsutormin

Add a user-provided GenomeSet to a KBase SpeciesTree.

This App allows a user to construct a species tree using a set of 49 core, universal genes defined by COG (Clusters of Orthologous Groups) gene families. It combines the genome(s) provided by the user with a set of closely related genomes selected from the public KBase genomes imported from RefSeq.

Because the number of genomes available in KBase is very large, the procedure starts by selecting a subset of public KBase genomes closely related to the user-provided genomes. Relatedness is determined by alignment similarity to a select set of 49 COG domains. Next, the user genome(s) are inserted into our curated multiple sequence alignment (MSA) for each COG family. The curated alignments have been trimmed using GBLOCKS to remove poorly aligned sections of the MSA. The MSAs are then concatenated. Finally, a phylogenetic tree is reconstructed using FastTree2 (version 2.1.10, a method for quickly estimating approximate maximum-likelihood phylogenies; see Publications) from the genome(s) provided by the user and the set of genomes identified as similar in the first step. FastTree2 is run with the -fastest setting.
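The concatenation step can be illustrated with a minimal Python sketch. It is not the App's own code: the function name, the dictionary-based alignment representation, the gap padding for genomes missing from a family, and the example genome IDs are all assumptions made for illustration.

```python
from typing import Dict, List


def concatenate_alignments(
    alignments: List[Dict[str, str]], gap: str = "-"
) -> Dict[str, str]:
    """Join per-family alignments into one concatenated row per genome.

    Each element of `alignments` maps genome ID -> aligned sequence for one
    gene family. A genome that is absent from a family is padded with gap
    characters (an assumption; the App may handle missing genes differently).
    """
    genome_ids = sorted({gid for aln in alignments for gid in aln})
    parts: Dict[str, List[str]] = {gid: [] for gid in genome_ids}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for gid in genome_ids:
            parts[gid].append(aln.get(gid, gap * width))
    return {gid: "".join(chunks) for gid, chunks in parts.items()}


# Hypothetical toy alignments for two gene families.
family_a = {"user_genome": "MK-LV", "GCF_000000001": "MKALV"}
family_b = {"GCF_000000001": "GT-R"}
supermatrix = concatenate_alignments([family_a, family_b])
```

Every row of the result has the same length, which is what a tree-building program expects from a concatenated alignment.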

Note: When one or more genomes are inserted into a species tree, any inserted genome that is also contained in the KBase list of species will appear twice in the tree. One copy will have the KBase list ID and the other will have the ID of the inserted genome.

The Neighbor public genome count parameter controls how many nearby genomes are included in the species tree for each input genome. The maximum number of nearby genomes is 200. This represents the total number of close genomes that will be added to the tree. If you have several diverse input genomes, you should increase this number.
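As a rough illustration of how such a count could limit the neighbor set, the sketch below ranks public genomes by a similarity score and keeps the closest ones, capped at 200. The function name, the score dictionary, and the ranking rule are hypothetical. The App description only states that relatedness is based on alignment similarity, so this is not its actual selection logic.

```python
from typing import Dict, List

MAX_NEIGHBORS = 200  # upper limit stated in the App description


def pick_neighbors(similarity: Dict[str, float], neighbor_count: int) -> List[str]:
    """Return the IDs of the most similar public genomes, best first.

    `similarity` maps a public genome ID to a hypothetical similarity score
    (higher means closer). Ties are broken by genome ID for determinism.
    """
    if neighbor_count < 1:
        raise ValueError("neighbor_count must be at least 1")
    limit = min(neighbor_count, MAX_NEIGHBORS)
    ranked = sorted(similarity, key=lambda gid: (-similarity[gid], gid))
    return ranked[:limit]


# Hypothetical scores for four public genomes.
scores = {"GCF_A": 0.91, "GCF_B": 0.97, "GCF_C": 0.42, "GCF_D": 0.88}
closest = pick_neighbors(scores, neighbor_count=2)
```

With diverse input genomes, a small count may leave some inputs without close relatives in the tree, which is why a larger value is suggested in that case.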

The primary output of this App is a species tree data object. The App report shows the resulting tree and contains links for downloading the tree in Newick format and as a PDF or PNG. Tree leaves are labeled with the NCBI RefSeq species name plus the GCF identifier. User genomes are labeled with the data object name. Branch support values are shown for each node, as described in the FastTree2 documentation.
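Once the Newick file is downloaded, the leaf labels can be pulled out with a few lines of Python. This is a minimal sketch under stated assumptions: labels are unquoted and contain no parentheses, commas, colons, or semicolons, and the tree has more than one leaf. The example tree string and its identifiers are made up, and the exact label layout in a real download may differ.

```python
import re
from typing import List


def leaf_labels(newick: str) -> List[str]:
    """Extract leaf labels from a simple, unquoted Newick string.

    A leaf label is the text that directly follows "(" or "," and ends at the
    next ":", ",", ")" or ";". Internal node labels (such as support values)
    follow ")" and are therefore skipped.
    """
    matches = re.findall(r"[(,]\s*([^(),:;]+)", newick)
    return [m.strip() for m in matches if m.strip()]


# Made-up example: two reference leaves and one user genome.
tree = (
    "((Species one GCF_000000001.1:0.01,my_genome:0.02)0.98:0.10,"
    "Species two GCF_000000002.1:0.30);"
)
labels = leaf_labels(tree)

# Heuristic split: reference leaves carry a GCF identifier, user genomes do not.
reference = [name for name in labels if "GCF_" in name]
user = [name for name in labels if "GCF_" not in name]
```

For trees with quoted labels or other Newick features, a dedicated phylogenetics library would be a safer choice than a regular expression.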

This App also creates a Genome Set object containing all the genomes in the generated species tree. You must provide a name for this output object. By default, the genomes in this Genome Set are not copied into your narrative. To make further analyses (e.g., viewing, finding domains) more convenient, you may select the Copy public genomes to your workspace option. However, this is not recommended if you set a large value for the Neighbor public genome count.

By clicking on the Genome Set object that is generated, you can edit the set of genomes, e.g., to exclude some of them from downstream analyses.

Team members who developed and deployed the algorithm in KBase: Roman Sutormin. For questions, please contact us.

The COG domains used to estimate relatedness are:
COG0012 COG0012 Predicted GTPase, probable translation factor [Translation, ribosomal structure and biogenesis].
COG0013 AlaS Alanyl-tRNA synthetase [Translation, ribosomal structure and biogenesis].
COG0016 PheS Phenylalanyl-tRNA synthetase alpha subunit [Translation, ribosomal structure and biogenesis].
COG0018 ArgS Arginyl-tRNA synthetase [Translation, ribosomal structure and biogenesis].
COG0030 KsgA Dimethyladenosine transferase (rRNA methylation) [Translation, ribosomal structure and biogenesis].
COG0041 PurE Phosphoribosylcarboxyaminoimidazole (NCAIR) mutase [Nucleotide transport and metabolism].
COG0046 PurL Phosphoribosylformylglycinamidine (FGAM) synthase, synthetase domain [Nucleotide transport and metabolism].
COG0048 RpsL Ribosomal protein S12 [Translation, ribosomal structure and biogenesis].
COG0049 RpsG Ribosomal protein S7 [Translation, ribosomal structure and biogenesis].
COG0051 RpsJ Ribosomal protein S10 [Translation, ribosomal structure and biogenesis].
COG0052 RpsB Ribosomal protein S2 [Translation, ribosomal structure and biogenesis].
COG0072 PheT Phenylalanyl-tRNA synthetase beta subunit [Translation, ribosomal structure and biogenesis].
COG0080 RplK Ribosomal protein L11 [Translation, ribosomal structure and biogenesis].
COG0081 RplA Ribosomal protein L1 [Translation, ribosomal structure and biogenesis].
COG0082 AroC Chorismate synthase [Amino acid transport and metabolism].
COG0086 RpoC DNA-directed RNA polymerase, beta' subunit/160 kD subunit [Transcription].
COG0087 RplC Ribosomal protein L3 [Translation, ribosomal structure and biogenesis].
COG0088 RplD Ribosomal protein L4 [Translation, ribosomal structure and biogenesis].
COG0089 RplW Ribosomal protein L23 [Translation, ribosomal structure and biogenesis].
COG0090 RplB Ribosomal protein L2 [Translation, ribosomal structure and biogenesis].
COG0091 RplV Ribosomal protein L22 [Translation, ribosomal structure and biogenesis].
COG0092 RpsC Ribosomal protein S3 [Translation, ribosomal structure and biogenesis].
COG0093 RplN Ribosomal protein L14 [Translation, ribosomal structure and biogenesis].
COG0094 RplE Ribosomal protein L5 [Translation, ribosomal structure and biogenesis].
COG0096 RpsH Ribosomal protein S8 [Translation, ribosomal structure and biogenesis].
COG0097 RplF Ribosomal protein L6P/L9E [Translation, ribosomal structure and biogenesis].
COG0098 RpsE Ribosomal protein S5 [Translation, ribosomal structure and biogenesis].
COG0099 RpsM Ribosomal protein S13 [Translation, ribosomal structure and biogenesis].
COG0100 RpsK Ribosomal protein S11 [Translation, ribosomal structure and biogenesis].
COG0102 RplM Ribosomal protein L13 [Translation, ribosomal structure and biogenesis].
COG0103 RpsI Ribosomal protein S9 [Translation, ribosomal structure and biogenesis].
COG0105 Ndk Nucleoside diphosphate kinase [Nucleotide transport and metabolism].
COG0126 Pgk 3-phosphoglycerate kinase [Carbohydrate transport and metabolism].
COG0127 COG0127 Xanthosine triphosphate pyrophosphatase [Nucleotide transport and metabolism].
COG0130 TruB Pseudouridine synthase [Translation, ribosomal structure and biogenesis].
COG0150 PurM Phosphoribosylaminoimidazole (AIR) synthetase [Nucleotide transport and metabolism].
COG0151 PurD Phosphoribosylamine-glycine ligase [Nucleotide transport and metabolism].
COG0164 RnhB Ribonuclease HII [DNA replication, recombination, and repair].
COG0172 SerS Seryl-tRNA synthetase [Translation, ribosomal structure and biogenesis].
COG0185 RpsS Ribosomal protein S19 [Translation, ribosomal structure and biogenesis].
COG0186 RpsQ Ribosomal protein S17 [Translation, ribosomal structure and biogenesis].
COG0215 CysS Cysteinyl-tRNA synthetase [Translation, ribosomal structure and biogenesis].
COG0244 RplJ Ribosomal protein L10 [Translation, ribosomal structure and biogenesis].
COG0256 RplR Ribosomal protein L18 [Translation, ribosomal structure and biogenesis].
COG0343 Tgt Queuine/archaeosine tRNA-ribosyltransferase [Translation, ribosomal structure and biogenesis].
COG0504 PyrG CTP synthase (UTP-ammonia lyase) [Nucleotide transport and metabolism].
COG0519 GuaA GMP synthase, PP-ATPase domain/subunit [Nucleotide transport and metabolism].
COG0532 InfB Translation initiation factor 2 (IF-2; GTPase) [Translation, ribosomal structure and biogenesis].
COG0533 QRI7 Metal-dependent proteases with possible chaperone activity [Posttranslational modification, protein turnover, chaperones].

Related Publications


App Specification:

https://github.com/kbaseapps/SpeciesTreeBuilder/tree/dce166f6d1673018a001b750c191b9a2deda0c71/ui/narrative/methods/insert_genomeset_into_species_tree

Module Commit: dce166f6d1673018a001b750c191b9a2deda0c71