Build Pangenome with OrthoMCL - v2.0
By: rsutormin

Create a Pangenome object by running OrthoMCL to construct orthologous groups from a set of Genomes.

Background Information


Orthologs are homologs separated by speciation events. Paralogs are homologs separated by duplication events. Detecting orthologs is becoming increasingly important as genome sequencing progresses rapidly.

OrthoMCL is a genome-scale algorithm for grouping orthologous protein sequences. It provides not only groups shared by two or more species/genomes, but also groups representing species-specific gene expansion families. It therefore serves as an important utility for automated eukaryotic genome annotation.

OrthoMCL starts with reciprocal best BLAST hits within each genome as potential in-paralog/recent paralog pairs, and reciprocal best hits across any two genomes as potential ortholog pairs. Related proteins are interlinked in a similarity graph. MCL (the Markov Cluster algorithm) is then invoked to split mega-clusters, a step analogous to the manual review in COG construction. Because MCL clusters on the weight between each pair of proteins, the weights are normalized before MCL is run to correct for differences in evolutionary distance.
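The cross-genome reciprocal-best-hit step and the weight normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not OrthoMCL's actual implementation: the `HITS` data, the `genome|protein` ID convention, the use of a raw score as the edge weight, and the function names are all hypothetical, and dividing by the mean weight per genome pair is a simplified stand-in for OrthoMCL's normalization.

```python
from collections import defaultdict

# Hypothetical toy hits: (query, subject, score). IDs use "genome|protein".
HITS = [
    ("A|a1", "B|b1", 200.0),
    ("B|b1", "A|a1", 190.0),
    ("A|a2", "B|b2", 80.0),
    ("B|b2", "A|a2", 100.0),
    ("A|a3", "B|b1", 50.0),  # not reciprocal: b1's best hit in A is a1
    ("A|a1", "C|c1", 60.0),
    ("C|c1", "A|a1", 60.0),
]


def genome_of(protein_id):
    """Extract the genome prefix from a 'genome|protein' ID (assumed format)."""
    return protein_id.split("|", 1)[0]


def genome_pair(p, q):
    """Order-independent key for the two genomes a protein pair spans."""
    return tuple(sorted((genome_of(p), genome_of(q))))


def reciprocal_best_hits(hits):
    """Return {(p, q): weight} for cross-genome reciprocal best hits."""
    best = {}  # (query, target genome) -> (best subject, score)
    for query, subject, score in hits:
        if genome_of(query) == genome_of(subject):
            continue  # within-genome hits (in-paralogs) are out of scope here
        key = (query, genome_of(subject))
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)

    pairs = {}
    for (query, _), (subject, score) in best.items():
        back = best.get((subject, genome_of(query)))
        if back is not None and back[0] == query:
            # Edge weight: average of the two directional scores.
            pairs[tuple(sorted((query, subject)))] = (score + back[1]) / 2.0
    return pairs


def normalize_by_genome_pair(pairs):
    """Divide each weight by the mean weight of its genome pair."""
    grouped = defaultdict(list)
    for (p, q), weight in pairs.items():
        grouped[genome_pair(p, q)].append(weight)
    means = {key: sum(ws) / len(ws) for key, ws in grouped.items()}
    return {(p, q): w / means[genome_pair(p, q)] for (p, q), w in pairs.items()}


pairs = reciprocal_best_hits(HITS)
normalized = normalize_by_genome_pair(pairs)
for pair in sorted(normalized):
    print(pair, round(normalized[pair], 3))
```

After normalization, the mean edge weight within each genome pair is 1, so distantly related genomes (with uniformly lower scores) are not penalized when the graph is clustered.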

OrthoMCL is similar to the INPARANOID algorithm, but is extended to cluster orthologs from multiple species. OrthoMCL clusters are consistent with groups identified by EGO, and an analysis using EC numbers suggests a high degree of reliability [1].

Overview of OrthoMCL Processing

In KBase, the input to OrthoMCL is a set of genomes and/or a list of individual genomes, and the output is a Pangenome object. A pangenome is the set of protein-coding genes in all the selected organisms. It includes genes present in all organisms (the core genome) and genes present in only some organisms. The advanced parameters are options for either the BLAST or the MCL portion of the code.
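To make the core versus non-core distinction concrete, here is a small sketch that splits ortholog groups by whether every selected genome is represented. The `GROUPS` dictionary, the genome names, and `split_core` are hypothetical; this is not the KBase Pangenome object schema.

```python
# Hypothetical ortholog groups: group id -> list of (genome, gene) members.
GROUPS = {
    "OG1": [("A", "a1"), ("B", "b1"), ("C", "c1")],
    "OG2": [("A", "a2"), ("B", "b2")],
    "OG3": [("C", "c7"), ("C", "c8")],  # species-specific expansion
}
GENOMES = ["A", "B", "C"]


def split_core(groups, genomes):
    """Return (core, non_core) lists of group ids.

    A group counts as core when it has at least one member in every genome.
    """
    all_genomes = set(genomes)
    core, non_core = [], []
    for group_id, members in sorted(groups.items()):
        present = {genome for genome, _gene in members}
        if present >= all_genomes:
            core.append(group_id)
        else:
            non_core.append(group_id)
    return core, non_core


core, non_core = split_core(GROUPS, GENOMES)
print("core:", core)          # core: ['OG1']
print("non-core:", non_core)  # non-core: ['OG2', 'OG3']
```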

Output


The output cell has three tabs:

In the data panel, the newly created Pangenome object can be downloaded as a tab-separated values (TSV) file or as an Excel file.

Related Publications


App Specification:

https://github.com/kbaseapps/PangenomeOrthomcl/tree/ec78927c83921ccd6ddc670725ffecc6ab3d96da/ui/narrative/methods/build_pangenome_with_orthomcl

Module Commit: ec78927c83921ccd6ddc670725ffecc6ab3d96da