Annotate Genome or Protein Sequence Set Object with Snekmer Apply.
This App runs the Apply function of Snekmer, which re-encodes amino acid sequences, splits them into kmers, and predicts protein sequence function. Apply uses prebuilt kmer count matrices from 1000 InterPro genomes for TIGRFAMs, Pfam, and PANTHER annotations.
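As a rough illustration of the recoding and kmer steps described above, the following Python sketch maps each residue to a reduced alphabet and counts overlapping kmers. The alphabet, the function names `recode` and `count_kmers`, and the choice of k = 3 are assumptions made for this example only; they are not Snekmer's actual recoding schemes, API, or default parameters.

```python
from collections import Counter

# Hypothetical reduced alphabet for illustration only (not a Snekmer scheme):
# hydrophobic -> "H", polar -> "P", charged -> "C", special -> "S".
ILLUSTRATIVE_ALPHABET = {
    **dict.fromkeys("AVLIMFWY", "H"),
    **dict.fromkeys("STNQ", "P"),
    **dict.fromkeys("DEKRH", "C"),
    **dict.fromkeys("GPC", "S"),
}


def recode(sequence, mapping=ILLUSTRATIVE_ALPHABET):
    """Replace each residue with its group symbol; unknown residues become 'X'."""
    return "".join(mapping.get(residue, "X") for residue in sequence.upper())


def count_kmers(sequence, k=3):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    recoded = recode("MKVLAAGIVG")
    print(recoded)
    print(count_kmers(recoded, k=3))
```

Recoding to a smaller alphabet shrinks the space of possible kmers, which keeps the count vectors compact.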
The kmerized protein sequences are compared to each family in the prebuilt matrices using cosine similarity to find the most likely protein function. Cosine similarity treats each object as a vector in N dimensions, where N is the number of kmers, and measures the angle between two such vectors; a smaller angle means greater similarity. It is commonly used in text analysis.
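The comparison described above can be sketched in a few lines of Python. This is a minimal sketch, not Snekmer's implementation: kmer counts are held in plain dictionaries, and the names `cosine_similarity`, `best_family`, `family_A`, and `family_B` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse kmer count vectors (dicts)."""
    dot = sum(count * b.get(kmer, 0) for kmer, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction, so report no similarity
    return dot / (norm_a * norm_b)


def best_family(query, families):
    """Return the (name, score) of the family most similar to the query."""
    scores = {name: cosine_similarity(query, counts)
              for name, counts in families.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical family profiles; real profiles would come from the prebuilt matrices.
    families = {
        "family_A": {"HCH": 4, "CHC": 2},
        "family_B": {"PPP": 3},
    }
    query = {"HCH": 2, "CHC": 1}
    print(best_family(query, families))
```

Because cosine similarity depends only on the direction of the vectors, a short sequence and a long one with the same kmer proportions score as identical.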
The prebuilt kmer count matrices were created using Snekmer Learn, a feature not yet available in KBase.
Snekmer Apply Key Features:
- Rapid Annotation: Using the prebuilt kmer count matrices described above, Snekmer Apply uses cosine similarity to predict the annotation ontology.
- Efficient Data Handling: Optimized to manage and analyze Genome and ProteinSequenceSet Objects effectively.
- Integration of Comprehensive Databases: Incorporates annotations from TIGRFAMs, Pfam, and PANTHER, covering a wide range of genomic features.
- Seamless Workflow Integration: Designed to complement and integrate with other genomic analysis tools available in KBase, enhancing overall research capabilities.
- Detailed Output: While updating object ontologies, this tool also generates a secondary output with detailed results of prediction confidence.
The output of Snekmer Apply is an updated object with new protein sequence / gene ontologies. The secondary output provides predictions and confidence levels which may be directly downloaded.
Related Publications
- Chang CH, Nelson WC, Jerger A, Wright AT, Egbert RG, McDermott JE. Snekmer: a scalable pipeline for protein sequence fingerprinting based on amino acid recoding. Bioinform Adv. 2023 Feb 2;3(1):vbad005. doi: 10.1093/bioadv/vbad005. PMID: 36789294; PMCID: PMC9913046. https://pubmed.ncbi.nlm.nih.gov/36789294/
App Specification:
https://github.com/jjacobson95/KbaseSnekmerLA/tree/5e6a3f106099b96eb0f1a41265175a5249e55b94/ui/narrative/methods/run_SnekmerLearnApplyModule Commit: 5e6a3f106099b96eb0f1a41265175a5249e55b94