Estimate K for K-Means Clustering

KBaseFeatureValues

v.0.0.22

By: rsutormin, msneddon, psnovichkov, marcin, srividya22

Compute reasonable values of K for use in K-means clustering.

This App generates reasonable numbers of clusters (K) for use in the Cluster Expression Data - K-Means App. Reasonable values of K are computed by minimizing both the number of clusters (K) and the average variance between the clusters. Begin by selecting or importing an expression dataset to analyze using the Add Data button. Next, provide a name for the output estimate, enter a valid maximum number of clusters, and select a criterion for estimating K. Run the App to generate a graph whose peaks mark the values of K that the algorithm considers reasonable.
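The idea of scoring each candidate K and looking for the peak can be illustrated with a small, dependency-free Python sketch. This is not the App's implementation: the App is based on the fpc package for R, and this page does not say which criterion it applies. The sketch assumes average silhouette width as the criterion, uses a plain k-means with a deterministic farthest-point start, and assumes the points are distinct. The names `kmeans`, `average_silhouette`, and `estimate_k` are hypothetical.

```python
import math

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def average_silhouette(points, labels):
    """Mean silhouette width over all points (singleton clusters score 0)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

def estimate_k(points, max_k):
    """Score every K from 2 to max_k and return the best K with all scores."""
    scores = {k: average_silhouette(points, kmeans(points, k))
              for k in range(2, max_k + 1)}
    return max(scores, key=scores.get), scores

# Toy data (made up for illustration): three tight groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (0, 10), (0, 11), (1, 10), (1, 11)]
best_k, scores = estimate_k(points, max_k=5)
print(best_k, scores)
```

On this toy data the score peaks at K = 3, which matches the three groups the points were drawn from. Real expression data rarely separates this cleanly, so several values of K with similar scores may all be worth trying.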

The input is a .tsv file with "gene-id" in cell A1, the gene IDs in column A, the sample/condition identifiers in the first row, and, in the remaining cells, the expression value for the corresponding gene ID and sample. For a comprehensive guide to formatting your expression data for import into KBase, see the Data Upload/Download Guide.
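As an illustration of the layout described above, the following Python sketch reads a small tab-separated matrix and checks its shape before upload. It is only a local sanity check, not part of KBase: the function name `parse_expression_tsv`, the example gene and condition names, and the strict checks are assumptions made for this example.

```python
import csv
import io

def parse_expression_tsv(text):
    """Parse a tab-separated expression matrix.

    Returns (sample_ids, matrix), where matrix maps each gene ID to its
    list of expression values, one per sample.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    if header[0] != "gene-id":
        raise ValueError("cell A1 must contain 'gene-id'")
    sample_ids = header[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene_id, values = row[0], row[1:]
        if len(values) != len(sample_ids):
            raise ValueError(
                f"row {gene_id!r} has {len(values)} values, expected {len(sample_ids)}"
            )
        matrix[gene_id] = [float(v) for v in values]
    return sample_ids, matrix

# Made-up example: two genes measured under two conditions.
example = "gene-id\tcond1\tcond2\ngeneA\t1.5\t-0.3\ngeneB\t0.0\t2.25\n"
samples, matrix = parse_expression_tsv(example)
print(samples)
print(matrix)
```

A row with a missing or non-numeric value raises a `ValueError` here, which makes formatting problems visible before the file is imported.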

This App is based on the fpc package for R.

NOTE: This App is one of the steps in the Transcriptomics and Expression Analysis Workflow in KBase; however, it can also be run as a standalone App.

Team members who developed and deployed the algorithm in KBase: Paramvir Dehal, Roman Sutormin, Michael Sneddon, Srividya Ramakrishnan, Pavel Novichkov, Keith Keller. For questions, please contact us.

Related Publications

Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163, https://www.nature.com/articles/nbt.4163

App Specification:

https://github.com/kbaseapps/FeatureValues/tree/6cdc50905a08883a53333c073abe1e1df7b3f97f/ui/narrative/methods/expression_toolkit_estimate_k

Module Commit: 6cdc50905a08883a53333c073abe1e1df7b3f97f