Cluster Expression Data - K-Means


Perform K-means clustering to group expression data for observing and analyzing patterns of gene expression.

This App lets users observe and analyze patterns of gene expression by grouping expression data with K-means clustering. K-means clustering is useful for discovering functionally related sets of genes, investigating regulatory networks for gene expression, and inferring unknown gene functions from the way their expression patterns group across different conditions.

Begin by using the Add Data button to select or import both the expression dataset to analyze and the genome associated with it. Next, specify a value for K, the number of clusters; the Estimate K for K-Means Clustering App can help determine an optimal value. Then provide a name for the output set of clusters. Finally, set the number of starts and iterations, select the K-means clustering algorithm to use for the analysis, and enter a random seed value.
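The settings above (K, number of starts, number of iterations, random seed) correspond to the usual multi-start K-means procedure. The following is a minimal pure-Python sketch of that procedure, assuming Lloyd's algorithm with squared Euclidean distance. It is not the App's implementation, which is built on the amap package for R and offers a choice of algorithms. The names `kmeans`, `_lloyd`, `n_starts`, `max_iter`, and `seed` are hypothetical stand-ins for the App's settings.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _lloyd(data: List[Vector], k: int, max_iter: int,
           rng: random.Random) -> Tuple[List[int], float]:
    """One start of Lloyd's algorithm from k randomly chosen genes as centers.

    Returns (cluster label for each gene, total within-cluster sum of squares).
    """
    centers = [list(p) for p in rng.sample(data, k)]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each gene joins the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c]))
                      for p in data]
        if new_labels == labels:
            break  # converged: no gene changed cluster
        labels = new_labels
        # Update step: move each center to the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(_sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
    return labels, wss


def kmeans(data: List[Vector], k: int, n_starts: int = 10, max_iter: int = 20,
           seed: Optional[int] = None) -> Tuple[List[int], float]:
    """Run n_starts independent starts and keep the lowest-WSS clustering."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of genes")
    if n_starts < 1 or max_iter < 1:
        raise ValueError("n_starts and max_iter must be at least 1")
    rng = random.Random(seed)  # one seeded generator makes runs reproducible
    best: Optional[Tuple[List[int], float]] = None
    for _ in range(n_starts):
        labels, wss = _lloyd(data, k, max_iter, rng)
        if best is None or wss < best[1]:
            best = (labels, wss)
    return best


# Toy data: rows are genes, columns are conditions.
profiles = [
    [1.0, 1.1, 0.9],  # gene A
    [1.2, 1.0, 1.0],  # gene B
    [8.0, 8.2, 7.9],  # gene C
    [7.9, 8.1, 8.0],  # gene D
]
labels, wss = kmeans(profiles, k=2, n_starts=5, max_iter=10, seed=42)
print(labels, round(wss, 3))
```

Each start draws its initial centers from the same seeded generator, and the start with the lowest total within-cluster sum of squares is kept. Several starts reduce the risk of settling in a poor local optimum, and a fixed seed makes a run repeatable.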

The input is a .tsv (tab-separated values) file in which cell A1 contains "gene-id", column A lists the gene IDs, the first row lists the sample/condition identifiers, and the remaining cells hold the expression value for each gene and sample. For a comprehensive guide to formatting expression data for import into KBase, see the Data Upload/Download Guide.
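To make the expected layout concrete, here is a small parser that checks a file against the description above and loads it into a gene-to-values mapping. It is a hypothetical helper (`read_expression_tsv`, `cond_1`, and `gene_A` are illustrative names only). KBase's own importer may apply other or stricter checks, so the Data Upload/Download Guide remains the authority on the format.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_expression_tsv(text: str) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse a gene-by-sample expression matrix (hypothetical helper).

    Cell A1 must be "gene-id", row 1 holds sample/condition IDs,
    column A holds gene IDs, and every other cell is an expression value.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if not rows:
        raise ValueError("file is empty")
    header = rows[0]
    if header[0] != "gene-id":
        raise ValueError(f'cell A1 must be "gene-id", found {header[0]!r}')
    samples = header[1:]
    matrix: Dict[str, List[float]] = {}
    for line_no, row in enumerate(rows[1:], start=2):
        gene, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"row {line_no}: expected {len(samples)} values, got {len(values)}")
        if gene in matrix:
            raise ValueError(f"row {line_no}: duplicate gene ID {gene!r}")
        matrix[gene] = [float(v) for v in values]  # non-numeric cells raise
    return samples, matrix


example = (
    "gene-id\tcond_1\tcond_2\tcond_3\n"
    "gene_A\t1.0\t1.1\t0.9\n"
    "gene_B\t8.0\t8.2\t7.9\n"
)
samples, matrix = read_expression_tsv(example)
print(samples)           # ['cond_1', 'cond_2', 'cond_3']
print(matrix["gene_B"])  # [8.0, 8.2, 7.9]
```

The rows of `matrix` are per-gene expression profiles, the kind of vectors that K-means groups into clusters.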

Description of K-means clustering algorithms:

This App is based on the amap package for R.

NOTE: This App is one of the steps in the Transcriptomics and Expression Analysis Workflow in KBase; however, it can also be run as a standalone App.

Team members who implemented the algorithm in KBase: Paramvir Dehal, Roman Sutormin, Michael Sneddon, Srividya Ramakrishnan, Pavel Novichkov, Keith Keller. For questions, please contact us.


App Specification:

Module Commit: 6cdc50905a08883a53333c073abe1e1df7b3f97f