Data Summary

KBase provides users with a single comprehensive resource for analyzing a wide range of public bioinformatics data together with the data generated from their own experiments. The KBase data model integrates diverse biological data types and describes the relationships between different data components.

The data in KBase ranges from thousands of genomes and metagenomes and their annotations, to expression and protein-protein interaction data, to inferred models of organismal and community metabolism and gene regulation, and even to geographical information about populations. The table below lists the total number of data objects of each type, whether imported from public repositories or created by KBase analysis pipelines, in the combined KBase datastores as of February 20, 2016.

Please see this page for KBase’s Data Policy and the sources of our public reference data.

In the table below, clicking a searchable data type opens the search page for that data type. We will be releasing a new version of data search soon, with an expanded set of searchable types.

Category            Data Type                                     Total number
Genomes             Plant                                         60
                    Bacterial                                     26,862
                    Archaeal                                      504
                    Other Eukaryotic                              40
                    Viruses                                       858
                    Total Genomes                                 28,372
                    Variation Datasets                            196
Annotation Classes  Genome Features                               111,557,369
                    Subsystem Variants                            8,372
                    Phylogenetic Trees                            55,096
                    Ontological Aliases and Synonyms              132,383,718
Biochemistry        Biochemical Species (Compounds)               27,838
                    Reactions                                     33,773
                    Metabolic Maps                                75
                    Metabolic Pathways                            1,470
Functional Data     Expression Series                             625
                    Expression Samples                            8,420
                    Expression Platforms                          38
                    Expression Replicate Groups                   553
                    Phenotype Data Sets                           463
                    Protein-Protein Interaction Network Datasets  13
                    Pairwise Protein-Protein Interactions         231,220
                    Fitness/Growth Datasets                       370
Derived Data        Correlation Networks                          245
                    Co-expressed Gene Pairs                       93,018,189
                    Expression Data Biclusters                    382
                    Fitness Data Biclusters                       114
                    Metagenomic Profiles                          41,511
Models              Metabolic Models                              23,324
                    Regulons                                      6,538