Data Type Descriptions

KBase data types cover a wide range of data relevant to systems biology research, including genomes and their annotations, metagenomes, expression and protein-protein interaction data, inferred models of organismal and community metabolism and gene regulation, and even geographical information about populations. The Data Summary page lists the key data types in KBase.

By “data type”, we mean the internal KBase representation of a particular sort of biological data. The list of data types in this section of the Data Guide uses the internal KBase nomenclature for the data type names, and describes what each of these types includes in the KBase database.


  • SingleEndLibrary — Sequence data read from one end of each DNA fragment
  • PairedEndLibrary — Sequence data read from both ends of each DNA fragment
  • ReferenceAssembly — Curated and non-redundant sequence data from RefSeq, the National Center for Biotechnology Information’s (NCBI) Reference Sequence Database


  • Assembly (previously called ContigSet) — A collection of contiguous DNA sequences that each represent a consensus sequence
  • Genome — A set of known structural and functional annotations for contiguous DNA sequences that represent an organism
  • GenomeSet — Collection of genomes from multiple organisms
  • Pangenome — Data representing unified gene sets of multiple organisms from a GenomeSet
  • Transcriptome — Set of both coding and non-coding RNA present in an organism within a specific environment
  • ProteomeComparison — Mapping of corresponding proteins between two genomes
  • Tree — Species tree representing evolutionary relationships between organisms


  • Metagenome — Collection of genes and genomes present in an environmental sample
  • Collection — A list of metagenomes to be used in comparative analysis
  • TaxonomicMatrix — Table of abundance data for various organisms in an environmental sample
  • FunctionalMatrix — Table of abundance data for proteins and other functional biological expressions in an environmental sample

Flux Balance Analysis (FBA) Modeling

  • FBAModel — Contains the reactions, compounds, compartments, biomass reactions, and gene associations that comprise a metabolic model for an organism
  • Media — A series of compounds with information about their concentration and flux parameters
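
To make the two descriptions above more concrete, here is a minimal Python sketch of how an FBAModel and a Media object could be represented in memory. It is not the actual KBase schema: every class name, field name, identifier, and unit below is a hypothetical chosen for illustration, based only on the components listed in the descriptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MediaCompound:
    """One compound in a growth medium (hypothetical fields and units)."""
    compound_id: str
    concentration: float  # assumed unit for illustration only
    min_flux: float       # lower bound on the compound's flux
    max_flux: float       # upper bound on the compound's flux


@dataclass
class Media:
    """A series of compounds with concentration and flux parameters."""
    name: str
    compounds: List[MediaCompound] = field(default_factory=list)


@dataclass
class Reaction:
    """A reaction with its stoichiometry and associated genes."""
    reaction_id: str
    stoichiometry: Dict[str, float]  # compound id -> coefficient
    compartment: str
    genes: List[str] = field(default_factory=list)


@dataclass
class FBAModel:
    """Reactions, compounds, compartments, biomass reactions, gene associations."""
    model_id: str
    reactions: List[Reaction] = field(default_factory=list)
    compounds: List[str] = field(default_factory=list)
    compartments: List[str] = field(default_factory=list)
    biomass_reactions: List[str] = field(default_factory=list)


def genes_in_model(model: FBAModel) -> List[str]:
    """Return the sorted, de-duplicated genes associated with any reaction."""
    return sorted({gene for rxn in model.reactions for gene in rxn.genes})
```

In this sketch the gene associations live on each reaction, so a helper such as `genes_in_model` can recover the set of genes a model depends on. A real model object would carry considerably more metadata.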


  • PhenotypeSet — Contains information about experimentally measured growth phenotype data
  • PhenotypeSimulationSet — List of the growth phenotypes included in the phenotype set, along with growth rates predicted by the specified metabolic model
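
The relationship between the two phenotype types can be sketched as follows: a phenotype set holds measured growth, and a simulation set pairs each of those phenotypes with a model-predicted growth rate. This is an illustrative Python sketch only. The class names, fields, and the `predict` callback are hypothetical and do not reflect the KBase implementation or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Phenotype:
    """One experimentally measured growth phenotype (hypothetical fields)."""
    media_name: str
    knocked_out_genes: Tuple[str, ...]
    observed_growth: float


@dataclass(frozen=True)
class PhenotypeSimulation:
    """A measured phenotype paired with a model-predicted growth rate."""
    phenotype: Phenotype
    predicted_growth: float


def simulate(
    phenotypes: List[Phenotype],
    predict: Callable[[Phenotype], float],
) -> List[PhenotypeSimulation]:
    """Build a simulation set by predicting growth for every phenotype.

    `predict` stands in for a metabolic model; it is an assumed callback.
    """
    return [PhenotypeSimulation(p, predict(p)) for p in phenotypes]


def agreement(simulations: List[PhenotypeSimulation]) -> float:
    """Fraction of phenotypes where observed and predicted growth agree
    on growth versus no growth (growth meaning a rate above zero)."""
    if not simulations:
        raise ValueError("simulation set is empty")
    matches = sum(
        (s.phenotype.observed_growth > 0) == (s.predicted_growth > 0)
        for s in simulations
    )
    return matches / len(simulations)
```

Comparing growth versus no growth is one simple way to summarize how well predictions match measurements. Other comparisons, such as correlating the numeric growth rates, would work on the same structure.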