Workspace typed objects

The Workspace Service (WSS) provides storage, sharing, versioning, validation and provenance tracking of typed object (TO) data. This document covers the basics for developers who need to define and register TOs for use with the WSS.

Typed object basics

TOs in the WSS are hierarchical data objects that conform to type definitions specified in the KBase Interface Description Language (KIDL). Just as KIDL is used to specify the structure of data exchanged between KBase clients and servers generated by the Type Compiler, it is also used to define the structure of data stored in the WSS.

Any structure defined in a KIDL-formatted file (e.g. typedef structure { } StructureName;) can be registered with the WSS (see Typed object registration & versioning). Instances of these objects can then be saved to the WSS by any user. The WSS does not support direct storage of primitive or basic container types (i.e. string, int, float, list, mapping).

KIDL-defined modules provide namespacing for typed objects. Thus, both the module name and the type name are required to uniquely identify a type in the WSS, generally in the format ModuleName.TypeName.

Typed object registration & versioning

TO definitions must be registered with the WSS before instances of the TOs can be saved. The basic process for registering a TO is:

  • Developer requests ownership of a module name via the Workspace API
    • see API method request_module_ownership(...)
  • WSS admin approves the request
  • Developer uploads (i.e. registers) a type specification file (typespec) in KIDL format. The module name in the typespec must be identical to the just-approved module name in the WSS, and the upload must indicate the names of the TOs that the developer wants the WSS to support
    • see API method register_typespec(...)
  • Developer releases the module, which releases the latest version of all TO definitions in the module
    • see API method release_module(...)
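
The order of the calls above can be sketched as follows. This is only an illustration: FakeWorkspaceClient is a hypothetical stand-in that records calls instead of contacting the WSS, and the parameter shapes (a bare module name, and a dictionary with "spec" and "new_types" keys) are assumptions that should be checked against the Workspace API documentation.

```python
class FakeWorkspaceClient:
    """Hypothetical stand-in that records calls instead of contacting the WSS."""

    def __init__(self):
        self.calls = []

    def request_module_ownership(self, module_name):
        self.calls.append(("request_module_ownership", module_name))

    def register_typespec(self, params):
        self.calls.append(("register_typespec", params))

    def release_module(self, module_name):
        self.calls.append(("release_module", module_name))


# The module name in the typespec matches the module name being requested.
TYPESPEC = """
module MyModule {
    typedef structure {
        string name;
        list<int> results;
    } MyExperimentData;
};
"""


def register_and_release(client, module_name, typespec, type_names):
    # Step 1: ask for ownership of the module name.
    client.request_module_ownership(module_name)
    # Step 2 happens outside this code: a WSS admin approves the request.
    # Step 3: upload the typespec and name the TOs the WSS should support.
    # The dictionary keys below are assumed, not confirmed by this document.
    client.register_typespec({"spec": typespec, "new_types": type_names})
    # Step 4: release the latest version of all TO definitions in the module.
    client.release_module(module_name)


client = FakeWorkspaceClient()
register_and_release(client, "MyModule", TYPESPEC, ["MyExperimentData"])
```

With a real client, the register_typespec call would fail until the ownership request has been approved, so the steps cannot simply be run back to back.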

TO definitions marked for WSS usage are versioned with a major and minor version. Every time a new typespec is uploaded and registered, the TO definitions defined in the module automatically receive a new version number if changed. Minor versions are incremented if the change is backwards compatible (e.g. addition of a new optional field). Major versions are incremented if the change is not backwards compatible.

All versions of all registered TO definitions are available to WSS users, but to save an object instance of an old version, or an unreleased version, the user must provide the exact version number. If a WSS user saves an object instance without providing a version number for the type, the latest released version of the TO definition is assumed. Releasing a module therefore indicates that the latest version of every typed object definition in the module is ready for public use, but does not limit the ability of users or developers to work with old or pre-release versions of TOs.

Before the first release of a module, repeated uploads of a module result in version numbers of TO definitions of 0.x and are assumed to be backwards incompatible. On first release of a module, all version numbers of TO definitions are updated to 1.0.
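
The versioning rules described in this section can be summarized in a short sketch. It is not the WSS implementation. The function names are hypothetical, and two details are assumptions not stated above: that a major version bump resets the minor version to 0, and that releases after the first leave version numbers unchanged.

```python
def next_version(current, backwards_compatible, released_before):
    """Return the (major, minor) version a changed TO definition would receive."""
    major, minor = current
    if not released_before:
        # Before the first release, every changed upload yields another 0.x version.
        return (0, minor + 1)
    if backwards_compatible:
        return (major, minor + 1)
    # Assumption: the minor version resets to 0 on a major version bump.
    return (major + 1, 0)


def release(current):
    """Return the version after releasing the module."""
    major, _minor = current
    # The first release moves 0.x definitions to 1.0.
    # Assumption: later releases keep the existing version number.
    return (1, 0) if major == 0 else current


version = (0, 1)
version = next_version(version, backwards_compatible=True, released_before=False)
version = release(version)
version = next_version(version, backwards_compatible=True, released_before=True)
version = next_version(version, backwards_compatible=False, released_before=True)
```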

Users and developers can use the ws-typespec-list script or the API to list registered modules, type definitions, and versions of type definitions, and to retrieve the actual KIDL or JSON schema encoding of the typed object definition. End users will only be able to view the versions of TOs that are released. Owners of a module can list all versions of TOs in modules that they own.

Typed object validation

Instances of TOs can be validated against type definitions registered with the WSS. Instances of TOs must pass this validation process to be stored in the WSS, thereby guaranteeing that WSS data is structurally valid.

Todo

Update this document to use the kb-sdk tools.

The WSS validates the TO instance against a JSON Schema V4 encoding of the TO definition. The JSON Schema encoding can be generated by the KBase Type Compiler (currently in branch dev-prototypes). In addition to matching the structure and type of data, additional constraints can be placed on TO validation through the use of Annotations (see Typed object annotations).

To generate JSON encodings of your TOs for review, check out the dev-prototypes branch of the typecompiler and compile your typespec file with the --jsonschema option of the compile_typespec command. The JSON Schema encoding of each object definition is generated in the output location in a directory called jsonschema. The JSON Schema encoding is also available for all registered TO definitions via the WSS API or the ws-typespec-list command.

All TO instances pulled from the WSS are guaranteed to be valid instances of a TO definition as registered with the WSS. Therefore it is recommended that KBase services which require rigorous validation of complex data operate on data stored in the WSS (as opposed to passing the object by value and writing the validation code yourself). Note that full validation is not built into generated KBase client/server code, so it is not safe to assume that input data received directly from a type compiler generated client conforms to the specified type definitions in your API.

Additional technical details: The TO validation code is written in Java and is available in the workspace_deluxe KBase repo.

Typed object annotations

Annotations provide an infrastructure for attaching structured metadata to type definitions (and eventually to functions and modules). Such metadata is useful for specifying additional constraints on data types, interpreting data types within a particular context, and declaring structured information that can later be automatically indexed or searched, such as authorship of a function implementation.

Annotations are declared in the comment immediately preceding the definition of the TO. Thus, all annotations are always attached and viewable within the API documentation. Each annotation must be specified on its own line in the following format:

@[ANNOTATION] [INFO]

where [ANNOTATION] is the name of the annotation and [INFO] is any additional information the annotation requires. As a simple example, the following associates authorship information with a TO using the @author annotation:

/*
  Data type for my experimental data.
  @author John Scientist
*/
typedef structure {
    string name;
    list <int> results;
} MyExperimentData;
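
To illustrate the one-annotation-per-line format, the sketch below pulls annotation lines out of a comment body with a regular expression. It is only an approximation of the format described above and is not the parser used by the Type Compiler or the WSS. The names ANNOTATION_LINE and parse_annotations are hypothetical.

```python
import re

# One annotation per line: "@" + name, then optional free-form information.
ANNOTATION_LINE = re.compile(r"^\s*@(\w+)[ \t]*(.*?)\s*$")


def parse_annotations(comment):
    """Return (annotation, info) pairs found in the text of a comment."""
    found = []
    for line in comment.splitlines():
        match = ANNOTATION_LINE.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


comment = """
  Data type for my experimental data.
  @author John Scientist
"""
annotations = parse_annotations(comment)
```

Here annotations holds a single pair: the annotation name "author" and the information "John Scientist". Lines that do not start with "@" are treated as ordinary documentation text and skipped.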

Currently supported type definition annotations

Optional annotation

Mark a specific field of a structure as an optional field. The optional annotation can only be declared where a structure is first defined. On validation of TO instances by the WSS, missing optional fields are permitted. If an optional field is present, however, the value of the field will be validated normally. Optional fields are defined as:

@optional [FIELD_NAME_1] [FIELD_NAME_2] ...

For example, the following annotation declares that two fields of the structure are optional:

/*
  @optional alias functional_assignments
*/
typedef structure {
    string name;
    string alias;
    string sequence;
    list <string> functional_assignments;
} Feature;
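
The effect of @optional on validation can be sketched with a small hand-written check for the Feature structure above. This is only an illustration of the rule that missing optional fields are accepted while present ones are still validated. The WSS itself validates against a JSON Schema encoding, and the names below are hypothetical. The sketch ignores unexpected extra fields, which is a simplification.

```python
def _is_str(value):
    return isinstance(value, str)


def _is_str_list(value):
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


# Field name -> check mirroring the KIDL field type in the Feature structure.
FEATURE_FIELDS = {
    "name": _is_str,
    "alias": _is_str,
    "sequence": _is_str,
    "functional_assignments": _is_str_list,
}
# Fields listed in the @optional annotation.
FEATURE_OPTIONAL = {"alias", "functional_assignments"}


def validate_feature(instance):
    """Return a list of error messages; an empty list means the instance passed."""
    errors = []
    for field, check in FEATURE_FIELDS.items():
        if field not in instance:
            # Only required fields cause an error when missing.
            if field not in FEATURE_OPTIONAL:
                errors.append(f"missing required field: {field}")
            continue
        # A field that is present is validated whether or not it is optional.
        if not check(instance[field]):
            errors.append(f"wrong type for field: {field}")
    return errors


ok = validate_feature({"name": "f1", "sequence": "ACGT"})
bad_optional = validate_feature({"name": "f1", "sequence": "ACGT", "alias": 5})
```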

ID annotations

Mark a string as an ID that references another object or entity. ID annotations can only be associated to type definitions which resolve to a string. ID annotations are declared in the general form:

@id [ID_TYPE] [PARAMETERS]

where [ID_TYPE] specifies the type of ID and is required, and [PARAMETERS] provides additional information or constraints. [PARAMETERS] are always optional.

ID annotations are inherited when declaring a new typedef of a string that was already marked as an ID. If a new ID Annotation is declared in a typedef, it overrides any previous ID declaration.

Note that although @id annotations may be specified as any ID_TYPE and associated to any typedef, applications that consume type specifications (primarily the workspace at the time of writing) may only recognize specific @id ID_TYPE / typedef combinations.

The ID types currently supported are described below.

Workspace ID

@id ws [TYPEDEF_NAME] ...

The ID must reference a TO instance stored in the WSS. There are multiple valid ways to specify a workspace object, and all are acceptable. A reference path into the object graph may be given as a string consisting of a list of references separated by semicolons.

Optionally, one or more type definition names can be specified indicating that the ID must point to a TO instance that is one of the specified types. The typedef with which the @id annotation is associated must be a string.

Example:

/*
   A reference to a genome.
   @id ws KB.MicrobialGenome KB.PlantGenome
*/
typedef string genome_id;

KBase ID

@id kb

This annotation originally specified that the string must be a KBase ID which was typically registered in the ID service in a format such as “kb|type.XXX”. The ID server is no longer used in KBase and this field doesn’t have any particular meaning at this point.

No type checking on this field is performed, but the annotation may be used in the future so that users can automatically extract KBase IDs from typed objects.

Handle ID

@id handle

The ID must reference a Handle ID from the Handle Service. This is typically in the format KBH_XXX. When saving an object containing one or more handles to the WSS, the WSS checks that the handles are owned by the user before completing the save. Furthermore, the handle data is shared as the workspace object is shared. See Shock integration with the workspace service for more details.

Shock ID

@id bytestream

The ID must reference a Shock node that exists in the Shock instance configured for linking Shock nodes to WSS objects. When saving an object containing one or more Shock nodes to the WSS, the WSS checks that the nodes are owned by the user or owned by the workspace and readable by the user and (if necessary) takes ownership of the nodes. Furthermore, the nodes are shared as the workspace object is shared. See Shock integration with the workspace service for more details.

Sample ID

@id sample

The ID must reference a Sample service sample. When saving an object containing one or more sample IDs to the WSS, the WSS checks that the samples are administrated by the user. Furthermore, the samples are shared as the workspace object is shared. See Sample service integration with the workspace service for more details.

External ID

@id external [SOURCE] ...

The ID must reference an entity in an external (i.e. outside of KBase) data store. The IDs are not verified or validated, but may be used in the future to allow users to automatically extract external IDs from typed objects. [SOURCE] provides an optional way to specify the external source. Currently there is no standard dictionary of sources.

Deprecated annotation

@deprecated [REPLACEMENT_TYPE]

The deprecated annotation is used to mark a type definition as deprecated, and provides a structured mechanism for indicating a replacement type if one exists. The deprecated annotation so far is only for documentation purposes, but may be used by the Workspace in the future to better display, list, or query workspace objects (e.g. list all objects of a type that is not deprecated).

Range annotation

@range [RANGE SPECIFICATION]

The range annotation is associated with either a float or int typedef and specifies the minimum and / or maximum value of the int or float. The [RANGE SPECIFICATION] is a tuple of the minimum and maximum numbers, separated by a comma. Omit the minimum or maximum to specify an infinite negative or positive range, respectively. Bracketing the [RANGE SPECIFICATION] with parentheses indicates the range extents are exclusive; square brackets or no brackets indicates an inclusive range.

Examples:

Range      Explanation
0, 30      Range from 0 to 30, inclusive
[0, 30]    Range from 0 to 30, inclusive
[0, 30)    Range from 0 to 30, including 0, excluding 30
(0,        Range from 0 to +inf, excluding 0
,30]       Range from -inf to 30, including 30

Example specification:

/*
   @range -4.5,7.6)
*/
typedef float my_float;

/*
   @range [2,10]
*/
typedef int my_int;
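
The bracket rules for range specifications can be made concrete with a small sketch that parses a specification and tests a value against it. It follows the rules as described in this section and is not the WSS validation code. The function names parse_range and in_range are hypothetical.

```python
import math


def parse_range(spec):
    """Return (minimum, maximum, min_exclusive, max_exclusive) for a range spec."""
    text = spec.strip()
    # Parentheses mean exclusive; square brackets or no bracket mean inclusive.
    min_exclusive = text.startswith("(")
    max_exclusive = text.endswith(")")
    text = text.lstrip("[(").rstrip("])")
    low, sep, high = text.partition(",")
    if not sep:
        raise ValueError(f"range specification needs a comma: {spec!r}")
    # An omitted bound means the range is unbounded on that side.
    minimum = float(low) if low.strip() else -math.inf
    maximum = float(high) if high.strip() else math.inf
    return minimum, maximum, min_exclusive, max_exclusive


def in_range(value, spec):
    """Check whether value satisfies the range specification."""
    minimum, maximum, min_exclusive, max_exclusive = parse_range(spec)
    above = value > minimum if min_exclusive else value >= minimum
    below = value < maximum if max_exclusive else value <= maximum
    return above and below


checks = [
    in_range(30, "0, 30"),    # inclusive upper bound
    in_range(30, "[0, 30)"),  # exclusive upper bound
    in_range(0, "(0,"),       # exclusive lower bound, no upper bound
    in_range(-100, ",30]"),   # no lower bound
]
```

The sketch converts bounds to float for both int and float typedefs, which is a simplification that loses precision for very large integers.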

Metadata annotation

@metadata [CONTEXT] [ACTION] [as NAME]

The metadata annotation specifies data that an application should extract from a TO as metadata about the TO. Typically this metadata is very small compared to the TO and is therefore suitable for use when only a summary of the TO is necessary for an operation. As of this writing, the WSS uses the annotation to automatically generate user metadata for a TO.

The metadata annotation may only be associated with structure typedefs. Metadata annotations on nested structures are ignored.

[CONTEXT] specifies where the metadata annotation is applicable. In the case of the WSS, the [CONTEXT] is ws. [CONTEXT] is always required.

[ACTION] specifies what metadata should be extracted and any operations to perform on said metadata. At minimum, the [ACTION] must provide the path (dot separated) to the item of interest. Note that the path may only proceed through structure typedefs, not mappings or lists. A bare path must terminate at a primitive type: a string, int, or float.

[ACTION]s may also specify a function to apply to the item specified by the path. Currently, the only available function is length(), which may be applied to lists, mappings, tuples, and strings. length() returns the number of items in a list, mapping, or tuple, or the length of a string.

[as NAME] allows specifying an optional NAME for the extracted metadata. If a NAME is not provided, the application will use the [ACTION] string as the metadata name. The NAME is the entirety of the remainder of the line after “as”.

Example:

/* Nested structure, metadata annotations have no effect here
   Cannot provide a path into the mapping in a metadata annotation
*/
typedef structure {
    mapping<string, string> strmap;
    int an_int;
} InnerStruct;

/*
   Specifies the metadata ("str" -> value of str in TO)
   @metadata ws str

   Specifies the metadata ("my rad string" -> value of str in TO)
   @metadata ws str as my rad string

   Specifies the metadata ("inner.an_int" -> value of inner.an_int in TO)
   @metadata ws inner.an_int

   Specifies the metadata ("length(str)" -> length of str in TO)
   @metadata ws length(str)

   Specifies the metadata ("num strings" -> # of items in inner.strmap)
   @metadata ws length(inner.strmap) as num strings

   Note that metadata paths cannot enter outerstrmap.
*/
typedef structure {
    InnerStruct inner;
    string str;
    mapping<string, string> outerstrmap;
} MyStruct;
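
The example annotations above can be traced with a short sketch that applies them to an instance of MyStruct represented as nested dictionaries. This illustrates the naming and length() rules only and is not the WSS extraction code. The function names are hypothetical, and the input is assumed to be the text that follows "@metadata ws" on each annotation line. Because plain dictionaries stand in for both structures and mappings, the sketch does not enforce the rule that paths may not enter mappings.

```python
import re

# Matches the only supported function, e.g. "length(inner.strmap)".
LENGTH_CALL = re.compile(r"^length\((.+)\)$")


def resolve_path(obj, path):
    """Follow a dot-separated path through nested dictionaries."""
    for key in path.split("."):
        obj = obj[key]
    return obj


def extract_metadata(obj, actions):
    """Build a name -> value mapping from '[ACTION] [as NAME]' strings."""
    metadata = {}
    for entry in actions:
        action, sep, name = entry.partition(" as ")
        action = action.strip()
        # Without "as NAME", the action string itself is the metadata name.
        name = name.strip() if sep else action
        match = LENGTH_CALL.match(action)
        if match:
            value = len(resolve_path(obj, match.group(1)))
        else:
            value = resolve_path(obj, action)
        metadata[name] = value
    return metadata


my_struct = {
    "inner": {"strmap": {"a": "1", "b": "2"}, "an_int": 42},
    "str": "hello",
    "outerstrmap": {},
}
actions = [
    "str",
    "str as my rad string",
    "inner.an_int",
    "length(str)",
    "length(inner.strmap) as num strings",
]
result = extract_metadata(my_struct, actions)
```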