Paging workspace objects

Workspaces can contain many objects, but the list_objects command returns a maximum of 10,000 object_info data structures (referred to as objects from now on) at a time. This section of the documentation discusses how to handle the case where access to large amounts of data is required.

If it is reasonable to expect that a list_objects command will return no more than 10,000 objects then paging doesn’t need to be considered. For example if the command only encompasses one workspace, only the most recent version of each object is required, and the max_objid field in the workspace information returned by various API methods (for example get_workspace_info) is no more than 10,000, then a single list_objects command can return all the data requested. See Workspaces for more information about getting information about workspaces and List objects for information about the list_objects command.

There are currently two ways to page through workspace data - via object ID limits, and via reference limit.

Sort order

When paging via the two methods mentioned above, the objects are returned in a specific sort order (but see the notes about ill-advised list_objects parameters in Paging by object ID limits):

  1. the workspace ID, ascending
  2. the object ID, ascending
  3. the version, descending

This makes it easy to determine how many versions an object has the first time it is encountered in the list of returned data.

Object ID versus reference limits

Paging by object ID limits works well for paging through data in a single workspace where only the most recent version of each object is required. Its advantage over reference-based paging is that any range of object IDs can be accessed at any time, whereas in reference-based paging each set of objects returned depends on the reference of the last object in the prior set.

Paging by reference works well for any number of workspaces and when all object versions are required. The drawback is that data from the prior set of objects is required to get the next set.

Paging by object ID limits

To page by object ID limits, provide the minObjectID and maxObjectID parameters to limit the object IDs, and thus the number of objects, returned in the result. Both parameters are inclusive. maxObjectID - minObjectID should always be <= 10,000. Setting a smaller limit can be done by specifying object ID parameters with a smaller difference.

Note that by default, hidden and deleted objects will not be returned in the set of objects. For this reason, it is important to specify both a minimum and maximum object ID when incrementing the minimum object ID by a set amount - otherwise there may be duplicated objects in subsequent list_objects calls.

Specifying that all object versions should be included in the results is not recommended, as the limit may be exceeded, and any excess objects will not be returned in the set.

Several parameters should be avoided when using the list_objects command to page through data with object ID limits:

  • Any of the time stamps (after, before, after_epoch, and before_epoch)
  • savedby
  • meta

Depending on the parameter set and the data to be listed, including any of these parameters may slow down the data lookup substantially and may cause the data not to be sorted.

The following example shows how to page through a large workspace <= 10,000 objects at a time:

In [3]: max_id = ws.get_workspace_info({'id': 19217})[4]

In [4]: max_id
Out[4]: 606146

In [5]: for i in range(0, max_id, 10000):
   ...:     objs = ws.list_objects({
   ...:         'ids': [19217],
   ...:         'minObjectID': i + 1,
   ...:         'maxObjectID': i + 10000}
   ...:     )
   ...:     # do something with objs

If the result is sorted, an alternative is to omit the maximum object ID and generate the minimum ID for each loop from the maximum object ID in the data from the prior loop. This may result in fewer overall list_objects calls, but has the disadvantage that getting each data set depends on the prior set.

Paging by reference limit

To page by a reference limit, provide the startafter parameter and optionally a limit if less than 10,000 objects should be returned at a time. Paging with a reference limit is useful for paging through multiple workspaces or object versions. The startafter parameter looks like a workspace object reference of the form X/Y/Z where X is the integer workspace ID, Y the integer object ID, and Z the version. The version may be omitted, and and object ID may be omitted if the version is omitted. To page through data, start by submitting only the smallest workspace ID from the set of workspaces submitted to the list_objects command and the object ID of the starting object of interest (often 1). Construct further startafter parameters from the last object in the list of returned objects.

The following list_objects parameters are incompatible with the startafter parameter and will cause an error to be returned if used:

  • Any of the time stamps (after, before, after_epoch, and before_epoch)
  • savedby
  • meta
  • object ID limits (minObjectID and maxObjectID)

The following example shows how the paging proceeds, using a very small limit for clarity:

In [5]: def calc_startafter(objects):
   ...:     if not objects:
   ...:         return None
   ...:     o = objects[-1]
   ...:     return '/'.join([str(o[6]), str(o[0]), str(o[4])])

In [6]: startafter = '1/1'

In [7]: while startafter:
   ...:     objs = ws.list_objects({
   ...:         'ids': [1, 2],
   ...:         'startafter': startafter,
   ...:         'limit': 3,
   ...:         'showAllVersions': 1
   ...:     })
   ...:     for o in objs:
   ...:         # do something more meaningful here
   ...:         print(f'{o[6]}/{o[0]}/{o[4]}')
   ...:     startafter = calc_startafter(objs)
   ...:     print(f'startafter: {startafter}')
startafter: 1/1/1
startafter: 2/1/2
startafter: 2/1/1
startafter: None

The last call to the workspace could be eliminated by checking that the number of objects returned is less than limit.