
Querying a graph

The query interface is split into these endpoints:

  • /query - graph query to retrieve instances matching a query.
  • /sync - graph query to retrieve instances matching a query that have changed since the previously issued cursor.
  • /search - retrieve instances matching full-text search queries.
  • /aggregate - aggregate instance data.

A few higher-level query endpoints, like /byids and /list, provide simpler interfaces building on the /query endpoint.

The query interface supports a wide array of features.

How to define a graph query

You can use a query in both the /query and the /sync endpoints.

Queries are composed of:

  • A with section defining result set expressions that describe which instances to retrieve.
  • A parameters section with optional parameter substitutions if the query is parameterized.
  • A select section that defines which properties to return as part of the result.

The examples below use a pump and valve scenario, represented as two sets of nodes with edges between them indicating which pumps flow to which valves.

This query fetches a specific pump as well as the valves it flows to:

with: # Define the result set expressions
  pumps:
    nodes:
      filter:
        equals:
          property: ['node', 'externalId']
          value: { 'parameter': 'pumpExternalId' }
    limit: 1
  pump_flows_to_valves:
    edges:
      from: pumps
      maxDistance: 1
      direction: outwards
      filter:
        equals:
          property: ['edge', 'type']
          value: { 'space': 'types', 'externalId': 'flows-to' }
  valves:
    nodes:
      from: pump_flows_to_valves
parameters: # Provide parameter values
  pumpExternalId: pump42
select: # Define the result sets to return
  pumps: {}
  valves: {}

The query returns a result similar to this:

items:
  pumps:
    - instanceType: node
      version: 1
      space: equipment
      externalId: pump42
      createdTime: 123
      lastUpdatedTime: 456
  valves:
    - instanceType: node
      version: 2
      space: equipment
      externalId: valve1
      createdTime: 321
      lastUpdatedTime: 654
    - instanceType: node
      version: 3
      space: equipment
      externalId: valve2
      createdTime: 213
      lastUpdatedTime: 546
nextCursor:
  pumps: cursorForPumps
  valves: cursorForValves

Result set expressions

Result set expressions appear directly below with in a query, and define a set of either nodes or edges. You can use the set to return results, as stepping stones to derive other sets from, or both. Result set expressions are named and can be chained.

A result set expression can also define sort order and a limit. See Sorting and limiting for more details.

Result set expressions can relate to each other via chaining, but they don't have to. You can query for unrelated items in the same query, but you'll generally use different sets to power graph traversals.

A set either queries nodes or edges, possibly recursively.

All fields:

  • nodes: An object to specify a result set of matching nodes.
  • edges: An object to specify a result set of matching edges.
  • sort: A list of sort configurations.
  • limit: How many nodes or edges to return in the result. Default: 100.

Pagination

The maximum limit you can set for any result set expression is 10,000. To retrieve an entire result set, pass a cursor object that maps result set names to cursors. The nextCursor object in the response contains a pagination cursor for each result set, which lets you page through everything.

You can't combine pagination cursors with custom sorts, unless they are backed by a cursorable index. If no pagination cursor is present for a given result set expression in the response, there is no more matching data.
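The pagination contract can be sketched as a client-side loop. This is a minimal sketch, not an SDK API. `run_query` is a hypothetical callable that sends the query together with a mapping of result set names to cursors, and returns the `items` and `nextCursor` objects as dicts. The fake backend exists only so the loop can run without a live project, and its integer-offset cursors are an assumption (real cursors are opaque strings you pass back unchanged).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: takes {result set name: cursor} and returns
# (items by result set name, nextCursor by result set name).
RunQuery = Callable[[Dict[str, str]], Tuple[Dict[str, List[dict]], Dict[str, str]]]


def fetch_all(run_query: RunQuery, result_set: str) -> List[dict]:
    """Page through one result set until the response carries no cursor for it."""
    collected: List[dict] = []
    cursors: Dict[str, str] = {}
    while True:
        items, next_cursor = run_query(cursors)
        collected.extend(items.get(result_set, []))
        cursor = next_cursor.get(result_set)
        if cursor is None:  # no cursor for this set means no more matching data
            return collected
        cursors = {result_set: cursor}


def make_fake_backend(result_set: str, data: List[dict], page_size: int) -> RunQuery:
    """In-memory stand-in for the service; cursors are string offsets here."""
    def run_query(cursors: Dict[str, str]):
        start = int(cursors.get(result_set, "0"))
        page = data[start:start + page_size]
        next_cursor: Dict[str, str] = {}
        if start + page_size < len(data):
            next_cursor[result_set] = str(start + page_size)
        return {result_set: page}, next_cursor
    return run_query


backend = make_fake_backend("pumps", [{"externalId": f"pump{i}"} for i in range(25)], page_size=10)
pumps = fetch_all(backend, "pumps")
print(len(pumps))  # 25
```

The loop stops on the absence of a cursor rather than on an empty page, matching the rule that a missing cursor means there is no more matching data.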

caution

We do not recommend paging through result sets repeatedly using the /query endpoint. Instead, use the /sync endpoint, which is designed to keep clients up to date with changes.

Node result set expressions

A nodes statement in your result set expression makes the set contain nodes.

A node result set can be chained off both node and edge result set expressions.

When chaining off another node result set, you'll retrieve the nodes pointed to by a direct relation property. The direct relation property is defined using the through field.

When chaining off an edge result set, you'll retrieve the end nodes defined by the edges in the set.

  • from: a different result set expression to chain from.

  • through: when from is a node result set expression, through defines which direct relation property we traverse.

  • direction: defines which direction we traverse the direct relation property defined in through.

  • chainTo: controls which side of the edge to chain to. This option only applies if the result set expression referenced in the from field consists of edges. chainTo can be one of:

    • source: chains to the start node if you're following edges outwards (direction: outwards), and to the end node if you're following edges inwards (direction: inwards).

    • destination (default): chains to the end node if you're following edges outwards (direction: outwards), and to the start node if you're following edges inwards (direction: inwards).

  • filter: A filter to determine which nodes to match and return in the result set.
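The chainTo rules reduce to a small lookup. Here is a sketch that assumes edges are plain dicts with `startNode` and `endNode` keys, a simplification for illustration. `chained_node` is a hypothetical helper, not part of any SDK.

```python
def chained_node(edge: dict, direction: str = "outwards", chain_to: str = "destination") -> str:
    """Return the node that a node result set picks up from one edge."""
    if direction not in ("outwards", "inwards"):
        raise ValueError(f"unknown direction: {direction}")
    if chain_to not in ("source", "destination"):
        raise ValueError(f"unknown chainTo: {chain_to}")
    # 'source' on an outwards traversal and 'destination' on an inwards traversal
    # both resolve to the start node; the other two combinations give the end node.
    if (direction == "outwards") == (chain_to == "source"):
        return edge["startNode"]
    return edge["endNode"]


edge = {"startNode": "pump42", "endNode": "valve1"}
print(chained_node(edge))                       # valve1 (destination, outwards)
print(chained_node(edge, direction="inwards"))  # pump42 (destination, inwards)
```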

Edge result set expressions

An edges statement in a result set expression makes the set contain edges, and the statement defines the rules the graph traversal will follow.

A graph traversal can start from an initial set, defined by from, which names another result set expression.

The graph traversal follows edges in a particular direction, controlled by direction (default: outwards).

    Alice -is_parent-> Bob
    Bob -fancies-> Mallory

In the above graph, if you follow any edge from Bob outwards (the default), you'll get the edge Bob -fancies-> Mallory.

If you follow edges inwards from Bob (direction: inwards), you'll get the edge Alice -is_parent-> Bob.
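Here is a one-hop version of the direction rule as a toy in-memory model. The edges are `(startNode, type, endNode)` tuples rather than the API's edge objects, and `follow` is a hypothetical name.

```python
# Each edge is (startNode, type, endNode), mirroring the two-edge graph above.
EDGES = [("Alice", "is_parent", "Bob"), ("Bob", "fancies", "Mallory")]


def follow(edges, node, direction="outwards"):
    """Return the edges leaving `node` (outwards) or arriving at it (inwards)."""
    if direction == "outwards":
        return [e for e in edges if e[0] == node]
    if direction == "inwards":
        return [e for e in edges if e[2] == node]
    raise ValueError(f"unknown direction: {direction}")


print(follow(EDGES, "Bob"))             # [('Bob', 'fancies', 'Mallory')]
print(follow(EDGES, "Bob", "inwards"))  # [('Alice', 'is_parent', 'Bob')]
```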

The traversal happens breadth-first. See limitations for more details.

A traversal is defined by what edges to follow, what nodes to match, and what nodes to terminate the traversal at. This is controlled by filter, nodeFilter and terminationFilter:

  • filter is a filter on edges. You would typically filter on the property [edge, type], but you can filter on any property on an edge.

  • nodeFilter is a node filter, which the node on the "other" side must match.

    • With direction: outwards, that means the "end node" of the edge must match.

    • With direction: inwards, the "start node" must match.

  • terminationFilter is similar to nodeFilter, except that if a node matches it, the traversal stops at that node. A node must also match nodeFilter (if one is set) for the traversal to reach it in the first place.

maxDistance controls how many hops away from the initial set the traversal will go. It defaults to unlimited, but the set must still respect the limit defined on the result set expression. If maxDistance is 1, execution might be faster, so if you know there is only one level to traverse, use maxDistance: 1.
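To show how filter, nodeFilter, terminationFilter, maxDistance, and limit interact, here is a simplified breadth-first model. It is a sketch under stated assumptions, not the service's algorithm. Filters are Python predicates instead of filter objects, and edges are `(startNode, type, endNode)` tuples. To stay finite, the sketch also visits each node at most once, unlike the real traversal, which (per Limitations) explores all possible paths.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (startNode, type, endNode)


def traverse(
    edges: List[Edge],
    initial: Iterable[str],
    *,
    direction: str = "outwards",
    edge_filter: Callable[[Edge], bool] = lambda e: True,
    node_filter: Callable[[str], bool] = lambda n: True,
    termination_filter: Callable[[str], bool] = lambda n: False,
    max_distance: Optional[int] = None,
    limit: int = 100,
) -> List[Edge]:
    """Breadth-first edge traversal honoring the filters described above."""
    result: List[Edge] = []
    start_nodes = list(dict.fromkeys(initial))  # de-duplicate, keep order
    seen: Set[str] = set(start_nodes)
    frontier = deque((node, 0) for node in start_nodes)
    while frontier and len(result) < limit:
        node, distance = frontier.popleft()
        if max_distance is not None and distance >= max_distance:
            continue  # this node is already max_distance hops from the initial set
        for edge in edges:
            start, _, end = edge
            here, other = (start, end) if direction == "outwards" else (end, start)
            # filter applies to the edge, nodeFilter to the node on the other side
            if here != node or not edge_filter(edge) or not node_filter(other):
                continue
            result.append(edge)
            if len(result) >= limit:
                break
            # A node matching terminationFilter is reached but not expanded further.
            if other not in seen and not termination_filter(other):
                seen.add(other)
                frontier.append((other, distance + 1))
    return result


CHAIN = [("A", "next", "B"), ("B", "next", "C"), ("C", "next", "D")]
print(traverse(CHAIN, ["A"], max_distance=1))                         # [('A', 'next', 'B')]
print(traverse(CHAIN, ["A"], termination_filter=lambda n: n == "B"))  # [('A', 'next', 'B')]
```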

Full options:

Option              Description
from                Result set expression to chain from.
filter              Edges traversed must match this filter.
nodeFilter          Nodes on the "other" side of the edge must match this filter.
terminationFilter   Do not traverse beyond nodes matching this filter.
maxDistance         How many levels to traverse. Default: unlimited.
direction           Whether to traverse edges pointing out of the initial set, or into the initial set.
limitEach           Limit the number of returned edges for each of the source nodes in the result set.

The limitEach limit applies uniformly to each node in the result set referenced by from. limitEach only has an effect when you also specify both from and maxDistance: 1.
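A per-source cap like limitEach on a one-hop traversal can be modeled as a counter per source node. This is a toy sketch using `(startNode, type, endNode)` tuples, and `one_hop_limit_each` is a hypothetical name. Which edges survive here depends on list order. The docs don't say which edges the service keeps when it applies the cap.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (startNode, type, endNode)


def one_hop_limit_each(edges: List[Edge], initial: Set[str], limit_each: int,
                       direction: str = "outwards") -> List[Edge]:
    """One-hop traversal (maxDistance: 1) keeping at most limit_each edges per source node."""
    kept: Dict[str, int] = defaultdict(int)
    result: List[Edge] = []
    for edge in edges:
        start, _, end = edge
        source = start if direction == "outwards" else end  # node in the `from` set
        if source in initial and kept[source] < limit_each:
            kept[source] += 1
            result.append(edge)
    return result


PUMP_EDGES = [("p1", "flows-to", "v1"), ("p1", "flows-to", "v2"),
              ("p1", "flows-to", "v3"), ("p2", "flows-to", "v4")]
print(one_hop_limit_each(PUMP_EDGES, {"p1", "p2"}, limit_each=2))
```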

Selects

Select configurations appear directly below select in a query and specify which data to retrieve for the respective result set expression. The configuration specifies a number of sources (views) and a property selector for each. The property selectors define which view properties to emit in the query result.

You can have sets whose properties are not emitted. This is useful if the sets are needed for chaining but can be excluded from the final results. Sets that are neither chained nor selected are not executed, but they still add a slight query processing overhead.

The results are grouped by their respective sets and contain properties that match the property selectors for the set.

Filters

Filters define what a part of the query matches. Filters are tree structures where the operator comes first and then the parameters for that operator.

A simple example is the in filter:

in:
  property: [node, name]
  values: [movie]

If the property node.name, which is a text property, is equal to any of the values in the provided list, the node will match. Properties are typed, and the query operators you can use on a property depend on its type.

The supported filters are:

  • equals
  • in
  • containsAny
  • containsAll
  • range
  • exists
  • prefix
  • matchAll
  • nested
  • overlaps
  • hasData

Compound filters

You can combine filters with and, or, not:

and:
  - not:
      in:
        property: ['node', 'type']
        values: ['movie']
  - range:
      property: ['imdb', 'movie', 'released']
      gte: { parameter: start }

This corresponds to (NOT node.type in ('movie')) AND imdb.movie.released >= $start.
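To make the tree structure concrete, here is a toy evaluator for equals, in, range, and, or, and not. It relies on several assumptions. Instances are dicts keyed by property-path tuples, parameters are already substituted, and no type checking happens. It is an illustration of the operator-first layout, not how the service evaluates filters.

```python
import operator
from typing import Any, Dict, Tuple

Instance = Dict[Tuple[str, ...], Any]  # property path -> value (toy representation)
RANGE_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def matches(flt: Dict[str, Any], inst: Instance) -> bool:
    """Evaluate a filter tree (operator first, then its parameters) against one instance."""
    [(op, arg)] = flt.items()  # each filter node has exactly one operator key
    if op == "and":
        return all(matches(f, inst) for f in arg)
    if op == "or":
        return any(matches(f, inst) for f in arg)
    if op == "not":
        return not matches(arg, inst)
    value = inst.get(tuple(arg["property"]))
    if op == "equals":
        return value == arg["value"]
    if op == "in":
        return value in arg["values"]
    if op == "range":
        if value is None:
            return False
        return all(RANGE_OPS[k](value, bound) for k, bound in arg.items() if k in RANGE_OPS)
    raise ValueError(f"operator not covered by this sketch: {op}")


# The compound filter above, with the start parameter bound to an ISO date string.
FILTER = {
    "and": [
        {"not": {"in": {"property": ["node", "type"], "values": ["movie"]}}},
        {"range": {"property": ["imdb", "movie", "released"], "gte": "2000-01-01"}},
    ]
}
series = {("node", "type"): "series", ("imdb", "movie", "released"): "2001-05-01"}
movie = {("node", "type"): "movie", ("imdb", "movie", "released"): "2001-05-01"}
print(matches(FILTER, series), matches(FILTER, movie))  # True False
```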

HasData filter

A hasData filter matches if data is present in a set of containers or views.

There is an implicit AND between the containers and views referenced in the filter, and the filter matches only if the node or edge has data in all the specified containers and views.

When you specify a container, the filter matches if the instance has all required properties populated for the container.

When you specify a view, the filter matches nodes with data in all the containers that the view references through properties, respecting the filters of the view if they are defined (and the filters of views implemented by the view).

Example:

hasData:
  - type: container
    space: my_space
    externalId: my_container
  - type: view
    space: my_space
    externalId: my_view
    version: v1

If my_space.my_view.v1 maps properties in the containers my_space.c1 and my_space.c2, the filter matches:

  • If no filter is defined on my_space.my_view.v1: when there is data in my_space.my_container AND my_space.c1 AND my_space.c2.
  • If a filter is defined on my_space.my_view.v1: when there is data in my_space.my_container AND the instance matches the filter of my_space.my_view.v1.
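The two cases above can be modeled as a toy check. It assumes an instance is represented only by the set of container IDs where it has all required properties populated, and that a view filter is a predicate over that same set. It also ignores filters inherited from views the view implements. `has_data` and the dict layout are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set

# Toy view model: the containers its properties map to, plus an optional filter.
Views = Dict[str, dict]


def has_data(populated: Set[str], refs: List[dict], views: Views) -> bool:
    """Implicit AND: every referenced container and view must have data."""
    for ref in refs:
        if ref["type"] == "container":
            ok = ref["id"] in populated
        else:
            view = views[ref["id"]]
            view_filter: Optional[Callable[[Set[str]], bool]] = view.get("filter")
            # If the view defines a filter, it replaces the "all mapped containers" check.
            ok = view_filter(populated) if view_filter else set(view["containers"]) <= populated
        if not ok:
            return False
    return True


VIEWS = {"my_space.my_view.v1": {"containers": ["my_space.c1", "my_space.c2"], "filter": None}}
REFS = [{"type": "container", "id": "my_space.my_container"},
        {"type": "view", "id": "my_space.my_view.v1"}]
print(has_data({"my_space.my_container", "my_space.c1", "my_space.c2"}, REFS, VIEWS))  # True
print(has_data({"my_space.my_container", "my_space.c1"}, REFS, VIEWS))                 # False
```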

Parameters

You can use parameters for values in filters. The parameters are provided as part of the query object, not in the filter itself.

This filter is parameterized:

range:
  property: [imdb, movie, released]
  gte: { parameter: start }

A query containing this filter will only run if you provide the start parameter. The parameter must be compatible with all the types and operators that refer to it. In the example above, the released property is a date, so the start parameter must be compatible with the date type. Otherwise, the query fails, even if the range filter is optional because it's part of an or filter.
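The service binds parameters itself, which is what enables plan reuse, so you send the parameters section as-is. The rule that every referenced parameter must be supplied can still be sketched client-side, for example as a pre-flight check. `bind_parameters` is a hypothetical helper, not part of any SDK, and it skips the type-compatibility check described above.

```python
from typing import Any, Dict


def bind_parameters(node: Any, params: Dict[str, Any]) -> Any:
    """Replace every {'parameter': name} placeholder; fail if a name has no value."""
    if isinstance(node, dict):
        if set(node) == {"parameter"}:
            name = node["parameter"]
            if name not in params:
                raise KeyError(f"missing query parameter: {name}")
            return params[name]
        return {key: bind_parameters(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [bind_parameters(item, params) for item in node]
    return node


RANGE = {"range": {"property": ["imdb", "movie", "released"], "gte": {"parameter": "start"}}}
print(bind_parameters(RANGE, {"start": "2000-01-01"}))
# {'range': {'property': ['imdb', 'movie', 'released'], 'gte': '2000-01-01'}}
```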

tip

We recommend that you parameterize queries that take user input. This lets the service reuse query plans across queries, which can have a noticeable effect on read-heavy workloads.

Sorting and limiting

Sorting and limiting can happen in different places in a query:

  • In the result set expression, that is, in the with object that defines a node or edge set.
  • In the result selection, that is, under select where you can define sets to emit as results.

Sorting and limiting the set definitions under with transitively affects dependent sets. Changing only the sort order of a with expression doesn't necessarily change dependent sets; whether it does depends on how those sets are defined. Putting a limit on an expression, however, means that all dependent sets inherit the limit and change accordingly. This also applies to sets that aren't emitted via select, that is, sets defined only as stepping stones for other sets.

Sorts and limits defined under select only change the order and number of results returned for that set, not the contents of dependent sets.

This example query would let some_edges and target_nodes pull from the full amount of nodes in some_nodes, even if what's returned as a result for some_nodes is capped at 100:

with:
  some_nodes:
    nodes: # Omitted. No sorting here.
  some_edges:
    edges:
      from: some_nodes
      # Omitted. Also no sorting.
  target_nodes:
    nodes:
      from: some_edges
      # …
select:
  some_nodes:
    sources: ...
    sort:
      - property: ['node', 'space']
        direction: descending
    limit: 100
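Here is a toy in-memory model of that example. It assumes result sets are plain Python lists and uses a descending sort on node IDs in place of the ['node', 'space'] sort. `evaluate` and its arguments are hypothetical names used for illustration.

```python
from typing import List, Optional, Tuple

Edge = Tuple[str, str, str]  # (startNode, type, endNode)


def evaluate(nodes: List[str], edges: List[Edge], with_limit: Optional[int] = None,
             select_limit: Optional[int] = None) -> Tuple[List[str], List[Edge]]:
    """Toy evaluation of some_nodes -> some_edges with a limit in either place."""
    # A limit under `with` shrinks the set itself, so dependent sets see fewer nodes.
    some_nodes = nodes if with_limit is None else nodes[:with_limit]
    node_set = set(some_nodes)
    some_edges = [e for e in edges if e[0] in node_set]  # chained off some_nodes
    # Sort and limit under `select` only shape what is emitted for some_nodes.
    emitted = sorted(some_nodes, reverse=True)
    if select_limit is not None:
        emitted = emitted[:select_limit]
    return emitted, some_edges


NODES = [f"n{i:03d}" for i in range(250)]
EDGES = [(n, "links", f"t{n}") for n in NODES]
emitted, some_edges = evaluate(NODES, EDGES, select_limit=100)
print(len(emitted), len(some_edges))  # 100 250
```

Moving the same limit under with (`with_limit=100`) would also cut some_edges down to 100 edges.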

Order of sorting and limiting

note

A limit in an edge traversal determines when to stop traversing, which happens before sorting.

Nodes and edges sort and limit differently: nodes are sorted and limited together, while recursive edge traversals apply the limit during traversal and sort afterwards.

For nodes, the top-n set sorted by some property is guaranteed to contain the top n values of that property for the set.

For edges found through traversal, that is, via edges, the limit applies to how many edges to discover. This may not be all the edges that could be discovered in a full traversal. If you start traversing from some node and ask for 100 edges sorted by creation timestamp, the 100 edges discovered before the traversal stops get sorted. The full graph is not traversed to find the 100 newest edges in the graph defined by the traversal filters.
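The difference between the two orderings shows up with a handful of edges. This is a toy comparison in which each item is an `(id, createdTime)` pair listed in discovery order. The first function models a recursive edge traversal, and the second models a node result set. Both function names are hypothetical.

```python
from typing import List, Tuple

Item = Tuple[str, int]  # (id, createdTime), listed in discovery order


def traversal_then_sort(discovered: List[Item], limit: int) -> List[Item]:
    """Recursive edge traversal: stop after `limit` discoveries, then sort those."""
    return sorted(discovered[:limit], key=lambda e: e[1], reverse=True)


def sort_then_limit(all_items: List[Item], limit: int) -> List[Item]:
    """Node result sets: a true top-n over the whole matching set."""
    return sorted(all_items, key=lambda e: e[1], reverse=True)[:limit]


DISCOVERY_ORDER = [("e1", 5), ("e2", 1), ("e3", 9), ("e4", 7)]
print(traversal_then_sort(DISCOVERY_ORDER, 2))  # [('e1', 5), ('e2', 1)]
print(sort_then_limit(DISCOVERY_ORDER, 2))      # [('e3', 9), ('e4', 7)]
```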

To sort with a recursive graph traversal, you'll need to specify the sort configuration via postSort.

An edge traversal with maxDistance=1 can take a standard sort configuration.

Syncing - subscribing to changes

The /sync endpoint lets you subscribe to changes on instances matching an arbitrary filter. Its interface is very similar to that of the /query endpoint, but differs in a few ways:

  • It always returns a value for nextCursor.

  • It returns instances that have changed since the provided cursor.

  • It returns soft-deleted instances until they have been hard-deleted. These will have a deletedTime property set to a non-null timestamp value. Read more about soft deletion here.

This endpoint is helpful to avoid continuously pulling a full set of instances and putting unnecessary load on the system. Instead, you can use the /sync endpoint to subscribe to changes. You'll pull everything once and then keep that state updated incrementally. The Python SDK documentation has an example of how to do this.
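The "pull everything once, then update incrementally" pattern boils down to folding each /sync page into a local replica. Here is a minimal sketch of that fold. `apply_sync_batch` is hypothetical, instances are keyed by (instanceType, space, externalId) as an assumption, and the loop that calls /sync and passes back nextCursor is left out (the Python SDK documentation covers that part).

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (instanceType, space, externalId)


def apply_sync_batch(state: Dict[Key, dict], items: Iterable[dict]) -> Dict[Key, dict]:
    """Fold one /sync response page into a local replica."""
    for item in items:
        key = (item["instanceType"], item["space"], item["externalId"])
        if item.get("deletedTime") is not None:
            state.pop(key, None)  # soft-deleted upstream: drop it locally
        else:
            state[key] = item  # new or changed instance: upsert
    return state


state: Dict[Key, dict] = {}
apply_sync_batch(state, [
    {"instanceType": "node", "space": "equipment", "externalId": "pump42", "version": 1},
    {"instanceType": "node", "space": "equipment", "externalId": "valve1", "version": 2},
])
apply_sync_batch(state, [
    {"instanceType": "node", "space": "equipment", "externalId": "pump42", "version": 2},
    {"instanceType": "node", "space": "equipment", "externalId": "valve1", "version": 3,
     "deletedTime": 789},
])
print(list(state))  # [('node', 'equipment', 'pump42')]
```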

The endpoint also enables you to sync a subgraph into memory and then use libraries like NetworkX to perform various specialized graph analysis tasks on live data.

You can also use the endpoint to keep specialized dashboards up to date by syncing out all data about a data model.

Limitations

Search and aggregation endpoints are eventually consistent

Any query that involves search or aggregation is subject to eventual consistency. It may take a few seconds before changes to the data are reflected in the endpoints.

Graph traversal

Any query that involves a graph traversal will force nested loop-style execution. This will work well enough for traversals limited to a few hundred thousand unique paths.

The graph traversal is breadth-first, and all possible paths are traversed. This is especially important to keep in mind with traversals across loops. For example, a query that follows all the possible paths of a fully connected graph will likely be terminated due to constraints on either time, memory, or temporary disk usage.
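Counting paths in a complete graph shows why this blows up. The sketch below counts only simple paths (no repeated nodes) that start at one node of a complete graph with n nodes. That count is a lower bound on what a traversal exploring all possible paths has to walk. The function name is hypothetical.

```python
def simple_paths_from_one_node(n: int, max_hops: int) -> int:
    """Count simple paths of 1..max_hops edges starting at one node of a complete graph K_n."""
    total, paths_of_length = 0, 1
    for hops in range(1, max_hops + 1):
        paths_of_length *= n - hops  # each extension picks one of the still-unvisited nodes
        if paths_of_length == 0:
            break
        total += paths_of_length
    return total


print(simple_paths_from_one_node(4, 3))   # 15
print(simple_paths_from_one_node(20, 5))  # 1494559
```

Even 20 fully connected nodes and 5 hops already yield roughly 1.5 million paths from a single start node, well past the few hundred thousand unique paths mentioned above.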

Timeout errors

Queries get canceled with a 408 Request Timeout error if they take longer than the timeout. If you hit a timeout like this, you must reduce the load or contention or optimize your query.

Sync one space at a time

We recommend that you sync only one space at a time. More granular filters may result in poor performance.

Examples

Simple retrieval by node.type

This is an example query to retrieve all nodes of type Pump, and all the properties on the Pump/v1 view.

with:
  pumps:
    nodes:
      filter:
        equals:
          property: ['node', 'type']
          value: { 'space': 'types', 'externalId': 'pump' }
select:
  pumps:
    sources:
      - source:
          type: view
          space: equipment
          externalId: Pump
          version: v1
        properties: ['*'] # All properties

Retrieve everything in a space

This is an example query to retrieve all nodes in the equipment space and retrieve the manufacturer property on the Equipment/v1 view. This query is parameterized and can be used for both the /query and the /sync endpoints.

with:
  pumps:
    nodes:
      filter:
        equals:
          property: ['node', 'space']
          value: { 'parameter': 'space' }
select:
  pumps:
    sources:
      - source:
          type: view
          space: equipment
          externalId: Equipment
          version: v1
        properties: [manufacturer] # Only the manufacturer property
parameters:
  space: equipment

This query retrieves all work orders related to the set of assets with external IDs asset1, asset2, and asset3. It traverses edges of the type relates-to that point from work orders to assets.

with:
  assets:
    nodes:
      filter:
        and:
          - equals:
              property: ['node', 'space']
              value: 'assets'
          - in:
              property: ['node', 'externalId']
              values: ['asset1', 'asset2', 'asset3']
  assets_relate_to_workorders:
    edges:
      from: assets # chain off the assets result expression
      direction: inwards # the edges point from the workorders to the assets
      filter:
        equals:
          property: ['edge', 'type']
          value: 'relates-to'
  workorders:
    nodes:
      from: assets_relate_to_workorders
      filter: # only get active workorders
        equals:
          property: ['workorders', '', 'status']
          value: 'active'
select:
  workorders: # only select the workorders
    sources:
      - source:
          type: view
          space: workorders
          externalId: WorkOrder
          version: v1
        properties: ['*'] # get all the properties

Traverse direct relations to retrieve the locations of a set of sites

Consider a scenario where there are two sets of nodes representing sites and locations, respectively. The sites are connected to the locations using a direct relation property. This query retrieves the set of location nodes related to a set of sites.

note

To traverse direct relation properties performantly, make sure that the property is associated with a b-tree index.

with:
  sites:
    nodes:
      filter:
        and:
          - hasData:
              - type: container
                space: sites
                externalId: Site
          - in:
              property: ['node', 'externalId']
              values: ['site1', 'site2', 'site3']
  locations:
    nodes:
      from: sites
      direction: outwards
      through:
        source: # The view mapping the direct relation property to traverse
          type: view
          space: sites
          externalId: Site
          version: v1
        identifier: location # the identifier of the direct relation property to traverse
select:
  locations:
    sources:
      - source:
          type: view
          space: locations
          externalId: Location
          version: v1
        properties: ['*']