Aggregation capabilities in data models
Use the aggregation endpoint to summarize and analyze data by computing advanced metrics and bucketed distributions. For example, you can use the aggregation endpoint to generate data for dashboards and summaries, create numeric summaries, and obtain grouped statistical insights.
The aggregation endpoint supports:
- Group by: partition or group data based on specified properties. For instance, to see the average temperature for equipment, you can group by the equipment property.
- Aggregation functions: compute metrics on the grouped data. This includes calculating sums, averages, minimums, maximums, counts, and histograms.
- Filters and sorting: like the search endpoint, the aggregation endpoint supports filtering and sorting. This allows you to refine your data set before performing aggregations.
Recommendations
- Combine filters and aggregations: to minimize the volume of data to process, always apply filters before running aggregations.
- Use groupBy efficiently: restrict grouping to properties that are meaningful for your analysis. Excessive grouping can make results difficult to interpret and reduce performance.
- Understand tokenization for text queries: when searching for text, refer to the search endpoint's tokenization rules.
- Plan around indexing delays: because of indexing delays, it may take a few seconds for new or updated data to be available for aggregation.
Example aggregation request
The aggregation request below groups by equipment and computes an average of the temperature property.
{
"view": {
"type": "view",
"space": "testSpace",
"externalId": "Equipment",
"version": "v1"
},
"query": "temperature sensor",
"groupBy": ["equipment"],
"aggregates": [
{
"avg": {
"property": "temperature"
}
}
],
"instanceType": "node",
"properties": ["name", "description"],
"targetUnits": [],
"filter": {
"range": {
"property": "temperature",
"lt": 25,
"gt": 15
}
},
"includeTyping": false,
"limit": 100
}
Where:
- view: defines which view the query reads from. Required.
- aggregates: lists one or more aggregation operations. A single request can contain at most five operations.
- groupBy: is an array of property names used for grouping.
- query and properties: combine to perform text-based matching on the specified properties (name, description).
- filter: restricts data before aggregation. Only results with temperature greater than 15 and less than 25 are included.
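The field rules above can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of any official SDK: build_aggregation_request and MAX_AGGREGATES are hypothetical names introduced for illustration, and the only constraints enforced are the ones stated here (view is required, and a request carries between one and five aggregation operations).

```python
import json

MAX_AGGREGATES = 5  # limit stated above: at most five operations per request


def build_aggregation_request(view, aggregates, group_by=None, filter_=None, limit=None):
    """Assemble a request body (hypothetical helper) and enforce the documented rules."""
    if not view:
        raise ValueError("view is required")
    if not 1 <= len(aggregates) <= MAX_AGGREGATES:
        raise ValueError(
            f"expected 1 to {MAX_AGGREGATES} aggregates, got {len(aggregates)}"
        )
    # instanceType is fixed to "node" here only because the examples use it.
    body = {"view": view, "aggregates": aggregates, "instanceType": "node"}
    if group_by:
        body["groupBy"] = list(group_by)
    if filter_:
        body["filter"] = filter_
    if limit is not None:
        body["limit"] = limit
    return body


request_body = build_aggregation_request(
    view={"type": "view", "space": "testSpace", "externalId": "Equipment", "version": "v1"},
    aggregates=[{"avg": {"property": "temperature"}}],
    group_by=["equipment"],
    filter_={"range": {"property": "temperature", "gt": 15, "lt": 25}},
    limit=100,
)
print(json.dumps(request_body, indent=2))
```

Passing more than five entries in aggregates raises a ValueError locally instead of relying on the endpoint to reject the request.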
Aggregating a single type
The examples below show different aggregations for a type called Company with these properties:
id | name | industry | revenue | head_count |
---|---|---|---|---|
2001c18... | Ryan and Friesen and Sons | Internet | 80734000 | 5 |
fe8bf42... | Pfannerstill Inc | Biotechnology | 37038000 | 50 |
6dc069... | Boyer and Kohler Group | Public Relations and Communications | 21598000 | 1 |
f23a8a... | Mohr Group | Fund-Raising | 7440000 | 56 |
d346e0... | Veum LLC | Real Estate | 5224000 | 10 |
Average head_count across all companies
Calculate the average head_count across every company in the dataset.
Request
{
"aggregates": [
{ "avg": { "property": "head_count" } }
],
"instanceType": "node",
"view": {
"type": "view",
"space": "Companies",
"externalId": "Company",
"version": "1"
}
}
Response
[
{
"instanceType": "node",
"aggregates": [
{
"aggregate": "avg",
"property": "head_count",
"value": 32.29912284331168
}
]
}
]
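To make the avg semantics concrete, the short Python sketch below computes the same metric locally over only the five sample rows listed in the table. It is an illustration, not the endpoint's implementation. The value in the response above differs because it evidently covers a larger dataset than the five rows shown.

```python
from statistics import mean

# The five sample rows from the table above: (name, head_count).
sample_companies = [
    ("Ryan and Friesen and Sons", 5),
    ("Pfannerstill Inc", 50),
    ("Boyer and Kohler Group", 1),
    ("Mohr Group", 56),
    ("Veum LLC", 10),
]

# avg is the arithmetic mean of the property over all matching instances.
sample_avg = mean(head_count for _, head_count in sample_companies)
print(sample_avg)  # 24.4
```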
Average head_count grouped by industry
Group companies by industry and compute the average head_count for each distinct industry value.
Request
{
"aggregates": [{ "avg": { "property": "head_count" } }],
"groupBy": ["industry"],
"view": {
"type": "view",
"space": "Companies",
"externalId": "Company",
"version": "1"
}
}
Response
[
{
"instanceType": "node",
"aggregates": [
{
"aggregate": "avg",
"property": "head_count",
"value": 310.941333333333333
}
],
"group": {
"industry": "Automotive"
}
},
{
"instanceType": "node",
"aggregates": [
{
"aggregate": "avg",
"property": "head_count",
"value": 39.10427807486631
}
],
"group": {
"industry": "Writing and Editing"
}
}
]
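A grouped response is often easier to consume as a lookup table. The Python sketch below flattens the response shape shown above into a mapping from industry to average head_count. The helper name averages_by_group is hypothetical, and the sketch assumes every item carries a group object and an aggregates list exactly as in the example.

```python
import json

# The grouped response shown above, as received from the endpoint.
response_text = """
[
  {"instanceType": "node",
   "aggregates": [{"aggregate": "avg", "property": "head_count",
                   "value": 310.941333333333333}],
   "group": {"industry": "Automotive"}},
  {"instanceType": "node",
   "aggregates": [{"aggregate": "avg", "property": "head_count",
                   "value": 39.10427807486631}],
   "group": {"industry": "Writing and Editing"}}
]
"""


def averages_by_group(items, group_key, prop):
    """Map each group value to the avg reported for prop (hypothetical helper)."""
    result = {}
    for item in items:
        for agg in item["aggregates"]:
            if agg["aggregate"] == "avg" and agg["property"] == prop:
                result[item["group"][group_key]] = agg["value"]
    return result


by_industry = averages_by_group(json.loads(response_text), "industry", "head_count")
for industry, value in sorted(by_industry.items()):
    print(f"{industry}: {value:.2f}")
```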
Histogram aggregation
Use the histogram function to divide a numeric property (head_count) into buckets. Below, each bucket has a width of 10. The result shows how many companies fall into each bucket, grouped by industry.
Request
{
"aggregates": [
{ "histogram": { "property": "head_count", "interval": 10.0 } }
],
"groupBy": ["industry"],
"view": {
"type": "view",
"space": "Companies",
"externalId": "Company",
"version": "1"
}
}
Response (abbreviated)
{
"instanceType": "node",
"group": { "industry": "Automotive" },
"aggregates": [
{
"aggregate": "histogram",
"interval": 10.0,
"property": "head_count",
"buckets": [
{ "start": 0.0, "count": 52 },
{ "start": 10.0, "count": 65 },
{ "start": 20.0, "count": 55 },
...
]
}
]
},
{
"instanceType": "node",
"group": { "industry": "Writing and Editing" },
"aggregates": [
{
"aggregate": "histogram",
"interval": 10.0,
"property": "head_count",
"buckets": [
{ "start": 0.0, "count": 58 },
{ "start": 10.0, "count": 55 },
...
]
}
]
}
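The bucketing can be reproduced locally to see how start and count relate to the interval. The Python sketch below is written under two assumptions that the text above does not confirm: each value v falls into the bucket starting at floor(v / interval) * interval, and a bucket includes its start but not its end. It uses only the five sample head_count values from the table, and it omits empty buckets, which may or may not match what the endpoint returns.

```python
import math
from collections import Counter


def histogram_buckets(values, interval):
    """Count values per bucket, assuming start = floor(v / interval) * interval."""
    counts = Counter(math.floor(v / interval) * interval for v in values)
    # Only non-empty buckets are listed, ordered by their start value.
    return [{"start": start, "count": counts[start]} for start in sorted(counts)]


head_counts = [5, 50, 1, 56, 10]  # sample rows from the Company table
buckets = histogram_buckets(head_counts, 10.0)
for bucket in buckets:
    print(bucket)
# {'start': 0.0, 'count': 2}
# {'start': 10.0, 'count': 1}
# {'start': 50.0, 'count': 2}
```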
Text search within an aggregation
As with the search endpoint, you can combine text-based queries with aggregations. This example computes the average head_count, grouped by industry, for companies whose name matches the text query "Inc".
Request
{
"query": "Inc",
"properties": ["name"],
"aggregates": [{ "avg": { "property": "head_count" } }],
"groupBy": ["industry"],
"instanceType": "node",
"view": {
"type": "view",
"space": "Companies",
"externalId": "Company",
"version": "1"
}
}
Response
[
{
"instanceType": "node",
"group": { "industry": "Farming" },
"aggregates": [
{
"aggregate": "avg",
"property": "head_count",
"value": 31.4811320754717
}
]
},
...
]
GraphQL examples
You can also run the examples above through the GraphQL API:
# Average head_count for all companies:
aggregateCompany {
items {
avg {
head_count
}
}
}
# Average head_count grouped by industry:
aggregateCompany(groupBy: industry) {
items {
avg {
head_count
}
}
}
# Histogram of head_count (interval of 10)
aggregateCompany {
items {
histogram(interval: 10) {
head_count {
count
start
}
}
}
}
# Search by string ("Inc") combined with aggregation:
aggregateCompany(query: "Inc") {
items {
avg {
head_count
}
}
}