Handling latency variability in query APIs

Latency in distributed systems is rarely constant. When you query instances in data modeling, most requests complete quickly, but a small percentage can take significantly longer. This pattern is called latency variability.

Why it matters

Latency outliers can impact user experience and system reliability even when median latency is low. If you assume every request completes quickly, a small number of slow queries can cause retries, thread exhaustion, cascading timeouts, and noisy alerts. Designing for variability helps your application stay responsive and predictable at scale.

Where latency variability shows up

In data modeling, latency variability typically shows up when you query instances with these endpoints:

/models/instances/list
/models/instances/query (including GraphQL)

Latency variability usually shows up as a “long-tail” pattern in latency percentiles:

p50 (median) latency is low.
p90/p95 latency is higher.
The slowest 1% of requests (those above p99) can be much slower.

For example, you might see:

p50: 200 ms
p90: 1.5 s
p99: 4.5 s

This does not mean most queries are slow. It means most queries are fast, and a few outliers take longer due to query and backend execution conditions.
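To check whether your own traffic has this shape, you can compute the same percentiles from latencies recorded on the client. The following is a minimal sketch using only the Python standard library. The function name and the synthetic sample values are illustrative assumptions, not measurements from the service.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50, p90, and p99 for a list of latency samples in milliseconds."""
    # quantiles(n=100) returns 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}


# Synthetic data for illustration: 85 fast requests, 14 slower ones, 1 outlier.
samples = [200.0] * 85 + [1500.0] * 14 + [4500.0]
summary = latency_percentiles(samples)
print(summary)
```

With data like this, the median stays at the fast value while p90 and p99 climb, which is the long-tail pattern described above.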

Why latency can vary

Latency for /list and /query depends on several factors.

Query shape and index alignment

Filters and sorting that align with defined indexes are typically much more efficient than queries that require scans or expensive sorting. To learn how indexes affect query execution, see Performance considerations.

Schema and view complexity

Wider views and more complex schemas can increase the amount of data the service needs to materialize and process. For example:

Views mapping many containers
Queries that traverse relations (direct or reverse)
Large property selection sets

Unused views and view versions can increase tail latency. The longest tail latencies often occur during full project schema reloads after cache expiration, when the service reconciles all views and properties. Cleaning up unused views and view versions can reduce these outliers.

Data volume and distribution

The number of instances and the distribution of their values affect the cost of query execution. Two queries with the same structure can behave differently if the data distribution changes over time.

Payload size

Large responses (for example, hundreds of KB to multiple MB) increase total time due to:

Serialization on the server
Network transfer

If you only need a subset of fields, use property selectors in the select section to return only the properties you need. This reduces payload size and avoids wasteful transfer.
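As a rough sketch of what a narrowed selection can look like, the snippet below builds a request body for /models/instances/query that asks for two properties of one view. The overall layout of the with and select sections is an assumption here, and the space, view, and property names are hypothetical. Confirm the exact body shape against the API reference before you rely on it.

```python
import json


def build_query_body(view, properties, limit=100):
    """Build a query body that returns only the named properties of one view.

    The with/select layout is an assumption; verify it against the API reference.
    """
    source = {"type": "view", **view}
    return {
        # One result set named "items"; add a filter inside "nodes" as needed.
        "with": {"items": {"nodes": {}, "limit": limit}},
        # Select only the listed properties for that result set.
        "select": {
            "items": {"sources": [{"source": source, "properties": list(properties)}]}
        },
    }


# Hypothetical view identifier and property names, for illustration only.
view = {"space": "my_space", "externalId": "Pump", "version": "1"}
body = build_query_body(view, ["name", "status"])
print(json.dumps(body, indent=2))
```

Listing properties explicitly keeps the response proportional to what the caller actually uses, instead of growing with every property the view exposes.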

Backend execution conditions

Even when you run the same query repeatedly, internal conditions can change:

Cache state (cold vs. warm)
Execution plan preparation
Resource scheduling and load distribution

These factors can occasionally increase latency for individual requests.

What to expect

Occasional slow requests (outliers) are expected.
Outliers do not indicate service degradation if:
- Requests complete successfully (HTTP 200).
- You don’t see a sustained latency increase across most requests.

CDF provides availability guarantees, but not fixed per-request latency guarantees for individual endpoints.
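The two conditions listed above can be turned into a simple client-side check, for example to avoid alerting on isolated outliers. This is a minimal sketch: the function name, the baseline value, and the factor of 2 are illustrative assumptions that you should tune to your own traffic.

```python
import statistics


def looks_like_degradation(samples, baseline_p50_ms, factor=2.0):
    """Decide whether a window of requests suggests more than isolated outliers.

    samples: list of (status_code, latency_ms) tuples for a recent time window.
    Returns True if any request failed, or if the median latency rose well
    above the baseline (a sustained increase across most requests).
    """
    all_ok = all(status == 200 for status, _ in samples)
    median_ms = statistics.median(latency for _, latency in samples)
    sustained_increase = median_ms > factor * baseline_p50_ms
    return (not all_ok) or sustained_increase


# One slow request among many fast ones: an expected outlier.
window_outlier = [(200, 210.0)] * 99 + [(200, 4500.0)]
# Most requests slow: a sustained increase worth investigating.
window_slow = [(200, 1800.0)] * 80 + [(200, 210.0)] * 20

print(looks_like_degradation(window_outlier, baseline_p50_ms=200.0))
print(looks_like_degradation(window_slow, baseline_p50_ms=200.0))
```

Comparing the median rather than the maximum keeps a single slow request from triggering the check.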

How to design for latency variability

Design your SDKs and applications so they remain reliable when a small number of requests are slow.

Don’t assume constant response time

Avoid designs that require every call to complete within a strict time budget (for example, “always under 0.5 seconds”).

Use timeouts appropriate to context

Set client-side timeouts that match the user interaction context:

For interactive UX, keep timeouts shorter and show progress or partial results.
For background jobs, allow longer timeouts and add retries.
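One way to apply these two rules is to choose the timeout and retry policy from the calling context. The sketch below assumes a hypothetical run_query(timeout) callable that wraps your HTTP client and raises the built-in TimeoutError when the timeout expires. The timeout values and attempt counts are illustrative, not recommendations from the service.

```python
import time

# Illustrative budgets in seconds; tune them for your own workload.
TIMEOUTS = {"interactive": 5.0, "background": 60.0}
MAX_ATTEMPTS = {"interactive": 1, "background": 4}


def call_with_context(run_query, context, sleep=time.sleep):
    """Call run_query(timeout) with a timeout chosen by context.

    Background calls retry on TimeoutError with exponential backoff.
    Interactive calls fail fast so the UI can show progress or partial results.
    """
    timeout = TIMEOUTS[context]
    attempts = MAX_ATTEMPTS[context]
    for attempt in range(attempts):
        try:
            return run_query(timeout)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(2 ** attempt)  # back off 1 s, 2 s, 4 s between attempts


# Simulated query that times out twice before succeeding.
calls = []


def flaky_query(timeout):
    calls.append(timeout)
    if len(calls) < 3:
        raise TimeoutError("simulated slow request")
    return {"items": []}


result = call_with_context(flaky_query, "background", sleep=lambda seconds: None)
print(result, calls)
```

If your HTTP client raises its own timeout exception type, translate it to TimeoutError inside run_query, or catch that type here instead.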
