Latency in distributed systems is rarely constant. When you query instances in data modeling, most requests complete quickly, but a small percentage can take significantly longer. This pattern is called latency variability.

Why it matters

Latency outliers can impact user experience and system reliability even when median latency is low. If you assume every request completes quickly, a small number of slow queries can cause retries, thread exhaustion, cascading timeouts, and noisy alerts. Designing for variability helps your application stay responsive and predictable at scale.

Where latency variability shows up

In data modeling, latency variability typically shows up when you query instances with these endpoints:
  • /models/instances/list
  • /models/instances/query (including GraphQL)
Latency variability usually shows up as a “long-tail” pattern in latency percentiles:
  • p50 (median) latency is low.
  • p90/p95 latency is higher.
  • The slowest 1% of requests (p99 and above) can be much slower.
For example, you might see:
  • p50: 200 ms
  • p90: 1.5 s
  • p99: 4.5 s
This does not mean most queries are slow. It means most queries are fast, and a few outliers take longer due to query and backend execution conditions.
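
To check whether your own traffic follows this long-tail shape, you can compute percentiles from latencies measured on the client. The following is a minimal sketch that uses the nearest-rank method. The sample data is hypothetical and chosen only to reproduce the example figures above; the function names are illustrative and not part of any SDK.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100.

    Returns the smallest sample such that at least p% of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100 avoids floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the p50, p90, and p99 latency for a list of samples."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 90, 99)}


# Hypothetical sample: 100 requests, mostly fast, with a few slow outliers.
latencies_ms = [200] * 88 + [1500] * 10 + [4500] * 2
summary = summarize(latencies_ms)
print(summary)
```

In this sample, 88 of 100 requests take 200 ms, yet p90 and p99 are far higher. Tracking only the mean or the median would hide that tail.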

Why latency can vary

Latency for /list and /query depends on several factors.

Query shape and index alignment

Filters and sorting that align with defined indexes are typically much more efficient than queries that require scans or expensive sorting. To learn how indexes affect query execution, see Performance considerations.

Schema and view complexity

Wider views and more complex schemas can increase the amount of data the service needs to materialize and process. For example:
  • Views mapping many containers
  • Queries that traverse relations (direct or reverse)
  • Large property selection sets
Unused views and view versions can increase tail latency. The longest tail latencies often occur during full project schema reloads after cache expiration, when the service reconciles all views and properties. Cleaning up unused views and view versions can reduce these outliers.

Data volume and distribution

The number of instances and how values are distributed affect the cost of query execution. Two queries with the same structure can behave differently if the data distribution changes over time.

Payload size

Large responses (for example, hundreds of KB to multiple MB) increase total time due to:
  • Serialization on the server
  • Network transfer
If you only need a subset of fields, use property selectors in the select section to return only the properties you need. This reduces payload size and avoids wasteful transfer.
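
As an illustration, the sketch below builds a request body for /models/instances/query that selects only two properties from a view. The body shape (a `with` section for the result set and a `select` section with `sources` and `properties`) is an assumption based on the general structure of the query endpoint, so verify it against the API reference before use. The space, view, version, and property names are hypothetical.

```python
def build_query(space, view_external_id, view_version, properties, limit=100):
    """Build a query body that returns only the listed properties.

    Assumed body shape; confirm field names against the API reference.
    """
    view = {
        "type": "view",
        "space": space,
        "externalId": view_external_id,
        "version": view_version,
    }
    return {
        "with": {
            "items": {
                # Match nodes that have data in the view.
                "nodes": {"filter": {"hasData": [view]}},
                "limit": limit,
            }
        },
        "select": {
            "items": {
                # Only these properties are returned, which keeps the payload small.
                "sources": [{"source": view, "properties": list(properties)}]
            }
        },
    }


# Hypothetical identifiers, for illustration only.
body = build_query("my_space", "Pump", "v1", ["name", "status"])
```

Listing properties explicitly also protects you from payload growth when someone later adds wide properties to the view.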

Backend execution conditions

Even when you run the same query repeatedly, internal conditions can change:
  • Cache state (cold vs. warm)
  • Execution plan preparation
  • Resource scheduling and load distribution
These factors can occasionally increase latency for individual requests.

What to expect

  • Occasional slow requests (outliers) are expected.
  • Outliers do not indicate service degradation if:
    • Requests complete successfully (HTTP 200).
    • You don’t see a sustained latency increase across most requests.
CDF provides availability guarantees, but not fixed per-request latency guarantees for individual endpoints.
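
One way to apply these criteria in monitoring is to classify a window of recent requests instead of alerting on every slow call. The sketch below is an assumption about how you might encode the two conditions above. The thresholds (`slow_factor`, `degraded_share`) and the baseline value are hypothetical and should be tuned to your own traffic.

```python
def classify_window(latencies_ms, status_codes, baseline_p50_ms,
                    slow_factor=3.0, degraded_share=0.5):
    """Classify a window of requests as normal, expected outliers, or worth investigating.

    A request counts as slow when it exceeds slow_factor * baseline_p50_ms.
    The window is flagged when any request failed or when at least
    degraded_share of the requests were slow (a sustained increase).
    """
    if not latencies_ms or len(latencies_ms) != len(status_codes):
        raise ValueError("need one status code per latency sample")

    failures = sum(1 for code in status_codes if code != 200)
    slow = sum(1 for ms in latencies_ms if ms > slow_factor * baseline_p50_ms)
    slow_share = slow / len(latencies_ms)

    if failures > 0 or slow_share >= degraded_share:
        return "investigate"
    if slow > 0:
        return "expected_outliers"
    return "normal"


# Two slow requests out of 100, all successful: expected long-tail behavior.
tail = classify_window([200] * 98 + [4500] * 2, [200] * 100, baseline_p50_ms=200)

# Most requests are slow: a sustained increase that deserves attention.
sustained = classify_window([1500] * 80 + [200] * 20, [200] * 100, baseline_p50_ms=200)
```

Alerting on the share of slow requests, not on individual slow requests, avoids the noisy alerts that long-tail latency otherwise produces.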

How to design for latency variability

Design your SDKs and applications so they remain reliable when a small number of requests are slow.

Don’t assume constant response time

Avoid designs that require every call to complete within a strict time budget (for example, “always under 0.5 seconds”).

Use timeouts appropriate to context

Set client-side timeouts that match the user interaction context:
  • For interactive UX, keep timeouts shorter and show progress or partial results.
  • For background jobs, allow longer timeouts and add retries.
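
The sketch below shows one way to encode these two contexts as timeout profiles with exponential backoff between retries. It assumes a hypothetical `send(timeout_s)` callable that performs the request and raises `TimeoutError` when the timeout elapses. The timeout, attempt, and backoff values are placeholders, not recommendations.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutProfile:
    timeout_s: float    # per-attempt client-side timeout
    max_attempts: int   # total attempts, including the first
    backoff_s: float    # base delay, doubled after each failed attempt


# Placeholder values; tune them to your own latency percentiles.
PROFILES = {
    "interactive": TimeoutProfile(timeout_s=2.0, max_attempts=1, backoff_s=0.0),
    "background": TimeoutProfile(timeout_s=30.0, max_attempts=4, backoff_s=1.0),
}


def call_with_profile(send, context, sleep=time.sleep):
    """Call send(timeout_s) using the timeout and retry policy for the context.

    Retries only on TimeoutError and re-raises it after the last attempt,
    so an interactive caller can fall back to progress or partial results.
    """
    profile = PROFILES[context]
    for attempt in range(1, profile.max_attempts + 1):
        try:
            return send(profile.timeout_s)
        except TimeoutError:
            if attempt == profile.max_attempts:
                raise
            sleep(profile.backoff_s * 2 ** (attempt - 1))
```

The interactive profile fails fast with a single attempt so that the UI can react. The background profile trades latency for completion by waiting longer and retrying. Passing `sleep` as a parameter keeps the function easy to test without real delays.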

Last modified on June 17, 2026