Use this guide to diagnose execution issues, performance problems, and monitoring gaps for CDF Transformations.

Operations and monitoring

Symptoms
  • Runs fail with HTTP 429 or 503 errors.
  • Multiple transformations fail in the same time window.
Cause
  • Peak concurrency or heavy workloads overload the service.
Resolution
  1. Spread transformation start times to reduce concurrency peaks.
  2. Orchestrate with Data workflows and add dependencies so jobs run only after prerequisites succeed.
  3. Reduce per‑run data volume with incremental filters such as is_new().
Prevention
  • Use workflow orchestration to balance load and avoid synchronized schedules.
  • Rebalance schedules instead of relying on retries when overload persists.
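
One way to spread start times is to give each hourly job its own minute offset. The sketch below is illustrative only: it assumes the schedules are hourly cron expressions, and the function name and transformation names are hypothetical.

```python
def stagger_hourly(names):
    """Assign each job a minute offset, spread evenly across the hour."""
    if not names:
        return {}
    step = 60 / len(names)  # minutes between consecutive start times
    return {name: f"{int(i * step)} * * * *" for i, name in enumerate(names)}


# Hypothetical transformation names, used only for illustration.
schedules = stagger_hourly(["assets", "timeseries", "events", "files"])
for name, cron in schedules.items():
    print(f"{name}: {cron}")
```

With four jobs, the starts land at minutes 0, 15, 30, and 45 instead of all firing at the top of the hour. With more than 60 jobs, some offsets repeat.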
Symptoms
  • Runs fail with HTTP 408.
  • Data model queries take a long time to complete.
Cause
  • Slow queries or large joins increase runtime.
Resolution
  1. Apply incremental filters early to reduce scan volume.
  2. Review your queries against the guidance in "SQL patterns and best practices".
  3. For data modeling sources, use the is_new() variant on cdf_nodes() or cdf_edges().
  4. Simplify complex joins in data modeling and test with smaller scopes first.
Prevention
  • Keep transformations scoped to one resource type and avoid wide RAW scans.
Symptoms
  • Long runs fail and restart.
  • The same job repeatedly fails after long execution time.
Cause
  • Driver restarts can occur during rollouts or for transformation hygiene. The service starts a new driver to handle new requests and gives the old driver time to finish its running jobs. A long‑running job that does not finish within that time may fail during the transition.
Resolution
  1. Split large transformations into smaller, focused jobs.
  2. Orchestrate retries with Data workflows for automatic recovery.
  3. Use incremental processing to shorten run times.
Prevention
  • Favor incremental processing and smaller transformation scope.
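
Splitting a large job by time window is one possible way to keep each run short. The sketch below assumes the source data can be filtered on a timestamp. The helper name and the six-hour window are hypothetical choices, not service settings.

```python
from datetime import datetime, timedelta, timezone


def split_windows(start, end, step):
    """Yield half-open (window_start, window_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)  # clamp the last window to `end`
        yield cursor, window_end
        cursor = window_end


day_start = datetime(2026, 1, 1, tzinfo=timezone.utc)
day_end = datetime(2026, 1, 2, tzinfo=timezone.utc)
windows = list(split_windows(day_start, day_end, timedelta(hours=6)))
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each window can then drive a separate, shorter run, so a failure during a driver restart only repeats one window rather than the whole job.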

Tooling limitations

Symptoms
  • Preview succeeds but full run fails.
  • Preview returns unexpected results with multi‑source joins.
  • Preview times out on long‑running transformations.
Cause
  • Preview samples only the first rows of the input, so it may not exercise full joins or filters. It is not a full execution and does not evaluate all input data.
Resolution
  1. Validate logic with small but representative datasets.
  2. Run a full execution on a limited scope (for example, a single source or time window).
Prevention
  • Treat preview as a sanity check, not as a performance or correctness benchmark.
Symptoms
  • Runs fail with column‑not‑found errors.
  • Queries worked previously but fail after new RAW writes.
Cause
  • RAW is schema‑less. Transformations infer schema from the first 10,000 rows. If a column is not present in that sample, it is not available to SQL.
Resolution
  1. Use get_json_object for fields in semi‑structured payloads.
  2. Insert a schema row that contains every expected column, and make sure it sorts to the top so that it falls within the sampled rows.
Prevention
  • Keep RAW tables stable and avoid frequent schema drift in the first rows.
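
The effect of sampling can be reproduced with a small simulation. This is not the service's implementation. It only assumes that the available columns are the union of the keys seen in the first 10,000 rows, as described above. The function name and row contents are hypothetical.

```python
def infer_columns(rows, sample_size=10_000):
    """Return the union of column names seen in the first `sample_size` rows."""
    columns = set()
    for row in rows[:sample_size]:
        columns.update(row)
    return columns


# 10,000 rows with only "id", followed by one late row that adds "status".
rows = [{"id": i} for i in range(10_000)] + [{"id": 10_000, "status": "ok"}]
late_column_visible = "status" in infer_columns(rows)
print(late_column_visible)  # False: "status" first appears outside the sample

# A schema row placed first makes every expected column part of the sample.
schema_row = {"id": None, "status": None}
with_schema_row = "status" in infer_columns([schema_row] + rows)
print(with_schema_row)  # True
```

The second call shows why a schema row at the top of the table makes a late column available to SQL.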

Logging and diagnostics

Symptoms
  • Only request IDs are visible in run history.
  • Error messages do not show root cause details.
Cause
  • Run history surfaces high‑level errors without expanded context.
Resolution
  1. Capture the full error response payload from the API or SDK.
  2. Correlate request IDs with backend logs when available.
  3. Use the expand control in run history to view detailed error messages when available.
Prevention
  • Use structured logging and store request IDs alongside transformation metadata.
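
A minimal sketch of structured logging with Python's standard library follows. The field names and the example IDs are hypothetical. Adapt them to whatever metadata your pipeline already records.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("transformation_runs")


def log_run(transformation_id, request_id, status, error=None):
    """Emit one JSON log line per run so request IDs stay searchable."""
    record = {
        "transformation_id": transformation_id,
        "request_id": request_id,
        "status": status,
        "error": error,
    }
    logger.info(json.dumps(record, sort_keys=True))
    return record


# Hypothetical values, used only for illustration.
record = log_run(1234, "req-abc-123", "failed", error="Request timed out")
```

Storing the request ID next to the transformation ID makes it easier to correlate a failed run with backend logs or to hand both values to support.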
Symptoms
  • Run fails with a generic internal error.
Cause
  • Transient service issues or unhandled edge cases.
Resolution
  1. Retry after reducing concurrency or data volume.
  2. If the error persists, contact Cognite Support with the transformation ID and request ID.
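
A bounded retry with exponential backoff is one way to handle transient failures before escalating. In the sketch below, `flaky_run` stands in for whatever call starts a run and `RuntimeError` stands in for a transient failure. Both are hypothetical, as are the attempt count and delays.

```python
import time


def run_with_backoff(run, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call run(); on failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return run()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # give up so the failure can be escalated
            sleep(base_delay * 2 ** attempt)


calls = []
delays = []


def flaky_run():
    """Stand-in for a run that fails twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("internal error")
    return "completed"


# Record the delays instead of sleeping so the example finishes instantly.
result = run_with_backoff(flaky_run, sleep=delays.append)
print(result, delays)
```

Capping the attempts keeps retries from masking a persistent problem. Once the cap is reached, the error propagates and can be reported with the transformation ID and request ID.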

Last modified on March 18, 2026