Use incremental processing with is_new()
Use is_new() to process only changed data. This reduces scan volume and transformation runtime. Apply change detection as close to the source as possible.
For patterns and syntax, see Start with incremental filters and the is_new function reference.
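The following sketch models the idea behind is_new() in plain Python. A named watermark records the highest lastUpdatedTime from the last successful run, and the next run picks up only rows above it. The class, method, and watermark names are hypothetical. The strict comparison and the first-run behavior are assumptions of this sketch and do not describe how CDF implements is_new().

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalFilter:
    """Hypothetical model of an is_new()-style filter, not the CDF implementation."""

    # Highest value seen per filter name in the last successful run.
    watermarks: dict[str, int] = field(default_factory=dict)

    def select_new(self, name: str, rows: list[dict], column: str = "lastUpdatedTime") -> list[dict]:
        watermark = self.watermarks.get(name)
        if watermark is None:
            return list(rows)  # First run: every row counts as new.
        # Assumption: rows at or below the watermark were handled by an earlier run.
        return [row for row in rows if row[column] > watermark]

    def commit(self, name: str, processed: list[dict], column: str = "lastUpdatedTime") -> None:
        # Advance the watermark only after the run has succeeded.
        if processed:
            latest = max(row[column] for row in processed)
            self.watermarks[name] = max(latest, self.watermarks.get(name, latest))


rows = [{"key": "a", "lastUpdatedTime": 100}, {"key": "b", "lastUpdatedTime": 200}]
incremental = IncrementalFilter()
first_run = incremental.select_new("assets_version", rows)   # both rows
incremental.commit("assets_version", first_run)

rows.append({"key": "c", "lastUpdatedTime": 300})
second_run = incremental.select_new("assets_version", rows)  # only row "c"
```

Committing the watermark only after a successful run means a failed run is simply retried over the same changed rows. Rows are never silently skipped.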
SQL query performance
For SQL optimization patterns, including avoiding double reads, using efficient joins, handling wide RAW tables, and managing schema inference, see SQL patterns and best practices.
Observability and load management
Use workflow orchestration to distribute load and avoid peak concurrency. If you see repeated 503 errors, rebalance schedules or dependencies instead of adding retries.
Transformations have a concurrency limit of 10 parallel jobs per project. Plan schedules and workflow dependencies with this limit in mind.
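Before rebalancing schedules, it can help to estimate how many jobs overlap at any moment. This sketch computes peak concurrency with a sweep line over a single schedule window. ScheduledJob, its fields, and the example runtimes are hypothetical. The sketch also ignores wrap-around between consecutive schedule periods.

```python
from dataclasses import dataclass

CONCURRENCY_LIMIT = 10  # parallel transformation jobs per project, as stated above


@dataclass(frozen=True)
class ScheduledJob:
    name: str
    start_minute: int      # start offset within one schedule window (hypothetical model)
    duration_minutes: int  # typical runtime, e.g. taken from past runs


def peak_concurrency(jobs: list[ScheduledJob]) -> int:
    """Sweep line: the maximum number of jobs running at the same moment."""
    events = []
    for job in jobs:
        events.append((job.start_minute, 1))                          # job starts
        events.append((job.start_minute + job.duration_minutes, -1))  # job ends
    # Tuples sort by minute, then -1 before +1, so a job ending as another
    # starts is not counted as overlapping.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Twelve jobs scheduled on the hour overlap completely.
clustered = [ScheduledJob(f"transform-{i}", 0, 15) for i in range(12)]
# The same jobs staggered five minutes apart overlap at most three at a time.
staggered = [ScheduledJob(f"transform-{i}", i * 5, 15) for i in range(12)]

for label, jobs in (("clustered", clustered), ("staggered", staggered)):
    peak = peak_concurrency(jobs)
    print(label, peak, "over limit" if peak > CONCURRENCY_LIMIT else "within limit")
```

Staggering start times is only one option. Chaining jobs through workflow dependencies so they run in sequence lowers peak concurrency in the same way.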
Avoid RAW anti-patterns
RAW is optimized for staging, not for repeated read-modify-write cycles. Avoid these patterns:
- Writing updates back to RAW from transformations.
- Creating large multipurpose RAW tables used by many transformations.
- Designing RAW tables wider than what downstream transformations need.
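One way to avoid a wide, shared RAW table is to stage each source record into narrow tables. Each table carries only the columns a given transformation reads, plus a key column for joins. The table names, columns, and the STAGING_PLAN mapping below are hypothetical. The sketch covers only the projection step, not the write to RAW.

```python
# Hypothetical staging plan: each narrow RAW table holds only the columns
# its downstream transformation reads, plus the "id" key for joins.
STAGING_PLAN: dict[str, tuple[str, ...]] = {
    "assets_hierarchy": ("id", "parentId", "name"),
    "assets_descriptions": ("id", "description"),
}


def split_for_staging(record: dict, plan: dict[str, tuple[str, ...]]) -> dict[str, dict]:
    """Project one source record into narrow, purpose-specific rows, one per table."""
    return {
        table: {column: record[column] for column in columns if column in record}
        for table, columns in plan.items()
    }


source_record = {
    "id": "pump-01",
    "parentId": "site-a",
    "name": "Pump 01",
    "description": "Feed pump",
    "internalComment": "not read by any transformation",  # dropped during staging
}
staged = split_for_staging(source_record, STAGING_PLAN)
```

Whether you split by consumer or only drop unused columns depends on how many transformations share the data. Either way, no staged table ends up wider than its readers need.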
Keep consumers off RAW
Data consumers should read from curated targets, not RAW. RAW has no schema guarantees, so direct consumption risks breaking downstream clients. Use transformations to write to data models or resource types that match consumer needs.
Use Files and Functions for heavy data
RAW has limits on row and column size and is not suited for very large payloads or highly sparse tables. Instead:
- Store large payloads as Files and reference them from metadata.
- Process files with CDF Functions for scalable, parallel processing.
- Orchestrate Functions and Transformations together using Data workflows.
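This sketch illustrates the first point above. Small payloads stay inline in a metadata row, while large ones are stored separately and referenced by name. A plain dict stands in for CDF Files. MAX_INLINE_BYTES is a hypothetical threshold, not the actual RAW limit, and the file-naming scheme is also an assumption.

```python
import hashlib
import json

# Hypothetical cut-off for illustration only; use the documented RAW limits in practice.
MAX_INLINE_BYTES = 1024


def stage_record(record_id: str, payload: str, file_store: dict[str, bytes]) -> dict:
    """Return a small metadata row; payloads over the cut-off go to file_store instead."""
    data = payload.encode("utf-8")
    if len(data) <= MAX_INLINE_BYTES:
        return {"id": record_id, "payload": payload}
    # A content hash gives the stored file a stable name (naming scheme is an assumption).
    file_name = f"{record_id}-{hashlib.sha256(data).hexdigest()[:12]}.json"
    file_store[file_name] = data  # stands in for uploading the payload to CDF Files
    return {"id": record_id, "fileName": file_name, "sizeBytes": len(data)}


files: dict[str, bytes] = {}
small = stage_record("evt-1", json.dumps({"status": "ok"}), files)
large = stage_record("evt-2", json.dumps({"samples": list(range(1000))}), files)
```

In a full pipeline, a CDF Function could then read the referenced files and do the heavy processing. A Data workflow would sequence it with the transformation that loads the metadata rows.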