Complete reference for recommended Cognite Data Fusion (CDF) resource naming conventions, including external IDs, display names, token grammars, and validation patterns.
Consistent naming makes Cognite Data Fusion (CDF) resources easier to find, govern, and automate. This reference defines Cognite’s recommended conventions for external IDs (machine-readable identifiers) and display names (human-readable labels) across building blocks, core resource types, and data modeling resources.
Use it when you create resources through the CDF API, SDKs, the CDF portal application, or deployment tooling such as the Cognite Toolkit. This reference starts with the three patterns that cover every resource, then gives you resource-specific tables to use as a lookup.
These are Cognite’s recommended naming conventions and provide a consistent baseline. Actual conventions can vary based on customer-specific requirements and existing standards. Where a customer or project already has documented naming conventions, follow those and record any deviations in project governance documentation.
This reference governs naming for artifacts created during a CDF deployment — both resource definitions (schema) and instances (data). It covers three resource families:
CDF building blocks — extraction pipelines, hosted extractors, transformations, functions, data workflows, CDF RAW databases, data sets, entity matching, access groups, service principals, and related deployment resources
Core resource types — time series, streams, records, 3D models and revisions, 3D scenes, location filters, and files
Data modeling resources — data models, spaces, views, containers, edges, instances, and properties
Out of scope:
Legacy CDF asset-centric data model types unless mapped to the new framework
Code repository directory structure and Cognite Toolkit module names (governed by separate standards)
You need administrative or write access to your CDF deployment to create and manage resources. The automation example later in this reference assumes familiarity with Cognite Toolkit configuration and variable substitution. A git-based workflow is recommended when you manage resource definitions as code.
Every CDF external ID follows one of three patterns. Identify which pattern your resource uses, then look up its exact token order in the resource tables further down.
| Pattern | Shape | Example | Used by |
|---|---|---|---|
| Prefixed | prefix_token_token_token | ep_files_valhall_sharepoint | Most building blocks, core resources |
| Source-to-target | prefix_source_location_to_target | tr_sap_valhall_to_cdm | Transformations, mapping workflows |
| Persona-led | persona_[scope]_environment | producer_pp_dev | Access groups, service principals |
A name is an ordered sequence of tokens (short codes such as valhall, sap, or files) running from general to specific. The prefix tells you the resource type (ep = extraction pipeline); the remaining tokens add location, source, and intent.
To name a resource:
Find its row in the relevant resource table.
Assemble the external ID from approved tokens, in the order the grammar column specifies.
Expand those tokens into a sentence-case display name (ep_files_valhall_sharepoint → Valhall SharePoint file extraction).
Tokens come from your approved list — a maintained, machine-readable set of valid codes for locations, sources, personas, types, and segments (one enum per token type; see layer 2b). Agreeing on this list up front prevents variation sprawl (ny, newyork, nyc all meaning the same site). If an artifact applies to all locations, use all (for example, ep_all_sap_assets) to keep parsing consistent.
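The assembly step above can be sketched in a few lines of Python. This is a minimal illustration, not a Cognite SDK or Toolkit API: the APPROVED dictionary, the grammar lists, and the build_external_id helper are hypothetical names, and the token values are taken only from the examples in this reference.

```python
# Hypothetical approved list; a real deployment would load its own maintained enums.
APPROVED = {
    "prefix": {"ep", "tr"},
    "type": {"files", "assets"},
    "location": {"valhall", "all"},
    "source": {"sharepoint", "sap"},
    "target": {"cdm"},
}


def build_external_id(grammar, values):
    """Join tokens in grammar order, rejecting values that are not approved.

    grammar: ordered token types, for example ["prefix", "type", "location", "source"].
             The literal "to" stands for the source-to-target connector.
    values:  mapping from token type to the chosen token.
    """
    tokens = []
    for token_type in grammar:
        if token_type == "to":
            tokens.append("to")
            continue
        value = values[token_type]
        if value not in APPROVED[token_type]:
            raise ValueError(f"{value!r} is not an approved {token_type} token")
        tokens.append(value)
    return "_".join(tokens)


# Prefixed pattern (grammar order assumed from the example in the pattern table).
prefixed = build_external_id(
    ["prefix", "type", "location", "source"],
    {"prefix": "ep", "type": "files", "location": "valhall", "source": "sharepoint"},
)

# Source-to-target pattern with the _to_ connector.
source_to_target = build_external_id(
    ["prefix", "source", "location", "to", "target"],
    {"prefix": "tr", "source": "sap", "location": "valhall", "target": "cdm"},
)
```

Because every token passes through the approved list, a variant such as newyork is rejected at assembly time instead of surfacing later in review.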
These rules apply across all three patterns. Resource families that deviate (data models, properties) note the deviation in their own tables.
| Rule | External ID | Display name |
|---|---|---|
| Charset | Lowercase letters, digits, and underscore (a-z, 0-9, _) only. Variable-name-safe. | English words, space-separated. |
| Casing | snake_case, unless a resource family specifies otherwise (PascalCase for data models and views; camelCase for properties; kebab-case for CDF projects). | Sentence case (for example, Valhall SAP maintenance extraction). |
| Language | English, singular nouns, present-tense verbs. | Same as external ID. |
| Separators | Underscore (_) between tokens. The _to_ connector is the only allowed connector word, used only in source-to-target resources. | Spaces between words. Do not use colons, hyphens, or underscores as token separators. |
| Length | Maximum 255 characters (external IDs). No null bytes. | Same 255-character limit where the API enforces it. |
| Environment | Environment-agnostic. Use the same ID in dev, test, and prod. | Same. |
| Tokens | Use approved values from your list. Reuse the same token for the same thing everywhere (description, never desc then comment). | Expand tokens into readable phrases. |
Environment tokens (dev, test, prod) appear only in CDF project resource names (am-long-eu-no-oslo-dev) and persona-led names (producer_pp_dev). Never embed them in building-block external IDs such as extraction pipelines, transformations, or data sets.
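The charset, length, and environment rules for building-block external IDs can be expressed as a small check. The sketch below is illustrative only: check_building_block_id is a hypothetical helper, and it assumes an environment token is any underscore-separated token equal to dev, test, or prod.

```python
import re

CHARSET = re.compile(r"[a-z0-9_]+")  # lowercase letters, digits, underscore
ENVIRONMENTS = {"dev", "test", "prod"}
MAX_LENGTH = 255


def check_building_block_id(external_id):
    """Return a list of rule violations; an empty list means the ID passes."""
    problems = []
    if not CHARSET.fullmatch(external_id):
        problems.append("charset: use only a-z, 0-9, and underscore")
    if len(external_id) > MAX_LENGTH:
        problems.append("length: exceeds 255 characters")
    # Building-block IDs must be environment-agnostic.
    if ENVIRONMENTS & set(external_id.split("_")):
        problems.append("environment: dev, test, or prod must not appear as a token")
    return problems
```

Persona-led names and CDF project names are expected to carry an environment token, so a check like this would apply only to building-block resources.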
Ambiguity and generic naming: Avoid my_script, test_pipeline, data_loader, or transformation_1. These provide no context for scope, ownership, or function.
Casing chaos: Identifiers that differ only by capitalization are prohibited (pumpid vs PumpID causes silent data loss).
Unsafe characters: Do not use spaces, special characters (&, %, /, \), or non-ASCII characters in external IDs.
Implicit context: Do not omit location or source because “everyone knows the deployment.” A pipeline named sap_assets collides when a second SAP site is added.
Hardcoded randomness: Random GUIDs sacrifice debuggability. Prefer semantic naming for operational awareness.
Environment in resource IDs: Do not embed dev, test, or prod in building-block external IDs.
Type prefix on instances: Data instances take no type prefix. A pump instance is P-101, not asset_P-101.
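The casing rule in the list above lends itself to an automated check. As a minimal sketch (find_case_collisions is a hypothetical helper, not part of any Cognite tooling), identifiers can be grouped by their case-folded form and any group with more than one spelling reported:

```python
from collections import defaultdict


def find_case_collisions(identifiers):
    """Group identifiers that are identical once case is ignored."""
    groups = defaultdict(set)
    for identifier in identifiers:
        groups[identifier.casefold()].add(identifier)
    # Keep only the groups that contain more than one distinct spelling.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


collisions = find_case_collisions(["pumpid", "PumpID", "flowRate"])
# collisions == {"pumpid": ["PumpID", "pumpid"]}
```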
If your deployment already uses a documented convention that differs from this guide, follow that convention and record the deviation in project governance documentation. Treat deviations as informational, not errors.
Building blocks are operational resources you configure to ingest, transform, and orchestrate data. They use type prefixes on external IDs for filtering and sorting in the CDF portal application and API. Access groups and service principals are exceptions — they use the persona-led pattern. Display names do not include the type prefix.
Core resources store and expose industrial data — often ingested from source systems. The naming strategy prioritizes traceability.
| Resource type | Token grammar | Example ID | Example name | Casing | Regex (charset gate) |
|---|---|---|---|---|---|
| Time series | {source}_{location}_{original_id} or preserve source ID | pi_valhall_2342 | PI valhall 2342 | snake_case | ^[a-z0-9_]+$ |
| Stream | {source}_{location}_{concept} | kafka_valhall_pi_pressure | Kafka valhall PI pressure | snake_case | ^[a-z0-9_]+$ |
| Record | {source}_{location}_{machine}_{id} | sap_valhall_rec_001 | SAP valhall rec 001 | snake_case | ^[a-z0-9_]+$ |
| 3D model | N/A (system-generated) | N/A | ValhallModel | PascalCase | N/A |
| 3D scene | {tokens} | scene_deck_a | Scene deck A | snake_case | ^[a-z0-9_]+$ |
| Location filter | loc_{location} | loc_valhall | Valhall location filter | snake_case | ^loc_[a-z0-9_]+$ |
| File | Preserve source ID or {source}_{location}_{original_id} | Source-defined | Source-defined | snake_case when generated | ^[a-z0-9_]+$ when generated |
Source ID preservation (time series and files). Time series and files often arrive with IDs from source systems. Decide as follows:
Preserve as-is when the source ID is already variable-name-safe (alphanumeric and underscore only) and unique within the deployment.
Prefix for uniqueness when the bare source ID could collide across sites or systems: {source}_{location}_{original_id} (for example, pi_valhall_2342).
Sanitize only when required — replace unsafe characters (spaces, /, %, non-ASCII) with underscore, trim leading and trailing underscores, and record the mapping in the resource metadata when traceability matters.
Prefer traceability over strict conformance when sanitization would break a known source-system reference.
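The preserve, prefix, and sanitize decision above can be sketched as one function. This is an illustration under stated assumptions: resolve_source_id and its unique_in_deployment flag are hypothetical, "alphanumeric" is read as ASCII letters and digits, and the ID is not lowercased because the steps above do not call for it.

```python
import re

SAFE = re.compile(r"[A-Za-z0-9_]+")    # variable-name-safe: alphanumeric and underscore
UNSAFE = re.compile(r"[^A-Za-z0-9_]")  # spaces, /, %, non-ASCII, and so on


def resolve_source_id(original_id, source, location, unique_in_deployment):
    """Return (external_id, mapping) for a time series or file source ID.

    mapping is non-empty only when the ID was sanitized, so the original
    can be recorded in resource metadata for traceability.
    """
    if SAFE.fullmatch(original_id):
        cleaned = original_id  # preserve as-is
    else:
        # Replace each unsafe character with an underscore, then trim the ends.
        cleaned = UNSAFE.sub("_", original_id).strip("_")
        if not cleaned:
            raise ValueError(f"{original_id!r} has no safe characters to keep")
    mapping = {"original_id": original_id} if cleaned != original_id else {}
    if unique_in_deployment:
        return cleaned, mapping
    # Prefix for uniqueness when the bare ID could collide across sites or systems.
    return f"{source}_{location}_{cleaned}", mapping
```

For example, resolve_source_id("2342", "pi", "valhall", False) yields pi_valhall_2342 with an empty mapping. The caller decides whether a sanitized result would break a known source-system reference and should be left alone.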
Data modeling resources define the schema and instances in your knowledge graph. They use strict capitalization and layer suffixes to distinguish source, domain, and solution models. See Designing scalable data models for the layered architecture these abbreviations map to.
Property descriptions should follow AI-friendly description practices.
Layer abbreviations: src (source), dom (domain/enterprise), sol (solution), inst (instances). The dm_ prefix identifies a space; the data model itself carries the layer as a _SRC/_DOM/_SOL suffix, not a prefix. See spaces and instances.
Building-block or scope token; use all for broad grants
ep, pp, ep_pi, all
environment
Yes
dev, test, prod
dev
The examples below show access group and service principal names built from these tokens.
| Example | Resource | Meaning |
|---|---|---|
| producer_pp_dev | Access group | Producer access to processing pipelines in dev |
| consumer_valhall_all_prod | Access group | Consumer access to everything at Valhall in production |
| admin_all_prod | Access group | Admin access to everything in production |
| producer_eu_no_valhall_ep_dev | Access group | Producer access to extraction pipelines at Valhall, Norway, EU region, dev |
| producer_valhall_ep_pi_dev | Service principal | Service principal for a PI extraction pipeline at Valhall |
| producer_oid_pp_dev | Service principal | Service principal for processing pipelines at OID in dev |
Apply the same pattern to identity provider (IdP) access groups and CDF access groups (CDF groups mirror the IdP name).
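A persona-led name can be split back into its parts by treating the first token as the persona, the last as the environment, and everything in between as scope. The sketch below is illustrative: parse_persona_name is a hypothetical helper, and the persona set contains only the values that appear in the examples in this reference.

```python
PERSONAS = {"producer", "consumer", "admin"}  # values seen in the examples only
ENVIRONMENTS = {"dev", "test", "prod"}


def parse_persona_name(name):
    """Split persona_[scope]_environment into its parts, or raise ValueError."""
    tokens = name.split("_")
    if len(tokens) < 2:
        raise ValueError(f"{name!r} needs at least a persona and an environment")
    persona, *scope, environment = tokens
    if persona not in PERSONAS:
        raise ValueError(f"{persona!r} is not a known persona")
    if environment not in ENVIRONMENTS:
        raise ValueError(f"{environment!r} is not a known environment")
    return {"persona": persona, "scope": scope, "environment": environment}


parsed = parse_persona_name("producer_eu_no_valhall_ep_dev")
# parsed == {"persona": "producer",
#            "scope": ["eu", "no", "valhall", "ep"],
#            "environment": "dev"}
```

Scope is returned as a flat token list because multi-token scope values such as ep_pi cannot be told apart from two separate tokens without consulting the approved list.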
Alternative schemes — ACL-based, data-domain-based, or source-system-based — are valid when they better match your identity provider. Document the chosen scheme in project governance docs.
Security categories: If you use security categories for fine-grained access on time series and files, name them with an sc_ prefix mirroring the persona-led pattern (for example, sc_producer_pp_dev). Security categories apply only to time series and files — not to data modeling instances linked via instanceId. See Security categories.
If you manage resources as code, the Cognite Toolkit produces correct external IDs by construction through variable substitution. Define shared tokens once in config.yaml and reference them in resource YAML files to assemble every name from the same approved values:
# config.yaml
variables:
  location: valhall
  source: sap
  prefix_ep: ep
  prefix_db: raw
The same approach works with SDK scripts or API calls — centralize token values and assemble external IDs programmatically. This eliminates most manual errors, but names that bypass the Cognite Toolkit (hand-edited YAML, ad-hoc API calls) still need the gates below.
The regex column in each resource table validates prefix, charset, and casing only. It deliberately does not validate token values, order, or count: a trailing [a-z0-9_]+ group accepts any underscore-separated tokens. Treat regex as a fast structural filter, not a correctness guarantee.
Implement it in CI by scanning resource configuration files for externalId fields and rejecting pull requests that fail the hard-gate regex.
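One way to sketch such a CI scan is shown below. It is an assumption-heavy illustration rather than a Toolkit feature: find_violations and scan_directory are hypothetical helpers, the line-based regex assumes simple single-line externalId fields in rendered YAML (after any variable substitution), and only the generic charset gate is applied rather than a per-resource pattern.

```python
import re
from pathlib import Path

# Matches "externalId: value" or "- externalId: 'value'" on a single line.
EXTERNAL_ID_LINE = re.compile(
    r"""^[ \t]*-?[ \t]*externalId:[ \t]*['"]?([^'"#\s]+)['"]?""",
    re.MULTILINE,
)
HARD_GATE = re.compile(r"[a-z0-9_]+")  # generic charset gate


def find_violations(yaml_text):
    """Return every externalId value in the text that fails the charset gate."""
    return [
        value
        for value in EXTERNAL_ID_LINE.findall(yaml_text)
        if not HARD_GATE.fullmatch(value)
    ]


def scan_directory(root):
    """Return (path, external_id) pairs for all failing IDs under root."""
    failures = []
    for path in sorted(Path(root).rglob("*.yaml")):
        for bad in find_violations(path.read_text(encoding="utf-8")):
            failures.append((str(path), bad))
    return failures
```

A CI job could call scan_directory on the build output and exit with a non-zero status when the returned list is not empty.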
This is a recommended capability. The rules below describe what a token-validation step should assert; your team implements it as a small CI script. Until then, these checks fall back to review (layer 3).
Shape regex alone lets ep_files_newyork_sharepoint (unapproved location), ep_sharepoint_files_valhall (wrong order), and ep_a_b_c_d (wrong count) pass. The token check closes that gap by validating against the approved-list enum rather than a pattern.
Maintain the approved list as machine-readable enums in approved_tokens.yaml, one entry per token type, as the single source of truth for authors and CI.
A token check validates each externalId against its resource grammar: every token must exist in its type’s enum, and the token count must match the grammar. Any miss fails the build. New tokens are added only by a reviewed change to approved_tokens.yaml.
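A token check along these lines might look like the sketch below. Everything in it is hypothetical: the APPROVED_TOKENS dictionary stands in for the contents of approved_tokens.yaml, the GRAMMARS entry for the ep prefix is inferred from the examples in this reference, and check_tokens is not an existing tool.

```python
# Hypothetical stand-in for approved_tokens.yaml: one enum per token type.
APPROVED_TOKENS = {
    "type": {"files", "assets"},
    "location": {"valhall", "oid", "all"},
    "source": {"sharepoint", "sap", "pi"},
}

# Hypothetical grammar per prefix: the ordered token types after the prefix.
GRAMMARS = {"ep": ["type", "location", "source"]}


def check_tokens(external_id):
    """Return a list of token errors; an empty list means the ID passes."""
    prefix, *tokens = external_id.split("_")
    grammar = GRAMMARS.get(prefix)
    if grammar is None:
        return [f"unknown prefix {prefix!r}"]
    if len(tokens) != len(grammar):
        return [f"expected {len(grammar)} tokens after the prefix, found {len(tokens)}"]
    # Checking each position against its own enum enforces order as well as membership.
    return [
        f"{token!r} is not an approved {token_type}"
        for token_type, token in zip(grammar, tokens)
        if token not in APPROVED_TOKENS[token_type]
    ]
```

With these assumed enums, ep_files_valhall_sharepoint passes, while ep_files_newyork_sharepoint, ep_sharepoint_files_valhall, and ep_a_b_c_d each return at least one error. Tokens that themselves contain an underscore would need a smarter split than this sketch uses.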
With token membership, order, and count enforced at layer 2b, review focuses on what machines cannot judge: is the display name clear, and does a proposed new token belong in the enum? Use this checklist for the remaining human-judgment items.
| Category | Checklist item |
|---|---|
| Names | Are display names sentence case, space-separated, English, singular, and present tense? |
| Names | Is the name clear and unambiguous to someone outside the authoring team? |
| Source IDs | For time series and files, was the preserve/sanitize decision documented when non-conformant? |
| New tokens | Does each new token proposed for approved_tokens.yaml deserve to be added, and is it distinct from existing tokens? |
Exceptions should be rare and documented. If a legacy or brownfield source system requires a naming format that violates the standard, document the deviation in the resource description or metadata field in CDF.