Populate a data model (5 mins)
This article describes how to populate a data model from the data model page, with CDF Transformations or from the command line with the Transformations CLI.
Prerequisites
Before you populate the data model, make sure you've completed the steps in these unit to create transformations to ingest data from the CDF staging area (RAW) into a data model:
Populate a data model from the data model page
Follow the steps in this section to populate the data model from the data model page.
We will populate instances of the type Actor
and Movie
from the data model we created earlier but also relationships to populate the list of actors a movie has.
Step 1: Populate the Actor, Director, and Movie fields
Navigate to Cognite Data Fusion and select Explore and build > Data modeling and select the data model you want to populate.
On the Data management tab, select
Actor
, and then Populate in bulk.Confirm the suggested transformation name and external ID. You can change both values later.
In the SQL editor, switch to the mapping editor to map the source and target columns.
Select Change source, then select the
actors
table, and select Apply selection.Enter the mappings, and select Preview to see confirm that the mappings are correct.
tipIf you're unsure about the mappings, select View suggestions to get help. This is especially useful for larger data types with many attributes.
Select Run > Run as current user.
When the transformation has completed, return to the data modeling page to see the populated data. You may need to refresh the page to see the data.
Repeat the steps above to populate
Director
from thedirectors
table.The next step is to map and populate
Movie
from themovies
table. In the Mapping editor, use the following mappings, or the SQL code below the image:
SQL for ingesting Movie
select
cast(`id` as STRING) as externalId,
cast(`Movie_Name` as STRING) as name,
cast(`Description` as STRING) as description,
cast(`Run_time` as INT) as runTime,
cast(`Gross_USD` as INT) as gross,
node_reference(`Director`) as director,
cast(`IMDB_Rating` as DOUBLE) as imdbRating,
cast(`Released_Year` as INT) as releasedYear
from `Movies`.`movies`;
Step 2: Populate the Movie.actors-[Actor] relationship
For many-to-many relationships, like Movie.actors ->[Actor]
, use a separate transformation.
Navigate to Data model > Data management, and select Movie.
Select Populate in bulk > Create but this time choose Load relationship and the (
Movie.actors
) relationship, and select Next.In the SQL editor, switch to the mapping editor to map the source and target columns.
Select Change source, then select the
actor_movie_appearances
table, and select Apply selection.Use the dropdown menus to map the values manually:
Select Preview, and then Run > Run as current user.
On the Data management page, notice that the actors column for Movie has multiple actors.
You've have now ingested actors, directors, and movies, and defined the actors' movie appearances as relationships (edges).
In the next unit, we'll look at how you can query the data.
SQL for ingesting Movie.actors -> [Actor]
select
cast(`id` as STRING) as externalId,
node_reference(`movie`) as startNode,
node_reference(`actor`) as endNode
from
`Movies`.`actor_movie_appearances`;
Populate a data model with CDF Transformations
You can create transformations directly from CDF Transformations page.
As in the example above, you can populate instances of the type Actor
, Director
, and Movie
from the data model you created earlier. You can also use relationships to populate a list of the actors in each movie.
To work with transformations, follow the steps in the Transform data article.
When you specify the destination, you need to identify the space, data model (version) and the data type / relationship.
For example, to load data into Actor
:

Follow the steps in the tutorial above to see which queries you should to use in the transformations.
Populate a data model with the Transformations CLI
You can also use the CLI to populate the data to better manage population via CI/CD.
This process will require you to understand a few more concepts about how the CDF API talks about data models. There's some concepts which the configuration file will reference that may seem complex, but in summary:
- Data Model Type is also called a View in the API
- instances are called nodes or edgesin the API, where
- nodes are the actual instances of data of a data type
- edges are the actual relationships between instances / nodes of data types
To learn more, check out the advanced concepts
If necessary, follow the steps in the Transformations CLI article to install the Transformation CLI.
Populating a type
In the manifest, map the following destination
parameters:
- Type - This will always be
nodes
for transformations to a data model type - Space and Instance Space - specify the space where the data model and instances that you would like to load in, respectively (for example
MovieDM
) - View - specify the name of the data type, which acts as the external ID of the View that powers the data type (for example
Movie
)- Version - specify the version of the data model or the version on the @view directive, which acts as the version of the view (for example
1
).
- Version - specify the version of the data model or the version on the @view directive, which acts as the version of the view (for example
For example, to load the data into Movie on version 1 of a data model in the MovieDM space
destination:
type: 'nodes'
instanceSpace: 'MovieDM'
view:
space: 'MovieDM'
externalId: 'Movie'
version: '1'
Sample manifest
Below is a sample manifest you can deploy with transformations-cli deploy
.
externalId: 'sample-dm-transformation'
name: 'sample-dm-transformation'
destination:
type: 'nodes'
# The external ID of your space to put instances in (usually same as the data model instance space).
instanceSpace: 'MovieDM'
# The view to load data into
view:
# The external ID of the space of the data model (and view)
space: 'MovieDM'
externalId: 'Movie'
version: '1'
action: 'upsert'
# Required
query: 'select * from SomeSource' # The query you want to run to transform data.
# You can also specify the path to a file containing the SQL query for the transformation.
# query:
# file: query.sql
schedule: '* * * * *'
# The client credentials for the transformation
authentication:
clientId: ${CLIENT_ID}
clientSecret: ${CLIENT_SECRET}
tokenUrl: ${TOKEN_URL}
scopes:
- ${SCOPES}
cdfProjectName: ${CDF_PROJECT_NAME}
Populating a relation
In the manifest, map the following destination
parameters:
- Type - This will always be
edges
for transformations to a relations field within a data model type - Space and Instance Space - specify the space where the data model and instances that you would like to load in, respectively (for example
MovieDM
) - Edge - specify the name of the data type and field, which acts as the external ID of the View that powers the data type (for example
Movie.actors
)
For example, to load the data into Movie on version 1 of a data model in the MovieDM space
destination:
type: 'edges'
instanceSpace: 'MovieDM'
edgeType:
space: 'MovieDM'
externalId: 'Movie.actor'
Sample manifest
Below is a sample manifest you can deploy with transformations-cli deploy
.
externalId: 'sample-dm-transformation-edge'
name: 'sample-dm-transformation-edge'
destination:
type: 'edges'
# The external ID of your space to put relations in (usually same as the data model instance space).
instanceSpace: 'MovieDM'
edgeType:
# The external ID of the space of the data model (and view)
space: 'MovieDM'
# The data type and relation to load data in
externalId: 'Movie.actor'
action: 'upsert'
# Required
query: 'select * from SomeSource' # The query you want to run to transform data.
# You can also specify the path to a file containing the SQL query for the transformation.
# query:
# file: query.sql
schedule: '* * * * *'
# The client credentials for the transformation
authentication:
clientId: ${CLIENT_ID}
clientSecret: ${CLIENT_SECRET}
tokenUrl: ${TOKEN_URL}
scopes:
- ${SCOPES}
cdfProjectName: ${CDF_PROJECT_NAME}