Skip to main content

Populate a data model (5 mins)

This article describes how to populate a data model from the data model page, with CDF Transformations or from the command line with the Transformations CLI.

Prerequisites

Before you populate the data model, make sure you've completed the steps in these unit to create transformations to ingest data from the CDF staging area (RAW) into a data model:

  1. Upload demo data.
  2. Create a data model.

Populate a data model from the data model page

Follow the steps in this section to populate the data model from the data model page.

We will populate instances of the type Actor and Movie from the data model we created earlier but also relationships to populate the list of actors a movie has.

Step 1: Populate the Actor, Director, and Movie fields

  1. Navigate to Cognite Data Fusion and select Explore and build > Data modeling and select the data model you want to populate.

  2. On the Data management tab, select Actor, and then Populate in bulk.

    Choose Populate in bulk to use Transformations to populate the data model
  3. Confirm the suggested transformation name and external ID. You can change both values later.

  4. In the SQL editor, switch to the mapping editor to map the source and target columns.

  5. Select Change source, then select the actors table, and select Apply selection.

    Choose the actors table as the source
  6. Enter the mappings, and select Preview to see confirm that the mappings are correct.

    tip

    If you're unsure about the mappings, select View suggestions to get help. This is especially useful for larger data types with many attributes.

  7. Select Run > Run as current user.

    Select Run > Run as current user
  8. When the transformation has completed, return to the data modeling page to see the populated data. You may need to refresh the page to see the data.

  9. Repeat the steps above to populate Director from the directors table.

  10. The next step is to map and populate Movie from the movies table. In the Mapping editor, use the following mappings, or the SQL code below the image:

    Use these mappings
SQL for ingesting Movie

   select
cast(`id` as STRING) as externalId,
cast(`Movie_Name` as STRING) as name,
cast(`Description` as STRING) as description,
cast(`Run_time` as INT) as runTime,
cast(`Gross_USD` as INT) as gross,
node_reference(`Director`) as director,
cast(`IMDB_Rating` as DOUBLE) as imdbRating,
cast(`Released_Year` as INT) as releasedYear
from `Movies`.`movies`;

Step 2: Populate the Movie.actors-[Actor] relationship

For many-to-many relationships, like Movie.actors ->[Actor], use a separate transformation.

  1. Navigate to Data model > Data management, and select Movie.

  2. Select Populate in bulk > Create but this time choose Load relationship and the (Movie.actors) relationship, and select Next.

    Use these mappings for the Movie.actors-[Actor] relationship
  3. In the SQL editor, switch to the mapping editor to map the source and target columns.

  4. Select Change source, then select the actor_movie_appearances table, and select Apply selection.

    Choose the actor_movie_appearances table as the source
  5. Use the dropdown menus to map the values manually:

    Map the source and target values manually
  6. Select Preview, and then Run > Run as current user.

  7. On the Data management page, notice that the actors column for Movie has multiple actors.

    Multiple actors for each movie

    You've have now ingested actors, directors, and movies, and defined the actors' movie appearances as relationships (edges).

In the next unit, we'll look at how you can query the data.

SQL for ingesting Movie.actors -> [Actor]

select
cast(`id` as STRING) as externalId,
node_reference(`movie`) as startNode,
node_reference(`actor`) as endNode
from
`Movies`.`actor_movie_appearances`;

Populate a data model with CDF Transformations

You can create transformations directly from CDF Transformations page.

As in the example above, you can populate instances of the type Actor, Director, and Movie from the data model you created earlier. You can also use relationships to populate a list of the actors in each movie.

To work with transformations, follow the steps in the Transform data article.

When you specify the destination, you need to identify the space, data model (version) and the data type / relationship.

For example, to load data into Actor:

Follow the steps in the tutorial above to see which queries you should to use in the transformations.

Populate a data model with the Transformations CLI

You can also use the CLI to populate the data to better manage population via CI/CD.

Advanced Concepts

This process will require you to understand a few more concepts about how the CDF API talks about data models. There's some concepts which the configuration file will reference that may seem complex, but in summary:

  • Data Model Type is also called a View in the API
  • instances are called nodes or edgesin the API, where
    • nodes are the actual instances of data of a data type
    • edges are the actual relationships between instances / nodes of data types

To learn more, check out the advanced concepts

If necessary, follow the steps in the Transformations CLI article to install the Transformation CLI.

Populating a type

In the manifest, map the following destination parameters:

  • Type - This will always be nodes for transformations to a data model type
  • Space and Instance Space - specify the space where the data model and instances that you would like to load in, respectively (for example MovieDM)
  • View - specify the name of the data type, which acts as the external ID of the View that powers the data type (for example Movie)
    • Version - specify the version of the data model or the version on the @view directive, which acts as the version of the view (for example 1).

For example, to load the data into Movie on version 1 of a data model in the MovieDM space

destination:
type: 'nodes'
instanceSpace: 'MovieDM'
view:
space: 'MovieDM'
externalId: 'Movie'
version: '1'

Sample manifest

Below is a sample manifest you can deploy with transformations-cli deploy.

manifest.yml
externalId: 'sample-dm-transformation'
name: 'sample-dm-transformation'
destination:
type: 'nodes'
# The external ID of your space to put instances in (usually same as the data model instance space).
instanceSpace: 'MovieDM'
# The view to load data into
view:
# The external ID of the space of the data model (and view)
space: 'MovieDM'
externalId: 'Movie'
version: '1'
action: 'upsert'

# Required
query: 'select * from SomeSource' # The query you want to run to transform data.
# You can also specify the path to a file containing the SQL query for the transformation.
# query:
# file: query.sql

schedule: '* * * * *'

# The client credentials for the transformation
authentication:
clientId: ${CLIENT_ID}
clientSecret: ${CLIENT_SECRET}
tokenUrl: ${TOKEN_URL}
scopes:
- ${SCOPES}
cdfProjectName: ${CDF_PROJECT_NAME}

Populating a relation

In the manifest, map the following destination parameters:

  • Type - This will always be edges for transformations to a relations field within a data model type
  • Space and Instance Space - specify the space where the data model and instances that you would like to load in, respectively (for example MovieDM)
  • Edge - specify the name of the data type and field, which acts as the external ID of the View that powers the data type (for example Movie.actors)

For example, to load the data into Movie on version 1 of a data model in the MovieDM space

destination:
type: 'edges'
instanceSpace: 'MovieDM'
edgeType:
space: 'MovieDM'
externalId: 'Movie.actor'

Sample manifest

Below is a sample manifest you can deploy with transformations-cli deploy.

manifest.yml
externalId: 'sample-dm-transformation-edge'
name: 'sample-dm-transformation-edge'
destination:
type: 'edges'
# The external ID of your space to put relations in (usually same as the data model instance space).
instanceSpace: 'MovieDM'
edgeType:
# The external ID of the space of the data model (and view)
space: 'MovieDM'
# The data type and relation to load data in
externalId: 'Movie.actor'
action: 'upsert'

# Required
query: 'select * from SomeSource' # The query you want to run to transform data.
# You can also specify the path to a file containing the SQL query for the transformation.
# query:
# file: query.sql

schedule: '* * * * *'

# The client credentials for the transformation
authentication:
clientId: ${CLIENT_ID}
clientSecret: ${CLIENT_SECRET}
tokenUrl: ${TOKEN_URL}
scopes:
- ${SCOPES}
cdfProjectName: ${CDF_PROJECT_NAME}