Skip to main content

Populate a data model (5 mins)

Beta

The features described in this section are currently in beta testing and are subject to change. We recommend that you don't use the features in production systems.

For more information, request to join the data modeling group on the Cognite Hub.

This article describes how to populate a data model with CDF Transformations or from the command line with the Transformations CLI.

Prerequisites

Before you populate the data model, make sure that you have completed these steps:

  1. Upload data to RAW
  2. Create a data model

Populate a data model with CDF Transformations

Follow the steps in this unit to create transformations to ingest data from RAW (CDF's staging area) into a data model.

We will populate instances of the type Actor and Movie from the data model we created earlier but also edges (relationships) to populate the list of actors a movie has. Since we have three types (including the edge), we must create three transformations.

Step 1: Create the Actor transformation

  1. Follow the steps in the Transform data article to create a transformation.

    Note

    Data modeling is not yet available in the new CDF Transformations design, so select the Go to previous version link in the top blue banner before proceeding with the next steps.

  2. On the Recipe tab, in Destination type, select Data Models (Beta).

  3. In Action, select Create or Update.

  4. In Data Model Space, specify the external_id of the data model (for example, moviedm).

  5. In Data Model, specify the type and version of the data model you want to populate (for example, Actor_1).

    Note

    For data, always specify the type and version, for example Movie_1 or Actor_1 to load into version 1 of each data model's types.

    For relationships, always specify the two types, followed by the property name and ended with the version, for example Movie_actors_1 to load into version 1 of the Movie.actors field.

  6. In Instance Space, specify the same value as in Data Model Space, for example moviedm.

  7. The data model for Actor has two properties: name and age. The file we uploaded only has the name of the actors, and not the age. age is not a required field, so we can leave it out.

    You need to specify both the externalId and name properties as both are required.

  8. Enter this transformation query in the SQL editor. We use the actor name as the externalId because we will use it as an identifier when we insert the edges:

    SELECT name as externalId, name FROM movies.actors
  9. Select the Preview button to preview the results.

  10. Select the Schedule and Run tab to run the transformation.

    note

    Before you can run the transformation, you must specify the credentials the transformation should use to authenticate with CDF. See the transformations guide for details.

  11. When you have set up the credentials, return to this screen and select the Transform button to run the transformation.

Step 2: Create the Movie transformation

  1. Repeat steps in 1-6 the previous section, but this time choose the Movies_1 data model.

    This data model has more properties and different primitive types. Let's add the query first and discuss it afterwards.

  2. Enter the following transformation query in the SQL editor:

        SELECT Series_Title as externalId,
    Series_Title as name,
    Overview as description,
    int(Gross) as gross,
    float(IMDB_Rating) as imdbRating,
    int(Runtime) as runTime,
    int(Released_Year) as releasedYear
    FROM movies.movies
  3. Select Preview, and notice how we have casted some fields to their correct types. We have also renamed some fields since the columns in the .csv file do not match the names in our data model.

    Again, we choose the movie title as externalId because we'll use it as the identifier when we insert the edges.

  4. If you are satisfied with the preview results, select the Schedule and Run tab, specify the credentials, and select the Transform button to run the transformation.

Step 3: Create the ActorMovieAppearances transformation

The last transformation adds the movie appearances of actors as edges, or relationships to the data model.

  1. Repeat steps 1-6 the previous section above, but this time choose Movie_actors_1 as the data model.

    Edges point from an instance of a type to an instance of a (potentially different) type. We call these instances nodes. In our case, the edge points from a Movie node to an Actor node. We need to know the external ID of both the start and the end nodes.

    In the data set we uploaded, each row in the actor_movie_appearances table contains the title of a movie, and the name of the actor that the row represents. This is why we chose these as external IDs for Actor and Movie.

  2. Enter this transformation query in the SQL editor:

    SELECT key as externalId,
    concat("moviedm:", movie) as startNode,
    concat("moviedm:", actor) as endNode,
    "moviedm:Movie_actors_1" as type
    FROM movies.actor_movie_appearances
  3. Select the Preview button to preview the results. Note that:

    • The startNode and endNode should be in the format $spaceExternalId:$nodeExternalId, where $spaceExternalId is the external ID of the space the node exists in, and $nodeExternalId is the external ID of the node itself. For most cases, you can assume everything lives in the same space (or external iID of the model).

    • In the preview table, you can see that for the movie The Truman Show, the startNode becomes moviedm:The Truman Show, and similarly for the actor names.

    • In our case, the type field has to be exactly the string moviedm:Movie_actors_1. It is the name of the space, and the name of the edge type Movie_actors_1.

    • You can ignore the dummy and the spaceExternalId fields.

    Note

    All nodes (actors and movies) must exist for this transformation to work. Currently, the transformation will fail if you refer to an external ID that does not exist.

  4. If you are satisfied with the preview results, select the Schedule and Run tab, specify the credentials, and select the Transform button to run the transformation.

You have now ingested actors, movies and the actors' movie appearances as edges. In the next unit, we'll have a look at how you can query the data.

Populate a data model with the Transformations CLI

  1. If necessary, follow the steps in the Transformations CLI article to install the Transformation CLI.

  2. In the manifest, map the following destination parameters:

    • Data Model Space - specify the external_id of the data model, for example moviedm).
    • Data Model - specify the DataType_version or DataType_PropertyName_version.
    • Instance Space - specify the same value as in Data Model Space, for example moviedm.

    For example, to load the data into Movie on version 3 of a data model with MovieDM as the externalId:

    destination:
    type: 'datamodelinstances'
    instanceSpaceExternalId: 'moviedm'
    modelExternalId: 'Movie_3'
    spaceExternalId: 'moviedm'

Sample manifest

Below is a sample manifest you can deploy with transformations-cli deploy.

manifest.yml
externalId: 'sample-fdm-transformatons'
name: 'sample-fdm-transformatons'
destination:
type: 'datamodelinstances'
instanceSpaceExternalId: 'moviedm' # The external ID of your data model.
modelExternalId: 'Movie_3' # The type or relationship in your data.
spaceExternalId: 'moviedm' # The external ID of your data model.
action: 'upsert'

# Required
query: 'select * from SomeSource' # The query you want to run to transform data.
# You can also specify the path to a file containing the SQL query for the transformation.
# query:
# file: query.sql

schedule: '* * * * *'

# The client credentials for the transformation
authentication:
clientId: ${CLIENT_ID}
clientSecret: ${CLIENT_SECRET}
tokenUrl: ${TOKEN_URL}
scopes:
- ${SCOPES}
cdfProjectName: ${CDF_PROJECT_NAME}