Skip to main content

Modeling an asset hierarchy

This article describes the asset hierarchy template in the Cognite Data Fusion (CDF) templates repository. An asset hierarchy describes connections between larger assets and the smaller components inside them.

info

We recommend that you read this article to become familiar with the asset hierarchy template before you try it in your Cognite Data Fusion (CDF) project.

The template first creates a generic asset data model and then a use-case-specific data model and demonstrates how to:

The template models an asset hierarchy with two levels, lift stations and pumps, where the pumps are children of a lift station:

Pump lift station

Each of the pumps has properties in a metadata key-value store:

Pump metadata

The data models

This section describes the views, the solution data model for the template. The containers, the physical storage, are covered in the next section.

Generic asset model

The template starts with a generic asset model. The data model has one view, Asset:

Generic asset model

The data model is generic, and you can use it for both lift stations and pumps. However, you can't define properties specifically for pumps since the properties would also apply to lift stations. Therefore, the model is useful for creating an overview of all assets, but you need additional data models tailored to specific use cases.

Use-case-specific data model

The use-case-specific data model has two views, LiftStation and Pump. There's a one-to-many edge from LiftStation to Pump and a direct relation from Pump to LiftStation.

Generic asset model

The data model fits the use case, and you can query it in a language close to the domain, for example, "get all pumps in lift station 1."

Note that while the data model is well suited for building a solution in the domain of pump lift stations, you likely can't use it for other use cases.

Implementing the physical schema backing your data model

When you create data models, you need to consider containers and how to store data. Containers handle the physical storage of data models and are similar to SQL tables.

While views are reasonably easy to create, modify, and delete, containers are more permanent, and any changes often require costly migration of data. Carefully consider which properties to include in each container, keeping the following questions in mind:

  • How can you avoid duplication of data? Duplication can lead to inconsistencies and increased storage costs.

  • What queries do you expect? Consider which indexes you need in the containers. Indexes speed up queries, but increase the cost of writing.

  • What structure do you want to enforce on the data? Defining required properties or requiring a matching row to exist in another container lets you protect data quality by rejecting write operations of incomplete data.

tip

When you design containers, Cognite recommends using system containers and views as much as possible. They're pre-defined and designed to fit common industrial use cases and are available in the cdf_core space.

Implementing the pump lift station use case

Generic asset model

The generic asset model is similar to the Asset system view in the cdf_core space. The only difference is the added metadata property.

note

We don't recommend using a generic metadata property of type JSON. See how the Asset system view can be extended in the example.

To store the metadata property, the template uses a new container, AssetMetadata, in the myModelSpace space. One potential issue is that the container could be used to store any information in the metadata property. To make sure that it's only used for assets, we've added a constraint to the Asset container in the cdf_core system space.

The template defines the container specification in a YAML file and writes it to CDF using the container API.

externalId: AssetMetadata
name: Asset
space: myModelSpace
usedFor: node
properties:
metadata:
autoIncrement: false
defaultValue: null
description: Custom, application specific metadata. String key -> String value.
name: metadata
nullable: true
type:
list: false
type: json
constraints:
requiredAsset:
constraintType: requires
require:
type: container
space: cdf_core
externalId: Asset

Next, the template creates an Asset view for the generic asset model. Because the view is in the myModelSpace space, it won't collide with the system Asset view.

Instead of creating one property in the new view for each property in the Asset, Describable, and Sourceable containers, the template uses the Asset view from the cdf_core space, which already has all the properties. Only the metadata property needs to be added:

externalId: Asset
name: Asset
space: myModelSpace
version: '1'
implements:
- type: view
space: cdf_core
externalId: Asset
version: v1
properties:
metadata:
container:
externalId: Asset
space: myModelSpace
type: container
containerPropertyIdentifier: metadata
name: metadata

The final step is to create the data model with a single view:

externalId: myGenericAssetModel
name: myGenericAssetModel
space: myModelSpace
version: 1
views:
- externalId: Asset
space: myModelSpace
type: view
version: 1

Use-case-specific data model

For the use-case-specific data model, the template uses two views, LiftStation and Pump. The LiftStation view doesn't have any properties that aren't already in the system Asset view. (The pumps property is a one-to-many edge that isn't stored in containers.) Therefore, we don't need a container for the lift station.

The pump view has four new properties, DesignPointFlowGPM, DesignPointHeadFt, LowHeadFT, and LowHeadFlowGPM, that you need to create a container for. In the template, we've added a constraint to the Asset container to disallow pumps without an asset.

This is the specification for the Pump container:

externalId: Pump
name: Pump
space: myModelSpace
usedFor: node
properties:
DesignPointFlowGPM:
autoIncrement: false
defaultValue: null
description: The flow the pump was designed for, specified in gallons per minute.
name: DesignPointFlowGPM
nullable: true
type:
list: false
type: float64
DesignPointHeadFT:
autoIncrement: false
defaultValue: null
description: The flow head pump was designed for, specified in feet.
name: DesignPointHeadFT
nullable: true
type:
list: false
type: float64
LowHeadFT:
autoIncrement: false
defaultValue: null
description: The low head of the pump, specified in feet.
name: LowHeadFT
nullable: true
type:
list: false
type: float64
LowHeadFlowGPM:
autoIncrement: false
defaultValue: null
description: The low head flow of the pump, specified in gallons per minute.
name: DesignPointHeadFT
nullable: true
type:
list: false
type: float64
constraints:
requiredAsset:
constraintType: requires
require:
type: container
space: cdf_core
externalId: Asset

When creating the LiftStation view, the template doesn't use the Asset view from the system space because we don't need the root and parent properties. Instead, the template uses the Describable and Sourceable views from the system space.

You also need to add the pumps property as a one-to-many edge from LiftStation to Pump. You don't want to get all assets from the Asset container, only the lift stations. To achieve this, the template uses a filter on the externalId of the asset. During data population, the template ensures that all lift stations have an externalId that starts with lift_station.

This is the specification for the LiftStation view:

externalId: LiftStation
name: LiftStation
space: myModelSpace
version: 1
implements:
- type: view
space: cdf_core
externalId: Sourceable
version: v1
- type: view
space: cdf_core
externalId: Describable
version: v1
properties:
pumps:
type:
space: myModelSpace
externalId: LiftStation.pumps
source:
space: myModelSpace
externalId: Pump
version: 1
type: view
direction: outwards
name: pumps
filter:
prefix:
property:
- node
- externalId
value: lift_station

Next, the template creates the Pump view. As with the LiftStation view, we can't use the Asset view from the system space. Instead, the template uses the Describable and Sourceable views from the system space.

The template uses the parent property from the Asset container in the system space. Instead of calling it parent, the template adds a direct relation property linking to the parent property and names it liftStation.

To distinguish the pumps from other assets, the template adds a filter on the externalId of the asset. During data population, the template ensures that all pumps have an externalId that starts with pump.

externalId: Pump
name: Pump
space: myModelSpace
version: 1
implements:
- type: view
space: cdf_core
externalId: Sourceable
version: v1
- type: view
space: cdf_core
externalId: Describable
version: v1
properties:
liftStation:
container:
externalId: Asset
space: cdf_core
type: container
containerPropertyIdentifier: parent
name: liftStation
source:
external_id: LiftStation
space: myModelSpace
version: 1
DesignPointFlowGPM:
container:
externalId: Pump
space: myModelSpace
type: container
containerPropertyIdentifier: DesignPointFlowGPM
name: DesignPointFlowGPM
DesignPointHeadFT:
container:
externalId: Pump
space: myModelSpace
type: container
containerPropertyIdentifier: DesignPointHeadFT
name: DesignPointHeadFT
LowHeadFT:
container:
externalId: Pump
space: myModelSpace
type: container
containerPropertyIdentifier: LowHeadFT
name: LowHeadFT
LowHeadFlowGPM:
container:
externalId: Pump
space: myModelSpace
type: container
containerPropertyIdentifier: LowHeadFlowGPM
name: LowHeadFlowGPM
filter:
prefix:
property:
- node
- externalId
value: pump

Having defined the two views, we can create the data model:

externalId: LiftStationPump
name: LiftStationPump
space: myModelSpace
version: 1
views:
- externalId: Pump
space: myModelSpace
type: view
version: 1
- externalId: LiftStation
space: myModelSpace
type: view
version: 1

Populating data models

Data models and the data they contain often require different access patterns. Data models are typically read-only, while the data is read-write. In the template, we use a new space, myDataSpace, to store the data for the data models.

Generic asset model

To populate the generic asset model, the template uses a transformation to move data into the Asset view created earlier from the CDF asset hierarchy, lift_pump_stations:root, or the CDF staging area, RAW.

Under the Asset view, the template uses the Asset system container from the cdf_core system space. The container has a constraint on the parent property and requires that the parent exists to allow write operations. Consequently, the data must be populated in one step per level in the asset hierarchy, first the lift stations, and then the pumps. To separate lift stations from pumps, the externalId must start with lift_station for the lift stations and with pump for the pumps.

This is the specification for the SQL transformation:

-- Pump stations
select
concat('lift_station:', cast(`externalId` as STRING)) as externalId,
null as parent,
null as root,
cast(`name` as STRING) as title,
cast(`source` as STRING) as source,
cast(`description` as STRING) as description,
cast(`labels` as ARRAY < STRING >) as labels,
to_json(`metadata`) as metadata
from
cdf_assetSubtree('lift_pump_stations:root')
where
isnotnull(`externalId`) and isnotnull(`parentExternalId`) and not startswith(name, 'Pump')

UNION ALL
-- Pumps
select
concat('pump:', cast(`externalId` as STRING)) as externalId,
node_reference('myDataSpace', `parentExternalId`) as parent,
null as root,
cast(`name` as STRING) as title,
cast(`source` as STRING) as source,
cast(`description` as STRING) as description,
cast(`labels` as ARRAY < STRING >) as labels,
to_json(`metadata`) as metadata
from
cdf_assetSubtree('lift_pump_stations:root')
where
isnotnull(`externalId`) and isnotnull(`parentExternalId`) and startswith(name, 'Pump');

When the transformation has completed, inspect the data in CDF to verify that the lift stations and pumps have been populated in Asset model.

Asset model populated

Use-case-specific data model

The use-case-specific data model uses the Asset, Sourceable, and Describable views from the cdf_core system space. The views are already populated through the generic asset model, and you only need to populate the new properties.

The LiftStation view already has data, but not for the new pumps property:

LiftStation populated

The Pump view also has data, but misses data in the new DesignPointFlowGPM, DesignPointHeadFT, LowHeadFT, and LowHeadFlowGPM properties:

Pump populated

To add the missing data, the template uses one transformation to populate the pump property in the LiftStation view, and another to populate the new properties in the Pump view. Normally, the data would be in the CDF staging area, RAW, or in a CDF asset hierarchy. In this case, the template uses the existing data in the Asset container in the cdf_core system space.

To populate the pumps property in the LiftStation view, the transformation first iterates over all the assets while filtering for the pumps. Then, it creates an edge for each pump.

This is the specification for the SQL transformation:

select
concat(cast(`parent`.externalId as STRING), ':', cast(`externalId` as STRING)) as externalId,
`parent` as startNode,
node_reference('myDataSpace', cast(`externalId` as STRING)) as endNode
from
cdf_data_models("myModelSpace", "myGenericAssetModel", "1", "Asset")
where
startswith(title, 'Pump')

When the transformation has completed, inspect the data in CDF to verify that the lift stations and pumps are populated in Asset model.

LiftStation populated

To populate the new properties in the Pump view, the transformation first iterates over all the assets while filtering for the pumps. Then, it fetches the data from the metadata property, converts it to the correct type, and writes it to the Pump container.

This is the specification for the SQL transformation:

select
cast(`externalId` as STRING) as externalId,
cast(get_json_object(`metadata`, '$.DesignPointHeadFT') as DOUBLE) as DesignPointHeadFT,
cast(get_json_object(`metadata`, '$.LowHeadFT') as DOUBLE) as LowHeadFT,
cast(get_json_object(`metadata`, '$.DesignPointFlowGPM') as DOUBLE) as DesignPointFlowGPM,
cast(get_json_object(`metadata`, '$.LowHeadFlowGPM') as DOUBLE) as LowHeadFlowGPM
from
cdf_data_models("myModelSpace", "myGenericAssetModel", "1", "Asset")
where
startswith(title, 'Pump')

When the transformation has completed, inspect the data in CDF to verify that the new properties are populated in the Pump view.

Pump populated

Consuming data models

Having created the data models and populated them with data, you can start consuming them. For example, you can use a GraphQL query with a range filter on the DesignPointFlowGPM property to find all the pumps with a design point flow greater than 200 GPM.

{
listPump(first: 100, filter: { DesignPointFlowGPM: { gt: 200 } }) {
items {
externalId
DesignPointFlowGPM
}
}
}

Try it

To implement the examples in this article in your own CDF project, use the cognite-toolkit Python package in the CDF Toolkit repository.

pre-requisites
  • A CDF project with the necessary access.
  • A Python environment project.

The toolkit uploads data to the RAW staging area and sets up the data models and transformations to populate the data models.

  1. Set up your Python environment: python -m venv venv && source venv/bin/activate.

  2. Install the toolkit: pip install cognite-toolkit.

  3. Run the init command: cdf-tk init my_project.

  4. Create the .env file using .env.templ as a template, in my_project/.env.templ and paste it into your current folder: .env.

  5. Change the content under the local section of my_project/environments.yaml to:

    environments.yaml
    local:
    project: <your-CDF project>
    type: dev
    deploy:
    - example_pump
  6. Make sure you have access to the CDF project you'll be using.

  7. Run cdf-tk build my_project --env local to build the project.

  8. Run cdf-tk deploy --env local to deploy the project.

  9. Run the transformations to populate the data models.

    • Run the pump_asset_hierarchy-load-collections_pump to populate the asset hierarchy.
    • Run the sync-asset_hierarchy_cdf_asset_source_model to populate the generic asset data model.
    • Run the pump_model-populate-pump_container and pump_model-populate-lift_station_pumps_edges to populate the use-case-specific data model.