# Entity matching
Contextualize your data by using machine learning and rules engines. Then let domain experts validate and fine-tune the results.
Different sources of industrial data can use different naming standards when they refer to the same entity. CDF lets you select entities like time series, files, and sequences from different source systems and match them to assets. You can also contextualize 3D models so you can see them related to the asset.
Select one of these options for matching entities to assets:
- Create a pipeline to re-run matching models on data sets. If the data set receives new data, you can re-run the pipeline to find additional matches.
- Run a quick match for one-time matching of individual resources, or groups of resources, to assets. The matching model is not stored, and you cannot reuse it for more matching tasks.
- Match the nodes in a 3D model to an asset.
# Step 1: Select matching process
- Navigate to fusion.cognite.com.
- Sign in with your CDF project name and credentials.
- Select Match resources, and then select the matching process:
  - Quick match for simple matching processes.
  - Create pipelines to create matching models that you can re-run.
  - Match 3D model.
# Step 2: Select entities and assets to match
- Under Select entities, select the resource types you want to match or select a data set if you are creating a pipeline.
- Under Select assets, select assets if you are running a quick match or one or more data sets if you are creating a pipeline.
# Step 3: Set up the matching model and generate suggested matches
By default, the model uses the similarity between the name fields when it searches for matches.
If you want to select other fields for matching, use the dropdown menus and then click Add fields.
Select the similarity scoring model:
- Simple: Calculates a similarity score based on identical letter or digit sequences, henceforth referred to as tokens, for each pair of fields defined above. This is the fastest option.
- Insensitive: Similar to Simple, but ignores lowercase/uppercase differences.
- Bigram: Similar to Simple, but adds a similarity score based on bigrams of the tokens (two adjacent tokens). For instance, "AA-11-BB" would be considered more similar to "AA-11-CC" than to "AA-00-BB", while Simple would see them as equally similar.
- Frequency weighted bigram: Similar to Bigram, but gives higher weights to less commonly occurring tokens.
- Bigram extra tokenizers: Similar to Bigram, but able to learn that leading zeros, spaces, and lowercase/uppercase differences should be ignored in matching.
- Bigram combo: Calculates all of the above options and relies on the machine learning model to determine the appropriate features to use. This is an appropriate choice if some matches already exist that the model can train on (see the training option below). This is the slowest option.

The different feature types are designed to improve the accuracy of the model for different types of input data. Which feature type works best for your model therefore depends on what your data looks like.
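To make the difference between the Simple, Insensitive, and Bigram options more concrete, here is a minimal Python sketch. It is not CDF's implementation: the function names, the use of Jaccard overlap, and the equal weighting of token and bigram overlap are all assumptions chosen for illustration.

```python
import re


def tokenize(value, case_sensitive=True):
    """Split a string into runs of letters or runs of digits (the 'tokens')."""
    if not case_sensitive:
        value = value.lower()
    return re.findall(r"[A-Za-z]+|[0-9]+", value)


def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def simple_score(left, right, case_sensitive=True):
    """Hypothetical 'Simple' score: overlap of identical tokens.

    Pass case_sensitive=False to mimic the 'Insensitive' option.
    """
    return jaccard(
        set(tokenize(left, case_sensitive)),
        set(tokenize(right, case_sensitive)),
    )


def bigram_score(left, right):
    """Hypothetical 'Bigram' score: token overlap plus adjacent-token-pair overlap."""
    lt, rt = tokenize(left), tokenize(right)
    left_bigrams = set(zip(lt, lt[1:]))
    right_bigrams = set(zip(rt, rt[1:]))
    # Assumption: token overlap and bigram overlap are weighted equally.
    return 0.5 * simple_score(left, right) + 0.5 * jaccard(left_bigrams, right_bigrams)


if __name__ == "__main__":
    for candidate in ("AA-11-CC", "AA-00-BB"):
        print(candidate, simple_score("AA-11-BB", candidate), bigram_score("AA-11-BB", candidate))
```

In this sketch, `simple_score` gives both candidates the same score because each shares two of three tokens with "AA-11-BB", while `bigram_score` ranks "AA-11-CC" higher because it also shares the adjacent pair ("AA", "11").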
- Train the matching model
For pipelines, the model matches entities based on your manual confirmations by default. For quick match, an unsupervised model is used by default. However, the model can also use matches that already exist in CDF to learn and improve. Select the Use matched resources as training data checkbox to allow the model to train on these existing matches.
Select the matching scope:
- Unmatched only: This option filters the entities to only include entities without an asset ID in CDF. Entities with an asset ID are already matched to an asset and will not get a new suggestion with this option.
- All resources: This option creates suggestions for all available data, which allows you to see whether there is a different and better match result to an entity than the current match.
- Click Run model to train the matching model on the data you have selected and to generate suggested matches.
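The two matching scope options described above amount to a filter on whether an entity already has an asset ID. The following sketch illustrates that idea only. The `Entity` class, the `select_scope` function, and the scope strings are hypothetical names, not part of CDF or its SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    name: str
    asset_id: Optional[int] = None  # None means the entity is not yet matched


def select_scope(entities, scope):
    """Return the entities that should receive match suggestions."""
    if scope == "unmatched_only":
        # Entities that already have an asset ID are skipped.
        return [e for e in entities if e.asset_id is None]
    if scope == "all_resources":
        # Every entity gets a suggestion, including already matched ones.
        return list(entities)
    raise ValueError(f"Unknown scope: {scope!r}")


if __name__ == "__main__":
    entities = [Entity("21PT1019", asset_id=42), Entity("21PT1020")]
    print([e.name for e in select_scope(entities, "unmatched_only")])
    print([e.name for e in select_scope(entities, "all_resources")])
```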
# Step 4: Validate suggested matches and update CDF
For pipelines, CDF suggests matches in this order:
- Confirmed matches: entity and asset pairs that have already been confirmed in the pipeline.
- Confirmed patterns: matches created by one of the already confirmed patterns.
- Predictions from the entity matching model.
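One way to read this ordering is as a precedence rule: a confirmed match wins over a pattern-based match, which in turn wins over a model prediction. The sketch below illustrates that reading. The source labels, the dictionary layout, and the tie-break on score are assumptions for illustration, not CDF's internal representation.

```python
# Lower number means higher precedence (assumed labels, for illustration only).
PRIORITY = {
    "confirmed_match": 0,
    "confirmed_pattern": 1,
    "model_prediction": 2,
}


def best_suggestion(candidates):
    """Pick one suggestion per entity: first by source precedence, then by score."""
    return min(candidates, key=lambda c: (PRIORITY[c["source"]], -c["score"]))


if __name__ == "__main__":
    candidates = [
        {"source": "model_prediction", "asset": "Pump A", "score": 0.99},
        {"source": "confirmed_pattern", "asset": "Pump B", "score": 0.70},
    ]
    # The pattern-based suggestion is chosen despite its lower score.
    print(best_suggestion(candidates)["asset"])
```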
Use the drop-down menu to select the entities you want to work with:
- All resources: Show all the entities you have selected for matching.
- Matched: Show entities that have already been matched to an asset. Select this option to change the existing matching for entities.
- Unmatched entities: Show entities that have not yet been matched to an asset. Select this option to validate the suggestions, and to match individual entities or groups of entities to assets.
- Different recommendation: Show entities that have already been matched to an asset, but where CDF recommends a new match. Select this option to change the existing matching for entities.
- Select Group by pattern to match individual entities that fit the same pattern.
- For each entity (or group of entities), you can see the suggested asset matching and search for an asset to match the entity to. Select the checkmark to confirm the matching and move the entity to the draft matches section.
Review all the draft matches, and move matches out of the draft matches, if necessary.
If you are creating a pipeline, click Save this pipeline to use this matching model when new data is ingested into the selected data sets.
Select Write to CDF to update CDF with the matches.
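The Group by pattern option mentioned above can be pictured as grouping entities whose names share the same shape. This article does not define how CDF derives a pattern, so the sketch below makes a loud assumption: a pattern is the entity name with every run of digits replaced by `#`. The function names are hypothetical.

```python
import re
from collections import defaultdict


def name_pattern(name):
    """Assumed pattern: replace each run of digits with a '#' placeholder."""
    return re.sub(r"\d+", "#", name)


def group_by_pattern(names):
    """Group entity names that share the same assumed pattern."""
    groups = defaultdict(list)
    for name in names:
        groups[name_pattern(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_pattern(["21PT1019", "21PT1020", "13FV1234"]))
```

Under this assumption, confirming one match in a group would let the same rule be applied to every other entity in that group.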
# Step 5: Re-run a matching model in existing pipelines
If a data set receives new data, you can click Rerun pipeline on the overview page to find additional matches. To adjust the matching model, select Open on the ellipsis (...) button.
# Step 6: Match the nodes in a 3D model to an asset
Contextualize your 3D models so you can see them related to the asset. You can also see the models in the apps (InField, Maintain, and Remote).
Note! Loading the 3D page takes a bit of time.
Select 3D model revision. The revision number is the version of the model. All the available 3D models in your project are listed.
Select the model type:
- PDMS: Improves response time by filtering out nodes that, based on keywords in their names, do not need to be mapped to assets.
Click Next. Note! It may take up to 10 minutes to load the result.
The asset mapping result states how many nodes have been contextualized. Click a node to see its contextualization. Use Confidence threshold to see different results.
Click Save to CDF if you approve the result.
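The effect of the Confidence threshold setting can be thought of as a filter on the suggested node-to-asset mappings. The following sketch assumes each mapping carries a confidence value between 0 and 1. The field names and the function are hypothetical and only illustrate the idea.

```python
def filter_by_confidence(mappings, threshold):
    """Keep only the node-to-asset mappings at or above the threshold."""
    return [m for m in mappings if m["confidence"] >= threshold]


if __name__ == "__main__":
    mappings = [
        {"node": "/21-PT-1019", "asset": "21PT1019", "confidence": 0.95},
        {"node": "/21-PT-1020", "asset": "21PT1021", "confidence": 0.55},
    ]
    # Raising the threshold shows fewer, more certain mappings.
    print(len(filter_by_confidence(mappings, 0.5)))
    print(len(filter_by_confidence(mappings, 0.9)))
```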