const result = await client.entityMatching.create({
  sources: [{externalId: 'asset1', name: 'asset1'}, {externalId: 'asset2', name: 'asset2'}],
  targets: [{externalId: 'ts1', name: 'ts1'}, {externalId: 'ts2', name: 'ts2'}],
  externalId: 'model123',
  name: 'model123',
});

{
  "id": 4503599627370496,
  "externalId": "my.known.id",
  "status": "Queued",
  "createdTime": 1730204346000,
  "startTime": 1730204346000,
  "statusTime": 1730204346000,
  "name": "simple_model_1",
  "description": "Simple model 1",
  "errorMessage": null,
  "featureType": "simple",
  "matchFields": [
    {
      "source": "name",
      "target": "name"
    },
    {
      "source": "name",
      "target": "someField"
    }
  ],
  "ignoreMissingFields": true,
  "classifier": "randomforest",
  "originalId": 111
}

Required capabilities:
entitymatchingAcl:WRITE
Train a model that predicts matches between entities (for example, time series names to asset names). This is also known as fuzzy joining. If no trueMatches (labeled data) are provided, a static (unsupervised) model is trained; otherwise, a machine-learned (supervised) model is trained.
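As a minimal sketch of the rule above, the helper below reports which kind of model a request body would train, based only on whether trueMatches is present. The function name modelKind is hypothetical and not part of the SDK, and treating an empty trueMatches array the same as an omitted one is an assumption.

```javascript
// Hypothetical helper, not part of the SDK: classify a request body by
// whether it carries labeled data in trueMatches.
function modelKind(request) {
  const hasLabels =
    Array.isArray(request.trueMatches) && request.trueMatches.length > 0;
  return hasLabels ? 'supervised' : 'unsupervised';
}

// Request body without labeled data.
const unlabeled = {
  sources: [{ id: 23, name: 'asset1' }],
  targets: [{ externalId: 'my.known.id', name: 'ts1' }],
};

// Same body plus one confirmed match, in the shape of the trueMatches example.
const labeled = {
  ...unlabeled,
  trueMatches: [{ sourceId: 23, targetExternalId: 'my.known.id' }],
};

console.log(modelKind(unlabeled)); // unsupervised
console.log(modelKind(labeled)); // supervised
```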
Access token issued by the CDF project's configured identity provider. The access token must be an OpenID Connect token, and the project must be configured to accept OpenID Connect tokens. Use a header key of 'Authorization' with a value of 'Bearer $accesstoken'. The token can be obtained through any flow supported by the identity provider.
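The header format described above can be sketched as follows. The function name authHeader is hypothetical; how the token itself is obtained depends on the identity provider and is not shown.

```javascript
// Hypothetical helper: build the Authorization header from an access token
// that was already obtained from the identity provider.
function authHeader(accessToken) {
  if (typeof accessToken !== 'string' || accessToken.length === 0) {
    throw new Error('accessToken must be a non-empty string');
  }
  return { Authorization: `Bearer ${accessToken}` };
}

// 'example-token' is a placeholder, not a real token.
console.log(authHeader('example-token').Authorization); // Bearer example-token
```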
List of custom source objects to match from, for example, time series. String key -> value. Only string values are considered in the matching. The id and externalId fields are optional; they are mandatory only if the item is to be referenced in trueMatches.
Maximum 2000000 elements
List of custom target objects to match to, for example, assets. String key -> value. Only string values are considered in the matching. The id and externalId fields are optional; they are mandatory only if the item is to be referenced in trueMatches.
1 - 2000000 elements
A list of confirmed source/target matches, which will be used to train the model. If omitted, an unsupervised model is trained.
1 - 2000000 elements
A pair of source ID and target ID that indicates a match between two entities in the source and target spaces. Internal and external IDs are supported.
{
  "sourceId": 23,
  "targetExternalId": "my.known.id"
}
The external ID provided by the client. Must be unique for the resource type.
Maximum length: 255
Example: "my.known.id"
User defined name.
Maximum length: 256
Example: "simple_model_1"
User defined description.
Maximum length: 500
Example: "Simple model 1"
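The limits on the external ID, name, and description can be checked before sending a request. The sketch below is a hypothetical client-side check, not part of the SDK; it assumes the limits (255, 256, 500) are maximum character counts and skips fields that are not set.

```javascript
// Limits taken from the field descriptions above; treating them as
// maximum character counts is an assumption.
const MAX_LENGTHS = { externalId: 255, name: 256, description: 500 };

// Hypothetical check: returns a list of human-readable problems,
// or an empty list if every field that is set fits its limit.
function checkStringLimits(model) {
  const problems = [];
  for (const [field, max] of Object.entries(MAX_LENGTHS)) {
    const value = model[field];
    if (value === undefined) continue; // fields that are not set are skipped
    if (typeof value !== 'string') {
      problems.push(`${field} must be a string`);
    } else if (value.length > max) {
      problems.push(`${field} is ${value.length} characters; the limit is ${max}`);
    }
  }
  return problems;
}

console.log(checkStringLimits({ externalId: 'my.known.id', name: 'simple_model_1' }));
console.log(checkStringLimits({ name: 'x'.repeat(257) }));
```

The first call reports no problems; the second reports that name exceeds its limit.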
Each feature type defines one combination of features that will be created and used in the entity matcher model. All features are based on matching tokens. Tokens are defined at the top of the Entity matching section. The options are:
simple: … matchFields. This is the fastest option.
bigram: Like simple, but adds a similarity score based on matching bigrams of the tokens.
frequencyweightedbigram: Like bigram, but gives higher weights to less commonly occurring tokens.
bigramextratokenizers: Like bigram, but able to learn that leading zeros, spaces, and uppercase/lowercase differences should be ignored in matching.
Available options: simple, insensitive, bigram, frequencyweightedbigram, bigramextratokenizers, bigramcombo
Example: "simple"
List of pairs of fields from the target and source items, used to calculate features. All source and target items should have all the source and target fields specified here.
[
  { "source": "name", "target": "name" },
  { "source": "name", "target": "someField" }
]
The classifier used in the model. Only relevant if there are trueMatches/labeled data and a supervised model is fitted.
Available options: randomforest, decisiontree, logisticregression, augmentedlogisticregression, augmentedrandomforest
Example: "randomforest"
If true, replaces missing fields in source or target entities, for fields set in matchFields, with empty strings. Otherwise, returns an error if data is missing.
Example: true
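To illustrate the missing-field rule, the sketch below applies the same idea on the client side: fields named in matchFields that are absent from an item are either filled with empty strings or reported as an error. The function fillMissingFields is hypothetical; the service applies its own logic, and this only mirrors the described behavior.

```javascript
// Hypothetical client-side mirror of the ignoreMissingFields rule.
// items: source or target objects; fields: field names taken from matchFields.
function fillMissingFields(items, fields, ignoreMissingFields) {
  return items.map((item, index) => {
    const filled = { ...item };
    for (const field of fields) {
      if (filled[field] === undefined || filled[field] === null) {
        if (!ignoreMissingFields) {
          throw new Error(`Item ${index} is missing field "${field}"`);
        }
        filled[field] = ''; // replace the missing field with an empty string
      }
    }
    return filled;
  });
}

const matchFields = [
  { source: 'name', target: 'name' },
  { source: 'name', target: 'someField' },
];
const targets = [{ externalId: 'ts1', name: 'ts1' }];

// Collect the distinct target-side field names: ['name', 'someField'].
const targetFields = [...new Set(matchFields.map((pair) => pair.target))];

// someField is absent from the target, so it is filled with ''.
console.log(fillMissingFields(targets, targetFields, true));
```

Calling the same function with ignoreMissingFields set to false throws instead of filling the gap.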
Success
A server-generated ID for the object.
Range: 1 <= x <= 9007199254740991
The external ID provided by the client. Must be unique for the resource type.
Maximum length: 255
Example: "my.known.id"
The status of the job.
Available options: Queued, Running, Completed, Failed
The number of milliseconds since 00:00:00 Thursday, 1 January 1970, Coordinated Universal Time (UTC), minus leap seconds.
Range: x >= 0
Example: 1730204346000
The number of milliseconds since 00:00:00 Thursday, 1 January 1970, Coordinated Universal Time (UTC), minus leap seconds.
Range: x >= 0
Example: 1730204346000
The number of milliseconds since 00:00:00 Thursday, 1 January 1970, Coordinated Universal Time (UTC), minus leap seconds.
Range: x >= 0
Example: 1730204346000
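The timestamp fields are plain millisecond counts, so they can be passed directly to the JavaScript Date constructor. A minimal sketch, with a hypothetical helper name:

```javascript
// Hypothetical helper: convert a millisecond timestamp to an ISO 8601 string.
function epochMsToIso(ms) {
  if (!Number.isInteger(ms) || ms < 0) {
    throw new RangeError('timestamp must be a non-negative integer');
  }
  return new Date(ms).toISOString();
}

console.log(epochMsToIso(1730204346000)); // 2024-10-29T12:19:06.000Z
```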
User defined name.
Maximum length: 256
Example: "simple_model_1"
User defined description.
Maximum length: 500
Example: "Simple model 1"
If the job failed, more information about the cause of the error.
Example: null
Each feature type defines the combination of features that will be created and used in the entity matcher model.
Available options: simple, insensitive, bigram, frequencyweightedbigram, bigramextratokenizers, bigramcombo
Example: "simple"
List of pairs of fields from the target and source items, used to calculate features. All source and target items should have all the source and target fields specified here.
[
  { "source": "name", "target": "name" },
  { "source": "name", "target": "someField" }
]
If true, missing fields in source or target entities set in matchFields are replaced with empty strings.
Example: true
Name of the classifier used in the model, or "Unsupervised" if the model is unsupervised.
Available options: randomforest, decisiontree, logisticregression, augmentedlogisticregression, augmentedrandomforest
Example: "randomforest"
The ID of the original model. Only relevant when the model is a retrained model.
Example: 111
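Because a newly created model starts in the Queued status, a caller usually has to wait until the status becomes Completed or Failed. The sketch below shows one way to do that. The function waitForModel and its fetchModel callback are hypothetical: fetchModel stands in for whatever call retrieves the current model object, which this page does not document, and the interval and attempt count are arbitrary defaults.

```javascript
// Statuses after which the job no longer changes.
const TERMINAL_STATUSES = new Set(['Completed', 'Failed']);

// Hypothetical polling loop. fetchModel is a caller-supplied async function
// that returns the current model object (with status and errorMessage).
async function waitForModel(fetchModel, { intervalMs = 1000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const model = await fetchModel();
    if (TERMINAL_STATUSES.has(model.status)) {
      if (model.status === 'Failed') {
        throw new Error(model.errorMessage ?? 'Model training failed');
      }
      return model;
    }
    // Not finished yet: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Timed out waiting for the model to finish');
}

// Usage with a stub that walks through the documented statuses.
const statuses = ['Queued', 'Running', 'Completed'];
let calls = 0;
const stubFetch = async () => ({ status: statuses[calls++], errorMessage: null });

waitForModel(stubFetch, { intervalMs: 0 }).then((model) => {
  console.log(model.status); // Completed
});
```

Passing the retrieval call as a callback keeps the loop independent of any particular SDK method.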