Configure the EDM extractor

To configure the EDM extractor, you must create a configuration file. The file must be in YAML format.

You can use the sample minimal configuration file as a starting point for your configuration settings.

You can use environment variable substitution in the configuration file. Any value wrapped in ${} is replaced with the value of the environment variable of that name. For example, ${COGNITE_PROJECT} is replaced with the value of the environment variable called COGNITE_PROJECT.

cognite:
  project: ${COGNITE_PROJECT}
  idp-authentication:
    tenant: ${COGNITE_TENANT_ID}
    client-id: ${COGNITE_CLIENT_ID}
    secret: ${COGNITE_CLIENT_SECRET}
    scopes:
      - ${COGNITE_SCOPE}

Cognite

Include the cognite section to configure which CDF project the extractor loads data into and how to connect to the project. This section is mandatory and should always contain the project and authentication configuration.

| Parameter | Description |
| --- | --- |
| clientId | Insert the CDF client ID. This is mandatory if you're using OIDC authentication. |
| clientSecret | Insert the CDF client secret for your CDF project. This is mandatory if you're using OIDC authentication. |
| token_url | Insert the token URL. This is mandatory if you're using OIDC authentication. You'll find further details on the OIDC token URL here. |
| apiUrl | Insert the base URL of the CDF project. |
| apiKey | We've deprecated API-key authentication and strongly encourage customers to migrate to authentication with IdP. |
| project | Insert the CDF project name you want to ingest data into. This field is required. |
| rawDatabaseName | Enter the name of the CDF RAW database to extract data into. If it doesn't already exist, the extractor creates the database. |
| queueBufferSize | Enter the size of the queue between extraction and writing objects to CDF RAW. The default value is 500. A larger queue buffers more extracted entities from EDM but uses more memory. |
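
As an illustration, the parameters in the table could be combined as follows. This is a minimal sketch, assuming the parameter names are used directly as keys under cognite; the sample earlier on this page uses a nested idp-authentication layout instead, so check the sample configuration file for the layout your extractor version expects. The environment variable name COGNITE_TOKEN_URL, the API URL, and the database name are placeholders, not values taken from the extractor.

```yaml
# Sketch only: key placement under `cognite` is an assumption.
cognite:
  project: ${COGNITE_PROJECT}
  apiUrl: https://api.cognitedata.com    # placeholder; use the base URL of your CDF cluster
  clientId: ${COGNITE_CLIENT_ID}
  clientSecret: ${COGNITE_CLIENT_SECRET}
  token_url: ${COGNITE_TOKEN_URL}        # hypothetical environment variable name
  rawDatabaseName: edm-raw               # hypothetical database name; created if missing
  queueBufferSize: 500                   # documented default
```

The secrets are read from environment variables so they don't need to be stored in the file itself.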

Logging

Include the logging section to set up logging to standard output, such as a terminal window.

| Parameter | Description |
| --- | --- |
| level | Select the verbosity level for console logging. Valid options are trace, debug, info, warn, and error. |
| cdfLevel | Set the log level for the CDF Java SDK. Valid options are trace, debug, info, warn, and error. This field is optional. The default value is warn. |
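
A logging section built from these two parameters might look like this. It is a sketch that assumes both parameters sit directly under a top-level logging key.

```yaml
# Sketch only: key placement under `logging` is an assumption.
logging:
  level: info      # console verbosity: trace, debug, info, warn, or error
  cdfLevel: warn   # CDF Java SDK log level; warn is the documented default
```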

EDM

Include the edm section to configure the parameters needed to connect to your EDM instance.

| Parameter | Description |
| --- | --- |
| dsisAuthServer | If you set dsisAuthType to token, enter the URL for DSIS authentication. |
| dsisAuthType | Select how to authenticate to DSIS. Valid options are token and basic. token expects the bearer token from the DSIS authentication server. basic uses dsisUsername and dsisPassword to authenticate. |
| gracePeriod | Enter the time in seconds between each scan for changes to incremental entities. The default value is 180. This field is optional. |
| crontabComplete | Set up a cron schedule for starting the extraction of complete entities. |
| runOnStartup | Set to true to run a complete extraction on startup, bypassing any schedule set in crontabComplete. This is typically used during development. The default value is false. This field is optional. |
| dsisEdmPath | Insert the path to the EDM server. |
| dsisPathCartoServices | Insert the path to the EDM Carto services to extract Carto entities from EDM, such as CD_CARTO. |
| dsisParallelism | Insert the number of threads to extract data from EDM. Be aware that a high-volume load on the EDM server may cause it to crash. The default value is 2. This field is optional. |
| processingParallelism | Insert the number of threads to process and upload extracted entities to CDF RAW. The default value is 4. This field is optional. |
| startPadding | Extractor internal parameter. Do not change. |
| measurementSystem | Enter the measurement system defined in EDM/DSIS. If you set a value, the measurement system is appended to all queries. The default value is blank. This field is optional. |
| connectTimeout | Enter the maximum time in seconds to wait for a connection to DSIS. The default value is 5. This field is optional. |
| readTimeout | Set the maximum timeout in seconds for read requests to DSIS. This field is optional. If you don't enter a value or the value is less than 120, the timeout is set to 120. |
| entitiesCompleteExtraction | List the EDM entities to extract completely on each run, even if the entities are incremental. This can be used as a workaround if EDM is missing create_date and update_date on incremental entities. |
| oDataConsumerDebug | Enable debug logging of EDM traffic for troubleshooting. Valid values are ALL, REQUEST_HEADER, REQUEST_FULL, RESPONSE_HEADER, RESPONSE_FULL, and OFF. This field is optional. |
| cartoSync | Set to true to extract the EDM cartography database (the carto and carto alias entities). These are extracted into CDF RAW tables named Carto and CartoAlias, respectively. |
| dsisBaseUrl | Insert the base URL of the DSIS. The extractor combines dsisBaseUrl, dsisEdmPath, dsisEdmVdb, and dsisEdmProject to connect to your EDM instance. This is a required field. |
| dsisEdmVdb | Enter the name of the EDM virtual database. This is a required field. |
| dsisEdmProject | Enter the name of the EDM project. The project name ALL_PROJECTS can be used to query across all projects in EDM. This is a required field. |
| dsisAuthRealm | Enter the name of the EDM authentication realm. This is a required field. |
| dsisAuthClientId | Enter the DSIS client ID. This is a required field. |
| dsisUsername | Enter a valid DSIS username. This is a required field. |
| dsisPassword | Enter a valid DSIS password. This is a required field. |
| entities | List the EDM entities you want to extract. See also the entities section below. |
| consistencySchedule | Set up a cron schedule for triggering a consistency check that compares the state of RAW with EDM and marks objects deleted in EDM as deleted in RAW. |
| consistencyOnStartup | Set to true to run a consistency check between the state in RAW and EDM on startup, bypassing any schedule set in consistencySchedule. |
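
To show how the connection, authentication, and scheduling parameters fit together, here is a minimal sketch of an edm section that uses basic authentication. It assumes the parameters are keys directly under edm. The host name, paths, and every environment variable name are placeholders, and the five-field cron syntax is an assumption: confirm which cron format the extractor accepts against the sample configuration file.

```yaml
# Sketch only: key placement, host, paths, and cron syntax are assumptions.
edm:
  # Connection: combined by the extractor to reach the EDM instance
  dsisBaseUrl: https://dsis.example.com    # placeholder host
  dsisEdmPath: /placeholder/edm/path       # placeholder path
  dsisEdmVdb: ${EDM_VDB}                   # hypothetical variable name
  dsisEdmProject: ALL_PROJECTS             # query across all EDM projects

  # Authentication: basic uses dsisUsername and dsisPassword
  dsisAuthType: basic
  dsisAuthRealm: ${DSIS_AUTH_REALM}        # hypothetical variable name
  dsisAuthClientId: ${DSIS_CLIENT_ID}      # hypothetical variable name
  dsisUsername: ${DSIS_USERNAME}           # hypothetical variable name
  dsisPassword: ${DSIS_PASSWORD}           # hypothetical variable name

  # Scheduling
  gracePeriod: 180                         # documented default, in seconds
  crontabComplete: "0 2 * * *"             # assumed syntax: daily at 02:00
  consistencySchedule: "0 4 * * 0"         # assumed syntax: Sundays at 04:00
  runOnStartup: false                      # documented default

  # Throughput and timeouts (documented defaults)
  dsisParallelism: 2
  processingParallelism: 4
  connectTimeout: 5
  readTimeout: 120
```

To use token authentication instead, set dsisAuthType to token and add dsisAuthServer with the URL for DSIS authentication.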

Entities

Include the entities section to list the entities you want to extract from the EDM server to CDF. You'll find the list of supported EDM entities here.

| Parameter | Description |
| --- | --- |
| name | Enter the name of the EDM entity, for instance, CD_WELL. This is a required field. |
| filter | Enter an OData-based custom filter at the EDM entity level. Use this to filter unwanted records from EDM. |
| includeFields | List the properties on the entity to include in the extraction. If not specified, all fields are included. Don't use this parameter together with excludeFields. |
| excludeFields | List the properties to exclude from the extraction of an entity. Don't use this parameter together with includeFields. |
| overrideIncremental | This parameter is only relevant when the entity is listed in the entitiesCompleteExtraction parameter. It tells the extractor to run both an incremental and a complete extraction for the entity. The entry under entitiesCompleteExtraction should have a filter that selects only the entities missed by the incremental extraction, for example all entities with create_date and update_date set to NULL. See the sample configuration file. Use this parameter for special cases only. By default, this parameter is disabled. |
| batchSize | Enter the number of entities to retrieve per request. The default value is 8, or 50 for the related parameter. Some heavy entities, such as CD_WELL, have further configuration in the extractor that reduces the number of entities retrieved in a batch to reduce the load on the EDM server. This value overrides these built-in standard values. This field is optional. |
| related | Set to true to include related entities in the extracted object. The default value is true. |
| relatedList | Enter an optional list of related entities to include. If not set, all related entities are included. |
| consistencyCheck | Set to false to exclude an entity from the set of entities that are consistency checked periodically (according to the schedule set in consistencySchedule). The consistency check verifies whether all extracted objects in RAW still exist in EDM. If they don't exist, they're marked as deleted in RAW. The default value is true. |
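
The entity parameters could be combined as in the sketch below. It assumes that entities and entitiesCompleteExtraction are lists under edm and that each list item takes the parameters from the table. Only CD_WELL comes from this page; the other entity name and all field names are hypothetical, and the OData filter assumes the date columns are named create_date and update_date as mentioned above.

```yaml
# Sketch only: list structure, extra entity names, and field names are assumptions.
edm:
  entities:
    - name: CD_WELL
      batchSize: 4                    # overrides the built-in batch size
      related: true
      relatedList:
        - CD_WELLBORE                 # hypothetical related entity
      excludeFields:                  # don't combine with includeFields
        - some_large_field            # hypothetical field name
    - name: CD_EXAMPLE_ENTITY         # hypothetical entity name
      includeFields:
        - example_id                  # hypothetical field name
        - example_name                # hypothetical field name
      consistencyCheck: false         # skip periodic consistency checks

  # Special case: also run a complete extraction for rows the incremental
  # extraction misses because their dates are NULL.
  entitiesCompleteExtraction:
    - name: CD_WELL
      filter: "create_date eq null and update_date eq null"
      overrideIncremental: true
```

Keeping the filter narrow in the entitiesCompleteExtraction entry limits the complete extraction to the rows the incremental run can't detect, which reduces load on the EDM server.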