Setting up the Documentum extractor

Follow the steps below to set up the extractor.

Before you start

  1. Check the server requirements for the extractor.

  2. Make sure the extractor has the following access capabilities in a Cognite Data Fusion (CDF) project:

    • files:read and files:write
    • raw:read, raw:write, and raw:list if you're ingesting metadata into the CDF staging area.
    Tip

    You can use OpenID Connect and your existing identity provider (IdP) framework to manage access to CDF data securely.

  3. Set up a Windows Update schedule. Note that the update may reboot the machine, causing extractor downtime.

  4. In the CDF user interface, navigate to the Extract data section to download the installation files for the extractor.

    Permission issues

    To avoid permission issues, set the Modify permission for the installation folder under the folder's Properties.

Connect to Documentum

Use the mode configuration parameter to configure the extractor to connect to Documentum through the D2 REST API or the Documentum Foundation Classes (DFC) Java SDK.

Info

Cognite recommends connecting to Documentum using the D2 REST API.

  • If you connect using the D2 REST API, you may have to install the root certificate from the D2 server into the Java Virtual Machine (JVM) trusted CA store. Download the certificate file, and import it into the appropriate Java Runtime Environment (JRE):

    keytool -importcert -trustcacerts -file .\Path\to\certificate.cer -keystore 'C:\Program Files\Java\jdkVERSION\jre\lib\security\cacerts'
  • If you connect using the DFC Java SDK, you must create a configuration file for the DFC library, dfc.properties, in addition to the standard configuration file. The DFC configuration must be in the Java properties format.
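
On a Linux or macOS host, the certificate import for the D2 REST API could be sketched in POSIX shell as shown here. This is an illustration under assumptions, not a documented procedure: the function names, the d2-root alias, and the certificate path are hypothetical, and JAVA_HOME is assumed to point at the Java installation that runs the extractor. The lookup covers both locations of the cacerts file, since JDK 8 and earlier keep it under jre/lib/security while JDK 9 and later keep it under lib/security.

```shell
#!/bin/sh
# Sketch only: JAVA_HOME, the alias, and the certificate path are assumptions.

# find_cacerts JAVA_HOME_DIR
# Prints the path of the cacerts trust store under the given Java home.
# JDK 8 and earlier: jre/lib/security/cacerts; JDK 9 and later: lib/security/cacerts.
find_cacerts() {
    java_home=$1
    for candidate in \
        "$java_home/jre/lib/security/cacerts" \
        "$java_home/lib/security/cacerts"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no cacerts file found under $java_home" >&2
    return 1
}

# import_d2_cert CERT_FILE ALIAS
# Imports the certificate, then lists the new entry to confirm it is present.
# keytool prompts for the trust store password in both steps.
import_d2_cert() {
    cert_file=$1
    alias_name=$2
    store=$(find_cacerts "$JAVA_HOME") || return 1
    keytool -importcert -trustcacerts -alias "$alias_name" \
        -file "$cert_file" -keystore "$store" || return 1
    keytool -list -alias "$alias_name" -keystore "$store"
}

# Example call (hypothetical alias and path):
# import_d2_cert /path/to/certificate.cer d2-root
```

Writing to the system trust store usually requires elevated privileges, so the example call may need to run as an administrator.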

Run as a jar file

  1. Download and add the dfc.jar SDK file from Documentum to the lib folder in the installation directory.

  2. Run the jar file:

    java -jar path/to/documentum-extractor-<version>.jar path/to/config.yaml
  3. If you're connecting via the DFC Java SDK, pass the DFC properties file as a second argument:

    java -jar path/to/documentum-extractor-<version>.jar path/to/config.yaml path/to/dfc.properties
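
The two invocations above can be wrapped in one small launcher. The sketch below is a hypothetical helper, not part of the extractor: the run_extractor name and the JAVA override variable are assumptions made for illustration. It checks that the files exist before starting Java and appends the DFC properties file only when one is given.

```shell
#!/bin/sh
# Sketch only: run_extractor and the JAVA variable are hypothetical names.

# run_extractor JAR CONFIG [DFC_PROPERTIES]
# Starts the extractor; the DFC properties file is passed as the second
# program argument only when a third parameter is supplied (DFC mode).
run_extractor() {
    jar=$1
    config=$2
    dfc_properties=${3:-}

    # Fail early with a clear message instead of a Java stack trace.
    for required in "$jar" "$config"; do
        if [ ! -f "$required" ]; then
            echo "missing file: $required" >&2
            return 2
        fi
    done

    if [ -z "$dfc_properties" ]; then
        "${JAVA:-java}" -jar "$jar" "$config"
    elif [ -f "$dfc_properties" ]; then
        "${JAVA:-java}" -jar "$jar" "$config" "$dfc_properties"
    else
        echo "missing file: $dfc_properties" >&2
        return 2
    fi
}

# Example calls (placeholder paths):
# run_extractor path/to/extractor.jar path/to/config.yaml
# run_extractor path/to/extractor.jar path/to/config.yaml path/to/dfc.properties
```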

Schedule automatic runs

To schedule automatic runs using Windows, see Run extractors in Windows Task Scheduler.

If you use a batch script to call the .jar file, the Windows Task Scheduler terminates only the script process and leaves its child, the Java process, running. To stop the extractor, end the Java process in Task Manager.

Tip

Leave some overlap between the quick sync intervals and the scheduled run intervals to make sure you don't miss any documents.

Schedule using cron jobs

On Linux and macOS, you can schedule runs with cron jobs. Run crontab -e to edit the cron file with the default system text editor.

For example, to run the extractor every night at 01:00:

0 1 * * * /path/to/java -jar /path/to/documentum-extractor-<version>-<timestamp>-jar-with-dependencies.jar /path/to/config.yaml
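
If a run can take longer than the interval between cron invocations, a wrapper script can skip a new run while the previous one is still active. The sketch below is one possible approach, not something the extractor provides: run_locked and the lock directory path are hypothetical. It relies on mkdir being atomic, so only one invocation can create the lock directory at a time.

```shell
#!/bin/sh
# Sketch only: run_locked and the lock path are hypothetical.

# run_locked LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR can be created. A second invocation that
# starts while the first is still running is skipped with exit code 75
# (the value of EX_TEMPFAIL in sysexits.h).
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi
    "$@"
    rc=$?
    # Release the lock and report the exit code of the wrapped command.
    rmdir "$lockdir"
    return "$rc"
}

# Example call with placeholder paths, taking the same java arguments
# as the cron entry:
# run_locked /tmp/documentum-extractor.lock \
#     /path/to/java -jar /path/to/extractor.jar /path/to/config.yaml
```

The cron entry would then call the wrapper script instead of java directly. If the machine reboots or the process is killed mid-run, the lock directory is left behind and must be removed by hand before the next run can start.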

Delete data

Documentum deletes data in two ways:

  • Hard delete - When a document is deleted, it's removed from the system and doesn't appear in searches. Documentum rarely performs hard deletes.
  • Soft delete - When a document is deleted, it isn't removed but marked as deleted. Typically, you do this by setting a value in metadata that indicates that a document is voided or deleted.

For the Cognite Documentum extractor to detect soft-deleted documents, set the soft-delete-key and soft-delete-values configuration parameters to key/value pairs that signify a soft deletion on your system.