Setting up the Documentum extractor
Follow the steps below to set up the extractor.
Before you start
Check the server requirements for the extractor.
Make sure the extractor has the following access capabilities in a Cognite Data Fusion (CDF) project:
- raw:list if you're ingesting metadata into the CDF staging area.
You can use OpenID Connect and your existing identity provider (IdP) framework to manage access to CDF data securely.
Set up a Windows Update schedule. Note that the update may reboot the machine, causing extractor downtime.
In the CDF user interface, navigate to the Extract data section to download the installation files for the extractor.
Permission issues
To avoid permission issues, grant the Modify permission on the installation folder under Properties.
Connect to Documentum
Use the mode configuration parameter to configure the extractor to connect to Documentum through the D2 REST API or the Documentum Foundation Classes (DFC) Java SDK.
Cognite recommends connecting to Documentum using the D2 REST API.
If you connect using the D2 REST API, you may have to install the root certificate from the D2 server into the Java Virtual Machine (JVM) trusted CA store. Download the certificate file, and import it into the appropriate Java Runtime Environment (JRE):
keytool -importcert -trustcacerts -file .\Path\to\certificate.cer -keystore 'C:\Program Files\Java\jdkVERSION\jre\lib\security\cacerts'
If you connect using the DFC Java SDK, you must create a configuration file for the DFC library, dfc.properties, in addition to the standard configuration file. The DFC configuration must be in the Java properties format.
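As an illustration of the Java properties format, a dfc.properties file might look like the sketch below. The keys are standard DFC client settings, but every value is a placeholder or an assumption (host name, repository name, password), and your Documentum installation may require other entries.

```properties
# Hypothetical values: replace them with the details of your Documentum environment.
# Connection broker (docbroker) that the DFC client contacts first.
dfc.docbroker.host[0]=docbroker.example.com
dfc.docbroker.port[0]=1489
# Global registry repository and the account used to read from it.
dfc.globalregistry.repository=<global-registry-repository>
dfc.globalregistry.username=dm_bof_registry
dfc.globalregistry.password=<encrypted-password>
```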
Run as a jar file
Download and add the dfc.jar SDK file from Documentum to the lib folder in the installation directory.
Run the jar file:
java -jar path/to/documentum-extractor-<version>.jar path/to/config.yaml
If you're connecting via the DFC Java SDK, pass the path to the DFC properties file as the second argument:
java -jar path/to/documentum-extractor-<version>.jar path/to/config.yaml path/to/dfc.properties
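The two invocations above differ only in the optional second argument. The following is a minimal shell sketch of a wrapper that adds the DFC properties path only when one is supplied. The function name run_extractor and the JAVA_BIN variable are hypothetical and not part of the extractor.

```shell
#!/bin/sh
# Hypothetical wrapper, not shipped with the extractor.
# Usage: run_extractor <jar> <config.yaml> [dfc.properties]
run_extractor() {
  jar="$1"
  config="$2"
  dfc_properties="${3:-}"   # optional: only needed when connecting via DFC

  if [ -n "$dfc_properties" ]; then
    # DFC mode: the properties file is passed as the second argument
    "${JAVA_BIN:-java}" -jar "$jar" "$config" "$dfc_properties"
  else
    # D2 REST mode: only the standard configuration file is passed
    "${JAVA_BIN:-java}" -jar "$jar" "$config"
  fi
}
```

After sourcing the script, calling run_extractor with the jar path, path/to/config.yaml, and path/to/dfc.properties would start the extractor in DFC mode. Omitting the last argument starts it with the standard configuration file only.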
Schedule automatic runs
To schedule automatic runs using Windows, see Run extractors in Windows Task Scheduler.
If you use a batch script to call the .jar file, the Windows Task Scheduler terminates the script process but not its child processes (the Java process). To stop the extractor, end the Java process in Task Manager.
Leave some overlap between the quick sync intervals and the scheduled run intervals to make sure you don't miss any documents.
You can schedule runs using cron jobs for Linux and macOS operating systems. Run crontab -e to edit the cron file with the default system text editor.
For example, to run the extractor every night at 01:00:
0 1 * * * /path/to/java -jar /path/to/documentum-extractor-<version>-<timestamp>-jar-with-dependencies.jar /path/to/config.yaml
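Because cron runs jobs without a terminal, a variant of the entry above that appends standard output and standard error to a log file can make failed runs easier to diagnose. This is a sketch only: the log path /path/to/extractor.log is a hypothetical placeholder, and the extractor may also write its own logs depending on your configuration.

```
# minute hour day-of-month month day-of-week  command
0 1 * * * /path/to/java -jar /path/to/documentum-extractor-<version>-<timestamp>-jar-with-dependencies.jar /path/to/config.yaml >> /path/to/extractor.log 2>&1
```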
Documentum deletes data in two ways:
- Hard delete - When a document is deleted, it's removed from the system and doesn't appear in searches. Documentum rarely performs hard deletes.
- Soft delete - When a document is deleted, it isn't removed but marked as deleted. Typically, you do this by setting a value in metadata that indicates that a document is voided or deleted.
For the Cognite Documentum extractor to detect soft-deleted documents, set the soft-delete-values configuration parameter to the key/value pairs that signify a soft deletion on your system.
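To illustrate the idea of matching metadata against soft-delete key/value pairs, here is a minimal shell sketch. It is not the extractor's implementation or its configuration syntax. The pairs (status=VOID and so on), the variable SOFT_DELETE_VALUES, and the function is_soft_deleted are all hypothetical.

```shell
#!/bin/sh
# Hypothetical soft-delete markers, one key=value pair per line.
# Replace them with the metadata values your Documentum system uses.
SOFT_DELETE_VALUES='status=VOID
status=DELETED
is_deleted=true'

# Succeeds (exit status 0) when the given key=value pair is a soft-delete marker.
is_soft_deleted() {
  # -F: fixed string, -x: match the whole line, -q: no output
  printf '%s\n' "$SOFT_DELETE_VALUES" | grep -Fxq -- "$1"
}
```

With these assumed pairs, is_soft_deleted 'status=VOID' succeeds, while is_soft_deleted 'status=ACTIVE' fails, so a document is treated as deleted only when one of its metadata pairs matches a configured marker exactly.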