Set up the Documentum extractor
We are deprecating the Documentum extractor in favor of the File extractor. We strongly encourage you to adopt the File extractor as soon as possible.
Follow the steps below to set up the extractor.
Before you start
- Check the server requirements for the extractor.
- Make sure the extractor has the following access capabilities in a Cognite Data Fusion (CDF) project:
  - files:read and files:write
  - raw:read, raw:write, and raw:list if you're ingesting metadata into the CDF staging area.
  Tip: You can use OpenID Connect and your existing identity provider (IdP) framework to manage access to CDF data securely.
- Set up a Windows Update schedule. Note that the update may reboot the machine, causing extractor downtime.
- Navigate to Data management > Integrate > Extractors to download the installation files for the extractor.
  Permission issues: To avoid permission issues, set the Modify permission under Properties for the installation folder.
Connect to Documentum
Use the mode configuration parameter to configure the extractor to connect to Documentum through the D2 REST API or the Documentum Foundation Classes (DFC) Java SDK.
Cognite recommends connecting to Documentum using the D2 REST API.
- If you connect using the D2 REST API, you may have to install the root certificate from the D2 server into the Java Virtual Machine (JVM) trusted CA store. Download the certificate file, and import it into the appropriate Java Runtime Environment (JRE):
keytool -importcert -trustcacerts -file .\Path\to\certificate.cer -keystore 'C:\Program Files\Java\jdkVERSION\jre\lib\security\cacerts'
- If you connect using the DFC Java SDK, you must create a configuration file for the DFC library, dfc.properties, in addition to the standard configuration file. The DFC configuration must be in the Java properties format.
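As a rough illustration of the Java properties format, the shell snippet below writes a minimal dfc.properties file. The dfc.docbroker.host[0] and dfc.docbroker.port[0] keys are the ones DFC commonly uses to locate the connection broker. The host name is a placeholder, and your Documentum installation may require additional properties, so treat this as a sketch and confirm the values with your Documentum administrator.

```shell
# Sketch: write a minimal dfc.properties file in the current directory.
# The host below is a placeholder, not a real server. 1489 is the
# conventional connection broker port and may differ on your system.
cat > dfc.properties <<'EOF'
dfc.docbroker.host[0]=docbroker.example.com
dfc.docbroker.port[0]=1489
EOF
```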
Run as a jar file
- Download the dfc.jar SDK file from Documentum and add it to the lib folder in the installation directory.
- Run the jar file:
java -jar path/to/documentum-extractor-<version>.jar path/to/config.yaml
- If you're connecting via the DFC Java SDK, pass the DFC properties file as a second argument:
java -jar path/to/documentum-extractor-<version>.jar path/to/config.yaml path/to/dfc.properties
Schedule automatic runs
To schedule automatic runs using Windows, see Run extractors in Windows Task Scheduler.
If you use a batch script to call the .jar file, Windows Task Scheduler terminates the script process but not its child Java process. To stop the extractor, end the Java process in Task Manager.
Leave some overlap between the quick sync intervals and the scheduled run intervals to make sure you don't miss any documents.
Schedule using cron jobs
You can schedule runs using cron jobs on Linux and macOS operating systems. Run crontab -e to edit the cron file with the default system text editor.
For example, to run the extractor every night at 01:00:
0 1 * * * /path/to/java -jar /path/to/documentum-extractor-<version>-<timestamp>-jar-with-dependencies.jar /path/to/config.yaml
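Cron runs jobs without a terminal, so output from the extractor is easy to lose. The sketch below shows one way to capture it, assuming you call the extractor from a small script of your own. The helper name run_logged is hypothetical and not part of the extractor. It appends a start line, the command's output, and the exit code to a log file of your choice.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, appends its stdout and stderr to
# LOGFILE between timestamped start and finish lines, and returns
# COMMAND's exit code to the caller.
run_logged() {
  logfile="$1"
  shift
  printf '%s starting: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$logfile"
  "$@" >> "$logfile" 2>&1
  rc=$?
  printf '%s finished with exit code %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$rc" >> "$logfile"
  return "$rc"
}
```

In a script called from cron, you could then wrap the same java command shown above, for example run_logged /path/to/extractor.log /path/to/java -jar followed by the jar and configuration file paths.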
Delete data
Documentum deletes data in two ways:
- Hard delete - When a document is deleted, it's removed from the system and doesn't appear in searches. Documentum rarely performs hard deletes.
- Soft delete - When a document is deleted, it isn't removed but marked as deleted. Typically, you do this by setting a value in metadata that indicates that a document is voided or deleted.
For the Cognite Documentum extractor to detect soft-deleted documents, set the soft-delete-key and soft-delete-values configuration parameters to the key/value pairs that signify a soft deletion on your system.
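To make the two parameters concrete, the shell snippet below writes an illustrative YAML fragment. Only the parameter names come from this guide. The metadata key document_status and the values VOIDED and DELETED are hypothetical, and both the list form of soft-delete-values and the position of these parameters within the configuration file are assumptions, so check them against the extractor's configuration reference before use.

```shell
# Sketch: write an illustrative fragment to adapt into your config file.
# "document_status", "VOIDED", and "DELETED" are hypothetical examples.
# The list form of soft-delete-values and the placement of these keys
# in the configuration are assumptions, not confirmed settings.
cat > soft-delete-fragment.yaml <<'EOF'
soft-delete-key: document_status
soft-delete-values:
  - VOIDED
  - DELETED
EOF
```

With a fragment like this, a document whose document_status metadata field is VOIDED or DELETED would be the kind of document the extractor is meant to treat as soft-deleted.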