About document parsing

Beta

The features described in this section are in beta testing and are subject to change. They are currently available only to customers in our Early Adopter program. For more information and to sign up, visit the Early Adopter group on the Cognite Hub.

Extract data from documents that present data as key-value pairs, such as datasheets, equipment specifications, or sections of reports, into data model views. You can verify and modify the extracted data before you approve the parsing job. The approved data is ingested into Cognite Data Fusion (CDF) data model instances and becomes available for users to explore and analyze, for instance, for asset and equipment monitoring.

The input for the document parsing job must meet these criteria:

PDF file format with English text and a maximum of 100 pages.
- Results improve with smaller files.
Embedded text or scanned documents.
Describes a single asset or piece of equipment.
Key-value pair data representation.
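
The criteria above that can be checked mechanically (file format, language, and page count) can be screened before you create a parsing task. The following is a minimal sketch, not part of CDF or the Cognite SDK: `DocumentInfo` and `check_parsing_input` are hypothetical names, and the sketch assumes you already know each document's MIME type, page count, and language.

```python
from dataclasses import dataclass

MAX_PAGES = 100  # page limit stated in the criteria above


@dataclass
class DocumentInfo:
    """Hypothetical metadata record for one document; not a CDF SDK type."""
    name: str
    mime_type: str
    page_count: int
    language: str  # assumed to be an ISO 639-1 code such as "en"


def check_parsing_input(doc: DocumentInfo) -> list:
    """Return the reasons a document fails the criteria (empty list if none)."""
    problems = []
    if doc.mime_type != "application/pdf":
        problems.append("not a PDF file")
    if doc.language != "en":
        problems.append("text is not English")
    if doc.page_count > MAX_PAGES:
        problems.append(
            f"{doc.page_count} pages exceeds the {MAX_PAGES}-page limit"
        )
    return problems


datasheet = DocumentInfo("pump-datasheet.pdf", "application/pdf", 12, "en")
print(check_parsing_input(datasheet))
```

The sketch does not cover whether a document describes a single asset or uses a key-value layout. Those criteria still need a manual look.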

Before you start

Ingest the documents into CDF.
Set up access capabilities.
Create a view in a data model with properties that reflect the key-value data.
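
To illustrate what "properties that reflect the key-value data" can look like, the sketch below derives one property name per key found in a datasheet. It is only an illustration: the sample keys, the camelCase naming, the `to_property_name` helper, and the `"text"` type label are all assumptions, and the code does not call CDF or create a view.

```python
import re


def to_property_name(key: str) -> str:
    """Turn a datasheet key such as 'Design pressure' into 'designPressure'."""
    words = re.findall(r"[A-Za-z0-9]+", key)
    if not words:
        raise ValueError(f"no usable characters in key: {key!r}")
    first = words[0].lower()
    rest = [word.capitalize() for word in words[1:]]
    return first + "".join(rest)


# Hypothetical key-value pairs as they might appear in a datasheet.
datasheet = {
    "Manufacturer": "ACME",
    "Design pressure": "16 bar",
    "Serial no.": "A-1042",
}

# One property per key; "text" is a placeholder type label, not a CDF type.
view_properties = {to_property_name(key): "text" for key in datasheet}
print(view_properties)
```

A view whose properties line up with the keys in your documents this way gives each parsed value an obvious place to go.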

tip

See best practices for optimizing data models for better search results when using AI in CDF.

Parse documents

Navigate to Data management > Contextualize > Document parsing.
Select Create parsing task, and then select the documents you want to parse.

info

You can parse several documents simultaneously, but the data from each document is ingested into a separate data model view.

Select Next to continue.
Select the views you want to populate with parsed data, and then select Run.
Review the parsed data.
- Select a property in the Parsed data sidebar to zoom into a field in the document.
- Hover over a field to update values.
Reject or approve the parsing. The approved data is stored as a data model instance.
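
The review step can be pictured as merging your corrections into the parsed values and keeping the result only if the parsing is approved. The sketch below models that idea with plain dictionaries. `review_parsing` and the sample property names are hypothetical, and this is not how CDF implements the review.

```python
from typing import Optional


def review_parsing(parsed: dict, corrections: dict, approved: bool) -> Optional[dict]:
    """Apply reviewer corrections; return the final values, or None if rejected."""
    unknown = set(corrections) - set(parsed)
    if unknown:
        # A correction must target a property that was actually parsed.
        raise KeyError(f"corrections for unknown properties: {sorted(unknown)}")
    if not approved:
        return None  # a rejected parsing stores nothing
    # Corrected values take precedence over the parsed values.
    return {**parsed, **corrections}


parsed = {"manufacturer": "ACME", "designPressure": "18 bar"}
final = review_parsing(parsed, {"designPressure": "16 bar"}, approved=True)
print(final)
```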
