
About document parsing

Beta

The features described in this section are in beta testing and are subject to change. They are currently available only to customers in our Early Adopter program. For more information and to sign up, visit the Early Adopter group on the Cognite Hub.

Document parsing extracts data from documents that present information as key-value pairs, such as datasheets, equipment specifications, or sections in reports, and writes it into data model views. You can modify and verify the extracted data before you approve the parsing job. The approved data is ingested into Cognite Data Fusion (CDF) data model instances, where users can explore and analyze it, for instance, for asset and equipment monitoring.

The input for the document parsing job must meet these criteria:

  • PDF file format with English text and a maximum of 100 pages.
    • Results improve with smaller files.
  • Embedded text or scanned documents.
  • Content that describes a single asset or piece of equipment.
  • Data represented as key-value pairs.
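
As a rough illustration, the file-level criteria above can be checked before you upload a document. The sketch below is not part of CDF or the Cognite SDK: the helper name `check_parsing_input` is hypothetical, and the page count is assumed to be supplied by the caller (for example, from a PDF library of your choice). It checks only the file extension, the PDF header bytes, and the 100-page limit. It cannot tell whether the text is English, whether the document describes a single asset, or whether the data is laid out as key-value pairs.

```python
from pathlib import Path

MAX_PAGES = 100  # page limit stated in the input criteria


def check_parsing_input(path, page_count):
    """Hypothetical pre-upload check; returns a list of problems (empty if none)."""
    problems = []
    file = Path(path)
    if file.suffix.lower() != ".pdf":
        problems.append("not a .pdf file")
    elif not file.is_file():
        problems.append("file not found")
    else:
        # Every PDF file starts with the bytes "%PDF-".
        with file.open("rb") as handle:
            if handle.read(5) != b"%PDF-":
                problems.append("missing PDF header")
    # The page count is an input here; this sketch does not read it from the file.
    if page_count > MAX_PAGES:
        problems.append(f"{page_count} pages exceeds the limit of {MAX_PAGES}")
    return problems
```

An empty list means the basic checks passed. Any returned message names a criterion to fix before you create the parsing task.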

Before you start

Tip

See best practices for optimizing data models for better search results when using Copilot in CDF.

Parse documents

  1. Navigate to Data management > Contextualize > Document parsing.

  2. Select Create parsing task, and then select the documents you want to parse.

Info

You can parse several documents simultaneously, but the data from each document is ingested into a separate data model view.

  3. Select Next to continue.

  4. Select the views you want to populate with parsed data, and then select Run.

  5. Review the parsed data.

    • Select a property in the Parsed data sidebar to zoom into a field in the document.

    • Hover over a field and select the Edit icon to update its values.

  6. Reject or approve the parsing. The approved data is stored as a data model instance.