The features described in this section are in beta and are subject to change. They are available only to customers in our Early Adopter program. For more information and to sign up, visit the Early Adopter group on the Cognite Hub.
Document parsing extracts data from documents that present information as key-value pairs and writes it into data model views in Cognite Data Fusion (CDF). Each key-value pair in the document maps to a property in the data model view. You can modify and verify the extracted data before you approve the parsing job. Approved data is ingested into data model instances and becomes available for users to explore and analyze, for example in asset and equipment monitoring.

Assess extraction accuracy

The document parser assigns a confidence score to each extracted property value. The score appears as a percentage (for example, 60%). See the score ranges in Parsing documents for how to interpret it.

What the scores measure

The score compares two strings: the property name from your data model view and the field name as it appears in the document. The field name is the label or caption next to a value, such as “Design Pressure” or “Manufacturer”. A higher score means a closer match between the two names. The score says nothing about the extracted value itself, so a high score does not guarantee that the value is correct.

How the score is calculated

The parser uses a hybrid calculation: it first scores how alike the two strings look (syntactic similarity). Only if that score falls below a threshold does it also score whether the strings mean the same thing (semantic similarity).

Syntactic similarity compares form: characters, spelling, and string structure. The document parser computes a similarity ratio by finding the longest contiguous character sequences that match in both strings, then applying the same idea to the leftover parts. Identical or near-identical strings score high, while different spelling or wording scores lower. For example, “Design Pressure” vs “design pressure” gives a high score, while “Design Pressure” vs “DP” gives a low score.

Semantic similarity (optional) compares meaning, not spelling or layout: two strings can match when they describe the same concept, even if the words or formatting differ. The parser uses this step only when the syntactic score is already low. It may turn each string into a vector embedding (a numeric representation of meaning) and measure how close those vectors are. For example, “Design pressure (bar)” vs “designPressure” can score high semantically even though they look different.
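
The hybrid calculation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the parser's actual implementation. It assumes that the syntactic step behaves like `difflib.SequenceMatcher` from the Python standard library, which also matches the longest contiguous block and then recurses on the leftover parts. The threshold value, the function names, and the choice to keep the higher of the two scores are hypothetical. The semantic step is a toy stand-in: it compares word counts with cosine similarity, where a real system would compare embeddings from a language model.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical cutoff; the parser's real threshold is not documented here.
SEMANTIC_THRESHOLD = 0.8


def syntactic_score(a: str, b: str) -> float:
    """Ratio from longest contiguous matching blocks, applied recursively."""
    return SequenceMatcher(None, a, b).ratio()


def _word_counts(text: str) -> Counter:
    """Split camelCase, lowercase, and count alphanumeric words."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return Counter(re.findall(r"[a-z0-9]+", spaced.lower()))


def semantic_score(a: str, b: str) -> float:
    """Toy stand-in for embedding similarity: cosine of word-count vectors."""
    va, vb = _word_counts(a), _word_counts(b)
    dot = sum(va[word] * vb[word] for word in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def hybrid_score(property_name: str, field_label: str,
                 threshold: float = SEMANTIC_THRESHOLD) -> float:
    """Score syntactically first; add the semantic step only when that is low."""
    score = syntactic_score(property_name, field_label)
    if score < threshold:
        # Assumption: keep the higher of the two scores.
        score = max(score, semantic_score(property_name, field_label))
    return score


if __name__ == "__main__":
    print(round(syntactic_score("Design Pressure", "design pressure"), 2))  # 0.87
    print(round(syntactic_score("Design Pressure", "DP"), 2))               # 0.24
    print(round(hybrid_score("Design pressure (bar)", "designPressure"), 2))  # 0.82
```

In this sketch, “Design pressure (bar)” vs “designPressure” scores about 0.69 syntactically, which is below the assumed threshold, so the semantic step runs and raises the result. “Design Pressure” vs “DP” stays low in both steps because the toy word comparison cannot expand abbreviations.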

Next steps

  • Parse documents – Step-by-step procedure to parse documents and ingest data into data model views
Last modified on March 23, 2026