# Parsing documents

> Step-by-step guide to extract data from documents with key-value representation into data model views in Cognite Data Fusion (CDF).

<Warning>
  The features described in this section are in [public preview](/cdf/product_feature_status#public-preview) and may change. The features are currently only available to customers via our **Early Adopter** program. For more information and to sign up, visit the [Early Adopter group](https://hub.cognite.com/groups) on the [Cognite Hub](https://hub.cognite.com).
</Warning>

This procedure is for **data engineers** who populate data model views from PDF documents that present data as key-value pairs. Your source documents must meet the [Input requirements](#input-requirements), and you must have completed the tasks in [Before you start](#before-you-start).

Each extracted property value includes a [confidence score](/cdf/integration/guides/contextualization/document_parser) to help you decide which values to review and correct before you approve the parsing job. Approved data is ingested into data model instances.

## Input requirements

The input for the document parsing job must meet these criteria:

* PDF documents with English text and up to 100 pages. Smaller files usually give better results.
* Embedded text or scanned documents.
* Documents that describe a single asset or piece of equipment.
* Key-value pair data representation.

## Before you start

* [Ingest the documents](/cdf/integration/concepts/transformation/index) into CDF.
* Set up [access capabilities](/cdf/access/guides/capabilities#document_parser).
* [Create a view](/cdf/dm/dm_concepts#view) in a data model with properties that reflect the key-value data.

<Tip>
  For better confidence scores when parsing, align property names with how fields appear in your documents. For broader guidance on data models for AI (including document parsing and search), see [Optimizing data models for AI](/cdf/dm/dm_guides/dm_best_practices_ai_search).
</Tip>
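
Before you run a parsing job, you can roughly check how closely your view property names resemble the field labels in a document. The sketch below is only an illustration that uses Python's standard `difflib` module. It is not how document parsing calculates confidence scores, and the helper functions, property names, and field labels are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def closest_field(property_name: str, field_labels: list[str]) -> tuple[str, float]:
    """Return the field label most similar to a property name, with its similarity (0.0-1.0)."""
    scored = [
        (label, SequenceMatcher(None, normalize(property_name), normalize(label)).ratio())
        for label in field_labels
    ]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical view property names and field labels copied from a document.
view_properties = ["serialNumber", "mfr", "ratedPower"]
document_fields = ["Serial Number", "Manufacturer", "Rated Power (kW)"]

for prop in view_properties:
    label, similarity = closest_field(prop, document_fields)
    print(f"{prop!r} -> {label!r} (similarity {similarity:.2f})")
```

A property with a low similarity to every field label, such as an abbreviation, is a candidate for renaming in the view.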

## Parse documents

<Steps>
  <Step title="Navigate to document parsing">
    Navigate to **Data fusion** > **Contextualize** > **Document parsing**.
  </Step>

  <Step title="Create parsing task">
    Select **Create parsing task**, and then select the documents you want to parse.

    <Info>
      You can parse several documents at the same time, but data from each document is ingested into a separate data model view.
    </Info>
  </Step>

  <Step title="Continue to view selection">
    Select **Next** to continue.
  </Step>

  <Step title="Select views and run">
    Select the views you want to populate with the parsed data, and then select **Run**.
  </Step>

  <Step title="Review the parsed data" id="review-the-parsed-data">
    Review the parsed data.

    * Select a property in the **Parsed data** sidebar to zoom in on the corresponding field in the document.

    * Hover over a field to update the value.

    * Enable **Confidence score** to view the confidence level for each extracted property. Use the confidence score to decide how much to trust each value and where to focus your review before you approve:

    | Score range                  | Interpretation                                                                                                              |
    | ---------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
    | High (for example, 80–100%)  | Strong match. The extracted value is likely correct; spot-check if needed.                                                  |
    | Medium (for example, 50–80%) | Moderate match. Review the value and the document to confirm it maps to the right property.                                 |
    | Low (for example, below 50%) | Weak match. The field name in the document may differ from the property name. Verify or correct the value before approving. |

    <Info>
      Exact score bands may vary. Focus review on lower-confidence properties; properties with higher scores usually require less review and can be trusted for automated workflows.
    </Info>

    If many properties show low scores, your **view property names** may not align with the **field names in the document**, for example because of abbreviations, different wording, or spelling differences. Rename the properties in the view so that they resemble the field names more closely, and then run the parsing again.
  </Step>

  <Step title="Approve or reject parsing">
    Approve or reject the parsing job. Approved data is stored as data model instances.
  </Step>
</Steps>
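
If you track parsed values outside the user interface, the review guidance in the confidence score table can be sketched as a simple triage. This is a minimal illustration, not a CDF API: the `ParsedProperty` record, the function names, and the 0-1 score scale are assumptions, and the thresholds only mirror the example bands in the table, which may vary. The sketch treats a score of exactly 80% as high.

```python
from dataclasses import dataclass

# Illustrative thresholds based on the example bands in the table; exact bands may vary.
HIGH_THRESHOLD = 0.80
MEDIUM_THRESHOLD = 0.50


@dataclass
class ParsedProperty:
    """Hypothetical record for one extracted value; not a CDF API type."""

    name: str
    value: str
    confidence: float  # assumed scale: 0.0 (no match) to 1.0 (certain)


def review_band(confidence: float) -> str:
    """Map a confidence score to the high, medium, or low review band."""
    if confidence >= HIGH_THRESHOLD:
        return "high"
    if confidence >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"


def review_queue(properties: list[ParsedProperty]) -> list[ParsedProperty]:
    """Order properties so that the lowest-confidence values are reviewed first."""
    return sorted(properties, key=lambda prop: prop.confidence)


# Hypothetical extracted values for one document.
parsed = [
    ParsedProperty("serialNumber", "SN-1042", 0.93),
    ParsedProperty("manufacturer", "Acme", 0.41),
    ParsedProperty("ratedPower", "75 kW", 0.66),
]

for prop in review_queue(parsed):
    print(f"{prop.name}: {prop.value} ({review_band(prop.confidence)})")
```

Sorting by ascending confidence puts the values that most need verification at the top of the review list.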

## Further reading

* [About document parsing](/cdf/integration/guides/contextualization/document_parser) – Overview of document parsing, what the confidence score means, and how it is calculated
