Input requirements
The input for the document parsing job must meet these criteria:
- PDF file format with English text and a maximum of 100 pages. Results improve with smaller files.
- Embedded text or scanned documents.
- Documents that describe a single asset or piece of equipment.
- Key-value pair data representation.
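As a rough illustration, the file-level criteria above could be checked before you upload anything. This is a minimal sketch, not part of the product: `DocumentInfo` and `input_problems` are hypothetical names, and the sketch assumes you already know each file's MIME type, page count, and language from your own tooling. It does not check whether a document describes a single asset or uses key-value pairs.

```python
from dataclasses import dataclass

MAX_PAGES = 100  # page limit stated in the input requirements


@dataclass
class DocumentInfo:
    """Hypothetical container for metadata gathered by your own tooling."""
    name: str
    mime_type: str
    page_count: int
    language: str  # e.g. "en"


def input_problems(doc: DocumentInfo) -> list[str]:
    """Return the reasons a document fails the file-level criteria."""
    problems = []
    if doc.mime_type != "application/pdf":
        problems.append(f"{doc.name}: not a PDF ({doc.mime_type})")
    if doc.page_count > MAX_PAGES:
        problems.append(f"{doc.name}: {doc.page_count} pages exceeds {MAX_PAGES}")
    if doc.language.lower() not in ("en", "english"):
        problems.append(f"{doc.name}: text is not English ({doc.language})")
    return problems


if __name__ == "__main__":
    doc = DocumentInfo("pump_datasheet.pdf", "application/pdf", 12, "en")
    print(input_problems(doc))  # an empty list means no file-level problems
```

An empty result only means the file passes these three checks. Whether the content suits parsing still needs a manual look.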
Before you start
- Ingest the documents into CDF.
- Set up access capabilities.
- Create a view in a data model with properties that reflect the key-value data.
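When planning the view, it can help to list the keys that appear in a representative document and derive one property name per key. The sketch below is only an illustration: `to_property_name` is a hypothetical helper, the camelCase convention is an assumption rather than a requirement, and the sample keys are made up.

```python
import re


def to_property_name(key: str) -> str:
    """Turn a document key such as 'Serial number' into a camelCase name.

    The camelCase convention is an assumption made for this sketch.
    """
    words = re.findall(r"[A-Za-z0-9]+", key)
    if not words:
        raise ValueError(f"no usable characters in key: {key!r}")
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


if __name__ == "__main__":
    # Hypothetical keys as they might appear in an equipment datasheet.
    sample_keys = ["Serial number", "Manufacturer", "Max. pressure (bar)"]
    for key in sample_keys:
        print(f"{key!r} -> {to_property_name(key)}")
```

Keeping property names close to the wording used in the documents makes the mapping between keys and properties easier to follow during review.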
Parse documents
Create parsing task
Select Create parsing task, then select the documents you want to parse.
You can parse several documents simultaneously, but the data from each document is ingested into a separate data model view.
Review the parsed data
Check the extracted values against the source document:
- Select a property in the Parsed data sidebar to zoom into a field in the document.
- Hover over a field to edit its value.
- Enable Show confidence score to view the confidence level for each extracted property. The confidence score helps you assess the quality of extracted data and identify fields that may need manual review.
The confidence score is calculated based on the quality of bounding boxes detected in the document and the similarity between property keys in your data model view and the extracted text. Higher scores indicate more reliable extractions that can be trusted for automated workflows.
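To make the idea of key similarity concrete, the sketch below compares a property key with a label found in a document and lists the fields whose score falls below a review threshold. It is an illustration only: the actual scoring method is not documented here, so `difflib.SequenceMatcher` is a stand-in, the bounding-box component is left out, and `key_similarity`, `fields_needing_review`, and the 0.8 threshold are assumptions made for this example.

```python
from difflib import SequenceMatcher


def key_similarity(property_key: str, extracted_label: str) -> float:
    """Case-insensitive string similarity in the range 0.0 to 1.0.

    A stand-in measure for illustration, not the product's formula.
    """
    a = property_key.strip().lower()
    b = extracted_label.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def fields_needing_review(scores: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Return property names whose confidence is below the threshold, sorted."""
    return sorted(name for name, score in scores.items() if score < threshold)


if __name__ == "__main__":
    print(key_similarity("serial number", "Serial Number"))
    print(key_similarity("serial number", "Serial no."))

    # Hypothetical confidence scores read off the Parsed data sidebar.
    scores = {"serialNumber": 0.95, "manufacturer": 0.62, "maxPressure": 0.78}
    print(fields_needing_review(scores))
```

A threshold like this is a triage aid: fields above it may still be wrong, so it works best alongside spot checks rather than instead of them.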