Assess extraction accuracy
The document parser assigns a confidence score to each extracted property value. The score appears as a percentage (for example, 60%). See the score ranges in Parsing documents for how to interpret it.
What the scores measure
The score compares two strings: the property name from your data model view and the field name as it appears in the document, for example, the label or caption next to a value, such as “Design Pressure” or “Manufacturer”. A higher score means a closer match. The score does not guarantee that the extracted value is correct.
How the score is calculated
The parser uses a hybrid calculation: it first scores how alike the two strings look (syntactic similarity). Only if that score falls below a threshold does it also score whether the strings mean the same thing (semantic similarity).
Syntactic similarity compares form: characters, spelling, and string structure. The document parser computes a similarity ratio by finding the longest contiguous character sequences that match in both strings, then applying the same idea to the leftover parts. Identical or near-identical strings score high, while different spelling or wording scores lower. For example, “Design Pressure” vs “design pressure” gives a high score, while “Design Pressure” vs “DP” gives a low score.
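The syntactic step described above resembles the matching-blocks ratio in Python's standard difflib module. The following is a minimal sketch under that assumption; this page does not name the parser's actual implementation, and the function name syntactic_similarity is hypothetical.

```python
from difflib import SequenceMatcher


def syntactic_similarity(property_name: str, field_name: str) -> float:
    """Return a ratio in [0, 1] based on matching character blocks.

    Assumption: difflib stands in for the parser's own algorithm. It finds the
    longest contiguous matching block, then repeats on the pieces to the left
    and right of that block.
    """
    return SequenceMatcher(None, property_name, field_name).ratio()


near_match = syntactic_similarity("Design Pressure", "design pressure")
abbreviation = syntactic_similarity("Design Pressure", "DP")

# Display as percentages, the way the confidence score is shown.
print(f"near match:   {near_match:.0%}")
print(f"abbreviation: {abbreviation:.0%}")
```

This sketch is case-sensitive, so a difference in capitalization alone lowers the ratio slightly. Whether the parser normalizes case before comparing is not stated here.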
Semantic similarity (optional) compares meaning, not spelling or layout: two strings can match when they describe the same concept, even if the words or formatting differ. The parser uses this step only when the syntactic score is already low. It may turn each string into a vector embedding (a numeric representation of meaning) and measure how close those vectors are. For example, “Design pressure (bar)” vs “designPressure” can score high semantically even though they look different.
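The hybrid flow can be sketched as follows. Everything specific in this example is an assumption made for illustration: the 0.8 threshold, the choice to report the higher of the two scores, and above all the toy_embedding function, which counts word tokens as a crude stand-in for a real embedding model. It shows the control flow only and does not reproduce the parser's scores.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

SYNTACTIC_THRESHOLD = 0.8  # hypothetical cutoff; the real value is not documented here


def syntactic_similarity(a: str, b: str) -> float:
    # Matching-blocks ratio from difflib (an assumed stand-in for the parser).
    return SequenceMatcher(None, a, b).ratio()


def toy_embedding(text: str) -> Counter:
    # Stand-in for an embedding model: split camelCase, lowercase, count tokens.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return Counter(re.findall(r"[a-z0-9]+", spaced.lower()))


def cosine_similarity(u: Counter, v: Counter) -> float:
    # Cosine of the angle between two sparse token-count vectors.
    dot = sum(u[token] * v[token] for token in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def hybrid_score(property_name: str, field_name: str,
                 threshold: float = SYNTACTIC_THRESHOLD) -> float:
    syntactic = syntactic_similarity(property_name, field_name)
    if syntactic >= threshold:
        return syntactic  # strings already look alike; skip the semantic step
    semantic = cosine_similarity(toy_embedding(property_name),
                                 toy_embedding(field_name))
    # Assumption: report the higher of the two scores.
    return max(syntactic, semantic)


print(f"{hybrid_score('designPressure', 'Design pressure (bar)'):.0%}")
```

In this sketch the semantic step runs only for pairs that fall below the threshold, which mirrors the order described above and avoids the cost of embedding strings that already match closely.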
Next steps
- Parse documents – Step-by-step procedure to parse documents and ingest data into data model views