Diagram parsing for data modeling
The features described in this section are currently in beta testing and are subject to change.
Diagram parsing analyzes engineering diagrams to extract information such as tags, symbols, and connections. It automatically detects asset and file tags and maps them to resources in Cognite Data Fusion (CDF), and you can review and verify these automatic mappings as part of the process. For vectorized files (high-quality engineering diagrams), it also identifies symbols, lines, and connections, and builds a knowledge graph stored in CDF.
Vectorized diagrams are files that contain SVG elements. Rasterized diagrams, such as scanned PDFs or images, don't contain SVG elements.
Before you start
- Make sure the files for parsing are present in the data model.
- Add the necessary access capabilities for parsing.
- Configure a location for the engineering diagrams.
- Set the mime_type of each file to application/pdf, image/jpeg, image/png, or image/tiff.
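Before you run parsing, it can help to check that every file has one of the supported MIME types. The following is a minimal sketch. It assumes the file metadata has already been fetched into plain dictionaries with hypothetical `external_id` and `mime_type` keys, and it doesn't call any CDF API. Only the list of MIME types comes from this page.

```python
# Supported MIME types for diagram parsing, as listed on this page.
SUPPORTED_MIME_TYPES = {"application/pdf", "image/jpeg", "image/png", "image/tiff"}


def unsupported_files(files: list[dict]) -> list[str]:
    """Return the IDs of files whose mime_type is missing or unsupported.

    The dict keys used here ("external_id", "mime_type") are hypothetical.
    """
    return [
        f["external_id"]
        for f in files
        if f.get("mime_type") not in SUPPORTED_MIME_TYPES
    ]


files = [
    {"external_id": "pid-001", "mime_type": "application/pdf"},
    {"external_id": "pid-002", "mime_type": "image/png"},
    {"external_id": "pid-003", "mime_type": "image/svg+xml"},
    {"external_id": "pid-004"},  # mime_type not set
]
print(unsupported_files(files))  # ['pid-003', 'pid-004']
```

Files flagged this way need their mime_type corrected before they can be parsed.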
Access capabilities
See access capabilities to add the necessary capabilities for diagram parsing workflows.
Concepts
Library
A library is a collection of symbols for automatic and manual detection of objects in an engineering diagram. Users create and modify libraries that are available only in their CDF project. The user must specify a library before running a parsing job.
Make sure to give meaningful and unique names when you create libraries.
Template
A template is also a collection of symbols. Cognite provides templates, which are available by default for every project. They're read-only but can be duplicated as project-specific libraries.
We recommend starting with a template. Parsing with its predefined set of symbols gives you detected symbols to work from right away.
Symbol
A symbol is a blueprint to detect a particular type of equipment, such as a valve, instrument, or pipe. A symbol always exists within a library and contains one or more geometries. Symbols can be mapped to assets and files.
Geometry
A geometry is one possible composition of a symbol. The same symbol, such as a valve, can be drawn with different combinations of SVG paths, and each combination is saved as a geometry.
Adding geometries to each symbol makes automated detection more accurate by providing the algorithm with more examples. This approach also accounts for situations where a single symbol has several visual representations or minor visual differences exist across different files.
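The relationships between libraries, templates, symbols, and geometries can be sketched as plain data classes. This is an illustrative model, not the CDF data model. The class names, fields, and the `duplicate` and `add_symbol` methods are hypothetical, and geometries are simplified to lists of SVG path strings.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Geometry:
    # One way of drawing a symbol, stored here as a list of SVG path strings.
    svg_paths: list[str]


@dataclass
class Symbol:
    name: str
    asset_class: str  # for example, "Pump"
    asset_type: str   # for example, "Centrifugal"
    # More geometries give detection more examples of how the symbol is drawn.
    geometries: list[Geometry] = field(default_factory=list)


@dataclass
class Library:
    name: str
    read_only: bool = False  # True for Cognite-provided templates
    symbols: list[Symbol] = field(default_factory=list)

    def add_symbol(self, symbol: Symbol) -> None:
        if self.read_only:
            raise PermissionError(f"{self.name!r} is a read-only template")
        self.symbols.append(symbol)

    def duplicate(self, new_name: str) -> "Library":
        # Copy a template into an editable, project-specific library.
        # deepcopy keeps later edits to the copy from changing the template.
        return Library(new_name, read_only=False, symbols=copy.deepcopy(self.symbols))


template = Library(
    "Valve template",
    read_only=True,
    symbols=[Symbol("Gate valve", "Valve", "Gate", [Geometry(["M0 0 L10 10 L10 0 L0 10 Z"])])],
)
library = template.duplicate("Plant A symbols")
# Add a second drawing style of the same valve as an extra geometry.
library.symbols[0].geometries.append(Geometry(["M0 0 L10 10", "M10 0 L0 10"]))
library.add_symbol(Symbol("Pump", "Pump", "Centrifugal"))
print(len(template.symbols), len(library.symbols))  # 1 2
print(len(template.symbols[0].geometries), len(library.symbols[0].geometries))  # 1 2
```

The template stays unchanged because templates are read-only. All edits happen on the project-specific copy.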
Diagram
A diagram is a single page of a parsed file that contains all the detected symbols and their connections. Each diagram is created with a particular library.
During parsing, a diagram first waits in a queue with the in queue status. When parsing starts, the status changes to parsing, and when parsing completes, it changes to parsed. If an error occurs, the status changes to failed, and the diagram includes a message with the error details.
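The status lifecycle above can be modeled as a small state machine. This is a sketch under stated assumptions. The `Diagram` class and `move_to` method are hypothetical, and two choices are assumptions rather than documented behavior: the sketch allows failed from both in queue and parsing, and it treats parsed and failed as final for a single run.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DiagramStatus(Enum):
    IN_QUEUE = "in queue"
    PARSING = "parsing"
    PARSED = "parsed"
    FAILED = "failed"


# Allowed transitions. Allowing FAILED from IN_QUEUE and treating PARSED and
# FAILED as final states are assumptions of this sketch.
ALLOWED = {
    DiagramStatus.IN_QUEUE: {DiagramStatus.PARSING, DiagramStatus.FAILED},
    DiagramStatus.PARSING: {DiagramStatus.PARSED, DiagramStatus.FAILED},
    DiagramStatus.PARSED: set(),
    DiagramStatus.FAILED: set(),
}


@dataclass
class Diagram:
    page: int
    status: DiagramStatus = DiagramStatus.IN_QUEUE
    error_message: Optional[str] = None

    def move_to(self, new_status: DiagramStatus, error_message: Optional[str] = None) -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot move from {self.status.value!r} to {new_status.value!r}")
        # A failed diagram carries a message describing the error.
        if new_status is DiagramStatus.FAILED and not error_message:
            raise ValueError("a failed diagram must carry an error message")
        self.status = new_status
        self.error_message = error_message


diagram = Diagram(page=1)
diagram.move_to(DiagramStatus.PARSING)
diagram.move_to(DiagramStatus.FAILED, "example error details")
print(diagram.status.value, "|", diagram.error_message)  # failed | example error details
```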
Run diagram parsing
To parse a diagram:
- Navigate to CDF > Data management > Contextualize > Diagram parsing.
- Select the symbol library you want to use to detect symbols in the diagram.
- Select the file you want to parse, and then select Run parsing.
- Wait for the parsing to complete. Then, in the Actions column, select 👁 to view the parsed file.
View the parsed output
The parsed file has three tabs to verify different aspects of the automatic mappings: Symbol, Mapping, and Connection.
Vectorized files are high-quality files that contain SVG elements, which makes them suitable for symbol detection, tag mapping, and connection detection. Rasterized files, such as images or scanned PDFs, don't contain SVG elements, so only the Mapping tab is available when you parse them.
- On the Symbol tab, make sure that the relevant symbols are correctly detected.
  - To add a new symbol to the library, select the symbol and specify which Asset class and Asset type it represents. You can also add the symbol as a geometry of an existing symbol.
  - If a symbol maps to the wrong asset, select the symbol and then Detach from symbol. Only this instance of the symbol, in this file, is detached.
- On the Mapping tab, map the detected tags and symbols to the correct instances in CDF. The mapping process has two steps:
  - Tag detection and mapping: identifies tags and maps them to instances of CogniteAsset or CogniteFile in CDF. A CogniteDiagramAnnotation defines each mapping.
  - Symbol-to-asset mapping: maps detected symbols to CogniteAsset instances by comparing the locations of symbols and tags. If a symbol and a tag overlap, the symbol is mapped to the asset instance that serves as the end node of the annotation.
  The diagram uses different indicators to show the mapping status of detected tags and symbols:
  - Blue box: an approved CogniteDiagramAnnotation.
  - Purple box: a suggested CogniteDiagramAnnotation.
  - Purple symbol: the detected symbol is linked to a suggested CogniteDiagramAnnotation or isn't linked to any CogniteDiagramAnnotation.
  - Blue symbol: the detected symbol is linked to an approved CogniteDiagramAnnotation.
  When review is complete, all symbols mapped to an instance in CDF should be blue. Review every purple symbol using the following mapping scenarios:
  - For a valid CogniteDiagramAnnotation or symbol, select Verify to confirm the mapping and change the annotation status to Approved.
  - If a CogniteDiagramAnnotation or symbol is mapped to the wrong instance, select it and then select Change mapping.
  - For an unmapped symbol, you have two options:
    - If the CogniteDiagramAnnotation links to the correct CDF instance, select the CogniteDiagramAnnotation and the symbol, and then select Link. This maps the symbol to the instance and changes the status of the CogniteDiagramAnnotation to Approved. If the CogniteDiagramAnnotation links to an incorrect CDF instance, first select it and link it to the correct instance, and then link it to the symbol.
    - Select the unmapped symbol and then Add asset mapping +, and select the asset to map it to. This creates a CogniteDiagramAnnotation with the status Approved.
  - To remove an incorrect mapping of a detected symbol, select 🗑. This unlinks the symbol and deletes the linked CogniteDiagramAnnotation.
  - To unlink a symbol from its linked CogniteDiagramAnnotation, select the symbol and then Unlink.
  - To add an annotation manually, select + (Add mapping), draw a bounding box, and select an asset to link it to. This creates a CogniteDiagramAnnotation with the status Approved.
- On the Connection tab, check that the symbols are correctly connected.
  - Hover over a symbol to view its entire connection group, and select a symbol to view or edit its closest connection.
  - To delete an incorrect connection, select the delete icon.
  - To create a missing connection, select two symbols and then select Create connection.
- Rerun the parsing and verify that your changes have been recognized.
- To show or hide the diagram's background document details (SVG paths), select the Edit layer visibility icon and toggle Show background document.
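The symbol-to-asset mapping step and the blue and purple symbol indicators described for the Mapping tab can be modeled in a few lines. This is a minimal sketch, not the CDF implementation, and it makes several assumptions:

- Symbols and tags have axis-aligned bounding boxes.
- "Overlap" means a positive-area intersection.
- Each symbol takes the first overlapping tag.

The classes `Box`, `TagAnnotation`, and `DetectedSymbol` and the asset IDs are hypothetical stand-ins for the CogniteDiagramAnnotation data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def overlaps(self, other: "Box") -> bool:
        # Two axis-aligned boxes overlap when they intersect on both axes
        # (boxes that only share an edge don't count here).
        return (
            self.x_min < other.x_max and other.x_min < self.x_max
            and self.y_min < other.y_max and other.y_min < self.y_max
        )


@dataclass
class TagAnnotation:
    # Hypothetical stand-in for a CogniteDiagramAnnotation on a detected tag.
    box: Box
    end_node: str  # ID of the CogniteAsset instance the tag maps to
    status: str    # "Approved" or "Suggested"


@dataclass
class DetectedSymbol:
    name: str
    box: Box
    annotation: Optional[TagAnnotation] = None


def map_symbols(symbols: list[DetectedSymbol], tags: list[TagAnnotation]) -> None:
    # Link each symbol to the first tag annotation whose box overlaps it;
    # the symbol's asset is then the annotation's end node.
    for symbol in symbols:
        symbol.annotation = next((t for t in tags if symbol.box.overlaps(t.box)), None)


def symbol_color(symbol: DetectedSymbol) -> str:
    # Blue: linked to an approved annotation. Purple: suggested or not linked.
    if symbol.annotation is not None and symbol.annotation.status == "Approved":
        return "blue"
    return "purple"


valve_tag = TagAnnotation(Box(10, 10, 30, 15), end_node="asset-valve-001", status="Approved")
pump_tag = TagAnnotation(Box(100, 40, 120, 45), end_node="asset-pump-001", status="Suggested")
symbols = [
    DetectedSymbol("valve", Box(12, 0, 22, 12)),
    DetectedSymbol("pump", Box(95, 30, 110, 42)),
    DetectedSymbol("instrument", Box(200, 200, 210, 210)),
]
map_symbols(symbols, [valve_tag, pump_tag])
for s in symbols:
    asset = s.annotation.end_node if s.annotation else None
    print(s.name, asset, symbol_color(s))
# valve asset-valve-001 blue
# pump asset-pump-001 purple
# instrument None purple
```

In this model, verifying the pump mapping (setting its annotation status to "Approved") would turn the pump symbol blue. The instrument symbol stays purple until it is linked to an annotation.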
Manage the parsed output
Use the API to retrieve your parsed output. For more information, see API documentation.
Manage symbol libraries
Libraries contain the symbols that diagram parsing uses to detect objects in engineering diagrams. You can create or modify libraries to add new symbols or geometries of an existing symbol.
To add a symbol to a library:
- Navigate to CDF > Data management > Contextualize > Diagram parsing.
- Select Libraries and then the library to which you want to add the symbol. If you don't have a symbol library, create one:
  - To make a copy of a library template, select ⋮ > Duplicate template.
  - To create a new library, select +.
- Select Open file and open the file containing the symbols you want to add. We recommend using legend files, which contain a set of symbols.
- To select the symbol you want to add, do one of the following:
  - Drag the cursor to select the symbol.
  - Hold Shift while you drag to select the symbol.
- Specify which Asset class (for example, Pump) and Asset type (for example, Centrifugal) the symbol represents.
- To add a geometry to an existing symbol, select the symbol and then Add as geometry.
For pipe detection, a line is identified as a pipe only if at least one of its endpoints touches a detected symbol. Adding more geometries of piping symbols doesn't affect pipe detection. When you review the parsed output, you can add selected lines as pipes, which creates additional detected pipes from those lines.
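The pipe rule can be illustrated with a short geometric check. This is a minimal sketch that models "touching" as an endpoint lying inside or on a symbol's bounding box, within a small tolerance. The actual geometric test and tolerance used by diagram parsing aren't documented here, and `Box`, `is_pipe`, and the `tolerance` value are hypothetical.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Box:
    # Axis-aligned bounding box of a detected symbol.
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def touches(self, p: Point, tolerance: float) -> bool:
        # A point "touches" the symbol if it lies inside or on the box,
        # expanded by a small tolerance on every side.
        x, y = p
        return (
            self.x_min - tolerance <= x <= self.x_max + tolerance
            and self.y_min - tolerance <= y <= self.y_max + tolerance
        )


def is_pipe(line: tuple[Point, Point], symbols: list[Box], tolerance: float = 0.5) -> bool:
    # A line qualifies as a pipe only if at least one endpoint touches a symbol.
    return any(box.touches(end, tolerance) for end in line for box in symbols)


valve = Box(10, 10, 20, 20)
pump = Box(50, 10, 60, 20)
lines = {
    "valve-to-pump": ((20, 15), (50, 15)),
    "from-valve": ((20.3, 15), (40, 30)),
    "unconnected": ((70, 70), (90, 90)),
}
for name, line in lines.items():
    print(name, is_pipe(line, [valve, pump]))
# valve-to-pump True
# from-valve True
# unconnected False
```

The geometries in the library play no part in this check. This matches the note that adding more piping-symbol geometries doesn't change which lines are detected as pipes.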
When you delete a symbol from the library page, the symbol is deleted from every file that uses it. When you delete a geometry from the library page, that shape is no longer identified as the symbol in future parsing runs.
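The difference between the two deletions can be shown with a small in-memory model. Deleting a symbol acts on existing files, while deleting a geometry only affects what later parsing runs can match. This is a hypothetical sketch: the dictionaries and the `delete_symbol` and `delete_geometry` helpers are illustrative. Leaving earlier detections untouched after a geometry is deleted is an assumption, not documented behavior.

```python
# Hypothetical in-memory model; names and structures are illustrative only.
library = {
    "Gate valve": ["gate-valve-geometry-1", "gate-valve-geometry-2"],
    "Check valve": ["check-valve-geometry-1"],
}
# Symbols detected per file by earlier parsing runs.
detections = {
    "file-1": ["Gate valve", "Check valve"],
    "file-2": ["Gate valve"],
}


def delete_symbol(symbol: str) -> None:
    # Deleting a symbol removes it from the library and from every file using it.
    del library[symbol]
    for file_id in detections:
        detections[file_id] = [s for s in detections[file_id] if s != symbol]


def delete_geometry(symbol: str, geometry: str) -> None:
    # Deleting a geometry only changes what future parsing runs can match;
    # this sketch assumes detections from earlier runs are left as they are.
    library[symbol].remove(geometry)


delete_geometry("Gate valve", "gate-valve-geometry-2")
delete_symbol("Check valve")
print(library)     # {'Gate valve': ['gate-valve-geometry-1']}
print(detections)  # {'file-1': ['Gate valve'], 'file-2': ['Gate valve']}
```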