Diagram parsing for data modeling
The features described in this section are currently in beta testing and are subject to change.
Use Diagram parsing to extract information from engineering diagrams by detecting and mapping assets and files to resources in Cognite Data Fusion (CDF). You can parse all assets and files from the same location or configure detection parameters to narrow the parsing scope. You review and verify the automatic mappings as part of the process.
For vectorized diagrams, this tool also identifies symbols, lines, and connections, building a knowledge graph stored in CDF.
Core concepts
A library is a set of symbols used to detect objects in engineering diagrams. Libraries are specific to a CDF project, and you must select one before running a parsing job. Give each library a clear, unique name.
A template is a set of symbols that's available by default in every CDF project. Templates are read-only, but you can duplicate one to create a project-specific library.
Use templates to get started with diagram parsing.
A symbol is a blueprint to detect a particular type of equipment, such as a valve, instrument, or pipe. A symbol always exists within a library and contains one or more geometries. Symbols can be mapped to assets and files.
A geometry is one of the possible visual compositions of a symbol. A specific symbol, such as a valve, can be drawn with different combinations of SVG paths, and each combination is saved as a geometry. Adding geometries to a symbol makes automated detection more accurate because the algorithm has more examples to match against. This also covers cases where a single symbol has several visual representations or differs slightly between files.
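To make the relationship between libraries, templates, symbols, and geometries concrete, here is a minimal sketch in Python. It isn't the CDF data model or the Cognite SDK. The class names, fields, and the `duplicate` method are hypothetical and only illustrate the containment rules described above: a symbol holds one or more geometries, a library holds symbols for one project, and a read-only template can be copied into an editable library.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


# All names below are illustrative assumptions, not Cognite SDK classes.
@dataclass
class Geometry:
    """One visual variant of a symbol, stored as a list of SVG path strings."""
    svg_paths: list[str]


@dataclass
class Symbol:
    """Blueprint for one equipment type; must contain at least one geometry."""
    name: str
    geometries: list[Geometry]

    def __post_init__(self) -> None:
        if not self.geometries:
            raise ValueError(f"symbol {self.name!r} needs at least one geometry")


@dataclass
class Library:
    """Editable set of symbols that belongs to a single project."""
    name: str
    project: str
    symbols: dict[str, Symbol] = field(default_factory=dict)

    def add_symbol(self, symbol: Symbol) -> None:
        self.symbols[symbol.name] = symbol


@dataclass(frozen=True)
class Template:
    """Default set of symbols; fields can't be reassigned after creation."""
    name: str
    symbols: dict[str, Symbol]

    def duplicate(self, project: str, new_name: str) -> Library:
        # Deep-copy so that later edits to the library never touch the template.
        return Library(name=new_name, project=project,
                       symbols=copy.deepcopy(self.symbols))


# Start from a template, then add a second geometry for the valve symbol.
valve = Symbol("valve", [Geometry(["M 0 0 L 10 5 L 0 10 Z"])])
template = Template("default-symbols", {"valve": valve})
library = template.duplicate(project="my-project", new_name="site-a-symbols")
library.symbols["valve"].geometries.append(Geometry(["M 0 0 L 10 10"]))
```

In this sketch, the duplicated library ends up with two geometries for the valve while the template keeps its original one, which mirrors the idea of starting from a template and refining the copy for your own files.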
A diagram is a single page of a parsed file that contains all the detected symbols and their connections. Each diagram is created with a particular library. Vectorized diagrams come from files that contain SVG elements. Rasterized diagrams come from scanned PDFs.
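As a rough illustration of how a diagram ties detected symbols and their connections together, the following Python sketch models one page as a small undirected graph. It's an assumption-based example, not the way CDF stores the knowledge graph. `DiagramKind`, `Detection`, `Diagram`, and their fields are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


# Hypothetical names for illustration only; not part of any Cognite API.
class DiagramKind(Enum):
    VECTORIZED = "vectorized"  # source file contains SVG elements
    RASTERIZED = "rasterized"  # source file is a scanned PDF


@dataclass
class Detection:
    """One detected occurrence of a library symbol on the page."""
    detection_id: str
    symbol_name: str


@dataclass
class Diagram:
    """A single page of a parsed file, created with one library."""
    file_name: str
    page: int
    library_name: str
    kind: DiagramKind
    detections: dict[str, Detection] = field(default_factory=dict)
    connections: set[frozenset[str]] = field(default_factory=set)

    def add_detection(self, detection: Detection) -> None:
        self.detections[detection.detection_id] = detection

    def connect(self, a: str, b: str) -> None:
        # Only connect two distinct detections that exist on this page.
        if a == b:
            raise ValueError("a detection can't connect to itself")
        if a not in self.detections or b not in self.detections:
            raise KeyError("both detections must exist before connecting them")
        self.connections.add(frozenset((a, b)))

    def neighbors(self, detection_id: str) -> set[str]:
        # Collect the other endpoint of every connection touching detection_id.
        return {other
                for pair in self.connections if detection_id in pair
                for other in pair if other != detection_id}


diagram = Diagram("pid-001", 1, "site-a-symbols", DiagramKind.VECTORIZED)
for det_id, symbol_name in [("d1", "valve"), ("d2", "pipe"), ("d3", "instrument")]:
    diagram.add_detection(Detection(det_id, symbol_name))
diagram.connect("d1", "d2")
diagram.connect("d2", "d3")
```

Here the pipe detection `d2` is connected to both the valve and the instrument, so asking for its neighbors returns the other two detections.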