Data Modeling Concepts
The most fundamental building blocks of data modeling in CDF are data models, data types, instances, and spaces. For more details, see the introduction to data modeling concepts.
However, to better understand what happens under the hood and to solve more advanced use cases, it is important to understand the more granular concepts that power data models in CDF. In this article, we walk through these concepts again in greater detail, focusing on three key aspects of a data model: Structure, Data, and Access Control.
Structure of a Data Model
Data model: A data model represents data entities and how they relate to each other. Regardless of its size and complexity, each data model is composed of data types.
Each version of a data model gives you a /graphql endpoint where you can query for data. This means that multiple versions of the same data model can be live at the same time, so migrating between data model versions requires no downtime.
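To illustrate why several versions can be live at once, the Python sketch below gives each version of a data model its own /graphql endpoint, so publishing version 2 adds an endpoint instead of replacing version 1. The base URL, the path layout, the graphql_endpoint helper, and the query shape are hypothetical placeholders, not the documented CDF URL scheme; only the "one /graphql endpoint per version" idea comes from the text above. The request body follows the common GraphQL-over-HTTP convention of a JSON object with a "query" key.

```python
import json

# Hypothetical base URL and path layout: placeholders, not the real CDF URL scheme.
BASE_URL = "https://example.invalid/datamodels"


def graphql_endpoint(model: str, version: str) -> str:
    """Return the (hypothetical) /graphql endpoint for one version of a data model."""
    return f"{BASE_URL}/{model}/versions/{version}/graphql"


def graphql_request_body(query: str) -> str:
    """Build a JSON body in the common GraphQL-over-HTTP shape: {"query": ...}."""
    return json.dumps({"query": query})


# Two versions of the same data model are live side by side. Each has its own
# endpoint, so clients pinned to version 1 keep working while others move to 2.
live_endpoints = {v: graphql_endpoint("Movies", v) for v in ("1", "2")}

if __name__ == "__main__":
    for version, url in live_endpoints.items():
        print(version, url)
    # Hypothetical query shape; real query names depend on the data model.
    print(graphql_request_body("{ movies { title } }"))
```

The design point is that the version is part of the address: a consumer keeps using the endpoint it was built against until it chooses to migrate.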
Data type: A data type represents the structure of an entity and is analogous to a database table or view. In CDF, you can import and reuse data types across data models.
Each data type is composed of fields that describe the attributes of the entity. Each field has a name and the type of data it contains. In GraphQL, data types are represented as interfaces and types.
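As a minimal illustration, assuming nothing about how CDF stores data types internally, the sketch below models a data type as a named list of fields (a name plus a type) and renders it as a GraphQL type definition. The Field and DataType classes and the Movie example are hypothetical; the "!" suffix is standard GraphQL syntax for a non-null field.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    name: str  # attribute name, e.g. "title"
    type: str  # GraphQL type of the data it holds, e.g. "String"
    required: bool = False


@dataclass
class DataType:
    name: str
    fields: list[Field] = field(default_factory=list)

    def to_graphql(self) -> str:
        """Render this data type as a GraphQL type definition."""
        lines = [f"type {self.name} {{"]
        for f in self.fields:
            suffix = "!" if f.required else ""  # "!" marks a non-null field in GraphQL
            lines.append(f"  {f.name}: {f.type}{suffix}")
        lines.append("}")
        return "\n".join(lines)


movie = DataType("Movie", [Field("title", "String", required=True), Field("releaseYear", "Int")])
print(movie.to_graphql())
```

Running it prints a type Movie definition with a non-null title field and an optional releaseYear field.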
While our data modeling language, based on GraphQL, speaks in terms of data types and fields, the API uses a different set of concepts that let you iterate on a data model without worrying about breaking existing data.
How is this represented in the API?
Views
In the API, a data type is called a view. Like a GraphQL interface, a view is a group of properties that represents a data type. Views can extend other views, and they can be imported across different data models.
When importing views across different data models, you can assign new names to the properties, implement another view, and apply filters so that the view returns only a subset of the data.
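The renaming and filtering described above can be sketched as a view that reads from shared stored rows without copying or changing them. This is a conceptual model only: the View class, its property_map and row_filter parameters, and the movie data are hypothetical, not the CDF API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Row = dict[str, Any]


@dataclass
class View:
    """Hypothetical view: renames stored properties and optionally filters rows."""
    name: str
    property_map: dict[str, str]  # view property name -> stored property name
    row_filter: Optional[Callable[[Row], bool]] = None

    def read(self, stored_rows: list[Row]) -> list[Row]:
        result = []
        for row in stored_rows:
            if self.row_filter is not None and not self.row_filter(row):
                continue  # filtered out: not part of this view's subset
            result.append({view_prop: row.get(src) for view_prop, src in self.property_map.items()})
        return result


# Shared stored rows (hypothetical data).
movie_rows = [
    {"title": "Alien", "year": 1979},
    {"title": "Arrival", "year": 2016},
]

# A view imported into another data model, with renamed properties and a filter.
recent_films = View(
    name="RecentFilm",
    property_map={"name": "title", "releasedIn": "year"},
    row_filter=lambda row: row["year"] >= 2000,
)

print(recent_films.read(movie_rows))  # [{'name': 'Arrival', 'releasedIn': 2016}]
```

Because the view only maps and filters, several views with different names and filters can sit on top of the same stored data.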
Although views are described in terms of properties, in practice a property plays the same role as a field in a data type: it defines structure.
In fact, fields are powered by properties, just as a data type is powered by a view.
Views are versioned, so you can update a view (add or remove properties, change filters, etc.) and keep all previous versions, avoiding breaking changes for any consumer of the view. To support this while keeping the data trustworthy and maintainable, views need a separate concept to store the data they expose. These are called containers.
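Keeping every previous version can be sketched as a registry keyed by (externalId, version), so publishing a new version never overwrites an older one. The ViewVersion and ViewRegistry classes are hypothetical and only illustrate the versioning idea; they are not how CDF stores views.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewVersion:
    """Hypothetical snapshot of one published version of a view."""
    external_id: str
    version: str
    properties: tuple[str, ...]


class ViewRegistry:
    """Hypothetical in-memory registry that keeps every published view version."""

    def __init__(self) -> None:
        # Keyed by (externalId, version), so new versions never overwrite old ones.
        self._versions: dict[tuple[str, str], ViewVersion] = {}

    def publish(self, view: ViewVersion) -> None:
        self._versions[(view.external_id, view.version)] = view

    def get(self, external_id: str, version: str) -> ViewVersion:
        return self._versions[(external_id, version)]

    def versions(self, external_id: str) -> list[str]:
        return sorted(v for (xid, v) in self._versions if xid == external_id)


registry = ViewRegistry()
registry.publish(ViewVersion("Movie", "1", ("title", "year")))
# Version 2 adds a property; version 1 stays available to existing consumers.
registry.publish(ViewVersion("Movie", "2", ("title", "year", "rating")))

print(registry.versions("Movie"))             # ['1', '2']
print(registry.get("Movie", "1").properties)  # ('title', 'year')
```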
Containers
Containers are the physical storage of data. Think of them as tables in an SQL database, with properties as columns, plus indexes and constraints on the data. All views are composed of containers, and each container can be used in multiple views.
In addition, a property can be a direct relation, which is a reference to a single node. These are 1 -> 1 relationships, meaning a single, one-way relation from one node to another.
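The table analogy can be sketched in a few lines: typed properties act as columns, a uniqueness constraint rejects duplicates, an index finds rows directly, and one property holds a single node reference, playing the role of a direct relation. Everything here is hypothetical and in-memory: the Container class, the choice of "name" as the unique property, the (space, externalId) tuple used as the node reference, and the Pump/Facility data. Real containers are defined through the CDF API.

```python
from dataclasses import dataclass, field
from typing import Any

NodeRef = tuple[str, str]  # (space, externalId) of a referenced node


@dataclass
class Container:
    """Hypothetical container: typed properties (columns), one uniqueness
    constraint, and an index on that property."""
    properties: dict[str, type]  # property name -> expected Python type
    unique: str  # property whose values must be unique across rows
    rows: list[dict[str, Any]] = field(default_factory=list)
    index: dict[Any, int] = field(default_factory=dict)  # unique value -> row position

    def insert(self, row: dict[str, Any]) -> None:
        for name, expected in self.properties.items():
            if name in row and not isinstance(row[name], expected):
                raise TypeError(f"{name} must be of type {expected.__name__}")
        key = row[self.unique]
        if key in self.index:  # enforce the uniqueness constraint
            raise ValueError(f"duplicate {self.unique}: {key!r}")
        self.index[key] = len(self.rows)
        self.rows.append(row)

    def lookup(self, key: Any) -> dict[str, Any]:
        # The index finds the row directly instead of scanning every row.
        return self.rows[self.index[key]]


# "locatedIn" plays the role of a direct relation: it holds exactly one node reference.
facility: NodeRef = ("plant", "Facility-A")
pumps = Container(properties={"name": str, "locatedIn": tuple}, unique="name")
pumps.insert({"name": "Pump-1", "locatedIn": facility})

print(pumps.lookup("Pump-1")["locatedIn"])  # ('plant', 'Facility-A')
```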
These are the building blocks of the data modeling capabilities in CDF. In summary, the data model can also be described as:
- Data model: A grouping of views.
- View: Exactly the same as a data type. A grouping of properties that represents a data type (e.g. Movie or Actor), composed of containers and other views (which are also powered by containers).
- Container: Physical storage of data. A collection of properties, indexes, and constraints.
Data in a Data Model
Any data that conforms to a data model is called an instance. Instances are categorized into the following concepts:
Node: An object of data with a set of properties attached to it.
Relation: A relational object that points from one node to another. Relations come in two types, depending on the data they carry:
- Direct relation - a 1-1 relation that carries no information beyond the nodes it starts and ends at.
- Edge - any other type of relation, with the added ability to store additional data on the edge itself.
More on edges
An edge points from one node to another node and has a defined type. Each edge has:
- startNode: a reference to the start node
- endNode: a reference to the end node
- type: an identifier that acts as the name of the edge. By default, this is the name of the relation when defining a data model in GraphQL, like Pump.locatedIn for a relationship between Pump and Facility.
Additionally, an edge can be tied to a view, providing the ability to add extra properties to a relationship.
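An edge record can be sketched as a start node, an end node, a type that names the relation, and an optional bag of extra properties, which is what tying the edge to a view makes possible. This is a conceptual sketch: the Edge class, the outgoing helper, the installedYear property, and the (space, externalId) tuples used as node references are hypothetical, and the type is simplified to a plain string.

```python
from dataclasses import dataclass, field
from typing import Any

NodeId = tuple[str, str]  # (space, externalId) of a node


@dataclass
class Edge:
    """Hypothetical edge record: a typed link from one node to another."""
    start_node: NodeId
    end_node: NodeId
    edge_type: str  # acts as the edge's name, e.g. "Pump.locatedIn"
    # Extra data on the relationship itself, as when the edge is tied to a view.
    properties: dict[str, Any] = field(default_factory=dict)


edges = [
    Edge(("plant", "Pump-1"), ("plant", "Facility-A"), "Pump.locatedIn", {"installedYear": 2019}),
    Edge(("plant", "Pump-2"), ("plant", "Facility-A"), "Pump.locatedIn"),
]


def outgoing(all_edges: list[Edge], start: NodeId, edge_type: str) -> list[Edge]:
    """Return the edges of one type that start at the given node."""
    return [e for e in all_edges if e.start_node == start and e.edge_type == edge_type]


for e in outgoing(edges, ("plant", "Pump-1"), "Pump.locatedIn"):
    print(e.end_node, e.properties)  # ('plant', 'Facility-A') {'installedYear': 2019}
```

A direct relation could not hold installedYear; that per-relationship data is what distinguishes an edge.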
All instances (nodes and edges) have to conform to the structure defined in a view (data type), including all its properties. They also have built-in properties, such as version and last modified time.
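A minimal sketch of both ideas, under simplifying assumptions: an instance conforms if every property the view defines is present with the expected type, and the built-in version and last-updated values change on every update. The Instance class, the conforms function, the "increment version on each update" rule, and the movie data are hypothetical illustrations, not CDF's validation logic; real views can also have optional properties, which this sketch ignores.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class Instance:
    """Hypothetical instance payload plus built-in bookkeeping properties."""
    properties: dict[str, Any]
    version: int = 1  # built-in; in this sketch, incremented on every update
    last_updated: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def update(self, changes: dict[str, Any]) -> None:
        self.properties.update(changes)
        self.version += 1
        self.last_updated = datetime.now(timezone.utc)


def conforms(instance: Instance, view_properties: dict[str, type]) -> bool:
    """True if every property the view defines is present with the expected type."""
    return all(
        name in instance.properties and isinstance(instance.properties[name], expected)
        for name, expected in view_properties.items()
    )


movie_view = {"title": str, "year": int}
movie = Instance({"title": "Arrival", "year": 2016})
print(conforms(movie, movie_view))  # True
movie.update({"year": "2016"})      # wrong type for "year"
print(conforms(movie, movie_view), movie.version)  # False 2
```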
Access control of a Data Model
Space: A workspace for data models and instances. You can scope access control to spaces, for example, to define who can read or write to your data model or instances. You can also be more granular and put all the compositional pieces of a data model (views, containers) and data (edges, nodes) into different spaces.
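Scoping access to spaces can be sketched as per-user sets of spaces granted for reading and for writing. Placing the schema (views, containers) and the data (nodes, edges) in separate spaces then lets a user write data while only reading the schema. The SpaceAccess class, the space names, and the read/write action names are hypothetical; this is not CDF's actual capability format.

```python
from dataclasses import dataclass, field


@dataclass
class SpaceAccess:
    """Hypothetical per-user grants: the spaces a user may read from or write to."""
    read: set[str] = field(default_factory=set)
    write: set[str] = field(default_factory=set)

    def can(self, action: str, space: str) -> bool:
        grants = {"read": self.read, "write": self.write}[action]
        return space in grants


# Schema (views, containers) and data (nodes, edges) placed in separate spaces.
analyst = SpaceAccess(read={"movies_schema", "movies_data"}, write={"movies_data"})

print(analyst.can("read", "movies_schema"))   # True
print(analyst.can("write", "movies_schema"))  # False: schema is read-only for this user
print(analyst.can("write", "movies_data"))    # True
```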
Containers are composed into views, which are grouped into data models. For composability, however, views and containers can also exist independently, outside of any data model. You can do this by creating and managing them directly via the API.
So although the concepts are logically grouped as described above, keep in mind that data models, views, and containers, while they help organize each other, can also serve their purpose in isolation.
What does this mean for unique identifiers (external IDs)?
For data models (and underlying concepts)
All data models, views (data types), and containers live in exactly one space and have an externalId, so the unique identifier of each of them is the tuple (space, externalId).
For data
An instance lives in exactly one space and has an externalId, so the unique identifier of an instance is the tuple (space, externalId).
This also means that all node references in an edge have to specify the space:
- startNode: a reference to the start node, (space, externalId)
- endNode: a reference to the end node, (space, externalId)
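The consequence of the (space, externalId) tuple can be sketched with a dictionary keyed by that pair: the same externalId in two spaces identifies two different instances, and edge endpoints that carry the space resolve to exactly one node. The InstanceId type, the space names, and the data are hypothetical; the same keying idea applies to data models, views, and containers.

```python
from typing import NamedTuple


class InstanceId(NamedTuple):
    """(space, externalId): the identifier of an instance."""
    space: str
    external_id: str


store: dict[InstanceId, dict] = {}

# The same externalId in two different spaces identifies two different instances.
store[InstanceId("production", "pump-1")] = {"name": "Pump 1"}
store[InstanceId("sandbox", "pump-1")] = {"name": "Pump 1 (test copy)"}
store[InstanceId("production", "facility-a")] = {"name": "Facility A"}

# Edge endpoints carry the space too, so each resolves to exactly one node.
edge = {
    "startNode": InstanceId("production", "pump-1"),
    "endNode": InstanceId("production", "facility-a"),
}

print(len(store))                                      # 3
print(store[edge["startNode"]]["name"])                # Pump 1
print(store[InstanceId("sandbox", "pump-1")]["name"])  # Pump 1 (test copy)
```

Had the edge stored only "pump-1", it would be ambiguous which of the two pump-1 instances it points to.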
In the next unit, you'll learn how the GraphQL data modeling language relates to the concepts defined above.