
Get started with data modeling

Prerequisites

Before starting this tutorial, make sure you understand the fundamental building blocks of data modeling in CDF. If you already know the building blocks, you're ready to get started!

Overview of the quickstart

In this quickstart, you can choose between a simple data model that uses data about popular movies, their actors, and their directors, and a (simplified) model of an Asset Performance Management system. The latter is a more complex data model, but it shows how you can model a real-world system, including references to CDF resource types.

Both quickstarts come with a predefined sample data model and sample data to populate it with. The steps you go through are the same for both:

  1. Upload the source data into the RAW data store in CDF (the simplest way to load data into CDF quickly).
  2. Create a data model.
  3. Populate the data model with data using CDF transformations.
  4. Query the data from the data model.
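The four steps above can be sketched in plain Python. This is a minimal, stdlib-only illustration of the payload shapes involved, not the actual upload code: the column, table, and type names are assumptions based on the movie example, and the actual loading happens through the portal, the toolkit, or an SDK.

```python
import csv
import io

def csv_to_raw_rows(csv_text: str, key_column: str) -> dict[str, dict]:
    """Convert CSV text into the shape a RAW table stores:
    a mapping of row key -> column object (step 1)."""
    rows = {}
    for record in csv.DictReader(io.StringIO(csv_text)):
        key = record.pop(key_column)
        rows[key] = record
    return rows

# Step 1: stage the source data as RAW rows (uploaded via portal or toolkit).
movies_csv = "title,releaseYear\nTop Gun,1986\nInception,2010\n"
raw_rows = csv_to_raw_rows(movies_csv, key_column="title")

# Step 3: a transformation is typically a SQL statement that reads the RAW
# table and writes into the data model (database/table names are assumed).
transformation_sql = """
select title as externalId,
       title as name,
       cast(releaseYear as int) as releaseYear
from `movies_db`.`movies`
"""

# Step 4: querying the populated data model is done with GraphQL.
movie_query = """
query { listMovie { items { name releaseYear } } }
"""
```

The key/column split in `csv_to_raw_rows` mirrors how RAW stores each row as a unique key plus a JSON column object, which is what the transformation later reads from.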

The diagram above shows the flows you will go through in this quickstart. The source data you will use lives in the GitHub repository data-model-examples. Each of the steps to load the data into Cognite Data Fusion (from the green boxes on the left into the blue boxes in the middle) can be done either through the CDF portal or from the command line using the CDF toolkit in the same repository as the example data. If you don't want to install Python and use the command-line toolkit, there is also a Jupyter notebook called Quickstart in the Fusion UI. It lets you go through the steps of loading the various parts of the data set without installing anything locally on your computer.

The dark blue boxes in the middle represent the core storage services in Cognite Data Fusion. When you later create queries, the query service automatically combines data from these storage services, so a single query returns all the data you need.
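To make the "query once" point concrete, here is a sketch of a single GraphQL query that traverses related types in one request, together with a small helper that flattens the response shape. The type and field names (`Movie`, `actors`, `name`) are assumptions based on the movie example, and the response here is a mock, not real API output.

```python
# One GraphQL query can traverse related types (Movie -> actors) even
# though the underlying data lives in different storage services.
# Type and field names are assumptions based on the movie example.
query = """
query {
  listMovie {
    items {
      name
      actors { items { name } }
    }
  }
}
"""

def actor_names(response: dict) -> dict[str, list[str]]:
    """Map each movie name to its actor names from a GraphQL response."""
    out = {}
    for movie in response["data"]["listMovie"]["items"]:
        out[movie["name"]] = [a["name"] for a in movie["actors"]["items"]]
    return out

# A mock response with the shape the query above would return:
sample = {"data": {"listMovie": {"items": [
    {"name": "Top Gun", "actors": {"items": [{"name": "Tom Cruise"}]}},
]}}}
```

The point of the helper is that the client never stitches data from separate services itself; the nested response already reflects the relationships defined in the data model.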

CLI TOOLKIT

Cognite Data Fusion supports creating and managing data models and data in two ways: a simplified workflow through the CDF portal, and a more advanced approach for large-scale management of enterprise data using Software Development Kits (SDKs), Command-Line Interface (CLI) tools, and the APIs. In this quickstart, you can either use the portal or, if you are comfortable with the command line and Python, use the toolkit to load your data directly into CDF. The same toolkit is also available as a Quickstart notebook in the Fusion Jupyter notebook environment. This quickstart focuses on the portal-based approach while also explaining how to use the toolkit.
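If you go the command-line route, the toolkit needs credentials for your CDF project, typically supplied as environment variables (for example in a `.env` file). The variable names below are assumptions; check the README in the data-model-examples repository for the exact names and values your setup expects.

```
# .env -- variable names are assumptions, see the repository README
CDF_CLUSTER=<your CDF cluster, e.g. westeurope-1>
CDF_PROJECT=<your CDF project name>
IDP_CLIENT_ID=<app registration client id>
IDP_CLIENT_SECRET=<app registration client secret>
```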

CLI TOOLKIT IN FUSION'S JUPYTER NOTEBOOK

The CLI toolkit is also available in Fusion's Jupyter notebook environment (in the browser), so you don't have to install anything on your computer. Navigate to the CDF portal application and select Explore > Jupyter notebooks to get started. The examples are already available in the quickstart folder of the File explorer. Open the Quickstart.ipynb notebook and follow the steps to upload the data into CDF.