> ## Documentation Index
> Fetch the complete documentation index at: https://docs.cognite.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Customizing modules

> This article describes the design of the example modules and provides guidelines to create custom modules or to adjust the existing ones.

A [**module**](/cdf/deploy/cdf_toolkit/references/packages/index) is a bundle of logically connected **resource configuration** files, each of which sets up **one** CDF resource type with the minimum required access controls. For example, a module can contain resource configuration files to set up the necessary extractors, extraction pipelines, data sets, and transformation resources for a **complete data pipeline**.

Other use cases for modules include:

* **Transfer data** between different data structures.
* **Contextualize** data.
* **Set up** the required resources for an application, such as [InField](/cdf/deploy/cdf_toolkit/guides/set_up_infield).
* Load static **demo data**.

In general, modules should be **self-contained** and not rely on external resources. In some cases, referring to resources outside of the module makes sense. However, inter-module dependencies often come at the cost of maintainability, and you should avoid them if possible.

## Directory structure and naming standards

Establishing a directory structure and naming standard for modules and configuration files is vital to successfully deploying and administering CDF projects with the Cognite Toolkit.

### Module directory structure

Each module is implemented as a **directory** in the file system. The directory's name describes the content or the purpose of the module. Use **lowercase** for the name and **underscores** to separate words.

For example, a module that sets up the required resources to transfer asset data from a source system called "Scada" in the "Springfield" location could be named *springfield\_scada\_asset\_data\_pipeline*. To add more information, create a *README.md* file in the module directory.

Modules contain YAML configuration files that describe the resources needed to solve the task, including [access groups](/cdf/access/entra/guides/create_groups_oidc). The Cognite Toolkit requires that the YAML files are stored in subdirectories according to the resource type they describe. For an overview of the default resource hierarchy, see the [reference section](/cdf/deploy/cdf_toolkit/references/resource_library).

Example module structure for a data pipeline:

```shell theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
modules/
  springfield_scada_asset_data_pipeline/
    auth/
      asset.extractor.groups.yaml
      asset.processing.groups.yaml
      asset.read.groups.yaml
    
    data_sets/
      dataset.yaml
    
    extraction_pipelines/
      extractor_asset_valhall_workmate.yaml
      extractor_springfield_scada_asset.config.yaml
    
    transformations/
      springfield_scada_asset_hierarchy.sql
      springfield_scada_asset_hierarchy.Transformation.yaml
      springfield_scada_asset_hierarchy.yaml
    
    README.md
```

To organize large modules, the Cognite Toolkit accepts additional directory levels with arbitrary names:

```shell theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
modules/
  springfield_scada_asset_data_pipeline/
    access/
      auth/
        asset.extractor.groups.yaml
        asset.processing.groups.yaml
        asset.read.groups.yaml
    
    data_sets/
      dataset.yaml
    
    processing/
      extraction_pipelines/
        extractor_asset_valhall_workmate.yaml
        extractor_springfield_scada_asset.config.yaml
      
      transformations/
        springfield_scada_asset_hierarchy.sql
        springfield_scada_asset_hierarchy.Transformation.yaml
        springfield_scada_asset_hierarchy.yaml
    
    README.md
```

In both the examples above, the *config.\[env].yaml* file needs only to mirror the path to the module, not its content:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
environment:
  name: dev
  project: my-project
  type: dev
  selected:
    - modules/springfield_scada_asset_data_pipeline
```

### Resource configuration files (YAML)

To establish logical coherence across modules in Cognite Data Fusion, we highly recommend using the external IDs and following [the naming conventions](/cdf/deploy/reference/cdf_resource_naming_conventions) in your configuration files. The Cognite Toolkit accepts configurations that don't follow these recommendations but will display warnings.

The structure of the YAML resource configuration files matches the [Cognite Data Fusion API specification](/api-reference/concepts/20230101/api-description). The required formats are described in the [YAML configuration reference](/cdf/deploy/cdf_toolkit/references/resource_library) article.

<Tip>
  To include an existing CDF resource in a module, you can use the Python SDK to export it to a YAML file using the `dump_yaml()` method. See the [Python SDK documentation](https://cognite-sdk-python.readthedocs-hosted.com/en/latest/base_data_classes.html#cognite.client.data_classes._base.CogniteResource) for more information.
</Tip>

With a few exceptions, you can describe all resources as a list or a single instance in a YAML file:

**Option 1:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
# first.dataset.yaml
externalId: df_asset_{{location}}
description: 'Asset data from {{location}}'
```

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
# second.dataset.yaml
externalId: df_files_{{location}}
description: 'Files from {{location}}'
```

**Option 2:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
# my.datasets.yaml
- externalId: df_asset_{{location}}
  description: 'Asset data from {{location}}'
- externalId: df_files_{{location}}
  description: 'Files from {{location}}'
```

## Security

Access control should be self-contained within each module. You need to configure which roles (**groups**) each module requires and which **resources** the groups can access.

Follow the principle of least privilege and allow each module to access only the information and resources necessary for its purpose. A module can have multiple groups, but those groups should not have access to resources defined outside of the module.

<Warning>
  Changing a Cognite Data Fusion project requires elevated privileges. Using elevated groups requires strict governance and should not be available to individual contributors in production environments or other environments containing sensitive data. Instead, approval mechanisms and dedicated IdP service principals/accounts should be established.
</Warning>

Example:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
# waterdata.reader.group.yaml
name: gp_waterdata_reader
sourceId: '{{ waterdata_reader_source_id }}'
description: ''
metadata:
  origin: cdf-project-templates
  sourceName: '{{ waterdata_reader_source_name }}'
capabilities:
  - dataModelsAcl:
      actions:
        - READ
      scope:
        spaceIdScope:
          spaceIds:
            - sp_waterdomain_data
            - sp_waterdomain_models
  - dataModelInstancesAcl:
      actions:
        - READ
      scope:
        spaceIdScope:
          spaceIds:
            - sp_waterdomain_data
            - sp_waterdomain_models
  - datasetsAcl:
      actions:
        - READ
      scope:
        idScope:
          ids:
            - ds_waterdomain
            - ds_hydrology
  - assetsAcl:
      actions:
        - READ
      scope:
        datasetScope:
          ids:
            - ds_waterdomain
            - ds_waterprep
            - ds_hydrology
  - filesAcl:
      actions:
        - READ
      scope:
        datasetScope:
          ids:
            - ds_waterdomain
            - ds_waterprep
            - ds_hydrology
```

The datasets, spaces, and RAW databases should be known to the module.

<Note>
  `idscopes` uses the **external** ID, while the API uses the **internal** ID. The Cognite Toolkit resolves the external ID to the internal ID before sending the request to the Cognite API to make the configuration transferrable between environments.
</Note>