Using Cognite Functions
Early adopter
The features described in this section are currently only available to customers via our Early Adopter program. Contact your Cognite representative for more information.
IMPORTANT
We have deprecated legacy authentication with API keys and strongly encourage customers to migrate from using API keys to using OpenID Connect authentication as soon as possible. This article describes authentication with OpenID Connect.
Cognite Functions lets users deploy Python code to Cognite Data Fusion (CDF) and call the code on-demand or schedule the code to run at regular intervals.
Authentication with OpenID Connect
With OpenID Connect, you can instantiate your CogniteClient with either a user token or client credentials.
Authenticate with a user token
client = CogniteClient(project="my-project", token=my_token)
Note
my_token is a string or a callable.
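Passing a callable lets the client obtain a fresh token when needed. The following is a minimal sketch, not part of the SDK: it assumes the callable takes no arguments and returns the token string, and fetch_token is a hypothetical function you supply that returns a token together with its expiry time.

```python
import time
from typing import Callable, Tuple


def make_token_provider(
    fetch_token: Callable[[], Tuple[str, float]],
    refresh_margin_seconds: float = 60.0,
) -> Callable[[], str]:
    """Wrap a hypothetical fetch_token() in a zero-argument callable that caches the token.

    fetch_token is assumed to return (token, expiry_as_unix_timestamp).
    """
    cached = {"token": None, "expires_at": 0.0}

    def provider() -> str:
        # Fetch a new token when none is cached or the cached one is about to expire.
        about_to_expire = time.time() >= cached["expires_at"] - refresh_margin_seconds
        if cached["token"] is None or about_to_expire:
            cached["token"], cached["expires_at"] = fetch_token()
        return cached["token"]

    return provider
```

You could then pass the result as the token argument, for example token=make_token_provider(my_fetch_token), where my_fetch_token is your own function.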
Authenticate with client credentials
To instantiate your CogniteClient with client credentials:
base_url = "https://api.cognitedata.com"
tenant_id = "my-tenant-id"
client = CogniteClient(
    project="my-project",
    token_client_id="my-client-id",
    token_client_secret="my-client-secret",
    token_scopes=[f"{base_url}/.default"],
    token_url=f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
)
Create a function
You can create a function from a Jupyter Notebook, a folder, an uploaded zip file, or from CDF. The examples below assume that you have the experimental Python SDK installed in your environment.
Create a function from a Jupyter Notebook
To create a function from a Jupyter Notebook, refer directly to a Python function named handle. This is the easiest way to get started, but it is only suitable for short and simple functions.
Example:
Define the following function in your notebook:
def handle(client, data):
    asset_no = data["assetNo"]
    print("Returning asset number {}".format(asset_no))
    assets = client.assets.list()
    return {"assetName": assets[asset_no].name}
Deploy it to CDF:
from cognite.experimental import CogniteClient
# Instantiate with user token or client credentials
client = CogniteClient(project="my-project", token="my-token")
func = client.functions.create(
    name="my-function",
    external_id="my-function",
    function_handle=handle
)
NOTE
The function_handle argument points to the handle function defined above, and the name in the definition must be handle.
When you deploy with the function_handle argument, you must place all imports inside handle(). In other cases (with the folder or file_id arguments), the imports can be outside handle().
Create a function from a folder
For more advanced functions, you can define your function in a module that imports other modules. To create a function, replace the function_handle argument with the folder argument and set it equal to the path of the folder containing your code.
Example:
from cognite.experimental import CogniteClient
# Instantiate with user token or client credentials
client = CogniteClient(project="my-project", token="my-token")
func = client.functions.create(
    name="my-function",
    external_id="my-function",
    folder="path/to/folder"
)
Your modules inside path/to/folder can have nested imports of arbitrary depth. The only requirement is a file called handler.py (assumed to be in the root folder unless otherwise specified) with a function named handle in it. This function serves as the entry point to your code.
If your handler.py file is not in the root folder, you must specify its location via the function_path argument (defaults to handler.py).
If your function depends on other packages, you can list these packages in a requirements.txt file. You must place the file in the root folder. These packages are installed with pip when the function is deployed.
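The folder requirements above can be checked locally before you deploy. The following is a minimal sketch using only the standard library; check_function_folder is a hypothetical helper, not part of the SDK, and it looks for the text "def handle(" rather than parsing the module.

```python
from pathlib import Path
from typing import List


def check_function_folder(folder: str, function_path: str = "handler.py") -> List[str]:
    """Return a list of problems found in a function folder (empty if none)."""
    root = Path(folder)
    problems = []
    handler_file = root / function_path
    if not handler_file.is_file():
        problems.append(f"missing handler file: {function_path}")
    elif "def handle(" not in handler_file.read_text(encoding="utf-8"):
        # Plain text search only; the module is not imported or parsed.
        problems.append(f"no function named handle in {function_path}")
    return problems
```

An empty list means the handler file exists at the expected location and appears to define handle.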
Create a function from an uploaded zip file
You can also upload a zip file containing the function code directly to the files API and refer to its id when you create the function.
Example:
from cognite.experimental import CogniteClient
# Instantiate with user token or client credentials
client = CogniteClient(project="my-project", token="my-token")
func = client.functions.create(
    name="my-function",
    external_id="my-function",
    file_id=123456789
)
The zip file must follow the same structure as when you create a function from a folder with the zip file itself as the root folder (there should not be an additional root folder within the zip file).
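One way to produce a zip file with this layout is to store every file under a path relative to the function folder. The following is a minimal sketch using the standard zipfile module; zip_function_folder is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def zip_function_folder(folder: str, zip_path: str) -> None:
    """Zip the contents of folder so that its files sit at the root of the archive."""
    root = Path(folder)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # arcname is relative to the folder itself, so no extra root folder is added.
                archive.write(path, arcname=path.relative_to(root).as_posix())
```

Write the zip file to a location outside the folder being zipped so that the archive does not include itself.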
Additional arguments
The most important optional arguments, in addition to the ones above, are:
- function_path (string): The relative path from the root folder to the file containing the handle function. Defaults to handler.py.
- secrets (dict): Secrets that are stored securely on your function and are available at call time.
- env_vars (dict): Environment variables that can be accessed inside your function at call time.
- runtime (string): The function runtime. Valid values are ["py37", "py38", "py39", None], and None translates to the API default.
- index_url (string): A URL pointing to a different package repository. Supports basic HTTP authentication.
- extra_index_urls (list): A list of URLs pointing to additional Python package repositories. Supports basic HTTP authentication.
IMPORTANT
Be aware of the intrinsic security implications of using the index_url and extra_index_urls options. These settings can expose you to attacks known as dependency confusion, whereby the package dependency resolver incorrectly installs a public package with the same name as the intended private package.
To mitigate this, use only the index_url option and have your private repository server satisfy the following:
- Mirror the public default PyPI package index. That is, your index should be able to serve public packages from the public index.
- If a private package in your private index and a public package in the public index share the same name, your server must explicitly choose the private package.
Some package index server tools satisfy these requirements out of the box, for instance DevPi.
Additional information: Dependency Confusion
In the Python ecosystem, this is made possible when multiple indices are specified using the --extra-index-url option for pip. When installing a package, the dependency resolver will look for a package with the requested name in the default index (PyPI by default, can be overridden by --index-url). If the requested package is not found there, it will look at the indices specified in --extra-index-url.
This means that if a malicious entity manages to guess the name of a package hosted in the additional indices, they can upload a package with the same name, containing malicious code, to the public package index. When resolving, if the package version is not pinned, pip will choose the package with the highest version number. Therefore, if the malicious package has the highest version number, it will be chosen by pip. You can mitigate this risk to some degree by pinning the package version. In that case, pip will pull the package from the private index if that particular version only exists there. However, if the version exists in both indices, the behaviour is undefined, and you cannot determine which index pip will pull the package from.
The whitepaper 3 Ways to Mitigate Risk When Using Private Package Feeds by Microsoft is a good introduction to the dependency confusion problem.
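Because pinning reduces the risk described above, it can help to check your requirements.txt for unpinned entries before deploying. The following is a minimal sketch; find_unpinned is a hypothetical helper, and it only recognizes the simple name==version form (it treats lines with environment markers or wildcard versions as unpinned).

```python
import re
from typing import List

# Matches "name==version" with optional extras, e.g. "pandas==1.3.5" or "requests[socks]==2.28.1".
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s*]+$")


def find_unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Pinning alone does not remove the risk when the same version exists in both indices, so this check complements, rather than replaces, the server-side mitigation.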
For a complete list of arguments, refer to the SDK documentation.
The function definition
The entry point of your function must be named handle and can take any of the following arguments:
- client: a pre-instantiated CogniteClient automatically available inside your function at call time. It is instantiated with the same permissions as the entity calling the function (on-behalf-of flow). Note that the client argument is only specified when you define the function, not when you call it; the client is provided automatically when you call it.
- data (dict): any data you send to your function.
  WARNING
  Secrets or other confidential information should not be passed via this argument. Use the dedicated secrets argument in client.functions.create() for this purpose.
- secrets (dict): serves as a way to send sensitive information to your function that must be available when calling it. You can provide these secrets as a dictionary via the secrets argument in client.functions.create().
- function_call_info (dict): contains information about the function being called. It has the key function_id, and, if the call is scheduled, the keys schedule_id and scheduled_time.
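As an illustration of the arguments listed above, the following sketch defines a handle function that uses all four. The secret name my-secret-key and the returned keys are hypothetical; only the argument names and the function_id and schedule_id keys come from the description above.

```python
def handle(client, data, secrets, function_call_info):
    """Hypothetical entry point that uses all four supported arguments."""
    # data carries ordinary, non-confidential input.
    asset_no = data["assetNo"]
    # secrets holds the values given when the function was created; never print them.
    secret_provided = "my-secret-key" in secrets
    # schedule_id is only present when the call was started by a schedule.
    scheduled = "schedule_id" in function_call_info
    assets = client.assets.list()
    return {
        "assetName": assets[asset_no].name,
        "functionId": function_call_info["function_id"],
        "scheduled": scheduled,
        "secretProvided": secret_provided,
    }
```

You can exercise such a function locally by passing a stand-in object for client before deploying it.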
The only requirement for the return of your function is that it is JSON-serializable (and on GCP, is less than 1 MB in size). In general, we recommend that you pass small amounts of data to and from functions and instead read from and write to CDF directly from within your function.
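You can test a return value against these requirements locally. The following is a minimal sketch using the standard json module; check_response is a hypothetical helper, and the exact byte limit is an assumption based on the 1 MB figure above.

```python
import json

MAX_RESPONSE_BYTES = 1_000_000  # assumed interpretation of the 1 MB limit


def check_response(response) -> int:
    """Return the JSON-encoded size of response in bytes, raising if it is unsuitable."""
    try:
        encoded = json.dumps(response).encode("utf-8")
    except (TypeError, ValueError) as exc:
        raise ValueError(f"response is not JSON-serializable: {exc}") from exc
    if len(encoded) > MAX_RESPONSE_BYTES:
        raise ValueError(f"response is {len(encoded)} bytes, above {MAX_RESPONSE_BYTES}")
    return len(encoded)
```

For example, a response containing a Python set or a datetime object raises an error here, because neither is JSON-serializable by default.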
The function client.functions.create() returns a function object with various metadata. One of the properties of this object is status, which starts as Queued, changes to Deploying when the deploy process has started, and ends in Ready if the build was successful or Failed if it failed. You need to wait until the function is Ready before calling it. This typically takes about 3-10 minutes. To retrieve the latest status of your function, do:
func.update()
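Because deployment takes several minutes, you may want to poll the status in a loop. The following is a minimal sketch built on func.update() and the status values described above; wait_until_ready is a hypothetical helper, and the polling interval and timeout are arbitrary defaults.

```python
import time


def wait_until_ready(func, poll_seconds: float = 30.0, timeout_seconds: float = 1200.0) -> None:
    """Poll func.update() until func.status is Ready; raise on Failed or on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        func.update()  # refresh the function object with the latest status
        if func.status == "Ready":
            return
        if func.status == "Failed":
            raise RuntimeError("function deployment failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"function is still {func.status} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
```

Call wait_until_ready(func) after client.functions.create() and before calling the function.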
Call a function
When your function is Ready, you can call it.
Example:
call = func.call(data={"assetNo": 0})
Note that you don’t pass a client here, even though it is part of the function definition. You get it automatically inside the function.
When the function is called, a session is created on behalf of the user, and the user’s permissions and scopes are inherited by the client inside the function.
The call object contains metadata about the call, but not the response itself:
{
"id": 1234,
"start_time": "2020-07-15 08:23:17",
"end_time": "2020-07-15 08:25:15",
"status": "Completed",
"function_id": 456789
}
To retrieve the response:
call.get_response()
To get the logs:
call.get_logs()
All print statements appear in your logs. If an exception is raised, the traceback appears in the logs.
Important
Functions does not currently support the logging module, but print statements appear in the logs. In some cases, importing the logging module can hide prints and exception tracebacks in the logs. If you experience Failed calls with no traceback in the logs, make sure that the logging module is not imported. If you still don’t see any logs, your function probably ran out of memory during execution. You can check the memory consumption of your function locally with the Memory Profiler.
Schedule a function
Create a schedule to run a function at regular intervals. Note that once you have created a schedule, you cannot change it. To modify a schedule, you must delete it and create a new one.
Example:
schedule = client.functions.schedules.create(
    name="my-schedule",
    cron_expression="0 0 * * *",
    function_id=123456789,
    data={"assetNo": 0},
    client_credentials={"client_id": "my-client-id", "client_secret": "my-client-secret"}
)
This example runs your function every day at 12:00 AM, as specified by the cron expression. You attach the schedule to your function via the function_id argument. Note that you need to provide client_credentials. These will be used to create a session for the schedule, which will be kept alive for the lifetime of the schedule.
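To see how the cron expression maps to "every day at 12:00 AM", it helps to label its fields. The following is a minimal sketch that assumes the standard five-field cron layout (minute, hour, day of month, month, day of week); describe_cron is a hypothetical helper and does not validate the field values.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(cron_expression: str) -> dict:
    """Split a five-field cron expression into a dict of named fields."""
    parts = cron_expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))
```

For "0 0 * * *", the minute and hour fields are both 0 and the remaining fields are wildcards, which corresponds to midnight on every day.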
Deleting a function also deletes the associated schedule.
Upload and run a function from CDF
In addition to uploading and running functions via the experimental Python SDK and the API, you can upload, run, and share functions directly from the CDF portal application at fusion.cognite.com.
To upload a function to CDF:
Navigate to fusion.cognite.com, and select Cognite Functions.
Select Upload function, and drag and drop the zip file. The zip file must contain at least a Python file called handler.py with a function named handle that takes any of the following arguments: data, client, secrets, and function_call_info. See Create a function from an uploaded zip file for more information.
Give your function a name and, if necessary, fill in any of the optional fields.
Select Upload.
Call a function from CDF
To call a function from CDF:
Navigate to fusion.cognite.com, and select Cognite Functions.
Choose the function you want to call, and then select Call.
Optional. To schedule the function:
- Select Create schedule and enter a name, client credentials and a cron expression for when you want to run your function.
The columns Calls, Schedules, and Details provide more information about the functions. You can also find information about call status, response, and logs.