At last, Cognite’s long-awaited standard support for external identity providers is here! Over the last year or so, it has been referred to as "native token", "OIDC support", "support for Azure Active Directory", and "proper OAuth2". Alas, it can be hard to understand, so let me try to explain in "not-so-technical" terms what all this is about.
So, what do our customers get with this release that they didn’t have before? We have supported external identity providers, like Google login before, right?
The answer is simple: they now get total control over who can access what data in Cognite Data Fusion (CDF), which applications users can use, and which applications are allowed to access data. No more extractors or applications that can run as long as you have an API key. And no more mix of accounts created in Cognite Data Fusion and in an identity provider like Azure Active Directory (AAD) or Google. Access control is now enterprise-grade and in line with how enterprises typically manage access internally for other applications and services.
Alright, hold on tight! This section is to arm you with the basic understanding!
Your customer needs an Identity Provider that supports standard OAuth2 and OpenID Connect (OIDC) AND the ability to share security group memberships into the so-called OAuth2 token. Google does not support this last bit, Azure Active Directory (AAD) does, and in this release, we have fully tested and validated Azure Active Directory.
The customer’s CDF project is first configured to trust the customer’s instance of Azure Active Directory, and then AAD is configured to recognise the CDF SaaS service. This allows users to log in with their corporate account. Then, groups are created in CDF for each set of access rights needed by something or someone connecting to CDF: users, applications, extractors, dashboards, Power BI, and so on. To be precise, this is not one group per entity accessing CDF, but just one group per set of access rights. Each group is then configured to map to a corresponding group in AAD by using the ID of the group in AAD.
For each user that will access CDF, the customer uses the existing company user account and adds it to one of these AAD groups that are mapped to CDF groups. For each piece of code (like applications, extractors, dashboards and so on) to grant access to, they create a so-called app registration in Azure Active Directory and add it to the correct group(s).
When CDF receives a request for access, it will check the identity of the connecting user or application by receiving a "statement of access rights" (a token) from the customer’s Azure Active Directory. Through this token, CDF can verify the identity, get the list of group memberships and grant access to the right data!
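The check described above can be sketched in a few lines of Python. This is a minimal illustration, not how CDF implements it: the group IDs, the group names, and the helper functions are all hypothetical. It assumes the token is a JWT that carries group memberships in a `groups` claim, which Azure Active Directory can be configured to emit. The sketch only decodes the token payload and does not verify the signature, which a real service must always do before trusting any claim.

```python
import base64
import json

# Hypothetical mapping: one CDF group per set of access rights,
# keyed by the ID of the corresponding AAD group.
AAD_GROUP_TO_CDF_GROUP = {
    "11111111-aaaa-4bbb-8ccc-000000000001": "timeseries-readers",
    "11111111-aaaa-4bbb-8ccc-000000000002": "extractor-writers",
}


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url-encoded with the padding stripped.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_payload_unverified(token: str) -> dict:
    """Return the claims of a JWT WITHOUT checking its signature (illustration only)."""
    _header, payload, _signature = token.split(".")
    return json.loads(_b64url_decode(payload))


def resolve_cdf_groups(claims: dict) -> list[str]:
    """Map the AAD group IDs in the token to the CDF groups they correspond to."""
    return sorted(
        AAD_GROUP_TO_CDF_GROUP[group_id]
        for group_id in claims.get("groups", [])
        if group_id in AAD_GROUP_TO_CDF_GROUP
    )


def make_demo_token(claims: dict) -> str:
    """Build an unsigned demo token so the sketch runs without an identity provider."""
    header = _b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    return f"{header}.{payload}.demo-signature"


demo_token = make_demo_token(
    {
        "sub": "jane.doe@example.com",
        "groups": [
            "11111111-aaaa-4bbb-8ccc-000000000001",
            "99999999-aaaa-4bbb-8ccc-000000000009",  # not mapped to any CDF group
        ],
    }
)
claims = decode_payload_unverified(demo_token)
print(claims["sub"], resolve_cdf_groups(claims))
```

Note that the AAD group without a mapping is simply ignored: membership in a group only matters if a CDF group has been configured to point at it.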
This means that it’s not possible anymore to create a service account in the CDF project and use an API key to access data (i.e. once the project is configured with an external Identity Provider).
"That’s fine, but I really want to understand a bit more than that!"
Alright then, read on! This section introduces the basic concepts of cloud service authentication and authorisation in a way everybody(?!) can understand. The basic building blocks are introduced first, before these are explored using an analogy of a hotel room with key cards to control access.
At its core, the problem is very simple: you go to an application living on a web page (let’s say https://fusion.cognite.com/) and prove your identity (typically your email address) by using a password, a one-time code, or another way to prove that you are you. The web page will guide you through the login process, but in the background, it will contact a service that has a register of people (i.e. identities) who are allowed to access this application. This service is called the "Identity Provider". Typically, when you log into Facebook or another web app, you see no difference between the two, because Facebook is its own Identity Provider.
When you work for a company, your company typically has its own Identity Provider, like Azure Active Directory (AAD). If you as a corporate user can log into many different applications and systems using the same username/email and password (i.e., you are "authenticating" using the same Identity), it’s because all of these communicate and authenticate with the same Identity Provider owned by your company. Now we can introduce another challenge: what if you have access to an application your colleague does not have access to? This is referred to as "authorisation": knowing where and what you have access to.
So, to summarise: we have the Identity (who to approve access for), the Identity Provider (the decision maker verifying your identity and granting access), the process of authenticating (verifying that you are the identity you claim to be), and finally authorisation (granting you access to the application you want to access). There’s one final and important thing to understand: the application you access in your browser is the "go-between" in the interaction between you and the Identity Provider to verify your identity and grant you access. But exactly what do you get access to? The application in your browser itself or the data in CDF in the cloud? Or both?
In fact, as long as you sit in front of your browser, the browser app will communicate with the cloud CDF services as if it’s you. Practically, this is done by an elaborate process to give the web app a token. This token is basically a very targeted key to get access to your data using your identity. It expires quickly, so it has to be refreshed regularly (which happens in the background). We say that the web application is a client and that the CDF cloud services are the resource that the client accesses.
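To make the "refreshed in the background" part concrete, here is a minimal sketch of how a client might keep a short-lived token fresh. Everything in it is hypothetical: `TokenCache`, the one-hour lifetime, and the 60-second safety margin are illustration choices, and the identity provider is replaced by a fake function so the example runs on its own.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    value: str
    expires_at: float  # seconds, on the same clock that `now` reports


class TokenCache:
    """Hands out a valid token and fetches a new one shortly before expiry."""

    def __init__(
        self,
        fetch_token: Callable[[], Token],
        refresh_margin: float = 60.0,
        now: Callable[[], float] = time.time,
    ) -> None:
        self._fetch_token = fetch_token
        self._refresh_margin = refresh_margin
        self._now = now
        self._token: Optional[Token] = None

    def get(self) -> str:
        token = self._token
        # Refresh when there is no token yet, or when it is about to expire.
        if token is None or self._now() >= token.expires_at - self._refresh_margin:
            token = self._fetch_token()
            self._token = token
        return token.value


# Demo with a fake clock and a fake identity provider (both stand-ins).
clock = {"t": 0.0}
issued = []


def fake_fetch() -> Token:
    issued.append(len(issued) + 1)
    return Token(value=f"token-{len(issued)}", expires_at=clock["t"] + 3600)


cache = TokenCache(fake_fetch, now=lambda: clock["t"])
first = cache.get()    # nothing cached yet, so a token is fetched
clock["t"] = 1800.0
second = cache.get()   # still valid, so the cached token is reused
clock["t"] = 3590.0
third = cache.get()    # inside the safety margin, so a new token is fetched
print(first, second, third)
```

The caller only ever asks for "a token" and never sees the refresh, which mirrors how the web app keeps working while you stay logged in.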
In the case of an extractor, or if you make a Python application that uses the CDF SaaS service directly, the extractor or application could of course do the same as the CDF Fusion application and be a client that uses a token that gives access to your data. However, since these applications may run for a long time (and a token is short-lived) and without you watching over them, it makes more sense to give them their own identity instead of using yours. Such an identity is often called a service principal to differentiate it from the identities of people.
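In OAuth2 terms, an application with its own identity typically gets its token through the "client credentials" grant: it presents its own client ID and secret instead of a user’s login. The sketch below only builds such a request against the Azure Active Directory v2.0 token endpoint and does not send it. The tenant ID, client ID, secret, and the scope value are placeholders and assumptions, not values from this article; the correct scope depends on how the CDF project is set up.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_client_credentials_request(
    tenant_id: str, client_id: str, client_secret: str, scope: str
) -> Request:
    """Build (but do not send) an OAuth2 client credentials token request."""
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,
        }
    ).encode("ascii")
    return Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values for illustration only.
request = build_client_credentials_request(
    tenant_id="my-tenant-id",
    client_id="my-extractor-client-id",
    client_secret="read-this-from-a-secret-store",
    scope="https://api.cognitedata.com/.default",  # assumed scope, adjust to your project
)

form = parse_qs(request.data.decode("ascii"))
print(request.get_method(), request.full_url)
print(form["grant_type"], form["scope"])

# Sending it with urllib.request.urlopen(request) would return a JSON document;
# per the OAuth2 specification it contains "access_token" and usually "expires_in".
```

In practice the secret would come from a secret store or environment variable rather than source code, since anyone holding it can act as the service principal.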
"Wow, that was a bit rough… How can I get my head around this?!"
It is often easier to keep things straight in your head if you can relate all the terms to something you already know, something physical. So, let’s use a hotel room with a key card and lock as an analogy. You have reserved a room (the resource to access, so this is the CDF SaaS service). At the front desk, they register your identity in the hotel computer system. From here on, the hotel computer system will act as the identity provider.
Now you will get a key card that will open the lock to the right door for a time-limited period (authorised by the hotel system). The key card is the token you use to get into the room. As you probably recognise, you are now acting as a service principal: you access your own resource (the room) with your own identity, and you use a token that is made out to you. You enter the room and find that the lights are not working. You return to the lobby and request that they be fixed while you eat a snack in the restaurant. As this is a security-conscious hotel, they ask you to authorise access to the room for their janitor for the next hour only. You approve, and the janitor’s key card is updated with access to the room.
As you order your food, the janitor accesses the room exactly the way your web browser accesses CDF cloud services. Both are authorised by you with time-limited access, but now the janitor (the browser) uses a key card (token) issued to him that gives access to the room (the resource), and you used your identity to authorise the janitor to get the token (similar to logging in).
Trust is established and kept through these exchanges between you, the front desk, the hotel security system, the lock on the door, and by loading tokens onto the key cards that are used to open the lock. Trust is essential to any computer system. The various software components involved, like the app in the browser, the CDF SaaS services, and the Identity Provider, all need to communicate with each other and trust each other. Very often security issues arise as a result of a weak way of establishing trust. For example, if you used your passport to prove your identity to the hotel the first time, that may have been a strong trust. However, if you only have to say your name the next time, the trust chain is broken because anybody can walk into the hotel and claim they are you. The security of the rest of the system does not matter anymore.