Access Control in Y42
Y42 enables detailed control over data assets and operations using a hierarchical role-based access control system that is integrated throughout the platform.
Overview
To understand access control in Y42, this general summary of relevant concepts may be helpful:
- Subject: typically a user, sometimes also an API token that is used to access Y42 programmatically. Users can be grouped into teams as well, and a team can be managed like a single subject.
- Resource: can be any object inside Y42, like an integration, a table, a job, but also containers like the whole of the integrations module or even a space or an organization.
- Resource Hierarchy: Resources can be "parents" of other resources.
  - An organization is the parent of all spaces inside that organization.
  - The integrations module is the parent of each specific integration, the models module contains every UI Model and SQL Model, etc.
  - An integration or model is the parent of each output table contained in it (usually 1 table, but UI Models can have multiple output tables).
- Role: roles are held by subjects, relative to resources. For example, the user "John Doe" can hold the "owner" role relative to the "ACME" organization.
A user/subject will have different permissions on a resource depending on their assigned (or inherited) role.
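The concepts above can be sketched as a small data model. This is only an illustration and not Y42's API: the class names `Resource` and `RoleAssignment`, the `kind` strings and the example resource names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Resource:
    """Any object in the hierarchy (organization, space, module, model, table)."""
    name: str
    kind: str
    parent: Optional["Resource"] = None

    def ancestors(self) -> Iterator["Resource"]:
        """Yield the parents of this resource, nearest first."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


@dataclass(frozen=True)
class RoleAssignment:
    """A role held by a subject, relative to a resource."""
    subject: str
    role: str
    resource: Resource


# Hypothetical hierarchy: organization > space > module > model > table.
acme = Resource("ACME", "organization")
space = Resource("analytics", "space", parent=acme)
models = Resource("models", "module", parent=space)
orders_model = Resource("orders", "sql_model", parent=models)
orders_table = Resource("orders", "table", parent=orders_model)

# "John Doe" holds the "owner" role relative to the "ACME" organization.
assignment = RoleAssignment("John Doe", "owner", acme)

print([r.kind for r in orders_table.ancestors()])
# ['sql_model', 'module', 'space', 'organization']
```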
Roles in detail
The following provides a simplified overview of the different roles that exist on each resource type inside Y42, and which permissions are effectively entailed by that role.
Role | Applicable to | Permissions | Inheritance |
---|---|---|---|
Administrator | Organization | All permissions | Always |
Billing Administrator | Organization | Read/change organization billing info only | Never (N/A) |
Owner° | Nearly all resources | read, edit, delete*, create child resources, read subjects** | Always |
Manager | Team, API Key | Same as owner | Never (N/A) |
Editor° | Nearly all resources | read, edit, create child resources, read subjects | Always |
Viewer | Nearly all resources | read, read subjects | Always |
Member | Organization, Space, Team, all modules | discover resource (without reading data), read subjects | Partial - Organization members are not automatically space members - explicit grant required. Inherits down from space to all modules |
Guest | Organization, Space | discover resource (without reading data) | Partial - Organization guests are not automatically space guests - explicit grant required. Space guests are implicitly members of all modules in the space |
*only administrators can delete an organization - for other resource types, owners can delete as well.
**reading subjects is only possible at two levels: organization and space. In practice, every subject can discover other subjects, except when the subject is a guest in the organization and also a guest in the space.
°These roles are considered "fine-grained" roles in specific contexts. For more information, see below.
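As a rough illustration, the permissions column of the table and the first footnote can be written as a lookup. The role keys, action names and the `is_allowed` helper are assumptions made for this sketch, not identifiers used by Y42. The restriction that subjects can only be read at the organization and space level is not modelled here.

```python
# Permissions per role, copied from the table above (names are illustrative).
OWNER_PERMISSIONS = {"read", "edit", "delete", "create_child", "read_subjects"}

ROLE_PERMISSIONS = {
    "billing_administrator": {"read_billing", "edit_billing"},
    "owner": OWNER_PERMISSIONS,
    "manager": OWNER_PERMISSIONS,  # "same as owner", on teams and API keys only
    "editor": {"read", "edit", "create_child", "read_subjects"},
    "viewer": {"read", "read_subjects"},
    "member": {"discover", "read_subjects"},
    "guest": {"discover"},
}


def is_allowed(role: str, action: str, resource_kind: str) -> bool:
    """Return True if `role` entails `action` on a resource of `resource_kind`."""
    if role == "administrator":
        return True  # administrators hold all permissions
    if action == "delete" and resource_kind == "organization":
        return False  # footnote *: only administrators can delete an organization
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("owner", "delete", "space"))         # True
print(is_allowed("owner", "delete", "organization"))  # False
print(is_allowed("guest", "read", "space"))           # False
```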
Discovering resources vs. reading data
The difference between "read" and "discover resource" stems from the way Y42 runs git under the hood. For technical reasons, as soon as a subject is able to access a space in any way (i.e. with the guest role or higher), they need to be able to do a git checkout of that space. This lets them discover all models, integrations, dashboards and other resources in the space but, importantly, it does not necessarily let them read the underlying data from tables in the data warehouse. Nevertheless, discoverability alone may have governance, security and privacy implications that an organization should be aware of.
To let downstream users and tools consume data managed by Y42 without exposing read access to the information managed in git, Y42 provides two main mechanisms: data exports and the publish layer.
Visibility of git resources and data stored in the warehouse
By default, the Y42 application will filter out non-readable resources in the user interface. This means that a user will generally only see those integrations, models, tables, widgets and dashboards that they hold a Viewer role or higher on.
Similarly, they will not be able to read data from the data warehouse or any Y42 backend for an asset unless they hold a sufficiently high role on it. In other words, Y42's access control layer protects against unauthorized read and write access to git and the data warehouse.
However, it's important to note that any guest or member of a Space will have the ability to clone the git repository of that Space, as this is a technical necessity for Y42 to operate. That means that such users will be able to see all files that are stored in git, as well as their commit history and other git metadata (by directly cloning the repository, or by accessing their browser's internal storage). Naturally, this does not grant them the ability to read the corresponding data from the data warehouse, but is still an important fact to be aware of for particularly security-sensitive organizations, as the code that defines a data asset may itself be sensitive.
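The distinction between cloning the repository and reading warehouse data can be summarized in a short sketch. The function names are hypothetical, and the total ordering in `ROLE_RANK` (in particular guest below member) is an assumption inferred from phrases such as "guest role or higher" and "Viewer role or higher".

```python
from typing import Optional

# Assumed ordering of roles from least to most privileged.
ROLE_RANK = {"guest": 0, "member": 1, "viewer": 2, "editor": 3, "owner": 4, "administrator": 5}


def can_clone_repository(space_role: Optional[str]) -> bool:
    """Any role on the Space, guest or higher, implies git checkout access."""
    return space_role in ROLE_RANK


def can_read_data(asset_role: Optional[str]) -> bool:
    """Reading an asset's warehouse data requires Viewer or higher on that asset."""
    return asset_role in ROLE_RANK and ROLE_RANK[asset_role] >= ROLE_RANK["viewer"]


# A Space guest with no role on a given table sees its definition in git,
# but cannot read the table's data from the warehouse.
print(can_clone_repository("guest"))  # True
print(can_read_data(None))            # False
```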
If you require full compartmentalization of data pipelines and their definitions for various teams in your organization, we recommend you enforce a hard separation between them by setting up multiple Spaces. You will still be able to share data between these Spaces using integrations that access the datasets in the warehouse, but the git repositories, pipeline definitions and therefore also data lineage will be totally separated between them.
Granting roles
A subject can generally grant other users access equivalent to their own role, but not higher. For example, only administrators can grant the administrator role to others or revoke it, and an editor can only grant or revoke the editor role.
There are some exceptions to this: for example, only Owners and Administrators can make grants at the organization-level, and Viewers, members and guests cannot generally make grants at all (unless they're given a higher role on a specific resource, in which case they might be able to make grants on that resource and its child resources).
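A minimal sketch of these granting rules follows. It is not Y42's implementation: `can_grant` and `ROLE_RANK` are hypothetical, the role ordering is assumed, and the sketch reads "equivalent access, but not higher" as "the granter's own role or any lower one", which the text does not state explicitly.

```python
# Assumed ordering of roles from least to most privileged.
ROLE_RANK = {"guest": 0, "member": 1, "viewer": 2, "editor": 3, "owner": 4, "administrator": 5}


def can_grant(granter_role: str, granted_role: str, resource_kind: str) -> bool:
    """Return True if a subject holding `granter_role` on a resource may grant `granted_role` on it."""
    if granter_role not in ROLE_RANK or granted_role not in ROLE_RANK:
        return False
    # Viewers, members and guests cannot make grants at all.
    if ROLE_RANK[granter_role] < ROLE_RANK["editor"]:
        return False
    # Only Owners and Administrators can make grants at the organization level.
    if resource_kind == "organization" and ROLE_RANK[granter_role] < ROLE_RANK["owner"]:
        return False
    # Never grant a role higher than your own.
    return ROLE_RANK[granted_role] <= ROLE_RANK[granter_role]


print(can_grant("editor", "editor", "space"))                # True
print(can_grant("editor", "owner", "space"))                 # False
print(can_grant("owner", "administrator", "organization"))   # False
```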
Role inheritance and resource hierarchy
When a resource is the parent of another resource, the role on the parent resource will generally be inherited down to the children, as outlined in the table above.
The diagram below visualizes the basic structure of the resource hierarchy in Y42.
A role with elevated permissions at a higher level in this hierarchy will, in general, always override a role with fewer permissions lower in the hierarchy. For example, if a user is an owner at the organization level, that user can read, edit and delete all resources in all spaces of that organization, even if they are manually assigned the viewer role in some space (the organization-level role overrides the space-level role).
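The override rule can be sketched as a walk up the resource hierarchy that keeps the strongest role found. The identifiers below (`PARENT`, `effective_role`, the resource keys) are hypothetical, the role ordering is assumed, and only roles marked "Always" in the inheritance column are inherited. The partial inheritance of Member and Guest is deliberately left out, so those roles count only when held directly on the resource.

```python
from typing import Dict, Optional, Tuple

# Hypothetical hierarchy: child resource -> parent resource.
PARENT = {
    "space:analytics": "org:acme",
    "module:models": "space:analytics",
    "model:orders": "module:models",
}

# Assumed ordering of roles from least to most privileged.
ROLE_RANK = {"guest": 0, "member": 1, "viewer": 2, "editor": 3, "owner": 4, "administrator": 5}
ALWAYS_INHERITED = {"viewer", "editor", "owner", "administrator"}


def effective_role(subject: str, resource: str,
                   grants: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Return the strongest role `subject` holds on `resource`, directly or via parents."""
    best = None
    node: Optional[str] = resource
    while node is not None:
        role = grants.get((subject, node))
        inherited = node != resource
        if role is not None and (not inherited or role in ALWAYS_INHERITED):
            if best is None or ROLE_RANK[role] > ROLE_RANK[best]:
                best = role
        node = PARENT.get(node)
    return best


grants = {
    ("jane", "org:acme"): "owner",
    ("jane", "space:analytics"): "viewer",
}
# The organization-level owner role overrides the space-level viewer role.
print(effective_role("jane", "space:analytics", grants))  # owner
print(effective_role("jane", "model:orders", grants))     # owner
```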
"Fine-grained" roles (beta)
Some roles are considered "fine-grained", specifically in how they relate to accessing a Space's git repository. Consider the following:
1. A role of Owner or Administrator at the Space or Organization level grants a user full write access to the Space's git repository.
2. A role of Editor at or below the Space level, or Owner below the Space level (e.g. on the integrations module, or on a specific SQL model), grants partial write access to git.
3. A role of Viewer, Member or Guest at any level, as long as neither case 1 nor case 2 applies to the same subject, means that this subject only requires read access to git.
Any role covered by case 2 is considered a "fine-grained" role in Y42. These roles are currently disabled for new Spaces by default, as their enforcement is still in beta. Generally, this feature is safe to use, but be advised that while it is in beta, users in such roles may face occasional issues when committing files to git or reading data from the warehouse.
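The three cases above can be expressed as a small classifier. This is a sketch under stated assumptions, not Y42's logic: the `git_access` function, the level labels and the return values are hypothetical. An Editor role at the organization level is not covered by the text; the sketch treats it as partial write access, which is an assumption.

```python
from typing import Iterable, Optional, Tuple


def git_access(assignments: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Classify one subject's git access from their (role, level) pairs.

    `level` is one of "organization", "space" or "below_space".
    """
    access = None
    for role, level in assignments:
        # Case 1: Owner or Administrator at the Space or Organization level.
        if role in {"owner", "administrator"} and level in {"organization", "space"}:
            return "full_write"
        # Case 2 ("fine-grained"): Editor, or Owner below the Space level.
        # Assumption: an organization-level Editor is handled like case 2.
        if role == "editor" or (role == "owner" and level == "below_space"):
            access = "partial_write"
        # Case 3: Viewer, Member or Guest, unless case 2 already applied.
        elif role in {"viewer", "member", "guest"} and access is None:
            access = "read"
    return access


print(git_access([("member", "space"), ("owner", "below_space")]))  # partial_write
print(git_access([("guest", "space"), ("viewer", "below_space")]))  # read
print(git_access([("editor", "space"), ("owner", "organization")])) # full_write
```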
To enable this feature, navigate to the Security tab of your Company settings page and toggle the corresponding setting for each Space in which you wish to enable fine-grained roles.
Recommended approach to role management
We recommend the following best practices to ensure easy-to-maintain, scalable role management and data governance:
- All internal users of the organization are assigned the Member role on the Organization level by default (with the exception of Administrators)
- All internal users are grouped into Teams (according to functional role, mission-driven teams, or a combination of both). Each user should generally be in at least one Team.
- For smaller organizations, setting up all data pipelines in a single Space is usually the best approach, but if use-cases are strongly separable, multiple Spaces may be preferable
- All role management at or below the Space-level should now be done with respect to Teams to avoid assigning direct roles to any one user, keeping role management scalable
- If all members of a Team have a data engineering (or adjacent) role, that Team can typically be made the Owner of the Space
- Data Analyst and Data Science Teams can typically be made Members of a Space, with an Owner role for the Visualizations module, and Viewer roles on the specific tables they may need to work with
- If such a team should be able to contribute to the data pipeline itself by creating new models or editing existing ones, a Contributor role at the module level and Editor/Owner role at the level of specific models can be assigned
- Business users would typically also be members of the Organization and Space, but only assigned Viewer roles on the specific data (typically Dashboards/Widgets/the Visualization module) they need to consume
- Onboarding a new user is a matter of adding that user to the right team(s) in this setup, and giving/revoking many users access to a specific asset is also usually a simple operation
- In general, it's good practice to grant users access to data on an as-needed basis. This is a good practice with respect to data security, but it also makes it easier for any given user to find the correct data, i.e. the data that is in fact meant to be used in production.
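The team-based setup recommended above can be illustrated with plain data structures. Everything in this sketch is hypothetical (team names, user names, resource keys and the `grants_for` helper); it only shows why onboarding reduces to a team membership change when roles are attached to Teams rather than to individual users.

```python
from typing import List, Tuple

# Hypothetical teams and their members.
TEAMS = {
    "data-engineering": ["ana", "ben"],
    "analytics": ["cho"],
    "business": ["dee"],
}

# Roles are assigned to Teams, never to individual users: (team, role, resource).
TEAM_GRANTS = [
    ("data-engineering", "owner", "space:main"),
    ("analytics", "member", "space:main"),
    ("analytics", "owner", "module:visualizations"),
    ("analytics", "viewer", "table:orders"),
    ("business", "member", "space:main"),
    ("business", "viewer", "dashboard:revenue"),
]


def grants_for(user: str) -> List[Tuple[str, str]]:
    """Return the (role, resource) pairs a user receives through their teams."""
    return sorted(
        (role, resource)
        for team, role, resource in TEAM_GRANTS
        if user in TEAMS.get(team, [])
    )


# Onboarding a new analyst is a single team membership change.
TEAMS["analytics"].append("eve")
print(grants_for("eve") == grants_for("cho"))  # True
```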
Inviting external guests
If you want to invite external read-only guests to view e.g. one or more specific tables or dashboards, you should make them a guest at the space and/or module level and assign them the viewer role specifically on the resources you wish for them to see.
Note that they will still be able to check out the git repository of the space and see all files contained in it. If you need to avoid this, consider sharing the data through a data export or, at the data warehouse level, via the publish layer.