
Unlocking Effective Data Governance with Unity Catalog – Databricks – Part 2

In Part 1 of Data Governance with Unity Catalog, we explored the fundamentals of Unity Catalog, including its core features, advantages, and a comparison with other data catalog tools. We also looked at the object hierarchy in the metastore, which lays the foundation for understanding this governance solution.

Now, in this part, we will continue our deep dive by examining the critical admin roles in Unity Catalog, uncovering how data lineage is captured and visualized, and simplifying data access through Delta Sharing. Let’s further unlock the potential of Unity Catalog as we explore these essential aspects of data governance.

In this part we will cover:

  1. Identifying the Admin Roles in Unity Catalog
  2. Unveiling Data Lineage in Unity Catalog: Capture and Visualize
  3. Simplifying Data Access using Delta Sharing

1. Identifying the Admin Roles in Unity Catalog

Different admin roles in Unity Catalog have varied responsibilities and privileges, allowing for efficient control and governance of data assets. Here is a thorough explanation of each role and its duties.

Privilege Inheritance Model

Privileges in Unity Catalog are hierarchical and inherited downward. Granting a privilege on a catalog or schema automatically extends that privilege to all current and future objects within that catalog or schema.
All securable objects in Unity Catalog have an owner who holds all privileges on that object, including the ability to grant privileges to other principals.
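
To make the inheritance rule concrete, here is a minimal sketch in plain Python. It is a toy model, not the Unity Catalog API: the class `ToyMetastore`, its methods, and the dotted object names are all made up for illustration, and it ignores details such as the additional usage privileges a real workspace requires on the parent catalog and schema.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Toy model of downward privilege inheritance (hypothetical, not a real API)."""

    # securable name -> set of (principal, privilege) granted directly on it
    grants: dict = field(default_factory=dict)
    # securable name -> owning principal
    owners: dict = field(default_factory=dict)

    def grant(self, privilege, securable, principal):
        """Record a direct grant on a catalog, schema, or table."""
        self.grants.setdefault(securable, set()).add((principal, privilege))

    def has_privilege(self, principal, privilege, securable):
        """Check a privilege, honouring ownership and inherited grants."""
        # The owner holds all privileges on the object it owns.
        if self.owners.get(securable) == principal:
            return True
        # Walk upward: table -> schema -> catalog ("a.b.c" -> "a.b" -> "a").
        parts = securable.split(".")
        for depth in range(len(parts), 0, -1):
            ancestor = ".".join(parts[:depth])
            if (principal, privilege) in self.grants.get(ancestor, set()):
                return True
        return False


metastore = ToyMetastore()
metastore.owners["main.sales.orders"] = "alice"
metastore.grant("SELECT", "main.sales", "analysts")

# Granted on the schema, so it applies to existing and future tables in it.
print(metastore.has_privilege("analysts", "SELECT", "main.sales.orders"))
print(metastore.has_privilege("analysts", "SELECT", "main.sales.future_table"))
# A different schema is not covered by the grant.
print(metastore.has_privilege("analysts", "SELECT", "main.hr.salaries"))
```

Because the check walks from the object up to its catalog, a table created after the grant is covered without any further action, which is the "current and future objects" behaviour described above.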

Account Admin:

Responsibilities:
Workspace Management: Account admins can create new workspaces and manage existing ones.
Role Assignment: They have the authority to assign account admin and metastore admin roles to other users.
User and Group Management: Account admins can add users, service principals, and groups to the account.

Workspace Admin:

Responsibilities:
User Management: Workspace admins can add and invite users to the workspace, assign the workspace admin role to other users, and create service principals and groups.
Default Privileges: If a workspace is automatically enabled for Unity Catalog, workspace admins have default privileges on the attached metastore and the workspace catalog.

Metastore Admin:

Who Has Metastore Admin Privileges?
Manual Creation: An account admin becomes the original owner and metastore admin if they manually create the metastore.
Automatic Provisioning: If the metastore is provisioned automatically, it is created without a metastore admin. In this case the metastore admin role is optional, because workspace admins already hold certain default privileges.

Responsibilities:
Catalog Management: Metastore admins can create and manage catalogs within the metastore.
Privilege Management: They are the only users who can grant privileges on the metastore itself.
External Locations and Storage Credentials: They can create external locations and storage credentials for managing data governance.
Privilege and Ownership Management: Metastore admins can manage privileges or transfer ownership of any object within the metastore, including storage credentials, external locations, connections, shares, recipients, and providers.
Access Granting: They can give themselves read and write access to any data in the metastore by transferring ownership of the relevant objects to themselves.
Audit and Compliance: Metastore admins can read and update the metadata of all objects in the metastore, ensuring compliance with data governance policies.
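
In practice, privilege grants and ownership transfers like those listed above are typically issued as SQL statements. The sketch below only builds the statement strings in Python so the shape of the commands is visible. The helper names (`quote_principal`, `grant_sql`, `transfer_ownership_sql`) and the group and table names are hypothetical. On Databricks the resulting strings would be run from a notebook or SQL editor by a user who is allowed to manage the object.

```python
def quote_principal(name):
    """Backtick-quote a user or group name, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def grant_sql(privilege, securable_type, securable_name, principal):
    """Build a GRANT statement; the metastore itself takes no object name."""
    target = securable_type if not securable_name else f"{securable_type} {securable_name}"
    return f"GRANT {privilege} ON {target} TO {quote_principal(principal)}"


def transfer_ownership_sql(securable_type, securable_name, new_owner):
    """Build an ALTER ... OWNER TO statement for a catalog, schema, or table."""
    return f"ALTER {securable_type} {securable_name} OWNER TO {quote_principal(new_owner)}"


# Hypothetical principals and objects, for illustration only.
statements = [
    grant_sql("CREATE CATALOG", "METASTORE", None, "data-engineers"),
    grant_sql("SELECT", "SCHEMA", "main.sales", "analysts"),
    transfer_ownership_sql("TABLE", "main.sales.orders", "governance-admins"),
]
for statement in statements:
    print(statement)
```

Transferring ownership to a group rather than an individual is a common design choice, since it avoids tying an object to one person's account.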

2. Unveiling Data Lineage in Unity Catalog: Capture and Visualize

Whether you are a data engineer or an analyst, visibility into data lineage helps you make informed decisions and maintain a high level of trust in your data assets.


What is Data Lineage?

Data lineage is the tracking of data from its source to its destination through each processing stage. It helps you understand the data's dependencies, transformations, and lifecycle.

Features:

Lineage Tracking:
Unity Catalog captures runtime data lineage across all queries run on Databricks. It supports all languages (Python, SQL, R, and Scala) and both batch and streaming execution modes.
Lineage is captured down to the column level, providing insight into the transformations and dependencies behind the data.

Notebooks, Jobs, and Dashboards Lineage:
Unity Catalog provides visibility by tracking the lineage of notebooks, workflows, and dashboards.
This helps you understand how changes to the data affect downstream consumers and helps protect data quality.

Security:
Lineage graphs use the same permission model as the rest of Unity Catalog. This adds another level of security: to view lineage data, users must have the necessary permissions.
Because users can only view lineage information for objects they are authorized to access, the risk of exposing sensitive data is reduced.

Column Granularity:
It provides a granular view of data flow both upstream and downstream from a particular table, down to individual columns.

Export and Visualization:
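Lineage can be visualized in near real time in Catalog Explorer. It can also be retrieved programmatically, for example through the lineage system tables and the Databricks REST API, which makes it easier to integrate with other data catalogs and governance solutions.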
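
The upstream and downstream views described above can be pictured as a walk over a graph of column-to-column edges. The following is a minimal sketch in plain Python, not how Unity Catalog stores or serves lineage. The edge list, table names, and function names are invented for illustration, and the permission filter is a simplification of the rule that users only see lineage for objects they can access.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage edges: (source column, target column).
EDGES = [
    ("raw.orders.amount", "silver.orders.amount_usd"),
    ("raw.orders.currency", "silver.orders.amount_usd"),
    ("silver.orders.amount_usd", "gold.revenue.total_usd"),
]


def build_index(edges):
    """Return (upstream, downstream) adjacency maps for the edge list."""
    upstream, downstream = defaultdict(set), defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return upstream, downstream


def reachable(start, neighbours):
    """Breadth-first search: every column reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def visible_edges(edges, readable_tables):
    """Keep only edges whose two endpoint tables the user may read."""
    def table_of(column):
        return column.rsplit(".", 1)[0]

    return [
        (source, target)
        for source, target in edges
        if table_of(source) in readable_tables and table_of(target) in readable_tables
    ]


upstream, downstream = build_index(EDGES)
print(sorted(reachable("raw.orders.currency", downstream)))  # impact analysis
print(sorted(reachable("gold.revenue.total_usd", upstream)))  # root-cause analysis
print(visible_edges(EDGES, {"silver.orders", "gold.revenue"}))
```

Walking downstream answers "what breaks if this column changes?", while walking upstream answers "where did this value come from?".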

3. Simplifying Data Access using Delta Sharing

Delta Sharing allows data providers to share data across organizations, enabling secure and efficient collaboration.


What is Delta Sharing?

Delta Sharing is an open protocol for secure data sharing. It lets you share live data from your Delta Lake tables with recipients on any computing platform. It is natively integrated with Unity Catalog, which provides centralized management and auditing of shared data.

Features:

Secure Data Sharing: Data providers control access and permissions to ensure data security and compliance, and they can track the lineage and usage of shared data for transparency and accountability.

Supports Structured Streaming: Allows data recipients to stream changes from a shared Delta Table, enabling real-time data applications.
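
On the recipient side, a shared table can be read with the open source `delta-sharing` Python connector. The sketch below is illustrative only. It assumes the provider has sent you a profile file (called `config.share` here) and that the share, schema, and table names, which are placeholders, exist. The streaming helper additionally assumes a Spark session with the Delta Sharing Spark connector installed and a table that the provider has shared with history.

```python
def table_url(profile_path, share, schema, table):
    """Build a Delta Sharing table URL: <profile-file>#<share>.<schema>.<table>."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_snapshot(url):
    """Read the current snapshot of a shared table into a pandas DataFrame."""
    import delta_sharing  # requires: pip install delta-sharing

    return delta_sharing.load_as_pandas(url)


def stream_changes(spark, url):
    """Return a streaming DataFrame over a shared table.

    Assumes `spark` is a SparkSession with the Delta Sharing connector available.
    """
    return spark.readStream.format("deltaSharing").load(url)


# Placeholder names; replace them with the values from your provider.
url = table_url("config.share", "sales_share", "retail", "orders")
print(url)
```

Only `table_url` runs without external dependencies. `load_snapshot` and `stream_changes` need a real profile file and a reachable sharing server.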

Final Thought

Databricks' Unity Catalog provides a reliable, integrated solution for data governance, security, and discovery. Its standards-based security model, extensive feature set, and native integration with Databricks set it apart from other data catalog tools. By enabling streamlined data governance, enhanced security, and effective data management, Unity Catalog is a powerful tool in the modern data landscape.

Whether you're new to Unity Catalog or exploring its advanced features, understanding its advantages and distinctive capabilities will help you get the most out of your data governance plan.