Data Mesh with Snowflake AI Recipes

What is Data Mesh?

A Data Mesh is a platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design.

Data Mesh Principles and Logical Architecture, by Zhamak Dehghani, lays out the four core principles that define a data mesh architecture:

  • Domain-driven ownership
  • Data as a product
  • Self-service infrastructure
  • Federated governance

 

Figure: The four core principles of a data mesh architecture

Domain-driven Ownership and Architecture

The first principle of a data mesh is shifting the power and ownership of data into the hands of the domain teams. They own the data end to end: from ensuring they have suitable sources or ingested data to work with, to building and maintaining any necessary processing pipelines, to serving the data out as products (more on that later) for other domain teams to tap into, with the right quality guarantees and governance controls in place. Domain teams can be defined by department, business unit, or other similarly motivated groupings. If the mesh is implemented correctly, new domain teams can be added fluidly, particularly when data is being correlated into new data products.

Data as a Product

As alluded to in the first principle, domain teams aren’t just responsible for the data but also for the resulting data products, and data products need to be treated like any other product. They must be discoverable and usable by consumers and other domain teams. The domain owner is responsible for maintaining and updating (or deprecating) these products to ensure quality and accuracy.

What can this look like in practice? Imagine a supply chain team creating an inventory data product that a marketing team can tap into to develop new discount campaigns, or that regional groups can use to place new orders.
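To make the idea more concrete, a data product can be pictured as a small, discoverable record in a shared catalog. The sketch below is only an illustration in plain Python and does not use any Snowflake API; the `DataProduct` and `Catalog` classes, their fields, and the product names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product published by a domain team."""
    name: str
    owner_domain: str
    description: str
    deprecated: bool = False


@dataclass
class Catalog:
    """Hypothetical in-memory catalog that other domain teams can search."""
    products: dict = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        # The owning domain registers (or updates) its product under its name.
        self.products[product.name] = product

    def discover(self, keyword: str) -> list:
        # Return active (non-deprecated) products whose name or
        # description mentions the keyword.
        kw = keyword.lower()
        return [
            p for p in self.products.values()
            if not p.deprecated
            and (kw in p.name.lower() or kw in p.description.lower())
        ]


catalog = Catalog()
catalog.publish(DataProduct("inventory_levels", "supply_chain",
                            "Daily inventory by SKU and region"))
catalog.publish(DataProduct("legacy_stock", "supply_chain",
                            "Old inventory extract", deprecated=True))

# A marketing team searching for inventory data only sees the active product.
found = [p.name for p in catalog.discover("inventory")]
```

The deprecated flag stands in for the owner's responsibility to retire products: consumers searching the catalog never see a product the owning domain has deprecated.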

Self-service Infrastructure as a Platform

The third principle is making all of this easy and self-service for the domain teams. Complex technologies and niche skills are not sustainable in a data mesh design. There needs to be a common platform and toolset that any domain team can tap into to build and serve their data products without getting bogged down in infrastructure maintenance or resource limitations.

Federated Governance

The final piece of a successful data mesh is governance. A data mesh architecture cannot come at the expense of access controls and data protection. There must be a balance between global governance policies and rules and the ability of each domain team to define and implement policies of its own when developing and sharing its data products. This federated governance is critical for ensuring data privacy and compliance and for aiding discovery at scale.
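The balance between global and domain-level rules can be sketched as a simple policy merge. This is a minimal illustration in plain Python under assumed names: the tag names, the `GLOBAL_MASKED_TAGS` set, and both helper functions are hypothetical, and a real deployment would rely on the platform's own governance features rather than on code like this.

```python
# Hypothetical organization-wide rule: columns with these tags are always masked.
GLOBAL_MASKED_TAGS = {"pii", "financial"}


def effective_masked_tags(domain_tags: set) -> set:
    """Domain teams may add tags to mask, but can never drop a global one."""
    return GLOBAL_MASKED_TAGS | set(domain_tags)


def columns_to_mask(column_tags: dict, domain_tags: set) -> list:
    """Return the columns covered by the effective policy, sorted by name."""
    masked = effective_masked_tags(domain_tags)
    return sorted(col for col, tags in column_tags.items() if masked & set(tags))


# Hypothetical columns of a data product and the tags attached to each.
columns = {
    "customer_email": {"pii"},
    "unit_price": {"financial"},
    "promo_code": {"marketing_sensitive"},
    "sku": set(),
}

# The domain adds its own rule on top of the global ones.
result = columns_to_mask(columns, domain_tags={"marketing_sensitive"})
```

Using a set union means a domain can only tighten the policy: the global tags are always part of the effective rule set, whatever the domain supplies.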

Data Fabric

Data fabric is a single environment consisting of a unified architecture, plus the services or technologies running on that architecture, that helps organizations manage their data. The ultimate goal of data fabric is to maximize your data’s value and accelerate digital transformation. Think of a data fabric as a weave stretched over a large space that connects multiple locations, types, and data sources, with methods for accessing that data.

The data can be processed, managed, and stored as it moves within the data fabric. It can also be accessed by or shared with internal and external applications for a variety of analytical and operational use cases, including advanced analytics for forecasting, product development, and sales and marketing. The goals are many: improving customer engagement through more advanced mobile apps and interactions, complying with data regulations, and optimizing supply chains, to name a few.

What constitutes a data fabric differs based on someone’s role (analyst vs. executive vs. data engineer vs. data scientist vs. line-of-business data analyst). But the premise that a data fabric enables accessing, ingesting, integrating, and sharing healthy data in a distributed data environment is widely accepted. More specifically, a data fabric:

  • Connects to any data source via pre-packaged connectors and components, eliminating the need for coding
  • Provides data ingestion and integration capabilities – between and among data sources as well as applications
  • Supports batch, real-time, and big data use cases
  • Manages multiple environments (on-premises, cloud, hybrid, and multi-cloud), both as a data source and as a data consumer
  • Provides built-in data quality, data preparation, and data governance capabilities, bolstered by machine-learning-augmented automation to improve data health
  • Supports data sharing with internal and external stakeholders via API support

 

How Is Data Mesh Better for Demand Forecasting?

Data mesh is, first and foremost, an organizational transformation. This transformation has many non-technical implications but often requires IT architecture and technology changes as well. How can data mesh principles apply to the demand forecasting business case?

Traditionally, forecasting was based primarily on internal data residing in enterprise systems and on managers’ knowledge of external data. Demand planners manually corrected forecasts based on marketing inputs related to external data (market trends, weather, local events, etc.). This process often introduced a significant lag and hurt forecast accuracy.

If we take supply chain management as the organization, we can make demand management one domain and order management another. Each domain can have its own separate data warehouse, with access control and ownership kept with that domain’s users.
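In Snowflake terms, this separation could be sketched as one database and one owning role per domain, with read access granted explicitly to other domains. The helper below only builds the SQL statements as strings and does not connect to Snowflake; the function names and object names such as `DEMAND_MGMT_DB` and `ORDER_MGMT_OWNER` are hypothetical, and the actual layout in the article's setup may differ.

```python
def domain_setup_sql(domain: str) -> list:
    """Build Snowflake SQL giving one domain its own database and owner role.

    The naming scheme (<DOMAIN>_DB, <DOMAIN>_OWNER) is an assumption
    made for this illustration.
    """
    db = f"{domain.upper()}_DB"
    role = f"{domain.upper()}_OWNER"
    return [
        f"CREATE DATABASE IF NOT EXISTS {db};",
        f"CREATE ROLE IF NOT EXISTS {role};",
        f"GRANT OWNERSHIP ON DATABASE {db} TO ROLE {role};",
    ]


def share_with_domain_sql(producer: str, consumer: str) -> str:
    """Let the consumer domain's role use the producer domain's database."""
    return (
        f"GRANT USAGE ON DATABASE {producer.upper()}_DB "
        f"TO ROLE {consumer.upper()}_OWNER;"
    )


statements = domain_setup_sql("demand_mgmt") + domain_setup_sql("order_mgmt")
statements.append(share_with_domain_sql("demand_mgmt", "order_mgmt"))

for stmt in statements:
    print(stmt)
```

Generating the statements from a single function keeps every domain on the same naming scheme, so adding a new domain is one more call rather than a hand-written script.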

As the picture below shows, we can have an overall Snowflake warehouse setup across the organization for the business users. This addresses the principle of Domain-driven Ownership and Architecture.


Next comes Data as a Product. Here we have a specific problem to address: our core problem is confined to the demand forecast. The demand management team can ensure the product is developed within the framework to address it.

Since we have already chosen Snowflake as the data cloud platform, we can leverage its other capabilities, such as Snowpark for scripting and application development and Snowpipe for building the data pipelines that handle ETL/ELT. In this way, we address the principle of Self-service Infrastructure as a Platform.

Developing an AI framework for Demand Forecasting

We developed a scalable demand forecasting framework on Snowflake, keeping data mesh principles in view. It uses:

  • Snowpark for the machine learning model implementation
  • Snowflake as the data warehouse
  • Streamlit for frontend development

 

Given some limitations, we have created a plug-and-play setup that can be customized to business needs. The AI-enabled demand planning framework supports demand managers by sensing and analyzing hundreds to thousands of internal and external demand-influencing variables using machine learning, optimization, Bayesian approaches, and granular forecasts. These advanced algorithms provide probability distributions of the expected demand volume rather than a single forecast number. These technologies significantly improve demand forecast accuracy, often reducing the forecasting error by 30-50%.
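To illustrate what a distribution in place of a single forecast number can look like, here is a minimal sketch of one simple Bayesian approach: a Gamma-Poisson conjugate update for the mean weekly demand. It is not the framework's actual algorithm, which this article does not specify. The prior values, the sales history, and the function names are all assumptions made for the example, and only the Python standard library is used.

```python
import random
import statistics


def posterior_params(prior_shape, prior_rate, weekly_demand):
    """Gamma-Poisson conjugate update: returns (shape, rate) of the posterior."""
    return prior_shape + sum(weekly_demand), prior_rate + len(weekly_demand)


def demand_rate_quantiles(shape, rate, n_samples=20000, seed=0):
    """Sample the posterior of the mean weekly demand; return P10, P50, P90."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); the scale is 1 / rate.
    samples = [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_samples)]
    deciles = statistics.quantiles(samples, n=10)  # nine cut points: P10..P90
    return deciles[0], deciles[4], deciles[8]


# Hypothetical weekly unit sales for one product.
history = [120, 135, 128, 142, 131]

# Hypothetical prior belief: roughly 100 units per week, weighted as one week.
shape, rate = posterior_params(prior_shape=100.0, prior_rate=1.0,
                               weekly_demand=history)
p10, p50, p90 = demand_rate_quantiles(shape, rate)

print(f"Posterior mean weekly demand: {shape / rate:.1f}")
print(f"P10 / P50 / P90 of mean demand: {p10:.1f} / {p50:.1f} / {p90:.1f}")
```

The three quantiles describe uncertainty about the average weekly demand, not about any single week's sales. A planner could use the P10 to P90 range to size safety stock instead of relying on one point estimate.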

Benefits of Demand Forecasting

  1. Improve forecast accuracy
  2. Minimize inventory
  3. Maximize service levels