Unlike traditional monolithic data infrastructures, which handle the consumption, storage, transformation, and output of data in one central data lake, a data mesh supports distributed, domain-specific data consumers and treats data as a product, with each domain handling its own data pipelines. The tissue connecting these domains and their data assets is a universal interoperability layer that applies the same syntax and data standards.
In short, data mesh is a new approach based on a modern, distributed architecture for analytical data management. It enables end-users to easily access and query data where it lives without first transporting it to a data lake or data warehouse. The decentralized strategy of data mesh distributes data ownership to domain-specific teams that manage, own, and serve the data as a product.
Data mesh introduces the data product as its architectural quantum. An architectural quantum is the smallest unit of architecture that can be deployed independently, has high functional cohesion, and includes all the structural elements required for its function.
In the context of data mesh, according to Zhamak Dehghani, writing on Martin Fowler's site, "Data product is a composition of all components - code, data and infrastructure - at the granularity of a domain's bounded context."
Dehghani describes the three aspects of a data product as follows.
Code: it includes (a) code for data pipelines responsible for consuming, transforming and serving upstream data, that is, data received from the domain's operational system or from an upstream data product; (b) code for APIs that provide access to data, semantic and syntax schema, observability metrics and other metadata; (c) code for enforcing traits such as access control policies, compliance, provenance, etc.
Data and Metadata: well, that's what we are all here for, the underlying analytical and historical data in a polyglot form. Depending on the nature of the domain data and its consumption models, data can be served as events, batch files, relational tables, graphs, etc., while maintaining the same semantics. For data to be usable, it comes with an associated set of metadata, including data computational documentation, semantic and syntax declarations, and quality metrics. Some metadata is intrinsic to the data, such as its semantic definition; other metadata communicates the traits that computational governance uses to implement the expected behavior, such as access control policies.
Infrastructure: The infrastructure component enables building, deploying and running the data product's code, as well as storage and access to big data and metadata.
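The three aspects above can be pictured as one deployable unit. The following is a minimal Python sketch, not a reference implementation: the class name `DataProduct`, its fields, and the `completed_orders` pipeline are all hypothetical, and the infrastructure aspect (build, deployment, storage) is deliberately left out. It shows pipeline code, data, and metadata living together, with the same records served through two output ports.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class DataProduct:
    """Hypothetical container for one domain's data product."""

    name: str
    domain: str
    # Code: pipeline logic that turns upstream records into served records.
    transform: Callable[[List[Row]], List[Row]]
    # Metadata: column names and types published alongside the data.
    schema: Dict[str, str]
    description: str = ""
    # Data: the analytical records this product currently serves.
    rows: List[Row] = field(default_factory=list)

    def ingest(self, upstream: List[Row]) -> None:
        """Run the pipeline code over data from an upstream source."""
        self.rows = self.transform(upstream)

    def serve_json(self) -> str:
        """Output port 1: the records as a JSON document."""
        return json.dumps(self.rows)

    def serve_csv(self) -> str:
        """Output port 2: the same records as a CSV batch file."""
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=list(self.schema), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(self.rows)
        return buffer.getvalue()


def completed_orders(upstream: List[Row]) -> List[Row]:
    """Hypothetical pipeline: keep completed orders, rename the key column."""
    return [
        {"order_id": r["id"], "amount": r["amount"]}
        for r in upstream
        if r["status"] == "completed"
    ]


orders = DataProduct(
    name="completed_orders",
    domain="sales",
    transform=completed_orders,
    schema={"order_id": "int", "amount": "float"},
    description="Completed orders from the sales operational system.",
)
orders.ingest(
    [
        {"id": 1, "amount": 20.0, "status": "completed"},
        {"id": 2, "amount": 5.5, "status": "cancelled"},
    ]
)
```

Both `serve_json` and `serve_csv` read from the same `rows`, which is the point of the polyglot idea: the format changes per consumer while the meaning of the data does not.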
Benefits of Data Mesh
Business Agility and Scalability
Data mesh enables decentralized data operations, independent team performance, and data infrastructure provided as a service, resulting in improved time-to-market, scalability, and business domain agility. It reduces process complexity and the IT backlog, which can lower operating and storage costs.
Faster Access and Accurate Data Delivery
Data mesh offers an easily governed, centrally provided infrastructure based on a self-service model that hides the underlying complexity, giving faster data access and more accurate delivery. Businesses can query data from anywhere with SQL at lower latency. The distributed architecture reduces the processing and intervention layers that delay time to insight.
Flexibility and Independence
Enterprises that adopt a data mesh architecture become vendor-agnostic and avoid being locked in to a single data platform. The distributed infrastructure gives companies flexibility and choice through connectors to many systems.
Platform Connectivity and Data Security
The decentralized framework allows cloud applications to connect to sensitive on-premises data, whether that data is streaming live or stored on devices. Data mesh runs queries and analytics where the data resides, instead of requiring users to make a copy and route it through a public network to a data warehouse.
Keeping data in place reduces the risk of a data breach or information loss, which improves security, and lowers data latency, which improves performance in use cases such as live streaming, online gaming, and financial trading.
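Querying data where it resides can be illustrated on a small scale with Python's built-in `sqlite3` module. This is only an analogy under stated assumptions: the two attached in-memory databases stand in for stores owned by hypothetical `sales` and `shipping` domains, and the single connection stands in for a federated query engine. The table and column names are invented for the example.

```python
import sqlite3

# One connection stands in for a hypothetical federated query engine.
conn = sqlite3.connect(":memory:")

# Each ATTACH creates a separate in-memory database, standing in for a
# store owned and operated by a different domain team.
conn.execute("ATTACH DATABASE ':memory:' AS sales")
conn.execute("ATTACH DATABASE ':memory:' AS shipping")

conn.execute(
    "CREATE TABLE sales.orders (order_id INTEGER PRIMARY KEY, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales.orders VALUES (?, ?)",
    [(1, 20.0), (2, 5.5), (3, 12.0)],
)

conn.execute(
    "CREATE TABLE shipping.deliveries (order_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO shipping.deliveries VALUES (?, ?)",
    [(1, "delivered"), (2, "in_transit"), (3, "delivered")],
)

# The join runs across both stores; no table is copied into a
# central warehouse first.
delivered_revenue = conn.execute(
    """
    SELECT SUM(o.amount)
    FROM sales.orders AS o
    JOIN shipping.deliveries AS d ON d.order_id = o.order_id
    WHERE d.status = 'delivered'
    """
).fetchone()[0]
```

Each domain keeps its own schema and storage, and the consumer addresses tables by domain name (`sales.orders`, `shipping.deliveries`) in a single SQL statement.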
Robust Data Governance for End-to-End Compliance
A distributed architecture keeps data ingestion tied to its sources, formats, and volumes, so businesses can control security at the source system. Decentralized data operations simplify compliance with global data governance guidelines for quality data delivery and ease of data access.
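Controlling security at the source can be sketched as a policy that travels with the data and is checked whenever the data is served. The Python below is a hypothetical illustration, not a real governance API: `AccessPolicy`, `serve`, the role names, and the `"***"` mask are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

Row = Dict[str, object]


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical policy published as metadata by the owning domain."""

    reader_roles: FrozenSet[str]  # roles allowed to read at all
    pii_columns: FrozenSet[str]   # columns treated as sensitive
    pii_roles: FrozenSet[str]     # roles allowed to see sensitive columns


def serve(rows: List[Row], policy: AccessPolicy, role: str) -> List[Row]:
    """Apply the policy at the source, before any data leaves the domain."""
    if role not in policy.reader_roles:
        raise PermissionError(f"role {role!r} may not read this data product")
    if role in policy.pii_roles:
        return [dict(row) for row in rows]
    # Other readers get every row, with sensitive columns masked.
    return [
        {key: ("***" if key in policy.pii_columns else value)
         for key, value in row.items()}
        for row in rows
    ]


policy = AccessPolicy(
    reader_roles=frozenset({"analyst", "compliance"}),
    pii_columns=frozenset({"email"}),
    pii_roles=frozenset({"compliance"}),
)
customers = [{"customer_id": 7, "email": "a@example.com"}]

analyst_view = serve(customers, policy, "analyst")
compliance_view = serve(customers, policy, "compliance")
```

Because the check happens inside the serving function, every consumer goes through the same rule regardless of which tool issues the request.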
Cross-Functional Teams for Improved Transparency
The centralized data ownership of traditional data platforms isolates expert teams, creates a lack of transparency, and fails to provide contingency against data control/ownership loss. Data mesh decentralizes data ownership by distributing it among cross-functional domain teams, including domain experts, business teams, IT, and agile virtual teams through its domain-oriented approach for improved transparency and data quality.