As data grows exponentially across organizations, data management becomes an arduous process. The resulting challenges include security risks, data quality problems, data duplication, and processing delays. An organization that implements an architecture for managing data across its business systems and applications can navigate these challenges better and derive more value from its data.
At To-Increase, since we have solutions for data management and analytics, we understand the importance of using a methodology that works for your organization. We have over 18 years of experience helping over 2200 organizations solve everyday challenges using our solutions for Microsoft-based ERPs.
To help organizations currently evaluating relatively new architectures such as a data fabric and data mesh, we have put together a couple of blogs that explain both. This will help you better understand these methodologies and decide what will work for your organization.
In our previous blog, we shared a comprehensive guide to data fabric. In this blog, we deep-dive into data mesh and close the loop with a data mesh and data fabric comparison to make it easier for our readers to decide between the two architectures.
In this blog, we cover:
- Data mesh meaning
- Data mesh architecture
- Data mesh benefits
- Data mesh versus data fabric
- Is data mesh or data fabric the right way forward?
What is a data mesh?
The data mesh framework, a concept established by Thoughtworks, considers the data lake and the data warehouse as the two main technology stacks for data analytics. The concept recognizes that the data lake supports data science patterns while the data warehouse supports business intelligence and analytics, and that the two differ in their use cases, their users, and the way they can be used. A data mesh framework, unlike a data fabric, is a methodology that connects the two. The focus is to derive value from analytical and historical data at scale by defining an approach to managing data across these two architectures.
The idea is that although there are differences in technology to manage data lakes and data warehouses, the organization and teams that manage the two should be united and take ownership of data management and governance. The data mesh framework focuses on a more bottom-up approach to ensure the roles and responsibilities of the data governance team are aligned with the idea of improving data accountability.
Also known as an extension of the modern warehouse, a data mesh can pose a significant challenge to subject matter experts who work independently and handle data in silos. This is why organizations need a structure that aligns a data mesh with a data fabric through integration and a well-defined, concise data governance strategy.
What are the core components that make up the architecture of a data mesh?
Data mesh and data fabric are relatively new methodologies that emerged as alternative ways to manage data within large organizations. Both are still evolving and may vary across organizations. Data mesh was born as a way to break away from the traditional centralized approach.
It therefore focuses on a decentralized, domain-centered approach instead. A decentralized system such as a data mesh paves the way for agile data delivery within an enterprise. Since data ownership and infrastructure are managed by business units, it equips them to make quicker data-driven decisions that can help the organization accelerate growth.
To explain further, we share the main components of a data mesh below:
Domain-centric decentralized approach
Traditionally, a centralized team is responsible end-to-end for all data-related tasks, from collecting to managing data. However, this approach can increase the time tasks take and reduce efficiency, leading to bottlenecks. The data mesh instead takes a domain-oriented approach and encourages distributing ownership across business units.
Therefore, each business domain is responsible and accountable for its data, and it sets up rules and guidelines for data governance as well.
Infrastructure as a platform
In a data mesh, infrastructure is provided as a self-serve platform that is available to all domains. Data stewards set up the data solutions once and make them available to every business unit, which avoids duplication, while the core domain teams remain responsible for the data itself. The platform includes the main capabilities that can be leveraged across domains, such as storage, processing, integration, and security.
Domain-centric data accountability
Since each domain is responsible for its data, it needs to take a self-serve approach to building and maintaining data infrastructure. Domains therefore need to set up and monitor their own data pipelines and manage their own data solutions. The individual business units have to collect, categorize, transform, and present data related to their domain.
It is important to note that domain data does not flow into a central repository in a data mesh but is hosted and shared by the business unit responsible for that data. This ensures accountability of that data but could also lead to data silos and duplication if not integrated.
Since each domain is responsible for the access and use of its data, it needs to provide clear guidance, such as documentation and data catalogs, for the rest of the organization. It also has to monitor data access and discoverability, and share data through APIs so that self-serve data is available to the organization.
As the business units are responsible for their own data, they also need to ensure that data governance policies are developed collaboratively across teams, approved by managerial leadership, and then rolled out. Even so, domain teams retain a lot of independence, as they identify and implement the standards, data quality rules, and policies.
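To make the idea of domain-hosted but centrally discoverable data more concrete, here is a minimal Python sketch. It is an illustration under our own assumptions, not a description of any specific product: the `DataProduct` fields, the `Catalog` class, and the `example.invalid` endpoints are all hypothetical. The point it shows is that the catalog holds only metadata, while the data itself stays with the owning domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Metadata a domain publishes about a dataset it owns (hypothetical schema)."""
    domain: str       # owning business unit, e.g. "finance"
    name: str         # product name within the domain
    owner: str        # accountable team or contact
    endpoint: str     # where consumers read the data; the data stays with the domain
    description: str = ""


class Catalog:
    """Organization-wide index of data products. Stores metadata only, never the data."""

    def __init__(self) -> None:
        self._products: dict[tuple[str, str], DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        key = (product.domain, product.name)
        if key in self._products:
            raise ValueError(f"{product.domain}.{product.name} is already registered")
        self._products[key] = product

    def search(self, term: str) -> list[DataProduct]:
        # Simple case-insensitive match on name and description.
        term = term.lower()
        return [
            p for p in self._products.values()
            if term in p.name.lower() or term in p.description.lower()
        ]


catalog = Catalog()
catalog.register(DataProduct(
    domain="finance",
    name="open_invoices",
    owner="finance-data-team",
    endpoint="https://example.invalid/finance/open_invoices",  # placeholder URL
    description="Unpaid customer invoices, refreshed daily",
))
catalog.register(DataProduct(
    domain="supply_chain",
    name="stock_levels",
    owner="warehouse-data-team",
    endpoint="https://example.invalid/supply_chain/stock_levels",  # placeholder URL
    description="Current stock per warehouse and item",
))

matches = catalog.search("invoices")
```

A consumer in another business unit would search the catalog, find the owning team and endpoint, and then request the data from that domain's API rather than from a central repository.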
Data as a product
In a data mesh, domains need to treat data as a product for the methodology to succeed. Domains therefore need to apply product thinking, with the rest of the organization as their customers. The domain team is responsible for how data is categorized with metadata and how it is presented, whether as a report in Power BI, for example, or as a table in Dataverse.
For data as a product to be successful, teams need to ensure the data is useful to their customers and provides value. If data is no longer relevant, it needs to be archived or updated. The data should be easy to access, follow the organization's naming standards, and be discoverable centrally through a data catalog. Additionally, every team needs to ensure data governance and follow organizational guidelines and standards.
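As a rough illustration of these product checks, the Python sketch below reviews a single data product against a naming standard and a relevance threshold. Both are assumptions made for the example: the `<domain>.<dataset>` naming pattern and the 180-day idle limit are hypothetical, and your organization would define its own.

```python
import re
from datetime import date

# Assumed naming standard for illustration: "<domain>.<dataset>" in lowercase snake_case.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$")


def review_data_product(name: str, last_accessed: date, today: date,
                        max_idle_days: int = 180) -> list[str]:
    """Return the follow-up actions a domain team should consider for one data product."""
    actions = []
    if not NAME_PATTERN.match(name):
        actions.append("rename to match the naming standard")
    idle_days = (today - last_accessed).days
    if idle_days > max_idle_days:
        actions.append("archive or update: no recent consumers")
    return actions


today = date(2024, 6, 30)
healthy = review_data_product("sales.monthly_revenue", date(2024, 6, 1), today)
stale = review_data_product("Sales Report FINAL", date(2023, 1, 15), today)
```

Here `healthy` comes back empty, while `stale` is flagged both for its name and for not having been used recently.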
What are the benefits of a data mesh?
Below are some benefits of data mesh:
Better data governance
A data mesh distributes data ownership, with each domain responsible for its own data governance. This approach gives domains autonomy and a better understanding of their data. It results in improved data quality, as each team has a vested interest in ensuring the accuracy and completeness of its own data.
Data mesh is a modern approach to data management that allows domains the ownership and responsibility of their own data. They can act faster with better insight and make decisions based on data while solving issues and catering to business requirements.
This can be done at the domain level since each business unit manages its own data products and can leverage existing platforms. Domains can experiment with how they present data or make data available.
A data mesh is a bottom-up approach. Domains therefore need to collaborate to share and communicate their data governance rules, policies, and other such requirements with the rest of the organization.
Data mesh uses data infrastructure that can be leveraged across domains. So, this structure allows for new domains to be added using existing infrastructure.
As there is no dependence on a centrally managed system and the responsibilities are distributed across domains, this leads to fewer bottlenecks.
Every domain has business goals that need to be supported by accurate reporting and data management. And in turn, as there is a clearer understanding of business needs, data can be managed more effectively by domains to support their goals.
Data fabric versus data mesh
Before we dive into this comparison, let’s briefly look at what a data fabric is.
Data fabric and data mesh are often mistaken for the same concept. Both are relatively new methodologies. A data fabric focuses on creating a unified and holistic view of data across an organization. The methodology uses architecture to create a coherent data layer over different business systems, applications, and databases. To achieve this layer, technologies for data integration, data virtualization, and orchestration are used to ensure data synchronization, access, and interoperability.
What makes data fabric different from data mesh is centralized governance and control. Data mesh, on the other hand, has a decentralized approach with domain-focused ownership and data management.
Although a data fabric and a data mesh overlap, as both are methods of accessing data across multiple databases, they differ in focus and can be used together to combine the benefits of both. The data fabric takes a centralized approach to data while making data available via APIs or integration tools. Data mesh, on the other hand, focuses on building a decentralized framework to manage data across databases with a focus on analytics.
Let's compare the methodologies based on some key aspects:
Data integration and interoperability
Data fabric creates a unified view of data by integrating data from various sources and systems into a connected layer using APIs and tools. The goal is to provide easy data access and interoperability.
Although each domain or business unit manages its own data in a data mesh, integration and interoperability can be achieved using solutions and APIs between domains.
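The connected layer of a data fabric can be pictured with a small Python sketch. This is a simplified, in-memory illustration under our own assumptions: the `UnifiedDataLayer` class and its `connect` and `read` methods are hypothetical names, and the lambdas stand in for real connectors to source systems.

```python
from typing import Callable, Iterable

Row = dict[str, object]
Reader = Callable[[], Iterable[Row]]


class UnifiedDataLayer:
    """One access point over several source systems (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self._readers: dict[str, Reader] = {}

    def connect(self, dataset: str, reader: Reader) -> None:
        # Register how a dataset is read from its source system.
        self._readers[dataset] = reader

    def read(self, dataset: str) -> list[Row]:
        if dataset not in self._readers:
            raise KeyError(f"no source connected for {dataset!r}")
        return list(self._readers[dataset]())


layer = UnifiedDataLayer()
layer.connect("customers", lambda: [{"id": 1, "name": "Contoso"}])  # stands in for a CRM
layer.connect("orders", lambda: [{"id": 10, "customer_id": 1}])     # stands in for an ERP

customers = layer.read("customers")
orders = layer.read("orders")
```

Consumers call one layer regardless of where the data lives. In a data mesh, by contrast, each domain would expose its own interface, and interoperability would come from agreed conventions between those interfaces.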
Data governance and ownership
Data fabric works on the principle of centralized governance: a central team manages integration, governance, and quality, and provides a data governance framework for the entire organization.
Although data governance applies across the organization in a data mesh, it uses a collaborative, bottom-up approach. Each domain team therefore contributes to data governance based on its business unit’s requirements. The domain teams are responsible for data quality, documentation, and managing data based on the data governance standards in use.
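One way to picture this collaborative model is as a shared baseline of rules that every domain follows, combined with rules each domain adds for its own data. The Python sketch below is a minimal illustration of that idea. The rule names, the record fields, and the `failed_rules` helper are all hypothetical.

```python
from typing import Callable

Record = dict[str, object]
Rule = Callable[[Record], bool]

# Organization-wide baseline agreed across domains (illustrative rules only).
GLOBAL_RULES: dict[str, Rule] = {
    "has_owner": lambda r: bool(r.get("owner")),
    "has_updated_at": lambda r: "updated_at" in r,
}

# Rules each domain contributes for its own data (illustrative).
DOMAIN_RULES: dict[str, dict[str, Rule]] = {
    "finance": {
        "amount_non_negative": lambda r: (
            isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0
        ),
    },
}


def failed_rules(domain: str, record: Record) -> list[str]:
    """Check a record against the shared baseline plus the domain's own rules."""
    rules = {**GLOBAL_RULES, **DOMAIN_RULES.get(domain, {})}
    return [name for name, check in rules.items() if not check(record)]


ok = failed_rules(
    "finance",
    {"owner": "finance-data-team", "updated_at": "2024-06-30", "amount": 120.0},
)
bad = failed_rules("finance", {"updated_at": "2024-06-30", "amount": -5})
```

In a data fabric, the equivalent rule set would be defined and maintained by the central team alone. In this sketch, the `DOMAIN_RULES` entries are where each business unit contributes its own requirements.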
Agility and autonomy
Since data fabric is a centralized approach, domains do not have the autonomy they have in a data mesh, in which business teams can experiment with how they present and manage data. With that autonomy, business teams can decide with agility how to manage data based on their business needs and goals.
Scalability and flexibility
Both data fabric and data mesh offer scalability: the former focuses on integrating systems and databases and scaling data infrastructure, while the latter uses existing infrastructure that can be leveraged by domains. However, data fabric does not offer domains the flexibility to manage data the way data mesh does.
An organization that prefers a standardized, top-down approach with centralized decision-making at its core would prefer a data fabric. However, an organization that prefers a bottom-up, decentralized approach, is looking for quicker decision-making, encourages autonomy at a domain level, and is open to experimenting with its data techniques can consider a data mesh.
Is data fabric or data mesh the right way forward for your organization?
To decide on the right way forward, your organization needs to:
- Inspect your existing infrastructure
- Set out your data governance objectives and strategy
- Understand your near-term goals and priorities
- Analyze your organization’s culture
Both data fabric and data mesh have benefits and limitations, and both are evolving concepts that are yet to reach maturity. So, you do not have to choose one over the other. You can pick what works for your organization and use a hybrid approach that fits your organization’s unique requirements. No data management architecture is a one-size-fits-all approach, after all.
So, while you might use parts of data fabric and data mesh as a framework for data management, you will need to tailor this framework to fit your specific business needs. Ideally, picking principles from both frameworks could give your organization an edge, from improved data governance through an integration layer to better insights for business intelligence and analytics through a decentralized, domain-centric approach.
If you use Microsoft Dynamics 365 Finance & Supply Chain Management, we offer solutions that can help you with your data integration, governance, and analytics goals. Watch our video below for a snapshot of our integration tool. You can also download our beginner’s guide to data governance e-book to get started on your journey to data transformation.