Principles of Data Fabric

By Sonia Mezzetta
About this book
Data can be found everywhere, from cloud environments and relational and non-relational databases to data lakes, data warehouses, and data lakehouses. Data Fabric, a powerful architecture that creates a unified view of data, lets you standardize data management practices across the cloud, on-premises systems, and edge devices. This book will enable you to design a Data Fabric solution by addressing all the key aspects that need to be considered. It begins by introducing Data Fabric architecture, why you need it, and how it relates to other strategic data management frameworks. You’ll then quickly progress to grasping the principles of DataOps, an operational model for Data Fabric architecture. The next set of chapters shows you how to combine Data Fabric with DataOps and Data Mesh and how to get the most out of using them together. After that, you’ll discover how to design Data Integration, Data Governance, and Self-Service analytics architecture. The book ends with technical architecture to implement distributed data management and regulatory compliance, followed by industry best practices and principles. By the end of this book, you will have a clear understanding of what Data Fabric is and what the architecture looks like, along with the level of effort that goes into designing a Data Fabric solution.
Publication date: April 2023
Publisher: Packt
Pages: 188
ISBN: 9781804615225

 

Introducing Data Fabric

Data Fabric is a distributed data architecture that connects scattered data across tools and systems with the objective of providing governed access to fit-for-purpose data at speed. Data Fabric focuses on Data Governance, Data Integration, and Self-Service data sharing. It leverages a sophisticated active metadata layer that captures knowledge derived from data and its operations, data relationships, and business context. Data Fabric continuously analyzes data management activities to recommend value-driven improvements. Data Fabric works with both centralized and decentralized data systems and supports diverse operational models. This book focuses on Data Fabric and describes its data management approach, differentiating design, and emphasis on automated Data Governance.

In this chapter, we’ll focus on understanding the definition of Data Fabric and why it’s important, as well as introducing its building blocks. By the end of this chapter, you’ll have an understanding of what a Data Fabric design is and why it’s essential.

In this chapter, we’ll cover the following topics:

  • What is Data Fabric?
  • Why is Data Fabric important?
  • Data Fabric building blocks
  • Operational Data Governance models

Note

The views expressed in the book belong to the author and do not necessarily represent the opinions or views of their employer, IBM.

 

What is Data Fabric?

Data Fabric is a distributed and composable architecture that is metadata and event driven. It is use-case agnostic and excels in managing and governing distributed data. It integrates dispersed data with automation, strong Data Governance, protection, and security. Data Fabric focuses on the Self-Service delivery of governed data.

Data Fabric does not require the migration of data into a centralized data storage layer, nor to a specific data format or database type. It can support a diverse set of data management styles and use cases across industries, such as a 360-degree view of a customer, regulatory compliance, cloud migration, data democratization, and data analytics.
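To make the idea of leaving data where it lives more concrete, here is a minimal sketch in Python. It is not a Data Fabric product API. The two in-memory SQLite databases, the table names, and the `customer_360` function are all hypothetical stand-ins for separate source systems and a virtualized 360-degree customer view. The point it illustrates is that each source is queried in place and only the combined result is assembled for the consumer.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an independent in-memory database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Hypothetical source 1: a CRM system holding customer profiles.
crm = make_source(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)

# Hypothetical source 2: a billing system holding orders.
billing = make_source(
    "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 40.0), (1, 60.0), (2, 25.0)],
)


def customer_360(customer_id):
    """Build a unified customer view by querying each source where it lives."""
    name_row = crm.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    total_row = billing.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return {
        "id": customer_id,
        "name": name_row[0] if name_row else None,
        "total_spend": total_row[0],
    }
```

Calling `customer_360(1)` combines the profile from one system with the order total from the other, and neither table is copied into a central store. A real design would add the governance, security, and metadata aspects described in this chapter, which this sketch leaves out.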

In the next section, we’ll touch on the characteristics of Data Fabric.

What Data Fabric is

Data Fabric is a composable architecture made up of different tools, technologies, and systems. It has an active metadata and event-driven design that automates Data Integration while achieving interoperability. Data Governance, Data Privacy, Data Protection, and Data Security are paramount to its design and to enable Self-Service data sharing. The following figure summarizes the different characteristics that constitute a Data Fabric design.

Figure 1.1 – Data Fabric characteristics

Data Fabric takes a proactive and intelligent approach to data management. It monitors and evaluates data operations to learn from them and suggest improvements that raise productivity and support better decision-making. It approaches data management with flexibility, scalability, automation, and governance in mind and supports multiple data management styles. What distinguishes Data Fabric architecture from others is that it embeds Data Governance into the data life cycle by design, using metadata as the foundation. Data Fabric focuses on business controls with an emphasis on robust and efficient data interoperability.
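As a rough illustration of metadata that is active and event driven, the following Python sketch updates a small metadata store whenever a data-operation event arrives and then derives a suggestion from what it has observed. The class name, the event shapes (`read` and `classified`), and the recommendation rule are assumptions made up for this example and are not taken from any specific Data Fabric tool.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ActiveMetadataStore:
    """Hypothetical store that updates metadata as data events arrive."""

    read_counts: dict = field(default_factory=lambda: defaultdict(int))
    classified: set = field(default_factory=set)

    def on_event(self, event):
        """React to one data-operation event (the event-driven part)."""
        if event["type"] == "read":
            self.read_counts[event["dataset"]] += 1
        elif event["type"] == "classified":
            self.classified.add(event["dataset"])

    def needs_classification(self, min_reads=2):
        """Suggest governance work: popular datasets with no classification yet."""
        return sorted(
            name
            for name, count in self.read_counts.items()
            if count >= min_reads and name not in self.classified
        )


store = ActiveMetadataStore()
for event in [
    {"type": "read", "dataset": "orders"},
    {"type": "read", "dataset": "orders"},
    {"type": "read", "dataset": "customers"},
    {"type": "classified", "dataset": "customers"},
]:
    store.on_event(event)
```

After these events, `store.needs_classification()` returns only `orders`, because it is read often and has no classification. The rule is deliberately simple. The design point is that the metadata changes as operations happen and is then analyzed to recommend an action, rather than being a static catalog entry.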

In the next section, we will clarify what is not representative of a Data Fabric design.

What Data Fabric is not

Let’s understand what Data Fabric is not:

  • It is not a single technology, such as data virtualization. While data virtualization is a key Data Integration technology in Data Fabric, the architecture supports several more technologies, such as data replication, ETL/ELT, and streaming.
  • It is not a single tool like a data catalog and it doesn’t have to be a single data storage system like a data warehouse. It represents a diverse set of tools, technologies, and storage systems that work together in a connected ecosystem via a distributed data architecture, with active metadata as the glue.
  • It doesn’t just support centralized data management but also federated and decentralized data management. It excels in connecting distributed data.
  • Data Fabric is not the same as Data Mesh. They are different data architectures that tackle the complexities of distributed data management using different but complementary approaches. We will cover this topic in more depth in Chapter 3, Choosing between Data Fabric and Data Mesh.

The following diagram summarizes what Data Fabric architecture does not constitute:

Figure 1.2 – What Data Fabric is not

We have discussed in detail what defines Data Fabric and what does not. In the next section, we will discuss why Data Fabric is important.

 

Why is Data Fabric important?

Data Fabric enables businesses to leverage the power of connected, trusted, protected, and secure data no matter where it’s geographically located or stored (cloud, multi-cloud, hybrid cloud, on-premises, or the edge). Data Fabric handles the diversity of data, use cases, and technologies to create a holistic end-to-end picture of data with actionable insights. It addresses the shortcomings of previous data management solutions while considering lessons learned and building on industry best practices. Data Fabric’s approach is based on a common denominator, metadata. Metadata is the secret sauce of Data Fabric architecture, along with automation enabled by machine learning and artificial intelligence (AI), deep Data Governance, and knowledge management. All these aspects lead to the efficient and effective management of data to achieve business outcomes, therefore cutting down on operational costs and increasing profit margins through strategic decision-making.

Some of the key benefits of Data Fabric are as follows:

  • It addresses data silos with actionable insights from a connected view of disparate data across environments (cloud, multi-cloud, hybrid cloud, on-premises, or the edge) and geographies
  • Data democratization leads to a shorter time to business value with frictionless Self-Service data access
  • It establishes trusted, secure, and reliable data via automated Data Governance and knowledge management
  • It gives business users intuitive discovery, understanding, and access to data while addressing technical users’ needs by supporting various data processing techniques, batch or real time, including ETL/ELT, data virtualization, change data capture, and streaming

Now that we have a view of why Data Fabric is important and how it takes a modern approach to data management, let’s review some of the drawbacks of earlier data management approaches.

Drawbacks of centralized data management

Data is spread everywhere: on-premises, across cloud environments, and on different types of databases, such as SQL, NoSQL, data lakes, data warehouses, and data lakehouses. Many of the challenges associated with this in the past decade, such as data silos, still exist today. The traditional data management approach to analytics is to move data into a centralized data storage system. Moving data into one central system facilitates control and decreases the necessary checkpoints across the large number of different environments and data systems. Thinking about this logically, it makes total sense. If you think about everyday life, we are successful at controlling and containing things if they are in one central place.

As an example, consider the shipment of goods from a warehouse to a store that requires inspection during delivery. Inspecting the shipment of goods in one store requires fewer people and resources than doing the same for 100 stores spread across different locations. As the number of locations grows, seamless management and quality control become a lot harder to achieve. The same applies to data management, and this is what led to the solution of centralized data management.

While centralized data management was the de facto approach for decades and is still used today, it has several shortcomings. Data movement and integration are expensive, especially when dealing with on-premises data storage solutions. Centralized data management relies heavily on data duplication to satisfy a diverse set of use cases requiring different contexts. Complex and performance-intensive data pipelines built to enable data movement require intricate maintenance and significant infrastructure investments, especially if automation or governance is nowhere in the picture. In a traditional operating model, IT departments centrally manage technical platforms for business domains. In the past and still today, this model creates bottlenecks in the delivery of and access to data, lengthening the time to value.

Enterprise data warehouses

Enterprise data warehouses are complex systems that require consensus across business domains on common definitions of data. An enterprise data model is tightly coupled to data assets. Any change to the physical data model without proper dependency management breaks downstream consumption. There are also challenges in Data Quality, such as data duplication and the lack of business skills to manage data within the technical platform team.
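The dependency management mentioned here can be pictured as a lineage lookup. The Python sketch below assumes a hypothetical lineage map in which each asset lists the assets that consume it directly. All asset names are invented for illustration. Before changing a physical column, a breadth-first walk lists everything downstream that the change could break.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that consume it directly.
LINEAGE = {
    "warehouse.customer.email": ["view.customer_contact"],
    "view.customer_contact": ["report.churn", "dashboard.marketing"],
    "warehouse.customer.region": ["report.sales_by_region"],
}


def downstream_of(asset, lineage=LINEAGE):
    """Return every asset reachable from `asset`, i.e. what a change could break."""
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for consumer in lineage.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return sorted(seen)
```

For example, `downstream_of("warehouse.customer.email")` reports the contact view plus the report and dashboard built on it, while an asset with no recorded consumers yields an empty list. In practice this kind of lineage would come from captured metadata rather than a hand-written dictionary.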

Data lakes

Data lakes came after data warehouses to offer a flexible way of loading data quickly without the restrictions of upfront data modeling. Data lakes can load raw data as is and later worry about its transformation and proper data modeling. Data lakes are typically managed in NoSQL databases or file-based distributed storage such as Hadoop. Data lakes support semi-structured and unstructured data in addition to structured data. Challenges with data lakes come from the very fact that they bypass the need to model data upfront, which can leave data unusable because it lacks proper business context. Such data lakes have been referred to as data swamps, where the data stored has no business value.
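The load-first, model-later pattern can be sketched as follows. This is an illustrative Python example only. The raw lines, the `SCHEMA` mapping, and the `read_with_schema` function are hypothetical. Raw records are stored exactly as received, and a schema is applied only when the data is read, which also shows how records that were never modeled can end up unusable.

```python
import json

# Raw records landed in the lake exactly as received (no upfront model).
RAW_LINES = [
    '{"id": "1", "amount": "19.90", "note": "gift"}',
    '{"id": "2", "amount": "5"}',
    '{"id": "3", "amt": "7.5"}',
]

# Hypothetical schema applied only when the data is read.
SCHEMA = {"id": int, "amount": float}


def read_with_schema(lines, schema):
    """Split raw lines into rows that fit the schema and lines that do not."""
    valid, rejected = [], []
    for line in lines:
        record = json.loads(line)
        try:
            valid.append({key: cast(record[key]) for key, cast in schema.items()})
        except (KeyError, ValueError):
            rejected.append(line)
    return valid, rejected


rows, rejects = read_with_schema(RAW_LINES, SCHEMA)
```

Here the first two records fit the schema once their values are cast, while the third is rejected because it uses `amt` instead of `amount`. Nothing stopped that record from being loaded, and the problem only surfaces at read time.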

Data lakehouses

Data lakehouses are a newer technology that combines data warehouse and data lake designs. They support structured, semi-structured, and unstructured data and are capable of addressing data science and business intelligence use cases.

Decentralized data management

While there are several great capabilities in centralized data systems, such as data warehouses, data lakes, and data lakehouses, the reality is that all these systems now have a role, which creates the need for decentralized data management. A single centralized data management system is not equipped to handle all possible use cases in an organization and at the same time excel in proper data management. I’m not saying there is no need for a centralized data system, but rather, it can represent a progression. For example, a small company might start with one centralized system that fits its business needs and, as it grows, evolve into more decentralized data management.

Another example is a business domain within a large company that might own and manage a data lake or a data lakehouse that needs to co-exist with several other data systems owned by other business domains. This again represents decentralized data management. Cloud technologies have further accelerated the proliferation of data. There is a multitude of cloud providers, each with its own set of capabilities and cost incentives, leading organizations to adopt multi-cloud and hybrid cloud environments.

We have evolved from a world of centralized data management as the best practice to a world in which decentralized data management is necessary. There is a seat at the table for all types of centralized systems. What’s important is for these systems to have a data architecture that connects data in an intelligent and cohesive manner. This means a data architecture with the right level of control and rigor while balancing quick access to trusted data, which is where Data Fabric architecture plays a major role.

In the next section, let’s briefly discuss considerations in building Data Fabric architecture.

Building Data Fabric architecture

Building Data Fabric architecture is not an easy undertaking. It’s not a matter of building a simple 1-2-3 application or applying specific technologies. It requires collaboration, business alignment, and strategic thinking about the design of the data architecture; the careful evaluation and selection of different tools, data storage systems, and technologies; and thought into when to buy or build. Metadata is the common thread that ties data together in a Data Fabric design. Metadata must be embedded into every aspect of the life cycle of data from start to finish. Data Fabric actively manages metadata, which enables scalability and automation and creates a design that can handle the growing demands of businesses. It offers a future-proof design that can grow to add subsequent tools and technologies.

Now, with this in mind, let’s introduce a bird’s-eye view of a Data Fabric design by discussing its building blocks.

     
About the Author
  • Sonia Mezzetta

    Sonia Mezzetta is a senior certified IBM architect working as a Data Fabric Program Director. She has an eye for detail and enjoys solving data pain points. She started her data management career at IBM as a data architect specializing in enterprise architectures. She is an expert in Data Fabric, DataOps, Data Governance, and Data Analytics. With over 20 years of experience, she has designed and architected several enterprise data solutions. She has authored numerous data management white papers and holds master’s and bachelor’s degrees in Computer Science. Sonia is originally from New York City and currently resides in Westchester County, New York.
