
You're reading from  Multi-Cloud Strategy for Cloud Architects - Second Edition

Product type: Book
Published in: Apr 2023
Publisher: Packt
ISBN-13: 9781804616734
Edition: 2nd
Author: Jeroen Mulder
Jeroen Mulder is a certified enterprise and security architect who works at Fujitsu (Netherlands) as a Principal Business Consultant. Earlier, he was a Senior Lead Architect at Fujitsu, focusing on cloud and cloud-native technology, and was later promoted to Head of Applications and Multi-Cloud Services. Jeroen is interested in cloud technology, cloud infrastructure architecture, serverless and container technology, application development, and digital transformation using various DevOps methodologies and tools. He previously authored “Multi-Cloud Architecture and Governance”, “Enterprise DevOps for Architects”, and “Transforming Healthcare with DevOps4Care”.

Choosing the right platform for data

It is a cliché, but nonetheless very true: data is the new gold. There are good reasons why enterprise architecture frameworks put data first: one of the first things a business must do is analyse what data it should use and how to gain optimal benefit from it. No business can operate without data: it needs data to gain insight into markets and the demands of its customers. It needs data to drive the business.

You will find the term data-driven in almost every cloud assessment study. What does data-driven mean? A company makes decisions based on the analysis of data. Intuition, or decisions based purely on previous experience, are ruled out. Every action is supported by the analysis of data.

To enable a data-driven business, we need one thing above all: the data itself, typically in vast amounts and preferably in (near) real time. The collection of data is prerequisite number one. Prerequisite number two is that this data must...

Building and sizing a data platform

As with every service that we deploy in the cloud, we need a foundation to build the platform on. Hence, building a landing zone that can hold raw data is the first step. This landing zone should be an environment that serves only one purpose: to capture raw data. It is recommended to build this landing zone separately from core IT systems. It should be scalable, yet low-cost, since it will hold a lot of data. The issue with keeping data is that it might increase the cloud bill exponentially: data storage comes at a very low price per unit of data, but the catch is that we need a lot of these small units.
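The "low price per unit, many units" effect is easy to underestimate. The following back-of-the-envelope sketch illustrates it; the price per GB-month is an assumed figure for illustration only, as actual object-storage tiers differ per provider, tier, and region.

```python
# Back-of-the-envelope storage cost for a landing zone.
# The price per GB-month is an ASSUMPTION for illustration;
# real object-storage pricing varies by provider, tier, and region.

PRICE_PER_GB_MONTH = 0.02  # assumed low-cost object storage tier (USD)

def monthly_cost_usd(total_gb: float) -> float:
    """Estimate the monthly storage bill for a given volume in GB."""
    return total_gb * PRICE_PER_GB_MONTH

# A single gigabyte is cheap...
print(f"1 GB: ${monthly_cost_usd(1):,.2f}/month")
# ...but a petabyte of raw data is a million of those cheap units.
print(f"1 PB: ${monthly_cost_usd(1_000_000):,.2f}/month")
```

At the assumed rate, a petabyte already costs tens of thousands of dollars per month, which is why lifecycle policies and cheap archival tiers matter for a landing zone that only captures raw data.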

It is important to implement governance from the start. This includes defining and implementing guardrails for data classification and tagging.
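Such a guardrail can be as simple as refusing to ingest objects that lack mandatory governance tags. The sketch below shows the idea; the tag names and classification levels are assumptions for illustration, not a standard.

```python
# A minimal governance guardrail sketch: reject objects headed for the
# landing zone unless they carry the mandatory tags. Tag names and
# classification levels are illustrative ASSUMPTIONS, not a standard.

REQUIRED_TAGS = {"owner", "source-system", "classification"}
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}

def validate_tags(tags: dict) -> list:
    """Return a list of guardrail violations (empty means compliant)."""
    violations = [f"missing tag: {t}" for t in sorted(REQUIRED_TAGS - tags.keys())]
    cls = tags.get("classification")
    if cls is not None and cls not in ALLOWED_CLASSIFICATIONS:
        violations.append(f"unknown classification: {cls}")
    return violations

# A compliant object passes with no violations...
print(validate_tags({"owner": "finance", "source-system": "crm",
                     "classification": "internal"}))
# ...while an untagged one is flagged before it lands.
print(validate_tags({"owner": "finance"}))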

Once the landing zone has been established, data analysts can start using the data lake as a sandbox environment. This is the second stage. Analysts can start building prototypes of data models and...

Designing for interoperability and portability

Portability and interoperability should be driven by use cases and business cases, not pursued for their own sake. Following the Architecture Development Method (ADM) of TOGAF, there are four levels in IT systems that define the portability of systems: data, applications, platforms, and infrastructure.

  • Data represents information in a form that can be processed by computers. Data is stored in storage that is accessible to computers.

  • Applications are software that performs actions triggered by business requests.

  • Platforms support the applications.

  • Infrastructure is a collection of computation, storage, and network resources. In cloud computing, computation includes virtual machines, containers, and serverless functions.
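One way to reason about these four levels is to note that each has its own typical portability mechanism. The small model below is a sketch; the example mechanisms per layer are illustrative choices on my part, not TOGAF content.

```python
# A sketch of the four portability levels as a simple data model.
# The "portable_via" examples are illustrative ASSUMPTIONS, not TOGAF content.

from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    name: str
    portable_via: str  # an example mechanism that aids portability

STACK = [
    Layer("data", "open formats such as Parquet or CSV"),
    Layer("applications", "container images"),
    Layer("platforms", "managed services exposing open APIs"),
    Layer("infrastructure", "infrastructure-as-code templates"),
]

for layer in STACK:
    print(f"{layer.name:>14}: portable via {layer.portable_via}")
```

Assessing each layer separately like this helps turn "we want portability" into concrete, per-layer design decisions.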

One important note at this point is that cloud computing blurs the demarcation between infrastructure, platforms...

Overcoming challenges of data gravity

Applications don’t just hold data; they also produce a lot of data that they share with other applications. Data attracts new data and services in other applications: as data accumulates, more and more applications and services will use it. Data and applications are attracted to each other, as in the law of gravity. In short: the amount of data will grow, partly autonomously, but mostly because data sources will be connected to other data sources.
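A common answer to data gravity is to move the computation to the data instead of the data to the computation. The following self-contained simulation compares shipping a whole dataset to the consumer against pushing the filter down to where the data lives; the record layout and sizes are made up for illustration.

```python
# Data gravity in miniature: compare transferring a full dataset with
# pushing the filter to the data's location and transferring only the
# matching records. Record layout and volumes are ASSUMPTIONS.

import json

# A "remote" dataset of 10,000 order records (1 in 4 is region "us").
DATASET = [{"id": i, "region": "eu" if i % 4 else "us", "amount": i % 100}
           for i in range(10_000)]

def ship_everything() -> int:
    """Naive approach: serialize and transfer the full dataset."""
    return len(json.dumps(DATASET).encode())

def push_down_filter(region: str) -> int:
    """Filter at the source; transfer only the matching records."""
    matches = [r for r in DATASET if r["region"] == region]
    return len(json.dumps(matches).encode())

full = ship_everything()
filtered = push_down_filter("us")
print(f"full transfer:     {full:,} bytes")
print(f"filtered transfer: {filtered:,} bytes")
```

The same principle underlies real services that execute queries next to the storage (predicate pushdown): the bytes that cross the network shrink to the result set, so the immovable dataset stops being a bottleneck.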

Access to this data is a strategic advantage, but it also presents a major challenge. Databases are becoming so large that it is almost impossible to move the data. This can tie companies to a certain location to hold that data. In addition, companies that use each other's data and services must stay close to each other in order to provide good service. By keeping data physically close together, it can be exchanged...

Managing the foundation for data lakes

Data engineers design, build, and manage the data pipelines, but the foundation of the data lake and data warehouse is the dedicated landing zone for the data platform. Typically, landing zones in the cloud are operated by cloud engineers, who take care of the compute, storage, and network resources.

Looking at management of data platforms, we can distinguish various roles:

  • Data architect or engineer: the architect and data engineer are often combined in one role. This role is responsible for the design, development, and deployment of the data pipelines. The engineer must have extensive knowledge of ETL or ELT principles and technologies, making sure that data from sources gets collected and transformed into usable datasets in data warehouses or other data products, where the data can be further analyzed. Data also needs to be validated, which is a required skill of the engineer too. In essence, the engineer makes sure that data that is ingested into warehouses...
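The extract-validate-transform-load flow that this role owns can be sketched in a few lines of plain Python. The field names and validation rules below are illustrative assumptions, and the "warehouse" is just an in-memory structure standing in for a real data warehouse.

```python
# A minimal ETL sketch: extract raw records, validate and transform them,
# and load the survivors into an in-memory "warehouse". Field names and
# validation rules are illustrative ASSUMPTIONS.

RAW_SOURCE = [
    {"customer": "acme",     "amount": "120.50", "currency": "EUR"},
    {"customer": "",         "amount": "99.00",  "currency": "EUR"},  # invalid: no customer
    {"customer": "umbrella", "amount": "oops",   "currency": "USD"},  # invalid: bad amount
]

def extract():
    """In practice: read from files, APIs, or message queues."""
    return list(RAW_SOURCE)

def transform(record):
    """Validate and normalize one record; return None if it fails validation."""
    if not record["customer"]:
        return None
    try:
        amount = float(record["amount"])
    except ValueError:
        return None
    return {"customer": record["customer"],
            "amount": amount,
            "currency": record["currency"]}

def load(records):
    """Load the clean records into a (here: in-memory) warehouse table."""
    return {"fact_sales": records}

rows = [t for r in extract() if (t := transform(r)) is not None]
warehouse = load(rows)
print(warehouse["fact_sales"])  # only the one valid record survives
```

In an ELT variant, the raw records would be loaded into the platform first and the transform/validation step would run inside the warehouse; the engineer's responsibility for data quality is the same either way.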

Summary

In this chapter, we discussed the basic architecture principles for building and managing a data platform. We looked at data lakes that can hold vast amounts of raw data, and how we can build these lakes on top of cloud storage. The next step is to fetch the right data that is usable in data models. We must extract, transform, and load (ETL), or extract, load, and transform (ELT), the accurate datasets into environments where data analysts can work with this data. Typically, data warehouses are used for this.

We studied the various propositions for data operations of the major cloud providers: AWS, Azure, Google Cloud, Alibaba Cloud, and Oracle. Next, we discussed the challenges that come with building and operating data platforms: access to data and accuracy, but also privacy and compliance. Data gravity is another problem that we must solve. It's not easy to move huge amounts of data across platforms; hence, we must find other solutions to work with data in different...

Questions

  1. What does the term ETL mean?
  2. What would be the first step in building a data platform?
  3. True or false: data lakes are typically built on the common storage layers of major cloud providers, such as Azure Blob Storage and Amazon S3.
  4. What does Oracle’s GoldenGate do?

Further reading

  • Data Lake for Enterprises, by Tomcy John and Pankaj Misra, Packt Publishing